What is the most complicated thing about AI robots?

An interview with robotics researcher Peter Dürr: What everyday robots still fail at

Bringing sensor technology together with artificial intelligence: that's what Peter Dürr is working on as Head of Sony AI Zurich. Dürr has been involved in robotics for over 15 years; he did his doctorate on the subject at EPFL Lausanne and later researched aerial drones at Sony in Tokyo. c’t talked to him about robot development.

c’t: Mr. Dürr, how long will it be before you can buy a robot that loads and clears the dishwasher for, say, 2000 euros?

Peter Dürr: Such predictions are very difficult. If you go back a bit in history, the people who looked at artificial intelligence in the 1950s thought everything would be solved in ten years.

c’t: Just as with self-driving cars: a few years ago, people also believed their time would come very soon.

Dürr: Right. So I'm holding back on making predictions, but I think a lot of technologies are coming together right now that make us very optimistic. We think that many things will become possible that have been difficult so far.

c’t: What are currently the biggest obstacles with household robots, for example?

Dürr: On the one hand, there is perception. A household robot is not in an environment like an industrial robot, where everything is clearly defined and stays exactly as it was yesterday, down to the millimeter. There are people who walk around, move furniture, and don't always put the dishes back in the cupboards right away. The robot's sensors must constantly and dynamically understand the state of its environment.

Then we come to the field of artificial intelligence: the robot has to make decisions that humans consider useful and helpful. And finally, the robot must be able to carry out those decisions mechanically. In all three of these areas (perception, AI and mechanics) there are hurdles that have not yet been overcome.

c’t: In which of these three areas does the most still need to happen?

Dürr: You shouldn't draw such a sharp line between them. You yourself gave the example of the self-driving car. An experienced driver could probably control a car remotely if he or she saw a live camera image. For a robot, on the other hand, that would be very difficult. But we are working hard on the sensor technology: image sensors have so far been made for people. They virtually replace the human eye; they exist to produce images for people. They are not designed to give robots the information they need to solve these problems. At Sony we are now working on new camera technology designed with machines in mind, not people.

c’t: What does that mean in concrete terms? That the infrared filter is left out?

Dürr: Yes, for example. Another example is event-based cameras instead of conventional sensors based on intensity images. The image you see during a video conference is created by exposing the image sensor to light for a certain period, integrating the incoming light over that time and measuring the intensity at each pixel. This has the advantage that we can display a picture at 30 Hertz, so it looks like a moving picture to us.

But you can also do something else: expose the photodiodes on the image sensor to the light constantly and look at the changes in intensity instead of the intensity itself. Then we no longer have a moving image for people, but extremely finely resolved information about intensity changes as a stream of event data, which people would not be able to interpret at all. The advantage is that we no longer have to wait 33 milliseconds for the next piece of information, but can in principle perceive the world at microsecond resolution. And with this information you can control a robot that reacts much faster than if you first had to evaluate an entire image.
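To make the contrast concrete, here is a minimal sketch in Python of how an event-based sensor reports per-pixel intensity changes instead of whole frames. It is purely illustrative, not Sony's implementation; the contrast threshold and the (timestamp, x, y, polarity) event format are assumptions based on how event cameras are commonly described.

```python
import numpy as np

# Assumed contrast threshold: a pixel fires an event when its log-intensity
# changes by at least this much (illustrative value, not a real sensor spec).
THRESHOLD = 0.15

def events_from_change(prev_log_i, new_log_i, t_us):
    """Return (t, x, y, polarity) events for pixels whose intensity changed enough."""
    diff = new_log_i - prev_log_i
    ys, xs = np.nonzero(np.abs(diff) >= THRESHOLD)
    polarity = np.sign(diff[ys, xs]).astype(int)   # +1 brighter, -1 darker
    return [(t_us, int(x), int(y), int(p)) for x, y, p in zip(xs, ys, polarity)]

# Tiny 4x4 scene: between two instants, exactly one pixel gets brighter.
prev = np.log(np.full((4, 4), 100.0))
new = prev.copy()
new[2, 1] = np.log(160.0)

print(events_from_change(prev, new, t_us=42))
# -> [(42, 1, 2, 1)]  one microsecond-stamped event instead of a full frame
```

A frame-based sensor would deliver all pixel values every 33 milliseconds regardless of whether anything changed; the event stream carries only the single change, as soon as it happens.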

c’t: How much of the robotic sensor technology do you have to develop yourself and how much can you simply buy? A household robot probably needs sensors similar to, for example, a car with driver assistance.

Dürr: You can buy a lot today, and there are many excellent manufacturers. But it is extremely important to combine the development of sensors with the development of AI. That way you can pick up sensor information very early on and discard what you don't need. Last year we released the IMX500, an image sensor with two layers: on top are the pixels that measure an intensity image, and directly below is a logic layer that can run AI algorithms. You then have a normal picture for humans, but you also have a neural network that determines what can be seen in the picture, for example "three people with face masks". This information exists only for a short time, and the image itself is discarded right away; the sensor then doesn't need to output the image at all. This not only has data-protection advantages, it also has technical ones: if you only get the data you really need from the sensor, you don't have to process and transfer the rest.

Many of these systems are designed to record the image data and copy it somewhere via cellular network or cable. That costs money and energy, which is a problem especially for battery-operated devices. Another benefit: if you put the logic right behind the pixels, you avoid latency, so you can make decisions faster.
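The division of labour Dürr describes can be pictured as a pipeline in which inference happens next to the pixels and only compact metadata leaves the chip. The sketch below is conceptual only; the Detection type, run_network() and the mask-detection labels are made-up placeholders, not the IMX500's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

def run_network(raw_pixels) -> list[Detection]:
    # Stand-in for the neural network executing on the sensor's logic layer.
    return [Detection("person with face mask", 0.93)] * 3

def on_sensor_pipeline(raw_pixels) -> list[Detection]:
    detections = run_network(raw_pixels)   # inference right behind the pixels
    del raw_pixels                          # the image is discarded, never transmitted
    return detections                       # only this compact result leaves the chip

# The host receives "three people with face masks" rather than a video stream,
# which saves bandwidth, energy and latency and keeps the raw image on the sensor.
print(on_sensor_pipeline(raw_pixels=object()))
```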

c’t: Would you say that data protection should be taken into account during development?