# How does an AI learn its voice

## How we get AI to follow our wishes

In "Human Compatible," Russell writes that the switch-off problem is "the core of the problem of controlling intelligent systems": "If we can't turn off a machine because it won't let us, we have a serious problem. But if we can, then maybe we can control it in other ways, too."

The key to this could be that a machine remains permanently uncertain about what our preferences are. The off-switch game, a formal model of the problem featuring Harriet the human and Robbie the robot, demonstrates how. Robbie is deciding whether to act on Harriet's behalf - say, whether or not to book her a nice, expensive hotel room - but is unsure what she would prefer. Robbie estimates that the payoff for Harriet could be anywhere between -40 and +60, with an average of +10 (read: Robbie thinks she'll probably like the chic room, but isn't sure). Doing nothing has a payoff of 0. But there is a third option: Robbie can ask Harriet whether she wants him to proceed or would rather "switch him off," that is, take Robbie out of the hotel-booking decision. Because Harriet will let the robot proceed only if she actually wants the room, her average expected payoff then rises above +10. So Robbie will decide to consult Harriet, even at the risk of being switched off.

Russell and his co-workers have shown that Robbie will generally prefer to let Harriet decide what to do, unless he is completely certain what she wants. "It turns out that uncertainty about the objective is the key element with which we ensure that we can switch off such a machine," writes Russell in "Human Compatible," "even if it is smarter than us."
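The arithmetic behind Robbie's choice can be sketched in a few lines. This is only an illustration under stated assumptions, not the formal model itself: it assumes Harriet's payoff is uniformly distributed over the given range (the text gives only the range and the mean), and that Harriet knows her own preference and lets Robbie proceed exactly when the payoff would be positive. The function name `expected_payoffs` is made up for this example.

```python
from fractions import Fraction


def expected_payoffs(low, high):
    """Expected payoff to Harriet for each of Robbie's three options.

    Assumption: Harriet's payoff U is uniform on [low, high].
    """
    low, high = Fraction(low), Fraction(high)
    act = (low + high) / 2       # book the room without asking: E[U]
    do_nothing = Fraction(0)     # leave things as they are
    # Defer: Harriet lets Robbie proceed only if U > 0 and otherwise
    # switches him off (payoff 0), so the value is E[max(U, 0)].
    if high <= 0:
        defer = Fraction(0)      # she would always switch him off
    elif low >= 0:
        defer = act              # she would always let him proceed
    else:
        defer = high * high / (2 * (high - low))
    return {"act": act, "do_nothing": do_nothing, "defer": defer}


if __name__ == "__main__":
    for option, value in expected_payoffs(-40, 60).items():
        print(f"{option}: {float(value):+.1f}")
```

Under these assumptions, acting directly is worth +10 and deferring to Harriet is worth +18. If Robbie's estimate collapses to a single value (for example `expected_payoffs(10, 10)`), deferring and acting are worth the same, which matches the claim that the incentive to ask disappears only when he is certain.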

### People as role models

In Scott Niekum's laboratory at the University of Texas at Austin, researchers do not play out abstract scenarios but let real robots loose to learn about preferences. For example, Gemini, the laboratory's two-armed robot, watches a person set a table by placing a fork to the left of a plate. At first Gemini cannot tell whether forks always go to the left of plates or always in that particular spot on the table; new algorithms developed by Niekum and his team enable Gemini to learn the correct pattern after a few demonstrations. The researchers focus on getting AI systems to quantify their own uncertainty about a person's preferences, which would allow them to gauge when they know enough to act safely. "We hypothesize what the true distribution of goals in a person's head could be and what uncertainties arise in relation to such a distribution," says Niekum.

Recently, Niekum and his team found an efficient algorithm that lets robots learn to perform tasks far better than their human demonstrators. It can be computationally very demanding for a robotic vehicle to learn driving maneuvers simply by observing human drivers. But Niekum and his colleagues discovered that they can improve the learning process and speed it up dramatically by showing a robot demonstrations that have been ranked according to how well the human performed. "The agent can then look at the ranking and ask itself, 'If that is the ranking, what explains the ranking?'" says Niekum. "What happens more often as the demonstrations get better, and what happens less often?"

The latest version of the learning algorithm, called Bayesian T-REX (for "trajectory-ranked reward extrapolation"), searches the ranked demos for patterns that indicate which reward functions people may be following. The algorithm also measures the relative likelihood of various reward functions. A robot running Bayesian T-REX can efficiently infer which rules people are likely to follow when setting the table, or what one has to do to win an Atari game, "even if it has never seen a perfect demonstration," says Niekum.
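The idea of asking "what explains the ranking?" can be illustrated with a toy calculation. The sketch below is not the researchers' implementation: it assumes a reward that is linear in two hand-made features, a Bradley-Terry style pairwise likelihood (one common way to score rankings), a uniform prior, and just three candidate reward functions instead of a search over many. All names and numbers (`ranked_demos`, `candidates`, `beta`) are hypothetical.

```python
import math

# Hypothetical toy data: each demonstration is summarized by two features,
# (how often the fork ends up left of the plate,
#  how often the fork ends up at one fixed spot on the table),
# listed from worst to best demonstration.
ranked_demos = [
    (0.0, 1.0),  # worst
    (0.5, 0.5),
    (1.0, 0.0),  # best
]

# Hypothetical candidate reward functions, as weights on the two features.
candidates = {
    "fork left of plate": (1.0, 0.0),
    "fork at fixed spot": (0.0, 1.0),
    "no preference": (0.0, 0.0),
}


def demo_return(weights, features):
    """Reward of a demonstration under a linear reward function."""
    return sum(w * f for w, f in zip(weights, features))


def log_likelihood(weights, demos, beta=1.0):
    """Log-probability of the observed ranking under a pairwise model.

    For every pair, P(better demo ranked above worse demo) is modeled as
    sigmoid(beta * (return_better - return_worse)).
    """
    total = 0.0
    for i in range(len(demos)):
        for j in range(i + 1, len(demos)):
            gap = demo_return(weights, demos[j]) - demo_return(weights, demos[i])
            total -= math.log1p(math.exp(-beta * gap))  # log(sigmoid(beta*gap))
    return total


def posterior(candidates, demos, beta=1.0):
    """Relative likelihood of each candidate, assuming a uniform prior."""
    logs = {name: log_likelihood(w, demos, beta) for name, w in candidates.items()}
    top = max(logs.values())
    unnorm = {name: math.exp(value - top) for name, value in logs.items()}
    norm = sum(unnorm.values())
    return {name: value / norm for name, value in unnorm.items()}


if __name__ == "__main__":
    for name, prob in posterior(candidates, ranked_demos, beta=5.0).items():
        print(f"{name}: {prob:.3f}")
```

With this toy data, the candidate that rewards placing the fork left of the plate receives most of the probability mass, because it is the only one whose returns increase along with the ranking. The remaining mass on the other candidates is a simple stand-in for the kind of uncertainty estimate the text describes.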

### (At least) two major challenges

Russell's ideas "are slowly finding their way into the minds of the AI community," says Yoshua Bengio, scientific director of Mila, a leading institute for AI research in Montreal. Bengio believes that deep learning, the most powerful tool of the latest AI revolution, can help with implementation. In deep learning, a complex neural network sifts through vast amounts of data to find patterns. "Of course, more research is needed to make all of this a reality," says Bengio.

Russell himself sees two major challenges. "One is the fact that our behavior is so far from being rational that it could be very difficult to reconstruct our true preferences behind it." AI systems would have to think about the hierarchy of long-term, medium-term and short-term goals - about the countless preferences and obligations that drive us. If robots are to help us (and avoid serious mistakes), they would have to find their way through the jungle of our unconscious beliefs and unspoken desires.

The second challenge is that human preferences change. Our opinions shift over the course of our lives, but sometimes also suddenly, depending on mood or situation. A robot will probably have a hard time keeping track of all this.

In addition, our actions do not always correspond to our ideals. People can hold two mutually exclusive values at the same time. Which of the two should a robot optimize for? How do you keep it from picking up on our worst traits and feeding our darkest instincts (or worse, amplifying them so that it can satisfy them even more easily, as the YouTube algorithm did)? The solution might be for robots to learn what Russell calls "metapreferences": "preferences about what kinds of processes that change our preferences are acceptable to us." Quite a lot for a poor little robot!

### Of the true, the beautiful, the good

Just like the robot, we would like to know what our preferences are or what they should be. And we too are looking for ways to deal with ambiguities and contradictions. Like the ideal AI, we strive - at least some of us, and perhaps only sometimes - to understand the "idea of the good," as Plato called the goal of all knowledge. AI systems could likewise get entangled in endless questions and doubts - or stay in the off position altogether, too uncertain to do anything.

"I'm not assuming that we will soon know exactly what 'the good' is," says Christiano, "or that we will find perfect answers to our empirical questions. But I hope that the AI systems we are building can answer these questions at least as well as a human can, and that they can take part in the same processes by which people - at least on their good days - search step by step for better answers."

However, there is a third big problem that did not make it onto Russell's list of concerns: What about the preferences of bad people? What is to stop a robot from satisfying the reprehensible goals of its malevolent owner? AI systems are as good at circumventing bans as some rich people are at finding loopholes in tax laws. Simply forbidding them to commit a crime is unlikely to work.

Or, to paint an even darker picture, what if we are all bad in some way? YouTube has had a hard time correcting its recommendation algorithm, which, after all, did nothing more than orient itself toward ubiquitous human impulses.

Still, Russell is optimistic. Although more research on algorithms and game theory is needed, his gut instinct tells him that programmers can down-weight the influence of such harmful preferences - a similar approach might even help in raising and educating children, says the Berkeley researcher. In other words, by teaching robots to be good, we might find a way to teach ourselves the same thing. "I have a feeling this could be an opportunity to steer things in the right direction."