The Allen Institute CEO and computer scientist talks GPT-3's capabilities and limitations, a better AI Turing test, and the real signs we're approaching artificial general intelligence.
In this latest Directors’ Conversation, HAI Denning Family Co-director John Etchemendy’s guest is Oren Etzioni, Allen Institute for Artificial Intelligence CEO, company founder, and professor of computer science. Here the two discuss language prediction model GPT-3, a better approach to an AI Turing test, and the real signs that we’re approaching AGI.
John Etchemendy: Welcome to HAI’s Directors’ Conversations. These are informal conversations that allow us a chance to discuss the latest developments in AI with leaders in the industry. I’m John Etchemendy, HAI’s co-director, and with me today is Oren Etzioni. Oren is a distinguished professor of computer science at the University of Washington, and he’s also the director of the Allen Institute for AI in Seattle. Oren’s a pioneer in the areas of meta search, online comparison shopping, and machine reading, and he now focuses on creating high impact AI that benefits humanity. And that’s a mission that is very much in line with HAI’s own mission, so we’re fellows at arms. So thank you for joining me today, Oren.
Oren Etzioni: Thank you, John. It’s a real pleasure to be here.
Etchemendy: I thought it would be fun to start by talking a bit about GPT-3. There’s been a huge amount of attention, both in the popular technology press and within the AI community, about GPT-3. So let me explain first a bit about what GPT-3 is. GPT-3 is a very large language model: it generates text by taking as input a sequence of words and predicting the most likely next word. Now, the reason GPT-3 has gotten so much press is that it has many surprising characteristics. Not surprisingly, it does generate superficially coherent text, quite remarkably coherent text, but it has surprising characteristics. For example, on some of the natural language processing benchmarks, it achieves really surprisingly good performance without being fine-tuned. And that’s given rise to questions about whether GPT-3 actually understands the language. Is it intelligent or is it at least on the path to intelligence?
So I want to ask you, Oren, first of all, just generally, what do you make of GPT-3?
Etzioni: It’s a great question, John, and a very timely one, because the entire AI community, industry and academia, the popular press, the student halls, they’re all atwitter, I would say, both literally and figuratively, with examples of GPT-3’s prowess, the kind of remarkable things it generates, even code, it can even write software. And at the same time with discussion of what it cannot do.
Before I answer your question about what I make of it, it is helpful to put it in a historical perspective. For as long as I can remember being in the field, and I’ve been doing this for 30 years or so, there’s always some mechanism that ... By mechanism, I mean an AI program, an AI approach, an algorithm that people are very, very excited about and have, I think, somewhat overblown expectations for.
So when I was an undergraduate, it was explanation-based learning, never mind what it is, but it was EBL, it was a TLA, a three-letter acronym. And then at some point it was expert systems. If you go all the way back to before the inception of the field, it was the Logic Theorist. The idea that we built a program in the ’50s, early ’60s, that could prove a theorem in Principia Mathematica. If we can prove a logical theorem, then surely human-level intelligence is just around the corner. And it’s been a good 50 years and that hasn’t happened.
So I think the biggest point to make is that we’re seeing the same thing. It’s déjà vu all over again. We do have a remarkable mechanism and it does exhibit impressive behavior, as did the Logic Theorist. I mean, imagine in the ’50s, being able to automatically prove a theorem. We thought at the time, computers were just good for crunching numbers. And here it is doing something that only highly trained logicians can do. So here we are in the same situation, doing remarkable things, but at the same time, we are just so far short of “a true intelligence.”
Etchemendy: Theorem proving, that is actually my field. I think that was over-hyped. It was clear that it was not, that the performance very quickly exploded. I mean, the difficulty of the problems very quickly exploded. So what about today? What about GPT-3?
Etzioni: The first warning signal should be the notion when something is so simple, all we’re doing with a language model, a generative model is as you said, analyzing a very large amount of text and computing the probability that given what we’ve seen before, what’s the probability that the next word is “architecture” versus “elephant” versus “the.” Now, you do that over a huge amount of text, that doesn’t mean you’re reading, doesn’t mean you’re understanding it, doesn’t mean you’re really representing its contents.
So do we have a program, or is GPT-3 able to understand a single sentence in the kind of rich way that you or even a ten-year-old child understands? And the answer is resoundingly no. It’s remarkable what we can do with it, particularly at a superficial level, like generate fluent text and so on, but we have to be very careful to distinguish these impressive performances, I don’t want to call them tricks, these impressive behaviors, from genuine intelligence. Would you want to rely on what GPT-3 generates when you’re formulating foreign policy or when you’re taking the, I don’t know, the SATs? And the answer is no, it has no responsibility. It makes huge errors in a very strong sense.
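Editor’s sketch: the next-word prediction that Etzioni describes can be illustrated with a toy model. The Python below is a minimal sketch under loud assumptions: it counts which word follows which in a tiny made-up corpus (a bigram model), whereas GPT-3 is a very large neural network conditioned on long stretches of preceding text. The function names and the sample sentence are invented for illustration and come from neither speaker.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for previous_word, next_word in zip(words, words[1:]):
        counts[previous_word][next_word] += 1
    return counts


def next_word_probabilities(counts, previous_word):
    """Turn the raw follower counts for one word into probabilities."""
    followers = counts.get(previous_word.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # word never seen with a follower: the model has nothing to say
    return {word: count / total for word, count in followers.items()}


# Hypothetical toy corpus, chosen only to echo the words in the conversation.
corpus = (
    "the architect drew the architecture and "
    "the elephant ignored the architecture"
)
counts = train_bigram_counts(corpus)
probs = next_word_probabilities(counts, "the")

# Print the candidates for the word after "the", most likely first.
for word, probability in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"P({word!r} | 'the') = {probability:.2f}")
```

Nothing in this sketch represents what an architect or an elephant is; it only tallies co-occurrence, which is the distinction Etzioni draws between predicting text and understanding it.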
Etchemendy: So Oren, I was thinking the other day about GPT-3, and about the difference between the number of words that GPT-3 has processed in its training, compared to the number of words that a human receives or processes in a day or a year, a lifetime. If you think about it, so GPT-3 is trained on 570 billion megabytes. And that is roughly speaking 57 billion, billion words. So that’s the training set.
If you think about what a human processes in a lifetime, 70 years, it’s probably about a half a billion words, maybe a billion, let’s say a billion. So when you think about it, GPT-3 has been trained on 57 billion times the number of words that a human in his or her lifetime will ever perceive. And given that, when you see GPT-3’s performance, it does produce remarkably coherent text, usually, not always, but usually, but then, as you pointed out, it makes huge mistakes. It sometimes makes grammatical mistakes, and oftentimes semantic mistakes that are quite laughable. When you think about the difference in the amount of training, it seems to imply that we’re not going to get to actual intelligence or actual understanding of language with this kind of approach. Something’s wrong about what we’re doing. So anyway, I want to throw that out there and see what you think about that.
Etzioni: Yeah. So John, I think there are several threads in what you said that are worth highlighting. The first one I would say is about just data efficiency. So it’s clear that what’s happening in our heads is far, far more data efficient than what’s happening with our deep learning systems. And again, people point to evolution, you have to factor that into account, et cetera, et cetera.
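Editor’s sketch: the comparison above is simple division, and it can be written out. The Python below takes the figures exactly as spoken in the conversation (57 billion billion training words, one billion words in a 70-year human lifetime); they are assumptions carried over from the transcript and are not checked here against OpenAI’s published numbers. The variable names are invented for illustration.

```python
# Figures as stated in the conversation; treated as assumptions, not verified.
gpt3_training_words = 57 * 10**9 * 10**9   # "57 billion, billion words"
human_lifetime_words = 10**9               # "let's say a billion"
lifetime_years = 70

# How many human lifetimes of words the stated training set would represent.
ratio = gpt3_training_words // human_lifetime_words

# What a billion words over 70 years implies per day (ignoring leap days).
words_per_day = human_lifetime_words / (lifetime_years * 365)

print(f"training words / lifetime words = {ratio:,}")
print(f"implied human intake = {words_per_day:,.0f} words per day")
```

On these stated figures the ratio comes out to 57 billion, matching the number quoted in the conversation, and the lifetime estimate works out to somewhat under 40,000 words a day.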
But what’s remarkable is that people can even learn from a single example. Kids hear a word once and they can already start to use it in context, recognize it, et cetera. So that’s point one. But even more so, we learn very interactively. Even though we’re exposed to a lot of words, we don’t just read them and assimilate the probabilities. And so it’s really clear that what people are doing and what GPT-3 or deep learning in general is doing, is very, very different. And the question we need to ask ourselves is, is that an alternative route to intelligence, right? It’s not going to be like people, but neither is a Boeing 747 like a bird, right? They’re both engaged in flight, but very different technologies, very different specifications and requirements.
Etchemendy: Although they both really depend on aerodynamics, there are a lot of things that are the same. As you say, a lot of things that are different. One last thing on GPT-3, and that is, what are your thoughts about potential abuses? When OpenAI released GPT-2, they said they were not releasing the system because of the potential abuses that it could be put to. Does it worry you, for example, that there’s a system that is so good at producing apparently coherent text?
Etzioni: Yeah, I do think there’s a very important reason to be worried and I think that it goes a lot broader than GPT-3 or even text. So I wrote a piece for the Harvard Business Review a while back talking about AI-based forgery. So there’s the term floating around, deepfakes, which means we can see images produced by the machine that are not real. Even videos, it’s starting to happen that video and audio that sounds completely genuine, isn’t. So the fact of the matter is when you’re contacted online and you get an email message, you get a news article, you get even an audio or video, you really don’t know whether that’s genuine or a forgery. There’ve been documented cases of criminals using that to persuade people to transfer funds. You hear your boss with his inimitable accent saying, “I need the money right away, go John.” So you follow their concerns.
So I think that there’s a very major problem and of course, we saw it in the previous election with the bots in social media. We have an election around the corner where, again, both news stories, misinformation, disinformation, and bots engaging with people in nefarious ways. So I would say that, this is a bit of a dramatic way of putting it, but really the fabric of reality is under siege and we need to figure out proactively technologies and methods of dealing with them.
Etchemendy: I agree completely. So let me change topics for a bit. You’ve written several pieces actually on ethics in AI and on incorporating ethics into AI, including a paper by that very title that I happened to use last year in an undergraduate seminar that I taught.
So I want to talk to you about that paper. In that paper, you focused on autonomous vehicles and specifically self-driving cars. First of all, I’m curious why, of all the ethical issues that come up in AI, you chose to focus on that? I should say by the way that it was co-authored with your father, right?
Etzioni: Yes. My father Amitai Etzioni was the first author and really the driving force behind that, I was the AI part and he was the ethics part. I don’t consider myself an ethicist or a philosopher.
The answer to your question without getting into the whole piece, which I’m happy to discuss is actually a very simple one. Whenever I talk to people about AI, about autonomous vehicles, self-driving cars, the conversation inevitably turns to what’s called the trolley problem, to this ethical dilemma, of if a car is driving on the road, it has a choice of running over one person or four people, an old lady or a Nobel laureate, there’s many different variations. But that ethical dilemma seems to really tantalize and even scare people.
Etchemendy: So I have to say that one of the reasons that I picked the paper was that you say in the paper that we shouldn’t focus too much on trolley problems, they are really edge cases and this is a view that I also have or had. I mean, one of the interesting things that one of my students pointed out is that once you get into autonomous vehicles because of their processing power and potentially their ability to sense the environment better than the average driver, we may in fact encounter more trolley problems than we would with just a human driver, because we can calculate various alternatives and we’ll have to make choices between those alternatives.
But let me get back to your paper. Now let me describe it, and you correct me if I’m wrong, but the ultimate conclusion of the paper was that self-driving cars should be designed, first of all, to obey the law, and you say that obeying the law handles most of the ethical questions that an autonomous vehicle will encounter. Then for other ethical issues that are independent of the laws, not constrained by the laws, you advocate developing what you call ethics bots that observe the user’s ethical preferences from other sources of evidence, and then implement or guide the car accordingly. You give the example that if you’re a member of Greenpeace, maybe you will then refuel at refueling stations that use high percentages of ethanol or something like that.
I want to ask you about that conclusion and there are a couple of challenges I want to put to you. First of all, I think it’s kind of naive to think that legal behavior corresponds to ethical behavior, that there’s a direct connection between the laws and obeying the laws and behaving ethically. I think the examples, you can come up with lots of examples where swerving into oncoming traffic or swerving out of your lane is actually the ethical thing to do to avoid causing an accident for example.
Now, secondly, I’d like to hear more from you, but I find it kind of implausible that ethics bots are going to be able to draw useful, relevant conclusions that apply to the kinds of ethical quandaries that will be encountered while driving. So your example of the Greenpeace example or whatever it was is a very limited example. I’d like to hear more thoughts about that, why should I believe that that is going to be a useful way to approach things?
Etzioni: So to the first point, it’s definitely not the case that if we just build a legal driver, one that obeys the law, that gets us even 80% of the way there, ethically. So I completely agree that that’s a constraint and it’s a helpful constraint. I also very much agree with you that with the nuances of the real world, designing an ethics bot, that would actually exhibit, act, according to my preferences, even if it has a lot of data is very tricky. So to use your example of energy efficiency, what if there is a place that sells ethanol based fuel, but it’s 40 miles away, it’s a hundred miles away. At some point, if I’m concerned with energy efficiency and climate change and such, I’m still not going to drive a hundred miles just to fill with a different kind of fuel. So there’s all kinds of nuances and subtleties that go into that. Current AI systems are not able to model that.
The paper tries to call for more research in that direction and tries to suggest that the same way that we learn all kinds of things, inductively from examples, which is the current paradigm that GPT-3 and other things operate on. Ethics, or at least approximations to ethics, are not immune to that. But it’s by no means a solved problem.
TLDR, I agree with everything you said. But I do want to step back and make what I think is the most important point and that’s what motivates, I think a lot of the article, which is the thought of autonomous vehicles has a lot of people worried. There’s a worry about jobs, of course, which is a very legitimate one, but there’s a lot of worries about how is that car going to behave? Am I going to be comfortable with all these autonomous vehicles on the road? I think that the appropriate philosophical perspective here is a utilitarian one. We have actually struck a Faustian bargain with transportation technology, that it leads to 40,000 deaths, more than a million injuries on the US highways alone each year. If we’re able to reduce those numbers by rolling out any technology and certainly autonomous vehicle technology over time, then it’s a moral imperative to do that.
Whatever happens with the trolley problem, whatever all these kinds of edge cases do, if we can save lives using technology, we really should do that. I think that’s the primary point and that’s the way to think about it. So the article attempts to ward off and kind of bracket the discussion of these edge cases because we need to focus on the main event.
Etchemendy: It’s interesting. I also agree with what you’ve just said, but something very interesting is going to happen once we do get autonomous vehicles on the roads, and that is the following: they’re going to make mistakes. So they’re going to be in accidents and perhaps kill a child, for example. We will have recordings of exactly what happened and here’s the worry that I have. It can be the case that an autonomous vehicle is statistically far, far safer, but we’re going to have cases where they make mistakes and, looking at the mistake, a jury, for example, will be able to say, well, if an attentive human driver had been in control that wouldn’t have happened. Then the question is, how persuasive is the statistical argument that this technology is far, far better compared to the gut feeling that, really, this was a mistake that the AI made, and so the company, presumably, should be punished for it. And I’m worried that we will end up as a society going the wrong direction because of that. So rather than go with the statistical fact that this is a much safer technology, we will go with the intuitive gut feeling that we don’t want a mistake made by the car, and then end up with a lawsuit, a major, major lawsuit against the car manufacturer. So, some reflection.
Etzioni: Yeah. I’m very worried about that too. I think what you’re saying is we could have a technology that, in the aggregate, would save many lives, and its progress, or its proliferation, would be retarded by our kind of litigious system and the jury’s kind of visceral response to a particular event, rather than thinking about the statistics. My answer to that, or the reason I’m optimistic, is I do think it’s going to be a gradual process. So first of all, these full-blown autonomous vehicles, we’ve learned in recent years, they’re further away than we think. And again, it’s part of these sometimes overblown expectations. And so we’ll have time to get used to them. And in the meantime, when we have semi-autonomous, different degrees of autonomy like we do in current Teslas or what have you, I think that it’s essential to have clear human responsibility for any mishaps.
So the car has certain capabilities. If a person, the driver, misuses them, then the driver’s at fault. The human driver. And then if the technology doesn’t operate the way the manufacturer specified, the manufacturer could be at fault the same way if a seatbelt or an airbag doesn’t work. I think we do have a rational path forward.
Etchemendy: I’d like to talk about another article that you wrote, which I found really fascinating, and that’s the canary article. So obviously there’s been a lot of talk, a lot of hype about artificial general intelligence, and the possibility of a malevolent AGI, as they call it. So artificial general intelligence is when we get an AI that is as capable as humans, and presumably potentially more capable, more intelligent. You wrote a paper basically saying, “Don’t worry. Don’t worry at this point. And don’t worry because there are certain indications that we will see before we actually get to that point.” And you called these canaries in the coal mine. I don’t know if you can remember. Can you tell me about the canaries that you chose, and why you chose those particular ones?
Etzioni: The context is that there’s a set of very smart people, like Stuart Russell, who did his PhD at Stanford, and Nick Bostrom, and others who have been focusing on the fear of AI, as you said, being malevolent and taking over. And they say, this is in some sense the greatest of all risks. The mother of all fears. There are many concerns about AI, but they’re focused on that one because they think potentially it could spell out the extermination of the human race. And other folks like Rod Brooks and myself and many others say these fears are overblown, and actually they’re distracting us from real concerns, like unemployment or privacy, or fairness. Again, Andrew Ng, former Stanford faculty, said worrying about AI turning evil is like worrying about overpopulation on Mars. It’s just too early and it’s too hypothetical. It ignores all the issues.
Another really important point is if we get obsessed with this cataclysmic fear, we might not think about the potential benefits of AI. And I’d love to talk about what some of those are. But, so in that context, I got tired after a while of this speculative back and forth. I think it’s a ways off and other people say, “But it could happen in five years.” How do you know? It’s very hard to prove a negative. I said, “Can we ...” And I’m a scientist. I’m not a philosopher. “Can we take this and put this on a more empirical footing? Can we identify these canaries in the coal mines of AI, these harbingers or trip wires, these warning signals that if they happen, they tell us that AI is a lot closer than we thought. And if they don’t happen, then we feel, look, it’s still just hypothetical. Decades, maybe even centuries out.”
And in that context, probably the most interesting one that I identified, and it’s very closely related to the mission of your institute, is to highlight the role that humans continue to play in every success of machine learning. In fact, I go as far as to say that machine learning is really a misnomer. When we say the machines learn, it’s kind of like saying that baby penguins fish. What baby penguins really do is they sit there, and the mom or the dad penguin, they go, they find the fish, they bring it, they chew it up, and they regurgitate it. They spoon-feed morsels to their babies in the nest. That’s not the babies fishing, that’s the parents fishing.
Well, it’s the same thing here with machine learning. We define the problem, we define the representation, we do everything except the quote, “last mile”: finding the statistical regularities in the data. And that we give to the machine, and it does a superhuman job at that. So I give an example. If we had a machine learning program that could actually formulate its own machine learning problems, decide this is something I want to learn, label the data, formulate the loss function, et cetera, et cetera, et cetera, then that would be a canary in the coal mine of AI.
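Editor’s sketch: the division of labor Etzioni describes can be made concrete with a deliberately tiny example. In the Python below, everything marked as a human choice (the task, the labeled data, the straight-line representation, the squared-error loss) is fixed by hand, and only the final fitting loop is left to the machine. The data, names, and settings are hypothetical, picked for illustration; this is a minimal sketch of ordinary gradient descent, not a description of any system mentioned in the conversation.

```python
# --- Human choices: the problem, the labels, the representation, the loss ---

# Hand-labeled examples (x, y); a person decided what to measure and label.
labeled_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def predict(weight, bias, x):
    """Representation chosen by a person: a straight line."""
    return weight * x + bias


def mean_squared_error(weight, bias, data):
    """Loss function chosen by a person: average squared error."""
    return sum((predict(weight, bias, x) - y) ** 2 for x, y in data) / len(data)


# --- The machine's "last mile": find the regularity in the data ---

def fit(data, learning_rate=0.05, steps=5000):
    """Plain gradient descent on the human-chosen loss."""
    weight, bias = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (predict(weight, bias, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(weight, bias, x) - y) for x, y in data) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


weight, bias = fit(labeled_data)
print(f"learned line: y = {weight:.3f} * x + {bias:.3f}")
print(f"final loss: {mean_squared_error(weight, bias, labeled_data):.6f}")
```

The canary Etzioni has in mind would be a program that wrote the first half of this file for itself: choosing the task, collecting and labeling the data, and picking the loss.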
Etchemendy: Yeah. I think of all the canaries you picked out, that’s the best one. That’s the very best. You also said ... Let me just, one other thing. Besides the canaries that you mentioned, you also said that the Turing test is not a good canary, because if we get an AI that successfully passes the Turing test, then it will already be there. We will already have genuine intelligence. I actually don’t buy that.
I think that the Turing test, there are different ways of interpreting the Turing test, but it is so dependent on the idea, the ability to fool the human participant. So the Turing test, you have a machine, and you see whether or not the machine can fool the interlocutor about whether it’s a machine or whether it’s a real human. And all of the systems that have performed well on the Turing test are systems that are quite obviously designed to fool, to mislead the interlocutor. To change the subject, for example, in order to avoid giving a sensible answer, which it can’t do. So I actually find the Turing test to be sort of artificial.
Now, if what you mean is passing the Turing test for arbitrary lengths of time, then maybe you’re right. But any finite amount of time interacting with such a system, the system may just be using techniques to fool us. And I don’t consider that even an approximation of intelligence. It’s intelligence on the part of the programmers that designed the system. Anyway, so we could talk about the Turing tests.
Etzioni: Just to respond really quickly. I do agree with you that the Turing test as implemented, and John Markoff said this brilliantly, it’s a test of human gullibility. And so yeah, it’s very easy to fool. What I mean by that as an ultimate test of intelligence, and I take that’s what Alan Turing himself meant, is a true test. I don’t necessarily think it requires infinite time, but it does require being careful and methodical and comprehensive. So for example, if somebody came to me and asked me to administer a Turing test, the first thing I would do is I would give the program the SAT. The full SAT, including the essay questions. I would say, “Do well on that. And then we’ll talk about the weather, we’ll talk about movies, we’ll have all that chit-chat where you’re so able to fool me with social gambits.” But if you’re not able to score reasonably well on the SATs, and if you’re not able to make the same mistakes that humans make, and you’re not able to write an essay that doesn’t just sound coherent, but is coherent, and, and, and. So I know how to probe the weak spots of the technology, and that helps. So again, a true full-blooded Turing test, not a-
Etchemendy: I’ll call that the Etzioni test. That is far more elaborate. And there, I think, I probably agree with you. That’s far more elaborate than the setup that Turing described.
Let me go on to a different topic. A lot of people have talked about the fact that these showcase systems, the most prominent kinds of AI systems that get in the press, are based on competitive or adversarial games. Reid Hoffman has said to me many times that unless we want to end up with an adversarial AGI, we better start thinking about how to train cooperation or build cooperation into the system. I was really pleased when I heard that the Allen Institute had recently published a project that is fundamentally based on cooperation. I was wondering if you’d tell us about that. Maybe even show us a little bit of the system.
Etzioni: Sure. Again, you’re absolutely right that the interaction between humans and machines is very fundamental to AI. Of course, that’s the mission of your institute, which I very much applaud. That was part of the message with the canaries example, too, where if you drill down into what’s called machine learning, and kind of this bastion of autonomy is full of human intelligence ... In fact, if we go to what’s considered one of the landmarks of AI, the victory of AlphaGo over Lee Se-dol, an AI program beats the world champion in Go. What I said is, “This is a human victory.”
Just like your point, John, about intelligence of the programmer. It’s this human team at Google DeepMind that did a tremendous job using their technology to defeat Lee Se-dol. We became interested in collaborative games, and a natural game to think about is Pictionary. Because in Pictionary, you’re drawing trying to convey a phrase to me that I’m trying to guess, or the other way around. Now I’ll pause and see if I can bring up an interactive demonstration of the game.
John, what you can hopefully see on the screen here is our version of Pictionary, which we call Iconary, both to not violate anybody’s trademark and also because we make extensive use of icons. But rather than talking about it, let’s play. John, you and I are going to play together with what’s called AllenAI, our teammate. This is an AI program. We’re not playing against it. And in this case to avoid my trying to draw with the mouse, which is going to be a disaster, we’re going to guess. What you’re seeing on the screen here ... Let’s go with easy phrases. We have limited time. What you’re going to see on the screen here is the AI program suggesting to us a phrase. It’s indicating it with a set of icons, and the phrase is blank. Some word, the blank. John I’ll go to you. What do you think the phrase might be based on what the program has drawn?
Etchemendy: Drink the coffee.
Etzioni: Right. Let’s type that in. “Drink the coffee.” Oops. It’s not tolerant of misspellings. Okay. Now I type that in, and I press submit, and that’s my guess. And it instantly comes back and says, “Okay, the coffee is right. But we’ve got an incorrect word here.” Now before we do another guess, one of the fun things is I can ask the program, the way you can ask a human partner, to draw again. And I really don’t know what it’ll draw in response to this incorrect guess. It’s going to try to get us to change the word drink. What it’s done is it’s kind of uncluttered the drawing, and has us focus on the nose and these squiggly lines arising from the coffee. John, did it clarify that for you?
Etchemendy: How about smell the coffee?
Etzioni: Smell the coffee. Yeah. Okay, so, “Smell the coffee.” Yeah. It’s not a stickler for smell versus smelling. Boom, it’s gotten it right. I want to show you one more thing. What I want to explain is that the program, AllenAI, is trained on humans playing each other, and it learns from that how to play with another human. It has a vocabulary that it’s figured out how to map to icons. But one of the really interesting things in that vocabulary is that it doesn’t have a word for doctor. If you look under B here where I’m highlighting on the mouse, the phrase is “doctor giving medicine to the old woman.” And interestingly, it doesn’t have a word for doctor. It doesn’t have a word for an old woman. But it automatically learns to represent the doctor as a person with a stethoscope and some pills, and represent an old woman as a woman with a cane. It’s learned that that helps people guess that.
I’ll also note ... because we often see examples of bias creeping up in machine learning programs. You’ll notice that I politely say "person," but this is really the icon for a man. It’s learned, incorrectly of course, that doctor is associated with man. It’s developed a sexist bias. But sexist bias aside, we’re quite proud of the fact that it’s able to deal with the ... it’s basically able to create novel compositions, and use those to encode words and phrases that were not in its training set, and use those to cooperate with a person to play the game. And just intended, this is really the point of this work. We’ve shown that the techniques that we know and love can be extended to cooperative games and can be extended to games that involve phrases and images. This, of course, goes way beyond games like chess and Go.
Etchemendy: Thank you. That’s terrific. I’m glad to see cooperation as the focus rather than competition.
Etzioni: Yeah. To tie that back to the kind of super intelligence and the canaries article, while the canaries haven’t yet fallen, I’m not suggesting that we rest on our laurels and be completely complacent. I’m thinking the research on cooperation, research on natural language, research on human AI together, human AI interaction is a way that will lead us to a future where hopefully machines and people can work together to benefit humanity.
Etchemendy: Well, I think that’s a wonderful place to end the conversation. As you say, it’s completely in line with our mission here at HAI. I’d like to see the Allen Institute and HAI partner even more closely than we have on certain things. I want to thank you for this really illuminating and fascinating conversation. It was good to get a chance to talk to you about some of these things. I also want to thank our audience for listening in. You can visit our website, or find us on YouTube to listen to other great discussions with leading AI experts. Oren, thank you. And thank you to the audience.
Etzioni: Thank you very much, John. It was a real pleasure to have this dialogue. And I hope to host you on one at the Allen Institute for AI, to also have a chance to hear more about your opinions. I’m afraid we didn’t get to that as much as I would’ve liked. But thank you so much.
Etchemendy: Anytime, Oren.
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.