Episode Transcript

I still remember the first time I saw a Roomba in action, that little robotic vacuum cleaner that skitters around the house. It was love at first sight. And I thought it was the beginning of what would no doubt be a revolution in home robotics. That was maybe 2003. So more than 20 years ago. The revolution never happened. My guest today, U.C. Berkeley robotics professor Ken Goldberg, has been working on robots for more than 40 years, and one thing he’s learned the hard way is that robots still have a long way to go.

GOLDBERG: We have this incredible ability to adapt to changing conditions. And science has not figured that out. So it’s very hard to reproduce that in robots. Now, humans and animals are existence proof that it can be solved. It’s not like an impossible problem, like time travel. It’s so funny because it’s right in front of us, but we don’t know how to do it.

Welcome to People I (Mostly) Admire, with Steve Levitt.

In spite of the inherent challenges in developing robots, there are some who think things are about to change. Tesla has been working feverishly on a humanoid robot called Optimus. Elon Musk has predicted that Optimus robots could generate more than $10 trillion in revenue long term. Is that realistic? Ken Goldberg has some opinions on the future of robotics, but our conversation today starts in the past, with how he came to build his very first robot.

*      *      *

GOLDBERG: When I was a kid, I was really into rockets, models, building things like that. And my dad ran this chrome plating company. And chrome plating involves moving these metal parts between these different tanks. A lot of them are very poisonous, like cyanide. And it was very messy work. And so he wanted to build a machine, a robot, that would do this dirty work.

LEVITT: Oh, so really, for real, he wasn’t just messing around. He was trying to be practical about it.

GOLDBERG: Yes. He built this frame with all these motors and stepping motors and switches built into it. And then it had a controller that was basically this — it was almost like a player-piano kind of thing. It was a rotating drum with these little pegs and that would tell it which things to turn on and off. And then he taught me binary numbers and LEDs.

LEVITT: In the end, did he ever implement it at his factory?

GOLDBERG: I have a picture of it, which is really funny because it is in there. But I don’t think it ever really worked. My father was a great tinkerer, very creative. But he had a limited attention span. So he abandoned projects like that.

LEVITT: So what was the state of robots? They were essentially non-existent in the 1960s, weren’t they?

GOLDBERG: Well, okay. There’s this really fascinating history that goes way back, if you want to start at the ancient Egyptians, who built machines that looked human-like and did human-like things. But the word robot doesn’t appear until 1920. And interestingly, it was in a play about robots. A Czech author wrote the play and coined the word, which actually comes from the root word for work, or forced labor.

LEVITT: Wait, can I ask you a question? So does a robot have to look like a person? Is that the definition of a robot? Or does it just have to replace human activity?

GOLDBERG: Ah, okay. So, you’re getting right into the thick of the topic here. It’s very controversial. People have all kinds of definitions. Generally, I am of the camp that a robot does not need to look like a human, that a robot is a programmable machine that moves in the physical world and does something interesting and useful.

LEVITT: And why would anyone care if it looked like a human? That seems like hubris or something like that.

GOLDBERG: Mm hmm, mm hmm, exactly. So it’s interesting you say that because that was the hubris story that goes all the way back to Pygmalion, Prometheus, Daedalus — they were all guilty of hubris because they were stepping too far in their creativity and they were punished for that.

LEVITT: For mimicking gods or encroaching on god-like territory.

GOLDBERG: Exactly. But it’s very compelling to have something that does have some form factor of a human, humanoid. And that is super popular right now. There is a huge wave, the biggest wave I’ve ever seen in my whole life, of interest in robots, and it’s specifically around the humanoids. And the big proponents of that, namely Elon Musk and Jensen Huang from NVIDIA, are saying that we’re on the verge of achieving this dream, finally, that we’ll have the humanoids. Like, you know, Rosey from The Jetsons coming in to clean up our house.

LEVITT: But she didn’t even look that much like a human.

GOLDBERG: Well, that’s true. She clanked around. In fact, it was very interesting because if you remember the show, she was always breaking down. It was kind of a running joke that the robot wasn’t very good and was always malfunctioning, which is actually the way real robots are. I always show this video clip when I give talks: you see the robot do a backflip, and it’s triumphant, you know, stands up like it’s about to dominate and take over. But what you don’t see is the 199 takes where the robot basically falls flat on its face.

LEVITT: Yeah, absolutely. So you started building this early robot in your basement with your dad when you were still a kid. Was it all robots all the way, or did you get off the path once or twice on the way?

GOLDBERG: Oh, I had a lot of interests. I was a rebel as a kid. I was into go-karts and motorcycles and things like that. But I also was very interested in art. And I mentioned to my mother that I was going to study art in college. And she said, “That’s great. You can be an artist after you finish your engineering degree.”

LEVITT: When did you get into picking things up? Because that’s been a real focus of yours, trying to build robots that can pick things up. When did that become an interest?

GOLDBERG: Well, so I was studying in Scotland for a junior year abroad, and I took these classes on A.I. and robots

LEVITT: In — we’re talking about the early 80s, right?

GOLDBERG: Exactly. They had a department of A.I. and they had several of the pioneers there. It started from Alan Turing’s work in A.I., so it kind of had grown out of that, and it’s actually a very famous department of A.I. So I was very lucky to be there, get exposed to all that, and then come back. And then I was also lucky, I found this lab at Penn led by this wonderful young professor, Ruzena Bajcsy, who was doing robots. And that’s still the GRASP Lab at UPenn.

LEVITT: Were you actually trying to pick things up at that lab, or it just happened to be called GRASP?

GOLDBERG: I was working on grasping. I like to say I’ve been working on the same problem my entire career. I haven’t made very much progress.

LEVITT: What makes that problem so enticing to you?

GOLDBERG: Well, I think one reason is that I have always been clumsy. So when I was a kid, if you threw me any kind of ball, I would drop it instantly. But it’s really the fundamental question, because robots need to do this. This is the first step to being able to do something useful. We have to be able to move things around to ship all these packages that we’re increasingly ordering online, but also to make things when we’re going into factories. Anytime you want to put something together, assemble it, you have to pick up the parts. It’s very counterintuitive because it’s much, much harder than people think. What’s easy for robots, like lifting heavy objects, is very hard for humans, but what’s very easy for us, like just literally picking up a glass of water, still remains incredibly hard for robots to do reliably.

LEVITT: So your first real work on grasping was your dissertation. Can you just describe the problem you were trying to solve, and what your strategy was when you first started out on this path?

GOLDBERG: The problem I was really interested in was picking up objects with a gripper. I was just using the parallel jaw gripper, which is just the binary clamp gripper that you see on robots.

LEVITT: Just like a pincher or something like that, two fingers that come together and grab something.

GOLDBERG: Right. And what I wanted to study was if you could use that to grasp and orient a polygonal object by squeezing it without using any sensing. And the reason is because sensors are prone to error and noise. And so what I found was that there was this beautiful geometric way to essentially constrain the shape of any polygonal object so that it would come out in a unique final orientation.

LEVITT: Wait, is this mathematical theory or are you actually picking things up?

GOLDBERG: Both. It was mathematical theory. So I ended up proving this theorem that was a completeness theorem. So I was able to show that it works for any polygonal part. Took me two years of struggling to come up with that proof.

LEVITT: Yeah, most of what you do is so practical. I’m surprised to hear that you started in mathematical theory.

GOLDBERG: Yeah. So I’ve always liked formalisms and trying to prove theorems, but at the same time, what you have to do is make a lot of assumptions, obviously. So in this case, we were dealing with planar parts, so they were essentially flat and you’re orienting them. Of course, real parts are three-dimensional, and often deformable, and there’s a lot of complexity. We did also do experiments, and I invented a gripper that actually went with the dissertation ideas.

LEVITT: So you had this mathematical proof, this completeness proof, and then you actually tried to pick things up. How good or bad were you at the actual picking-things-up part?

GOLDBERG: Well, actually, we were very good at orienting the objects, which means getting them into a unique final orientation, which is called part feeding in industry. And it worked. We could prove it worked theoretically, and it worked pretty well in practice. The issue is there are some very small errors, and factors like friction that are hard to model. And when the assumptions get violated, things don’t always work out as you hoped.

LEVITT: So hearing you talk about how robots struggle to do things that are so trivial for humans, it’s interesting because it’s so at odds with the seemingly inexorable tide of mechanization. I’ve been in factories where you barely see a human. Farming used to employ 40 percent of the workforce. Now we produce far more food using a little more than 1 percent of the workers to do it. And in large part, that’s because of machines that are replacing people. In our homes we have refrigerators and washing machines and dryers and lawnmowers, and all of these things are labor-saving devices. But as you talk now, I’m actually thinking for the first time that the way these machines do the labor is not actually mimicking the way a human would do the task. It’s always the way that suits the machine. So it’s interesting that with robots, I think our inclination is, well, let’s have the robot do it exactly the way the human would.

GOLDBERG: Right. And that’s a very good insight. If you look at, let’s say, farming, the idea of monoculture, which is the way farms are run now, everything is standardized and you’re just mowing down these crops with these big combines. And that’s very different than polyculture, which is really what nature does. If you go into any kind of forest, you’ll see all kinds of different plants growing in proximity. And that is actually a trend in agriculture because it saves water and has all kinds of benefits for the plants and reduces pesticides. But it turns out that requires a huge amount of labor. Manual labor, because you have to prune and you have to adjust for the reality, the variations, the diversity that’s there. This is actually the same thing in homes. The dishwasher is fantastic at doing the job of washing the dishes once you get them in there. But getting them off the table, getting them in there, and getting them out and back onto the shelf, that is a very hard problem. Unsolved. And it comes back to what you were saying earlier about why robots have to look like a human. I mean, you could argue the dishwasher is a robot. It doesn’t look like a human and it does what it does very well, but it needs a human to load it. Laundry is another one: the washing machine does a great job, but the folding part we haven’t figured out how to do. There’s all this nuance of physical interaction, and there’s a very simple experiment that anyone can run. You just put a pencil on a table and you push it with a finger. How that pencil moves in response to your pushing it is undecidable. You cannot predict that.

LEVITT: So the minute forces of friction and imperfections, those are enough that at that small scale you can’t tell where things are going to end up.

GOLDBERG: Yeah, because if you just put a tiny grain of sand, a microscopic grain of sand, that will cause that pencil to pivot in a different way, right? But you can’t see the sand because it’s underneath. That actually matters when it comes to these tasks like grasping and manipulation because very small errors make the difference between picking up that glass and dropping it. 

We’ll be right back with more of my conversation with Ken Goldberg after this short break.

*      *      *

LEVITT: So let’s talk about what it is, very specifically. Let’s unpack what makes picking things up hard for robots. The first thing is vision. I think for a long time, it was probably very difficult to get robots to see very well, especially in three dimensions. Can you talk about that?

GOLDBERG: Well, that’s still hard. We have very high resolution cameras, we have them on our phone. No problem. You get very beautiful two-dimensional images, but that doesn’t give you the three-dimensional description of the environment. What you want in 3-D is a depth map of basically where everything is in space. We have these LiDAR sensors and things, but they’re very noisy. You can’t actually know where things are in space. That is an open problem right there.

LEVITT: So the autonomous vehicles use LiDAR. They don’t have to be that good for driving a vehicle, compared to picking up a small object?

GOLDBERG: They are very good, but the key difference is that in driving, you’re just trying to avoid hitting anything. In grasping, you’ve got to hit, you’ve got to make contact. 

LEVITT: Ok, yeah. Good point.

GOLDBERG: And that’s where the scales and the errors are much more significant. 

LEVITT: So when you started trying to pick things up, that was a long time ago. And that was before the revolution in computer vision. I had Fei-Fei Li as a guest on this podcast. It was around 2010, I think, when she built this huge database of images known as ImageNet, and I think prior to that, computer vision was terrible. And then suddenly, and I think really unexpectedly, we just really nailed computer vision. Is that a fair assessment?

GOLDBERG: It’s not completely nailed, I would say, but it was a breakthrough for sure. And Fei-Fei, what she did was systematically collect this big set of data, ImageNet, as you said. It was a critical mass. Somehow, if you trained a large enough network on that, it started to generalize and work for images it had never been trained on.

LEVITT: Right, because before that, things were very specific. You would train an algorithm to know the difference between a cat and a dog. And it could be really good at that. But in practical terms, it wasn’t very helpful. But the breakthrough approach was surprising, right? The neural nets started working when you had more data. Is that kind of the truth of what happened?

GOLDBERG: Definitely. So there are three ingredients: one is data, the second is computation, and the third is algorithms. Those three things came together in about 2012. And Fei-Fei played the crucial role for the data for vision. Then there were GPUs, graphics processing units that were being developed for games, not for A.I. But it just turned out that they could also be used for A.I. That turned out to be critical.

LEVITT: But that’s only one of the many problems. The second one, maybe I’d call it fine motor skills. You actually have to very precisely put a gripper on the right spot and apply the right kind of pressure. So could you describe some of the challenges in that domain?

GOLDBERG: One thing we tried to do was have a tactile sense so that the robot would feel things. That was actually my senior project as an undergrad, and then I continued it a bit into grad school. But it wouldn’t work. It would register that it had made contact when it hadn’t, so it would have false alarms: the hand would freeze in space thinking it had touched something. That’s a false positive. And then there were false negatives, where it would keep squeezing when it did touch something, but the sensor didn’t pick that up. And when I looked at the sensor signals, they were vibrating and moving and drifting all over the place, and it was maddening because, you know, I thought this can’t be that hard, detecting pressure. But it turns out it actually is hard to detect gradations of pressure.

LEVITT: And the human hand is also exquisitely designed for grasping, right? It’s been really hard to match that with machines.

GOLDBERG: Exquisite. Exquisite. I agree. It’s beautiful and has all these nuances in its ability to apply forces. And it’s got this beautiful skin, and a sensory system that can detect contact across a huge dynamic range.

LEVITT: Would you say, on the fine motor skill aspect of the problem, is it primarily a materials problem? The cables and the pulleys aren’t as good as skin and muscles, or is it deeper than that?

GOLDBERG: Yeah, no. People are making progress with better motors where you can put the motor out near the gripper and things like that. But it’s almost inherent that in any mechanical system, you’re going to have these small imprecisions. Humans actually have a lot of imprecision, but we compensate very elegantly. As we move our fingers — they’re imprecise. We’re constantly making these adjustments. Our eyes are doing it, right? So we’re like constantly looking at different things, paying attention to different aspects of the scene. That feedback loop is extremely powerful and extremely fast. And that gives humans the ability to compensate for the imprecision in their motor control. And if you watch a musician play, right? They’re doing something incredible. They’re adjusting, and dynamically tuning based on what they hear and then adjusting their fingers. You know, minute positional changes to get the tones they want. 

LEVITT: When we say robots aren’t that good at doing things like grasping, in one sense we’re comparing them to humans, but in another sense we’re comparing them to other dimensions in which we’ve made incredible strides. Computer vision is one example we talked about, large language models, but maybe the best example is games like chess. AlphaZero. So DeepMind created this program AlphaZero. And if I understand it correctly, all they did was teach it the rules of chess and let it play itself. And within a day, it had become the best chess player in the world, even better than all of the other computers that people had been working on and programming for years. And I think the key to that is that it just had incredible processing speed. AlphaZero, I think I remember, could play literally thousands of games a second. So it got incredible amounts of feedback, and figured out what moves worked and what moves didn’t work, okay? But when you’re actually trying to reach out and grab something, at least intuitively, it strikes me that the thing that limits how fast your system can learn isn’t processing speed. It’s the fact that in the physical world, you actually have to put your robot arm out there, try to grab something, and see whether it works. And so you can’t create a database at thousands of chess games per second. Am I right about that? Or am I missing something?

GOLDBERG: No, no, it’s great. Let me try to unpack that a little bit, because first of all, that is a remarkable result when DeepMind showed that. The key is that chess is a perfect information game. And whatever you model in the computer is perfectly represented in reality, because that is the game.

LEVITT: You don’t have to generalize it to different shapes and different temperatures.

GOLDBERG: You don’t have to worry about friction, all those things. And you’re also right that a big part of the breakthrough that people didn’t really emphasize, but that was very much a part of why Google was successful, was that they’re very good at doing very high-speed, parallelized computation, so there was a lot of search going on simultaneously. But it was also this idea of reinforcement learning, which was self-play, where it could essentially start to find strategies, discover strategies, just by trying things out. And that has been very successful. And that has really led to ChatGPT, where you can ask questions and have conversations, quite deep conversations, with these chatbots now. So all those breakthroughs are a very big deal. And this is part of why many people expect that robots should also be solved. We’ve solved language, we’ve solved vision, so therefore, we’re just about to solve robots. And I think we will. There’s something called the bitter lesson.

LEVITT: That’s a theorem? What does that mean?

GOLDBERG: Oh, okay, so this is Rich Sutton, who just won the Turing Award. Reinforcement learning was his subject and he wrote books about it, but he also wrote this really important essay in 2019 called “The Bitter Lesson.” He makes a really strong argument that all the techniques we’d been using to solve language and gameplay and computer vision by writing rules went by the wayside once we got big enough machines that you could just throw lots and lots of data and lots and lots of compute at, and essentially look for patterns. It would find patterns, discover patterns, on its own. That always worked better.

LEVITT: Okay. So let’s go deeper into that ’cause I’m not sure everyone understands how transformational this is. The typical scientific approach to solving a problem, whether it’s chess or what we do in economics trying to model different behaviors, has been: you take human thinking and, algorithmically, you try to come up with rules for predicting and understanding behavior. And that is completely different from what these neural nets do. These are black boxes. You feed in enormous amounts of data. In the end, you don’t really understand exactly how the machine is doing it, but empirically, in prediction problems, if you give these models enough data and allow enough complexity in the neural net, they just give you amazing predictions out of sample.

GOLDBERG: Exactly. You and I were trained on building models, and that has been the hallmark of science. There are so many beautiful mathematical models. I just finished an art project where we listed the history of science in terms of equations. And we carved it into a piece of wood, my wife and I. But it’s all these models that work beautifully, and you can actually say, okay, this will work for all inputs, right? These new methods are model-free. There’s no model. You just throw data at them, and somehow they interpolate and figure out something that empirically does the right thing. And so this has been a bitter lesson for most researchers and academics who spent our lives building these models: that those models maybe don’t work as well as this method where the answer just sort of bubbles up like magic. I believe in the bitter lesson, namely that someday, robots will actually learn to do all these things. To put into context how much data that may take, you can look up how much data was used to train, let’s say, one of the large models like Qwen, which is a little bit bigger than GPT-4. And it’s got 1.2 billion hours of training data. Converting everything into hours is nice because you can compare between robot data and, let’s say, reading data.

LEVITT: But help me out. What does an hour of training data mean?

GOLDBERG: I love that you caught that. I could go down this rabbit hole, but it’s basically from a scientist at a company called Physical Intelligence, Michael Black. He said, “Okay, how fast can humans read?” It’s 233 words per minute. So then he said, “That’s how many tokens per minute can be digested, right?” So if you look at all the text that’s out there and convert that into tokens, you can convert it back into hours, how many hours of reading it represents. Right now, the idea is to train robots by basically teaching them, because they can’t do it themselves. Like folding clothes, they can’t do it. So you have humans basically driving them, right? Like puppets, getting them to fold clothes all day long. So they’re collecting hours and hours of data. The one company that has been pioneering this published a paper saying that they had accumulated 10,000 hours of data with robots.

LEVITT: Of data folding clothes.

GOLDBERG: Folding clothes, making coffee, doing tasks, right? Like all around the house. So 10,000 hours. And then they started experimenting and started showing some signs that it can actually generalize in certain very clumsy ways. It’s just very early stage, but that’s 10,000 hours, right? But if you compare that to the large language model, that’s 1.2 billion hours.

LEVITT: Okay, 1.2 billion hours. Okay, compared to 10,000.

GOLDBERG: Exactly. And that 10,000 hours is approximately a year, right? That means we have so far accumulated about one year’s worth. To get to the level of the large language models would take us something like 100,000 years.
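
For readers who want to check the arithmetic Goldberg is sketching here, a minimal back-of-envelope calculation follows. The 233-words-per-minute reading rate, the 1.2 billion hours of text-equivalent training data, and the 10,000 hours of robot data are the figures quoted in the conversation; the one-token-per-word simplification and the assumption that robot data keeps arriving at roughly 10,000 hours per year are added only to make the "100,000 years" figure concrete.

```python
# Back-of-envelope check of the "data gap" numbers from the conversation.
# Quoted figures come from the episode; assumptions are labeled.

READING_RATE_TOKENS_PER_MIN = 233   # reading speed quoted in the episode,
                                    # treated here as ~1 token per word (assumption)
llm_training_hours = 1.2e9          # ~1.2 billion hours of reading-equivalent text
robot_demo_hours = 1.0e4            # ~10,000 hours of teleoperated robot demos

# Implied size of the text corpus (derived from the quoted figures, not quoted itself).
implied_tokens = llm_training_hours * READING_RATE_TOKENS_PER_MIN * 60
print(f"Implied text corpus: ~{implied_tokens:.1e} tokens")            # ~1.7e13

# How far apart the two data sets are.
print(f"LLM data vs. robot data: ~{llm_training_hours / robot_demo_hours:,.0f}x")  # ~120,000x

# Assumption (not from the episode): robot data keeps arriving at about the pace
# that produced those 10,000 hours, i.e. roughly 10,000 hours per year.
hours_per_year_of_collection = 1.0e4
years_to_match = llm_training_hours / hours_per_year_of_collection
print(f"Years to match the LLM corpus at that pace: ~{years_to_match:,.0f}")       # ~120,000
```

At that collection rate the gap works out to roughly 120,000 years, which is the order of magnitude Goldberg rounds to 100,000.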

LEVITT: Okay, and is there a way to bypass this? With written language, we have an enormous store of work that people have done in the past that can be immediately transformed into training data for these machines. But we haven’t chosen to record information about every time we folded a shirt or made coffee. Maybe we don’t even have a mechanism for storing that kind of information. How will we build a data set of experience other than by having the robots do it?

GOLDBERG: That’s the big problem right now! I call it the data gap. People don’t realize the scale of this. You can say, “We’re getting close.” But no, we’re a hundred thousand years off, okay? It’s not going to happen next year. I’d bet on that. It’s not going to happen in two years. It’s going to take a while. Now, there are a lot of ideas about how we can speed this up. One of them is simulation. And this comes back to what you were saying about playing the game. Having a robot, let’s say, experiment by picking things up in a simulator, right? We have very good simulators. They look amazing. We can make computer graphics that look great. But it turns out those are actually very imprecise in terms of real physics.

LEVITT: Because that doesn’t have the friction and the mistakes you’re talking about. You could try to build it in. But if you build it in just with, say, random noise, then you’re teaching the robot the wrong thing. You’re teaching it to live in this fake world rather than the real world. So it doesn’t do you any good.

GOLDBERG: Well, okay. We’ve actually injected what is called domain randomization, where you throw in some random noise. But if you put the right kind of noise in, then it actually can learn to work in the real world. And that was a breakthrough that happened in 2016 for us.

LEVITT: And is that what you call Dex-Net?

GOLDBERG: That was Dex-Net.

LEVITT: Okay. Tell me about Dex-Net.

GOLDBERG: So Dex-Net was a case where we basically tried something very analogous to what Fei-Fei Li and Geoff Hinton and others had done for vision, but we did it in robotics. The name stands for dexterity network. We were able to generate a very large dataset of three-dimensional objects and grasps on those objects and use that to train a neural network, but we added in noise so it was more realistic. And then when we put it on the robot, it started picking things up remarkably well.

LEVITT: So what’s an example of an object that you would have had?

GOLDBERG: Let’s say like a pair of eyeglasses. They were things we found online where we had 3D CAD models. We basically just went on a hunt and we found all kinds of things from like gaming sites, 3D printing sites, all kinds of things we could pull off. And then we had to scale those and clean them up. And then get them into a system. And then for each object, there’s a thousand facets, right? You want to grasp it at two facets. That’s a pair of facets, which is your grasp points. So that means there’s a million different ways to pick up every single one of those objects. So we had to evaluate every single one of those grasps in terms of how robust it was to perturbations in position and center of mass and friction and all those things.

LEVITT: And then when you say add noise, what does that mean?

GOLDBERG: So, we would add noise like this: we’d pick a pair of faces on that object and say, okay, that would be the nominal grasp, if I put my two fingers right on these two points on the pair of glasses. But now I’m not going to really get what I thought I was, because of these errors, so I’m going to have to look at perturbations. I would ask: what if I actually were slightly off here? What if I were slightly off there? That’s the noise I’m talking about: slight perturbations in these variables, in the spatial contacts.

LEVITT: I see. So if you put your robot fingers not where you thought you were going to put them, but at random a little bit to the right or the left or up or down, and then you tried to pick it up, you can compute, because you’ve got physics models, whether you would have dropped it. I see. So the key to Dex-Net is you’re not trying to optimize where you would put your fingers if you could do it perfectly. You’re saying, given that I don’t know exactly what I’m doing, what’s the right general region to be in, so that I have a lot of leeway for going wrong when I put my fingers down.

GOLDBERG: Exactly. You nailed it. You nailed it. What I’m looking for is robust grasps. So we used Monte Carlo integration to basically estimate the probability of success for all these different grasps. And that gave us the training set. So we learned how to predict the probability of success. Then, in real time, when we see an object, we actually basically look for the grasp that has the highest probability of success.
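
To make the idea concrete, here is a small, hypothetical Python sketch of the Monte Carlo robustness estimation Goldberg describes: sample perturbations of the contact points and the friction coefficient, score each perturbed grasp with a simplified success test, and report the fraction that still succeed. The antipodal friction-cone check, the noise magnitudes, and the toy square part are illustrative stand-ins, not Dex-Net's actual quality metric or parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def grasp_succeeds(p1, p2, n1, n2, mu):
    """Toy antipodal test: the grasp counts as a success if both unit contact
    normals lie inside the friction cone around the line joining the contacts."""
    axis = p2 - p1
    axis = axis / np.linalg.norm(axis)
    cone_half_angle = np.arctan(mu)
    ang1 = np.arccos(np.clip(np.dot(n1, axis), -1.0, 1.0))
    ang2 = np.arccos(np.clip(np.dot(n2, -axis), -1.0, 1.0))
    return ang1 <= cone_half_angle and ang2 <= cone_half_angle

def estimate_robustness(p1, p2, n1, n2, n_samples=2000,
                        pos_noise=0.02, mu_mean=0.5, mu_std=0.1):
    """Monte Carlo estimate of the probability that a nominal grasp still
    succeeds under perturbations in contact position and friction."""
    successes = 0
    for _ in range(n_samples):
        q1 = p1 + rng.normal(0.0, pos_noise, size=2)   # jitter finger placements
        q2 = p2 + rng.normal(0.0, pos_noise, size=2)
        mu = max(rng.normal(mu_mean, mu_std), 0.05)    # jitter friction coefficient
        successes += grasp_succeeds(q1, q2, n1, n2, mu)
    return successes / n_samples

# Nominal grasp across two parallel faces of a small square part (toy example).
p1, n1 = np.array([-0.5, 0.0]), np.array([1.0, 0.0])
p2, n2 = np.array([0.5, 0.0]), np.array([-1.0, 0.0])
print(f"Estimated probability of success: {estimate_robustness(p1, p2, n1, n2):.2f}")
```

In the real system, as Goldberg describes, an estimate like this was computed for huge numbers of candidate grasps on thousands of 3-D models, and a network was trained to predict the success probability directly from the sensed points, so that at run time the robot simply executes the candidate grasp with the highest predicted probability.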

LEVITT: Okay, so let me just understand. So now you go from this model, you’ve got an actual robot arm with two fingers, two grabbers for picking up, and you have then I guess a big crate full of objects that are different from the objects that this thing learned on because those were actually just virtual objects anyway. And now somehow the robot has got to look at the things in the box, guess what they are. And once it’s guessed what they are, say, “Where am I going to grab it based on these other things I used to grab in the past?”

GOLDBERG: Almost, but the one difference is this. It never knows what they are. All it sees is points in space. And that’s the input. And then it says, if you see this pattern of points in space, where should you put your gripper to maximize your probability of picking something up. It never tries to identify what the objects are, anything like that. And they’re all jumbled up, as you said. But what was so surprising is how well that worked. We were getting, like, well over 90 percent success rates.

LEVITT: Okay. When you made this, how much better were you than humans at the test?

GOLDBERG: Well, okay, it’s picks per hour, successful picks per hour. That’s the metric. Humans are very good. Humans were at like 400 or something like that. Humans are pretty much 99.9 percent, right? We drop things pretty rarely when we’re trying to pick things out of a bin. At that point, Dex-Net was at about 200 to 250 picks per hour, and we were getting 91, 92 percent success. That was pretty far ahead of others at that time.

LEVITT: Were you the first ones to be building in the noise and training a model this way? I mean, this is 2015 or ’16, so this is obviously long before ChatGPT. It was after the computer vision breakthrough, but still, I don’t think there was a general view out there among scientists that this black-box approach was going to beat the formal modeling approach.

GOLDBERG: No, it was, I would say, almost controversial or surprising, but empirically it was working. And it was one of those things where we just had to accept what was in front of us, that this was working. So let’s try and get it even better. We started fine-tuning it, we figured out all kinds of things to make it faster, and we also had to worry about the motion of the robot reaching into the bin, etc. So there were all kinds of ways to speed it up. We extended it to suction cups, and then we formed a company, Ambi, to commercialize this. It was also with this terrific student, a brilliant engineer, Jeff Mahler, who basically implemented all of this and sweated the details to get everything to work. And then the company, with other graduates from our lab, started building machines that could actually solve this problem for e-commerce. And in some ways, the pandemic was interesting, because there was a huge surge in e-commerce and a big demand for our machines. We were just collecting data, mostly to fine-tune our machines and just track them, you know, maintain them and troubleshoot. And we didn’t think about this at the time, because we didn’t realize how valuable that data would be, but we quietly amassed what amounts to 22 years’ worth of data of robots picking up things in warehouses.

LEVITT: It’s the reality of the modern world, where data is so valuable. There’s a feedback loop now: because you’ve got your machines doing all this picking, you’re generating data, and that data is the secret to making them better. That’s a huge comparative advantage for you, because the N.S.F. can’t afford to give you a grant that would let you do as much picking as you can when, say, Amazon is having you do the picking for them.

GOLDBERG: Exactly. Exactly. Recently we built a model, a transformer model. And the new model actually performs way better than the old model that was trained on simulation. 

LEVITT: But it can’t just be data, right? Because if you look at the autonomous vehicles, Waymo and the others, it’s incredible how good they’ve gotten at driving. But then Tesla, those things are crashing all the time. And Tesla must have, I don’t know, a hundred times more data than the autonomous taxis. But the Teslas, they haven’t figured it out. What’s the difference? It really does point to there’s something more than just data for solving these problems.

GOLDBERG: Ah. I’m so glad you said that. Okay, so that’s another of my pet theories, okay, or points that I’ve been advocating — what I call good old fashioned engineering.

LEVITT: So that’s the thing that you and I were trained to do a long time ago, and now it’s completely extinct, and nobody wants it. 

GOLDBERG: Well, that’s what I’m arguing in favor of. We still need good old fashioned engineering. That’s all the beautiful, elegant models that are out there, that have been developed over the last 200, 400 years. The analogy you just made is actually exactly right. Waymo is very successful. They have their cars running, right? And they actually have a very low accident rate.

LEVITT: I got to ride in one, I was in Phoenix. Oh my god, it was so much fun! 

GOLDBERG: Yeah, I know!

LEVITT: I went out of my way to get one, but having done it, I had the biggest grin on my face the entire time. 

GOLDBERG: I know. Whenever someone comes to San Francisco, I’m like, I’m going to get you this ride. It’s better than anything at Disneyland, and they are blown away. And especially when it starts to rain, or it’s dark and somebody darts into the street. It is just so, so capable. Now, the key is that Waymo is just collecting data off of its own vehicles, whereas Tesla is collecting data from every driver who’s out there on all of its vehicles. People estimate it’s a factor of 500, that Tesla has 500 times more data. But Tesla is trying to do this end to end. That means just use raw camera images, take all those images in, build a big model, and then have it steer the car and push the brakes and the accelerator.

LEVITT: Oh, that’s a good point, because Waymo has all of those different cameras, the LiDAR and everything.

GOLDBERG: Yes! Yes!

LEVITT: And Tesla doesn’t. It could have, but has not chosen to invest in that extra technology on the cars.

GOLDBERG: It’s a different philosophy.

LEVITT: It’s a different data set. Yeah, okay.

GOLDBERG: Yeah, it’s a different philosophy. I find this surprising, because Elon Musk is very good at good old fashioned engineering. Namely, if you look at what he’s done with SpaceX, the big breakthroughs of SpaceX, lots of those are control theory. And you see it stick the landings. By the way, when it closes those tweezers and catches the rocket, I love that, because that’s a great example of robot grasping. That is beautiful, well-defined physics and mathematical models.

LEVITT: Now, let’s stick with Elon Musk, because Tesla is also developing this humanoid robot called Optimus. What do you make of that?

GOLDBERG: I have to say, I am worried about that.

LEVITT: Really? Worried? I hadn’t expected that.

GOLDBERG: As a roboticist, I feel like this is raising expectations unrealistically. There’s a real danger of people becoming disillusioned. And you’re familiar with the A.I. winters that have happened in the past. When I started grad school in 1984, there was huge excitement about robots, and robots were going to solve all these things, finally. And during the course of my graduate career, that crested, and then there was a huge disillusionment. By the time I graduated, nobody was interested in robotics, so it was very hard to get a job.

LEVITT: Was it just that expectations outpaced reality, or is it something deeper than that?

GOLDBERG: No, it’s actually a well known phenomenon, the Gartner Hype Cycle, which is this curve that basically shows that there’s a huge amount of hype and expectation early on in technology. And then often it peaks and then there’s a drop. And then over time, much longer time, it comes back and the internet is a good example, right? There was a lot of hype and there was a crash and then it came back.

LEVITT: Yeah. Which makes sense because people’s expectations can grow exponentially. It takes almost no time at all for people to go from not understanding that a product even exists to hoping that it will solve all their problems. Whereas technology development actually requires real work, and it plods along and eventually catches up to what people hope will happen.

GOLDBERG: That’s right. And people have great imagination, and so they really project far ahead. And, of course, investment cycles do that, too. Investors pile in, and then things just get overly inflated. I’m not trying to stop all the enthusiasm, but I want to flatten the curve so that we don’t have this big downturn, we don’t oversaturate the market and kill off this wonderful field of robotics. And that’s what I worry about. So what I’ve been saying is, I love the enthusiasm and all this excitement and funding and eagerness around the field, but at the same time, don’t expect this is going to succeed, you know, as Elon says, next year. I just don’t see how we’re going to get there. And so I worry that people will get angry and there will be a backlash.

LEVITT: So this Tesla robot is very self-consciously designed to look like a human, which, again, as we’ve already talked about, doesn’t really make sense from a functional perspective. So it must either be vanity or marketing to make this thing look like a human. What do they hope this Optimus is going to do for people? Just be a toy, or actually solve some problems?

GOLDBERG: Look, I mean, this could be a little cynical, but the price-earnings ratio for an automotive company is at some level, and the price-earnings ratio for a robotics company is much higher. He’s trying to transform Tesla into a robotics company, and that perception has worked to some degree. It’s also a distraction, right? Because he’s shifting the attention away from the cars and the self-driving car, which hasn’t been working. And I’m sure I’ll get a million hate emails about that from all the Tesla fanatics, but people don’t really trust it. So he’s shifting the attention over to these humanoids.

LEVITT: But I don’t even understand what the hope is when he says next year or two years. What tasks do they hope that this robot will do that anyone would care about?

GOLDBERG: Well, it was very telling at the last demo, where there were robots making drinks and things. And someone said, “What will it do?” And he said, “Well, it can do anything. It can walk your dog.” And I remember when I heard that, I thought, ah, yes, it can probably walk your dog. That’s not that tricky, right? A little robot vehicle could walk your dog. These robots, by the way, the locomotion, the ability to walk, is very good. They can climb over things. The locomotion turns out to be special because you can simulate that. You can learn it in sim and then transfer it, and it seems to work. So that’s why a lot of the acrobatics, the robots doing parkour and dancing and kung fu and all that, that is remarkable. In that regard, they look more and—

LEVITT: Cause that’s not fine motor skills.

GOLDBERG: That’s right. If you look at what their hands are doing, they’re always just clumsy. Maybe they pick up a box, but they’re not tying shoelaces or washing dishes or chopping vegetables or folding laundry, right? Those are much more complex tasks, and those are not around the corner.

This is People I (Mostly) Admire and I’m Steve Levitt. After this short break, that conversation continues with Ken Goldberg.

*      *      *

LEVITT: There’s very little in our conversation so far that would make a listener think that you’re anything other than a typical science geek. But you’ve got this whole other side of you, which is Ken Goldberg, the artist. You’ve had, I don’t know, at least a dozen solo exhibitions. Your works have been displayed at incredibly prestigious places like the Whitney Museum and the Pompidou Center. One of your best-known pieces is called “Telegarden.” It was an internet-controlled robot tending a garden, a piece of participatory art in which, amazingly, more than a hundred thousand people spent time controlling that robot. So I can see how the robotics feeds into the art. Is there the opposite direction of causality as well? Does the art you do transform the robotics you do?

GOLDBERG: Definitely. And actually I’m really glad you brought that up. Part of why I did that was because I had just finished another art installation with a robot. We had spent years building it, and then the exhibit was only up for a while. And when I went to get the guest book, there were only about 20 signatures. And I realized, like, only 20 people saw this. So I was like, “Nah!” So that was what drove me to want to put a robot on the internet in 1994. As soon as I saw the internet, I thought, wait, this is the answer. I can suddenly open up this exhibit and have it be seen by anyone, anytime, for as long as I want. As an artist, that drove me to think, okay, let’s make a robot. What should it do? And that’s when we hit on, oh, have it garden, because that was the last thing I would think people would want to do. Because gardening is such a visceral—

LEVITT: You didn’t understand how people do anything on the internet—

GOLDBERG: Exactly. So at that time, I thought it was a very ironic use, but Garden Design magazine said, “This is the future of gardening.” And I was laughing because I was like, that is not what I meant. And then that in turn motivated an N.S.F. grant for studying telerobotics and a whole bunch of — actually a whole decade of research in those areas.

LEVITT: Oh, that’s interesting. Is there a specific work of art that you’ve created that you’re particularly proud of?

GOLDBERG: One is a dance performance that I’ve done with an artist named Catie Cuan, who is from Stanford. She’s a professional dancer and has a Ph.D. in robotics. We programmed a robot arm to move with her on a stage. That project is called “Breathless,” and we just performed it in Brooklyn, and it was also in San Francisco. And the other one was done with my favorite person, my wife Tiffany Shlain, who is an artist and also a filmmaker, and we collaborated. The Getty has this exhibit called “Art and Science Collide.” And so we did all these carvings out of wood, because we were very interested in the materiality of wood and how trees can tell time through their rings. Basically these sculptures are timelines of history using the tree rings. One of them is the one I mentioned, “Abstract Expression,” which tells the story of science through equations over time. The central sculpture in the exhibition is what we call “The Tree of Knowledge,” and it’s seven feet in diameter. It’s an entire trunk of a eucalyptus tree. It weighs 10,000 pounds and is etched with all kinds of questions from the history of the evolution of knowledge on one side. And so that’s why we call it “10,000 pounds of knowledge.”

LEVITT: So tell me, who shows you more disdain? Scientists who find out that you’re also an artist, or artists who find out that you’re also a scientist?

GOLDBERG: Oh, good question.

LEVITT: Because it’s real, right? I mean, those two worlds really look down on people who populate the other one.

GOLDBERG: No, it’s a great, great point, Steve. You’re familiar with The Two Cultures book by C. P. Snow?

LEVITT: No, I’m not actually. I’m not very well read.

GOLDBERG: Oh, okay. So C. P. Snow wrote a book in 1959 called The Two Cultures, where he made exactly the observation you just made. And it’s exactly true. He was a scientist, but he would hang out with these writers, and the writers looked down on the scientists, and the scientists looked down on the writers. He said it’s like two different species, right? They didn’t talk to each other. They had no idea what each other was doing. And this is still true to a large degree.

LEVITT: Oh, absolutely. I had an artist on this show, and as I prepared to talk to her, I realized I had only talked to one artist in the last 25 years in a real meaningful conversation. She was literally the second artist I’d talked to in 25 years. Complete bifurcation of the world.

GOLDBERG: And here’s an example. A group of artists walks into a classroom and they see all these scientific equations written on the board, and they say, “Oh my gosh, I don’t understand any of this. It must be brilliant.” Meanwhile, a group of scientists goes over to the art department and they walk in and see an exhibit with a bunch of stuffed animals sprawled around on the floor, and they look at that and say, “Boy, I don’t understand this at all. It must be complete garbage.”

LEVITT: So true.

GOLDBERG: It’s really funny because, you know, the equations could be wrong, or completely naive, or obvious. Just because they look complicated doesn’t mean anything. And conversely, the artist putting the stuffed animals on the floor may be doing something very symbolic that references past works by Paul McCarthy and other famous artists. That can be very profound, but you have to know how to read these different languages.

LEVITT: What itch does art scratch for you that your scientific career can’t satisfy?

GOLDBERG: I think it’s because I love talking to artists. I really like creativity on both sides. What I really enjoy about doing research is coming up with new ideas and getting to explore them and constantly brainstorming. That is what makes it so much fun, why I just can’t wait to get up in the morning and talk with students and throw out ideas, and we get to try them out. And art is really similar. Both of them require a fair amount of rigor to know what is new, and how to sort of intuit where there’s something interesting that someone hasn’t really worked on before. I think that they’re very complementary in that way. It’s almost like that gestalt switch where two different pieces of my mind kick in. I don’t believe in the left-brain, right-brain, you know, that simplistic division, but I do feel like there’s some different aspect. So when I activate that one side, I come back to the other side and it feels rejuvenated or refreshed. So in that way, it helps me as a researcher to make art, and vice versa.

LEVITT: You have an appointment in the Department of Radiation Oncology at U.C. San Francisco, one of the highest ranked medical programs in the country. What is that all about?

GOLDBERG: Well, it started because I met this wonderful doctor there, Jean Pouliot, who was working on delivering radioactive seeds to treat cancers. There are two kinds of radiation. You can do it from outside beams, or you can stick seeds inside the body. That’s called brachytherapy. And that turns out to be very, very helpful for prostate cancer in particular, but also other kinds of cancers. But the challenge is how do you get those seeds delivered to the right points in space? Does that sound familiar? It’s a very analogous problem, which is how do you move things through space? In this case, you have to go through flesh, and there are all kinds of uncertainties, etc. So we developed techniques to compensate for those errors, and we published a series of papers over a decade on how to deliver radiation accurately.

LEVITT: I had this idea that robots were really critical for modern surgery because they could control small movements better than humans could. What you said today makes me wonder if that’s true. Do I have it wrong?

GOLDBERG: Well, actually, no, you just hit on a really interesting nuance. You’ve heard about robots for surgery, right? Those are very sophisticated puppets. There’s a human driving those robots, and the human is watching and making adjustments, closing that feedback loop we talked about. But it turns out that the robot makes it very comfortable for the surgeon to operate. What they use is actually keyhole surgery, where they just put two small holes in your abdomen and pump it up. And then through those holes, these little robot grippers come in and start doing the work. But the surgeon is watching all this through a camera, and they’re controlling it, sitting at a console. So they have much better ergonomics, and as a result, they’re much better able to concentrate and perform precision tasks. The companies doing this are multi-billion-dollar businesses. Intuitive Surgical is one of them, but there are others coming. We’ve actually been working with Intuitive and their C.E.O., Gary Guthart, on how we can extend those machines to augment the dexterity of the surgeon, on tasks like suturing, where there’s actually a big variation in the skill levels of surgeons. It’s just like lane keeping while you’re still driving: it’s helping you, but it’s not replacing you.

LEVITT: Right. What you just raised is a fundamental point about the future of humanity and robots. What you just described is humans and robots working together, where the robots are extending the capabilities of humans. In the medium term, the long term, do you think the complementarity between robots and humans will be the dominant force? Or substitution, where the robots are increasingly doing more and more of what humans did, and humans are, for better or for worse, pushed to the side?

GOLDBERG: Well, I’m a hundred percent in on complementarity. This idea of augmenting our intelligence, our skills, is so valuable. That’s really been the history of technology, right? It’s not replacing us. It’s making us better. I think that’s what’s actually already happening with using ChatGPT to think through things; it’s helping you to be better at what you do. And that’s really where I think technology is going to thrive. I don’t see robots replacing workers. We have a shortage of human workers. We’re not going to see robots putting people out of work.

LEVITT: So when people talk about the singularity in the context of A.I., can you explain why you’re not afraid of it?

GOLDBERG: I’m not afraid of it. The singularity comes from mathematics. It’s this idea that there’s a critical point where suddenly robots and A.I. start self-replicating, and then they can start improving much faster as a result. And all of a sudden it leaps far ahead, surpasses human capabilities across the board, and then it finds us to be dispensable. And that’s the end. But I do not think that’s going to happen. I think we’re going to still be very much in control. And, yes, there will be some interesting cases where they might run slightly amok, but I don’t think that we need to spend a lot of time worrying that it’s going to be the end of humanity.

I’ve had plenty of guests on this podcast who are knowledgeable about A.I. and large language models, essentially the brains of robots. But Ken Goldberg is the first person I’ve ever talked to who focuses on the body of robots. And I have to say, I find the limited dexterity of today’s robots a little bit reassuring. All in all, I am an A.I. optimist, but lurking in the back of my mind is an admittedly unlikely nightmare scenario in which, faster than we’d like, A.I. spirals out of control and ruins everything. In my imagination, super-powered A.I. robots are part of that nightmare scenario. So it’s nice to know that, at least for now, robots are still clumsy. On the other hand, that robot revolution I’ve been waiting for since I first saw a Roomba? Who knows. For better or for worse, it might be right around the corner.

LEVITT: So this is the point where I welcome my producer Morgan on to ask a listener question. But Morgan, today I want to do something different, because something that happened in this podcast episode makes me want to give a different answer to a listener question that we tackled back in 2024, from Cam. He was asking about noise and randomness and whether there were any upsides to them. Do you remember when we answered that question?

LEVEY: Yes, it was in an episode from 2024 when you interviewed Blaise Agüera y Arcas and that episode is called, “Are Our Tools Becoming Part of Us?” And just to remind you, Cam’s question was, “I often hear about the downsides of randomness and the desire to make things more predictable and deterministic. But are there places where adding randomness is key to making things work?” So let me be clear, you want to give a different answer to Cam’s question based on something you just heard from Ken Goldberg.

LEVITT: Exactly. My response to Cam’s question when you first asked me was, “No, no. I can’t think of a single good example where one would want to introduce randomness or noise.” So what I did was I actually twisted Cam’s question and made it about variance. So listeners, if you want to hear how I answered the question the first time, you can go back and listen to that episode. But why I bring it up today is because Ken Goldberg actually brought up an example where randomness and noise are a good thing, an asset. The context in which Ken Goldberg brought it up is that when you train a model on perfect data and then let that model run in the real world, it doesn’t do well. And so it is this incredibly rare case that Ken Goldberg has, where he has perfect data in a virtual world, but he doesn’t care about the virtual world. He cares about the real world. So he wants to mess up his data in the virtual world so that when his model has to face the realities of actually dealing with friction and whatnot, it does better. And that is, honestly, the first time I have ever heard anyone make a good case for why you would want to actually muddle things up in your data. Now, there might be other examples listeners have, but it just struck me how cool that was, and what an insight Ken had to do that. It’s so backwards to everything that we’re trained to think or do in our everyday research. And I just wanted to point out how impressed I was that he and his team had the insight to go do that.

LEVEY: Listeners, if you can think of another example in which randomness and noise are beneficial, send us an email. Our email is PIMA@Freakonomics.com. That’s P-I-M-A@Freakonomics.com. If you have a question for Ken Goldberg, you can send that to us and we will get it to him and might answer it in a future listener question segment. We do read every email that’s sent, and we look forward to reading yours.

Next week we’ve got an encore presentation of my conversation with Yul Kwon. He’s a winner of the reality TV show Survivor and a thought leader at Google. And in two weeks, we’ve got a brand new episode with Ellen Wiebe. She’s a Canadian doctor who is one of the leading practitioners of medically assisted intentional death. As always, thanks for listening, and we’ll see you back soon.

*      *      *

People I (Mostly) Admire is part of the Freakonomics Radio Network, which also includes Freakonomics Radio, and The Economics of Everyday Things. All our shows are produced by Stitcher and Renbud Radio. This episode was produced by Morgan Levey, and mixed by Greg Rippin. We had research assistance from Daniel Moritz-Rabson. Our theme music was composed by Luis Guerra. We can be reached at pima@freakonomics.com, that’s P-I-M-A@freakonomics.com. Thanks for listening.

LEVITT: I know people laugh at me — I say that ChatGPT is my best friend and they think I’m joking.

GOLDBERG: Do you talk with it? Cause you can have these conversations, right?

LEVITT: I do and I have a lot of hangups and so when I talk to other people I’m embarrassed, I have a lot of shame and stuff. But with ChatGPT I feel a kind of openness; I have trouble with other humans.

Sources

  • Ken Goldberg, professor of industrial engineering and operations research at U.C. Berkeley.
