If you follow Stanford tech spinouts, you likely know Christopher Ré. The associate professor of computer science in the Stanford AI Lab and HAI faculty member has co-founded four machine learning systems companies – two snapped up by Apple – and advises and invests in a half-dozen others.
The catalyst for this commercial success, in many ways, is a paper he co-authored 10 years ago that he describes as little more than an “art project.” That paper, though, redefined researchers’ approach to building ML systems and helped launch a new field combining machine learning, optimization, and systems.
Hogwild! (that exclamation point is a nerdy inside joke) threw out a conventional ML assumption taught in every classroom and showed researchers that this new breed of systems needed a new set of rules.
This month, Ré’s 10-year-old approach won the “Test of Time” award at NeurIPS’ annual conference, the main gathering spot for machine learning and computational neuroscience scholars. Here, he explains what the paper, written with UC Berkeley’s Benjamin Recht and the University of Wisconsin’s Stephen J. Wright, did for the ML community and his career, what he’s working on now, and what that inside joke is all about.
What is Hogwild!?
Most of machine learning and artificial intelligence basically boils down to solving large optimization or math problems. Your goal is to find the right setting of all the variables in these huge models that lets you make predictions. Hogwild! is a method for finding those variables very efficiently.
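To make “finding the right setting of the variables” concrete, here is a minimal sketch of plain, single-threaded stochastic gradient descent (SGD), the kind of optimizer Hogwild! parallelizes, on a toy least-squares problem. The data generator, function names, and step size are illustrative assumptions, not anything taken from the paper.

```python
import random

def make_data(w_true, n, seed=1):
    """Hypothetical toy data: noise-free examples with y = w_true . x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        y = sum(a * b for a, b in zip(w_true, x))
        data.append((x, y))
    return data

def sgd_least_squares(data, dim, lr=0.05, epochs=50, seed=0):
    """Serial SGD minimizing sum_i (w . x_i - y_i)^2 / 2, one example at a time."""
    rng = random.Random(seed)
    w = [0.0] * dim                       # the model's variables, initially zero
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)                # visit examples in a random order each pass
        for i in order:
            x, y = data[i]
            err = sum(wj * xj for wj, xj in zip(w, x)) - y   # prediction error
            for j in range(dim):
                w[j] -= lr * err * x[j]   # step against this example's gradient
    return w

w_true = [2.0, -1.0, 0.5]
w = sgd_least_squares(make_data(w_true, 200), dim=3)
print([round(v, 3) for v in w])           # should land very close to w_true
```

Each step looks at one example and nudges every variable slightly; run long enough, the nudges add up to the right setting.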
What was innovative about it 10 years ago?
At the time, the way we taught computer science students to take advantage of multiple computers and run things in parallel was that they had to very carefully order each operation. Imagine you have 100 machines working on a problem, and each one of those machines has its own little piece. They have to coordinate and talk to each other. The problem is that when they go to update each other, they have to make sure they know who’s writing in the ledger. You don’t want them writing over each other in the same spot. So we taught in our introductory courses that you have to follow these locking protocols.
Without those protocols, the thinking went, you might have all these parallel processes running and maybe they get to the right answer, but because they’re overwriting each other all the time, you make no forward progress, or worse, you get the wrong answer.
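The “ledger” problem described here is the classic lost update. A minimal sketch, assuming a single shared number stands in for one entry of the model: the first function replays by hand the worst-case interleaving in which every worker reads before any worker writes, and the second uses Python’s `threading.Lock` as the kind of locking protocol being described. Both function names are hypothetical.

```python
import threading

def unlocked_worst_case(x, deltas):
    """Deterministically replay the interleaving where every worker reads
    the shared value before any worker writes its result back."""
    reads = [x for _ in deltas]           # all workers read the same old value
    for old, d in zip(reads, deltas):
        x = old + d                       # each write clobbers the previous one
    return x

def locked(x, deltas):
    """Each worker holds the lock for its whole read-modify-write,
    so no update can be overwritten."""
    box = [x]                             # shared, mutable cell
    lock = threading.Lock()

    def worker(d):
        with lock:
            box[0] = box[0] + d

    threads = [threading.Thread(target=worker, args=(d,)) for d in deltas]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return box[0]

deltas = [1, 1, 1, 1]                     # four workers, each adding 1
print(unlocked_worst_case(0, deltas))     # 1: only one of the four updates survives
print(locked(0, deltas))                  # 4: every update is applied
```

The lock guarantees correctness, but every worker must wait its turn, which is the cost the conventional approach accepts.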
What Hogwild! said was, “Hey, actually, when you’re solving these big statistical models, something simpler than that may work.”
Our first theory said, roughly, “If these conflicts in the ledger are rare, then we can think of them like noise.” Later, we showed it holds under even weaker conditions. What’s going on is a kind of nice statistical trick: the values in the ledger are estimates of the true value, and if you’re wrong, the model kind of nudges you in the right direction. Maybe you take a false step, but if most of your steps are good, you’ll still get there. If you could update the model once a second with locking, you could now update it p times in that second, where p is the number of processors.
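One way to make “conflicts are rare” precise, sketched here as a simplified setting in the spirit of the paper rather than its exact theorem (the notation is an assumption): suppose the objective splits into many terms, each of which depends on only a small subset of the variables.

```latex
f(x) = \sum_{e \in E} f_e(x_e), \qquad e \subseteq \{1, \dots, n\}, \quad |e| \ll n

% each worker, repeatedly: pick a term e, read \hat{x}_e, then for every v in e
x_v \leftarrow x_v - \gamma \, \big[ \nabla f_e(\hat{x}_e) \big]_v
```

Here $\hat{x}_e$ is whatever copy of those coordinates the worker happened to read, possibly stale because another worker wrote in between, and $\gamma$ is the step size. Two workers that pick terms $e$ and $e'$ with $e \cap e' = \varnothing$ touch disjoint coordinates and cannot interfere at all. When overlaps are rare, the gap $\hat{x}_e - x_e$ acts like a small perturbation of an ordinary SGD step, which is the “noise” in the quote.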
So Hogwild! showed that you can basically do no locking whatsoever. What was shocking to me as a computer scientist was that not only did it get the right answer, it got the right answer fast.
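A minimal, illustrative sketch of the lock-free idea in Python, assuming a toy sparse least-squares problem in which each example touches only two of 50 coordinates. Several threads run SGD on one shared weight list with no lock at all; an update can occasionally be lost when two threads hit the same coordinate at once, and the sparsity keeps that rare. The names, sizes, and step size are hypothetical choices. In the default CPython build, the global interpreter lock makes the threads interleave rather than run truly in parallel, so this shows the unsynchronized update pattern, not the speedup.

```python
import random
import threading

def make_sparse_data(w_true, n, nnz=2, seed=1):
    """Hypothetical noise-free data: each example touches only `nnz` coordinates."""
    rng = random.Random(seed)
    d = len(w_true)
    data = []
    for _ in range(n):
        idx = rng.sample(range(d), nnz)
        x = [(j, rng.uniform(-1.0, 1.0)) for j in idx]   # sparse (index, value) pairs
        y = sum(w_true[j] * v for j, v in x)
        data.append((x, y))
    return data

def hogwild_sgd(data, dim, n_threads=4, lr=0.5, epochs=20):
    w = [0.0] * dim   # shared model: every thread reads and writes it, no lock anywhere

    def worker(shard, seed):
        rng = random.Random(seed)
        for _ in range(epochs):
            rng.shuffle(shard)
            for x, y in shard:
                # Read only the coordinates this example touches; another thread
                # may be writing them at the same moment.
                err = sum(w[j] * v for j, v in x) - y
                for j, v in x:
                    # Unsynchronized read-modify-write: if another thread updated
                    # w[j] in between, one of the two updates is simply lost.
                    w[j] -= lr * err * v

    shards = [data[t::n_threads] for t in range(n_threads)]   # each thread gets a copy
    threads = [threading.Thread(target=worker, args=(shards[t], t))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

rng = random.Random(0)
w_true = [rng.uniform(-2.0, 2.0) for _ in range(50)]
w = hogwild_sgd(make_sparse_data(w_true, 2000), dim=50)
print(max(abs(a - b) for a, b in zip(w, w_true)))   # worst-case error; should be tiny
```

The design choice that matters is that each thread reads and writes only the coordinates its example touches. A dense update to every coordinate would make collisions common, which is exactly the situation the rare-conflict argument does not cover.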
So you took an industry rule and said, “We can throw it out. It doesn’t matter.”
The reason it had such a crazy name to begin with was that it was intentionally a crazy idea. It was really more like an art project. What it really meant is that these new types of systems we were building didn’t follow the old conventional rules of how people built computer systems. I started to re-examine a lot of what I had figured were basic assumptions.
So I think the paper, rightly or wrongly, gets a little bit of credit for showing people that this new breed of machine learning systems was going to be built by a new set of rules.
What’s with the exclamation point?
This is a very weird insider nerd joke. Basically, machine learning papers would torture themselves to make an acronym and then use it as a name. As I was writing the paper, I kept putting in Hogwild! with an exclamation point. [Co-author] Steve [Wright] was like, “What does this even mean? It’s not an acronym. It’s just a stupid name.” I thought the phrase “going hog-wild” was a hysterical description of what we were trying. So I thought an exclamation point would just make it better.
In what ways did you see companies and researchers start picking this up?
Because it was a concept, it was really easy for folks to adapt and use. Most companies played with it. Google had a system that followed on it, called Downpour SGD. Microsoft had something. A lot of downstream companies use it as part of their arsenal of techniques. I think it’s in most popular packages. But we were one tiny brick in this mega-edifice that all these folks were using.
To me, the bigger impact was pushing people to rethink their approach to computer systems. I remember that just a couple of years ago, it struck me that people would cite our paper to justify crazy ideas, like, “Hey, Hogwild! worked. Why wouldn’t this work?” I think we got really lucky that there was a precise, technical result showing something interesting. But the consequence was that lots of other smart people piled in to take this idea in new and different directions.
What impact has Hogwild! had on your work?
When this paper was written, we didn’t have many machine learning products around us. At the time, Google Search didn’t use machine learning. Siri had yet to be launched. We were thinking about what the world would be like when machine learning was all around. That’s what my research was really all about: trying to understand how these systems were going to be built. It really convinced me that they were going to differ from the last generation of systems in foundational ways. We couldn’t just take the old systems and repurpose them; we had to really reconsider some foundational trade-offs.
That was my work for a number of years. It ended up spinning off companies like SambaNova, a hardware and software company, where we said, “Look, you need a different model if you want to be able to process machine learning and data at really large scale.” All those investigations started from this paper, where I began to question what the foundations of those systems were. That had a really personal impact on me.
The other thing was, because of this paper, I ended up meeting a bunch of fantastic folks. At the time, there wasn’t really a machine learning and systems community. It was kind of a weird hybrid that spanned different areas; people were certainly thinking about it, but there wasn’t a conference to go to or any of those things. Through this paper, people reached out, and I got pulled into that area. That’s been tremendous for my work and set me off in a new and different direction for years afterward.
That paper was quite the catalyst.
Machine learning and systems would have happened whether or not we had written this paper. For me, personally, it was the lightbulb moment where I could see how I could contribute. It was really an engineering problem to me, and I could see a meaningful way for us to contribute from the academic side.
Honestly, I did receive career advice after this: “You’re making a career mistake because you’re going into a new, weird area where who knows who’s going to tenure you.”
What does a NeurIPS “Test of Time” award mean to you?
It’s exciting to me that people looking back now appreciate such a weird, oddball contribution, one that didn’t try to look like the other kids. It was very clear that this was a weird idea. They could very easily have rejected it on all kinds of grounds. So the fact that the core machine learning conference, which has gotten massive, is now embracing this and saying it’s something they value was shocking to me. That was awesome.
What are you working on now?
Two things. Our most recent work has been changing how people program these machine learning systems. We think there’s a way to make programming them dramatically easier. How do you teach the machine learning system and get your domain knowledge into it as fast as you can? That’s a project called Snorkel, which was led by my phenomenal student Alex Ratner along with a host of other students, and which has now become a company as well. It’s actually in systems that you probably use today, thanks to great collaborations with the folks over in Ads and YouTube at Google, and with teams at Apple as well. So that’s really exciting.
We’re also really interested in how people’s knowledge about the world can best be transmitted to machines. Machines can learn some of this knowledge just from reading huge corpora, an approach called self-supervision. But sometimes there are key bits of reasoning they still miss. We’ve been really interested in this boundary. What is the minimal amount that people need to tell these systems to get them to work? To improve them when they fail? We’re in the very early stages of that. We’ve just published a couple of papers, but I hope to talk more about it in a Stanford HAI seminar this winter.
Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.