Selective State Space Models: Solving the Cost-Quality Tradeoff
As AI is increasingly used in production scenarios, costs are mounting. Are alternative architectures the solution?
Technology development is unpredictable. You have to build things if you want to understand them.
There is no shortage of think pieces making comparisons between artificial general intelligence (AGI) and the Manhattan Project, especially with the popularity of Christopher Nolan’s “Oppenheimer” this summer. However, one of the most interesting comparisons can’t yet be made: We know exactly how the research program for the Manhattan Project succeeded, but we don’t yet know the full story for artificial intelligence. With the Manhattan Project, we know with hindsight which scientific and technical discoveries were needed, why they were hard, and how they were eventually made; with AI, of course, we only know the story up until today.
The reason this matters is that technology development only looks obvious in retrospect. As a way to understand what the Manhattan Project might have felt like at the time, I want to recount a few hard problems that might have delayed the project indefinitely, or scuttled it altogether, were they not overcome as swiftly as they were. There are two ideas that will come out of this, both of which are relevant to AI today: (1) It’s hard to predict tech timelines if you need a scientific breakthrough to continue development, and (2) the only real way to predict anything about technology development is to build things.
One way to read the history of fission bomb development is that the physicist Leo Szilard called the shot. He came up with the idea of a nuclear chain reaction in 1933, patented it in 1936, wrote the famous letter with Albert Einstein in 1939, and then saw his ideas validated with the Chicago Pile in 1942 and the Trinity test in 1945.
Another way to read the history is that we were “lucky” to discover fission so quickly, because without it we’d have no mechanism for creating chain reactions. In 1933, nobody was thinking about fission, and nobody knew how to achieve a nuclear chain reaction, including Szilard. The intuition for a nuclear chain reaction is geometric growth: If you can shoot one neutron into the nucleus of some element, and if this causes n > 1 neutrons to emit and shoot into other nearby nuclei, you might hope for a chain reaction. But how would you actually do this in practice? Szilard had some ideas, but none of them panned out. It seemed like a dead end for several years.
It wasn’t until the surprise discovery of uranium fission in 1938 that chain reactions looked feasible again. Otto Hahn and Fritz Strassmann made the experimental discovery in late 1938, and then Lise Meitner and Otto Robert Frisch explained it theoretically in early 1939. When uranium-235 splits, it releases an average of about 2.5 free neutrons, which allows the kind of geometric growth that Szilard had conceived earlier.
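To make the geometric-growth intuition concrete, here is a minimal sketch in Python. It rests on a deliberately idealized assumption: every neutron released goes on to cause another fission, so the population multiplies by the same factor each generation. In a real assembly, leakage and absorption pull the effective factor well below 2.5. The function names are hypothetical and chosen only for illustration.

```python
def neutrons_per_generation(k: float, generations: int, start: float = 1.0) -> list[float]:
    """Neutron population at each generation, assuming each generation
    multiplies the population by the same factor k (idealized)."""
    return [start * k**g for g in range(generations + 1)]


def generations_to_reach(k: float, target: float, start: float = 1.0) -> int:
    """Number of generations until the population first reaches `target`.

    Only meaningful for k > 1; with k <= 1 the chain never grows.
    """
    if k <= 1:
        raise ValueError("no growth: k must exceed 1 for a chain reaction")
    population, generations = start, 0
    while population < target:
        population *= k  # one more generation of fissions
        generations += 1
    return generations


if __name__ == "__main__":
    # Population over the first five generations with k = 2.5.
    print(neutrons_per_generation(2.5, 5))
    # Generations needed to go from one neutron to roughly 10^24 of them.
    print(generations_to_reach(2.5, 1e24))
```

Under this toy assumption, a single neutron grows to around 10^24 in only a few dozen generations, which is why a multiplication factor above 1 was the whole game.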
If prediction markets had existed at the time, this is the moment when the “Will a self-sustaining nuclear chain reaction be achieved before 1945?” contract would have swung upwards. Szilard, of course, immediately saw what fission meant for a bomb. So did J. Robert Oppenheimer, who reportedly had a crude bomb design on his blackboard within a week of seeing the Hahn and Strassmann experiment.
In April 1939 (four months after the experiment was published), Nazi Germany banned uranium exports and began a research program for a bomb. In the Soviet Union, particle physicist Igor Kurchatov began arguing for a fission bomb development program that summer (six months after publication). Robert Serber of the Manhattan Project said “the possibilities were immediately obvious to any good physicist.”
Almost immediately, however, physicists ran into a practical problem. Naturally occurring uranium is almost entirely U-238, an isotope that doesn’t split when bombarded with neutrons. U-235 does split, but only makes up 0.7% of natural uranium. It seemed like the only way to move forward was to painstakingly sift out the U-235 from the U-238. This looked like an enormous roadblock at the time. Niels Bohr famously said that isotope separation on the scale needed for a bomb couldn’t be done without “turning the whole [United States] into a factory.”
He wasn’t wrong. About 600,000 people worked on the Manhattan Project at some point in its history, at a time when the population of the United States was only about 130 million. Physicists had to invent new methods for industrial-scale isotope separation, all of which were wildly expensive and plagued with failure and setbacks.
Eventually, they triumphed, but even with their success it would take years of fully ramped production to get enough U-235 for a single bomb.
Again, an unexpected discovery moved things forward. While U-238 doesn’t split when it absorbs a neutron, it turns out that the resulting U-239 decays, by way of neptunium, into the plutonium isotope Pu-239, and Pu-239 does split under neutron bombardment. This was a huge stroke of luck. Instead of needing rare U-235, they could now use plentiful U-238 to make Pu-239. Even better, Pu-239 has a substantially smaller critical mass than U-235; so not only is it much cheaper, but you need much less of it. Again, it would have been hard to foresee this in advance.
But it wasn’t all good news. When you try to make Pu-239 from U-238, you also end up with a lot of another plutonium isotope, Pu-240. Pu-240 fissions spontaneously, and releases neutrons in the process. If you have too much Pu-240 near your Pu-239, it’ll start a chain reaction prematurely. Fission bombs work by taking two subcritical masses and rapidly shoving them together, forming a critical mass where a chain reaction can rapidly occur. The simplest bomb design was “gun type” assembly, where one slug of subcritical mass is literally shot through a cannon at another subcritical mass.
Pu-240 spoils the gun-type assembly for Pu-239. The plutonium starts chain reacting before the masses can be fully brought together, causing a small explosion instead of a big one. It turns out that separating Pu-240 from Pu-239 is even more difficult than separating U-235 from U-238, since the two plutonium isotopes differ by only one mass unit instead of three. Again, this might have seemed like a dead end.
The proposed solution was a much more complex “implosion type” bomb. This kind of assembly works by using conventional chemical explosives to compress a subcritical mass into a critical one from the outside in. This is much harder to pull off than the gun-type assembly because it requires high-velocity explosives to work in near unison — otherwise the fissile material would be “squeezed” out before it could react.
No one had ever built anything approaching this level of precision explosives before, and many of the physicists on the Manhattan Project didn’t think it could work. As one of them put it, it was like crushing a beer can with explosives — without spilling the beer. But they did make it work: The Trinity test was an implosion-type assembly.
I’ll argue two points. The first is that it’s hard to predict technology timelines that stretch across unsolved scientific problems. In retrospect, the development of nuclear reactors and atomic bombs happened incredibly quickly: from Szilard’s idea in 1933 to the Trinity test in 1945 and the EBR-I power plant in 1951. On the other hand, it’s easy to come up with examples of problems that proved much harder to solve than people initially thought. Attempts at heavier-than-air flight go back to at least the 1500s, and contemporaries were optimistic about their chances for success. Even before 1800, some commentators thought that human flight was an imminent risk to public safety, and called for all flying machines to be owned or regulated by the state. When you’re waiting on a scientific breakthrough, it’s hard to tell whether you’re 10 years away or 100 years away.
For AI, the upshot is that your confidence in artificial general intelligence (AGI) or artificial superintelligence (ASI) timelines should depend a lot on the number of major scientific discoveries you think stand between us and those achievements. If you think that we can get there by scaling current methods and solving a few “minor” problems, then you might have tight timelines. If you think that we need one or more major breakthroughs, then you have to be quite humble about your predictions.
The second point is that you can’t predict the winding path of technology development without actually trying to build something. Nobody thinking about chain reactions in 1933 foresaw uranium fission in ’38. Nobody starting out on the Manhattan Project in 1942 foresaw the plutonium crisis of ’44. And so on. Similarly, nobody was thinking about exploding gradients in 2000 (you can’t explode gradients if you don’t have deep neural networks). No one was thinking about transformers in 2015. And so on.
The hard problems, and ingenious solutions, that you encounter in practice are impossible to predict from afar — you have to build.
“John von Neumann as Seen by his Brother,” by Nicholas von Neumann (1987; published in journal 1989)
Von Neumann needs no introduction. This short biography written by his younger brother shares slices of their early lives together, and speculates on how those experiences shaped his brother’s approach to his work.
“Robert Oppenheimer: Letters and Recollections,” edited by Alice Kimball Smith (1995)
This collection of letters dates from Oppenheimer’s Harvard years in the early 1920s through the time he concluded his directorship of the Manhattan Project in 1945. The editor lived with her husband at Los Alamos during the Manhattan Project and was a friend of Oppenheimer’s and his wife, Kitty.
“The Making of the Atomic Bomb,” by Richard Rhodes (1985)
The canonical, Pulitzer Prize-winning history of the Manhattan Project and the development of particle physics leading up to it.
“Dark Sun,” by Richard Rhodes (1996)
Rhodes’ sequel to “The Making of the Atomic Bomb,” which covers both the hydrogen bomb program in the United States and the Soviet atomic program during the Cold War.
“Enrico Fermi: Physicist,” by Emilio Segrè (1970)
Fermi led the team that built the first self-sustaining nuclear chain reaction, Chicago Pile 1. This biography is written by Fermi’s lifelong friend and fellow Nobel Prize winner, Emilio Segrè.
“The Los Alamos Primer,” by Robert Serber (1943; declassified 1965; printed with commentary 1992 and updated with new introduction 2020)
Serber, a physicist on the Manhattan Project, gave a series of five lectures on the project’s underlying theory and goals that were then collected into the “Los Alamos Primer” pamphlet given to every member of the technical staff upon arrival.
“Adventures of a Mathematician,” by Stanislaw Ulam (1983)
Ulam joined the Manhattan Project in ’43, where he worked on the hydrodynamics of the implosion bomb, and later co-designed the hydrogen bomb. Ulam was close friends with von Neumann, and there’s a trove of von Neumann anecdotes in Ulam’s autobiography. Ulam also gives a charming account of the pre-war mathematics community in Eastern Europe.
“The Recollections of Eugene P. Wigner,” by Eugene Wigner, as told to Andrew Szanton (1992)
Along with Szilard, Einstein and Teller, Wigner was one of the physicists who convinced President Franklin D. Roosevelt to initiate the Manhattan Project. His memoir includes a look at the urgency and fear that animated the race to an atomic weapon during the war. Like Ulam and Segrè, he has interesting things to say about the abilities and personalities of the physicists at Los Alamos.