This essay is the text for a talk on how to wisely navigate risks from transformative technology, especially artificial superintelligence (ASI). It was given at Astera on January 28, 2026.
In 1954 the United States carried out its first full-scale test of a thermonuclear bomb, on Bikini Atoll in the Western Pacific. The test, known as Castle Bravo, was expected by the bomb's designers to yield 6 megatons. They were shocked when it yielded 15 megatons, an excess of 9 megatons, about 600 times the Hiroshima blast. The unexpected radioactive fallout caused many deaths – the exact number is disputed – and exposed more than a thousand people to serious levels of radiation, triggering a major international incident.
What went wrong? The bomb contained both the lithium-6 and lithium-7 isotopes of lithium. The bomb's designers believed only the lithium-6 would contribute to the yield, while the lithium-7 would be inert. But the lithium-7 converted to tritium far faster than expected, and that turned out to almost triple the yield. It's a case where a group of outstanding scientists failed to anticipate a deadly possibility latent in nature.
There's a curious coda to this story. Years earlier, in 1946, physicists had seriously investigated whether a thermonuclear explosion would ignite the nitrogen in the atmosphere, causing it to catch fire and end almost all life on Earth. They concluded it would not. I'm glad it was the lithium-7 reaction they got wrong, and the nitrogen calculation they got right.
I knew one of the creators of Castle Bravo slightly. In 1997 and 1998 I worked on quantum computing in the theoretical astrophysics group at Los Alamos National Laboratory. One of the other people in the group was Stirling Colgate, who decades earlier had run the 3,000-person diagnostic team for Castle Bravo. In the 90s, Stirling would sometimes join us for lunch.
I liked Stirling. He was one of the most imaginative and daring people I've ever met. Among his many adventures, he'd taught himself to fly a plane so he could chase tornadoes, firing rockets from the plane in the hope of getting diagnostic equipment inside a tornado, so he could study how they worked. According to lab legend he was one of the inspirations behind the movie Twister, but they'd had to tone his personality down for the movie. I never did ask him if that was true. He was the kind of person who, if he were 30 today, I'd expect to be running a successful company in Silicon Valley, or pursuing wildly ambitious technology research projects.
Which future?
Astera's motto is "The future, faster." A good question to ask is: which future? Obviously, we don't want it to be the bleak future imagined by some environmental activists, or those fearful of a large-scale nuclear exchange. No, we want it to be a good future, hopefully a future wildly better than today.
Unfortunately, good intentions don't ensure good outcomes. Those who promoted asbestos, and the inventors of DDT, leaded gasoline, and CFCs, all intended to help humanity. Naysayers got little traction at the time: the benefits seemed too large, the harms too easy to contest. Humanity took a laissez-faire approach and millions suffered. It's the Castle Bravo problem: dangerous capabilities, latent in nature, which we didn't sufficiently understand until too late.
One common framing of this issue is: how can we balance the risks and opportunities of science and technology? It's a fine question as far as it goes, and arises often in policy and engineering circles. But it's also often a platitude, something to say to sound wise, then do whatever you were going to do anyway.
The framing only makes sense informed by deeper analysis. What are our best models of risk and opportunity? What powerful ideas underlie the institutions we use to shape science and technology? Can we improve those ideas and institutions? These questions are becoming especially urgent as humanity develops artificial superintelligence (ASI). If we're about to instigate an explosion of posthuman intelligences, how can we ensure it goes well?
These questions are too large to answer comprehensively today. But what we can do is examine some models and historical examples, and use them to develop a conceptual armory which improves our understanding of these questions.
AI for virus design
The original talk included a section prior to this one, discussing specific viral pandemic agents. The material covered is all public knowledge, and it makes the talk more concrete, but on balance I see no good reason to collect and present such details in public. (There is a damned-if-you-do, damned-if-you-don't character to discussing risks: critics can dismiss the discussion as too vague if you omit details, or as irresponsible if you include them. Regardless of critics, omission is the better call here.) The point of including it in the talk was as a second example, in the vein of Castle Bravo, of humans accidentally stumbling on unanticipated destructive capabilities latent in nature. It also illustrates a kind of scientific dysergy, where moderately concerning individual discoveries can be combined into something much worse – a pattern that could also be illustrated with the history of nuclear weapons. Most of all, it again raises the question: what other unknown destructive capabilities are latent in nature?
We'll table that question, and turn to discussing the more general (and much less specific) problem of tools to intentionally engineer viruses. That sounds horrid in the context of pandemic agents, but if we take a step back, there are many benefits to such tools. Those benefits have driven much work over the past few decades, and that work is now yielding fruit, with new viral delivery mechanisms for gene therapies – like Zolgensma, for spinal muscular atrophy – as well as therapies like T-VEC for melanoma, and progress on phage therapies for antibiotic-resistant bacteria. These are modifications targeting single genes, though there are also promising results from techniques such as directed evolution and rational design.
What's the long-term aim here? The examples I just gave are not de novo design, so much as minor alterations of existing viruses. But many groups are now aiming to design viruses, proteins, and other biological entities from much nearer to scratch. The vision is to develop predictive models good enough to enable rapid exploration of the design space, often so-called AI "biological foundation models".
The best existing example is protein design – AlphaFold 2 and subsequent models can predict protein structure well enough they're becoming useful for design, despite significant limits that are still being overcome. Efforts like Astera's Diffuse project may provide data that helps improve the design tools further.
Progress on viral design lags protein design, but a lot of effort and strong market incentives are pushing it forward. And as with proteins, there is a mad-scientist ideal in which you say what properties you want in a virus and, if it's possible, you're given a design and synthesis instructions.
I'm not a biologist, but I've noticed a striking divergence among experts: some tell me they are skeptical of this vision, perhaps because the biological foundation models have been so hyped; others seem strongly bought into it – bring on the artificial cell. Personally, I think the fundamental scientific and technological interest is so great, and progress so rapid, that we must take seriously the goal of predicting and designing biology, even if artificial cells from scratch certainly aren't coming tomorrow.
So let's suppose future AI-based models do enable de novo virus design. This will have many benefits, but the possibility of designing pandemic agents (and similar threats) will necessarily give rise to a field of AI biosafety engineering, similar to the safety engineering preventing misuse of large language models. What will safety in such AI models look like?
In the early foundation models, the safety engineering is primitive and likely easily circumvented. Let's use as an example Arc Institute's well-known Evo 2 model. This model was trained on sequence data from roughly 128,000 genomes. Their main safety measure was to exclude from the training data all viruses that infect eukaryotic hosts. This worked well, in the sense that the trained model regards actual viruses infecting humans as biologically implausible. The model also performs extremely poorly when prompted to generate human-infecting viruses.
While this is superficially encouraging, it seems nearly certain the guardrails can be quickly and easily removed, by finetuning the model on human-infecting viruses. The authors imply as much in the paper, saying: "Task-specific post-training may circumvent this risk mitigation measure and should be approached with caution." Of course, there's a strong economic incentive to do such post-training: for applications to gene therapy and the like you need human-infecting, immune-evading viruses. And such finetuning is possible, since the model and training code were released openly. This isn't a serious approach to safety.
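To make concrete how low the barrier is, here is a minimal sketch of what removing such a guardrail might look like, using the standard Hugging Face Transformers finetuning workflow. To be clear, this is not Evo 2's actual training code: the model name and data file below are hypothetical stand-ins for any open-weights sequence model and any corpus its developers deliberately withheld.

```python
# Hypothetical sketch: reintroducing withheld sequence data into an
# open-weights model by ordinary finetuning. The model name and data
# file are placeholders, not real resources.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "some-org/open-genomic-lm"   # hypothetical open-weights model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:           # many causal LMs lack a pad token
    tokenizer.pad_token = tokenizer.eos_token

# Sequences the original developers excluded from pretraining (placeholder file).
sequences = [line.strip() for line in open("withheld_sequences.txt") if line.strip()]
dataset = Dataset.from_dict({"text": sequences}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=1024),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()   # modest compute is typically enough to undo the exclusion
```

The point isn't the particular libraries: it's that once the weights are public, excluding data from pretraining offers no durable protection, since anyone with the withheld data and modest compute can put it back.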
Indeed, similar guardrails often have been easily removed in language models. For instance, a group led by synthetic biologist Kevin Esvelt found it cost only a few hundred dollars to finetune an existing open source language model so that it would be far more helpful in generating pandemic agents, concluding:
Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.
They were talking of language models, but it seems almost certainly true of the biological foundation models as well.
Suppose we succeed in building powerful foundation models to enable biological design. Even deeper than the technical safety problem is the social problem: no matter what safety measures are possible in principle, many organizations will build less- or un-guardrailed versions of the models anyway. That'll be true of military organizations such as DARPA; many companies will also have seemingly compelling cases. Guardrails are inherently a slippery slope: easily removed using finetuning, and depending in any case on subjective social consensus. By contrast, reality is an objective, stable target for investigation.
The real underlying issue is that such models aim to capture an understanding of how biology works. The deeper that understanding, the better the models will function. But the understanding is fundamentally value-free: there's nothing intrinsically "good" or "bad" about understanding, let us say, what makes a protease cleavage site efficient. That's just part of understanding biology well. "Good" and "bad" are downstream of such understanding. And we only get benefits by learning how to control things like immune evasion, the rate of lethal side effects, and the rate of viral spread. When you learn to control a system so as to improve outcomes, you can very often apply the same control to make things worse. Benefits and threats are intrinsically linked.
That is: a deep enough understanding of reality is intrinsically dual use.
This isn't just true in biology. We've seen a similar pattern play out repeatedly through history, across sciences. A personally resonant example is the development of quantum mechanics in the twentieth century. This helped lead to many wonderful things, including much of modern molecular biology, materials science, and semiconductors. But it also underpinned nuclear weapons. It's hard to see how you can get the benefits without the downsides. Should we have surrendered quantum mechanics and its benefits in order to avoid nuclear weapons? Some may argue the answer is "yes", but it's a far from universal position. Again we see the pattern: sufficiently deep understanding of reality grants tremendous power for both good and ill.
Instead of fragile safety engineering, a different response is to say: "look, it's near-inevitable we will soon build tools to uncover many deadly pandemic agents. Let's also use our improved understanding to defend the world."
There's a lot of work being done toward that end. I will mention just two ideas, both rather speculative. One intriguing idea is from my friend Hannu Rajaniemi, CEO of Red Queen Bio, who has suggested immune-computer interfaces. The idea is that people will wear devices which do real-time detection of environmental threats, and then develop and deploy countermeasures, also in real time. It'd be just-in-time immune system modulation, based on surveillance and response.
Another possibility is to secure the built environment. A lot of people are putting serious work into that, but I will just mention one amusing and perhaps slightly tongue-in-cheek observation, I believe first made by Carl Shulman: the cost of BSL-3 lab space is within a small multiple of San Francisco real estate (both currently in the very rough range of $1k / square foot). Obviously I don't mean we should all live in BSL-3 labs! But it does suggest that if biological disasters become common or severe enough, we may have both the incentive and the capacity to secure the entire built environment.
I mention this in part because it is very similar to the strategy humanity uses to deal with fire. As of 2014, the US spent more than $300 billion annually on fire safety. That means investment in new materials, in meeting the fire code, in surveillance to detect and respond to threats, and many other measures. The fire code is expensive and disliked by many people, but it provides a form of collective safety, somewhat similar to the childhood vaccine schedule. We don't address the challenge of fire by putting guardrails on matches, making them "safe" or "aligned". Instead, we align the entire external world through materials and surveillance and institutions.
Existential risk
Let's move from specific examples to broader patterns. The underlying issue is that as humanity understands the world more deeply, that understanding enables more powerful technologies, for both good and ill. You can express that via the following heuristic graph:
The positive curve represents life-enabling capabilities; the negative curve, destructive ones. Today the destructive potential is contained, and the benefits of the upside have (for the most part) been worth it. Still, as time goes on, deeper understanding leads to unanticipated threats. Things like CFCs, DDT, and even anthropogenic climate change are all relatively low on the curve – serious, but not civilization-threatening. Higher up the curve, you have more dangerous technologies, like nuclear weapons, or speculative-but-plausible possibilities, like deadly engineered pandemics. Still higher are tools that would make it easy to systematically discover and create such viruses, and likely many as-yet-undiscovered things.
The worst case to date is the nuclear buildup. Despite post-Cold War declines, we still have enough warheads to destroy every city in the world with more than 100,000 people. On at least two occasions a single individual's decision likely prevented a full-scale nuclear exchange: say a word of thanks tonight for Stanislav Petrov and Vasili Arkhipov. Ted Taylor, a leading American designer of nuclear weapons, said there is "a way to make a bomb… so simple that I just don't want to describe it". Those comments, published in a book by John McPhee, stimulated at least two people to develop plausible designs for DIY nuclear weapons. The bottleneck is fissile material, which remains somewhat well controlled by the cartel of nuclear countries. In 1950, Leo Szilard proposed cobalt-salted thermonuclear bombs; by his estimate, a small number of such bombs could make the entire world uninhabitable. As far as we know they've never been built, and I doubt they would be as destructive as Szilard expected, but it's not something any of us should want tested. It's impressive that it's been more than 80 years since Hiroshima and Nagasaki, and the world has not used nuclear weapons in war again. Will we go another 80 years that way? What about a thousand more years? While people sometimes consider the nuclear threat over, they're mistaking a lacuna for a cessation. It's really an ongoing threat which we've only partially addressed.
I'm not trying to tell ghost stories here! Rather, I'm trying to understand the situation we're in. People sometimes regard that as pessimism, but any "optimism" which refuses to acknowledge genuine threats is a foolish optimism, especially when those threats are systematic products of the institutions we use to understand and control the world. Wise optimism means truly understanding the situation we're in, and developing institutions and technologies to respond. Indeed: many people at Astera are working hard on exactly that. But I wonder: what other destructive capabilities are still latent in nature? Will we eventually discover some technology sufficiently powerful that it threatens civilization itself?
ASI and existential risk
The 800-trillion-pound gorilla in the room is AGI and, especially, ASI. Many people believe ASI will radically accelerate science and technology. Anthropic's CEO Dario Amodei has captured part of this idea with his aspirational phrase "a country of geniuses in a data center". If such an acceleration occurs, the curve may look more like:
We are seeing very early signs that such an acceleration may happen, and it's certainly a goal of the frontier labs. Still, I won't argue here about whether and how such an acceleration of science and technology will happen. That's a separate talk. The key point is that if it happens it will also likely mean a radical acceleration of both beneficial technologies and of threats, including the discovery of many currently unsuspected threats, latent in nature. Unfortunately, the ability to defend against threats doesn't inherently increase at the same pace as their discovery. New threats are often defended against by society-wide co-ordinated responses that move at the speed of institutional change, not technological change. It's much easier to start a fire than to defend against it. If ASI is coming, we need to find ways for ASI to help greatly increase what we might call "the supply of safety", preventing the discovery or deployment of civilization-ending technologies. It matters whether we're in the world shown above, or in a world where those technologies are modulated by increased safety:
These images are heuristic intuition pumps, not rigorous models, and very different stories are possible. But they help suggest ways of thinking about the progress of technology. Absent ASI, I have considerable faith in human ingenuity to rise to the occasion, responding to problems as they arise. But we should seriously consider how rapidly we can absorb novel ability to control the world.
An extreme version of these questions is whether there exist simple, inexpensive, easy-to-make "recipes for ruin", that is, very simple technologies that would end humanity. This possibility was dubbed the Vulnerable World Hypothesis by Nick Bostrom. In this framing, the problem is that reality itself is unsafe. That is, the structure of the cosmos is such that there are very powerful, concentrated, hard-to-defend but easy-to-discover-and-make technologies.
Finding such recipes is what concerns me most about ASI. I find the possibility both distressing and very likely. However, other people have very different intuitions. That difference in intuition depends heavily on people's prior expertise – people with certain types of background find the Vulnerable World Hypothesis very plausible, while people without those backgrounds do not. Such intuitions are rarely changed by a few brief examples. But something I hope gives skeptics pause is that the threats I've described today, and others I didn't have time to cover, were discovered by a very slow-moving species. The promise of ASI is to discover unknown threats ten or a hundred or more times faster. Let's hope none is much worse than what we already know.
Now, calling it a hypothesis makes it seem as though the question is: is the Vulnerable World Hypothesis true or false? This is not the best framing. A better question is: how vulnerable is the world? It's a spectrum, and one we move through as technology and our institutions change. In 1900 enormous mayhem could certainly be caused by the Great Powers, but humanity was not at extinction risk; by 1960, the Great Powers had the ability to plausibly threaten humanity's existence, albeit maintaining that threat required a sizeable fraction of the world's total resources; in 2026 we have reached a point where bioengineering makes world-changing destruction plausibly available to smaller actors at much lower cost. Where will we be in 2030? 2040? Will we discover cheap, easy-to-implement recipes for ruin? How can we develop ideas and institutions to make the world less vulnerable, not more?
I have re-framed this as the vulnerable world problem. In this framing we've been treating the problem as relative to the set of technologies available at any given time. How that set changes over time depends strongly on the ideas and institutions we have modulating technological development. In this sense, the framing "how unsafe is reality" is imprecise. It's really: can we develop ideas and institutions to guide exploration of the technology tree in a way that prevents civilization-scale catastrophe?
People sometimes take a techno-determinist view, that exploration of the technology tree has a near inevitable quality, sometimes even down to timing. But even if that were true low in the technology tree, exponential explosion of the design space means it's almost certainly not true higher up. Almost all possible technologies will never be invented, no matter how long and aggressively we explore. So it genuinely matters what ideas and institutions modulate how we explore the technology tree. And how vulnerable the world is depends upon those ideas and institutions.
With these caveats, the approximate framing "how unsafe is reality?" is still useful. It points at the problem of very simple and easy-to-invent technologies that concentrate enormous destructive power. The more such easy-and-powerful technologies exist, the more challenging it is to explore the technology tree without self-extinguishing, and the more work must be done developing adequate ideas and institutions.
Loss-of-control, technical alignment, and external alignment
I've presented ASI existential risk (xrisk) as arising from ASI accelerating hard-to-defend technologies. Most discussion of ASI xrisk has focused on a specific version of this threat, the loss-of-control argument: roughly, ASI goes rogue, becomes extremely powerful, and destroys us, not out of malice, but because doing so is convenient and serves its goals. This is not dissimilar to how human beings have used superior intelligence to wipe out many species, not out of malice, but simply because it was convenient for some humans. There's intense disagreement among well-informed people about how plausible ASI xrisk is, but many leading experts find it plausible. I do too. In this account, ASI itself is a kind of recipe for ruin, and loss-of-control is a special case of the broader argument.
Amongst people who take the loss-of-control threat seriously, a common response has been to work on technical alignment. This means working directly on the AI systems, developing techniques to make them more controllable, less likely to behave in rogue ways, so they do what the user intends, without undesirable side effects. It also means preventing the systems from doing things our culture has deemed unsafe, even when a user asks.
All the frontier labs put a lot of effort into technical alignment. This is in part a response to the arguments about rogue ASI, a way of keeping ASI under human control, or at least helping steer it. But much of this work – I believe nearly all of it – also serves their business goals: technical alignment techniques like RLHF and Constitutional AI ensure the systems do what customers want, and make them much more media- and government-friendly.
In this sense, (much) technical alignment is a kind of "market-supplied safety", aligned with corporate goals, and helping accelerate AI. In the short term this will bring enormous benefits, and be well rewarded by the market. But as a side-effect it will also accelerate the discovery of dangerous, hard-to-anticipate and hard-to-defend capabilities. Those may be briefly delayed using technical alignment, but as we saw with the biological models, guardrails are easily removed, and there is a slippery slope to doing so. The underlying issue is that such problems don't reside in the systems; they reside in any deep understanding of the structure of the world. And so the question is, again, whether reality itself is safe. Trying to make a powerful aligned AI system is like trying to make matches "safe" for fire. You may briefly "succeed" very narrowly with a single guardrailed system, but the situation is intrinsically unstable. The idea of powerful and stably aligned AI systems is an oxymoron.
When I discuss this with technical alignment people, the usual response is: "oh yes, we also need to work on governance and policy". But the structure of reality is not legislated. Governance and policy are only a small part of the external alignment work that is required. And external alignment – that is, making reality outside the system safe – is historically far more expensive, far slower, and far less incentivized by the market.
Unsurprisingly, this argument does not seem to have much traction at frontier AI companies. The market is training a rapidly-increasing number of people who work on technically aligning systems, and who often believe that loss-of-control is the core threat. It is important, but really a special case of too much concentrated destructive power. And too much focus on loss-of-control concentrates power in other ways. It's a strange feedback loop: train more such technical alignment people, which accelerates the rate of adoption of the systems, which causes many more such people to be trained, which reinforces this mistaken focus and the collective concentration of power, at the expense of what I believe is the true primary threat.
Market-supplied safety
Let's get back to the broader problem: what ideas and institutions are needed to increase the supply of safety, so we end up in this world?
To answer that, it's helpful to understand a little about the history of safety. A detailed history would require a full course, but we can learn much from a few examples – when safety mechanisms have worked well in the past, and when they've struggled.
Much safety is supplied by the market in advance: companies have strong incentives to ensure your toaster doesn't electrocute you, your airplane doesn't crash, and so on. In this sense, market-supplied safety is a fundamental part of progress. We see this in the technical alignment work I just discussed. Indeed, when such work isn't done, it's often harshly punished by the market. In 2016, Microsoft released an AI chatbot named Tay on Twitter. It was a spectacular public failure, rapidly learning from users to use racial slurs, deny the holocaust, and so on:
This failure helped incentivize OpenAI and Anthropic to develop safety techniques such as RLHF and Constitutional AI to suppress such behavior. They are market-supplied safety, meaning safety techniques whose development and provision is aligned with short- to medium-term corporate incentives.
This same pattern occurs across industries. For an older example, consider the De Havilland Comet, the world's first commercial jet airliner. Within two years of its introduction in 1952 the Comet suffered three fatal mid-air disintegrations. The issue was that it used square windows: each time the plane flew, tiny cracks formed at the corners as the cabin expanded and contracted during flight. For a few flights this was okay, but eventually the cracks widened, and the fuselage catastrophically failed. These crashes were tragedies, but also brutally effective feedback, and the industry rapidly switched to rounded windows, which distribute stress more evenly, and the crashes stopped.
Today, the aviation industry has internalized the costs of accidents well. Airline fatalities per passenger have dropped by a factor of roughly 50 over the last 50 years:
People sometimes point to events such as the Boeing 737 MAX scandal as evidence aviation safety is struggling, but the billion dollar losses really illustrate safety working well, not badly. I once asked the deputy head of safety at a major airline what single change he'd make if he had carte blanche. He immediately said it'd be very slightly safer to make passenger seats rear-facing, but airlines won't do it because passengers hate it. To me, that illustrates just how strong the market's appetite for safety is in the aviation industry.
How to increase the supply of safety?
Market-supplied safety works well when costs are borne immediately and legibly by consumers, as in aviation safety and much technical AI safety. But market-supplied safety tends to struggle when: costs are illegible because of long timelines (e.g., asbestos, cigarettes, sugar); or are borne by third parties or collectively (e.g., polluting waterways, fire, air pollution, CO2 emissions). In such cases, we must either develop mechanisms to provide non-market safety, or the problem will persist.
Unfortunately, many of the downsides of ASI will be illegible for a long time, hidden in the models; harm will be a side effect of collective progress, not attributable to any single actor; and will arise in dual-use ways, creating mixed incentives. In other words, while market-based approaches will provide some safety, they will struggle in other areas. Think about the AI-for-therapies material discussed earlier: we are starting to see AI companies take the credit for advances (as in DeepMind spinoff Isomorphic Labs, which hopes to use biological foundation models to design new drugs); it's doubtful they will take responsibility for potential dual-uses like prion design, especially when done using third-party models that build on the know-how they developed, but which they don't directly control.
What I'd love to do here is outline a grand program convincingly addressing this. Unfortunately, we're nowhere near having such a program, despite many imaginative people working on pieces of the problem. Indeed, I often meet people saying ideas for addressing ASI safety are "unrealistic" or "too slow" to make a difference. But if AI causes major disruptions and disasters in the next decade or two, that will radically change what is realistic. It's important to seed imaginative approaches now, so they're ready as windows of opportunity open. When you don't yet see a path to a full solution, you fall back on imagination and improving the ideas you have. I don't think there's any choice: we're all going to spend the rest of our lives wrestling with this.
It'd need a full class to survey the ideas people are exploring, so I'll restrict myself to just one brief observation today: across areas where we've made progress securing reality, surveillance often plays an important role. It's fundamental to effective fire safety, biosafety, nuclear safety, aviation safety, and in many other areas. Seeing problems is often at the root of solving them. But surveillance also creates a problem: the surveillors gain power over the surveilled. Healthy surveillance systems balance the needs and rights of multiple parties in ways enforced (ideally) technically or by the laws of nature. That is: you want humane values by design, not relying on trusted governing authorities – that's a single point of failure, and a recipe for authoritarianism.
Is this possible? Not with current understanding. But there are many promising ideas for surveillance techniques which are provably beneficial – things like the use of homomorphic encryption in DNA synthesis screening, physical zero-knowledge proofs in nuclear inspections, and many more. Many striking ideas from the cryptocurrency community are in this vein, attempting to achieve political goals through design. An ongoing project for me is to understand what (if any) design principles make possible surveillance that supplies safety while balancing all parties' need to flourish, and what causes such systems to fail, sliding into authoritarianism. While a surveilled world sounds horrid, we are de facto sliding into one in response to crises, often with regimes designed by law enforcement or other powerful entities. There is good activist work in response, defending humane values, but there is also a need for underlying principles which enshrine those values by design, cutting out authoritarians. Can we secure reality, while preserving humane values?
The real alignment problem
Let's return to alignment. I've said the idea of an aligned AI system is too unstable to make sense – it's too much like making matches safe for fire. A better way of posing the alignment problem is: how can a civilization systematically supply safety sufficient to flourish and defend itself from threats, while preserving humane values? This version of the alignment problem includes technical alignment – you still don't want AI systems going rogue – but it's now subsidiary to an overriding goal.
This perhaps seems like abstract motherhood and apple pie, but this formulation has real consequences. For a good future, civilization must solve the alignment problem over and over again. It's not a problem which can be solved just once. Indeed, even if ASIs were to wipe out humanity, any posthuman successors would also face the alignment problem with respect to their successors, worrying about some powerful new posthuman kid on the block going rogue. For any society to flourish, whether it be human, posthuman, or some mix, it must supply adequate safety in an ongoing fashion. It won't be a single Singularity from the inside.
This argument applies not just to ASI, but also to BCIs and uploads and other approaches to human uplift – an approach some at Astera are keen on. Like technical alignment, uplift may be viewed as a way of dealing with the threat of an ASI takeover. But whereas technical alignment people want to tame AI, uplift aims to increase human power in order to compete or merge with ASI. An issue common to both is that BCI work and much technical AI alignment speed up the runaway technology race, potentially exacerbating the fundamental issue: the concentration of hard-to-defend power. I think uplift is interesting because it changes the conditions of the alignment problem, but it's not clear to me whether it makes the underlying problem better or worse.
Conclusion
I remain an optimist! There's too often hubris in pessimism, in assuming that just because we don't see a solution to a problem now, no solution is possible. Better to be an optimist who is trying to understand and respond to the predicament we are in.
Let me conclude by describing how I came to be interested in this. I wrote my first neural net as a teenager, trying and mostly failing to understand the hot "new" algorithm, backpropagation. I returned often to AI over the years, and between 2011 and 2015 concentrated on it, writing a book about neural networks, which Chris Olah and Greg Brockman credit with helping get them involved in the field. I participated in OpenAI's 2015 founding meeting, but decided to stop working on neural nets, and declined any further involvement. Still, I got to see a lot of exciting work going on – for instance, in 2018 I witnessed Alec Radford invent GPT-2, while sharing a house with Alec and John Schulman. While this and other work going on in AI at the time was exciting, I couldn't shake a feeling that the long-run effects might well be terrible. I occasionally participated in some small AI-related projects, but found I couldn't work effectively on something I have such mixed feelings about.
I'm aware many people will consider this a major misjudgement. I certainly have doubts. I continue to feel the temptation of working toward ASI: it would be fun and lucrative; and maybe it's the only way to help create a good future. I wonder especially about that last point. While I have little confidence in the path being taken by OpenAI or Anthropic, I wonder what a healthy approach to ASI looks like. I hope Astera can contribute in a healthy way.
Two of my heroes are the physicists Joseph Rotblat and John Wheeler. Both worked on nuclear weapons, but they took very different paths. Rotblat was the only physicist who left the Manhattan Project after it became clear the Nazis were no longer pursuing the bomb. He was a young man, alone in the United States, his wife imprisoned by the Nazis; he would later learn she'd been murdered. Leaving Los Alamos led the bomb project's Head of Security to make a serious (albeit false) accusation that Rotblat was a spy. And it meant turning his back on the most venerated members of his professional field.
At the time, it must have seemed like throwing away his professional life for no real gain. He knew it wouldn't change whether the Allies developed or used the bomb. But I wonder how the decision changed him personally, and shaped his subsequent growth. Years later, after Castle Bravo, the United States initially denied it had caused any harm to the Marshall Islanders. Rotblat published a paper proving to the world that they were wrong. Even more consequential: later in the 1950s he cofounded and then led the Pugwash conferences on nuclear safety. Those conferences helped enable the nuclear treaties that are likely why we in this room are alive. They are some of humanity's grandest successes at external alignment. Arguably Rotblat, not Oppenheimer or Groves, was the hero of the Manhattan Project.
That said: while I believe Rotblat was right, I don't have complete certainty. Another of my heroes is the physicist John Wheeler, one of the most imaginative scientists of the twentieth century. Wheeler's brother Joe was killed in a foxhole outside Florence in 1944. When asked later whether the bomb project should have ended with the Nazi threat, Wheeler said his regret was that they didn't make the atomic bomb faster, since it would have saved his brother's life. This perhaps also motivated Wheeler's later work on the hydrogen bomb.
Both Rotblat and Wheeler are valuable models for scientists and technologists. Both thought imaginatively and courageously; neither relied on social consensus about "success" as the arbiter of what they should do. In my opinion, the progress of civilization is ultimately grounded in people who believe in their own moral imagination, while retaining humility. We're going to need many such people to help develop ideas and institutions to navigate any possible posthuman transition and, I hope, solve the alignment problem in an ongoing way.
Acknowledgements
Thanks to Steve Byrnes, Edwin Kite, Eric Michaud, and Hannu Rajaniemi for comments.
Citation information
For attribution in academic contexts, please cite this work as: Michael Nielsen, "Which future?", https://michaelnotebook.com/whichfuture/index.html (2026).