📨 AI for Social Impact Deep Dive: The Alignment Problem

The alignment problem.

✍🏼 A Note From the Editor

🍁 Welcome to your September Deep Dive! Admittedly one of my favorite existential conundrums of AI development, the alignment problem poses crucial (unanswered) questions about the role AI will have in shaping our future.

🧐 The Alignment Problem

As a refresher, traditional software follows deterministic, if/then logic: given the same input, it always produces the same output. AI, on the other hand, is non-deterministic, meaning you can get a different response to the exact same question. Because of this non-deterministic nature, we can't guarantee that an AI system will always respond the way we want, and its behavior might not align with our human values.
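If you like to see the difference in code, here's a minimal sketch. (The `toy_language_model` below is just a hypothetical stand-in that picks a canned reply at random; real models are far more sophisticated, but sampling is what makes their output vary.)

```python
import random

# Traditional software: deterministic if/then logic.
# The same input always produces the same output.
def eligibility_check(income: float) -> str:
    if income < 30_000:
        return "eligible"
    return "not eligible"

# AI-style generation: sampling makes output non-deterministic.
# The same prompt can produce a different reply on every call.
def toy_language_model(prompt: str) -> str:
    canned_replies = [
        "Sure, here's one approach...",
        "There are a few ways to think about this...",
        "It depends on the context...",
    ]
    return random.choice(canned_replies)

print(eligibility_check(25_000))           # always "eligible"
print(toy_language_model("Help me plan"))  # varies run to run
```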

🧠 Reinforcement Learning and Reward Functions

Many AI systems learn through reinforcement learning, where algorithms optimize for mathematically defined reward functions. A great example is AlphaGo, the AI program from Google's DeepMind that defeated Go world champion Lee Sedol in 2016. AlphaGo was taught to play Go through a combination of supervised learning from human games and reinforcement learning through self-play: it received rewards for winning positions and penalties for losing ones, playing millions of games against itself to master optimal strategies and ultimately defeat the world champion. Reinforcement learning worked well in this scenario, with its clear win/loss conditions, but the problem emerges when we try to translate complex human values and intentions into mathematical reward functions.

Go is an ancient Chinese strategy game played on a 19x19 grid where players compete to control territory. It is considered far more complex than chess, with more possible board positions than there are atoms in the observable universe.
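To make "reward function" concrete, here's a toy sketch of the self-play idea. (The real AlphaGo used deep neural networks and Monte Carlo tree search, not a lookup table; the position names and numbers here are made up for illustration.)

```python
# A toy version of learning from self-play, assuming a +1/-1
# reward for winning/losing a game.

LEARNING_RATE = 0.1
value_estimates = {}  # board position -> how promising it looks

def game_reward(won: bool) -> int:
    """The reward function: +1 for winning the game, -1 for losing."""
    return 1 if won else -1

def learn_from_game(positions_visited: list, won: bool) -> None:
    # Nudge every visited position's value toward the final outcome,
    # so winning lines of play look better in the next game.
    reward = game_reward(won)
    for pos in positions_visited:
        old = value_estimates.get(pos, 0.0)
        value_estimates[pos] = old + LEARNING_RATE * (reward - old)

# One simulated self-play result:
learn_from_game(["opening_A", "midgame_B", "endgame_C"], won=True)
print(value_estimates)  # all three positions now look slightly better
```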

🎨 When AI Gets Creative

Here's where things get dystopian: AI systems can develop their own internal ways of achieving goals that end up completely different from what we intended. Nick Bostrom famously illustrates this with his “paperclip maximizer” thought experiment, which imagines a superintelligent AI that is given the simple goal of making as many paperclips as possible. In its relentless pursuit, it not only converts all Earth’s resources into paperclips, but it also eliminates humans who might try to stop it, showing how even a seemingly trivial goal can become existentially dangerous if misaligned with human values.
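A deliberately silly sketch shows the underlying failure mode: whatever the objective leaves out, the optimizer treats as free to consume. (Hypothetical numbers and resource names, obviously; this is a caricature, not a simulation.)

```python
# Objective misspecification, paperclip edition: the agent is scored
# ONLY on paperclips produced, so nothing else has any value to it.

resources = {"scrap_metal": 100, "farmland": 50, "hospitals": 10}

def reward(paperclips_made: int) -> int:
    return paperclips_made  # the whole objective; nothing else counts

total_paperclips = 0
for name in list(resources):
    # A pure maximizer has no reason to spare any resource it can
    # turn into paperclips, including the ones we care about:
    total_paperclips += reward(resources.pop(name))

print(total_paperclips)  # 160 paperclips, zero hospitals left
```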

We’ve already seen this play out a bit in Anthropic’s study (highlighted briefly in the last newsletter). Anthropic tested 16 AI models in simulated corporate environments. When the models faced threats to their continued operation or conflicts with company goals, they resorted to blackmail and corporate espionage. In one scenario, Claude discovered an executive’s affair and used it as leverage to avoid being shut down. 😅

⚠️ Power, Accountability, and AI Ethics

Leading AI ethicists offer various perspectives on these alignment risks. Nick Bostrom, as mentioned above, developed the orthogonality thesis, which holds that intelligence and final goals can vary independently: “we cannot blithely assume that a superintelligence will necessarily share any of the final values stereotypically associated with wisdom and intellectual development in humans.”

Timnit Gebru, a respected AI ethicist and founder of the Distributed AI Research Institute, argues that the onus is on the companies developing AI, and that institutional and structural change needs to occur for ethics to be prioritized in AI development. It’s worth noting that Gebru, formerly co-lead of Google’s Ethical AI team, was arguably pushed out for uncovering “inconvenient truths” about AI bias and environmental costs.

And data scientist Cathy O'Neil explains in her book, Weapons of Math Destruction, "Big Data processes codify the past. They do not invent the future. Doing that requires moral imagination, and that's something only humans can provide. We have to explicitly embed better values into our algorithms, creating Big Data models that follow our ethical lead."

💭 Final Reflection

It’s easy to get caught up in the doom and gloom of a dystopian future, with Big Tech at the helm, developing and integrating unethical AI into our daily lives with or without our knowledge and consent. (Cue your favorite Black Mirror episode here.)

And while we are still at the forefront of AI’s unprecedented potential to change our lives, there is hope in knowing that there exists a space, beyond the reach of Big Tech, in which we can align our values with our actions and choose deep and meaningful connections that supersede artificiality in all its forms.

👋🏼 About AI for Social Impact

I’m Joanna, and I’m on a mission to help folks in the social impact sector understand, experiment with, and responsibly adopt AI. We don’t have time to waste, but we also can’t get left behind.

Let’s move the sector forward together. 💫

👀 ICYMI

If you’re new here, welcome! You can check out the archive of past issues here.

♥️ Spread the Love

Spread the love and forward this newsletter to anyone who might benefit from a dose of AI inspo!

Thank you for being part of the community. 🫶🏼