AI Safety & Alignment
Research and discussion on making AI systems safe and aligned with human values
Most AI breakthroughs are thought of as isolated advances, but Karpathy argues they signal a broader shift: code is being replaced by neural networks trained on data.
“This reframe changed how I think about software. Writing code is increasingly the wrong level of abstraction — you train the behavior instead.”
Tim Urban's deep dive on artificial intelligence that made the AI alignment problem legible to a general audience. Still one of the best explainers ever written.
“5 million people read this. Worth understanding what made it work — it's a masterclass in making hard ideas accessible without dumbing them down.”
Simon Willison's exhaustive year-in-review of every major LLM development in 2023. Dense with links, and full of actual signal amid the hype.
“Best factual record of what actually happened vs. what the press said happened. Simon is relentlessly empirical — no hype, just what he tested and observed.”
Emily Bender and colleagues on stochastic parrots — why LLMs aren't 'understanding' anything and why that gap matters deeply.
“A necessary counterweight to anthropomorphizing LLMs. The arguments in here are going to matter more and more as these systems become infrastructure.”
Anthropic's research on training a helpful and harmless AI assistant.
“One of the most practical approaches to alignment, and one that is actually deployed in production.”
Anthropic's interpretability work identifying millions of interpretable features inside Claude.
“If we can understand what is happening inside these models, we can actually verify alignment claims.”
Eliezer Yudkowsky on why AGI alignment is extremely difficult.
“Agree with him or not, this is the strongest case for why alignment is harder than most people think.”
Nature's editorial on the growing gap between AI performance on scientific benchmarks and actual scientific understanding — a careful examination of what we're actually measuring.
“Precise and careful — exactly what you want from Nature's editorial desk. The distinction between benchmark performance and genuine capability keeps getting blurred.”