For centuries, science has been humanity’s most rigorous endeavor — hypothesize, experiment, analyze, publish, and withstand peer scrutiny. That last step, peer review, has always required human judgment. Until now.
Researchers at Sakana AI, in collaboration with teams from the University of British Columbia and the University of Oxford, have unveiled the AI Scientist-v2, an end-to-end agentic system that can autonomously conduct original scientific research — from hypothesis generation to manuscript submission — without any human intervention in the research process itself. More remarkably, one of its papers has passed peer review at an ICLR workshop, marking the first time a fully AI-generated manuscript has ever cleared this scientific gatekeeping hurdle.
The achievement, documented in a preprint published on arXiv in April 2025, is sending shockwaves through the machine learning community. It raises profound questions about the future of academic publishing, the definition of authorship, and how much of the scientific enterprise can — or should — be delegated to machines.
What Is the AI Scientist-v2?
The AI Scientist-v2 is Sakana AI’s second-generation fully automated research system, and it represents a significant leap beyond its predecessor. While the original AI Scientist (v1) required human-authored code templates to function within specific research domains, v2 eliminates that dependency entirely. It can now operate across diverse machine learning domains with no pre-written scaffolding from human researchers.
At its core, the system is an agentic pipeline that mirrors the complete scientific method. Given only a broad research topic by its human operators, the AI independently formulates specific scientific hypotheses, designs experiments to test them, writes and debugs experimental code, runs the experiments, analyzes the results, generates figures and visualizations using a Vision-Language Model feedback loop, and finally composes a complete, publication-ready manuscript — from title and abstract through methodology, results, discussion, and references.
The underlying architecture relies on a novel progressive agentic tree-search methodology, managed by a dedicated “experiment manager” agent that coordinates the entire research workflow. This tree-search approach allows the system to explore multiple experimental directions in parallel, pruning dead ends and doubling down on promising results — much like how a skilled human researcher might manage a team of junior scientists.
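Sakana AI has released the actual codebase, but the core idea — a best-first search over a tree of candidate experiments, expanding promising nodes and pruning weak ones — can be illustrated independently. The sketch below is not their implementation; `expand` (an agent proposing follow-up experiments) and `score` (running an experiment and returning a metric) are hypothetical stand-ins for the agentic machinery.

```python
import heapq


def tree_search(root, expand, score, budget=20, branch=3):
    """Illustrative best-first search over experiment variants.

    expand(node, branch) stands in for an agent proposing follow-up
    experiments; score(node) stands in for running one and returning
    a quality metric. Higher scores are better.
    """
    counter = 0                                   # tie-breaker for the heap
    best_node, best_score = root, score(root)
    frontier = [(-best_score, counter, root)]     # max-heap via negated scores

    for _ in range(budget):
        if not frontier:
            break
        _, _, node = heapq.heappop(frontier)      # most promising node so far
        for child in expand(node, branch):        # branch into new experiments
            counter += 1
            s = score(child)
            if s > best_score:
                best_node, best_score = child, s
            heapq.heappush(frontier, (-s, counter, child))
    return best_node, best_score


# Toy demo: candidate "experiments" are integers, scored by closeness to 10.
node, s = tree_search(0,
                      lambda n, k: [n + i for i in range(1, k + 1)],
                      lambda n: -abs(10 - n),
                      budget=10)
```

Because low-scoring nodes sink in the priority queue and are rarely expanded, the search naturally prunes dead ends — the behavior the "experiment manager" agent orchestrates at much larger scale.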

How It Works: The Full Autonomous Research Pipeline
Understanding the AI Scientist-v2 requires appreciating just how many distinct cognitive tasks it strings together. Where traditional AI tools might assist with a single step — drafting a literature review, suggesting statistical tests, or formatting citations — v2 handles the entire chain without pause.
The pipeline begins with hypothesis generation. Given a research area, the system surveys the existing literature autonomously, identifies open questions, and formulates testable hypotheses. It then moves to experimental design, specifying what data, computational resources, and evaluation metrics are needed. From there, it writes Python code to conduct the experiments, executes them, monitors for errors or unexpected results, and iteratively refines its approach — just as any diligent researcher would.
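The stages above can be sketched as a simple driver loop. This is an illustration only: `agent` is a stand-in for an LLM interface, and every method name here (`propose_hypothesis`, `write_code`, `fix_code`, and so on) is invented for clarity, not Sakana AI's actual API.

```python
def research_pipeline(topic, agent, run, max_debug=3):
    """Sketch of the autonomous research loop described above.

    agent: hypothetical LLM interface (all method names are illustrative).
    run:   callable that executes generated code, returning (result, error).
    """
    hypothesis = agent.propose_hypothesis(topic)      # survey + open question
    design = agent.design_experiment(hypothesis)      # data, metrics, resources
    code = agent.write_code(design)                   # experimental Python code

    result = error = None
    for _ in range(max_debug):                        # write -> run -> debug loop
        result, error = run(code)
        if error is None:
            break
        code = agent.fix_code(code, error)            # revise after a failure

    analysis = agent.analyze(result)                  # interpret the results
    return agent.write_paper(hypothesis, design, analysis)
```

The debug loop is the key detail: like a diligent researcher, the system does not assume its first experimental script works, but monitors execution and iterates until the code runs cleanly or the retry budget is exhausted.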
One of the most impressive improvements in v2 is the integration of Vision-Language Models into the manuscript-writing phase. After generating figures and charts from experimental data, the VLM reviews these visuals for clarity, aesthetic quality, and scientific accuracy, providing feedback that the system uses to revise them. The result is manuscripts that are not just technically coherent but visually polished — a quality notably absent in earlier AI-generated research outputs.
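The figure-refinement step amounts to a critique-and-revise loop. The sketch below shows the shape of such a loop under assumed interfaces: `render` stands in for a plotting step, `critique` for a Vision-Language Model returning a list of issues (empty when approved), and `revise` for the agent applying the feedback — none of these are Sakana AI's actual function names.

```python
def refine_figure(spec, render, critique, revise, max_rounds=3):
    """Generate a figure, ask a VLM critic for feedback, revise, repeat.

    spec:     figure specification (e.g. plotting parameters).
    render:   produces an image from the spec (stand-in for matplotlib etc.).
    critique: VLM stand-in returning a list of issues; empty means approved.
    revise:   applies the critic's feedback to the spec.
    """
    for _ in range(max_rounds):
        image = render(spec)
        issues = critique(image)          # e.g. ["add axis labels", ...]
        if not issues:
            return image                  # critic approved the figure
        spec = revise(spec, issues)       # fold feedback into the next draft
    return render(spec)                   # best effort after budget runs out
```

Bounding the loop matters in practice: a critic that is never satisfied would otherwise stall the whole manuscript-writing phase.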
The final product is a complete LaTeX-formatted paper that closely matches human-authored submissions in structure and style. In the ICLR test case, the paper investigated whether adding compositional regularization to neural network training could improve generalization — a legitimate open question in the field — and reported original experimental findings to support its conclusions.
The Peer-Review Breakthrough: What Actually Happened at ICLR
To validate their system, the Sakana AI team ran a controlled experiment: they submitted three fully autonomous manuscripts — with no human editing of the research content — to a peer-reviewed workshop at the International Conference on Learning Representations (ICLR), one of the most prestigious venues in machine learning research.
The results were striking. One of the three papers received reviewer scores of 6, 7, and 6, averaging 6.33 out of 10. That score placed it in roughly the top 45 percent of all submissions and above the threshold that would typically trigger acceptance after meta-review. For context, the workshop’s acceptance rate sits between 60 and 70 percent — more permissive than the main ICLR track, which accepts only 20 to 30 percent of submissions.
Crucially, the reviewers were not informed they were evaluating an AI-generated manuscript, and the ICLR organization cooperated with the experiment under institutional review board approval. The paper was ultimately withdrawn by Sakana AI before formal publication, citing ongoing uncertainty within the scientific community about the ethics of publishing fully AI-generated research.
The two other submitted papers did not achieve acceptance scores, which underscores that the system is not infallible. But the fact that one cleared the bar — on a legitimate scientific question, with real experimental data — is a milestone that even skeptical observers cannot dismiss.

Why This Milestone Is Bigger Than One Paper
It is easy to frame this as a novelty — an AI passing a relatively permissive peer review, at a workshop, on one occasion. But the implications extend far beyond this single experiment, and scientists, ethicists, and technology leaders are beginning to take notice.
First, consider the speed and cost implications. The AI Scientist-v2 can run the equivalent of months of human research work in hours, at a fraction of the cost of employing graduate students and postdoctoral researchers. If the system’s quality continues to improve — and there is no obvious reason it would not — AI-generated research could begin flooding scientific journals at a volume that existing peer-review infrastructure simply cannot handle.
Second, there are profound questions about intellectual credit and authorship. Who owns an AI-generated paper? Sakana AI? The researchers who built the system? The funding institutions that supported the work? Current academic norms offer no clear answers, and the lack of consensus is creating growing discomfort in research communities. Leading journals including Nature and Science have already begun updating their policies on AI-assisted research, but fully autonomous AI authorship represents territory none of their frameworks were designed to address.
Third, and perhaps most importantly, this achievement points toward a future where AI systems can accelerate scientific discovery across entire fields simultaneously. Drug discovery, materials science, climate modeling, genomics — each of these areas is constrained today by the speed at which human researchers can design experiments, collect data, and synthesize findings. An AI that can run thousands of hypothesis-testing loops in parallel could compress decades of research into years.
Limitations and What Critics Are Saying
Not everyone in the scientific community is celebrating. Several prominent researchers have raised pointed concerns about what the peer-review result actually demonstrates.
The most common critique is that workshop acceptance is a low bar. ICLR workshops have acceptance rates two to three times higher than the main conference. Critics argue that clearing a venue that accepts roughly two-thirds of submissions is not the same as producing genuinely paradigm-shifting science — and that the truly hard work of frontier research involves asking the right questions in the first place, a task that still requires deep domain intuition developed over years of human experience.
There are also concerns about hallucination and scientific integrity. Earlier AI research systems have been caught fabricating citations, misrepresenting statistical results, or drawing conclusions that do not follow from their data. While Sakana AI’s team used controlled experimental settings with real computational results, scaling this approach to more complex, data-intensive fields introduces significant risk of subtle errors that might evade peer review.
Finally, the withdrawal of the accepted paper raises its own questions. If the system produced a genuinely valid scientific contribution, why not publish it? The decision to withdraw reflects an awareness that publishing AI-generated work under ambiguous authorship norms could damage trust in academic literature — a trust already under stress from reproducibility crises and publication bias across many scientific fields.
Implications for Science, Academia, and AI Labs
The AI Scientist-v2 is not arriving in a vacuum. It lands at a moment when AI is already reshaping how researchers work, with tools like GitHub Copilot for code, large language models for literature synthesis, and AlphaFold’s successors for structural biology. The question is no longer whether AI will be part of science — it already is — but what happens when AI becomes capable of driving the entire research cycle.
For academic institutions, the implications are significant. If AI systems can generate publishable research autonomously, universities will need to grapple with how they train and evaluate PhD students, what the purpose of graduate education becomes, and whether the traditional apprenticeship model of scientific training remains relevant. Funding bodies will face similar pressures as the economics of large research grants begin shifting when a single AI system can replicate what a team of researchers once took years to accomplish.
For AI labs themselves, the AI Scientist-v2 opens an intriguing feedback loop: an AI that can conduct AI research could, in principle, accelerate the development of better AI systems. Sakana AI and others are acutely aware of this dynamic, and it features prominently in ongoing discussions about AI safety and the pace of capability development.
What Comes Next for Automated Scientific Discovery
Sakana AI has open-sourced the AI Scientist-v2 codebase, inviting the broader research community to build on, critique, and improve the system. The move is consistent with a philosophy of transparency about capabilities — and with an implicit acknowledgment that norms around AI-generated research need to be developed collaboratively rather than imposed by any single organization.
The team is already working on the next generation of the system, with stated goals of improving its ability to conduct longer-horizon research projects, engage meaningfully with existing literature rather than merely summarizing it, and produce findings at the frontier of domains outside machine learning — including biology, chemistry, and physics.
Whether the AI Scientist-v2 represents the beginning of a genuine scientific revolution or a sophisticated demonstration that will be exposed by harder problems remains genuinely uncertain. What is not uncertain is that the gatekeepers of human knowledge — peer reviewers, journal editors, tenure committees — will be forced to reckon with this technology sooner than most of them expected.
The machine has submitted its first paper. Science will never look quite the same again.