TL;DR
AI systems are now capable of automating core engineering tasks in AI development, reaching near-saturation on key benchmarks. However, the automation of AI research itself remains uncertain, with some aspects possibly becoming engineering at scale.
Recent benchmark data suggests that AI systems can now automate most core engineering tasks in AI development, a significant shift for the field. Whether research itself can be automated is less clear, and parts of it may turn out to be engineering at scale. Either outcome could reshape AI innovation and research productivity.
Several benchmarks measuring AI capability on research-related tasks show rapid progress toward saturation. CORE-Bench, which assesses AI’s ability to reproduce research papers, improved from 21.5% in September 2024 to 95.5% in December 2025, with some experts declaring it ‘solved.’ MLE-Bench, which evaluates performance in Kaggle competitions, rose from 16.9% in October 2024 to 64.4% in February 2026, a level comparable to mid-tier human practitioners. These results indicate that AI can now handle many tasks traditionally performed by human researchers with high reliability.
Advances in kernel design, such as automated GPU kernel generation and optimization, are increasingly integrated into production-grade AI infrastructure, which is further evidence that engineering components are being automated. However, Jack Clark notes that automation of the research process itself remains less certain: some aspects of research may be inherently distinct from engineering, though the boundary is increasingly blurred. The key question is whether research can be fully automated or whether it will turn out to be an engineering problem at scale, which could accelerate AI development even further.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.
Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
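The optimistic reading can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each exploratory attempt succeeds independently with a fixed probability. The per-attempt probability and the attempt counts are hypothetical values, not figures from Clark. Under that assumption, rare discoveries at low volume are compatible with frequent discoveries at high volume.

```python
"""Illustrative 'shots on goal' model. All numbers are hypothetical."""


def p_at_least_one(p_per_attempt: float, attempts: int) -> float:
    """Chance of one or more discoveries across independent attempts."""
    return 1.0 - (1.0 - p_per_attempt) ** attempts


def expected_discoveries(p_per_attempt: float, attempts: int) -> float:
    """Expected number of discoveries (linearity of expectation)."""
    return p_per_attempt * attempts


if __name__ == "__main__":
    p = 0.001  # assumed per-attempt success rate, chosen only for illustration
    for n in (10, 1_000, 100_000):
        print(f"attempts={n:>7}  "
              f"P(>=1)={p_at_least_one(p, n):.3f}  "
              f"expected={expected_discoveries(p, n):.2f}")
```

The pessimistic reading amounts to rejecting the model's premise: if creative insight is qualitatively distinct from engineering, the per-attempt probability for some problems may be effectively zero, and more attempts do not help.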

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. Five strategic dimensions matter for the institutional response that the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: two scenario panels labeled “Productivity multiplier years” and “Recursive loop operational”]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
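The asymmetric cost-of-being-wrong argument can be sketched as a two-action, two-outcome expected-cost comparison. This is a toy model, not something taken from Clark: the cost units and the `BUILD` and `WAIT` payoffs are hypothetical and chosen only to show the shape of the reasoning.

```python
"""Toy decision model for the 'asymmetric cost of being wrong' argument.

Every cost below is a hypothetical unit chosen for illustration.
"""


def expected_cost(cost_if_holds: float, cost_if_closes: float,
                  p_closes: float) -> float:
    """Expected cost of an action given the chance the moat closes."""
    return (1.0 - p_closes) * cost_if_holds + p_closes * cost_if_closes


def break_even_probability(build: tuple[float, float],
                           wait: tuple[float, float]) -> float:
    """Probability of closure at which building and waiting cost the same.

    Each argument is (cost_if_moat_holds, cost_if_moat_closes).
    """
    build_holds, build_closes = build
    wait_holds, wait_closes = wait
    denominator = (wait_closes - wait_holds) - (build_closes - build_holds)
    if denominator == 0:
        raise ValueError("actions respond identically to closure; no break-even")
    return (build_holds - wait_holds) / denominator


# Hypothetical payoffs: building costs 1 unit in either world;
# waiting is free if the moat holds but costs 20 units if it closes.
BUILD = (1.0, 1.0)
WAIT = (0.0, 20.0)

if __name__ == "__main__":
    threshold = break_even_probability(BUILD, WAIT)
    print(f"building is cheaper once P(moat closes) > {threshold:.2f}")
    for p in (0.01, 0.05, 0.25):
        print(f"p={p:.2f}  build={expected_cost(*BUILD, p):.2f}  "
              f"wait={expected_cost(*WAIT, p):.2f}")
```

With these assumed payoffs, building wins in expectation as soon as the probability that the moat closes exceeds 5%. The larger the loss from being unprepared relative to the cost of building, the lower that threshold falls.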
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
Implications for AI Development and Research Automation
The rapid automation of core engineering tasks suggests that AI-driven development could soon become more autonomous, reducing reliance on human engineers for routine work. This shift could lead to faster innovation cycles and lower costs in AI research and deployment. However, the uncertain status of research automation leaves open whether AI can independently generate novel theories or whether human insight will remain essential to scientific discovery in AI. Understanding this dynamic is critical for policymakers, organizations, and researchers planning for the future of AI innovation.
Recent Progress in AI Capabilities and Benchmark Saturation
Over the past 18 months, multiple independent benchmarks have shown rapid progress toward AI systems automating core aspects of AI engineering. CORE-Bench, which measures research reproduction, improved roughly 4.4× (from 21.5% to 95.5%), with some experts declaring it ‘solved.’ MLE-Bench, which assesses Kaggle competition performance, improved from 16.9% to 64.4%, a level comparable to mid-tier human practitioners. In kernel design, recent papers show AI models generating optimized GPU kernels and converting code automatically. These developments suggest that engineering tasks in AI are approaching full automation, while the research process remains less certain, with some aspects potentially becoming engineering at scale.
“The pattern across multiple benchmarks indicates that AI is approaching saturation in core engineering skills, with some experts declaring certain tasks ‘solved.’”
— Thorsten Meyer
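The benchmark figures can be checked with a few lines of arithmetic. The sketch below uses only the scores and months reported in this article for CORE-Bench and MLE-Bench. The `BenchmarkRun` class is a hypothetical helper, and the day of month in each date is a placeholder because the article gives months only.

```python
"""Summarize the benchmark scores quoted in the article."""
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BenchmarkRun:
    name: str
    start: date
    start_score: float  # percent
    end: date
    end_score: float    # percent

    @property
    def fold_improvement(self) -> float:
        """Ratio of the final score to the initial score."""
        return self.end_score / self.start_score

    @property
    def point_gain(self) -> float:
        """Gain in percentage points."""
        return self.end_score - self.start_score

    @property
    def months_elapsed(self) -> int:
        """Whole calendar months between the two measurements."""
        return ((self.end.year - self.start.year) * 12
                + (self.end.month - self.start.month))

    @property
    def headroom(self) -> float:
        """Percentage points left before a score of 100%."""
        return 100.0 - self.end_score


# Scores and months as quoted in the text; the day of month is a placeholder.
RUNS = [
    BenchmarkRun("CORE-Bench", date(2024, 9, 1), 21.5, date(2025, 12, 1), 95.5),
    BenchmarkRun("MLE-Bench", date(2024, 10, 1), 16.9, date(2026, 2, 1), 64.4),
]

if __name__ == "__main__":
    for run in RUNS:
        print(f"{run.name}: {run.fold_improvement:.1f}x, "
              f"+{run.point_gain:.1f} points over {run.months_elapsed} months, "
              f"{run.headroom:.1f} points of headroom")
```

On these figures, CORE-Bench improves about 4.4× over 15 months and has 4.5 points of headroom left, while MLE-Bench improves about 3.8× over 16 months and has 35.6 points left. "Near saturation" therefore describes the first benchmark better than the second.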
Unclear Scope of Full Research Automation
While engineering tasks are nearing full automation, the extent to which AI can autonomously conduct research remains uncertain. Some aspects of research—such as hypothesis generation, theory development, and experimental design—may be inherently distinct from engineering tasks. It is not yet clear whether AI will fully automate these components or if they will evolve into large-scale engineering problems. The timeline and feasibility of such automation are still under active investigation, and expert opinions vary.
Next Steps in Monitoring AI Automation Progress
Researchers and industry observers will continue tracking benchmark saturation levels, especially as new benchmarks are introduced or existing ones are refined. Focus will likely shift toward understanding the boundary between engineering and research, and whether AI can independently generate novel scientific insights. Additionally, developments in kernel design and infrastructure automation suggest that engineering automation will further accelerate, possibly outpacing research automation. Policy discussions and strategic planning will need to consider these evolving capabilities to prepare for a future where AI-driven development becomes the norm.
Key Questions
What are the key benchmarks indicating AI’s engineering automation?
CORE-Bench, which measures research reproduction, has reached 95.5% and is near saturation. MLE-Bench, which evaluates Kaggle competition performance, has reached 64.4%, comparable to mid-tier human practitioners. Together they indicate that AI can automate many core engineering tasks effectively.
Can AI fully automate the research process now?
No. While engineering tasks are approaching full automation, the automation of research—such as hypothesis generation and experimental design—remains uncertain and is an active area of investigation.
What does this mean for human researchers?
Human researchers may see a shift toward focusing on higher-level scientific questions, with routine engineering and replication tasks increasingly handled by AI systems.
How might this impact AI development timelines?
If engineering automation continues to advance rapidly, AI development cycles could accelerate significantly, potentially reducing the time and cost required to produce new AI models and infrastructure.
What are the risks associated with AI automating research?
Risks include over-reliance on AI for scientific discovery, potential biases in automated hypotheses, and challenges in ensuring transparency and reproducibility of AI-driven research outcomes.
Source: ThorstenMeyerAI.com