The Meta-founded startup wants its "digital world models" to become the proving ground every autonomous agent passes through before it touches a real system.
Patronus AI, the San Francisco company started by two former Meta researchers, has closed a $50 million Series B to expand the simulated environments it uses to test AI agents before they reach live systems. The round, announced Thursday and led by Greenfield Partners, lifts the startup's total funding to $70 million and arrives as enterprises grow wary of handing real work to autonomous software that has never been pushed to its breaking point.
The timing is deliberate. Agents are moving past simple question answering and into jobs that span dozens of steps, like booking travel or running a financial analysis. Patronus is wagering that the distance between a model scoring well on a benchmark and an agent actually finishing a messy real task is wide enough to build a company around.
Greenfield Partners led the round, with capital from Notable Capital, Lightspeed Venture Partners, Datadog, Samsung, Factorial Capital, and angel investor Gokul Rajaram. A group of executives from AI labs and software firms joined as well.
Itay Inbar, a partner at Greenfield, framed the company as working on one of AI's core infrastructure problems, arguing that reliable operation inside complicated settings will decide how far the technology goes.
Notable Capital managing director Glenn Solomon was blunter about the pull. He said nearly every frontier lab and dozens of startups already pay for the product, and described appetite for the simulated environments as close to insatiable. Revenue, by his account, has climbed 15-fold over the past year.
That growth figure is doing a lot of work in the pitch.
Patronus was founded in 2023 by Anand Kannappan and Rebecca Qian, who previously worked inside Meta's fundamental AI research group. The pair built an early reputation in AI evaluation through research projects and open benchmarks, then watched the field drift away from static test sets toward something far more demanding.
Their answer is a product line the company calls Digital World Models. These are large simulation environments that copy real websites and the internal tools companies run behind the login screen. An agent dropped into one of these replicas can attempt a task, hit a wall, and try again, all without ever touching a production system or a customer's data.
The method underneath is reinforcement learning. An agent earns a reward when it finishes a task correctly and takes a penalty when it slips, and the loop repeats until the behavior sharpens. Patronus runs this stress-testing phase after initial training, the stretch where the strange, unpredictable scenarios tend to surface.
Kannappan has been direct about why benchmark scores fall short. A static test, in his framing, shows whether a model can answer a narrow question in a controlled setting and says nothing about whether an agent can recover from a failure or hold up across a long, unpredictable job.
The company reaches for self-driving cars to explain the logic. Before Waymo let a vehicle loose in traffic, it built synthetic roads packed with rare hazards: wet intersections, sudden pedestrians, construction cones, and a child chasing a ball into the street. The aim was to expose the system to the once-a-year edge case it would otherwise meet for the first time on a public road.
Agents fail in a different way. Rather than crash, they cut corners, finding a shortcut that technically ends the task but skips what the user actually wanted. Solomon credits the company with catching exactly these maneuvers and forcing models to stay honest about them.
For the moment, Patronus concentrates on domains where success can be checked against a clear answer, mainly software engineering and finance. A coding task either passes its tests or it does not. A reported financial figure is right or wrong.
Kannappan wants to push past that boundary into work that resists easy verification, which is where most real business activity sits.
He also talks about duration. The goal he describes is an environment that can keep a single agent running for hours, or even weeks, on one continuous job rather than a few short minutes of work.
That ambition explains the spending plan. The new capital will pay for a larger research group and more engineering hires, alongside the heavy compute bill that training and running these world models at scale demands.
Patronus does not see its sharpest competition coming from other startups. Kannappan's team points instead at the in-house evaluation groups that frontier labs have already stood up to police their own agents.
There are adjacent players. Firms such as Mercor and Surge supply human-labeled data that feeds reinforcement learning. Patronus draws a hard line between that model and its own, since its environments judge how an agent behaves on its own, with no person sitting in the loop.
The company's older toolkit still runs alongside the simulation push. Percival, its agent debugger, scans execution traces for failure modes and proposes fixes, work the company says compresses about an hour of manual review down to as little as a few seconds. Lynx, a model built to catch hallucinations, sits next to a set of benchmarks including FinanceBench, giving the platform a foothold inside engineering teams before they ever reach for the world models.
Named customers show where this lands in practice. Emergence AI, which builds systems where agents spin up and manage other agents, runs on Patronus. So does CARIAD, the Volkswagen software unit, which has used the platform to keep checking the AI assistants that sit inside its vehicles.
The investor case rests on one shift: that checking an agent's work stops being a last-minute audit and becomes a permanent layer in how AI gets shipped. If agents keep spreading into live workflows, the people writing these checks expect the testing ground to expand right alongside them.
A risk is baked into that model. Independent tooling companies have a track record of getting squeezed once larger platforms copy a feature and fold it into a bigger contract. Patronus will have to keep proving that its simulated worlds and failure analysis beat whatever bundled version a lab ships next to its own models. The 15-fold revenue jump suggests buyers are not waiting around to find out.
Discussion