What Anthropic's Vercept Acquisition Says About Computer Use

Anthropic on Wednesday acquired Vercept, a Seattle startup that spent the past year building an AI agent capable of controlling software the way a person would. Not through APIs. By watching the screen and moving the cursor.

Financial terms were not disclosed, but the deal includes roughly 20 engineers and researchers, many of them veterans of the Allen Institute for AI. Vercept's core product was a Mac application called Vy. The agent used computer vision to perceive the screen and translate natural language instructions into mouse movements and keystrokes. If you asked it to pull a number from a PDF, paste it into a spreadsheet, and format the row, it would figure out where those things were on screen and do it: no plugin required, no prior configuration for the specific software.
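The control loop behind agents like this is simple to sketch, even though the hard part is the perception model inside it. Below is a minimal, illustrative Python skeleton of the screenshot-plan-act cycle; the model call and screen capture are stubbed out, and none of the names come from Vercept's or Anthropic's actual code:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""

def plan_next_action(screenshot: bytes, goal: str, history: list) -> Action:
    """Stand-in for the vision-language model call. A real agent would
    send the screenshot, goal, and action history to a model and parse
    its reply into a structured action."""
    # Hypothetical canned plan, for illustration only.
    steps = [Action("click", 120, 340), Action("type", text="42.7"), Action("done")]
    return steps[len(history)]

def run_agent(goal: str, max_steps: int = 10) -> list:
    """Loop: capture the screen, ask the model for one action, execute it,
    repeat until the model says it is done or the step budget runs out."""
    history = []
    for _ in range(max_steps):
        screenshot = b"<pixels>"             # capture_screen() in a real agent
        action = plan_next_action(screenshot, goal, history)
        if action.kind == "done":
            break
        history.append(action)               # execute_action(action) in a real agent
    return history
```

The notable design property is that nothing in the loop is specific to any one application: the model sees pixels and emits clicks and keystrokes, which is what lets the same agent operate a PDF viewer and a spreadsheet without per-app integration.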

That's a different approach from most enterprise automation. Traditional robotic process automation tools work by identifying UI elements through structured hooks: accessibility APIs, DOM selectors, recorded click paths. They're brittle. Change a button label or update an app version and the automation breaks. What Vercept was building is closer to how a person learns new software: by looking at it.
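The brittleness is easy to demonstrate with a toy example: an automation keyed to a recorded button label stops working the moment the label changes in an update. The dictionaries here are hypothetical stand-ins for a real accessibility tree or DOM:

```python
# Selector-based automation encodes the UI's exact structure at record time.
ui_v1 = {"buttons": {"Submit": (480, 620)}}
ui_v2 = {"buttons": {"Send": (480, 620)}}   # same button, renamed in an update

def click_by_label(ui: dict, label: str):
    """Look up a click target by its recorded label."""
    coords = ui["buttons"].get(label)
    if coords is None:
        raise LookupError(f"no element labeled {label!r}")
    return coords

click_by_label(ui_v1, "Submit")             # works: returns (480, 620)
try:
    click_by_label(ui_v2, "Submit")         # the recorded label is now stale
except LookupError as exc:
    print(f"automation broke: {exc}")
```

A vision-based agent sidesteps this failure mode because it re-reads the screen on every step rather than replaying a selector recorded against an older version of the interface.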

Anthropic had already been working on computer use for Claude. They shipped the first version last October. Results were promising but rough: on OSWorld, a standard benchmark for evaluating AI computer-use performance, Claude landed under 15%. By the time of this acquisition, the latest Sonnet models had reached 72.5% on the same benchmark. That's significant progress, and the Vercept team may be part of what accelerates it further.

The technical fit is hard to miss. Co-founder Ross Girshick helped develop R-CNN and related object-detection systems at Microsoft Research and Meta: foundational contributions to how neural networks detect and localize objects in images. That expertise in visual perception applies directly to the problem of understanding what's on a screen. Co-founder and CEO Kiana Ehsani brought a robotics background focused on agents that navigate and manipulate physical environments. These aren't people who stumbled into the problem.

Vercept claimed 92% accuracy on computer automation benchmarks in testing done with Together AI. OpenAI's numbers on similar evaluations have been reported around 18.3%. These are probably not the same benchmark, and comparisons between labs' self-reported numbers are always noisy. But the directional gap is real, and it's the kind of gap that gets an acquirer's attention.

What Anthropic hasn't said: how Vercept's technology will be integrated, or what the timeline looks like. Claude's computer use is currently available via API and in Claude.ai for certain subscription tiers. Whether this becomes a more prominent product feature, an enterprise offering, or something embedded in Claude's underlying models is an open question. The acquisition announcement was light on specifics.

The deal isn't happening in a vacuum. Building AI agents that can use software reliably is now one of the more actively contested problems in the field. OpenAI, Google, and Microsoft are all pushing from different angles. The ChatGPT agent rollout earlier this year moved OpenAI further in this direction, expanding from web browsing to taking actions inside connected apps. Google is integrating Gemini into on-device multistep tasks on Pixel and Samsung Galaxy hardware. The competitive logic suggests Anthropic couldn't afford to build this entirely from scratch.

The people who backed Vercept understood the problem. Former Google CEO Eric Schmidt, Google DeepMind chief scientist Jeff Dean, Cruise founder Kyle Vogt, and Dropbox co-founder Arash Ferdowsi all participated in the $50 million the company raised. That's not a list of investors making casual bets. Vercept's co-founder Matt Deitke had left earlier for Meta's Superintelligence Lab under a reported $250 million compensation package, which tells you something about how much the big labs valued this kind of expertise.

Oren Etzioni, co-founder and early investor, described the outcome as "sad" on LinkedIn. He said he was pleased to get a positive return but felt the company was "basically throwing in the towel" after just over a year with "so much traction." What Etzioni's comment captures is how fast the window is closing for independent startups in the computer-use space. Once a foundation model lab internalizes a capability, the economics get difficult for everyone else. A standalone computer-use agent needs to be dramatically better than what's built into Claude or GPT-4 to justify a separate contract. Vercept's Vy application will shut down in 30 days.

The same dynamic played out with coding assistants: as code generation improved in the base models, the standalone coding-assistant market bifurcated into "deeply integrated into development environments" and "fighting for the remaining gaps." Computer use appears to be following the same trajectory. The Model Context Protocol (MCP), which lets chatbots connect to external services, is part of what makes this infrastructure question matter more: as those integrations multiply, the agent that can navigate arbitrary software interfaces reliably has a significant edge.
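For context on what those MCP integrations look like on the wire: the protocol exchanges JSON-RPC 2.0 messages, and a tool invocation uses the spec's `tools/call` method. The tool name and arguments below are hypothetical; only the message shape follows the protocol:

```python
import json

# Shape of an MCP tool invocation (JSON-RPC 2.0, method "tools/call").
# "read_spreadsheet_cell" and its arguments are made up for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "read_spreadsheet_cell",
        "arguments": {"sheet": "Q3", "cell": "B7"},
    },
}
print(json.dumps(request, indent=2))
```

The contrast with the vision approach is the point: an MCP tool works only for services that expose one, while a screen-reading agent is the fallback for all the software that never will.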

The acquisition signals where Anthropic is betting the next capability improvements will come from: not purely better reasoning or longer context windows, but better visual perception. Knowing what's on the screen, correctly, across different applications and operating systems, is a different kind of problem from predicting the next token. Vercept spent a year grinding on exactly that. Anthropic now has that team.