
Computer Use's Biggest Enterprise Blocker Isn't Capability

As Google and Anthropic ship capable UI agents, the bottleneck in production isn't the model's ability to click. It's the absence of a governance layer.

October 7, 2025 6 min read AI Governance

What the model can do now

Google's Gemini 2.5 Computer Use is a model purpose-built for interacting with user interfaces. Unlike general models that happen to be able to describe what they see on a screen, it is trained specifically for the operational task of navigating interfaces: clicking buttons, filling forms, reading output, handling modal dialogs, recovering from unexpected states, and moving between applications to complete multi-step workflows. The capability is meaningfully better than what was available even twelve months ago.

Anthropic has been advancing computer use capability in parallel through Claude. The trajectory across both vendors is clear: models are operating UIs more reliably, making fewer errors, and getting better at recognizing when a situation requires human input rather than continuing autonomously. The demos are impressive, and results on standard web automation benchmarks keep improving.

So the capability question is largely settled — not in the sense that the models are perfect, but in the sense that they are capable enough for real workflows. The question that is not settled, and that almost every enterprise team we talk to is stuck on, has nothing to do with the model's ability to click.

Where the actual friction lives

The most common thing we hear from enterprise teams experimenting with computer use is a variation of the same sentence: 'We know it can do the task. We do not know what to let it do.' That is a fundamentally different problem from model capability, and it does not get solved by a better model.

Scoping a UI agent's permissions is genuinely harder than scoping an API's permissions. When you give an agent API access to a system, you can define exactly which endpoints it can call, with what parameters, and with what rate limits. The API contract is explicit and the boundaries are enforced by the system itself. When you give an agent UI access to a system, the contract is whatever the UI does, which is far less bounded. The same interface that lets you view a report also lets you delete it. The same screen that shows you customer data also lets you export it. The same workflow that approves a request also lets you reject it or modify it before approving.

This is why computer use governance is a permission graph design problem, not a model capability problem. You need to define, before deployment, which UI surfaces the agent can access, which actions within those surfaces it can take autonomously, which actions require human confirmation, and which actions it should never take at all. That design work is not hard in principle — it is the same risk-stratified access control design that enterprise security teams do for human employees — but it requires organizational clarity that most teams have not yet developed for AI agents.
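
As a minimal sketch of what that permission design could look like in code, the table below maps each (surface, action) pair to one of three tiers and denies anything unlisted. The surface names, action names, and the `decide` helper are hypothetical illustrations, not part of any vendor's API.

```python
from enum import Enum


class Decision(Enum):
    AUTONOMOUS = "autonomous"  # agent may act without asking
    CONFIRM = "confirm"        # a human must approve first
    NEVER = "never"            # agent must not take this action


# Hypothetical policy table: (UI surface, action) -> tier.
POLICY = {
    ("reports", "view"): Decision.AUTONOMOUS,
    ("reports", "export"): Decision.CONFIRM,
    ("reports", "delete"): Decision.NEVER,
    ("requests", "approve"): Decision.CONFIRM,
}


def decide(surface: str, action: str) -> Decision:
    """Look up the tier for an action; anything not listed is denied."""
    return POLICY.get((surface, action), Decision.NEVER)
```

The default-deny lookup is the important design choice here: because a UI exposes far more actions than anyone will enumerate up front, an unlisted action should fall into the "never" tier rather than the autonomous one.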

The blast radius issue compounds the governance challenge. With most software bugs or human errors, the damage is bounded and visible relatively quickly. A UI agent operating continuously can take a large number of consequential actions before anyone notices something is wrong. If it is operating in a billing system, a customer record database, or an operations management platform, the reconciliation cost of a significant error is high. That asymmetry, where acting is cheap for the agent and reviewing or unwinding its actions is expensive for the organization, is why the governance layer cannot be an afterthought.

  • Define which UI surfaces the agent can access before deployment.
  • Stratify actions: autonomous, confirmation-required, and never.
  • Prefer scoped and time-limited credentials over persistent access.
  • Require explicit confirmation for any action that cannot be undone.
  • Log every session with sufficient granularity for post-incident review.
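
The scoped, time-limited credential item in the checklist above can be sketched as a small value object that carries its own surface scope and expiry. This is an illustration under stated assumptions: `ScopedCredential` and `issue` are hypothetical names, and a real deployment would rely on the identity provider's own short-lived tokens rather than an in-process check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopedCredential:
    """Hypothetical credential limited to named UI surfaces and a deadline."""
    surfaces: frozenset
    expires_at: datetime

    def allows(self, surface: str, now: datetime) -> bool:
        # Valid only for in-scope surfaces and only before expiry.
        return surface in self.surfaces and now < self.expires_at


def issue(surfaces, ttl_minutes: int, now: datetime) -> ScopedCredential:
    """Issue a credential that expires ttl_minutes after `now`."""
    return ScopedCredential(frozenset(surfaces), now + timedelta(minutes=ttl_minutes))


start = datetime(2025, 10, 7, 12, 0, tzinfo=timezone.utc)
cred = issue({"reports"}, ttl_minutes=30, now=start)
```

Passing `now` explicitly, instead of reading the clock inside `allows`, keeps the expiry logic easy to test and review.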

A framework for building the governance layer

The most useful mental model we have found for computer use governance is to treat the agent like a contractor on their first engagement: they have domain knowledge and genuine skill, but they need to earn the right to operate independently in specific contexts by demonstrating judgment and reliability. You start them narrowly scoped, with confirmation requirements, and expand autonomy as you observe their behavior in the specific environment they are operating in.
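
One way to make "earning autonomy" concrete is a promotion rule driven by the agent's observed record in a specific environment. The sketch below is an assumption, not a recommendation from any vendor: the `next_tier` function, the tier names, and the default threshold of 50 approved runs are all hypothetical, and the right threshold depends on the workflow's risk.

```python
def next_tier(current: str, approved_streak: int, reversible: bool,
              threshold: int = 50) -> str:
    """Promote a confirmation-required action to autonomous only after a
    streak of human-approved runs, and only if the action can be undone."""
    if current == "confirm" and reversible and approved_streak >= threshold:
        return "autonomous"
    # Irreversible actions and "never" actions are not promoted by this rule.
    return current
```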

Practically, this means starting with read-heavy workflows — reporting, monitoring, data extraction — before expanding to write-heavy ones. It means building confirmation checkpoints at irreversibility boundaries, not at every step. It means logging at the action level, not just the session level, so that a review after a mistake can reconstruct exactly what happened and why.
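
A rough sketch of a checkpoint placed at the irreversibility boundary, combined with action-level logging, might look like the following. The `run_action` wrapper, the shape of the `action` dictionary, and the `execute` and `confirm` callbacks are hypothetical; in practice the irreversibility flag would come from the team's own permission design.

```python
import json
from datetime import datetime, timezone
from typing import Callable


def run_action(action: dict, execute: Callable[[dict], str],
               confirm: Callable[[dict], bool], log: list) -> str:
    """Run one UI action, asking for confirmation only if it is irreversible,
    and append one log record per action (not per session)."""
    if action["irreversible"] and not confirm(action):
        outcome = "blocked"
    else:
        outcome = execute(action)
    log.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "session": action["session"],
        "surface": action["surface"],
        "action": action["name"],
        "irreversible": action["irreversible"],
        "outcome": outcome,
    }))
    return outcome
```

Reversible actions pass straight through without prompting anyone, which keeps the confirmation load on reviewers proportional to risk, and every action, blocked or not, leaves a record that a post-incident review can replay.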

Session isolation is also underrated as a governance tool. Running computer use agents in sandboxed environments with scoped credentials, rather than in the same desktop environment where sensitive credentials and systems are accessible, significantly limits the blast radius of unexpected behavior. The performance cost of sandbox setup is real, but it is almost always justified by the risk reduction.
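
One small piece of session isolation can be illustrated in code: building the agent process's environment from an allowlist instead of inheriting the operator's. This is only a sketch of that single step, not a sandbox. The `build_agent_env` helper and the `AGENT_TOKEN` variable name are hypothetical, and real isolation would also involve a separate container or VM.

```python
def build_agent_env(host_env: dict, allowed: set, scoped_token: str) -> dict:
    """Return an environment for the agent process containing only
    allowlisted variables plus one scoped token (hypothetical name)."""
    env = {key: value for key, value in host_env.items() if key in allowed}
    env["AGENT_TOKEN"] = scoped_token
    return env


host = {"PATH": "/usr/bin", "HOME": "/home/op", "AWS_SECRET_ACCESS_KEY": "secret"}
agent_env = build_agent_env(host, allowed={"PATH"}, scoped_token="short-lived-token")
```

The resulting dictionary can be passed as the `env` argument to `subprocess.run` so that the agent's process never sees the operator's other credentials.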

The competitive opportunity for teams that solve this first

Teams that build a working governance layer for computer use unlock a category of automation that is genuinely not accessible any other way. Back-office operations, legacy system interaction, interface-bound workflows in third-party enterprise software, quality assurance across UI surfaces — these are all high-value targets that are resistant to traditional automation because the systems are not designed to be automated. UI-capable AI agents change that equation, but only for organizations that have the governance infrastructure to deploy them.

Organizations that solve this governance problem will have automated workflows that their competitors cannot replicate without doing the same design work. That is a durable operational advantage, and it comes not from having a better model, since everyone has access to the same models, but from having the organizational and technical infrastructure to deploy those models responsibly in production.
