Computer Use

Your agent, operating your mouse and keyboard, by seeing the screen the way you do. The universal fallback for every app that has no API, no CLI, and no MCP server. In 2026 this capability stopped being a demo and became a working tool.


What It Is

Computer use is the capability for an AI agent to operate a computer the way a human does: by reading the screen, moving the cursor, clicking, typing, scrolling, and reading the results. The agent sees pixels, forms a plan, takes actions, sees the new state, and continues.
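
The loop described above (observe, plan, act, observe again) can be sketched in a few lines. This is a minimal illustration under loud assumptions, not how Claude or Codex implement it: `FakeScreen`, `Action`, `plan_next_action`, and `run_agent` are hypothetical names, the "screenshot" is a plain dictionary rather than pixels, and the planner is a hard-coded rule standing in for a vision model.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    target: str = ""   # hypothetical element name, e.g. "username"
    text: str = ""     # text to type, when kind == "type"


@dataclass
class FakeScreen:
    """Stand-in for a real display: a login form with two fields and a submit button."""
    fields: dict = field(default_factory=lambda: {"username": "", "password": ""})
    focused: str = ""
    submitted: bool = False

    def screenshot(self) -> dict:
        # A real agent would receive pixels; here the observation is a dict copy.
        return {"fields": dict(self.fields), "focused": self.focused,
                "submitted": self.submitted}

    def apply(self, action: Action) -> None:
        if action.kind == "click":
            if action.target == "submit":
                self.submitted = True
            else:
                self.focused = action.target
        elif action.kind == "type" and self.focused:
            self.fields[self.focused] = action.text


def plan_next_action(observation: dict, goal: dict) -> Action:
    """Hypothetical planner: decide one step from the current observation."""
    if observation["submitted"]:
        return Action("done")
    for name, wanted in goal.items():
        if observation["fields"][name] != wanted:
            if observation["focused"] != name:
                return Action("click", target=name)
            return Action("type", text=wanted)
    return Action("click", target="submit")


def run_agent(screen: FakeScreen, goal: dict, max_steps: int = 20) -> list:
    """Observe, plan, act, repeat, until the planner says done or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = screen.screenshot()
        action = plan_next_action(observation, goal)
        if action.kind == "done":
            break
        screen.apply(action)
        history.append(action)
    return history
```

Calling `run_agent(FakeScreen(), {"username": "ada", "password": "s3cret"})` walks the form one step at a time. The `max_steps` cap is a design choice worth keeping in any real loop, since a confused planner would otherwise act forever.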

This is distinct from agent-accessible products. Agent-accessible products expose CLIs, APIs, or MCP servers that an agent can call directly. Computer use is what your agent does when the product you need to use has none of those things, which, in 2026, is still true of the vast majority of software in the world.

Where It Is Today (April 2026)

The capability moved from research preview to practically useful in the last six months. The two production implementations:

Claude Computer Use (Anthropic). Originally shipped as a research preview in late 2024. The Q1 2026 rebuild brought action latency down by roughly 50% via a rolling visual-context window. The Opus 4.7 release extended computer use alongside coding, long-context reasoning, and agent planning. Available on macOS and Windows to Claude Pro and Max subscribers, with permission prompts before consequential actions.

Codex Background Computer Use (OpenAI). Shipped April 16, 2026 as part of the "Codex for almost everything" release. Operates on macOS first (EU and UK rolling out), with 90+ plugins that combine skills, app integrations, and MCP servers in a single agent. Codex memory stores corrections and preferences and surfaces them automatically in future sessions. It is currently the fastest and most practical computer-use implementation on the market.

The signal worth reading from a few weeks earlier: in February 2026, Peter Steinberger, the Austrian developer who built the open-source OpenClaw personal-agent framework, joined OpenAI to lead the next generation of personal agents. OpenClaw itself moved to an independent nonprofit foundation and stays open source. Two things are true at once: hyperscalers are paying for the talent that understands this space most deeply, and the open-source implementations are viable enough to be worth stewarding separately for the long run.

Why It Fills A Critical Gap

The world of software has two populations, and one is much larger than the other.

Agent-accessible software. CLIs, APIs, MCP servers. Growing fast. Still a minority. Most SaaS products that ship APIs do so because a technical co-founder felt strongly about it. MCP is a 2025-forward standard. Adoption is accelerating but uneven.

Everything else. Internal tools at companies. Legacy enterprise software. Consumer apps that chose never to open up. Government portals. Vendor dashboards. Banking interfaces. The vast middle of the economy. Most of it is not shipping an MCP server next quarter. Much of it will never ship one.

Computer use is the answer for that second population. When your agent needs to update a vendor portal that has no API, computer use is the only answer. When a reconciliation requires touching a field that is not exposed through Shopify's API, computer use touches it. When a government form must be submitted through a specific GUI and only through that GUI, computer use submits it.

The point: "digital work I cannot delegate because the tool has no API" stops being a category. There is just digital work. Some of it your agent does fast through the API. Some of it your agent does slower-but-universally through the screen. Either way, the work gets done.

The Pattern

You speak English into your voice-to-text. Your Jarvis routes the request. For the agent-accessible portion of the task, it hits the CLI, the API, the MCP server. For the portion that requires operating a GUI that was never built for agents, it spins up a computer-use session. You watch (or do not watch) while the cursor moves, forms fill, buttons click, screens advance.

The output is the outcome you asked for. The vendor portal is updated. The expense report is filed. The internal tool is configured. The PDF you could not get into your workflow any other way is now inside it.

This is what "speak English to your computer and the result you want comes out" actually means. Computer use is how that sentence becomes true for all digital work, not just the portion currently covered by agent-accessible products.

Priority Ladder

Computer use is a fallback, not a first choice. It is:

  • Slower. Reading pixels and moving a cursor is slower than hitting an endpoint. A task that takes 50 ms through an API can take 30 seconds through computer use.
  • More fragile. A UI update breaks the script. A popup changes the flow. A loading spinner confuses the model.
  • More expensive. Vision inference costs more than text inference. A long computer-use session costs measurably more than the equivalent API calls.
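
To put the speed gap in concrete terms, here is a small worked calculation using the two figures quoted above (50 milliseconds per API call, 30 seconds per computer-use task). The batch size of 100 tasks is an assumption chosen only for illustration, and the function names are hypothetical.

```python
# Figures from the text: 50 ms through an API, 30 s through computer use.
API_TASK_MS = 50
COMPUTER_USE_TASK_MS = 30_000

# Hypothetical batch size, chosen only for illustration.
TASK_COUNT = 100


def slowdown_factor(fast_ms: int, slow_ms: int) -> float:
    """How many times slower the slow path is than the fast path."""
    return slow_ms / fast_ms


def batch_seconds(per_task_ms: int, task_count: int) -> float:
    """Total wall-clock seconds for a batch run one task at a time."""
    return per_task_ms * task_count / 1000


factor = slowdown_factor(API_TASK_MS, COMPUTER_USE_TASK_MS)
api_total = batch_seconds(API_TASK_MS, TASK_COUNT)
gui_total = batch_seconds(COMPUTER_USE_TASK_MS, TASK_COUNT)
```

With these inputs the per-task gap is 600x, and the assumed batch of 100 tasks takes 5 seconds through the API versus 3,000 seconds (50 minutes) through the screen.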

Priority ladder when your agent needs to do a digital task:

  1. CLI if the product has a good one.
  2. API if it has a good one.
  3. MCP server if it has one.
  4. Computer use if it has none of the above.
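
The ladder above translates directly into a selection function. This is a minimal sketch: `Product`, `Interface`, and `choose_interface` are hypothetical names, and whether a CLI or API counts as "good" is a judgment the caller supplies as a boolean.

```python
from dataclasses import dataclass
from enum import Enum


class Interface(Enum):
    CLI = "cli"
    API = "api"
    MCP = "mcp"
    COMPUTER_USE = "computer_use"


@dataclass(frozen=True)
class Product:
    """Hypothetical description of what a product exposes to agents."""
    name: str
    has_good_cli: bool = False
    has_good_api: bool = False
    has_mcp_server: bool = False


def choose_interface(product: Product) -> Interface:
    """Walk the priority ladder top to bottom; computer use is the fallback."""
    if product.has_good_cli:
        return Interface.CLI
    if product.has_good_api:
        return Interface.API
    if product.has_mcp_server:
        return Interface.MCP
    return Interface.COMPUTER_USE
```

For example, `choose_interface(Product("vendor portal"))` falls through every rung and returns `Interface.COMPUTER_USE`, while a product with both an API and an MCP server resolves to `Interface.API` because the API rung comes first.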

As agent-accessible products proliferate, the computer-use surface shrinks. For any tool you use a lot, the long-term win is pushing the vendor to ship agent-accessible interfaces, or switching to a competitor that already did. Computer use is the bridge while that happens, and the permanent fallback for products that never will.

Sovereignty Implications

A computer-use agent operating on your behalf sees everything on your screen. Credentials. Customer data. Banking information. Every document you happen to have open. See the sovereignty stack for the broader frame.

Two guardrails that are load-bearing:

  1. Know what you are approving. Both Claude and Codex default to asking permission for consequential actions. Do not train yourself to click yes without reading.
  2. Prefer local when you can. As open-source computer-use matures, the path to running a capable agent entirely on your machine opens up. OpenClaw and successors are worth tracking as the alternative to indefinite hyperscaler dependency.
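
The first guardrail, asking before consequential actions, can be sketched as a gate in front of the code that performs each action. This is an illustration under stated assumptions, not the actual permission mechanism in Claude or Codex: `PendingAction`, `CONSEQUENTIAL_KINDS`, and `execute_with_approval` are hypothetical, and which action kinds count as consequential is a policy you would define yourself.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical policy: action kinds that must never run without a human yes.
CONSEQUENTIAL_KINDS = {"submit_form", "send_payment", "delete"}


@dataclass(frozen=True)
class PendingAction:
    kind: str
    description: str


def execute_with_approval(
    action: PendingAction,
    approve: Callable[[PendingAction], bool],
    perform: Callable[[PendingAction], None],
) -> bool:
    """Run the action, asking for approval first if it is consequential.

    Returns True if the action was performed, False if it was declined.
    """
    if action.kind in CONSEQUENTIAL_KINDS and not approve(action):
        return False
    perform(action)
    return True
```

In an interactive setup, `approve` might print `action.description` and wait for an explicit answer. Passing the description to the approver is the part that matters: the human can only know what they are approving if the prompt says what is about to happen.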

Computer use is an unusually powerful tool that is also an unusually powerful surveillance surface if it is pointed the wrong direction. The same capability that runs your boring admin work is the capability that human emulators are optimized to exploit. Stay on the correct side of that ledger.

What It Means For Your Jarvis

Once your Jarvis has computer use wired in, the question of "can I automate this?" stops depending on whether the vendor shipped an API. The new question is: "is this digital work I understand well enough to describe to my agent?" If yes, it can be delegated.

The follow-on effect: for any work that is purely digital, the set of "tasks I cannot delegate because the tool is not agent-accessible" collapses to near-zero. Every daily applied AI practitioner is a knowledge worker. For knowledge workers, computer use closes the last structural gap between intent and execution.

Strategy stays with you. Execution moves to the agent. Strategy is the new execution, and computer use is the piece that makes that sentence true across every corner of your digital workflow, including the corners the API economy will never reach.


The question stopped being whether computer use can do the work. In 2026 the answer to that question is yes. The question now is whether you have articulated the work well enough for your agent to run it.


Further Reading

  • Agent-Accessible Products: The first-choice alternative. Why CLIs, APIs, and MCP servers are preferable when available, and why the products with them will outcompete the ones without.
  • Personal Agentic OS: The system that computer use plugs into.
  • The Sovereignty Stack: The broader frame for why running capable agents on your own infrastructure matters.
  • Harness Engineering: What wraps the model. Computer use is a harness capability, not a model capability.
  • Human Emulators: The extractive twin of the same capability. Same instrument, opposite direction.
  • Strategy Is the New Execution: The meta-shift computer use makes complete for all digital work.
  • Remote Control for Coaching: Adjacent but different. A human-to-human take-control-of-my-screen tool for instructor-assisted workshops, not an agent capability.

Sources