Your AI Vendor Should Not Be Your Operating System
By Skaira Labs
The New Lock-In Is The Operating Layer
The model is no longer the whole AI system.
The production surface has moved upward: harnesses, sandboxes, tools, memory stores, skills, protocol adapters, review flows, and long-running agent sessions now shape how AI work actually happens.
That shift is genuinely useful. It also quietly creates a more durable form of lock-in.
When an AI vendor owns the model, the harness, the workspace, the memory, the tool runtime, and the review path, the vendor is no longer only a model provider. It is becoming part of the operating system for the work.
That can be exactly the right choice for some teams. Managed infrastructure is valuable when the work needs secure containers, long-running execution, built-in file tools, web access, MCP connections, compaction, and a simpler deployment path. Anthropic's Managed Agents documentation frames the product this way: a managed harness and infrastructure layer for autonomous Claude work. OpenAI's Agents SDK is moving in the same direction from a different angle; its sandbox documentation describes workspace manifests, sandbox-native shell and filesystem capabilities, skills, memory, compaction, permissions, and snapshot-backed runs.
The mistake is not using these platforms. The mistake is letting the platform become the only place where the operating rules exist.
This post is for teams that have already chosen, or are about to choose, a strong agent platform and now need to decide what they should still own themselves.
Here is the architecture we recommend teams adopt: strong executors connected through a control plane the team owns.
The durable layer is not the model. It is the control plane that decides which executor can act, what context it receives, which memory it may touch, what must be reviewed, and what evidence remains after the work completes.
Vendor-Neutral Does Not Mean Vendor-Avoidant
Vendor-neutral architecture is often misunderstood as avoiding managed platforms. That is not the point.
The point is to keep the durable operating contract outside any single executor: the control plane for harnesses, memory discipline, routing, review gates, permissions, and evidence.
A good agent platform can run tasks, expose tools, manage a sandbox, compact context, and preserve state. A good internal operating layer decides which task class should use that platform, what data is allowed to enter, how the result is validated, and whether the state produced by the run can be audited, exported, revoked, or rehydrated somewhere else.
The difference matters because agent stacks are changing quickly. Around April 2026, Claude Managed Agents added memory in public beta, OpenAI's sandbox agents documented file-backed memory and snapshot behavior, LangChain argued that memory ownership follows harness ownership, and A2A reached a production-ready 1.0 release with Linux Foundation backing. These are not isolated feature launches. They point to a category shift: agents are becoming execution environments.
Execution environments need governance.
A Common Before/After Pattern
A team starts by building dozens of agent workflows inside one managed environment because it is the fastest way to prove value. Then finance asks for a second model on cost-sensitive work, legal asks for a human approval step on regulated outputs, and security asks for an audit trail across tool calls.
If routing rules, memory contracts, and review requirements live only in prompts and vendor workspace state, the team has to reconstruct the operating layer under pressure. The healthier path is to define the durable artifacts before the workflows become too important to move.
What The Agent Operating Layer Owns
The operating layer is the part of the system that should remain inspectable even if the model or agent platform changes.
| Operating surface | What it decides |
|---|---|
| Task class | What kind of work is being requested, what consequence level it carries, and whether AI should act at all. |
| Context package | Which files, instructions, retrieved facts, user preferences, and business definitions enter the run. |
| Tool boundary | Which systems the agent can touch, under what identity, with what read/write scope, and with which approval rules. |
| Memory map | What should be remembered, where it lives, who owns it, and how it can be audited or removed. |
| Routing policy | Which model, sandbox, harness, or human path should handle the work. |
| Review gate | Which outputs can pass structural validation, which need second-pass review, and which require human approval. |
| Evidence trail | What inputs, outputs, decisions, tool calls, and handoffs remain after the run. |
This is not paperwork around the AI system. It is the system boundary.
If these rules live only in prompts, they drift. If they live only inside a vendor-managed harness, they may be hard to inspect or move. If they live only in one engineer's habits, they disappear when the team changes.
The operating layer should be expressed as versioned contracts: task definitions, routing profiles, memory classes, tool permissions, eval sets, review rules, and artifact standards.
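As one illustration, a contract can be a small frozen record checked into version control and fingerprinted, so a reviewer can tell when the running configuration has drifted from what was approved. This is a minimal sketch: the field names, the `refund_triage` example, and the `has_drifted` helper are hypothetical, not a standard schema.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class OperatingContract:
    """A versioned operating contract for one task class (hypothetical schema)."""

    task_class: str
    version: str
    context_sources: tuple[str, ...]  # context package: what may enter the run
    allowed_tools: tuple[str, ...]    # tool boundary
    memory_classes: tuple[str, ...]   # memory map entries the run may touch
    executor: str                     # routing target, named by role rather than SDK
    review_gate: str                  # "structural", "second_pass", or "human"
    evidence: tuple[str, ...]         # artifacts that must survive the run

    def fingerprint(self) -> str:
        # Canonical JSON (sorted keys) so an unchanged contract always hashes the same.
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def has_drifted(reviewed: OperatingContract, deployed_fingerprint: str) -> bool:
    """True when the running configuration no longer matches the reviewed contract."""
    return reviewed.fingerprint() != deployed_fingerprint


refund_triage = OperatingContract(
    task_class="refund_triage",
    version="1.2.0",
    context_sources=("ticket", "order_history"),
    allowed_tools=("orders.read",),
    memory_classes=("review_history",),
    executor="managed_agent",
    review_gate="human",
    evidence=("inputs", "tool_calls", "decision"),
)

# Loosening the review gate without review is exactly what drift detection should catch.
loosened = replace(refund_triage, review_gate="structural")
```

Storing the fingerprint alongside each deployment turns "did someone change the rules?" into a comparison rather than an investigation.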
Memory Is Not One Feature
"Memory" is becoming a product feature, but teams running agents in production should treat it as several distinct architecture choices.
Conversation state is not the same as business memory. A workspace snapshot is not the same as a decision log. A user preference is not the same as workflow state. A retrieval corpus is not the same as agent-learned behavior.
Before a team enables long-term memory, it should name the memory class:
- Conversation continuity: prior messages and tool results that help the current thread continue.
- Workspace state: files, outputs, patches, notebooks, or generated artifacts that need to persist across runs.
- User or team preferences: formatting, workflow, or interaction preferences that improve future assistance.
- Business state: facts about customers, cases, orders, projects, tickets, accounts, or assets.
- Decision memory: why a choice was made, what evidence supported it, and what supersedes it.
- Review history: what was approved, blocked, escalated, or changed after inspection.
- Retrieval corpus: source material the system should search but not silently rewrite.
These classes have different owners and risk profiles. Some belong in structured application state. Some belong in versioned files. Some belong in retrieval systems. Some should be short-lived. Some should never be model-written without review.
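A memory map can start as a lookup table from memory class to owner, store, retention, and whether an agent may write without review. The sketch below assumes placeholder owners, store labels, and retention periods; which classes an agent may write freely is an illustrative policy choice, not a recommendation for every team.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MemoryClass(Enum):
    CONVERSATION = "conversation_continuity"
    WORKSPACE = "workspace_state"
    PREFERENCES = "user_or_team_preferences"
    BUSINESS_STATE = "business_state"
    DECISIONS = "decision_memory"
    REVIEW_HISTORY = "review_history"
    RETRIEVAL_CORPUS = "retrieval_corpus"


@dataclass(frozen=True)
class MemoryPolicy:
    owner: str                     # team accountable for correctness and deletion
    store: str                     # where the data lives (illustrative labels)
    retention_days: Optional[int]  # None means "keep until superseded or removed"
    agent_may_write: bool          # may an agent write without human review?


# Placeholder values; each team fills these in for its own systems.
MEMORY_MAP: dict[MemoryClass, MemoryPolicy] = {
    MemoryClass.CONVERSATION: MemoryPolicy("platform", "session_store", 7, True),
    MemoryClass.WORKSPACE: MemoryPolicy("platform", "versioned_files", 30, True),
    MemoryClass.PREFERENCES: MemoryPolicy("product", "app_database", 365, True),
    MemoryClass.BUSINESS_STATE: MemoryPolicy("operations", "system_of_record", None, False),
    MemoryClass.DECISIONS: MemoryPolicy("operations", "decision_log", None, False),
    MemoryClass.REVIEW_HISTORY: MemoryPolicy("compliance", "audit_store", None, False),
    MemoryClass.RETRIEVAL_CORPUS: MemoryPolicy("knowledge", "search_index", None, False),
}


def agent_write_allowed(memory_class: MemoryClass, reviewed: bool) -> bool:
    """An agent write goes through only if policy allows it or a reviewer approved it."""
    return MEMORY_MAP[memory_class].agent_may_write or reviewed


def unmapped_classes() -> set[MemoryClass]:
    """Memory classes nobody has assigned an owner to yet."""
    return set(MemoryClass) - set(MEMORY_MAP)
```

Running `unmapped_classes()` in CI is a cheap way to make sure a new memory class cannot ship without an owner.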
The most important memory question is not "does the platform have memory?" It is:
If we switched agent vendors next year, what context, memory, review history, and operating state would we lose?
That question surfaces the architecture. Can the memory be exported? Can it be redacted? Can individual versions be audited? Can stale or poisoned memory be repaired? Can the same task resume in another harness with enough context to continue safely?
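Those questions translate into a few concrete properties: writes are versioned rather than overwritten, a bad version can be redacted without erasing the fact that it existed, and the whole store exports to a plain format. The store below is a hypothetical sketch of those properties. It is not modeled on Anthropic's or OpenAI's memory APIs, and the key naming and author labels are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MemoryVersion:
    version: int
    content: str
    author: str            # e.g. "agent:triage" or "human:dana"
    created_at: str
    redacted: bool = False
    redaction_reason: str = ""


class PortableMemoryStore:
    """Append-only versions per key: writes never overwrite, redaction keeps the audit record."""

    def __init__(self) -> None:
        self._history: dict[str, list[MemoryVersion]] = {}

    def write(self, key: str, content: str, author: str) -> int:
        versions = self._history.setdefault(key, [])
        stamp = datetime.now(timezone.utc).isoformat()
        versions.append(MemoryVersion(len(versions) + 1, content, author, stamp))
        return len(versions)

    def current(self, key: str) -> Optional[str]:
        # Latest version that has not been redacted, so removing a bad write rolls back.
        for entry in reversed(self._history.get(key, [])):
            if not entry.redacted:
                return entry.content
        return None

    def redact(self, key: str, version: int, reason: str) -> None:
        entry = self._history[key][version - 1]
        entry.content = ""
        entry.redacted = True
        entry.redaction_reason = reason

    def audit(self, key: str) -> list[tuple[int, str, bool]]:
        return [(e.version, e.author, e.redacted) for e in self._history.get(key, [])]

    def export(self) -> str:
        # Plain JSON so another harness or a reviewer can read it without this code.
        return json.dumps({k: [asdict(e) for e in v] for k, v in self._history.items()}, indent=2)
```

Rolling back to the previous unredacted version is one design choice; a team could instead require an explicit corrected write after every redaction.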
Anthropic's Managed Agents memory docs now describe file-like memory stores, application read/write access, version history, redaction, and export considerations while still marking the feature beta. OpenAI's sandbox memory docs distinguish agent memory from conversational session memory and describe memory as distilled lessons in workspace files. LangChain's position is more pointed: if the harness owns the memory shape and hides it behind an API, portability suffers.
All three signals point to the same design pressure. Memory needs ownership rules before it becomes operationally important.
Protocols Reduce Friction. They Do Not Remove Governance.
MCP and A2A are important because they standardize different parts of the agent stack.
MCP is about connecting models and agents to tools, resources, and external systems. The MCP schema makes descriptions, resources, prompts, and tool input schemas part of the interface that clients can use to guide model behavior. A2A is about agent-to-agent communication: discovery, capability negotiation, collaboration, and task management across independent agents.
These protocols lower integration cost. They do not automatically answer the governance questions.
When an agent can discover tools, call external systems, delegate tasks, or coordinate with another agent, the architecture still needs to decide:
- Which agent identity is allowed to act?
- Which tools are available for this task class?
- Which data can cross the boundary?
- Which actions require approval before execution?
- Which outputs require validation after execution?
- Which logs are retained, and for how long?
- Which failure modes stop the workflow instead of continuing silently?
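Whether a tool arrives through MCP, a vendor's built-in tools, or a delegated A2A task, the first four questions can be answered by one pre-execution check in the control plane. This is a minimal sketch that assumes data has already been classified into numeric sensitivity levels; the policy fields and the level scheme are illustrative and are not part of either protocol.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolPolicy:
    allowed_identities: frozenset[str]
    allowed_tools: frozenset[str]
    max_data_level: int                # 0=public, 1=internal, 2=confidential, 3=regulated
    approval_required: frozenset[str]  # tools that need a human before execution


@dataclass(frozen=True)
class ToolCall:
    identity: str
    tool: str
    data_level: int


def gate(call: ToolCall, policy: ToolPolicy) -> tuple[Decision, str]:
    """Fail closed: anything not explicitly allowed is denied, with a reason for the log."""
    if call.identity not in policy.allowed_identities:
        return Decision.DENY, f"identity {call.identity!r} may not act for this task class"
    if call.tool not in policy.allowed_tools:
        return Decision.DENY, f"tool {call.tool!r} is outside the task's tool boundary"
    if call.data_level > policy.max_data_level:
        return Decision.DENY, "data classification exceeds what may cross this boundary"
    if call.tool in policy.approval_required:
        return Decision.NEEDS_APPROVAL, "human approval required before execution"
    return Decision.ALLOW, "within policy"
```

Returning a reason alongside the decision matters as much as the decision itself: it is what the evidence trail records when a call is blocked.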
Interoperability without control can create a larger blast radius. The operating layer is what makes interoperability usable in production.
The Control-Plane Check
A team evaluating an agent workflow can start with a simple control-plane check.
1. Can we explain the task class without naming the model?
If the workflow only makes sense as "ask model X to do Y," the architecture is under-specified. The task should have an input contract, allowed tools, expected output shape, validation rule, and review class.
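A task class that passes this test can be written down with no model in it at all. The sketch below assumes a simple field-to-type contract for inputs and outputs; `invoice_extraction` and its fields are hypothetical, and a real team might use JSON Schema or a validation library instead.

```python
from dataclasses import dataclass
from typing import Any


def _shape_errors(payload: dict[str, Any], fields: dict[str, type]) -> list[str]:
    """Compare a payload against a field -> type contract."""
    errors = [f"missing field: {name}" for name in fields if name not in payload]
    for name, expected in fields.items():
        if name in payload and not isinstance(payload[name], expected):
            errors.append(f"{name}: expected {expected.__name__}, got {type(payload[name]).__name__}")
    errors += [f"unexpected field: {name}" for name in payload if name not in fields]
    return errors


@dataclass(frozen=True)
class TaskClass:
    """A task definition that never mentions a model; routing decides the executor."""

    name: str
    consequence: str                 # "low" | "medium" | "high"
    input_contract: dict[str, type]
    allowed_tools: frozenset[str]
    output_shape: dict[str, type]
    review_class: str                # "structural" | "second_pass" | "human"

    def check_input(self, payload: dict[str, Any]) -> list[str]:
        return _shape_errors(payload, self.input_contract)

    def check_output(self, output: dict[str, Any]) -> list[str]:
        return _shape_errors(output, self.output_shape)


invoice_extraction = TaskClass(
    name="invoice_extraction",
    consequence="medium",
    input_contract={"document_id": str},
    allowed_tools=frozenset({"documents.read"}),
    output_shape={"vendor": str, "total_cents": int, "currency": str},
    review_class="structural",
)
```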
2. Can we move the workflow to a different executor?
It does not need to be effortless. But the team should know what would move cleanly and what would need rewriting: prompts, skills, tool adapters, state, memory, permissions, evals, and audit logs.
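One low-effort way to answer this is a per-workflow inventory that tags each artifact by how it would move. The three portability levels, the locations, and the example inventory below are hypothetical; the point is that the answer becomes a list someone can review rather than a guess.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Portability(Enum):
    MOVES_CLEANLY = "team-owned; moves as-is"
    NEEDS_CONVERSION = "exportable; needs conversion"
    NEEDS_REWRITE = "vendor-specific; rewrite or lose"


@dataclass(frozen=True)
class Artifact:
    kind: str      # prompts, skills, tool adapters, state, memory, permissions, evals, audit logs
    location: str
    portability: Portability


def migration_report(inventory: list[Artifact]) -> dict[Portability, list[str]]:
    """Group a workflow's artifacts by how much work a move to another executor would take."""
    report: dict[Portability, list[str]] = defaultdict(list)
    for artifact in inventory:
        report[artifact.portability].append(f"{artifact.kind} ({artifact.location})")
    return dict(report)


# Hypothetical inventory for one workflow.
inventory = [
    Artifact("prompts", "git: prompts/refunds/", Portability.MOVES_CLEANLY),
    Artifact("evals", "git: evals/refunds/", Portability.MOVES_CLEANLY),
    Artifact("tool adapters", "MCP servers we run", Portability.MOVES_CLEANLY),
    Artifact("memory", "vendor memory store with export", Portability.NEEDS_CONVERSION),
    Artifact("permissions", "vendor workspace settings", Portability.NEEDS_REWRITE),
    Artifact("audit logs", "vendor console only", Portability.NEEDS_REWRITE),
]
```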
3. Can we inspect what the agent remembered?
If the workflow relies on memory, the team should know where that memory lives, who can edit it, how conflicts are handled, and how a bad memory is removed or superseded.
4. Can we prove what happened after the fact?
Production agents need evidence trails: source context, tool calls, outputs, review outcomes, and final handoff state. Without this, every incident becomes archaeology.
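An evidence trail is easier to trust when tampering is detectable. The sketch below hash-chains each run event to the previous one, a common technique for append-only logs; the entry kinds and field names are assumptions, and a production system would also need write-once storage for the chain itself.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any


def _digest(body: dict[str, Any]) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class EvidenceLog:
    """Append-only, hash-chained record of one run: each entry commits to the one before it."""

    run_id: str
    entries: list[dict[str, Any]] = field(default_factory=list)

    def record(self, kind: str, payload: dict[str, Any]) -> str:
        body = {
            "run_id": self.run_id,
            "seq": len(self.entries),
            "kind": kind,  # e.g. source_context, tool_call, output, review, handoff
            "payload": json.loads(json.dumps(payload)),  # snapshot, not a live reference
            "prev": self.entries[-1]["hash"] if self.entries else "genesis",
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; editing, reordering, or removing an interior entry breaks it.

        Truncation at the tail is only caught if the final hash is also stored elsewhere.
        """
        prev = "genesis"
        for seq, entry in enumerate(self.entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["seq"] != seq or body["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```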
5. Can we say what should stay deterministic?
The most reliable agent systems still use code for known-answer work: identifiers, state transitions, schema validation, dedupe, retries, permissions, and artifact movement. The model should own bounded judgment, not structural truth.
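In practice this often means the model proposes and code disposes. The sketch below assumes a hypothetical ticket workflow: a transition table decides which state changes are legal, and an idempotency key turns retries and duplicate deliveries into no-ops. The states and key format are illustrative.

```python
# Legal transitions for a hypothetical ticket workflow; terminal states map to nothing.
ALLOWED_TRANSITIONS: dict[str, frozenset[str]] = {
    "received": frozenset({"triaged"}),
    "triaged": frozenset({"drafted", "escalated"}),
    "drafted": frozenset({"approved", "rejected"}),
    "rejected": frozenset({"drafted"}),
    "approved": frozenset(),
    "escalated": frozenset(),
}


class WorkflowError(Exception):
    pass


class Workflow:
    def __init__(self) -> None:
        self.state: dict[str, str] = {}
        self._applied: set[str] = set()  # idempotency keys already processed

    def create(self, item_id: str) -> None:
        self.state.setdefault(item_id, "received")

    def apply(self, item_id: str, proposed: str, idempotency_key: str) -> str:
        """Apply a model-proposed transition only if the transition table allows it."""
        if idempotency_key in self._applied:
            return self.state[item_id]  # retry or duplicate delivery: no-op
        current = self.state[item_id]
        if proposed not in ALLOWED_TRANSITIONS[current]:
            raise WorkflowError(f"{current} -> {proposed} is not a legal transition")
        self.state[item_id] = proposed
        self._applied.add(idempotency_key)
        return proposed
```

The model can still decide whether a ticket should be escalated; it cannot move a ticket from `triaged` straight to `approved`.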
What To Build First
The first version of a vendor-neutral agent operating layer does not need to be complex.
Start with five durable artifacts:
- A task-class register that names the recurring workflows, consequence levels, data sensitivity, and expected outputs.
- A routing profile that maps task classes to model, harness, sandbox, tool, and human-review choices.
- A memory map that separates conversation state, workspace state, business records, preferences, decisions, review history, and retrieval corpora.
- A review matrix that defines structural validation, second-pass review, human approval, and blocked-output conditions.
- An evidence contract that says which logs, artifacts, inputs, outputs, and approvals survive each run.
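The routing profile and review matrix can be read together: routing picks the executor and the review class, and the matrix says which checks must pass before an output is released. In this sketch the task names, executors, model tiers, and check names are all hypothetical; resolving a tier to a concrete model is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    executor: str    # managed platform, internal harness, or "human"
    model_tier: str  # a capability tier, resolved to a concrete model elsewhere
    sandboxed: bool
    review: str      # key into REVIEW_MATRIX


# Hypothetical routing profile keyed by (task class, consequence level).
ROUTING_PROFILE: dict[tuple[str, str], Route] = {
    ("summarize_ticket", "low"): Route("internal_harness", "small", False, "structural"),
    ("draft_customer_reply", "medium"): Route("managed_platform", "large", True, "second_pass"),
    ("regulated_disclosure", "high"): Route("managed_platform", "large", True, "human"),
}

# Review matrix: which checks must pass before an output may leave the run.
REVIEW_MATRIX: dict[str, tuple[str, ...]] = {
    "structural": ("schema_check",),
    "second_pass": ("schema_check", "second_reviewer"),
    "human": ("schema_check", "second_reviewer", "human_approval"),
}

# Anything the profile does not cover goes to a person, not to a default model.
FALLBACK = Route("human", "none", False, "human")


def route(task_class: str, consequence: str) -> Route:
    return ROUTING_PROFILE.get((task_class, consequence), FALLBACK)


def release_allowed(r: Route, passed: set[str]) -> bool:
    """Blocked-output rule: release only when every required check for the route passed."""
    return set(REVIEW_MATRIX[r.review]) <= passed
```

Making the fallback a human route is a deliberate choice: an unregistered task class should be a visible gap, not a silent default.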
At small scale, these can be markdown files, schemas, test fixtures, and lightweight scripts. They do not need to start as a platform. The point is to make the operating rules explicit before the workflow becomes too valuable or too risky to reconstruct.
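A "lightweight script" can be as small as a consistency check that runs in CI and fails when the artifacts disagree with each other. This sketch assumes the register, routing profile, and review matrix are stored as JSON files keyed by task class; the file layout and the regulated-data rule are illustrative assumptions.

```python
import json
from typing import Any


def check_contracts(
    register: dict[str, Any], routing: dict[str, Any], review_matrix: dict[str, Any]
) -> list[str]:
    """Cross-check the artifacts; an empty list means they agree with each other."""
    problems: list[str] = []
    for name, spec in register.items():
        if name not in routing:
            problems.append(f"{name}: registered but has no routing profile")
            continue
        review = routing[name].get("review")
        if review not in review_matrix:
            problems.append(f"{name}: review class {review!r} is not in the review matrix")
        # Example rule, not a standard: regulated data always ends in human approval.
        if spec.get("sensitivity") == "regulated" and review != "human":
            problems.append(f"{name}: regulated data must route to human review")
    for name in routing:
        if name not in register:
            problems.append(f"{name}: routed but not registered")
    return problems


def load(path: str) -> dict[str, Any]:
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def main(paths: list[str]) -> int:
    """Intended for CI: pass the register, routing profile, and review matrix file paths."""
    register, routing, review_matrix = (load(p) for p in paths)
    issues = check_contracts(register, routing, review_matrix)
    for issue in issues:
        print(issue)
    return 1 if issues else 0
```

Wired into a pipeline as `sys.exit(main(sys.argv[1:]))`, a nonzero exit blocks a merge that adds a workflow without routing or review rules.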
As the system grows, the contracts can become configuration, policy engines, workflow gates, eval suites, dashboards, and audit APIs. But the architecture starts with naming what the system must preserve.
The Long-Term Advantage
Model capabilities will keep improving. Agent platforms will keep absorbing more of the stack. Protocols will reduce integration friction. That is good for builders.
But capability does not remove architecture. It raises the consequence of architecture.
The teams that benefit most from agent platforms will not be the teams that avoid them. They will be the teams that use them deliberately: strong executors behind a control plane they can inspect, change, and carry forward.
That is the difference between adopting AI tools and building an AI operating layer.
If your team is moving from AI experiments into agentic workflows, start by finding the boundary: what your vendor owns, what your code owns, what your memory owns, and what your organization must still own to trust the system. Talk to Skaira about an AI Architecture Review.
For the broader architecture behind this stance, see why AI systems need harnesses, memory discipline, routing, and review gates, and why model upgrades are secondary to control-plane governance.