OpenAI is introducing sandbox execution that allows enterprise governance teams to deploy automated workflows with controlled risk.
Teams taking systems from prototype to production have faced difficult architectural trade-offs over where their workloads run. Model-agnostic frameworks offered initial flexibility but failed to fully utilise the capabilities of frontier models. Model-provider SDKs remained closer to the underlying model, but often lacked enough visibility into the control harness.
To complicate matters further, managed agent APIs simplified the deployment process but severely constrained where the systems could run and how they accessed sensitive corporate data. To resolve this, OpenAI is introducing new capabilities to the Agents SDK, offering developers standardised infrastructure featuring a model-native harness and native sandbox execution.
The updated infrastructure aligns execution with the natural operating pattern of the underlying models, improving reliability when tasks require coordination across diverse systems. Oscar Health provides an example involving unstructured data.
The healthcare provider tested the new infrastructure to automate a clinical records workflow that older approaches could not handle reliably. The engineering team needed the automated system to extract the right metadata and to correctly identify the boundaries of patient encounters within complex medical files. By automating this process, the provider could parse patient histories faster, expediting care coordination and improving the overall member experience.
Rachael Burns, Staff Engineer & AI Tech Lead at Oscar Health, said: “The updated Agents SDK made it production-viable for us to automate a critical clinical records workflow that previous approaches couldn’t handle reliably enough.
“For us, the difference was not just extracting the right metadata, but correctly understanding the boundaries of each encounter in long, complex records. As a result, we can more quickly understand what’s happening for each patient in a given visit, helping members with their care needs and improving their experience with us.”
OpenAI optimises AI workflows with a model-native harness
To deploy these systems, engineers must manage vector database synchronisation, control hallucination risks, and optimise expensive compute cycles. Without standard frameworks, internal teams often resort to building brittle custom connectors to manage these workflows.
The new model-native harness helps alleviate this friction by introducing configurable memory, sandbox-aware orchestration, and Codex-like filesystem tools. Developers can integrate standardised primitives such as tool use via MCP, custom instructions via AGENTS.md, and file edits using the apply patch tool.
Progressive disclosure via skills and code execution using the shell tool also enable the system to perform complex tasks sequentially. This standardisation allows engineering teams to spend less time updating core infrastructure and focus on building domain-specific logic that directly benefits the business.
Integrating an autonomous program into a legacy tech stack requires precise routing. When an autonomous process accesses unstructured data, it relies heavily on retrieval systems to pull relevant context.
To manage the integration of diverse architectures and limit operational scope, the SDK introduces a Manifest abstraction. This abstraction standardises how developers describe the workspace, allowing them to mount local files and define output directories.
Teams can connect these environments directly to major enterprise storage providers, including AWS S3, Azure Blob Storage, Google Cloud Storage, and Cloudflare R2. Establishing a predictable workspace gives the model exact parameters on where to locate inputs, write outputs, and maintain organisation during extended operational runs.
This predictability prevents the system from querying unfiltered data lakes, restricting it to specific, validated context windows. Data governance teams can subsequently track the provenance of every automated decision with greater accuracy from local prototype phases through to production deployment.
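The workspace idea can be illustrated with a minimal sketch in plain Python. This is not the Agents SDK's Manifest API: `Mount`, `WorkspaceManifest`, and `resolve_output` are hypothetical names invented for illustration, and the storage URI is a placeholder. The sketch only shows the general pattern of declaring inputs and a single output directory up front, then rejecting any write path that would escape it.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Mount:
    """One input made visible to the workspace (hypothetical type)."""
    source: str             # local path or object-store URI (placeholder values below)
    target: str             # where the input appears inside the workspace
    read_only: bool = True


@dataclass
class WorkspaceManifest:
    """Illustrative stand-in for a workspace description; not the SDK's class."""
    mounts: list[Mount] = field(default_factory=list)
    output_dir: str = "/workspace/out"

    def resolve_output(self, relative_path: str) -> str:
        """Map a relative path into output_dir, refusing anything that escapes it."""
        candidate = PurePosixPath(relative_path)
        if candidate.is_absolute() or ".." in candidate.parts:
            raise ValueError(f"path escapes the output directory: {relative_path}")
        return str(PurePosixPath(self.output_dir) / candidate)


manifest = WorkspaceManifest(
    mounts=[
        Mount(source="./records", target="/workspace/in/records"),
        Mount(source="s3://example-bucket/reference", target="/workspace/in/reference"),
    ],
)
```

Calling `manifest.resolve_output("reports/summary.json")` yields a path under the declared output directory, while `manifest.resolve_output("../secrets.txt")` raises `ValueError`. Keeping the check in one place is what makes the workspace boundary auditable.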
Enhancing security with native sandbox execution
The SDK natively supports sandbox execution, offering an out-of-the-box layer so programs can run within controlled computer environments containing the necessary files and dependencies. Engineering teams no longer need to piece this execution layer together manually. They can deploy their own custom sandboxes or utilise built-in support for providers like Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel.
Risk mitigation remains the primary concern for any enterprise deploying autonomous code execution. Security teams must assume that any system reading external data or executing generated code will face prompt-injection attacks and exfiltration attempts.
OpenAI approaches this security requirement by separating the control harness from the compute layer. This separation isolates credentials, keeping them entirely out of the environments where the model-generated code executes. By isolating the execution layer, an injected malicious command cannot access the central control plane or steal primary API keys, protecting the wider corporate network from lateral movement attacks.
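The principle of keeping credentials out of the execution environment can be sketched with the Python standard library alone. This is an illustration of the pattern, not how the Agents SDK implements it: `scrubbed_env`, `run_untrusted`, and the `EXAMPLE_API_KEY` variable are hypothetical, and a child process is only a stand-in for a real sandbox, since it provides no filesystem or network isolation.

```python
import os
import subprocess
import sys

# Variables the child process may inherit; everything else is dropped.
ALLOWED_ENV = ("PATH", "LANG", "LC_ALL", "SYSTEMROOT", "TMPDIR", "TEMP", "TMP")


def scrubbed_env() -> dict[str, str]:
    """Build the child's environment from an allow-list rather than a deny-list."""
    return {name: os.environ[name] for name in ALLOWED_ENV if name in os.environ}


def run_untrusted(code: str, timeout: float = 10.0) -> str:
    """Run a code string in a child interpreter that cannot see the parent's secrets."""
    result = subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores PYTHON* variables
        env=scrubbed_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout.strip()


# The harness process holds the credential; the child never receives it.
os.environ["EXAMPLE_API_KEY"] = "sk-placeholder"  # placeholder, not a real key
probe = "import os; print(os.environ.get('EXAMPLE_API_KEY'))"
leaked = run_untrusted(probe)
```

Here `leaked` is the string `"None"`: the probe finds no key to exfiltrate, because the variable was never passed into the child's environment. An allow-list is the safer default, since a newly added secret stays hidden without anyone having to remember to block it.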
This separation also reduces the compute cost of system failures. Long-running tasks often fail midway due to network timeouts, container crashes, or API limits. If a complex agent takes twenty steps to compile a financial report and fails at step nineteen, re-running the entire sequence burns expensive computing resources.
Under the new architecture, losing the sandbox container does not mean losing the entire operational run. Because the system state is held outside the sandbox, the SDK can rely on built-in snapshotting and rehydration: if the original environment expires or fails, the infrastructure restores the state within a fresh container and resumes from the last checkpoint. Avoiding restarts of expensive, long-running processes translates directly to reduced cloud compute spend.
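The twenty-step scenario can be sketched as a generic checkpoint-and-resume loop. The code below is an assumption-laden illustration rather than the SDK's snapshotting mechanism: `run_steps`, `SandboxLost`, and the JSON state file are hypothetical, and a production version would also need atomic writes and would snapshot far more than a step counter.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class SandboxLost(RuntimeError):
    """Raised to simulate the container disappearing mid-run."""


def run_steps(total_steps: int, checkpoint: Path, fail_at: Optional[int] = None) -> list[int]:
    """Run the remaining steps, persisting progress outside the 'sandbox' after each one.

    Returns only the steps executed during this call.
    """
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {"done": 0}
    executed = []
    for step in range(state["done"] + 1, total_steps + 1):
        if step == fail_at:
            raise SandboxLost(f"sandbox lost during step {step}")
        executed.append(step)                      # stand-in for the real work
        state["done"] = step
        checkpoint.write_text(json.dumps(state))   # snapshot after each completed step
    return executed


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "run_state.json"
    try:
        run_steps(20, checkpoint, fail_at=19)      # first attempt dies at step 19
    except SandboxLost:
        pass
    completed_before_resume = json.loads(checkpoint.read_text())["done"]
    resumed = run_steps(20, checkpoint)            # fresh "container", same external state
```

After the simulated failure, `completed_before_resume` is 18 and the second call executes only steps 19 and 20, so the first eighteen steps are never paid for twice.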
Scaling these operations requires dynamic resource allocation. The separated architecture allows runs to invoke single or multiple sandboxes based on current load, route specific subagents into isolated environments, and parallelise tasks across numerous containers for faster execution times.
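Fanning work out across isolated environments follows a familiar pattern, sketched here with the standard library. Nothing below is Agents SDK code: `run_subagent` and `fan_out` are hypothetical helpers, and a temporary directory per task merely stands in for a separate sandbox container.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_subagent(task: str) -> tuple[str, str]:
    """Run one task in its own scratch directory (a stand-in for a sandbox)."""
    with tempfile.TemporaryDirectory(prefix="sandbox-") as workdir:
        output = Path(workdir) / "result.txt"
        output.write_text(task.upper())            # placeholder for real work
        return task, output.read_text()


def fan_out(tasks: list[str], max_sandboxes: int = 4) -> dict[str, str]:
    """Dispatch tasks across up to max_sandboxes workers and collect results by task."""
    with ThreadPoolExecutor(max_workers=max_sandboxes) as pool:
        return dict(pool.map(run_subagent, tasks))


results = fan_out(["extract", "summarise", "validate"])
```

Each task writes only inside its own directory, so one subagent cannot read or overwrite another's intermediate files, and `max_sandboxes` caps how many environments are live at once.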
These new capabilities are generally available to all customers via the API, utilising standard pricing based on tokens and tool use without demanding custom procurement contracts. The new harness and sandbox capabilities are launching first for Python developers, with TypeScript support slated for a future release.
OpenAI plans to bring additional capabilities, including code mode and subagents, to both the Python and TypeScript libraries. The vendor intends to expand the broader ecosystem over time by supporting additional sandbox providers and offering more methods for developers to plug the SDK directly into their existing internal systems.
AI News is powered by TechForge Media.

