Vercel Releases Eve: An Open-Source AI Agent Framework Where Each Agent is a Directory of Files Mapped to Capabilities

Vercel has released eve, an open-source framework for building, running, and scaling agents. The project is published as the npm package eve, licensed under Apache-2.0.

Building an agent should mean defining what it does. It should not mean assembling all the plumbing that an agent needs to run in production.

eve is the framework Vercel builds and runs its own agents on. According to Vercel’s post, the company runs more than a hundred agents on it in production today.

What is eve?

eve is a filesystem-first framework for durable backend agents. You create an agent as a directory on disk. The directory is the contract.

Each file describes one component of the agent. At a glance, the tree shows what an agent is and does. It also shows where it lives and when it acts on its own.

The smallest agent that runs is two files. One sets the model. The other sets the instructions.

// agent/agent.ts
import { defineAgent } from "eve";

export default defineAgent({
  model: "anthropic/claude-opus-4.8",
});

The model is one line, and provider fallbacks are supported through AI Gateway. The instructions.md file becomes the system prompt that eve puts in front of every model call.
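To make the system-prompt behavior concrete, here is a minimal sketch of how instruction text could be placed in front of every model call. It is not eve's implementation: the `ChatMessage` shape, the `buildModelCall` helper, and the sample instruction text are all hypothetical names introduced for illustration.

```typescript
// Hypothetical message shape; eve's internal types are not shown in the source.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Build the message list for one model call, with the instructions always first.
function buildModelCall(instructions: string, turns: ChatMessage[]): ChatMessage[] {
  const system: ChatMessage = { role: "system", content: instructions.trim() };
  // Drop stray system messages so the instructions file stays the only system prompt.
  const rest = turns.filter((m) => m.role !== "system");
  return [system, ...rest];
}

// Stand-in for the contents of agent/instructions.md (made up for this sketch).
const instructions = "You are a data analyst. Answer with facts backed by SQL.\n";

const messages = buildModelCall(instructions, [
  { role: "user", content: "What was revenue last week?" },
]);
```

Because the instructions are rebuilt into every call, editing the Markdown file changes the agent's behavior without touching any TypeScript.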

An agent is a directory

Vercel’s core idea is that agents have a shape. Every team kept rebuilding the same structure to meet the same needs. eve makes that shape into a framework.

The directory layout maps each capability to a folder. Here is the contract:

| Path | Role | Format |
| --- | --- | --- |
| agent.ts | The model it runs on, plus runtime config | TypeScript |
| instructions.md | Who it is, prepended to every model call | Markdown |
| tools/ | What it can do; filename becomes the tool name | TypeScript |
| skills/ | What it knows; loaded only when the topic comes up | Markdown |
| connections/ | Secure links to MCP servers and OpenAPI APIs | TypeScript |
| sandbox/ | Optional override of the agent’s sandbox; seeds workspace files | Directory |
| subagents/ | Specialist child agents it delegates to | Directory |
| channels/ | Where it lives, like Slack or HTTP | TypeScript |
| schedules/ | When it acts on its own, on a cron | TypeScript |
| lib/ | Shared authored code used across the agent | TypeScript |

You add a tool, skill, channel, or schedule by adding a file. eve picks them up at build time and wires them in. There is no boilerplate to register them.
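The idea of picking up capabilities from the tree can be sketched as a pure function over file paths. This is an illustration under stated assumptions, not eve's build step: the `discover` function, the `Capability` type, and the sample file names (`revenue-definitions.md`, `weekly_report.ts`) are hypothetical, and only the folder names come from the layout described above. Directory-based entries such as `sandbox/` and `subagents/` are left out for brevity.

```typescript
type Kind = "tool" | "skill" | "connection" | "channel" | "schedule";

interface Capability {
  kind: Kind;
  name: string; // derived from the filename, extension removed
  path: string;
}

// Folder names follow the layout above; the mapping logic itself is an assumption.
const FOLDERS = new Map<string, Kind>([
  ["tools", "tool"],
  ["skills", "skill"],
  ["connections", "connection"],
  ["channels", "channel"],
  ["schedules", "schedule"],
]);

function discover(paths: string[]): Capability[] {
  const found: Capability[] = [];
  for (const path of paths) {
    // Only files shaped like agent/<folder>/<file>.ts or .md are considered.
    const match = /^agent\/([^/]+)\/([^/]+)\.(?:ts|md)$/.exec(path);
    if (!match) continue;
    const [, folder = "", file = ""] = match;
    const kind = FOLDERS.get(folder);
    if (!kind) continue;
    found.push({ kind, name: file, path });
  }
  return found;
}

const capabilities = discover([
  "agent/agent.ts",
  "agent/instructions.md",
  "agent/tools/run_sql.ts",
  "agent/skills/revenue-definitions.md",
  "agent/schedules/weekly_report.ts",
]);
```

In this sketch, `agent/tools/run_sql.ts` yields a tool named `run_sql` with no registration call anywhere, which is the property the directory contract is meant to provide.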

A tool is one TypeScript file with a Zod input schema. Its filename and place in the tree become its definition.

// agent/tools/run_sql.ts
import { defineTool } from "eve/tools";
import { z } from "zod";

export default defineTool({
  description: "Run a read-only SQL query.",
  inputSchema: z.object({ sql: z.string() }),
  needsApproval: ({ toolInput }) => estimateScanGb(toolInput.sql) > 50,
  async execute({ sql }) { /* … */ },
});
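The `needsApproval` predicate above can be read as a gate a runtime consults before calling `execute`. The following self-contained sketch shows that reading without importing eve. The `ToolDef` type, the `gate` function, and this `estimateScanGb` stub are hypothetical; the source does not define `estimateScanGb`, and the 2 GB and 180 GB figures are placeholders chosen only to land on either side of the 50 GB threshold.

```typescript
interface SqlInput {
  sql: string;
}

// Hypothetical stand-in for a tool definition; not eve's actual type.
interface ToolDef<I, O> {
  needsApproval?: (ctx: { toolInput: I }) => boolean;
  execute: (input: I) => Promise<O>;
}

// Placeholder estimator: a query without a WHERE clause is assumed to scan a lot.
function estimateScanGb(sql: string): number {
  return /\bwhere\b/i.test(sql) ? 2 : 180;
}

const runSql: ToolDef<SqlInput, string> = {
  needsApproval: ({ toolInput }) => estimateScanGb(toolInput.sql) > 50,
  async execute({ sql }) {
    return `ran: ${sql}`;
  },
};

// Decide whether a call may run now or must pause for a human decision.
function gate<I, O>(tool: ToolDef<I, O>, input: I): "run" | "pause" {
  return tool.needsApproval?.({ toolInput: input }) ? "pause" : "run";
}

const smallQuery = gate(runSql, { sql: "SELECT id FROM orders WHERE id = 1" });
const fullScan = gate(runSql, { sql: "SELECT * FROM orders" });
```

Keeping the predicate next to the tool means the approval policy travels with the file that defines the action, rather than living in a separate configuration layer.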

What ships in the box

Vercel describes eve as ‘batteries included.’ Six production capabilities come with the framework:

Durable execution: Every conversation is a durable workflow, with each step checkpointed. A session can pause, survive a crash or a deploy, and resume where it stopped. This is built on the open-source Workflow SDK.
Sandboxed compute: Agent-generated code is treated as untrusted. Every agent gets its own sandbox for shell commands, scripts, and file reads and writes. The backend is an adapter, running on Vercel Sandbox when deployed and on Docker, microsandbox, or just-bash locally.
Human-in-the-loop approvals: Any action can be set to require approval. The agent pauses there and waits, indefinitely if needed, without consuming compute. Once approved, eve continues from where it left off.
Secure connections: A connection is a file pointing at an MCP server or an OpenAPI-compatible API. eve brokers the auth, and the model never sees the URL or credentials. At launch, agents can connect to Slack, GitHub, Snowflake, Salesforce, Notion, and Linear.
Channels: The same agent serves every surface. The HTTP API is on by default, with Slack, Discord, Teams, Telegram, Twilio, GitHub, and Linear included. One channel can hand off to another.
Tracing and evals: Every run produces a trace using standard OpenTelemetry spans. They export to Braintrust, Honeycomb, Datadog, or Jaeger. Evals are scored test suites you run locally or wire into CI.
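The checkpoint-and-resume behavior described under durable execution can be illustrated with a small in-memory sketch. It does not use the Workflow SDK, whose API is not shown in the source: `CheckpointStore`, `runSession`, and the step names are hypothetical, and a real system would persist checkpoints somewhere durable rather than in a `Map`.

```typescript
type Step = { name: string; run: () => string };

// In-memory stand-in for a durable store (assumption: real storage survives restarts).
class CheckpointStore {
  private done = new Map<string, string>();
  has(name: string): boolean {
    return this.done.has(name);
  }
  save(name: string, result: string): void {
    this.done.set(name, result);
  }
  results(): string[] {
    return Array.from(this.done.values());
  }
}

// Run steps in order, skipping any that were checkpointed by an earlier attempt.
function runSession(steps: Step[], store: CheckpointStore): void {
  for (const step of steps) {
    if (store.has(step.name)) continue; // already finished before the interruption
    const result = step.run();
    store.save(step.name, result); // checkpoint after each completed step
  }
}

const executed: string[] = [];
let crashOnce = true;

const steps: Step[] = [
  { name: "load_skill", run: () => { executed.push("load_skill"); return "rules loaded"; } },
  {
    name: "run_sql",
    run: () => {
      if (crashOnce) { crashOnce = false; throw new Error("simulated crash"); }
      executed.push("run_sql");
      return "rows fetched";
    },
  },
];

const store = new CheckpointStore();
try {
  runSession(steps, store); // first attempt dies during run_sql
} catch {
  // the process is assumed to have crashed or been redeployed here
}
runSession(steps, store); // resume: load_skill is skipped, run_sql runs
```

After the second call, `load_skill` has executed exactly once and the session holds both results, which is the property that lets a paused or crashed conversation continue from where it stopped instead of starting over.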

Use cases, with real examples

Vercel published six agents it runs internally on eve:

d0, the data analyst: Its most-used internal tool, handling more than 30,000 questions a month. Every query is scoped to the asker’s own permissions.
Lead Agent, the autonomous SDR: It works every new lead and follows up on its own. Vercel says it costs about $5,000 a year and returns 32 times that, maintained part-time by one engineer.
Athena, the sales cockpit: RevOps built it in six weeks without engineers. It answers pipeline questions from Snowflake and Salesforce in plain language.
Vertex, the support engineer: It handles tickets across the help center, docs, and Slack. Vercel reports it solves 92% of tickets on its own and escalates the rest.
draft0, the content agent: It runs a review pipeline that catches glaring issues before a human editor sees the piece.
V, the routing agent: Tasks go to V in Slack first. V routes each one to the agent that can answer it.

