How We Built Our Own Background Coding Agents
At Dexter, we’re building an AI-powered source-to-pay platform. We help businesses streamline procurement: from vendor onboarding and bid evaluation to contract management and invoice verification. It’s a complex product, and we ship it with a small team.
We don’t write code by hand anymore. We use Claude Code and Codex for everything. They’re great when you’re sitting in front of your terminal. But we kept running into the same friction: the work stops when you close the laptop.
We wanted agents that could pick up a Linear ticket, write the code, open a PR, and have it ready for review by the time we got to our desks. So we built Schwarm: our own background coding agent system.
Why not use what’s out there?
The existing tools didn’t fit how we wanted to work. We needed three things:
Async by default. We wanted to throw a task at an agent before going to bed and wake up to a PR. Not sit in a terminal babysitting a session.
Deep integration with our workflow. We use Linear for project management. We wanted to assign a ticket to an agent the same way we assign it to a person. We also wanted agents that could run on a schedule (keeping docs up to date, enforcing best practices, running refactors) without anyone having to remember to kick them off.
A human-in-the-loop, not a human-at-the-keyboard. We didn’t want to just fire-and-forget. We wanted to be able to check on an agent’s work from our phones, review what it built, and steer it with follow-up instructions, without sitting at a desk.
Nothing off the shelf gave us all three, so we built it ourselves.
The architecture
Schwarm is two components: a server and one or more nodes.
The server is a Rails 8 app running on a single EC2 instance. It manages tasks, projects, nodes, and scheduling. The entire state lives in a SQLite database. We replicate the main database to S3 every minute with Litestream. It’s simple and it works.
Nodes are Mac machines or EC2 instances that do the actual work. Each node registers with the server via an API key and runs a Ruby process that polls for tasks. The node is a Ruby gem — no Rails, just a focused agent runner.
One task, one universe
When a node picks up a task, it creates a fully isolated environment:
- A git worktree branched off main from a shared clone. Every task gets its own copy of the repo at ~/.schwarm/schwarm-<task-id>/. No conflicts between concurrent agents. The shared clone at ~/.schwarm/repos/org--repo/ avoids re-cloning the full repo each time.
- A tmux session where Claude Code runs with the full prompt, system instructions, skills, commands, and MCP integrations.
- A Tailscale service that makes the running app accessible over our tailnet at a unique URL like https://schwarm-<task-id>.<tailnet>.ts.net.
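The worktree setup can be sketched as a small helper that builds the git commands for one task. This is a minimal sketch, not the actual node code: the function name, the `slug` parameter, and the exact flags are our assumptions; the paths and branch format follow the post.

```ruby
require "shellwords"

# Build the git commands that create a task's isolated worktree,
# branched off main from the shared clone (names hypothetical).
def worktree_commands(task_id, slug, repo: "org--repo", home: Dir.home)
  shared   = File.join(home, ".schwarm", "repos", repo)
  worktree = File.join(home, ".schwarm", "schwarm-#{task_id}")
  branch   = "schwarm/#{task_id}_#{slug}"
  [
    # refresh the shared clone so the worktree branches off current main
    "git -C #{Shellwords.escape(shared)} fetch origin main",
    # create the worktree on its own branch, isolated from other tasks
    "git -C #{Shellwords.escape(shared)} worktree add -b #{branch} " \
      "#{Shellwords.escape(worktree)} origin/main"
  ]
end
```

Because every worktree shares the clone's object store, five concurrent tasks cost one full checkout each but only one download of history.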
This means you can have five agents working on five different tasks simultaneously on the same machine, each with their own branch, their own running app, and their own Claude Code session.
The node: convergent reconciliation
The node doesn’t use webhooks or push notifications. Every 10 seconds, it asks the server “what should I be running?” and converges local state to match.
The reconciliation loop is simple: fetch the list of assigned tasks, compare with what’s running locally, start what’s missing, stop what’s gone. If the node crashes and restarts, it just re-converges. No lost events, no out-of-sync state.
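The heart of that loop is a pure diff between desired and actual state. A minimal sketch, with illustrative names rather than the real API:

```ruby
# One convergence step: given the server's assigned task IDs and the
# IDs currently running locally, decide what to start and what to stop.
def converge(assigned, running)
  {
    start: assigned - running, # assigned but not yet running
    stop:  running - assigned  # running but no longer assigned
  }
end
```

Running this every 10 seconds is idempotent: after a crash, the first pass recomputes the same plan from scratch, so there are no events to replay.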
Every 30 seconds, the node also sends a heartbeat with its hostname, version, and commit SHA. The server uses this to detect offline nodes and reassign their tasks.
This design was a deliberate choice over event-driven. It’s less responsive (up to 10 seconds of lag), but dramatically simpler and more reliable. The node is stateless between restarts and most of the state lives on the server.
Review from anywhere
The Tailscale integration isn’t for the agents, it’s for us. Every task gets its own URL where the app is running with the agent’s changes. You can open it on your phone, see what the agent built, and decide if it’s going in the right direction.
If it’s not, you queue up a message. The web UI has a message form on every task’s session log page. Type your feedback, hit send, and it gets queued as a session directive in the database. The session log is the full Claude Code session: you see every message and tool call.
Here’s how delivery works: the node checks each task’s pending_directives on every poll. When Claude is idle (detected via a status JSON file that Claude Code hooks write to), the node kills the current tmux session and re-launches Claude Code with --resume <session-id> "<your message>". Claude picks up right where it left off, but now with your feedback as the new prompt.
You can queue multiple messages while the agent is working. They get batched and delivered together as a numbered list at the next idle moment. It’s async in both directions: the agent doesn’t block waiting for you, and you don’t block waiting for it.
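The batching step can be sketched as follows. The command shape is inferred from the `--resume <session-id> "<your message>"` invocation described above; the function name is ours:

```ruby
# Collapse queued directives into one message and build the relaunch
# command. Multiple directives become a numbered list, per the post.
def resume_command(session_id, directives)
  message =
    if directives.size == 1
      directives.first
    else
      directives.each_with_index.map { |d, i| "#{i + 1}. #{d}" }.join("\n")
    end
  ["claude", "--resume", session_id, message]
end
```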
The GitHub feedback loop
This is the part that really changed things for us. When an agent opens a PR, Schwarm doesn’t just walk away. It watches.
Merge conflicts. Every five minutes, a scheduled job polls GitHub’s GraphQL API to check the mergeability of all open PRs from active tasks. If a PR has conflicts with main, a session directive is created telling Claude to rebase and resolve. The system tracks the HEAD SHA to avoid re-notifying about the same conflict, and clears the directive automatically when the conflict is resolved.
CI failures. When a check suite completes with a failure on a PR branch, GitHub sends a webhook to our server. We extract the task ID from the branch name (branches follow the pattern schwarm/<task-id>_<slug>), create a webhook event and a session directive telling Claude to investigate and fix. It reads the CI logs via gh pr checks, identifies the issue, and pushes a fix.
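Recovering the task from the webhook payload is a one-line parse against the branch convention. A sketch, assuming only the `schwarm/<task-id>_<slug>` pattern from the post:

```ruby
# Extract the task ID from a PR branch name, or nil if the branch
# doesn't follow the schwarm/<task-id>_<slug> convention.
def task_id_from_branch(branch)
  branch[%r{\Aschwarm/([^_/]+)_}, 1]
end
```

Anything that doesn't match (say, a human-created branch) simply yields nil and the webhook event is ignored.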
PR reviews. When someone leaves a review or a comment, those get forwarded as session directives too. The agent can read the feedback and push fixes.
All three of these use the same pipeline: something creates a SessionDirective record with a summary, the node picks it up on the next poll when Claude is idle, and delivers it via --resume. The same mechanism that delivers human messages delivers GitHub events. One pipe, many sources.
By the time you actually open the PR to review it, the build is green, there are no conflicts, and the agent has already addressed the easy feedback. You’re reviewing clean work.
How tasks get created
Tasks can come from several places:
- Linear. Assign a ticket to the agent and it gets picked up automatically. A Linear webhook triggers a server-side LLM agent that creates the Schwarm task with the right context. This agent can also manage tasks across multi-turn conversations in Linear. We leverage Linear Agents for that, and the Linear MCP is pre-configured on all nodes so the agent can get full Linear context.
- The web UI. Our Command Center lets you create tasks with a name, a project, and optional file attachments. Drag, drop, go.
- Schedules. Recurring tasks and shared agents run on cron expressions (more on this below).
Once created, tasks move through a state machine: draft → ready → claimed → archived. The server runs an assignment loop every 10 seconds that atomically matches ready tasks to available nodes.
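The state machine only moves forward, which can be sketched as a transition table. The states come from the post; the helper and its error handling are our illustration:

```ruby
# Allowed task-state transitions: draft -> ready -> claimed -> archived.
TRANSITIONS = {
  "draft"    => ["ready"],
  "ready"    => ["claimed"],
  "claimed"  => ["archived"],
  "archived" => []
}.freeze

def transition(from, to)
  allowed = TRANSITIONS.fetch(from) { raise ArgumentError, "unknown state #{from}" }
  raise ArgumentError, "#{from} -> #{to} not allowed" unless allowed.include?(to)
  to
end
```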
Tasks can also have dependencies. You can say “task B depends on task A” and B won’t become ready until A is archived. The dependency graph is validated for cycles using Ruby’s TSort.
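Cycle validation with TSort is short. A sketch under the assumption that dependencies are held as a hash from task to its prerequisites (the class and method names here are ours):

```ruby
require "tsort"

# Wrap a {task => [prerequisite tasks]} hash so TSort can walk it.
class DependencyGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node, []).each(&block)
  end
end

# True if the dependency graph contains a cycle.
def cyclic?(deps)
  DependencyGraph.new(deps).tsort
  false
rescue TSort::Cyclic
  true
end
```

As a bonus, a successful `tsort` also yields a valid execution order, so the same pass that rejects "A depends on B depends on A" can tell the scheduler which task becomes ready first.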
The agent’s toolkit
Each Claude Code session starts with a carefully assembled system prompt that includes:
- The task description and project context
- Git workflow instructions (branch naming, commit conventions, PR creation)
- Instructions to start the app on $PORT for the Tailscale service at $SCHWARM_URL
- A self-archive mechanism — when Claude thinks it’s done, it writes a reason to $SCHWARM_ARCHIVE_SIGNAL, and the node detects this file and archives the task via the API
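The node side of the self-archive mechanism reduces to a file check on each poll. A minimal sketch (the helper name is ours; the post only specifies that Claude writes a reason to the signal file):

```ruby
# If Claude wrote its completion reason to the signal file, return it
# so the node can archive the task via the API; otherwise nil.
def archive_reason(signal_path)
  return nil unless File.exist?(signal_path)
  File.read(signal_path).strip
end
```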
We wire up MCP servers for Linear and Sentry, so agents can fetch ticket details and check for errors in production.
One subtle but important detail: GitHub tokens rotate every hour, but agent sessions can run for much longer. We solved this with a file-based credential helper. The node refreshes the token every 10 seconds and writes it to a file in the workspace. A shell-based git credential helper and a custom gh CLI wrapper both read from this file — completely transparent to Claude. It never has to think about authentication.
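The credential helper itself can be a tiny script. This is a sketch, not our production helper: the token path is hypothetical, and `x-access-token` is the username convention for GitHub installation tokens, which we assume here.

```ruby
#!/usr/bin/env ruby
# File-based git credential helper (sketch). git invokes the helper
# with the operation ("get", "store", "erase") as an argument and
# expects key=value lines on stdout for "get".
TOKEN_FILE = File.expand_path("~/.schwarm/github_token")

def credential_response(action, token)
  return "" unless action == "get" # ignore store/erase
  "username=x-access-token\npassword=#{token}\n"
end

if __FILE__ == $PROGRAM_NAME && File.exist?(TOKEN_FILE)
  print credential_response(ARGV.first, File.read(TOKEN_FILE).strip)
end
```

Because the helper re-reads the file on every git operation, the hourly rotation is invisible: whatever token the node last wrote is the one git uses.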
Session state tracking uses the same file-based approach. Claude Code hooks (on stop, pre-tool-use, and prompt submit) write a JSON status (idle, running, or elicitation) to a status.json file using pure POSIX printf. The node reads this file every 10 seconds to report task status to the server.
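The read side is defensive, since the hooks and the node race on the same file. A sketch, assuming a top-level "state" key in the JSON (the post doesn't specify the schema):

```ruby
require "json"

VALID_STATES = %w[idle running elicitation].freeze

# Read Claude's status file; a missing or half-written file is
# reported as "unknown" and simply retried on the next poll.
def read_status(path)
  state = JSON.parse(File.read(path))["state"]
  VALID_STATES.include?(state) ? state : "unknown"
rescue Errno::ENOENT, JSON::ParserError
  "unknown"
end
```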
What we use it for
One-shot tasks are the bread and butter. Small, well-defined tickets — add a feature flag, fix a layout bug, update an API call. The agent picks it up, writes the code, opens the PR. These are remarkably reliable.
Research tasks are surprisingly useful. “Look into how we handle push notifications and write up what you find.” The agent explores the codebase, reads the docs, and produces a summary. No code changes, just knowledge.
Scheduled agents are where it gets really interesting. We have two scheduling mechanisms:
- Recurring tasks are simple crons tied to a project. “Every Monday at 9am, run this prompt against this project.”
- Shared agents are global agents that can be subscribed to multiple projects. Define the agent once — say, “audit for accessibility violations” — and subscribe it to every project. Each project can add its own additional instructions. The scheduler skips if there’s already an active instance, preventing pile-ups.
Both run via Solid Queue’s recurring job system. Every minute, the scheduler checks what’s due and creates task instances. They go directly to ready and get picked up by the next available node.
Every morning there are PRs waiting — docs updated, code cleaned up, small improvements applied. It’s like having a tireless developer who only works the night shift.
We also monitor the main branch for CI failures and automatically schedule tasks to fix them.
What we learned
Convergent reconciliation beats event-driven. We already covered this, but it’s worth emphasizing. The polling model means the node has zero local state that matters. Kill it, restart it, deploy a new version and it just re-converges.
File-based communication is underrated. Claude signals completion by writing to a file. Session state is tracked via a JSON file. The credential helper reads a file. There’s a pattern here: when one process (Claude Code) needs to communicate with another (the node), files are the simplest reliable interface. No complex IPC, no API calls from inside the sandbox, no process coordination. The filesystem is the message bus.
Small tasks win. The agent’s success rate drops significantly as task complexity grows. A focused, well-scoped ticket with clear acceptance criteria works great. “Rewrite the authentication system” does not. We’ve gotten better at breaking work down into agent-sized pieces, which honestly makes our tickets better for humans too.
Overnight agents are a superpower. The scheduled agents produce a steady stream of small improvements that would never get prioritized by a human. Nobody is going to spend a morning updating docstrings or aligning code with style guides. But an agent running at 3 AM? It’s free leverage.
The feedback loop matters more than the first pass. The initial code generation is table stakes. What makes background agents actually useful is what happens after — conflict resolution, CI fixes, responding to review comments, ability to run the app and verify its output independently. That’s what turns a draft PR into a mergeable one.
Keep it boring. The whole system is a Rails monolith, SQLite, polling loops, and file-based signals. No Kubernetes, no message queues, no microservices. It runs our entire agent fleet on a handful of machines and it’s been remarkably stable. Every time we’ve been tempted to add complexity, we’ve been glad we didn’t.
What’s next
We’re continuing to iterate — chaining tasks with dependencies so complex work can be broken into steps, improving the scheduled agents, and finding more ways to keep humans in the loop without keeping them at their desks.
If you’re thinking about building something similar: start small. Get one agent running one task end-to-end. The infrastructure you think you need is probably less than what you actually need.