How I Built a Multi-Agent AI Operating System
Every AI team eventually rebuilds the same infrastructure — routing, context, memory. Here's how I built a system that handles all of it, running daily across multiple production projects.
Every AI team eventually rebuilds the same infrastructure. Routing logic. Context injection. Memory that persists between runs. Most teams hardcode it, live with the debt, then rebuild it six months later when the requirements change.
I built something different. BrewCortex is a multi-agent operating system I run daily across multiple software projects. It's not a framework, not a hosted product — it's a custom AI execution layer that handles intelligent dispatch, hierarchical context resolution, and accumulated agent memory. This post walks through why it exists, how it works, and what patterns made the difference.
The Problem
When you start building seriously with AI agents, three problems surface quickly.
Routing. Which agent handles which task? The naive answer is a big if/else chain or a prompt that says "figure it out." Both fail under real load. The if/else chain becomes unmaintainable. The "figure it out" approach is slow, expensive, and inconsistent at boundaries.
Context. Agents need to know things: which project they're working on, what conventions apply, what tools are available, what the user's preferences are. Injecting this manually into every prompt doesn't scale. Hardcoding it per-agent creates drift. You need a resolution layer.
Memory. A session-level context window is not memory. Memory is accumulated knowledge that improves future runs. Without it, every agent invocation starts cold. With it, the system gets smarter over time.
These three problems aren't independent — they're a stack. Memory informs context. Context informs routing. Routing determines what gets executed. If you solve them independently, you get fragile integrations. Solve them as a unified layer and you get a system.
The Architecture
BrewCortex has four logical layers.
System Diagram
At a high level, the system looks like this:
Task Input
     |
     v
[ Routing Layer ]
     Fast rules table (~25 rules)
     -> If ambiguous: LLM router (structured JSON output)
     |
     v
[ Context Resolution ]
     Global defaults -> Project config -> Agent config -> Memory -> Session
     |
     v
[ Agent Execution ]
     Selected agent receives resolved context + task
     |
     v
[ Memory Write ]
     Agent appends non-obvious findings to memory layer
     |
     v
[ PR / Output ]
     Commit, branch, pull request — fully autonomous
Each layer is independent and replaceable. The routing layer doesn't know about memory. The context layer doesn't know about routing. They compose cleanly because they're designed as separate concerns.
1. Dynamic Agent Registry
Each agent is defined by its capabilities, not its name. The registry maps work types (code changes, content, research, infrastructure, design) to the appropriate agent, model, and cost tier. New agents are registered in one place; routing picks them up automatically.
This matters because hardcoded dispatch is a maintenance trap. Every time you add an agent, you have to update every place that might need to call it. With a registry-driven approach, the orchestrator discovers agents dynamically at decision time.
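A minimal sketch, in Python, of what registry-driven dispatch can look like (the class and field names are illustrative, not BrewCortex's actual API):

```python
from dataclasses import dataclass

@dataclass
class AgentSpec:
    name: str
    capabilities: set   # work types this agent can handle
    model: str          # model identifier for this agent
    cost_tier: str      # e.g. "cheap", "standard", "premium"

class AgentRegistry:
    """One place to register agents; routing discovers them at decision time."""
    def __init__(self):
        self._agents = []

    def register(self, spec: AgentSpec) -> None:
        self._agents.append(spec)

    def find(self, work_type: str) -> list:
        """Return every agent whose capabilities cover the work type."""
        return [a for a in self._agents if work_type in a.capabilities]

registry = AgentRegistry()
registry.register(AgentSpec("coder", {"code", "refactor"}, "model-large", "premium"))
registry.register(AgentSpec("writer", {"content"}, "model-small", "cheap"))

matches = registry.find("code")   # the orchestrator never names agents directly
```

Adding a new agent is one `register` call; no dispatch site anywhere else has to change.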
2. Hybrid Routing
The routing layer uses a two-pass approach. The first pass is fast: a rules table (~25 rules in the current implementation) that covers known work types. Pattern-match on the task description, get an agent back in microseconds.
The second pass is LLM-powered and handles ambiguous cases — tasks that match multiple rules, novel task types, or anything that crosses domain boundaries. The LLM gets the task, the registry of available agents, and a prompt to return structured JSON: agent selection, confidence, reasoning, model recommendation.
The result is a routing layer that's fast for common cases and intelligent for edge cases, without being expensive across the board.
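Here is a compressed sketch of the two-pass pattern, with two rules standing in for the full table and a stub in place of the real LLM call:

```python
import re, json

# Two illustrative rules; the real table has ~25.
RULES = [
    (re.compile(r"\b(fix|bug|error)\b", re.I), "coder"),
    (re.compile(r"\b(blog|post|article)\b", re.I), "writer"),
]

def route(task: str, llm_router) -> str:
    """First pass: fast rules table. Second pass: LLM for ambiguous tasks."""
    hits = {agent for pattern, agent in RULES if pattern.search(task)}
    if len(hits) == 1:
        return hits.pop()            # unambiguous: the microsecond path
    # Zero or multiple matches: fall through to the structured-JSON LLM router.
    return json.loads(llm_router(task))["agent"]

# Stub router for illustration; the real one calls a model with the registry.
stub = lambda task: json.dumps({"agent": "coder", "confidence": 0.8,
                                "reasoning": "code-adjacent task"})

print(route("Fix the login bug", stub))               # one rule hits: "coder"
print(route("Write a post about the bug fix", stub))  # two rules hit: LLM decides
```

The second task matches both a code rule and a content rule, which is exactly the boundary case the LLM pass exists for.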
3. Hierarchical Context Resolution
Context is resolved in five layers, each overriding the previous:
- Global defaults — applies everywhere (coding standards, commit conventions, output format preferences)
- Project config — per-project context injected from the project's CLAUDE.md
- Agent config — agent-specific instructions, tools, and constraints
- Memory — accumulated expertise from prior sessions
- Session overrides — ephemeral context injected at invocation time
When an agent starts work, it doesn't receive a manually assembled prompt. It receives the resolved output of this stack, with the right context for the right job, automatically.
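The override semantics are just a left-to-right merge. A toy version, with hypothetical keys:

```python
def resolve_context(*layers: dict) -> dict:
    """Merge context layers left to right; later layers override earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved

# Illustrative layer contents, not BrewCortex's actual keys.
global_defaults = {"commit_style": "conventional", "output": "markdown"}
project_config  = {"language": "python"}
agent_config    = {"tools": ["git", "tests"]}
memory          = {"gotcha": "CI image pins node 20"}
session         = {"output": "json"}   # ephemeral override wins

ctx = resolve_context(global_defaults, project_config, agent_config, memory, session)
# ctx["output"] is "json": the session layer overrode the global default
```

Flat key override is the simplest workable policy; nested configs would need a recursive merge, but the precedence order is the part that matters.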
4. Agent Memory
Memory in BrewCortex is a four-layer model:
- Project memory — decisions, patterns, and conventions specific to a project
- Agent memory — domain expertise accumulated across sessions (routing corrections, gotchas, process learnings)
- Session memory — what happened in the current invocation
- Cross-project memory — patterns that apply globally across all projects
Memory is written explicitly — agents append findings to structured markdown files at the end of each session. It's not automatic vector embedding of everything (that creates noise). It's curated, human-readable, and searchable. The signal-to-noise ratio is high because the agents decide what's worth writing.
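An explicit memory write can be as simple as appending a dated, categorized line to a markdown file. A sketch (the file layout and category names are illustrative):

```python
import tempfile
from datetime import date
from pathlib import Path

def append_memory(memory_file: Path, category: str, finding: str) -> None:
    """Append one curated finding to a structured markdown memory file."""
    entry = f"- **{date.today().isoformat()}** [{category}] {finding}\n"
    with memory_file.open("a", encoding="utf-8") as f:
        f.write(entry)

# Hypothetical agent memory file in a temp dir for the example.
mem = Path(tempfile.mkdtemp()) / "agent-memory.md"
mem.write_text("# Agent Memory\n\n")
append_memory(mem, "gotcha", "Staging deploy requires VPN access; local tests do not.")
append_memory(mem, "routing", "Tickets mentioning 'copy' belong to the content agent.")
```

Because the file is plain markdown, it doubles as documentation a human can read and grep, which is what keeps the signal-to-noise ratio high.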
What Actually Made It Work
Three patterns made the difference between a proof-of-concept and something that runs reliably in production every day.
Explicit memory writes. The temptation is to auto-capture everything. Don't. Have agents append only non-obvious findings: routing corrections, project gotchas, process improvements. When memory is curated, agents actually read it. When it's everything, nobody reads it.
Conventional commit discipline. Every agent output that touches code follows conventional commits: feat|fix|refactor|chore|docs|test|perf|ci. This is enforced by context, not by hooks. The result is a commit history that's readable and reviewable — which matters when autonomous agents are producing dozens of commits per week.
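The commit format is easy to check mechanically. BrewCortex enforces it through context rather than hooks, but a validation like this is roughly what the convention amounts to (a simplified pattern; it skips extras like the `!` breaking-change marker):

```python
import re

# type, optional (scope), colon, space, description
COMMIT_RE = re.compile(r"^(feat|fix|refactor|chore|docs|test|perf|ci)(\([\w-]+\))?: .+")

def is_conventional(message: str) -> bool:
    return bool(COMMIT_RE.match(message))

assert is_conventional("feat(router): add LLM fallback for ambiguous tasks")
assert is_conventional("fix: handle empty rules table")
assert not is_conventional("updated stuff")
```

A check like this also works as a review-time guardrail if you ever do want hook-level enforcement.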
Sequential single-ticket work. Parallel agent execution sounds powerful. In practice, when multiple agents write to the same working tree, you get branch collisions, stale index problems, and hard-to-debug state corruption. The current model is sequential: one ticket, one agent, one branch, one PR. Simple. Reliable.
Results
BrewCortex runs across four active repositories. In a typical week, it opens and closes 15-20 GitHub issues autonomously — feature additions, bug fixes, content creation, infrastructure work. Each one goes through the full lifecycle: claim, branch, execute, PR, review.
Qualitatively: I spend almost no time on task dispatch or context assembly. I spend time on architecture decisions, review, and direction. The system handles the execution.
We Build This for Clients Too
BrewCortex is our internal proof. Every capability described here — intelligent routing, hierarchical context resolution, accumulated agent memory — is a deliverable we build for clients on their infrastructure.
If your team is hitting the walls described above (brittle routing, hardcoded context, agents that start cold every run), we can fix that. It's not a subscription. It's a custom build, delivered on your stack.