# Architecture Overview
Halos is a multi-agent AI platform with a model-agnostic native runtime. Agents run in-process on the Next.js server using the Vercel AI SDK — no containers, no external runtimes, no Fly.io machines.
## System Diagram
```
┌─────────────────────────────────────────────────────────┐
│                   Frontend (Next.js)                    │
│ useChat()   ←→  /api/agents/[id]/chat  (streaming)      │
│ Home page   ←→  /api/home/chat         (group chat)     │
│ Telegram    ←→  /api/integrations/telegram/webhook      │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│               Agent Runtime (in-process)                │
│                                                         │
│ ┌────────────┐  ┌─────────────┐  ┌──────────────────┐   │
│ │ Providers  │  │    Tools    │  │  Prompt Builder  │   │
│ │            │  │             │  │                  │   │
│ │ Anthropic  │  │ web_search  │  │ Identity + Trust │   │
│ │ OpenAI     │  │ web_scrape  │  │ + Training Data  │   │
│ │ Google     │  │ emails      │  │ + User Context   │   │
│ │ (any)      │  │ calendar    │  │ + Knowledge      │   │
│ └────────────┘  │ notion      │  └──────────────────┘   │
│                 │ knowledge   │                         │
│ ┌────────────┐  │ todos       │  ┌──────────────────┐   │
│ │ Chat Loop  │  │ send_mail   │  │  Usage Tracking  │   │
│ │ streamText │  └─────────────┘  │ (tokens + cost)  │   │
│ │ genText    │                   └──────────────────┘   │
│ └────────────┘                                          │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│                  External Services                      │
│                                                         │
│  Supabase · Pinecone · Resend · Apify · Firecrawl       │
│  Brave · ElevenLabs · Google · Notion · Slack · GitHub  │
└─────────────────────────────────────────────────────────┘
```
## How It Works
When a user sends a message to an agent:
- Auth + fetch — Validate the user, load agent config, training data, and user profile from Supabase
- Knowledge search — Query Pinecone for relevant context from the user's personal knowledge base
- Resolve model — Map the agent's `personality_model` to a Vercel AI SDK provider (Anthropic, OpenAI, Google)
- Build tools — Compose the toolset dynamically based on connected integrations and available API keys
- Build prompt — Construct the system prompt from agent identity, trust tier, training data, user context, and knowledge
- Stream — Call `streamText()` with the model, system prompt, messages, and tools. The SDK handles the tool-use loop automatically (LLM calls tool → execute → feed result → repeat)
- Background — After streaming completes, extract memories and store them in Pinecone for future conversations
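The "Resolve model" and "Build prompt" steps can be sketched as pure functions. This is a minimal sketch with hypothetical names and model-name prefixes; the real code would return provider instances from the `@ai-sdk/*` packages rather than strings:

```typescript
type Provider = "anthropic" | "openai" | "google";

// Resolve model: map an agent's personality_model string to a provider.
// (Prefix matching here is illustrative, not the actual routing logic.)
function resolveProvider(personalityModel: string): Provider {
  if (personalityModel.startsWith("claude-")) return "anthropic";
  if (personalityModel.startsWith("gpt-")) return "openai";
  if (personalityModel.startsWith("gemini-")) return "google";
  throw new Error(`Unknown model: ${personalityModel}`);
}

// Build prompt: layer identity, trust tier, training data, user
// context, and knowledge into one system prompt, skipping empty parts.
function buildSystemPrompt(parts: {
  identity: string;
  trustTier: string;
  trainingData?: string;
  userContext?: string;
  knowledge?: string;
}): string {
  return [
    parts.identity,
    `Trust tier: ${parts.trustTier}`,
    parts.trainingData,
    parts.userContext,
    parts.knowledge,
  ]
    .filter(Boolean)
    .join("\n\n");
}

console.log(resolveProvider("claude-3-5-sonnet")); // → anthropic
console.log(buildSystemPrompt({ identity: "You are Ada.", trustTier: "standard" }));
```

The layered prompt means each section can be regenerated independently (e.g. fresh Pinecone knowledge per request) without touching the agent's identity.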
## Agent Lifecycle
Agents are created instantly. No provisioning, no containers, no job queue.
| Action | What Happens |
|--------|-------------|
| Create | DB insert → status: active → ready immediately |
| Sleep | Status → sleeping (chat returns 503) |
| Wake | Status → active (instant) |
| Delete | Status → destroyed |
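Because lifecycle actions are plain status updates, the whole table reduces to a lookup plus a guard in the chat route. A minimal sketch with hypothetical names; the 503 for sleeping agents comes from the table above, while the 410 for destroyed agents is an assumption:

```typescript
type AgentStatus = "active" | "sleeping" | "destroyed";

// Each lifecycle action is a single DB status write, so every
// transition is instant: no provisioning, no containers, no queue.
const transitions: Record<string, AgentStatus> = {
  create: "active",    // DB insert → ready immediately
  sleep: "sleeping",
  wake: "active",
  delete: "destroyed",
};

// Guard the chat route would run before invoking the agent runtime.
function chatStatusCode(status: AgentStatus): number {
  if (status === "sleeping") return 503;  // per the table above
  if (status === "destroyed") return 410; // assumption for illustration
  return 200;
}

console.log(transitions.sleep, chatStatusCode("sleeping")); // sleeping 503
```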
## Key Principles
- Model-agnostic — Switch between Claude, GPT, Gemini, or any OpenAI-compatible model per agent
- Native tools — No container shell/browser. Tools are TypeScript functions that call APIs directly
- In-process — Agents run inside Next.js API routes on Vercel. No infrastructure to manage
- Stateless compute, persistent memory — Agent code is stateless; all context comes from Supabase + Pinecone
- Dynamic tooling — Each request gets only the tools the user has connected (no Notion OAuth = no Notion tools)
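The "native tools" and "dynamic tooling" principles can be sketched together. The tool shape below is hypothetical; the real code would define tools with the Vercel AI SDK's `tool()` helper and schema-validated parameters:

```typescript
// A native tool is just a TypeScript function that calls an API
// directly: no container shell or browser in the loop.
interface NativeTool {
  description: string;
  execute: (args: Record<string, unknown>) => Promise<string>;
}

const webSearch: NativeTool = {
  description: "Search the web",
  // The real implementation would call the Brave Search API here.
  execute: async ({ query }) => `results for ${String(query)}`,
};

const notionSearch: NativeTool = {
  description: "Search the user's Notion pages",
  // The real implementation would call the Notion API with the
  // user's OAuth token.
  execute: async ({ query }) => `notion hits for ${String(query)}`,
};

// Dynamic tooling: each request gets only the tools backed by an
// integration the user has actually connected.
function buildToolset(connected: Set<string>): Record<string, NativeTool> {
  const tools: Record<string, NativeTool> = { web_search: webSearch };
  if (connected.has("notion")) tools.notion_search = notionSearch;
  return tools;
}

console.log(Object.keys(buildToolset(new Set<string>())));         // only web_search
console.log(Object.keys(buildToolset(new Set(["notion"]))));       // web_search + notion_search
```

No Notion OAuth means `notion_search` never appears in the toolset, so the model cannot even attempt the call.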