Architecture Overview

Halos is a multi-agent AI platform with a model-agnostic native runtime. Agents run in-process on the Next.js server using the Vercel AI SDK — no containers, no external runtimes, no Fly.io machines.


System Diagram

┌─────────────────────────────────────────────────────────┐
│                     Frontend (Next.js)                   │
│  useChat() ←→ /api/agents/[id]/chat  (streaming)       │
│  Home page ←→ /api/home/chat         (group chat)      │
│  Telegram  ←→ /api/integrations/telegram/webhook       │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│               Agent Runtime (in-process)                 │
│                                                          │
│  ┌────────────┐  ┌─────────────┐  ┌──────────────────┐  │
│  │  Providers │  │    Tools    │  │  Prompt Builder  │  │
│  │            │  │             │  │                  │  │
│  │  Anthropic │  │  web_search │  │ Identity + Trust │  │
│  │  OpenAI    │  │  web_scrape │  │ + Training Data  │  │
│  │  Google    │  │  emails     │  │ + User Context   │  │
│  │  (any)     │  │  calendar   │  │ + Knowledge      │  │
│  └────────────┘  │  notion     │  └──────────────────┘  │
│                  │  knowledge  │                         │
│  ┌────────────┐  │  todos      │  ┌──────────────────┐  │
│  │ Chat Loop  │  │  send_mail  │  │  Usage Tracking  │  │
│  │ streamText │  └─────────────┘  │  (tokens + cost) │  │
│  │ genText    │                   └──────────────────┘  │
│  └────────────┘                                         │
└──────────────────────────┬──────────────────────────────┘
                           │
┌──────────────────────────▼──────────────────────────────┐
│                   External Services                      │
│                                                          │
│  Supabase · Pinecone · Resend · Apify · Firecrawl       │
│  Brave · ElevenLabs · Google · Notion · Slack · GitHub   │
└──────────────────────────────────────────────────────────┘

How It Works

When a user sends a message to an agent:

  1. Auth + fetch — Validate the user, load agent config, training data, and user profile from Supabase
  2. Knowledge search — Query Pinecone for relevant context from the user's personal knowledge base
  3. Resolve model — Map the agent's personality_model to a Vercel AI SDK provider (Anthropic, OpenAI, Google)
  4. Build tools — Compose the toolset dynamically based on connected integrations and available API keys
  5. Build prompt — Construct the system prompt from agent identity, trust tier, training data, user context, and knowledge
  6. Stream — Call streamText() with the model, system prompt, messages, and tools. The SDK handles the tool-use loop automatically (LLM calls tool → execute → feed result → repeat)
  7. Background — After streaming completes, extract memories and store them in Pinecone for future conversations
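Steps 3 and 5 of the flow above can be sketched as pure functions. All identifiers here are illustrative, not the real Halos code; the actual streamText() call from the Vercel AI SDK appears only as a comment, since its exact option names vary by SDK version.

```typescript
// Illustrative sketch of steps 3 and 5; all names are hypothetical.

type Provider = 'anthropic' | 'openai' | 'google';

// Step 3: map the agent's personality_model to a provider family.
function resolveProvider(personalityModel: string): Provider {
  if (personalityModel.startsWith('claude')) return 'anthropic';
  if (personalityModel.startsWith('gemini')) return 'google';
  return 'openai'; // default: any OpenAI-compatible model
}

// Step 5: assemble the system prompt from its layered sections.
interface PromptParts {
  identity: string;
  trustTier: string;
  trainingData?: string;
  userContext?: string;
  knowledge?: string; // Pinecone results from step 2
}

function buildSystemPrompt(p: PromptParts): string {
  return [p.identity, p.trustTier, p.trainingData, p.userContext, p.knowledge]
    .filter((section): section is string => Boolean(section))
    .join('\n\n');
}

// Step 6 would then hand these to the SDK, roughly:
//   streamText({ model: <provider(modelId)>, system, messages, tools })
// and the SDK loops: model requests a tool → execute → append result → continue.
```

The layered prompt keeps each concern (identity, trust, training, context, knowledge) independently editable, and empty sections simply drop out of the join.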

Agent Lifecycle

Agents are created instantly. No provisioning, no containers, no job queue.

| Action | What Happens |
|--------|--------------|
| Create | DB insert → status: active → ready immediately |
| Sleep  | Status → sleeping (chat returns 503) |
| Wake   | Status → active (instant) |
| Delete | Status → destroyed |
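With no provisioning step, the lifecycle reduces to a status check at the top of each chat request. A minimal sketch, using the status names from the table; only sleeping → 503 is documented here, so the mapping for destroyed is an assumption:

```typescript
// Status values from the lifecycle table; the guard itself is illustrative.
type AgentStatus = 'active' | 'sleeping' | 'destroyed';

// Decide whether a chat request may proceed.
function chatStatusCode(status: AgentStatus): number {
  switch (status) {
    case 'active':
      return 200; // proceed into the agent runtime
    case 'sleeping':
      return 503; // agent is asleep; chat refuses service
    case 'destroyed':
      return 404; // assumption: deleted agents read as not found
  }
}
```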


Key Principles

  • Model-agnostic — Switch between Claude, GPT, Gemini, or any OpenAI-compatible model per agent
  • Native tools — No container shell/browser. Tools are TypeScript functions that call APIs directly
  • In-process — Agents run inside Next.js API routes on Vercel. No infrastructure to manage
  • Stateless compute, persistent memory — Agent code is stateless; all context comes from Supabase + Pinecone
  • Dynamic tooling — Each request gets only the tools the user has connected (no Notion OAuth = no Notion tools)
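The dynamic-tooling principle can be sketched as a filter over a tool registry. Tool names below come from the diagram, but the registry shape and which integration gates which tool are assumptions:

```typescript
// Tool names from the diagram; the integration gating shown is illustrative.
const TOOL_REGISTRY: Record<string, string | null> = {
  web_search: null,     // null = always available (server-side API key)
  web_scrape: null,
  knowledge: null,
  todos: null,
  send_mail: null,      // assumption: sent via a server-side Resend key
  emails: 'google',     // requires Google OAuth
  calendar: 'google',
  notion: 'notion',     // no Notion OAuth = no Notion tools
};

// Compose the per-request toolset from the user's connected integrations.
function buildToolset(connected: Set<string>): string[] {
  return Object.entries(TOOL_REGISTRY)
    .filter(([, integration]) => integration === null || connected.has(integration))
    .map(([name]) => name);
}
```

Because the set is recomputed per request, revoking an OAuth connection removes the corresponding tools on the very next message, with no redeploy.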