This guide walks through the internals of the Beakr agent engine: how a chat request becomes a streaming graph execution, how agents hand work off to each other, how state is checkpointed, and how conversations stay within context windows even across long-running sessions. The engine is deterministic because it batches work into ordered super-steps, and resumable because every super-step produces a checkpoint from which a long workflow can pause, retry, or pick up exactly where it left off.
In this guide
Pregel executor
Bulk Synchronous Parallel super-step loop -- plan, execute, update, checkpoint, repeat.
ReAct graph
The compiled graph: prepare, call_llm, route, execute_tools, handoff.
Agent catalog
Four stateless configs: researcher, analyst, coder, reviewer.
Meta-tools & handoff
delegate_to_agent, war_room, agent_pipeline, debate_ensemble.
Tool system
Registry, policy engine, context injection, parallel dispatch.
Checkpointing
Two levels -- super-step for ask_user, thread for cross-run continuity.
Compression
Four-layer working memory -- strip, surgical clear, summarize.
Key design decisions
Pregel / BSP model
Deterministic execution with built-in parallelism and checkpointing. Super-steps make every run reproducible and resumable.
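A minimal sketch of the super-step loop, under the assumption that each node carries a trigger predicate and a run function (the `trigger`/`run` shape here is illustrative, not Beakr's actual node API):

```python
import json

def run_graph(nodes, state, max_steps=10):
    """Bulk Synchronous Parallel loop: each super-step plans the set of
    active nodes, executes them against a frozen view of the state,
    applies all updates in one ordered batch, then checkpoints."""
    checkpoints = []
    for _ in range(max_steps):
        # Plan: which nodes does the current state trigger?
        active = [n for n in nodes if n["trigger"](state)]
        if not active:
            break  # no work left: the run is complete
        # Execute: every active node sees the same frozen snapshot,
        # so execution order within a super-step cannot matter.
        updates = [n["run"](dict(state)) for n in active]
        # Update: merge all writes in one deterministic batch.
        for u in updates:
            state.update(u)
        # Checkpoint: one snapshot per super-step makes the run resumable.
        checkpoints.append(json.loads(json.dumps(state)))
    return state, checkpoints
```

Because nodes only read the pre-step snapshot and all writes land together, replaying the same inputs yields the same sequence of checkpoints, which is what makes the run reproducible.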
Stateless agent configs
Agents are data (agent configs), not class hierarchies. Composable, swappable, testable.
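"Agents as data" can be sketched with a frozen dataclass; the field names below are illustrative, not Beakr's actual config schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentConfig:
    """An agent is pure data: no behavior, no class hierarchy."""
    name: str
    system_prompt: str
    tools: tuple = ()
    model: str = "default"

# Composing or swapping an agent is just constructing new data.
researcher = AgentConfig(
    name="researcher",
    system_prompt="You find and summarize sources.",
    tools=("web_search", "read_page"),
)
reviewer = AgentConfig(
    name="reviewer",
    system_prompt="You critique drafts for correctness and clarity.",
    tools=("read_page",),
)
```

Because configs are immutable values, tests can construct them inline, and two agents differing only in prompt or toolset are two values rather than two subclasses.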
Four-layer compression
Avoid expensive LLM summarization until absolutely necessary -- strip images, surgically clear compactable tool results, then summarize as last resort.
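The escalation can be sketched as below, counting "already fits" as layer zero so the strip/clear/summarize steps are layers one through three. The message shape, the `kind`/`compactable` flags, and the injected `token_count`/`summarize` callables are all assumptions for illustration:

```python
def compress(messages, budget, token_count, summarize):
    """Apply the cheapest layer first; stop as soon as history fits."""
    def total(msgs):
        return sum(token_count(m["content"]) for m in msgs)

    if total(messages) <= budget:
        return messages  # layer 0: nothing to do
    # Layer 1: strip image payloads, keep the surrounding text.
    msgs = [{**m, "content": ""} if m.get("kind") == "image" else m
            for m in messages]
    if total(msgs) <= budget:
        return msgs
    # Layer 2: surgically clear tool results flagged as compactable.
    msgs = [{**m, "content": "[cleared]"} if m.get("compactable") else m
            for m in msgs]
    if total(msgs) <= budget:
        return msgs
    # Layer 3 (last resort): pay for an LLM summary of the older half.
    half = len(msgs) // 2
    return [{"role": "system", "content": summarize(msgs[:half])}] + msgs[half:]
```

Only layer 3 costs a model call; the first three layers are pure local transformations, which is the point of ordering them this way.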
DB-writing tools serialize
Database-writing operations run one at a time for safety. Read-only operations run in parallel for speed.
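One way to get this split is a single async lock that only write calls acquire; reads bypass it and fan out concurrently. The `writes_db` flag and call shape are assumptions, not Beakr's dispatcher API:

```python
import asyncio

async def dispatch(calls):
    """Run read-only tool calls concurrently; funnel DB writes through
    one lock so they execute strictly one at a time."""
    write_lock = asyncio.Lock()

    async def run(call):
        if call["writes_db"]:
            async with write_lock:  # serializes all writers
                return await call["fn"]()
        return await call["fn"]()   # readers never wait on the lock

    # gather preserves input order in its results list.
    return await asyncio.gather(*(run(c) for c in calls))
```

Readers still overlap with an in-flight writer here; if the engine needs reads excluded during writes as well, a reader-writer lock would replace the plain `asyncio.Lock`.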
Native streaming
The model's response streams to the frontend token by token with no intermediate buffering -- each token is forwarded the moment the model produces it.
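Unbuffered forwarding reduces to a loop that relays each token as it arrives; `model_stream` (any async iterator of tokens) and `send` (the transport's write function) are assumed interfaces, not Beakr's actual ones:

```python
async def stream_to_frontend(model_stream, send):
    """Relay tokens one at a time: no accumulation, no batching."""
    async for token in model_stream:
        await send(token)  # forwarded immediately, never buffered

# A stand-in model stream for demonstration.
async def fake_model():
    for tok in ["Hel", "lo", "!"]:
        yield tok
```

Anything that concatenates tokens before sending would reintroduce buffering, so the relay deliberately holds no state between iterations.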
Two checkpoint levels
Mid-run checkpoints let the agent pause for user input and resume exactly where it left off. Thread-level checkpoints preserve conversation history across sessions.
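The two granularities can be sketched with an in-memory store; the class and method names are illustrative, and a real backend would persist to a database rather than dicts:

```python
import json

class CheckpointStore:
    """Super-step checkpoints let a run pause (e.g. for ask_user) and
    resume mid-graph; thread checkpoints carry history across runs."""
    def __init__(self):
        self.steps = {}    # (thread_id, step) -> graph state snapshot
        self.threads = {}  # thread_id -> conversation history

    def save_step(self, thread_id, step, state):
        # Deep-copy via JSON so later mutations can't corrupt the snapshot.
        self.steps[(thread_id, step)] = json.loads(json.dumps(state))

    def resume_point(self, thread_id):
        done = [s for (t, s) in self.steps if t == thread_id]
        if not done:
            return None  # fresh run: nothing to resume
        step = max(done)
        return step, self.steps[(thread_id, step)]

    def save_thread(self, thread_id, messages):
        self.threads[thread_id] = list(messages)
```

Resuming a paused run is then just loading the latest super-step snapshot for the thread and re-entering the loop at the following step.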
