Compression

Working memory compression is how Beakr keeps long-running conversations inside a model's context window. The WorkingMemory object runs four layers in order, from cheapest to most expensive, and stops as soon as the token count is back under the threshold.

The four layers

Diagram: the four layers plotted against a token budget scale (under, near, over, way over).

Layer 0 -- Image stripping: remove images from old user turns -- keep last 2 turns intact. Cost: ~0
Layer 1 -- Surgical clear, compactable tools: clear cached tool results -- keep a summary as breadcrumb. Cost: ~0
Layer 2 -- Surgical clear, protected tools, old turns: clear old protected results (> N turns ago) -- REPL / writes always kept. Cost: ~0
Layer 3 -- LLM summarization, last resort: summarize old messages, archive full details to a file the agent can re-read. Cost: $$

Each layer runs only if the previous layer didn't bring tokens back under threshold. In practice, layer 3 fires rarely -- layers 0-2 handle the common case.
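The early-exit behavior can be illustrated with a minimal sketch. This is not Beakr's implementation: the message shape (a dict with `role`, `tokens`, and an optional `image_tokens`), the `compress` function, and the `strip_old_images` stand-in for layer 0 are all hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical message shape: {"role": str, "tokens": int, "image_tokens": int (optional)}
Message = dict
Layer = Callable[[List[Message]], List[Message]]


def total_tokens(messages: List[Message]) -> int:
    """Sum the token counts of all messages."""
    return sum(m["tokens"] for m in messages)


def strip_old_images(messages: List[Message], keep_last: int = 2) -> List[Message]:
    """Stand-in for layer 0: drop image tokens from all but the last
    `keep_last` user turns. Returns new dicts; the input is not mutated."""
    user_idx = [i for i, m in enumerate(messages) if m["role"] == "user"]
    kept = set(user_idx[-keep_last:]) if keep_last else set()
    out = []
    for i, m in enumerate(messages):
        images = m.get("image_tokens", 0)
        if m["role"] == "user" and i not in kept and images:
            m = {**m, "tokens": m["tokens"] - images, "image_tokens": 0}
        out.append(m)
    return out


def compress(
    messages: List[Message], layers: List[Layer], threshold: int
) -> Tuple[List[Message], int]:
    """Run layers in order and stop as soon as the total is under `threshold`.
    Returns the (possibly compressed) messages and how many layers ran."""
    layers_run = 0
    for layer in layers:
        if total_tokens(messages) < threshold:
            break  # already under budget: skip the remaining, costlier layers
        messages = layer(messages)
        layers_run += 1
    return messages, layers_run
```

Because the budget check sits at the top of the loop, a conversation that is already under threshold runs no layers at all, and an expensive final layer is only reached when every cheaper one has failed to free enough tokens.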

Checkpoint boundary

Compression respects a checkpoint boundary: messages from before the current run's first checkpoint are eligible for clearing; messages from the current run are protected.
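As a rough sketch of that rule, assuming the history is an ordered list and the position of the current run's first checkpoint is known (the function and parameter names here are hypothetical, not Beakr's API):

```python
from typing import List, Tuple


def split_at_checkpoint(
    messages: List[dict], first_checkpoint_index: int
) -> Tuple[List[dict], List[dict]]:
    """Split history into (eligible_for_clearing, protected).

    `first_checkpoint_index` is the position of the current run's first
    checkpoint. Everything before it may be cleared; everything from it
    onward belongs to the current run and is left alone.
    """
    if not 0 <= first_checkpoint_index <= len(messages):
        raise ValueError("checkpoint index is outside the message history")
    eligible = messages[:first_checkpoint_index]
    protected = messages[first_checkpoint_index:]
    return eligible, protected
```

A compression layer would then operate only on the first list and re-attach the second unchanged.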

Protected tools

Protected tools are calls whose outputs cannot be safely summarized away because later reasoning may depend on the exact result, a returned artifact identifier, or a durable write side effect. Beakr preserves these results longer than ordinary search snippets or cached reads.
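The difference between compactable and protected results in the two surgical-clear layers might look like the sketch below. The tool names, the `ToolResult` shape, and the `surgical_clear` signature are assumptions for illustration; the source does not list which tools fall into which category, only that REPL and write results are always kept.

```python
from dataclasses import dataclass, replace
from typing import List

# Hypothetical tool classifications, for illustration only.
COMPACTABLE = {"search", "read_file"}   # cheap to re-run or safe to summarize
PROTECTED = {"fetch_artifact"}          # exact result may matter later
ALWAYS_KEPT = {"repl", "write_file"}    # "REPL / writes always kept"


@dataclass(frozen=True)
class ToolResult:
    tool: str
    turn: int      # turn number at which the result was produced
    content: str   # full tool output
    summary: str   # short breadcrumb describing the output


def surgical_clear(
    results: List[ToolResult],
    current_turn: int,
    max_age: int,
    include_protected: bool,
) -> List[ToolResult]:
    """Clear tool results without calling an LLM.

    include_protected=False mimics layer 1 (compactable tools only).
    include_protected=True mimics layer 2, which also clears protected
    results produced more than `max_age` turns ago.
    """
    out = []
    for r in results:
        if r.tool in ALWAYS_KEPT:
            out.append(r)  # never cleared
        elif r.tool in COMPACTABLE:
            # Keep the summary as a breadcrumb in place of the full output.
            out.append(replace(r, content=f"[cleared] {r.summary}"))
        elif (
            r.tool in PROTECTED
            and include_protected
            and current_turn - r.turn > max_age
        ):
            out.append(replace(r, content="[cleared]"))
        else:
            out.append(r)
    return out
```

Recent protected results survive both passes, which matches the idea that they are preserved longer than ordinary search snippets or cached reads.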