The knowledge base compiler
The compiler is a sandbox-first agent that analyzes raw source files -- reading documents, searching text, running code, and interpreting images -- then writes structured pages with [[links]], temporal metadata, and cited sources. It is one of three knowledge-layer agents: the compiler creates and updates pages from sources, capture proposes knowledge from conversations, and health keeps the graph connected and current.
Raw files enter the sandbox, the agent reads them with its toolset, and structured pages come out the other side.
Page types
The compiler assigns a page_type to every page it creates. Page types carry semantic meaning that affects how content is structured, how links are weighted, and how the page appears in search results. The set of page types the compiler supports now matches the ingestion enum: topic, person, organization, decision, meeting, overview, research_note, experiment, protocol, compound, dataset, and initiative.
topic
A concept, technology, process, or domain area. The most common page type.
person
An individual -- a team member, collaborator, author, or external contact.
organization
A company, institution, lab, partner, or vendor.
decision
A specific decision with context, rationale, alternatives considered, and outcome.
meeting
A meeting, discussion, or synchronous event with attendees, agenda, and outcomes.
overview
A high-level summary that synthesizes information from multiple related pages.
research_note
A research observation, analysis note, literature summary, or interpretation that should remain traceable to source context.
experiment
A specific study, assay, run, test, or trial-like activity with objective, setup, conditions, results, interpretation, and follow-ups.
protocol
A repeatable method, SOP, process, or workflow with purpose, inputs, steps, parameters, controls, outputs, and version changes.
compound
A drug, molecule, biologic, candidate, or formulation with aliases, modality, target or mechanism, indication, status, and evidence.
dataset
A result set, analysis output, assay readout, stability data, or measurement collection with source, scope, methods, findings, and limitations.
initiative
An ongoing program or body of work with goals, scope, owners, linked compounds, experiments, protocols, datasets, decisions, status, and next steps.
index
Index pages are created by maintenance/direct-write workflows when needed, but they are not in the compiler ingestion enum.
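The supported page types can be sketched as a simple enum. The type names come from the list above; the class name and the use of Python's Enum are illustrative assumptions, not the actual implementation.

```python
from enum import Enum

class PageType(str, Enum):
    """Page types the compiler can assign during ingestion (hypothetical class name)."""
    TOPIC = "topic"
    PERSON = "person"
    ORGANIZATION = "organization"
    DECISION = "decision"
    MEETING = "meeting"
    OVERVIEW = "overview"
    RESEARCH_NOTE = "research_note"
    EXPERIMENT = "experiment"
    PROTOCOL = "protocol"
    COMPOUND = "compound"
    DATASET = "dataset"
    INITIATIVE = "initiative"
    # "index" is created by maintenance/direct-write workflows but is not
    # part of the compiler ingestion enum, so it is deliberately omitted.
```

A str-mixin enum like this lets page types serialize directly as their string values while still giving callers an exhaustive, typo-proof set to validate against.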
Temporal metadata
The compiler extracts date-anchored events from source material -- decisions, milestones, deadlines, experiment dates -- and stores them as structured temporal metadata on each page. This enables timeline queries, chronological ordering, and staleness detection.
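One plausible shape for such a structured event, as a minimal sketch: the event_start field and the day/month/year precision levels are taken from this document, while the dataclass, its name, and the remaining field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TemporalEvent:
    """Hypothetical record for one date-anchored event on a page."""
    description: str            # e.g. a decision, milestone, or deadline
    event_start: Optional[str]  # ISO date string, or None when nothing anchors
    precision: str              # "day", "month", or "year"

# A source saying "Q1 2024" yields month precision; "March 15, 2024" yields day.
quarter_event = TemporalEvent("milestone", "2024-01-01", "month")
exact_event = TemporalEvent("deadline", "2024-03-15", "day")
```

Storing precision alongside the date lets timeline queries sort and filter without pretending a quarter-level mention names an exact day.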
Each event records an event_start date, or is left null when no date can be anchored, together with a precision of day, month, or year. A source that says "Q1 2024" produces month precision; one that says "March 15, 2024" produces day. Downstream queries use precision to avoid false specificity.
Extraction guidance
An extraction profile tells the compiler what to focus on when processing source material. Profiles are scoped hierarchically: organization → project. A project-level profile overrides the org profile.
Profile fields include goals (what the knowledge base should help with), focus areas (topics to prioritize during extraction), and key entities (people, organizations, and concepts that should always get their own pages).
For example, a biotech research project might set focus areas to "clinical trial protocols, regulatory submissions, safety data" and key entities to researchers and partner organizations. The compiler uses these signals to decide what deserves a dedicated page, what level of detail to extract, and which links to create.
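The hierarchical scoping above can be sketched as a small resolution function. The document says a project-level profile overrides the org profile but does not specify whether the override is whole-profile or field-by-field; this sketch assumes field-by-field fallback, and the function name, dict shape, and example values are all hypothetical.

```python
from typing import Optional

def resolve_profile(org: dict, project: Optional[dict]) -> dict:
    """Merge extraction profiles: project-level fields win wherever the
    project sets them; otherwise the org-level value is kept.
    (Field-by-field fallback is an assumption, not the documented rule.)"""
    if not project:
        return dict(org)
    merged = dict(org)
    for field in ("goals", "focus_areas", "key_entities"):
        if project.get(field):
            merged[field] = project[field]
    return merged

# Hypothetical profiles for a biotech research project.
org_profile = {
    "goals": "track program knowledge",
    "focus_areas": ["operations"],
    "key_entities": ["Acme Labs"],  # illustrative partner organization
}
project_profile = {
    "focus_areas": ["clinical trial protocols",
                    "regulatory submissions",
                    "safety data"],
}
effective = resolve_profile(org_profile, project_profile)
# focus_areas come from the project; goals and key_entities fall back to org
```

Under this reading, a project only needs to state the fields it wants to narrow, inheriting everything else from its organization.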