Ingestion
Five pathways feed the knowledge base. Each one ends at the same checkpoint: an ingestion log entry recording what changed, what it cost, and which agent wrote it. Multimodal data is normalized into a common source-record model before compilation. The compiler then produces domain-specific pages for experiments, protocols, compounds, datasets, initiatives, people, organizations, decisions, and more.
Five entry points, one knowledge base. Every ingest is attributed, cost-tracked, deduplicated, and compiled into the right page type. The diagram highlights domain-specific outputs; the full supported set is listed below.
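Every pathway ends at an ingestion log entry. As a minimal sketch, assuming a Python data model, an entry could be an immutable record of what changed, what it cost, and which agent wrote it. The class name, field names, pathway keys, and the USD cost unit are all hypothetical, not Beakr's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple

# Hypothetical keys for the five ingestion pathways.
PATHWAYS = {"connector", "upload", "ask_agent_capture", "manual_creation", "health_maintenance"}


@dataclass(frozen=True)
class IngestionLogEntry:
    """Hypothetical shape of one ingestion log entry (not the real schema)."""
    pathway: str                    # which entry point produced the change
    changed_pages: Tuple[str, ...]  # IDs of pages created or updated (what changed)
    cost_usd: float                 # what the ingest cost (unit is an assumption)
    agent: str                      # which agent wrote the change
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def log_ingest(log: List[IngestionLogEntry], entry: IngestionLogEntry) -> None:
    """Append an entry after basic validation: every ingest must be attributed."""
    if entry.pathway not in PATHWAYS:
        raise ValueError(f"unknown pathway: {entry.pathway}")
    if not entry.agent:
        raise ValueError("ingest must be attributed to an agent")
    if entry.cost_usd < 0:
        raise ValueError("cost cannot be negative")
    log.append(entry)


log: List[IngestionLogEntry] = []
log_ingest(log, IngestionLogEntry("upload", ("page-123",), 0.004, "compiler-agent"))
```

Keeping the entry frozen means a logged record cannot be edited after the fact, which suits an audit-style checkpoint.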
What ingestion can create
Ingestion no longer ends at generic document pages. The compiler chooses a page type from the current knowledge model and captures the fields that matter for that type.
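A minimal sketch of the field-capture step, assuming the compiler has already picked a page type and extracted candidate values from the source. The `KNOWLEDGE_MODEL` mapping and every field name in it are hypothetical; the real knowledge model and the type-selection logic are not described here.

```python
from typing import Dict, Tuple

# Hypothetical knowledge model: page type -> fields captured for that type.
KNOWLEDGE_MODEL: Dict[str, Tuple[str, ...]] = {
    "experiment": ("hypothesis", "protocol", "results"),
    "protocol": ("steps", "materials"),
    "compound": ("name", "identifier"),
    "dataset": ("location", "schema"),
    "decision": ("decision", "rationale", "date"),
}


def compile_page(page_type: str, extracted: Dict[str, object]) -> Dict[str, object]:
    """Keep only the fields the chosen page type defines, and report gaps."""
    if page_type not in KNOWLEDGE_MODEL:
        raise ValueError(f"unknown page type: {page_type}")
    wanted = KNOWLEDGE_MODEL[page_type]
    return {
        "type": page_type,
        # Fields present in the extraction that this type cares about.
        "fields": {name: extracted[name] for name in wanted if name in extracted},
        # Fields this type expects but the source did not supply.
        "missing": [name for name in wanted if name not in extracted],
    }


page = compile_page("protocol", {"steps": ["mix", "incubate"], "color": "blue"})
```

Irrelevant extracted values (here `color`) are dropped rather than stored on a page type that does not define them.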
The five pathways
| Pathway | Mechanism | Write mode |
|---|---|---|
| Connectors | Background enumeration plus file and multimodal ingest, authorized via OAuth. Provider-specific, with a 3-tier fan-out. | async |
| Uploads | Direct file upload -> parse -> S3 -> compile. | async |
| Ask Agent Capture | After a substantive agent response, the capture agent evaluates whether the synthesis is worth adding to the knowledge base. | proposal |
| Manual Creation | Users or agents create pages via the knowledge base proposal tool. | proposal |
| Health Maintenance | The background health agent creates index pages to organize orphaned pages and reparents pages that have drifted. | direct |
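The write-mode column can be read as a routing decision. This sketch assumes `async` means the item is queued for a background worker, `proposal` means the page waits for review before it lands, and `direct` means it is written immediately; those semantics, the pathway keys, and the in-memory lists standing in for a job queue, a proposal inbox, and the knowledge base are all assumptions.

```python
from enum import Enum
from typing import Dict, List


class WriteMode(Enum):
    ASYNC = "async"
    PROPOSAL = "proposal"
    DIRECT = "direct"


# Hypothetical pathway keys mapped to the write modes in the table.
PATHWAY_WRITE_MODE: Dict[str, WriteMode] = {
    "connectors": WriteMode.ASYNC,
    "uploads": WriteMode.ASYNC,
    "ask_agent_capture": WriteMode.PROPOSAL,
    "manual_creation": WriteMode.PROPOSAL,
    "health_maintenance": WriteMode.DIRECT,
}


def route_write(pathway: str, page: dict, job_queue: List[dict],
                proposals: List[dict], kb: List[dict]) -> WriteMode:
    mode = PATHWAY_WRITE_MODE[pathway]
    if mode is WriteMode.ASYNC:
        job_queue.append(page)   # compiled later by a background worker
    elif mode is WriteMode.PROPOSAL:
        proposals.append(page)   # held until a reviewer accepts it
    else:
        kb.append(page)          # written straight into the knowledge base
    return mode


queue: List[dict] = []
inbox: List[dict] = []
kb: List[dict] = []
route_write("uploads", {"id": "u1"}, queue, inbox, kb)
route_write("ask_agent_capture", {"id": "c1"}, queue, inbox, kb)
route_write("health_maintenance", {"id": "idx1"}, queue, inbox, kb)
```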
Connector flow: the 3-tier fan-out
One trigger fans out to N items. Each item is downloaded, stored in S3, and then passed to the compiler for structuring.
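A minimal asyncio sketch of the fan-out, assuming the three tiers are (1) the trigger, (2) enumeration of the N items, and (3) the per-item download, store, and compile pipeline. The provider calls, the dict standing in for an S3 bucket, the `raw/` key prefix, and the concurrency cap are hypothetical placeholders.

```python
import asyncio
from typing import Dict, List


async def enumerate_items(source: str) -> List[str]:
    """Tier 2: list the item IDs behind one connector source (stand-in)."""
    await asyncio.sleep(0)  # placeholder for a provider listing API call
    return [f"{source}/doc-{n}" for n in range(3)]


async def download(item_id: str) -> bytes:
    await asyncio.sleep(0)  # placeholder for a provider download call
    return f"content of {item_id}".encode()


async def compile_item(key: str, bucket: Dict[str, bytes]) -> dict:
    # Placeholder compiler: only records where the raw bytes live and their size.
    return {"source_key": key, "size": len(bucket[key])}


async def ingest_item(item_id: str, bucket: Dict[str, bytes],
                      sem: asyncio.Semaphore) -> dict:
    """Tier 3: download -> store -> compile for a single item."""
    async with sem:  # cap how many items are in flight at once
        data = await download(item_id)
        key = f"raw/{item_id}"
        bucket[key] = data  # in-memory stand-in for an S3 put
        return await compile_item(key, bucket)


async def run_trigger(source: str, max_concurrency: int = 4) -> List[dict]:
    """Tier 1: one trigger fans out to every enumerated item."""
    bucket: Dict[str, bytes] = {}
    sem = asyncio.Semaphore(max_concurrency)
    item_ids = await enumerate_items(source)
    # gather returns results in the same order as item_ids
    return list(await asyncio.gather(*(ingest_item(i, bucket, sem) for i in item_ids)))


pages = asyncio.run(run_trigger("drive"))
```

The semaphore keeps a large enumeration from opening hundreds of provider downloads at once, while `asyncio.gather` still lets items proceed independently.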
Deduplication via source records
Every item ingested from a connector is tracked as a source record. The combination of provider and external ID uniquely identifies each source item, so an item already on record is recognized rather than ingested as new. When checksums are populated, that identity also enables delta sync: unchanged content is skipped instead of re-ingested.
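A minimal sketch of the deduplication check, assuming an in-memory store keyed on `(provider, external_id)` and a SHA-256 digest of the content as the checksum. The class and method names and the provider string are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class SourceRecord:
    provider: str
    external_id: str
    checksum: Optional[str] = None  # may be unpopulated for some items


class SourceRecordStore:
    """In-memory stand-in for source-record storage."""

    def __init__(self) -> None:
        # (provider, external_id) uniquely identifies a source item.
        self._records: Dict[Tuple[str, str], SourceRecord] = {}

    def check_and_record(self, provider: str, external_id: str, content: bytes) -> bool:
        """Return True if the item should be (re)ingested, and store its checksum."""
        key = (provider, external_id)
        checksum = hashlib.sha256(content).hexdigest()
        existing = self._records.get(key)
        # Skip only when a stored checksum proves the content is unchanged;
        # a record without a checksum cannot support delta sync, so re-ingest.
        if existing is not None and existing.checksum == checksum:
            return False
        self._records[key] = SourceRecord(provider, external_id, checksum)
        return True


store = SourceRecordStore()
```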
This change-tracking infrastructure exists but is not yet fully wired up. Source records store fingerprints, modified timestamps, and source-to-page joins, so Beakr can identify what changed, when it changed, and which downstream pages or graph edges may need re-evaluation. See Connector sync for the full sync lifecycle.
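Finding what to re-evaluate amounts to a join from changed source items to the pages they feed. A minimal sketch, assuming change detection compares each item's modified timestamp against the last sync time and that join rows are `(provider, external_id, page_id)` tuples; the helper names and sample data are hypothetical, and graph edges are left out.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Set, Tuple

SourceKey = Tuple[str, str]  # (provider, external_id)


def changed_since(modified_at: Dict[SourceKey, datetime], last_sync: datetime) -> Set[SourceKey]:
    """Source items whose modified timestamp is newer than the last sync."""
    return {key for key, ts in modified_at.items() if ts > last_sync}


def pages_to_reevaluate(joins: Iterable[Tuple[str, str, str]],
                        changed: Set[SourceKey]) -> List[str]:
    """Follow source-to-page join rows to the pages fed by changed items."""
    pages_by_source: Dict[SourceKey, Set[str]] = defaultdict(set)
    for provider, external_id, page_id in joins:
        pages_by_source[(provider, external_id)].add(page_id)
    affected: Set[str] = set()
    for key in changed:
        affected |= pages_by_source.get(key, set())
    return sorted(affected)


joins = [
    ("google_drive", "file-1", "page-a"),
    ("google_drive", "file-1", "page-b"),
    ("google_drive", "file-2", "page-c"),
]
modified = {
    ("google_drive", "file-1"): datetime(2024, 5, 2, tzinfo=timezone.utc),
    ("google_drive", "file-2"): datetime(2024, 4, 1, tzinfo=timezone.utc),
}
changed = changed_since(modified, datetime(2024, 5, 1, tzinfo=timezone.utc))
affected = pages_to_reevaluate(joins, changed)
```

One source item can feed several pages, so the join fans a single change out to every page that depends on it.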