Beakr

Ingestion

Five pathways feed the knowledge base. Each one ends at the same checkpoint -- an ingestion log entry recording what changed, what it cost, and which agent wrote it. Multi-modal data normalizes into a common source-record model before compilation. The compiler then produces domain-specific pages for experiments, protocols, compounds, datasets, initiatives, people, organizations, decisions, and more.

[Diagram: Connectors, Uploads, Ask Agent Capture, Manual Creation, and Health Maintenance -> Knowledge Base -> experiment, compound, protocol, dataset, initiative]

Five entry points, one knowledge base. Every ingest is attributed, cost-tracked, deduplicated, and compiled into the right page type. The diagram highlights domain-specific outputs; the full supported set is listed below.
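
As an illustration of the shared checkpoint, the log entry can be pictured as a small immutable record. This is a minimal sketch only: the class name, field names, and the cost unit are assumptions made for the example, not Beakr's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestionLogEntry:
    """Hypothetical log entry; every field name here is an assumption."""
    pathway: str                    # which of the five pathways produced the ingest
    agent: str                      # which agent wrote the change
    pages_changed: tuple[str, ...]  # what changed
    cost_usd: float                 # what it cost (the unit is assumed)
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def record_ingest(log: list, pathway: str, agent: str,
                  pages_changed: list[str], cost_usd: float) -> IngestionLogEntry:
    """Append one entry to an in-memory log; every pathway would end here."""
    entry = IngestionLogEntry(pathway, agent, tuple(pages_changed), cost_usd)
    log.append(entry)
    return entry
```

Freezing the record reflects the idea that a log entry describes something that already happened and should not be edited afterwards.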

What ingestion can create

Ingestion no longer ends at generic document pages. The compiler chooses a page type from the current knowledge model and captures the fields that matter for that type.

All page types
Topics, people, organizations, decisions, meetings, overviews, research notes, experiments, protocols, compounds, datasets, and initiatives.
Original knowledge pages
Topics, people, organizations, decisions, meetings, overviews, and research notes remain first-class outputs.
Domain pages
Experiments, protocols, compounds, datasets, and initiatives add life-sciences-specific structure on top of the original model.
Topic / org pages
Capture concepts, programs, companies, labs, partners, competitors, regulators, aliases, and how they connect to the rest of the graph.
Decision / meeting pages
Capture what was discussed, what was decided, who was involved, open questions, and follow-up actions.
Experiment pages
Capture objective or hypothesis, dates, setup, materials or compounds, conditions, results, interpretation, and follow-ups.
Compound pages
Capture aliases, modality, target or mechanism, formulation or dose, indication, status, and supporting evidence.
Dataset pages
Capture source, scope, schema or measures, generation method, date or version, key findings, limitations, and location.
Resource layer
The engine also tracks underlying resources such as artifacts, protocols, experiment runs, samples, and instruments for ACLs, provenance, and lineage.

The five pathways

Pathway | Mechanism | Write mode
Connectors | Background enumeration + file and multimodal ingest via OAuth. Provider-specific. 3-tier fan-out. | async
Uploads | Direct file upload -> parse -> S3 -> compile. | async
Ask Agent Capture | After substantive agent responses, the capture agent evaluates the synthesis for knowledge-base-worthiness. | proposal
Manual Creation | Users or agents create pages via the knowledge base proposal tool. | proposal
Health Maintenance | The background health agent creates index pages to organize orphans and reparent drifting pages. | direct
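
The pathway-to-write-mode mapping above can be restated as a lookup table. The sketch below is illustrative only; the identifiers (`WriteMode`, `WRITE_MODE_BY_PATHWAY`, `pathways_with_mode`) are hypothetical and do not come from Beakr's code.

```python
from enum import Enum


class WriteMode(Enum):
    ASYNC = "async"
    PROPOSAL = "proposal"
    DIRECT = "direct"


# Hypothetical keys; the values restate the "Write mode" column.
WRITE_MODE_BY_PATHWAY = {
    "connectors": WriteMode.ASYNC,
    "uploads": WriteMode.ASYNC,
    "ask_agent_capture": WriteMode.PROPOSAL,
    "manual_creation": WriteMode.PROPOSAL,
    "health_maintenance": WriteMode.DIRECT,
}


def pathways_with_mode(mode: WriteMode) -> list[str]:
    """Return the pathways that write in the given mode, in table order."""
    return [name for name, m in WRITE_MODE_BY_PATHWAY.items() if m is mode]
```

For example, `pathways_with_mode(WriteMode.PROPOSAL)` returns the two pathways that write through proposals.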

Connector flow: the 3-tier fan-out

[Diagram: Tier 1 (trigger): sync trigger -> Tier 2 (enumerate): list items / scope / quota -> Tier 3 (item), one per item: download + S3 -> Compile: compiler agent -> Ingestion log]

One trigger fans out to N items. Each item downloads, hits S3, then goes to the compiler for structuring.
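
The fan-out can be sketched in a few functions. This is a toy model under stated assumptions: a dict stands in for S3, a list stands in for the ingestion log, a thread pool stands in for whatever job system actually runs the tiers, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def enumerate_items(scope: str, quota: int) -> list[str]:
    """Tier 2: list the item IDs in scope, capped by quota (stand-in data)."""
    return [f"{scope}/item-{i}" for i in range(quota)]


def download_and_store(item_id: str, blob_store: dict) -> str:
    """Tier 3: fetch one item and persist its raw bytes (dict stands in for S3)."""
    key = f"raw/{item_id}"
    blob_store[key] = f"contents of {item_id}".encode()
    return key


def compile_item(key: str, blob_store: dict, ingestion_log: list) -> None:
    """Compile step: read the stored bytes and append a log entry."""
    ingestion_log.append({"source_key": key, "bytes": len(blob_store[key])})


def run_sync(scope: str, quota: int, blob_store: dict, ingestion_log: list) -> list[str]:
    """Tier 1: a single trigger that fans out to one Tier 3 task per item."""
    items = enumerate_items(scope, quota)
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves input order, so keys line up with items.
        keys = list(pool.map(lambda i: download_and_store(i, blob_store), items))
    for key in keys:
        compile_item(key, blob_store, ingestion_log)
    return keys
```

Keeping each item as its own task means a failed download affects one item rather than the whole sync, which is the usual motivation for this shape.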

Deduplication via source records

Every item ingested from a connector is tracked as a source record. The combination of provider (the source system) and external ID (the source identifier) uniquely identifies each source item. This prevents re-ingestion of unchanged content and enables delta sync once content hashes are populated.

Source system
Which service the content came from (Slack, Drive, Notion, etc.)
Source identifier
The service's native identifier for the document.
Content hash
Detects whether content has changed since last sync.
Metadata snapshot
Last modified time, owner, size, file type -- used for staleness detection.
Source link
Join record linking a source record to the knowledge base page(s) it produced. One source can produce multiple pages; one page can have multiple sources.
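
A minimal sketch of how these fields could work together, assuming an in-memory registry keyed by (provider, external ID) and SHA-256 as the content hash. The class and method names are hypothetical, and the hash algorithm is an assumption; the source does not specify one.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    provider: str          # source system, e.g. "drive"
    external_id: str       # the service's native identifier
    content_hash: str      # SHA-256 here is an assumption
    page_ids: set = field(default_factory=set)  # source links to pages


class SourceRegistry:
    """Hypothetical in-memory stand-in for the source record store."""

    def __init__(self) -> None:
        self._records: dict = {}

    def should_ingest(self, provider: str, external_id: str, content: bytes) -> bool:
        """Return False when this item was already ingested with identical content."""
        digest = hashlib.sha256(content).hexdigest()
        key = (provider, external_id)
        existing = self._records.get(key)
        if existing is not None and existing.content_hash == digest:
            return False  # unchanged: skip re-ingestion
        if existing is None:
            self._records[key] = SourceRecord(provider, external_id, digest)
        else:
            existing.content_hash = digest  # changed: remember the new hash
        return True

    def link(self, provider: str, external_id: str, page_id: str) -> None:
        """Record that a source produced (or contributed to) a page."""
        self._records[(provider, external_id)].page_ids.add(page_id)

    def pages_for(self, provider: str, external_id: str) -> set:
        return set(self._records[(provider, external_id)].page_ids)
```

Because the key includes the provider, the same native ID in two different services is treated as two distinct sources.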

Delta sync and change awareness

The infrastructure for delta sync exists but is not yet fully wired up. Source records store fingerprints (content hashes), modified timestamps, and source-to-page joins so Beakr can identify what changed, when it changed, and which downstream pages or graph edges may need re-evaluation. See Connector sync for the full sync lifecycle.
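
To make the change-awareness idea concrete, the sketch below compares stored fingerprints with freshly observed ones and follows the source-to-page joins to find pages that may need re-evaluation. It is an illustration under assumptions: the function name, argument shapes, and page IDs are hypothetical, and graph edges are left out.

```python
def pages_to_reevaluate(stored: dict, current: dict, source_links: list) -> list:
    """
    stored:       {source_key: fingerprint recorded at the last sync}
    current:      {source_key: fingerprint observed now}
    source_links: [(source_key, page_id), ...] source-to-page joins
    Returns the sorted page IDs linked to any new or changed source.
    """
    changed = {
        key for key, fingerprint in current.items()
        if stored.get(key) != fingerprint
    }
    return sorted({page for key, page in source_links if key in changed})


stored = {("drive", "a"): "h1", ("drive", "b"): "h2"}
current = {("drive", "a"): "h1", ("drive", "b"): "h3"}
links = [
    (("drive", "a"), "experiment/1"),
    (("drive", "b"), "experiment/1"),
    (("drive", "b"), "compound/7"),
]
affected = pages_to_reevaluate(stored, current, links)
```

Here only the second source changed, so `affected` contains both pages it links to, including the experiment page it shares with the unchanged source.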