Integrations

Beakr connects to the tools your team already uses. Connectors keep syncing as source content changes, so knowledge stays current -- no manual re-uploads, no stale documents.

Overview

Integrations are how external systems become Beakr memory. A connector handles authentication, scope, enumeration, download or API fetch, source-record tracking, and handoff into the ingestion and compiler pipeline.

Supported integrations

Each connector pulls data from a source platform, processes it through the ingestion pipeline, and writes structured pages into the knowledge base. The tables below list supported integrations, grouped by category.

Cloud storage

File-based connectors enumerate folders, download documents (PDFs, DOCX, PPTX, XLSX, images, CSVs, and more), extract text, chunk it, and embed it. When files change in the source, the connector re-ingests only the delta.

Platform	What syncs	Notes
Google Drive	All file types in selected drives or folders	Supports Shared Drives and My Drive. Google Docs exported as HTML for richer parsing.
Dropbox	Files and folders in selected paths	Business and personal accounts. Recursive enumeration with path filtering.
OneDrive	Files and folders from user drives or SharePoint sites	Integrated via Microsoft Graph API. Supports both personal and organizational accounts.
SharePoint	Document libraries and site pages	Site-level scoping. Lists and library content extracted and structured.
Box	Files and folders in selected directories	Enterprise Box accounts. Folder-level access control respected during enumeration.

Communication

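Chat connectors (Slack and Microsoft Teams) use snapshot-based ingestion. Rather than syncing individual messages as documents, they capture conversations and threads as temporal snapshots with speaker attribution preserved. Email connectors ingest at the thread or message level, with sender, recipients, and timestamps kept as metadata.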

Platform	What syncs	Notes
Slack	Channel messages, threads, and reactions	Snapshot ingestion. Speaker names attached to each message. Thread context preserved.
Gmail	Email threads from selected labels or all mail	Thread-level ingestion. Sender, recipients, and timestamps extracted as metadata.
Microsoft Teams	Channel messages and replies	Snapshot ingestion. Team and channel hierarchy maintained. Speaker attribution preserved.
Outlook	Email messages and calendar events	Integrated via Microsoft Graph. Folder-level scoping available.
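Thread-level email ingestion depends on pulling sender, recipient, and timestamp fields out of each message. The sketch below does this for a raw RFC 5322 message using only Python's standard `email` package. `EmailMetadata` and `extract_metadata` are hypothetical names for illustration, not Beakr's code; the real Gmail and Outlook connectors receive messages through their provider APIs rather than as raw strings.

```python
# Illustrative sketch: EmailMetadata and extract_metadata are hypothetical,
# not Beakr's API. Only stdlib email parsing is used.
from dataclasses import dataclass
from datetime import datetime
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime


@dataclass
class EmailMetadata:
    sender: str
    recipients: list[str]
    sent_at: datetime
    subject: str


def extract_metadata(raw: str) -> EmailMetadata:
    """Parse a raw RFC 5322 message and keep the fields used as metadata."""
    msg = message_from_string(raw)
    # getaddresses handles display names and comma-separated address lists.
    senders = getaddresses(msg.get_all("From", []))
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return EmailMetadata(
        sender=senders[0][1] if senders else "",
        recipients=[addr for _, addr in recipients if addr],
        sent_at=parsedate_to_datetime(msg["Date"]),
        subject=msg.get("Subject", ""),
    )


raw = (
    "From: Ada Lovelace <ada@example.com>\n"
    "To: bob@example.com, Carol <carol@example.com>\n"
    "Cc: dan@example.com\n"
    "Date: Tue, 05 Mar 2024 10:15:00 +0000\n"
    "Subject: Trial protocol v2\n"
    "\n"
    "Draft attached.\n"
)
print(extract_metadata(raw))
```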

Project management

Project management connectors sync issues, pages, and workspaces into the knowledge base. Each item becomes a structured knowledge base page with metadata (status, assignee, labels) preserved.

Platform	What syncs	Notes
Jira	Issues, epics, and project metadata	Supports Jira Cloud. Issue descriptions, comments, status, and custom fields extracted.
Confluence	Spaces and pages	Full page content with attachments. Space-level scoping for selective sync.
Notion	Pages, databases, and nested content	Recursive page tree traversal. Database properties mapped to structured metadata.
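Each project-management item becomes a page with its metadata preserved. A minimal sketch of that mapping, assuming a generic issue dictionary: the field names (`key`, `summary`, `description`, `status`, `assignee`, `labels`, `comments`) and the `KBPage` type are hypothetical and do not reflect the actual Jira, Confluence, or Notion API payloads.

```python
# Illustrative sketch: KBPage and the issue field names are hypothetical,
# not Beakr's schema or any provider's real payload.
from dataclasses import dataclass, field


@dataclass
class KBPage:
    title: str
    body: str
    metadata: dict[str, object] = field(default_factory=dict)


def issue_to_page(issue: dict) -> KBPage:
    """Turn a project-management item into a page, keeping key fields as metadata."""
    body_parts = [issue.get("description", "")]
    # Comments are appended to the body so their text is searchable too.
    body_parts += [f"{c['author']}: {c['text']}" for c in issue.get("comments", [])]
    return KBPage(
        title=f"{issue['key']}: {issue['summary']}",
        body="\n\n".join(part for part in body_parts if part),
        metadata={
            "status": issue.get("status"),
            "assignee": issue.get("assignee"),
            "labels": sorted(issue.get("labels", [])),
            "source_id": issue["key"],
        },
    )


page = issue_to_page({
    "key": "LAB-42",
    "summary": "Validate assay protocol",
    "description": "Run the revised protocol on batch 7.",
    "status": "In Progress",
    "assignee": "j.chen",
    "labels": ["qa", "assay"],
    "comments": [{"author": "m.ruiz", "text": "Batch 7 is ready."}],
})
print(page.title)
```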

Development

Platform	What syncs	Notes
GitHub	Repository files, READMEs, issues, and pull requests	Supports public and private repos. Branch-level scoping. Markdown files prioritized.

Calendar

Platform	What syncs	Notes
Google Calendar	Events, attendees, descriptions, and meeting notes	Snapshot-based. Temporal metadata extracted for timeline queries.

Research tools

Platform	What syncs	Notes
Zotero	Library items, PDFs, and annotations	Group and personal libraries. Citation metadata preserved.
Overleaf	LaTeX projects and compiled documents	Project-level sync. Compiled PDF and source .tex files both ingested.

Scientific platforms

Platform	What syncs	Notes
Benchling Enterprise	Notebook entries, protocols, sequences, and registry entities	Enterprise API integration. Structured scientific data preserved with schema context.
Labguru	Experiments, protocols, and inventory records	ELN data extracted with experimental metadata and relationships intact.

Public databases

Public database connectors query external scientific and academic databases on demand and ingest results into the knowledge base. These do not require OAuth -- they use public APIs. Beakr has handlers for 25+ public and scientific sources; the table below groups representative coverage rather than listing every endpoint individually.

Database	What syncs	Notes
PubMed / PMC / bioRxiv	Article abstracts, full text where available, preprints, metadata, and MeSH terms	Search by keyword, author, PMID, DOI, or public identifier. Full citation metadata preserved.
ClinicalTrials.gov / NIH RePORTER	Trial records, endpoints, sponsors, funded projects, and linked publications	NCT, grant, investigator, and topic lookup. Trial phase, status, and funding context tracked.
UniProt / AlphaFold / PDB	Protein entries, structures, sequences, and functional annotations	Accession and structure lookup with cross-references to genes, pathways, and literature.
KEGG / Reactome / STRING	Pathways, interactions, compounds, and gene relationships	Pathway-level ingestion with graph-ready cross-references.
OpenAlex / OpenNeuro / USPTO	Scholarly works, authors, institutions, datasets, and patent records	Broad academic and IP search. Metadata includes citations, topics, datasets, inventors, and assignees.
PubChem / ChEMBL / HMDB / openFDA	Molecules, bioactivity, metabolites, labels, adverse events, and recalls	Chemical and regulatory context for drug discovery and translational workflows.
Ensembl / ClinVar / GWAS Catalog / GDC / cBioPortal	Genes, variants, studies, cancer genomics, and cohort-level datasets	Variant and disease context with stable identifiers for downstream provenance.
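Public database handlers call each provider's public HTTP API. As one concrete example, NCBI's E-utilities `esearch` endpoint returns PubMed IDs for a query. The sketch below only builds the request URL (no network call), so it runs offline. Whether Beakr's PubMed handler uses E-utilities, and with which parameters, is an assumption here, and `build_pubmed_search_url` is a hypothetical helper.

```python
# Illustrative sketch: build_pubmed_search_url is hypothetical; it only shows
# the shape of a public, OAuth-free API request to NCBI E-utilities.
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that returns matching PMIDs as JSON."""
    if retmax < 1:
        raise ValueError("retmax must be positive")
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


url = build_pubmed_search_url("CRISPR[Title] AND 2023[PDAT]", retmax=5)
print(url)
# Fetching this URL returns JSON whose "esearchresult" -> "idlist" field
# holds the matching PMIDs.
```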

How sync works

Every connector follows the same five-stage pipeline, regardless of the source platform. This consistency means that once data enters Beakr, it is structured, searchable, and attributed the same way whether it came from Slack or a PDF in Google Drive.

1. Connector configuration

Each connector is configured with a scope and a mode:

Setting	Options	Description
Scope	`user`, `group`, `org`	Determines who can access the synced data. User-scoped connectors are private. Org-scoped connectors share data across the organization.
Mode	`all`, `restricted`	In `all` mode, the connector syncs everything it has access to. In `restricted` mode, you select specific folders, channels, or items to sync.
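One way to model these two settings is shown below, including the rule that `restricted` mode needs an explicit selection. `ConnectorConfig` and its `selected_items` field are hypothetical names for illustration; Beakr's real configuration schema may differ.

```python
# Illustrative sketch: ConnectorConfig and its fields are hypothetical,
# not Beakr's actual configuration schema.
from dataclasses import dataclass
from typing import Literal

Scope = Literal["user", "group", "org"]
Mode = Literal["all", "restricted"]


@dataclass(frozen=True)
class ConnectorConfig:
    provider: str
    scope: Scope
    mode: Mode
    # Folder paths, channel IDs, etc. Only meaningful in restricted mode.
    selected_items: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if self.scope not in ("user", "group", "org"):
            raise ValueError(f"unknown scope: {self.scope}")
        if self.mode not in ("all", "restricted"):
            raise ValueError(f"unknown mode: {self.mode}")
        if self.mode == "restricted" and not self.selected_items:
            raise ValueError("restricted mode needs at least one selected item")


cfg = ConnectorConfig(
    provider="slack", scope="group", mode="restricted", selected_items=("#trial-ops",)
)
```

Validating in `__post_init__` means an invalid combination is rejected when the configuration is created, rather than surfacing later as an empty sync.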

2. OAuth authentication

OAuth-based connectors authenticate through the provider's standard OAuth flow: when a user connects a platform, they authorize Beakr with that provider. Beakr manages token storage, refresh, and rotation; OAuth tokens are never exposed to end users or stored alongside application data. Public database connectors use public APIs and skip this step.
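The refresh behavior can be sketched as follows: use the access token while it is valid, refresh it shortly before expiry, and surface an `expired` condition when the refresh fails, which corresponds to the `expired` health status described under Connector health. `TokenSet`, `ensure_fresh`, `TokenExpired`, the 60-second skew, and the `refresh` callable are assumptions for illustration. In practice `refresh` would call the provider's OAuth token endpoint, and Beakr's token storage is not shown.

```python
# Illustrative sketch: all names and the 60 s skew are hypothetical.
# `refresh` stands in for a call to the provider's OAuth token endpoint.
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp in seconds


class TokenExpired(Exception):
    """Raised when a token cannot be refreshed (maps to the `expired` status)."""


def ensure_fresh(
    tokens: TokenSet,
    refresh: Callable[[str], Optional[TokenSet]],
    skew_seconds: float = 60.0,
    now: Optional[float] = None,
) -> TokenSet:
    """Return a usable token set, refreshing it if it is about to expire."""
    current = time.time() if now is None else now
    if tokens.expires_at - skew_seconds > current:
        return tokens  # Still valid; no refresh needed.
    new_tokens = refresh(tokens.refresh_token)
    if new_tokens is None:
        raise TokenExpired("refresh failed; the user must re-authenticate")
    return new_tokens


# Usage: the token expires at t=1000, so at t=990 it is inside the skew window.
tokens = TokenSet("a1", "r1", expires_at=1000.0)
fresh = ensure_fresh(tokens, lambda rt: TokenSet("a2", rt, 5000.0), now=990.0)
```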

3. Enumeration

Provider-specific handlers list all available items from the source. Each provider has its own enumeration logic (e.g., listing files in Drive, channels in Slack, pages in Confluence). The enumeration step produces a manifest of items to ingest, filtered by the connector's scope and mode settings.
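The mode part of that filter might look like the sketch below, assuming each handler yields items with a path-like location (a folder path in Drive, a channel name in Slack). `ManifestItem` and `build_manifest` are hypothetical names, and prefix matching on path boundaries is one plausible selection rule, not a documented one.

```python
# Illustrative sketch: ManifestItem, build_manifest, and prefix matching are
# assumptions, not Beakr's enumeration code. Scope filtering is not shown.
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ManifestItem:
    item_id: str
    path: str      # e.g. "/Research/Protocols/assay.pdf" or "#trial-ops"
    revision: str  # provider revision ID, ETag, or modified timestamp


def _is_selected(path: str, selected: tuple[str, ...]) -> bool:
    # In scope if the path equals a selection or sits beneath it.
    return any(path == s or path.startswith(s.rstrip("/") + "/") for s in selected)


def build_manifest(
    items: Iterable[ManifestItem], mode: str, selected: tuple[str, ...] = ()
) -> list[ManifestItem]:
    """Filter enumerated items according to the connector's mode."""
    if mode == "all":
        return list(items)
    if mode == "restricted":
        return [item for item in items if _is_selected(item.path, selected)]
    raise ValueError(f"unknown mode: {mode}")


items = [
    ManifestItem("1", "/Research/Protocols/assay.pdf", "v3"),
    ManifestItem("2", "/Research/ProtocolsOld/draft.pdf", "v1"),
    ManifestItem("3", "/HR/handbook.pdf", "v7"),
]
print([i.item_id for i in build_manifest(items, "restricted", ("/Research/Protocols",))])
```

Matching on a trailing `/` keeps `/Research/ProtocolsOld` from being swept in by a selection of `/Research/Protocols`.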

4. Ingestion

Each enumerated item is downloaded, parsed, chunked, and embedded. File-based items go through `ingest_file_item` (binary download, text extraction, chunking). Document-based items go through `ingest_document_item` (API-fetched content, structured parsing). Both paths produce the same output: chunked text with source metadata ready for the knowledge base.
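Both ingestion paths end with text split into chunks that carry source metadata. The sketch below shows a simple fixed-size character chunker with overlap. The chunk size, overlap, `Chunk` fields, and the character-based strategy are all assumptions, since this page does not specify how `ingest_file_item` or `ingest_document_item` actually chunk text; embedding is not shown.

```python
# Illustrative sketch: Chunk, chunk_text, and the size/overlap defaults are
# hypothetical, not the chunking used by Beakr's ingestion functions.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source_id: str
    connector: str
    index: int


def chunk_text(
    text: str, source_id: str, connector: str, size: int = 800, overlap: int = 100
) -> list[Chunk]:
    """Split text into overlapping windows, tagging each with its source."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window would add nothing beyond the previous window's overlap.
    starts = range(0, max(len(text) - overlap, 1), step)
    return [
        Chunk(text[start:start + size], source_id, connector, index)
        for index, start in enumerate(starts)
    ]


chunks = chunk_text("abcdefghij", source_id="doc-1", connector="google_drive", size=4, overlap=1)
print([c.text for c in chunks])
```

Overlap keeps a sentence that straddles a window boundary retrievable from either neighboring chunk.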

5. Compilation

The compilation step takes ingested chunks and creates or updates structured knowledge base pages. New information is merged with existing pages. Attribution is tracked at the paragraph level so every statement can be traced back to the source document and connector that produced it.
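Paragraph-level attribution means each paragraph on a page carries the set of sources behind it. A minimal sketch follows, assuming a page is a list of paragraphs and that merging either adds a source to a matching paragraph or appends a new one. Whitespace- and case-insensitive exact matching stands in for whatever comparison the compiler really uses, and `Paragraph`, `SourceRef`, and `merge_paragraph` are hypothetical.

```python
# Illustrative sketch: Paragraph, SourceRef, and merge_paragraph are
# hypothetical; normalized exact matching replaces Beakr's real merge logic.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceRef:
    connector: str  # e.g. "slack", "google_drive"
    document_id: str


@dataclass
class Paragraph:
    text: str
    sources: set[SourceRef] = field(default_factory=set)


def _normalize(text: str) -> str:
    return " ".join(text.split()).lower()


def merge_paragraph(page: list[Paragraph], text: str, source: SourceRef) -> None:
    """Attach `source` to a matching paragraph or append a new attributed one."""
    for para in page:
        if _normalize(para.text) == _normalize(text):
            para.sources.add(source)  # Same statement found in another source.
            return
    page.append(Paragraph(text, {source}))


page: list[Paragraph] = []
merge_paragraph(page, "Batch 7 passed QA.", SourceRef("slack", "C01/1709633700.1"))
merge_paragraph(page, "batch 7  passed QA.", SourceRef("google_drive", "file_abc"))
merge_paragraph(page, "Batch 8 is scheduled for April.", SourceRef("jira", "LAB-43"))
for para in page:
    print(para.text, sorted(s.connector for s in para.sources))
```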

Continuous sync

Connectors re-enumerate on a schedule and in response to webhook triggers where supported. Only changed or new items are re-ingested. Deleted items are flagged and their knowledge base contributions marked accordingly. This means the knowledge base stays current without manual intervention.
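Deciding what to re-ingest comes down to comparing the new manifest with the one recorded at the last sync. A sketch, assuming each item exposes a stable ID and a revision marker (such as an ETag or modified timestamp); `SyncPlan`, `diff_manifests`, and the dictionary shapes are illustrative, not Beakr's API.

```python
# Illustrative sketch: SyncPlan and diff_manifests are hypothetical names.
# Manifests are modeled as {item_id: revision} dictionaries.
from dataclasses import dataclass


@dataclass
class SyncPlan:
    new: list[str]
    changed: list[str]
    deleted: list[str]


def diff_manifests(previous: dict[str, str], current: dict[str, str]) -> SyncPlan:
    """Compare two enumerations and decide what to ingest or flag."""
    new = sorted(i for i in current if i not in previous)
    changed = sorted(i for i in current if i in previous and current[i] != previous[i])
    deleted = sorted(i for i in previous if i not in current)
    return SyncPlan(new=new, changed=changed, deleted=deleted)


plan = diff_manifests(
    previous={"a": "v1", "b": "v4", "c": "v2"},
    current={"a": "v1", "b": "v5", "d": "v1"},
)
print(plan)  # SyncPlan(new=['d'], changed=['b'], deleted=['c'])
```

Only `new` and `changed` go back through ingestion; `deleted` items are flagged rather than re-fetched, and unchanged items like `a` cost nothing beyond the comparison.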

Communication connectors

Slack, Microsoft Teams, and Google Calendar use a distinct ingestion model: snapshot-based ingestion. Instead of treating each message as a separate document, the system captures conversations and events as cohesive snapshots.

Speaker attribution -- every message is tagged with the speaker's name and role, so the knowledge base knows who said what.
Thread context -- replies are grouped with their parent message. A Slack thread about a decision becomes a single, coherent knowledge unit rather than a bag of isolated messages.
Temporal metadata -- timestamps are extracted and indexed, enabling timeline queries like "what did the team discuss about the trial protocol in March?"
Channel and team scoping -- you choose which channels or teams to sync. Private channels require explicit opt-in.
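Thread grouping and speaker attribution can be sketched as below. The message fields `ts`, `thread_ts`, `user`, and `text` follow the shape of Slack's message objects, but `build_snapshots`, the `names` lookup, and the snapshot dictionary format are hypothetical. Teams and Calendar data would need their own field mapping.

```python
# Illustrative sketch: build_snapshots and the snapshot format are hypothetical.
# Input messages mimic Slack's ts / thread_ts / user / text fields.
from collections import defaultdict
from datetime import datetime, timezone


def build_snapshots(messages: list[dict], names: dict[str, str]) -> list[dict]:
    """Group messages into one snapshot per thread, with speakers and timestamps."""
    threads: dict[str, list[dict]] = defaultdict(list)
    for msg in messages:
        # Replies share the parent's thread_ts; standalone messages use their own ts.
        threads[msg.get("thread_ts", msg["ts"])].append(msg)

    snapshots = []
    for root_ts, msgs in threads.items():
        msgs.sort(key=lambda m: float(m["ts"]))
        speaker = lambda m: names.get(m["user"], m["user"])
        snapshots.append({
            "thread_ts": root_ts,
            "started_at": datetime.fromtimestamp(float(msgs[0]["ts"]), tz=timezone.utc),
            "speakers": sorted({speaker(m) for m in msgs}),
            "text": "\n".join(f"{speaker(m)}: {m['text']}" for m in msgs),
        })
    return sorted(snapshots, key=lambda s: s["started_at"])


messages = [
    {"ts": "1709633700.000100", "user": "U1", "text": "Should we switch to protocol B?"},
    {"ts": "1709633760.000200", "thread_ts": "1709633700.000100", "user": "U2",
     "text": "Yes, B reduced variance."},
    {"ts": "1709640000.000300", "user": "U2", "text": "Lunch at noon."},
]
snapshots = build_snapshots(messages, names={"U1": "Ada", "U2": "Ben"})
for snap in snapshots:
    print(snap["started_at"], snap["speakers"])
```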

Connector health

Every connector has a `health_status` that Beakr monitors continuously. This lets administrators catch issues before they cause knowledge gaps.

Status	Meaning	Action
`healthy`	Connector is syncing normally. Last sync completed without errors.	None required.
`degraded`	Some items failed to sync but the connector is still partially operational.	Review error logs. Often caused by permission changes on individual files or folders.
`expired`	OAuth token has expired and could not be refreshed automatically.	Re-authenticate through the connector settings to issue a new token.
`revoked`	Access was revoked at the source platform (e.g., app uninstalled from Slack workspace).	Re-authorize the integration from the source platform, then re-authenticate in Beakr.

Health status is surfaced in the Beakr dashboard and through the `kb_stats` MCP tool. Degraded or expired connectors trigger alerts so your team can resolve the issue promptly.
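A sketch of how the four statuses could be derived from the outcome of the last sync. The `SyncResult` fields and the precedence order (revoked, then expired, then degraded) are assumptions for illustration; this page does not describe Beakr's actual health checks.

```python
# Illustrative sketch: SyncResult fields and the precedence order are
# assumptions; only the four status values come from the table above.
from dataclasses import dataclass
from enum import Enum


class HealthStatus(str, Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    EXPIRED = "expired"
    REVOKED = "revoked"


@dataclass
class SyncResult:
    items_attempted: int
    items_failed: int
    token_refresh_failed: bool = False
    access_revoked: bool = False  # e.g. the provider reports the app was uninstalled


def health_status(result: SyncResult) -> HealthStatus:
    """Map the last sync's outcome to a connector health status."""
    # Access problems block all syncing, so they outrank per-item failures.
    if result.access_revoked:
        return HealthStatus.REVOKED
    if result.token_refresh_failed:
        return HealthStatus.EXPIRED
    if result.items_failed > 0:
        return HealthStatus.DEGRADED
    return HealthStatus.HEALTHY


print(health_status(SyncResult(items_attempted=120, items_failed=3)).value)
```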

Scope and permissions

Connectors are scoped at three levels, and the scope determines both who can configure the connector and who can access the resulting knowledge:

User scope -- the connector and its synced data are visible only to the user who created it. Useful for personal email or private cloud storage.
Group scope -- data is shared with members of a specific group. Useful for team-level Slack channels or shared project drives.
Org scope -- data is available to the entire organization. Appropriate for company-wide knowledge sources like Confluence spaces or shared Google Drives.

All connector data respects Beakr's Row Level Security (RLS) policies at the database level. There is no application-side filtering -- tenant isolation is enforced by PostgreSQL RLS policies on every query. A user in one organization can never access connector data from another organization, regardless of how the application code is structured.
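In Beakr this rule is enforced by PostgreSQL RLS policies, not application code. Purely to make the scope semantics concrete, the sketch below states the same visibility rule as a Python predicate. The `Viewer` and `ConnectorData` types and their fields are hypothetical, and a real deployment expresses this as database policies rather than a function like this.

```python
# Illustrative sketch only: Beakr enforces this in PostgreSQL RLS, not in Python.
# Viewer, ConnectorData, and can_read are hypothetical names.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Viewer:
    user_id: str
    org_id: str
    group_ids: frozenset[str]


@dataclass(frozen=True)
class ConnectorData:
    scope: str  # "user", "group", or "org"
    org_id: str
    owner_id: str
    group_id: Optional[str] = None


def can_read(viewer: Viewer, data: ConnectorData) -> bool:
    """Tenant isolation first, then the scope-specific rule."""
    if viewer.org_id != data.org_id:
        return False  # Never cross organizations, whatever the scope.
    if data.scope == "org":
        return True
    if data.scope == "group":
        return data.group_id is not None and data.group_id in viewer.group_ids
    if data.scope == "user":
        return viewer.user_id == data.owner_id
    return False  # Unknown scopes are denied by default.


ada = Viewer("u1", "org1", frozenset({"g1"}))
print(can_read(ada, ConnectorData("group", "org1", owner_id="u2", group_id="g1")))
```

Checking the organization before anything else mirrors the guarantee above: no scope setting can make data visible across tenants.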

Custom integrations

The connector framework is designed for extensibility. Each provider is implemented as a handler module that follows a standard interface: enumerate items, download or fetch content, and yield structured documents.

Adding a new provider typically takes 48 hours from start to tested deployment. The framework handles OAuth, job scheduling, error handling, health tracking, and knowledge compilation. The provider handler only needs to implement the platform-specific enumeration and content-fetching logic.
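The standard interface (enumerate items, fetch content, yield structured documents) could be expressed as a Python `Protocol`, with the framework driving any handler that satisfies it. Everything here, including `ProviderHandler`, `SourceItem`, `Document`, `run_sync`, and the toy `InMemoryWiki` provider, is a hypothetical sketch. It omits the OAuth, scheduling, health tracking, and compilation work that the framework handles.

```python
# Illustrative sketch: all names are hypothetical, not Beakr's handler interface.
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass(frozen=True)
class SourceItem:
    item_id: str
    revision: str


@dataclass
class Document:
    source_id: str
    title: str
    text: str
    metadata: dict[str, str] = field(default_factory=dict)


class ProviderHandler(Protocol):
    name: str

    def enumerate_items(self) -> Iterator[SourceItem]: ...

    def fetch(self, item: SourceItem) -> Document: ...


def run_sync(handler: ProviderHandler) -> list[Document]:
    """Framework side: enumerate, fetch, and collect documents for ingestion."""
    docs = []
    for item in handler.enumerate_items():
        try:
            docs.append(handler.fetch(item))
        except OSError:
            # Network/I/O failure on one item: skip it. A real framework
            # would record it for health tracking.
            continue
    return docs


class InMemoryWiki:
    """Toy provider: only the platform-specific parts are implemented."""

    name = "in_memory_wiki"

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = pages

    def enumerate_items(self) -> Iterator[SourceItem]:
        for page_id in sorted(self.pages):
            yield SourceItem(page_id, revision="1")

    def fetch(self, item: SourceItem) -> Document:
        return Document(item.item_id, title=item.item_id,
                        text=self.pages[item.item_id], metadata={"provider": self.name})


docs = run_sync(InMemoryWiki({"onboarding": "Welcome!", "assays": "Protocol B is standard."}))
print([d.title for d in docs])  # ['assays', 'onboarding']
```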

If your team uses a platform not listed above, contact us. Most SaaS integrations can be built and deployed within a week.

Security

Integration security is designed around the principle that Beakr should never hold credentials it does not need.

User authentication remains separate -- Beakr application login is handled through Clerk; connector OAuth grants are managed separately and scoped only to the connected provider.
OAuth tokens managed securely -- all token lifecycle management (issuance, storage, refresh, and rotation) is handled through secure, certified infrastructure.
No credential exposure -- OAuth tokens, API keys, and refresh tokens are stored securely and never exposed to end users or application-level code.
Encrypted in transit and at rest -- all data pulled from connectors is encrypted with TLS 1.2+ in transit and AES-256 at rest in Beakr's infrastructure.
Scoped access -- connectors request only the permissions needed for the configured scope. A user-scoped Google Drive connector does not request access to other users' files.
Audit trail -- every sync event, including what was ingested and when, is logged and attributable.