Workers
SOMA has seven workers that process agent traces into organizational knowledge. Each worker has a specific role and operates on specific layers of the vault.
Six of the seven write to the vault; the Policy Bridge only reads from it to serve agents. The Governance API mediates human review of L3 proposals into L4 canon.
State Tracking
All pipeline workers (Harvester, Reconciler, Synthesizer, Cartographer) maintain a local state file in .soma/ to enable incremental processing — only new or changed entities are reprocessed on each cycle.
Entity-count change detection: Each worker tracks the vault's entity count in its state file. On startup, if the current count is lower than the saved count, the worker infers a vault restructuring (migration, manual cleanup) and resets its state for a full rescan. Normal writes (new entities) never trigger a reset because the count only increases.
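The reset rule can be sketched as follows. This is a minimal illustration, not SOMA's actual code: the state layout (`entity_count`, `hashes`) and the function name are assumptions made for the example.

```python
def check_for_restructure(state: dict, current_count: int) -> dict:
    """Return the state to use for this cycle.

    `state` is a hypothetical in-memory view of the worker's state file:
    {"entity_count": int, "hashes": {entity_id: content_hash}}.
    """
    if current_count < state["entity_count"]:
        # The vault shrank: assume a migration or manual cleanup and
        # drop all incremental bookkeeping so every entity is rescanned.
        return {"entity_count": current_count, "hashes": {}}
    # Normal case: the count stayed the same or grew, so keep the saved hashes.
    return {**state, "entity_count": current_count}
```

Because ordinary writes only add entities, the first branch runs only after a restructuring.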
Content hashing: Workers compute an MD5 hash of each entity's content. Unchanged entities are skipped on subsequent cycles. This allows frequent cycle intervals (60s for Harvester, 5min for Reconciler) without redundant processing.
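A rough sketch of hash-based skipping, using Python's standard `hashlib.md5`. The function names and the `seen` mapping (entity ID to last hash) are illustrative assumptions, not SOMA's API.

```python
import hashlib


def content_hash(content: str) -> str:
    """MD5 hex digest of an entity's content (change detection, not security)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()


def changed_entities(entities: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return IDs whose content differs from the last cycle, updating `seen`."""
    changed = []
    for entity_id, content in entities.items():
        digest = content_hash(content)
        if seen.get(entity_id) != digest:
            changed.append(entity_id)
            seen[entity_id] = digest
    return changed
```

On the first cycle every entity is reported as changed; on later cycles only entities whose content was edited are returned.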
Synthesizer deduplication: When the Synthesizer extracts insights from LLM analysis, it checks existing vault entities for fuzzy title matches (overlap coefficient ≥ 0.7):
| Outcome | Condition | Action |
|---|---|---|
| Skipped | Match found, no higher confidence or new evidence | No write |
| Superseded | Match found, higher confidence or new evidence | Existing entity updated |
| New | No match | New entity created in L3 |
A log line like [Soma Synthesizer] 10 skipped, 6 superseded, 0 new indicates healthy deduplication — the vault already contains the knowledge the LLM would extract.
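The three outcomes in the table can be illustrated with a small sketch. The overlap coefficient is the standard |A ∩ B| / min(|A|, |B|); computing it over lowercased, whitespace-split title tokens is an assumption, as are the field names `title`, `confidence`, and `evidence`.

```python
def overlap_coefficient(a: str, b: str) -> float:
    """|A ∩ B| / min(|A|, |B|) over lowercased word sets (assumed tokenization)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))


def classify(candidate: dict, existing: list[dict], threshold: float = 0.7) -> str:
    """Return 'skipped', 'superseded', or 'new' for a candidate insight."""
    for entity in existing:
        if overlap_coefficient(candidate["title"], entity["title"]) >= threshold:
            higher_confidence = candidate["confidence"] > entity["confidence"]
            new_evidence = bool(set(candidate["evidence"]) - set(entity["evidence"]))
            return "superseded" if higher_confidence or new_evidence else "skipped"
    return "new"
```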
Harvester
Purpose: Ingests execution traces, events, and full ExecutionGraph objects from agents into the vault.
| Property | Value |
|---|---|
| Layer affinity | L1 (archive) — write only |
| Cycle time | 60 seconds |
| Reads | AgentFlow ExecutionEvent, PatternEvent, ExecutionGraph; inbox files (.json, .jsonl, .md) |
| Writes | execution entities, agent profiles, decision entities (all L1) |
What It Does
The Harvester is the entry point for all data into SOMA. It processes three input types:
- ExecutionEvents — Summarized metrics from agent runs (duration, status, tool calls)
- PatternEvents — Process mining patterns detected by AgentFlow
- ExecutionGraphs — Full graph structures with nodes, edges, and trace events
For ExecutionGraph inputs, the Harvester extracts decisions from graph structure:
| Graph Structure | Decision Type | Captured Data |
|---|---|---|
| tool node | tool_choice | Tool name, metadata, duration, outcome |
| branched edge | branch | Selected branch, alternatives |
| retried edge | retry | Retry count, final outcome |
| subagent node | delegation | Subagent name, parent agent |
| Failed node | failure | Error message, failure path |
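The mapping in the table above might look like this in code. The graph shape is hypothetical: nodes are assumed to carry `id`, `type`, and `status`, and edges `id` and `kind`; the real ExecutionGraph schema may differ. Whether a failed tool node yields both a tool_choice and a failure decision is also an assumption of this sketch.

```python
NODE_DECISIONS = {"tool": "tool_choice", "subagent": "delegation"}
EDGE_DECISIONS = {"branched": "branch", "retried": "retry"}


def extract_decisions(graph: dict) -> list[dict]:
    """Walk nodes and edges, emitting one decision per matching structure."""
    decisions = []
    for node in graph.get("nodes", []):
        if node.get("type") in NODE_DECISIONS:
            decisions.append({"type": NODE_DECISIONS[node["type"]], "source": node["id"]})
        if node.get("status") == "failed":
            # Assumption: failures are recorded in addition to any node-type decision.
            decisions.append({"type": "failure", "source": node["id"]})
    for edge in graph.get("edges", []):
        if edge.get("kind") in EDGE_DECISIONS:
            decisions.append({"type": EDGE_DECISIONS[edge["kind"]], "source": edge["id"]})
    return decisions
```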
Guards and Safety
- Duplicate trace detection: Traces with a `trace_id` already in the vault are skipped
- Stable decision IDs: Decision IDs are derived from `graph_id-node_id`, making re-ingestion idempotent
- Circuit breaker: Stops after 100 creates per run
- Pluggable inbox parsers: Custom parsers registered by file extension
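To see how stable IDs and the circuit breaker interact, consider this sketch. The dict-backed `vault` and the function names are illustrative assumptions; only the ID format and the limit of 100 come from the list above.

```python
MAX_CREATES_PER_RUN = 100  # circuit-breaker limit from the list above


def decision_id(graph_id: str, node_id: str) -> str:
    """Deterministic ID: the same graph node always maps to the same key."""
    return f"{graph_id}-{node_id}"


def ingest(decisions: list[dict], vault: dict, limit: int = MAX_CREATES_PER_RUN) -> int:
    """Write unseen decisions into `vault`; return how many were created."""
    created = 0
    for decision in decisions:
        if created >= limit:
            break  # circuit breaker tripped; the rest waits for the next run
        key = decision_id(decision["graph_id"], decision["node_id"])
        if key in vault:
            continue  # already ingested, so re-ingestion is a no-op
        vault[key] = decision
        created += 1
    return created
```

Feeding the same batch repeatedly converges: each run creates at most 100 entities, and once everything is stored further runs create nothing.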
Reconciler
Purpose: Maintains vault structural integrity by scanning for and fixing data quality issues.
| Property | Value |
|---|---|
| Layer affinity | L1 (archive) — write only |
| Cycle time | 5 minutes |
| Reads | All vault entities (cross-layer scan) |
| Writes | Fixed entities in L1, merge entities in L1 |
What It Does
The Reconciler scans the entire vault looking for structural problems:
- Missing fields — Required fields absent from entities
- Invalid types/statuses — Types or statuses not in the registry
- Broken wikilinks — References to entities that don't exist
- Orphan entities — Entities with no inbound references
- Stub entities — Empty or near-empty entity bodies
- Duplicates — Near-duplicate entries detected via overlap coefficient
Auto-fixes applied without human intervention:
- Type corrections (e.g., `insights` to `insight`)
- Status alias mapping (e.g., `done` to `completed`)
- Array type coercion (e.g., string tags to array)
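The three auto-fixes could be sketched like this. The lookup tables contain only the examples named above, and splitting a tag string on commas is an assumption about how string tags are coerced.

```python
TYPE_FIXES = {"insights": "insight"}      # example correction from the list above
STATUS_ALIASES = {"done": "completed"}    # example alias from the list above


def auto_fix(entity: dict) -> dict:
    """Return a corrected copy of an entity's metadata."""
    fixed = dict(entity)
    if fixed.get("type") in TYPE_FIXES:
        fixed["type"] = TYPE_FIXES[fixed["type"]]
    if fixed.get("status") in STATUS_ALIASES:
        fixed["status"] = STATUS_ALIASES[fixed["status"]]
    if isinstance(fixed.get("tags"), str):
        # Assumption: string tags are comma-separated.
        fixed["tags"] = [tag.strip() for tag in fixed["tags"].split(",") if tag.strip()]
    return fixed
```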
For duplicates, the Reconciler detects near-duplicates with the overlap coefficient, merges them with multi-agent attribution, and resolves conflicts by keeping the newest entry; the older entry is marked with superseded_by.
Guards and Safety
- Merge dedup: Won't create a merge entity if one already exists with the same `reconciled_from` sources
- L1 only writes: Cannot modify L2/L3/L4 entities directly
Synthesizer
Purpose: Detects cross-agent patterns in L1 data and generates L3 proposals with confidence scores.
| Property | Value |
|---|---|
| Layer affinity | L3 (emerging) — write only |
| Cycle time | 1 hour |
| Reads | L1 execution, insight, agent, and decision entities |
| Writes | L3 proposals (insight, archetype, policy, synthesis entities) |
What It Does
The Synthesizer operates in three modes:
- Entity synthesis (`synthesize()`): LLM-powered extraction from execution, insight, agent, and decision entities. Uses the configured `analysisFn` to identify patterns.
- L1 pattern synthesis (`synthesizeL3()`): Cross-agent content similarity patterns without LLM. Groups L1 entries by semantic similarity and proposes archetypes.
- Decision pattern synthesis (`synthesizeDecisions()`): Groups decisions by type and choice, detects behavioral patterns. If 5+ agents make the same tool choice, it proposes an archetype.
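As an illustration of the third mode, the grouping step might look like the following. This is a sketch, not the body of `synthesizeDecisions()`: the decision fields (`type`, `choice`, `agent`) and the proposal shape are assumed for the example.

```python
from collections import defaultdict


def propose_archetypes(decisions: list[dict], min_agents: int = 5) -> list[dict]:
    """Group decisions by (type, choice); propose one archetype per group
    that at least `min_agents` distinct agents contributed to."""
    agents_by_pattern = defaultdict(set)
    for decision in decisions:
        pattern = (decision["type"], decision["choice"])
        agents_by_pattern[pattern].add(decision["agent"])
    return [
        {"type": decision_type, "choice": choice, "agents": sorted(agents)}
        for (decision_type, choice), agents in agents_by_pattern.items()
        if len(agents) >= min_agents
    ]
```

Counting distinct agents rather than raw decisions keeps one busy agent from producing an archetype on its own.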
Confidence Scoring
| Signal | Score Contribution |
|---|---|
| Cross-agent corroboration (5+ agents) | >= 0.8 |
| Single-agent patterns | Capped at 0.5 |
| Per additional trace | +0.02 |
| Per additional agent | +0.15 |
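One way these signals could combine is sketched below. The table does not give the actual formula, so several things here are assumptions: the base score of 0.2, reading "additional" as "beyond the first", and clamping the result to 1.0.

```python
def confidence(agent_count: int, trace_count: int, base: float = 0.2) -> float:
    """Combine the table's signals into one score (assumed formula)."""
    score = base + 0.15 * (agent_count - 1) + 0.02 * (trace_count - 1)
    if agent_count == 1:
        score = min(score, 0.5)   # single-agent patterns are capped at 0.5
    if agent_count >= 5:
        score = max(score, 0.8)   # cross-agent corroboration floor
    return round(min(score, 1.0), 2)
```

With these assumptions, no number of traces from a single agent can push a pattern past 0.5, while any pattern seen across five or more agents starts at 0.8.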
Guards and Safety
- Self-exclusion: Entities tagged `synthesized` are excluded from the candidate pool (prevents the Synthesizer from processing its own output)
- Circuit breaker: Stops after 100 proposals per run
Cartographer
Purpose: Maps relationships between entities, discovers archetypes via clustering, and detects contradictions.
| Property | Value |
|---|---|
| Layer affinity | L3 (emerging) — write only |
| Cycle time | On-change (triggered when vault changes) |
| Reads | All vault entities (for embedding), L3 proposals, L4 canon |
| Writes | L3 relationship proposals, archetype entities, contradiction entities |
What It Does
- Embed entities into the vector store (incremental, change-detected — only new/modified entities are re-embedded)
- Discover archetypes via BFS community detection on the entity graph
- Map relationships between entities sharing tags (proposed as L3 entries)
- Detect contradictions between L3 proposals and existing L4 canon
- Semantic search across all entities by vector similarity
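For the archetype-discovery step, a simple reading of "BFS community detection" is to treat each connected component of the entity graph as a community. That reading is an assumption, and the Cartographer's actual clustering may be more refined. A sketch over an edge list:

```python
from collections import defaultdict, deque


def communities(edges: list[tuple[str, str]]) -> list[set[str]]:
    """Return connected components of an undirected graph, found with BFS."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: set[str] = set()
    result = []
    for start in list(adjacency):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result.append(component)
    return result
```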
Guards and Safety
- Self-reference guard — Won't propose relationships between entities it created (prevents circular references)
- Circuit breaker — Stops after 100 proposals per run
Decay Processor
Purpose: Manages entry lifecycle for ephemeral layers (L2 and L3), moving expired entries to L1.
| Property | Value |
|---|---|
| Layer affinity | Reads L2/L3, writes L1 |
| Cycle time | Per pipeline run |
| Reads | L2 entries, L3 entries, L3/L4 evidence links |
| Writes | New L1 entries (decayed copies), updated evidence references |
What It Does
- Moves expired L2 entries to L1 with `decayed_from: 'working'`
- Moves expired L3 entries to L1 with `decayed_from: 'emerging'`
- Skips promoted/rejected L3 entries (they never decay)
- Updates `evidence_links` in L3/L4 entries that pointed to decayed entries (no broken links)
- Respects activity-based extension: reading an entry resets its `decay_at` timer
Guards and Safety
- Evidence link preservation: Before removing a decayed entry, scans all L3/L4 for references and updates them to the new `decayed-{oldId}` location
- Never decays L1 or L4 entries: L1 entries are permanent archive; L4 entries are ratified canon
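The decay rules can be sketched end to end as follows. The vault is modeled as a plain dict of entries with assumed fields (`layer`, `status`, `decay_at`, `evidence_links`); only the `decayed_from` values and the `decayed-{oldId}` naming come from the text above.

```python
def decay_entry(entry_id: str, vault: dict, now: float) -> bool:
    """Move one expired L2/L3 entry to L1; return True if it was decayed."""
    entry = vault[entry_id]
    if entry["layer"] not in ("L2", "L3"):
        return False  # L1 and L4 entries never decay
    if entry.get("status") in ("promoted", "rejected"):
        return False  # reviewed L3 entries never decay
    if entry["decay_at"] > now:
        return False  # not expired yet

    new_id = f"decayed-{entry_id}"
    # Re-point evidence links in L3/L4 before the old entry disappears.
    for other in vault.values():
        links = other.get("evidence_links")
        if other["layer"] in ("L3", "L4") and links:
            other["evidence_links"] = [new_id if link == entry_id else link for link in links]

    origin = "working" if entry["layer"] == "L2" else "emerging"
    archived = {**entry, "layer": "L1", "decayed_from": origin}
    archived.pop("decay_at", None)  # archived entries no longer expire
    vault[new_id] = archived
    del vault[entry_id]
    return True
```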
Policy Bridge
Purpose: Read-only query interface that routes agent requests to the appropriate knowledge layer based on intent.
| Property | Value |
|---|---|
| Layer affinity | READ all layers |
| Cycle time | On-demand (per agent query) |
| Reads | L1, L2, L3, L4 |
| Writes | Nothing — strictly read-only |
What It Does
Agents query the Policy Bridge with an intent, and the bridge routes to the correct layer:
| Intent | Layer | Semantic Weight |
|---|---|---|
| enforce | L4 | mandatory |
| advise | L3 | advisory |
| brief | L2 | contextual |
| route | L1 | historical |
| all | L1-L4 | Stratified |
Every result includes source_layer and semantic_weight metadata so agents know how to treat the information.
See Policy Bridge architecture for full details.
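A minimal sketch of intent routing follows. The `query` function and the list-of-dicts vault are hypothetical; the intent table and the `source_layer` / `semantic_weight` fields come from the section above. Treating `all` as "every layer, each result tagged with its own layer's weight, L4 first" is an assumed reading of "Stratified".

```python
INTENT_ROUTES = {
    "enforce": ("L4", "mandatory"),
    "advise": ("L3", "advisory"),
    "brief": ("L2", "contextual"),
    "route": ("L1", "historical"),
}


def query(intent: str, vault: list[dict]) -> list[dict]:
    """Return entries from the layer(s) matching `intent`, annotated with metadata."""
    if intent == "all":
        routes = list(INTENT_ROUTES.values())
    elif intent in INTENT_ROUTES:
        routes = [INTENT_ROUTES[intent]]
    else:
        raise ValueError(f"unknown intent: {intent!r}")

    results = []
    for layer, weight in routes:
        for entry in vault:
            if entry["layer"] == layer:
                # Copy the entry so the read path never mutates the vault.
                results.append({**entry, "source_layer": layer, "semantic_weight": weight})
    return results
```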
Governance API
Purpose: Human-in-the-loop review for promoting L3 proposals to L4 canon.
| Property | Value |
|---|---|
| Layer affinity | Reads L3, writes L4 |
| Cycle time | On-demand (human-triggered) |
| Reads | L3 pending entries, L1 evidence chains |
| Writes | L4 canon entries |
What It Does
- `list_pending()`: Returns L3 entries with status `pending`, sorted by confidence descending
- `promote(entryId, reviewerId)`: Creates L4 entry, marks L3 as `promoted`
- `reject(entryId, reviewerId, reason)`: Marks L3 as `rejected` with reason
- `get_evidence(entryId)`: Returns the L3 entry with its full evidence chain (linked L1 traces)
Guards and Safety
- L2 entries cannot be promoted: Returns error
- Already-promoted/rejected entries cannot be re-promoted: Returns error
- Only pending L3 entries are eligible: Status must be `pending`
- L4 is writable only through Governance: No other worker can write to L4
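The review flow and its guards can be sketched over an in-memory vault. The explicit `vault` argument, the `GovernanceError` type, the `canon-` ID prefix, and the entry fields are all assumptions for illustration; the real API takes only the parameters listed above.

```python
class GovernanceError(Exception):
    """Raised when an entry is not eligible for review."""


def list_pending(vault: dict) -> list[dict]:
    """Pending L3 entries, highest confidence first."""
    pending = [e for e in vault.values() if e["layer"] == "L3" and e.get("status") == "pending"]
    return sorted(pending, key=lambda e: e["confidence"], reverse=True)


def _require_pending_l3(vault: dict, entry_id: str) -> dict:
    entry = vault[entry_id]
    if entry["layer"] != "L3":
        raise GovernanceError(f"{entry_id} is in {entry['layer']}, not L3")
    if entry.get("status") != "pending":
        raise GovernanceError(f"{entry_id} is already {entry.get('status')}")
    return entry


def promote(vault: dict, entry_id: str, reviewer_id: str) -> dict:
    """Create the L4 entry and mark the L3 proposal as promoted."""
    entry = _require_pending_l3(vault, entry_id)
    entry["status"] = "promoted"
    canon_id = f"canon-{entry_id}"  # hypothetical ID scheme
    vault[canon_id] = {"id": canon_id, "layer": "L4",
                       "promoted_from": entry_id, "reviewer": reviewer_id}
    return vault[canon_id]


def reject(vault: dict, entry_id: str, reviewer_id: str, reason: str) -> None:
    """Mark the L3 proposal as rejected and record why."""
    entry = _require_pending_l3(vault, entry_id)
    entry.update(status="rejected", reviewer=reviewer_id, reason=reason)
```

Routing both `promote` and `reject` through one eligibility check keeps the guards in a single place.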