Workers

SOMA has seven workers that process agent traces into organizational knowledge. Each worker has a specific role, operates on specific layers, and feeds results into the vault.

Six of the seven workers write to the vault; the Policy Bridge only reads from it to serve agents. The Governance API mediates human review of L3 proposals into L4 canon.

State Tracking

All pipeline workers (Harvester, Reconciler, Synthesizer, Cartographer) maintain a local state file in .soma/ to enable incremental processing — only new or changed entities are reprocessed on each cycle.

Entity-count change detection: Each worker tracks the vault's entity count in its state file. On startup, if the current count is lower than the saved count, the worker infers a vault restructuring (migration, manual cleanup) and resets its state for a full rescan. Normal writes (new entities) never trigger a reset because the count only increases.
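As a rough illustration of the reset rule, the sketch below compares a saved count with the current one. The `WorkerState` shape and the `reconcileState` name are hypothetical; the actual state file format in .soma/ is not described here.

```typescript
// Hypothetical shape of a worker's state file (field names are assumptions).
interface WorkerState {
  entityCount: number;            // vault entity count saved after the last cycle
  hashes: Record<string, string>; // entity id -> content hash from the last cycle
}

// Returns the state to use for this cycle: the saved state, or a fresh one
// when the vault holds fewer entities than it did last time.
function reconcileState(saved: WorkerState, currentCount: number): WorkerState {
  if (currentCount < saved.entityCount) {
    // The vault shrank: assume a migration or manual cleanup and drop all
    // incremental state so the next cycle rescans everything.
    return { entityCount: currentCount, hashes: {} };
  }
  // The count is equal or higher: keep incremental state, record the new count.
  return { ...saved, entityCount: currentCount };
}

const savedState: WorkerState = { entityCount: 120, hashes: { "exec-1": "abc" } };
const afterGrowth = reconcileState(savedState, 125); // keeps the saved hashes
const afterShrink = reconcileState(savedState, 90);  // hashes cleared, full rescan
```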

Content hashing: Workers compute an MD5 hash of each entity's content. Unchanged entities are skipped on subsequent cycles. This allows frequent cycle intervals (60s for Harvester, 5min for Reconciler) without redundant processing.
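A minimal sketch of hash-based skipping, assuming Node.js and its built-in `crypto` module. The `contentHash` and `changedEntities` helpers and the entity shape are illustrative names, not SOMA's API.

```typescript
import { createHash } from "node:crypto";

// MD5 serves here as a cheap change fingerprint, not as a security measure.
function contentHash(content: string): string {
  return createHash("md5").update(content, "utf8").digest("hex");
}

// Returns the ids of new or modified entities and records their hashes in `seen`.
function changedEntities(
  entities: { id: string; content: string }[],
  seen: Record<string, string>,
): string[] {
  const changed: string[] = [];
  for (const entity of entities) {
    const hash = contentHash(entity.content);
    if (seen[entity.id] !== hash) {
      changed.push(entity.id);
      seen[entity.id] = hash;
    }
  }
  return changed;
}

const seenHashes: Record<string, string> = {};
const firstPass = changedEntities(
  [{ id: "a", content: "x" }, { id: "b", content: "y" }],
  seenHashes,
); // both entities are new
const secondPass = changedEntities(
  [{ id: "a", content: "x" }, { id: "b", content: "y2" }],
  seenHashes,
); // only "b" changed
```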

Synthesizer deduplication: When the Synthesizer extracts insights from LLM analysis, it checks existing vault entities for fuzzy title matches (overlap coefficient ≥ 0.7):

Outcome	Condition	Action
Skipped	Match found, no higher confidence or new evidence	No write
Superseded	Match found, higher confidence or new evidence	Existing entity updated
New	No match	New entity created in L3

A log line like [Soma Synthesizer] 10 skipped, 6 superseded, 0 new indicates healthy deduplication — the vault already contains the knowledge the LLM would extract.
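The three outcomes can be sketched as follows. The overlap coefficient is |A ∩ B| / min(|A|, |B|); tokenizing titles into lowercase words, the `Insight` shape, and the interpretation of "new evidence" as an evidence id missing from the existing entity are all assumptions made for this example.

```typescript
type Outcome = "skipped" | "superseded" | "new";

// Hypothetical insight shape used only for this sketch.
interface Insight {
  title: string;
  confidence: number;
  evidence: string[];
}

function titleTokens(title: string): Set<string> {
  return new Set(title.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Overlap coefficient of the two titles' word sets.
function overlapCoefficient(a: string, b: string): number {
  const ta = titleTokens(a);
  const tb = titleTokens(b);
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared += 1;
  });
  return shared / Math.min(ta.size, tb.size);
}

function classify(candidate: Insight, existing: Insight[], threshold = 0.7): Outcome {
  const match = existing.find((e) => overlapCoefficient(e.title, candidate.title) >= threshold);
  if (!match) return "new";
  const hasNewEvidence = candidate.evidence.some((id) => match.evidence.indexOf(id) === -1);
  return candidate.confidence > match.confidence || hasNewEvidence ? "superseded" : "skipped";
}

const vaultInsights: Insight[] = [
  { title: "Retry on timeout for fetch tool", confidence: 0.6, evidence: ["t1"] },
];
const same = classify({ title: "Retry on timeout", confidence: 0.6, evidence: ["t1"] }, vaultInsights);
const stronger = classify({ title: "Retry on timeout", confidence: 0.8, evidence: ["t1"] }, vaultInsights);
const fresh = classify({ title: "Cache embeddings locally", confidence: 0.4, evidence: ["t2"] }, vaultInsights);
// same === "skipped", stronger === "superseded", fresh === "new"
```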

Harvester

Purpose: Ingests execution traces, events, and full ExecutionGraph objects from agents into the vault.

Property	Value
Layer affinity	L1 (archive) — write only
Cycle time	60 seconds
Reads	AgentFlow `ExecutionEvent`, `PatternEvent`, `ExecutionGraph`; inbox files (`.json`, `.jsonl`, `.md`)
Writes	`execution` entities, `agent` profiles, `decision` entities (all L1)

What It Does

The Harvester is the entry point for all data into SOMA. It processes three input types:

ExecutionEvents — Summarized metrics from agent runs (duration, status, tool calls)
PatternEvents — Process mining patterns detected by AgentFlow
ExecutionGraphs — Full graph structures with nodes, edges, and trace events

For ExecutionGraph inputs, the Harvester extracts decisions from graph structure:

Graph Structure	Decision Type	Captured Data
`tool` node	`tool_choice`	Tool name, metadata, duration, outcome
`branched` edge	`branch`	Selected branch, alternatives
`retried` edge	`retry`	Retry count, final outcome
`subagent` node	`delegation`	Subagent name, parent agent
Failed node	`failure`	Error message, failure path
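The mapping in the table above might be sketched like this. The `GraphNode` and `GraphEdge` shapes are simplified stand-ins rather than AgentFlow's real `ExecutionGraph` types, and attributing an edge decision to its source node is an assumption.

```typescript
type DecisionType = "tool_choice" | "branch" | "retry" | "delegation" | "failure";

// Simplified, hypothetical graph shapes for illustration only.
interface GraphNode { id: string; kind: string; status?: string; }
interface GraphEdge { from: string; to: string; kind: string; }
interface ExtractedDecision { type: DecisionType; sourceId: string; }

function extractDecisions(nodes: GraphNode[], edges: GraphEdge[]): ExtractedDecision[] {
  const decisions: ExtractedDecision[] = [];
  for (const node of nodes) {
    if (node.kind === "tool") decisions.push({ type: "tool_choice", sourceId: node.id });
    if (node.kind === "subagent") decisions.push({ type: "delegation", sourceId: node.id });
    // Assumption: a failed node yields a failure decision in addition to any
    // decision implied by its kind.
    if (node.status === "failed") decisions.push({ type: "failure", sourceId: node.id });
  }
  for (const edge of edges) {
    if (edge.kind === "branched") decisions.push({ type: "branch", sourceId: edge.from });
    if (edge.kind === "retried") decisions.push({ type: "retry", sourceId: edge.from });
  }
  return decisions;
}

const extracted = extractDecisions(
  [
    { id: "n1", kind: "tool" },
    { id: "n2", kind: "subagent" },
    { id: "n3", kind: "tool", status: "failed" },
  ],
  [
    { from: "n1", to: "n2", kind: "branched" },
    { from: "n2", to: "n3", kind: "retried" },
  ],
);
const extractedTypes = extracted.map((d) => d.type);
// tool_choice, delegation, tool_choice, failure, branch, retry
```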

Guards and Safety

Duplicate trace detection — Traces with a trace_id already in the vault are skipped
Stable decision IDs — Decision IDs are derived from graph_id-node_id, making re-ingestion idempotent
Circuit breaker — Stops after 100 creates per run
Pluggable inbox parsers — Custom parsers registered by file extension
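To show how stable IDs and the circuit breaker interact, here is a small sketch. The vault is modeled as a plain `Set` of IDs and `ingest` is a hypothetical helper; only the graph_id-node_id format and the limit of 100 come from the text above.

```typescript
const MAX_CREATES_PER_RUN = 100; // limit stated in the guards above

interface PendingDecision { graphId: string; nodeId: string; }

// Stable ID: the same graph and node always produce the same decision ID.
function decisionId(d: PendingDecision): string {
  return `${d.graphId}-${d.nodeId}`;
}

// Adds unseen decisions to the vault stand-in, skipping known IDs and
// stopping once the per-run create limit is reached.
function ingest(
  decisions: PendingDecision[],
  vaultIds: Set<string>,
  limit: number = MAX_CREATES_PER_RUN,
): { created: number; skipped: number; tripped: boolean } {
  let created = 0;
  let skipped = 0;
  for (const d of decisions) {
    const id = decisionId(d);
    if (vaultIds.has(id)) {
      skipped += 1;
      continue;
    }
    if (created >= limit) return { created, skipped, tripped: true };
    vaultIds.add(id);
    created += 1;
  }
  return { created, skipped, tripped: false };
}

const vaultIds = new Set<string>();
const batch: PendingDecision[] = [
  { graphId: "g1", nodeId: "n1" },
  { graphId: "g1", nodeId: "n2" },
  { graphId: "g2", nodeId: "n1" },
];
const firstRun = ingest(batch, vaultIds);  // creates 3
const secondRun = ingest(batch, vaultIds); // re-ingestion: skips 3, creates 0

const large: PendingDecision[] = [];
for (let i = 0; i < 150; i += 1) large.push({ graphId: "g3", nodeId: `n${i}` });
const limitedRun = ingest(large, new Set<string>()); // stops after 100 creates
```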

Reconciler

Purpose: Maintains vault structural integrity by scanning for and fixing data quality issues.

Property	Value
Layer affinity	L1 (archive) — write only
Cycle time	5 minutes
Reads	All vault entities (cross-layer scan)
Writes	Fixed entities in L1, merge entities in L1

What It Does

The Reconciler scans the entire vault looking for structural problems:

Missing fields — Required fields absent from entities
Invalid types/statuses — Types or statuses not in the registry
Broken wikilinks — References to entities that don't exist
Orphan entities — Entities with no inbound references
Stub entities — Empty or near-empty entity bodies
Duplicates — Near-duplicate entries detected via overlap coefficient

Auto-fixes applied without human intervention:

Type corrections (e.g., insights to insight)
Status alias mapping (e.g., done to completed)
Array type coercion (e.g., string tags to array)
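The auto-fixes listed above could look roughly like this. The alias tables contain only the two examples given here, and treating a string `tags` value as comma-separated is an assumption; the real registry and coercion rules may differ.

```typescript
// Hypothetical alias tables; only these two entries come from the examples above.
const TYPE_ALIASES: Record<string, string> = { insights: "insight" };
const STATUS_ALIASES: Record<string, string> = { done: "completed" };

interface RawEntity { type: string; status: string; tags: string | string[]; }
interface FixedEntity { type: string; status: string; tags: string[]; fixes: string[]; }

function autoFix(entity: RawEntity): FixedEntity {
  const fixes: string[] = [];

  const type = TYPE_ALIASES[entity.type] ?? entity.type;
  if (type !== entity.type) fixes.push("type");

  const status = STATUS_ALIASES[entity.status] ?? entity.status;
  if (status !== entity.status) fixes.push("status");

  let tags: string[];
  if (typeof entity.tags === "string") {
    // Assumption: string tags are comma-separated.
    tags = entity.tags.split(",").map((t) => t.trim()).filter((t) => t.length > 0);
    fixes.push("tags");
  } else {
    tags = entity.tags;
  }

  return { type, status, tags, fixes };
}

const fixedEntity = autoFix({ type: "insights", status: "done", tags: "retry, timeout" });
// type "insight", status "completed", tags ["retry", "timeout"]
const untouched = autoFix({ type: "insight", status: "completed", tags: ["retry"] });
```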

For duplicates, the Reconciler detects near-duplicates with the overlap coefficient, merges them with multi-agent attribution, and resolves conflicts by keeping the newest entry; the older entry is marked with superseded_by.

Guards and Safety

Merge dedup — Won't create a merge entity if one already exists with the same reconciled_from sources
L1 only writes — Cannot modify L2/L3/L4 entities directly

Synthesizer

Purpose: Detects cross-agent patterns in L1 data and generates L3 proposals with confidence scores.

Property	Value
Layer affinity	L3 (emerging) — write only
Cycle time	1 hour
Reads	L1 execution, insight, agent, and decision entities
Writes	L3 proposals (insight, archetype, policy, synthesis entities)

What It Does

The Synthesizer operates in three modes:

Entity synthesis (synthesize()): LLM-powered extraction from execution, insight, agent, and decision entities. Uses the configured analysisFn to identify patterns.
L1 pattern synthesis (synthesizeL3()): Detects cross-agent content similarity patterns without an LLM. Groups L1 entries by semantic similarity and proposes archetypes.
Decision pattern synthesis (synthesizeDecisions()): Groups decisions by type and choice, then detects behavioral patterns. If 5+ agents make the same tool choice, it proposes an archetype.
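The decision-pattern mode can be pictured with a grouping sketch like the one below. `DecisionRecord`, `ArchetypeProposal`, and `proposeArchetypes` are hypothetical names, and applying the 5-agent threshold to every decision type (not only tool choices) is a simplifying assumption.

```typescript
interface DecisionRecord { agent: string; type: string; choice: string; }
interface ArchetypeProposal { type: string; choice: string; agents: string[]; }

function proposeArchetypes(decisions: DecisionRecord[], minAgents = 5): ArchetypeProposal[] {
  // Group by (type, choice) and collect the distinct agents in each group.
  const groups = new Map<string, { type: string; choice: string; agents: Set<string> }>();
  for (const d of decisions) {
    const key = `${d.type}:${d.choice}`;
    let group = groups.get(key);
    if (!group) {
      group = { type: d.type, choice: d.choice, agents: new Set<string>() };
      groups.set(key, group);
    }
    group.agents.add(d.agent);
  }

  // Only groups shared by enough distinct agents become proposals.
  const proposals: ArchetypeProposal[] = [];
  groups.forEach((g) => {
    if (g.agents.size >= minAgents) {
      proposals.push({ type: g.type, choice: g.choice, agents: Array.from(g.agents).sort() });
    }
  });
  return proposals;
}

const decisionLog: DecisionRecord[] = [];
for (const agent of ["a1", "a2", "a3", "a4", "a5"]) {
  decisionLog.push({ agent, type: "tool_choice", choice: "web_search" });
}
// Repeated choices by one agent count once; two agents are below the threshold.
decisionLog.push({ agent: "a1", type: "tool_choice", choice: "web_search" });
decisionLog.push({ agent: "a1", type: "tool_choice", choice: "grep" });
decisionLog.push({ agent: "a2", type: "tool_choice", choice: "grep" });

const archetypeProposals = proposeArchetypes(decisionLog);
// one proposal: tool_choice / web_search, backed by five agents
```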

Confidence Scoring

Signal	Score Contribution
Cross-agent corroboration (5+ agents)	>= 0.8
Single-agent patterns	Capped at 0.5
Per additional trace	+0.02
Per additional agent	+0.15
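The table does not say how these signals combine, so the following is only one possible reading. It assumes the contributions are additive on top of a base score of 0.2 (a value chosen here so that five agents land at 0.8), that "additional" means beyond the first agent or trace, and that the result is clamped to 1.

```typescript
// All constants except the table values (0.8, 0.5, 0.02, 0.15) are assumptions.
const BASE_SCORE = 0.2;
const PER_AGENT = 0.15;
const PER_TRACE = 0.02;
const SINGLE_AGENT_CAP = 0.5;
const CROSS_AGENT_FLOOR = 0.8;

function confidenceScore(agentCount: number, traceCount: number): number {
  const extraAgents = Math.max(0, agentCount - 1);
  const extraTraces = Math.max(0, traceCount - 1);
  const raw = BASE_SCORE + PER_AGENT * extraAgents + PER_TRACE * extraTraces;

  // Single-agent patterns never exceed the cap, however many traces support them.
  if (agentCount <= 1) return Math.min(raw, SINGLE_AGENT_CAP);

  // Five or more corroborating agents guarantee at least the floor.
  const floored = agentCount >= 5 ? Math.max(raw, CROSS_AGENT_FLOOR) : raw;
  return Math.min(1, floored);
}

const singleAgentScore = confidenceScore(1, 50); // capped at 0.5
const twoAgentScore = confidenceScore(2, 1);     // about 0.35 under these assumptions
const fiveAgentScore = confidenceScore(5, 1);    // at least 0.8
```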

Guards and Safety

Self-exclusion — Entities tagged synthesized are excluded from the candidate pool (prevents the Synthesizer from processing its own output)
Circuit breaker — Stops after 100 proposals per run

Cartographer

Purpose: Maps relationships between entities, discovers archetypes via clustering, and detects contradictions.

Property	Value
Layer affinity	L3 (emerging) — write only
Cycle time	On-change (triggered when vault changes)
Reads	All vault entities (for embedding), L3 proposals, L4 canon
Writes	L3 relationship proposals, archetype entities, contradiction entities

What It Does

Embed entities into the vector store (incremental, change-detected — only new/modified entities are re-embedded)
Discover archetypes via BFS community detection on the entity graph
Map relationships between entities sharing tags (proposed as L3 entries)
Detect contradictions between L3 proposals and existing L4 canon
Semantic search across all entities by vector similarity
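For the archetype discovery step, one simple interpretation of BFS community detection is finding connected components of the entity graph with breadth-first search. The sketch below takes that reading; the edge-list input and the `communities` function are assumptions, and the Cartographer's actual clustering may be more elaborate.

```typescript
// Returns groups of entity ids that are reachable from one another.
function communities(edges: [string, string][]): string[][] {
  // Build an undirected adjacency list.
  const adjacency = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = adjacency.get(a);
    if (list) list.push(b);
    else adjacency.set(a, [b]);
  };
  for (const [a, b] of edges) {
    link(a, b);
    link(b, a);
  }

  const visited = new Set<string>();
  const result: string[][] = [];
  adjacency.forEach((_neighbors, start) => {
    if (visited.has(start)) return;
    // Breadth-first traversal from an unvisited entity collects one community.
    const queue: string[] = [start];
    visited.add(start);
    const members: string[] = [];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      members.push(current);
      for (const next of adjacency.get(current) ?? []) {
        if (!visited.has(next)) {
          visited.add(next);
          queue.push(next);
        }
      }
    }
    result.push(members.sort());
  });
  return result;
}

const clusters = communities([
  ["a", "b"],
  ["b", "c"],
  ["d", "e"],
]);
// two communities: [a, b, c] and [d, e]
```

Entities without any edge do not appear in the output of this sketch, since it only sees the edge list.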

Guards and Safety

Self-reference guard — Won't propose relationships between entities it created (prevents circular references)
Circuit breaker — Stops after 100 proposals per run

Decay Processor

Purpose: Manages entry lifecycle for ephemeral layers (L2 and L3), moving expired entries to L1.

Property	Value
Layer affinity	Reads L2/L3, writes L1
Cycle time	Per pipeline run
Reads	L2 entries, L3 entries, L3/L4 evidence links
Writes	New L1 entries (decayed copies), updated evidence references

What It Does

Moves expired L2 entries to L1 with decayed_from: 'working'
Moves expired L3 entries to L1 with decayed_from: 'emerging'
Skips promoted/rejected L3 entries (they never decay)
Updates evidence_links in L3/L4 entries that pointed to decayed entries (no broken links)
Respects activity-based extension: reading an entry resets its decay_at timer
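The steps above can be sketched as a single pass over the vault. The `Entry` shape, the epoch-millisecond `decay_at`, and the `ttlMs` parameter of `touch` are assumptions; only the `decayed_from` values and the decayed-{oldId} naming come from this page.

```typescript
interface Entry {
  id: string;
  layer: "L1" | "L2" | "L3" | "L4";
  status?: "pending" | "promoted" | "rejected";
  decay_at?: number; // epoch milliseconds (hypothetical representation)
  decayed_from?: "working" | "emerging";
  evidence_links: string[];
}

// Reading an entry pushes its decay deadline forward.
function touch(entry: Entry, now: number, ttlMs: number): Entry {
  return { ...entry, decay_at: now + ttlMs };
}

function runDecay(vault: Entry[], now: number): Entry[] {
  const renamed: Record<string, string> = {};
  const moved = vault.map((entry): Entry => {
    const ephemeral = entry.layer === "L2" || entry.layer === "L3";
    const settled = entry.status === "promoted" || entry.status === "rejected";
    const expired = entry.decay_at !== undefined && entry.decay_at <= now;
    if (!ephemeral || settled || !expired) return entry;
    const newId = `decayed-${entry.id}`;
    renamed[entry.id] = newId;
    return {
      ...entry,
      id: newId,
      layer: "L1",
      decayed_from: entry.layer === "L2" ? "working" : "emerging",
    };
  });
  // Repoint evidence links in L3/L4 entries at the decayed copies.
  return moved.map((entry): Entry =>
    entry.layer === "L3" || entry.layer === "L4"
      ? { ...entry, evidence_links: entry.evidence_links.map((id) => renamed[id] ?? id) }
      : entry,
  );
}

const decayedVault = runDecay(
  [
    { id: "w1", layer: "L2", decay_at: 100, evidence_links: [] },
    { id: "p1", layer: "L3", status: "promoted", decay_at: 100, evidence_links: ["w1"] },
    { id: "p2", layer: "L3", status: "pending", decay_at: 100, evidence_links: [] },
    { id: "c1", layer: "L4", evidence_links: ["p2", "w1"] },
  ],
  200,
);
```

In this example, w1 and p2 move to L1 as decayed-w1 and decayed-p2, the promoted p1 stays in L3, and the links in p1 and c1 follow the renamed entries.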

Guards and Safety

Evidence link preservation: Before removing a decayed entry, the processor scans all L3/L4 entries for references and updates them to the new decayed-{oldId} location
Never decays L1 or L4 entries: L1 is the permanent archive and L4 is ratified canon

Policy Bridge

Purpose: Read-only query interface that routes agent requests to the appropriate knowledge layer based on intent.

Property	Value
Layer affinity	Reads all layers
Cycle time	On-demand (per agent query)
Reads	L1, L2, L3, L4
Writes	Nothing — strictly read-only

What It Does

Agents query the Policy Bridge with an intent, and the bridge routes to the correct layer:

Intent	Layer	Semantic Weight
`enforce`	L4	`mandatory`
`advise`	L3	`advisory`
`brief`	L2	`contextual`
`route`	L1	`historical`
`all`	L1-L4	Stratified

Every result includes source_layer and semantic_weight metadata so agents know how to treat the information.
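The routing table translates naturally into a lookup. In the sketch below, the `query` function and `VaultEntry` shape are hypothetical, and "Stratified" is interpreted as returning results layer by layer from L4 down to L1, which is an assumption.

```typescript
type Intent = "enforce" | "advise" | "brief" | "route" | "all";
type Layer = "L1" | "L2" | "L3" | "L4";
type Weight = "mandatory" | "advisory" | "contextual" | "historical";

const ROUTES: Record<Exclude<Intent, "all">, Layer> = {
  enforce: "L4",
  advise: "L3",
  brief: "L2",
  route: "L1",
};
const WEIGHTS: Record<Layer, Weight> = {
  L4: "mandatory",
  L3: "advisory",
  L2: "contextual",
  L1: "historical",
};

interface VaultEntry { id: string; layer: Layer; }
interface BridgeResult { id: string; source_layer: Layer; semantic_weight: Weight; }

// Read-only: the vault array is never modified.
function query(intent: Intent, vault: VaultEntry[]): BridgeResult[] {
  const layers: Layer[] = intent === "all" ? ["L4", "L3", "L2", "L1"] : [ROUTES[intent]];
  const results: BridgeResult[] = [];
  for (const layer of layers) {
    for (const entry of vault) {
      if (entry.layer === layer) {
        results.push({ id: entry.id, source_layer: layer, semantic_weight: WEIGHTS[layer] });
      }
    }
  }
  return results;
}

const sampleVault: VaultEntry[] = [
  { id: "trace-1", layer: "L1" },
  { id: "policy-1", layer: "L4" },
  { id: "proposal-1", layer: "L3" },
];
const enforced = query("enforce", sampleVault);  // policy-1 with weight "mandatory"
const everything = query("all", sampleVault);    // policy-1, proposal-1, trace-1
```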

See Policy Bridge architecture for full details.

Governance API

Purpose: Human-in-the-loop review for promoting L3 proposals to L4 canon.

Property	Value
Layer affinity	Reads L3, writes L4
Cycle time	On-demand (human-triggered)
Reads	L3 pending entries, L1 evidence chains
Writes	L4 canon entries

What It Does

list_pending() — Returns L3 entries with status pending, sorted by confidence descending
promote(entryId, reviewerId) — Creates L4 entry, marks L3 as promoted
reject(entryId, reviewerId, reason) — Marks L3 as rejected with reason
get_evidence(entryId) — Returns the L3 entry with its full evidence chain (linked L1 traces)

Guards and Safety

L2 entries cannot be promoted — Returns error
Already-promoted/rejected entries cannot be re-promoted — Returns error
Only pending L3 entries are eligible — Status must be pending
L4 is writable only through Governance: No other worker can write to L4
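A compact sketch of the review flow and its guards follows. The method names mirror the ones listed above, but the class, the entry shapes, the canon-{id} naming, and the choice to signal errors by throwing are all assumptions made for illustration.

```typescript
type ReviewStatus = "pending" | "promoted" | "rejected";

interface Proposal {
  id: string;
  layer: "L2" | "L3";
  status: ReviewStatus;
  confidence: number;
  reviewed_by?: string;      // hypothetical field
  rejection_reason?: string; // hypothetical field
}
interface CanonEntry { id: string; promoted_from: string; reviewer: string; }

class GovernanceSketch {
  private entries: Proposal[];
  canon: CanonEntry[] = [];

  constructor(entries: Proposal[]) {
    this.entries = entries;
  }

  list_pending(): Proposal[] {
    return this.entries
      .filter((e) => e.layer === "L3" && e.status === "pending")
      .sort((a, b) => b.confidence - a.confidence);
  }

  // Shared guard: only pending L3 entries may be reviewed.
  private eligible(entryId: string): Proposal {
    const entry = this.entries.find((e) => e.id === entryId);
    if (!entry) throw new Error(`unknown entry: ${entryId}`);
    if (entry.layer !== "L3") throw new Error("only L3 entries can be reviewed");
    if (entry.status !== "pending") throw new Error(`entry is already ${entry.status}`);
    return entry;
  }

  promote(entryId: string, reviewerId: string): CanonEntry {
    const entry = this.eligible(entryId);
    entry.status = "promoted";
    entry.reviewed_by = reviewerId;
    const created: CanonEntry = { id: `canon-${entry.id}`, promoted_from: entry.id, reviewer: reviewerId };
    this.canon.push(created);
    return created;
  }

  reject(entryId: string, reviewerId: string, reason: string): void {
    const entry = this.eligible(entryId);
    entry.status = "rejected";
    entry.reviewed_by = reviewerId;
    entry.rejection_reason = reason;
  }
}

const gov = new GovernanceSketch([
  { id: "p1", layer: "L3", status: "pending", confidence: 0.6 },
  { id: "p2", layer: "L3", status: "pending", confidence: 0.9 },
  { id: "w1", layer: "L2", status: "pending", confidence: 0.9 },
]);
const pendingBefore = gov.list_pending().map((e) => e.id); // p2 first, then p1
const canonEntry = gov.promote("p2", "alice");
gov.reject("p1", "bob", "insufficient evidence");
```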
