Workers

SOMA has seven workers that process agent traces into organizational knowledge. Each worker has a specific role and operates on specific layers of the vault.

Six of the seven workers write to the vault; the Policy Bridge only reads from it to serve agents. The Governance API mediates human review of L3 proposals into L4 canon.


State Tracking

All pipeline workers (Harvester, Reconciler, Synthesizer, Cartographer) maintain a local state file in .soma/ to enable incremental processing — only new or changed entities are reprocessed on each cycle.

Entity-count change detection: Each worker tracks the vault's entity count in its state file. On startup, if the current count is lower than the saved count, the worker infers a vault restructuring (migration, manual cleanup) and resets its state for a full rescan. Normal writes (new entities) never trigger a reset because the count only increases.
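The reset rule above can be sketched in a few lines. This is a minimal illustration, not SOMA's actual code: the state shape (`entity_count`, `hashes`) and the function name are assumptions made for the example.

```python
def check_for_restructuring(state: dict, current_count: int) -> dict:
    """Return the state to use for this cycle.

    `state` is a hypothetical in-memory copy of the worker's state file,
    holding the last seen entity count and per-entity content hashes.
    """
    if current_count < state["entity_count"]:
        # The vault shrank: assume a migration or manual cleanup and
        # drop all saved hashes so the next cycle rescans everything.
        return {"entity_count": current_count, "hashes": {}}
    # Equal or higher count: keep incremental state, record the new count.
    return {**state, "entity_count": current_count}
```

A worker would call this once on startup, before deciding which entities to process.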

Content hashing: Workers compute an MD5 hash of each entity's content. Unchanged entities are skipped on subsequent cycles. This allows frequent cycle intervals (60s for Harvester, 5min for Reconciler) without redundant processing.
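As a rough sketch of hash-based skipping, assuming entities are available as an ID-to-content mapping and saved hashes live in a plain dict (both are illustrative choices, not SOMA's storage format):

```python
import hashlib

def content_hash(content: str) -> str:
    """MD5 hex digest of an entity's content (used for change detection only)."""
    return hashlib.md5(content.encode("utf-8")).hexdigest()

def changed_entities(entities: dict, seen: dict) -> list:
    """Return IDs whose content differs from the saved hash, updating `seen`."""
    changed = []
    for entity_id, content in entities.items():
        digest = content_hash(content)
        if seen.get(entity_id) != digest:
            changed.append(entity_id)
            seen[entity_id] = digest  # remember for the next cycle
    return changed
```

On the first cycle every entity is reported as changed; on later cycles only edited or newly added entities are.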

Synthesizer deduplication: When the Synthesizer extracts insights from LLM analysis, it checks existing vault entities for fuzzy title matches (overlap coefficient ≥ 0.7):

| Outcome | Condition | Action |
| --- | --- | --- |
| Skipped | Match found, no higher confidence or new evidence | No write |
| Superseded | Match found, higher confidence or new evidence | Existing entity updated |
| New | No match | New entity created in L3 |

A log line like [Soma Synthesizer] 10 skipped, 6 superseded, 0 new indicates healthy deduplication — the vault already contains the knowledge the LLM would extract.
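The three outcomes can be illustrated with a small classifier. The overlap coefficient here is computed on lowercase word sets, |A ∩ B| / min(|A|, |B|); the tokenization, the entity fields (`title`, `confidence`, `evidence`), and the function names are assumptions for the sketch rather than SOMA's actual schema.

```python
def overlap_coefficient(a: str, b: str) -> float:
    """Shared words divided by the size of the smaller word set."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / min(len(ta), len(tb))

def classify(candidate: dict, existing: list, threshold: float = 0.7) -> str:
    """Return 'skipped', 'superseded', or 'new' for an extracted insight."""
    for entity in existing:
        if overlap_coefficient(candidate["title"], entity["title"]) >= threshold:
            better = candidate["confidence"] > entity["confidence"]
            # New evidence: at least one link the existing entity lacks.
            new_evidence = not set(candidate["evidence"]) <= set(entity["evidence"])
            return "superseded" if better or new_evidence else "skipped"
    return "new"
```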


Harvester

Purpose: Ingests execution traces, events, and full ExecutionGraph objects from agents into the vault.

| Property | Value |
| --- | --- |
| Layer affinity | L1 (archive), write only |
| Cycle time | 60 seconds |
| Reads | AgentFlow ExecutionEvent, PatternEvent, ExecutionGraph; inbox files (.json, .jsonl, .md) |
| Writes | Execution entities, agent profiles, decision entities (all L1) |

What It Does

The Harvester is the entry point for all data into SOMA. It processes three input types:

  1. ExecutionEvents — Summarized metrics from agent runs (duration, status, tool calls)
  2. PatternEvents — Process mining patterns detected by AgentFlow
  3. ExecutionGraphs — Full graph structures with nodes, edges, and trace events

For ExecutionGraph inputs, the Harvester extracts decisions from graph structure:

| Graph Structure | Decision Type | Captured Data |
| --- | --- | --- |
| Tool node | tool_choice | Tool name, metadata, duration, outcome |
| Branched edge | branch | Selected branch, alternatives |
| Retried edge | retry | Retry count, final outcome |
| Subagent node | delegation | Subagent name, parent agent |
| Failed node | failure | Error message, failure path |
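The mapping in the table could be expressed roughly as follows. The graph shape used here (`nodes` with `type` and `status`, `edges` with `kind` and `to`) is a hypothetical simplification, not the real ExecutionGraph schema, and giving a failed node precedence over its node type is an assumption.

```python
NODE_DECISIONS = {"tool": "tool_choice", "subagent": "delegation"}
EDGE_DECISIONS = {"branched": "branch", "retried": "retry"}

def extract_decisions(graph: dict) -> list:
    """Return (decision_type, node_id) pairs found in a graph."""
    decisions = []
    for node in graph.get("nodes", []):
        if node.get("status") == "failed":
            # Assumption: a failed node is recorded as a failure only.
            decisions.append(("failure", node["id"]))
        elif node.get("type") in NODE_DECISIONS:
            decisions.append((NODE_DECISIONS[node["type"]], node["id"]))
    for edge in graph.get("edges", []):
        if edge.get("kind") in EDGE_DECISIONS:
            # Attribute edge decisions to the edge's target node.
            decisions.append((EDGE_DECISIONS[edge["kind"]], edge["to"]))
    return decisions
```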

Guards and Safety

  • Duplicate trace detection — Traces with a trace_id already in the vault are skipped
  • Stable decision IDs — Decision IDs are derived from graph_id-node_id, making re-ingestion idempotent
  • Circuit breaker — Stops after 100 creates per run
  • Pluggable inbox parsers — Custom parsers registered by file extension
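A minimal sketch of how the first three guards interact, assuming the vault can be treated as a dict keyed by trace_id (an illustrative stand-in; the function names are also hypothetical):

```python
MAX_CREATES_PER_RUN = 100  # circuit-breaker limit stated above

def decision_id(graph_id: str, node_id: str) -> str:
    """Deterministic ID, so re-ingesting the same graph yields the same ID."""
    return f"{graph_id}-{node_id}"

def ingest(traces: list, vault: dict, max_creates: int = MAX_CREATES_PER_RUN) -> int:
    """Write unseen traces to the vault and return how many were created."""
    created = 0
    for trace in traces:
        if created >= max_creates:
            break  # circuit breaker: leave the rest for the next run
        if trace["trace_id"] in vault:
            continue  # duplicate trace: already ingested
        vault[trace["trace_id"]] = trace
        created += 1
    return created
```

With 150 new traces, the first run creates 100 and the next run creates the remaining 50, because the earlier ones are skipped as duplicates.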

Reconciler

Purpose: Maintains vault structural integrity by scanning for and fixing data quality issues.

| Property | Value |
| --- | --- |
| Layer affinity | L1 (archive), write only |
| Cycle time | 5 minutes |
| Reads | All vault entities (cross-layer scan) |
| Writes | Fixed entities in L1, merge entities in L1 |

What It Does

The Reconciler scans the entire vault looking for structural problems:

  • Missing fields — Required fields absent from entities
  • Invalid types/statuses — Types or statuses not in the registry
  • Broken wikilinks — References to entities that don't exist
  • Orphan entities — Entities with no inbound references
  • Stub entities — Empty or near-empty entity bodies
  • Duplicates — Near-duplicate entries detected via overlap coefficient

Auto-fixes applied without human intervention:

  • Type corrections (e.g., insights to insight)
  • Status alias mapping (e.g., done to completed)
  • Array type coercion (e.g., string tags to array)
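The three auto-fixes might look like this in outline. The alias tables contain only the examples given above, and splitting a string of tags on commas is an assumption about how the coercion works.

```python
TYPE_ALIASES = {"insights": "insight"}    # example from the list above
STATUS_ALIASES = {"done": "completed"}    # example from the list above

def auto_fix(entity: dict) -> dict:
    """Return a corrected copy of an entity's frontmatter fields."""
    fixed = dict(entity)
    if fixed.get("type") in TYPE_ALIASES:
        fixed["type"] = TYPE_ALIASES[fixed["type"]]
    if fixed.get("status") in STATUS_ALIASES:
        fixed["status"] = STATUS_ALIASES[fixed["status"]]
    if isinstance(fixed.get("tags"), str):
        # Assumption: a string of tags is comma-separated.
        fixed["tags"] = [t.strip() for t in fixed["tags"].split(",") if t.strip()]
    return fixed
```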

For duplicates, the Reconciler detects near-duplicates using the overlap coefficient, merges them into a single entity that preserves multi-agent attribution, and resolves conflicts by keeping the newest entry; the older entry is marked with superseded_by.

Guards and Safety

  • Merge dedup — Won't create a merge entity if one already exists with the same reconciled_from sources
  • L1 only writes — Cannot modify L2/L3/L4 entities directly

Synthesizer

Purpose: Detects cross-agent patterns in L1 data and generates L3 proposals with confidence scores.

| Property | Value |
| --- | --- |
| Layer affinity | L3 (emerging), write only |
| Cycle time | 1 hour |
| Reads | L1 execution, insight, agent, and decision entities |
| Writes | L3 proposals (insight, archetype, policy, synthesis entities) |

What It Does

The Synthesizer operates in three modes:

  1. Entity synthesis (synthesize()) — LLM-powered extraction from execution, insight, agent, and decision entities. Uses the configured analysisFn to identify patterns.

  2. L1 pattern synthesis (synthesizeL3()) — Cross-agent content similarity patterns without LLM. Groups L1 entries by semantic similarity and proposes archetypes.

  3. Decision pattern synthesis (synthesizeDecisions()) — Groups decisions by type and choice, detects behavioral patterns. If 5+ agents make the same tool choice, it proposes an archetype.
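The third mode can be illustrated by grouping decisions on (type, choice) and counting distinct agents. The decision fields (`type`, `choice`, `agent`) and the function name are assumptions for this sketch; only the 5-agent threshold comes from the text.

```python
from collections import defaultdict

MIN_AGENTS = 5  # threshold stated above

def propose_archetypes(decisions: list, min_agents: int = MIN_AGENTS) -> list:
    """Return (type, choice) patterns shared by at least `min_agents` agents."""
    agents_by_pattern = defaultdict(set)
    for decision in decisions:
        key = (decision["type"], decision["choice"])
        agents_by_pattern[key].add(decision["agent"])  # distinct agents only
    return sorted(
        pattern
        for pattern, agents in agents_by_pattern.items()
        if len(agents) >= min_agents
    )
```

Counting distinct agents rather than raw decisions means one agent repeating the same choice many times does not trigger a proposal.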

Confidence Scoring

| Signal | Score Contribution |
| --- | --- |
| Cross-agent corroboration (5+ agents) | >= 0.8 |
| Single-agent patterns | Capped at 0.5 |
| Per additional trace | +0.02 |
| Per additional agent | +0.15 |
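The table does not say how these signals combine, so the following is only one plausible reading: start from a base score, add the per-trace and per-agent increments, then apply the single-agent cap or the corroboration floor. The base score of 0.3 and the order of operations are assumptions, not documented behavior.

```python
BASE_SCORE = 0.3           # assumption: not given in the table
PER_TRACE = 0.02           # from the table
PER_AGENT = 0.15           # from the table
SINGLE_AGENT_CAP = 0.5     # from the table
CORROBORATED_FLOOR = 0.8   # from the table (5+ agents)
CORROBORATION_AGENTS = 5

def confidence(trace_count: int, agent_count: int) -> float:
    """Combine the table's signals into a score in [0, 1]."""
    score = BASE_SCORE
    score += PER_TRACE * max(trace_count - 1, 0)   # additional traces
    score += PER_AGENT * max(agent_count - 1, 0)   # additional agents
    if agent_count <= 1:
        score = min(score, SINGLE_AGENT_CAP)
    elif agent_count >= CORROBORATION_AGENTS:
        score = max(score, CORROBORATED_FLOOR)
    return round(min(score, 1.0), 2)
```

Under these assumptions, a pattern seen in many traces from one agent never exceeds 0.5, while any pattern shared by five or more agents scores at least 0.8.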

Guards and Safety

  • Self-exclusion — Entities tagged synthesized are excluded from the candidate pool (prevents the Synthesizer from processing its own output)
  • Circuit breaker — Stops after 100 proposals per run

Cartographer

Purpose: Maps relationships between entities, discovers archetypes via clustering, and detects contradictions.

| Property | Value |
| --- | --- |
| Layer affinity | L3 (emerging), write only |
| Cycle time | On-change (triggered when vault changes) |
| Reads | All vault entities (for embedding), L3 proposals, L4 canon |
| Writes | L3 relationship proposals, archetype entities, contradiction entities |

What It Does

  • Embed entities into the vector store (incremental, change-detected — only new/modified entities are re-embedded)
  • Discover archetypes via BFS community detection on the entity graph
  • Map relationships between entities sharing tags (proposed as L3 entries)
  • Detect contradictions between L3 proposals and existing L4 canon
  • Semantic search across all entities by vector similarity
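For the archetype step, a simple way to picture BFS community detection is to treat each connected component of the entity graph as one community. That interpretation is an assumption; the Cartographer may apply additional criteria. The adjacency-dict input is also illustrative.

```python
from collections import deque

def bfs_communities(adjacency: dict) -> list:
    """Group nodes into connected components using breadth-first search."""
    seen, communities = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbor in sorted(adjacency.get(node, ())):
                if neighbor not in seen:
                    seen.add(neighbor)  # mark on enqueue to avoid revisits
                    queue.append(neighbor)
        communities.append(sorted(members))
    return communities
```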

Guards and Safety

  • Self-reference guard — Won't propose relationships between entities it created (prevents circular references)
  • Circuit breaker — Stops after 100 proposals per run

Decay Processor

Purpose: Manages entry lifecycle for ephemeral layers (L2 and L3), moving expired entries to L1.

| Property | Value |
| --- | --- |
| Layer affinity | Reads L2/L3, writes L1 |
| Cycle time | Per pipeline run |
| Reads | L2 entries, L3 entries, L3/L4 evidence links |
| Writes | New L1 entries (decayed copies), updated evidence references |

What It Does

  • Moves expired L2 entries to L1 with decayed_from: 'working'
  • Moves expired L3 entries to L1 with decayed_from: 'emerging'
  • Skips promoted/rejected L3 entries (they never decay)
  • Updates evidence_links in L3/L4 entries that pointed to decayed entries (no broken links)
  • Respects activity-based extension: reading an entry resets its decay_at timer

Guards and Safety

  • Evidence link preservation — Before removing a decayed entry, scans all L3/L4 for references and updates them to the new decayed-{oldId} location
  • Never decays L1 or L4: L1 entries are permanent archive and L4 entries are ratified canon; the only change made to L4 entries is the evidence link update described above
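A compact sketch of one decay pass, assuming all entries sit in a single dict with `layer`, `decay_at`, `status`, and `evidence_links` fields (a hypothetical in-memory model). It omits the activity-based extension, which would update decay_at on reads.

```python
def run_decay(entries: dict, now: float) -> dict:
    """Move expired L2/L3 entries to L1; return a map of old ID to new ID."""
    origin = {"L2": "working", "L3": "emerging"}
    moved = {}
    for entry_id, entry in list(entries.items()):
        if entry["layer"] not in origin or entry["decay_at"] > now:
            continue  # L1/L4 never decay; unexpired entries stay put
        if entry.get("status") in ("promoted", "rejected"):
            continue  # reviewed L3 entries never decay
        new_id = f"decayed-{entry_id}"
        entries[new_id] = {**entry, "layer": "L1", "decayed_from": origin[entry["layer"]]}
        del entries[entry_id]
        moved[entry_id] = new_id
    # Repoint evidence links in L3/L4 so nothing references a removed ID.
    for entry in entries.values():
        if entry["layer"] in ("L3", "L4"):
            links = entry.get("evidence_links", [])
            entry["evidence_links"] = [moved.get(link, link) for link in links]
    return moved
```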

Policy Bridge

Purpose: Read-only query interface that routes agent requests to the appropriate knowledge layer based on intent.

| Property | Value |
| --- | --- |
| Layer affinity | Read all layers |
| Cycle time | On-demand (per agent query) |
| Reads | L1, L2, L3, L4 |
| Writes | Nothing (strictly read-only) |

What It Does

Agents query the Policy Bridge with an intent, and the bridge routes to the correct layer:

| Intent | Layer | Semantic Weight |
| --- | --- | --- |
| enforce | L4 | Mandatory |
| advise | L3 | Advisory |
| brief | L2 | Contextual |
| route | L1 | Historical |
| all | L1-L4 | Stratified |

Every result includes source_layer and semantic_weight metadata so agents know how to treat the information.
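The routing table translates naturally into a lookup. In this sketch the vault is a dict of layer name to entries and `query` is a hypothetical function name; treating "stratified" as "each result keeps its own layer's weight, highest-authority layer first" is an assumption.

```python
INTENT_ROUTES = {
    "enforce": [("L4", "mandatory")],
    "advise": [("L3", "advisory")],
    "brief": [("L2", "contextual")],
    "route": [("L1", "historical")],
}
# Assumption: "all" visits every layer, L4 first, keeping per-layer weights.
INTENT_ROUTES["all"] = [
    route
    for intent in ("enforce", "advise", "brief", "route")
    for route in INTENT_ROUTES[intent]
]

def query(intent: str, vault: dict) -> list:
    """Return entries for an intent, tagged with layer and weight metadata."""
    results = []
    for layer, weight in INTENT_ROUTES[intent]:
        for entry in vault.get(layer, []):
            results.append({**entry, "source_layer": layer, "semantic_weight": weight})
    return results
```

The function copies each entry before tagging it, so the vault itself is never modified, in keeping with the bridge's read-only role.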

See Policy Bridge architecture for full details.


Governance API

Purpose: Human-in-the-loop review for promoting L3 proposals to L4 canon.

| Property | Value |
| --- | --- |
| Layer affinity | Reads L3, writes L4 |
| Cycle time | On-demand (human-triggered) |
| Reads | L3 pending entries, L1 evidence chains |
| Writes | L4 canon entries |

What It Does

  • list_pending() — Returns L3 entries with status pending, sorted by confidence descending
  • promote(entryId, reviewerId) — Creates L4 entry, marks L3 as promoted
  • reject(entryId, reviewerId, reason) — Marks L3 as rejected with reason
  • get_evidence(entryId) — Returns the L3 entry with its full evidence chain (linked L1 traces)

Guards and Safety

  • L2 entries cannot be promoted — Returns error
  • Already-promoted/rejected entries cannot be re-promoted — Returns error
  • Only pending L3 entries are eligible — Status must be pending
  • L4 is writable only through Governance: no other worker can create L4 entries
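The promotion guards can be sketched as below. The dict-based L3 and L4 stores, the `GovernanceError` exception, and the fields written to the L4 entry are all assumptions for illustration; only the function names and the eligibility rules come from the lists above.

```python
class GovernanceError(Exception):
    """Raised when an entry is not eligible for promotion."""

def list_pending(l3: dict) -> list:
    """Pending L3 entries, highest confidence first."""
    pending = [e for e in l3.values() if e["status"] == "pending"]
    return sorted(pending, key=lambda e: e["confidence"], reverse=True)

def promote(entry_id: str, reviewer_id: str, l3: dict, l4: dict) -> dict:
    """Create an L4 entry from a pending L3 proposal and mark it promoted."""
    entry = l3.get(entry_id)
    if entry is None:
        # Covers L2 entries and unknown IDs: only L3 can be promoted.
        raise GovernanceError(f"{entry_id} is not an L3 entry")
    if entry["status"] != "pending":
        raise GovernanceError(f"{entry_id} is already {entry['status']}")
    entry["status"] = "promoted"
    l4[entry_id] = {"id": entry_id, "promoted_from": entry_id, "reviewer": reviewer_id}
    return l4[entry_id]
```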