
Vault Storage

The vault is SOMA's persistent storage layer. Entities are Markdown files with YAML frontmatter, organized by type in a flat directory structure.

Directory Structure

.soma/vault/
  agent/            Agent profiles (one per agent)
  execution/        Execution records from agent runs
  decision/         Inferred agent decisions (from graph structure)
  insight/          LLM-extracted observations and patterns
  policy/           Auto-generated and ratified guard policies
  archetype/        Cross-agent behavioral patterns
  assumption/       System beliefs about agent behavior
  constraint/       Identified operational limitations
  contradiction/    Conflicting positions between entries
  synthesis/        Cross-cluster pattern summaries
  _index.json       Fast-lookup index (includes layer field)
  _mutations.jsonl  Append-only audit log (auto-rotated at 10MB)
  _vault.lock       File lock for concurrent access
  _vectors.json     Vector embeddings (JSON backend, if used)

Each type directory contains .md files named by entity ID. The vault never nests deeper than one level.
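A minimal sketch of how an entity maps onto this flat layout, assuming entity IDs are used verbatim as filenames. entityPath and VAULT_ROOT are hypothetical names, not part of SOMA's API; the list of type directories is taken from the tree above.

```typescript
// Hypothetical helper (not SOMA's API): maps an entity to its file path.
// Assumes entity IDs are used verbatim as filenames.
const VAULT_ROOT = ".soma/vault";

// Type directories, taken from the directory tree.
const ENTITY_TYPES: string[] = [
  "agent", "execution", "decision", "insight", "policy",
  "archetype", "assumption", "constraint", "contradiction", "synthesis",
];

function entityPath(type: string, id: string): string {
  if (ENTITY_TYPES.indexOf(type) === -1) {
    throw new Error(`Unknown entity type: ${type}`);
  }
  // Reject path separators and dot segments so an ID can never escape
  // its type directory or create a nested one (the vault is one level deep).
  if (id === "" || id === "." || id === ".." || /[\/\\]/.test(id)) {
    throw new Error(`Invalid entity ID: ${id}`);
  }
  return `${VAULT_ROOT}/${type}/${id}.md`;
}
```

For example, entityPath("decision", "tool-choice-fetch-data-agent-alpha") returns ".soma/vault/decision/tool-choice-fetch-data-agent-alpha.md", while an ID containing "/" is rejected.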

Entity Format

Every entity is a Markdown file in two parts: YAML frontmatter holding structured metadata, followed by a Markdown body holding human-readable content.

---
type: decision
id: tool-choice-fetch-data-agent-alpha
name: "tool_choice: fetch-data (agent-alpha)"
status: active
layer: archive
source_worker: harvester
decision_type: tool_choice
choice: fetch-data
outcome: completed
agent_id: agent-alpha
graph_id: exec-123
trace_id: decision-exec-123-t1
confidence: medium
tags: ["graph-inferred", "tool_choice"]
created: "2026-03-21T01:00:00.000Z"
updated: "2026-03-21T01:00:00.000Z"
---

## tool_choice: fetch-data

Agent **agent-alpha** made a tool_choice decision.
- **Choice:** fetch-data
- **Outcome:** completed
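A file like the one above can be split into its two parts with a few lines of code. This is a minimal sketch, assuming the file starts with a "---" line and the frontmatter ends at the next "---" line; splitEntityFile is a hypothetical name, and parsing the individual key: value lines is left out.

```typescript
// Hypothetical sketch: split an entity file into raw frontmatter lines and the Markdown body.
interface SplitEntity {
  frontmatter: string[]; // raw "key: value" lines, not yet parsed
  body: string;          // Markdown body with leading blank lines removed
}

function splitEntityFile(text: string): SplitEntity {
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  if (lines[0] !== "---") {
    throw new Error("Entity file must start with a '---' frontmatter delimiter");
  }
  // Find the closing delimiter after the opening one.
  let end = -1;
  for (let i = 1; i < lines.length; i++) {
    if (lines[i] === "---") {
      end = i;
      break;
    }
  }
  if (end === -1) {
    throw new Error("Unterminated frontmatter");
  }
  return {
    frontmatter: lines.slice(1, end),
    body: lines.slice(end + 1).join("\n").replace(/^\n+/, ""),
  };
}
```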

Required Fields

All entities must have: type, id, name, status, layer, source_worker, created, updated.
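A minimal check for these fields, assuming the frontmatter has already been parsed into a plain object. missingRequiredFields is a hypothetical helper, and treating an empty string as missing is an assumption.

```typescript
// Required frontmatter fields, as listed above.
const REQUIRED_FIELDS: string[] = [
  "type", "id", "name", "status", "layer", "source_worker", "created", "updated",
];

// Hypothetical helper: returns required fields that are absent or empty.
// Treating "" as missing is an assumption, not documented SOMA behavior.
function missingRequiredFields(frontmatter: { [key: string]: unknown }): string[] {
  const missing: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = frontmatter[field];
    if (value === undefined || value === null || value === "") {
      missing.push(field);
    }
  }
  return missing;
}
```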

Layer Field Values

| Layer value | Numeric | Description |
|---|---|---|
| canon | L4 | Ratified organizational truth |
| emerging | L3 | Machine-proposed insights |
| working | L2 | Team-scoped ephemeral context |
| archive | L1 | Raw traces and history |
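The layer table can be expressed as a small lookup. This is a minimal sketch, assuming layers are stored by name (as in the layer: archive frontmatter shown earlier) and that the L1 to L4 labels are for display and ordering; parseLayer and layerLabel are hypothetical helpers, and accepting an "L3"-style label in parseLayer is an illustrative assumption.

```typescript
// Layer names and numeric levels from the table.
type Layer = "canon" | "emerging" | "working" | "archive";

const LAYER_LEVEL: { [K in Layer]: number } = {
  canon: 4,    // ratified organizational truth
  emerging: 3, // machine-proposed insights
  working: 2,  // team-scoped ephemeral context
  archive: 1,  // raw traces and history
};

// Hypothetical: accepts either a layer name ("emerging") or its label ("L3").
function parseLayer(value: string): Layer {
  const names = Object.keys(LAYER_LEVEL) as Layer[];
  for (const name of names) {
    if (value === name || value === "L" + LAYER_LEVEL[name]) {
      return name;
    }
  }
  throw new Error(`Unknown layer: ${value}`);
}

function layerLabel(layer: Layer): string {
  return "L" + LAYER_LEVEL[layer];
}
```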

Entity Types and Statuses

| Type | Valid statuses |
|---|---|
| agent, archetype | active, inactive, deprecated, proposed |
| execution | completed, failed, running, pending |
| insight | active, superseded, rejected |
| policy | active, draft, deprecated, enforcing |
| decision | active, superseded, reversed, flagged |
| assumption | active, validated, invalidated |
| constraint | active, resolved, deprecated |
| contradiction | active, resolved |
| synthesis | active, superseded |
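The status table translates directly into a lookup. A minimal sketch, assuming statuses are compared as exact lowercase strings; VALID_STATUSES and isValidStatus are hypothetical names.

```typescript
// Valid statuses per entity type, copied from the table.
const VALID_STATUSES: { [type: string]: string[] } = {
  agent: ["active", "inactive", "deprecated", "proposed"],
  archetype: ["active", "inactive", "deprecated", "proposed"],
  execution: ["completed", "failed", "running", "pending"],
  insight: ["active", "superseded", "rejected"],
  policy: ["active", "draft", "deprecated", "enforcing"],
  decision: ["active", "superseded", "reversed", "flagged"],
  assumption: ["active", "validated", "invalidated"],
  constraint: ["active", "resolved", "deprecated"],
  contradiction: ["active", "resolved"],
  synthesis: ["active", "superseded"],
};

// Hypothetical helper: unknown types are rejected rather than treated as valid.
function isValidStatus(type: string, status: string): boolean {
  // hasOwnProperty avoids matching inherited keys such as "constructor".
  if (!Object.prototype.hasOwnProperty.call(VALID_STATUSES, type)) {
    return false;
  }
  return VALID_STATUSES[type].indexOf(status) !== -1;
}
```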

Index Structure

The _index.json file provides fast lookups without scanning disk. It mirrors all entities with their key metadata:

{
"tool-choice-fetch-data-agent-alpha": {
"type": "decision",
"name": "tool_choice: fetch-data (agent-alpha)",
"status": "active",
"layer": "archive",
"tags": ["graph-inferred", "tool_choice"],
"created": "2026-03-21T01:00:00.000Z",
"updated": "2026-03-21T01:00:00.000Z"
},
"canon-max-retry-count": {
"type": "policy",
"name": "max-retry-count",
"status": "enforcing",
"layer": "canon",
"tags": ["governance", "retry"],
"created": "2026-03-18T10:00:00.000Z",
"updated": "2026-03-18T10:00:00.000Z"
}
}

The layer field in the index lets queryByLayer() filter at the index level, before touching disk. On a vault with 100K entities of which 1K are in L3, a query for the emerging layer reads only ~1K files from disk instead of 100K.
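A minimal sketch of index-level filtering, assuming the index is loaded into memory as a plain object shaped like the JSON above. queryByLayerSketch and its load callback are hypothetical stand-ins; the signature of SOMA's queryByLayer() is not documented on this page. Only entries whose layer matches reach load, which is where a real implementation would read the .md file.

```typescript
// Shape of one _index.json entry, as shown in the example index.
interface IndexEntry {
  type: string;
  name: string;
  status: string;
  layer: string;
  tags: string[];
  created: string;
  updated: string;
}

type VaultIndex = { [id: string]: IndexEntry };

// Hypothetical sketch: filter on the in-memory index first, then call the
// caller-supplied loader only for matching IDs, so disk reads scale with
// the size of the layer rather than the size of the vault.
function queryByLayerSketch<T>(
  index: VaultIndex,
  layer: string,
  load: (id: string, entry: IndexEntry) => T,
): T[] {
  const results: T[] = [];
  for (const id of Object.keys(index)) {
    const entry = index[id];
    if (entry.layer === layer) {
      results.push(load(id, entry));
    }
  }
  return results;
}
```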

Safety Features

| Feature | Mechanism | Details |
|---|---|---|
| File locking | O_EXCL atomic lock with PID-based stale detection | The lock file _vault.lock is created atomically and contains the PID of the holder. Stale locks (holder PID no longer running) are removed automatically on startup. Acquisition times out after 5s, retrying every 50ms. All mutations acquire the lock. |
| Disk space checks | statfsSync before every write | Writes are rejected when less than 10MB is available. If writeFileSync fails mid-write, the temp file is cleaned up, so no partial .md files are left behind. |
| Temp file cleanup | Orphan detection on startup | .tmp.* files in the vault directory are removed when SOMA starts. |
| Index corruption recovery | Spot-check validation + full rebuild | On load, 10% of index entries (min 1, max 50) are checked against disk. If more than 50% of the checked entries are missing, the index is discarded and rebuilt from all .md files. Invalid JSON triggers an immediate rebuild. |
| Mutation log rotation | Size-based archival | _mutations.jsonl, an append-only log of all create/update/delete operations with timestamps, is rotated at 10MB. It is currently write-only (reserved for future audit and replay). |
| Worker fingerprinting | MD5 of the index file | Each worker stores a vault fingerprint in its state file. If the fingerprint changes (vault reset or manual edit), the worker discards its cached state and reprocesses from scratch. |
| Layer-safe updates | vault.update() rejects layer changes | The layer field cannot be changed via vault.update(). Layer transitions happen only through Governance (promote) or Decay (move to L1). |
| Circuit breakers | Per-worker create limits | Workers stop after 100 creates per run to prevent runaway loops. |
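The index spot-check rule reduces to a sample size and a threshold. A minimal sketch, assuming the 10% is rounded up and that samples are evenly spaced across the index (SOMA may pick them randomly); spotCheckSampleSize and shouldRebuildIndex are hypothetical names, and existsOnDisk is a caller-supplied stand-in for a real filesystem check.

```typescript
// Hypothetical sketch of the index spot-check rule; not SOMA's code.

// 10% of entries, rounded up (assumption), clamped to [1, 50].
function spotCheckSampleSize(entryCount: number): number {
  if (entryCount <= 0) return 0;
  return Math.min(50, Math.max(1, Math.ceil(entryCount / 10)));
}

// True when more than half of the sampled index entries have no file on disk.
function shouldRebuildIndex(ids: string[], existsOnDisk: (id: string) => boolean): boolean {
  const sampleSize = spotCheckSampleSize(ids.length);
  if (sampleSize === 0) return false; // an empty index has nothing to spot-check
  // Evenly spaced sample across the index (assumption; SOMA may sample randomly).
  const step = ids.length / sampleSize;
  let missing = 0;
  for (let i = 0; i < sampleSize; i++) {
    if (!existsOnDisk(ids[Math.floor(i * step)])) {
      missing++;
    }
  }
  return missing / sampleSize > 0.5;
}
```

Exactly half missing does not trigger a rebuild, matching the "more than 50%" wording in the table.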

YAML Handling

The vault's YAML parser handles two categories of values differently to preserve round-trip integrity without a YAML library dependency.

Simple Values

Strings, numbers, booleans, and string arrays use standard YAML syntax:

type: decision
confidence: 0.82
active: true
tags: ["graph-inferred", "tool_choice"]

Complex Values

Nested objects and arrays of objects are serialized as inline JSON:

metadata: {"author":"alice","version":2}
evidence_links: [{"id":"trace-001","type":"execution"},{"id":"trace-002","type":"decision"}]

The parser detects values starting with { or [{ and parses them as JSON. This trade-off avoids YAML library complexity while maintaining lossless round-trips for all data types.
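A minimal sketch of the two-category scheme, assuming double-quoted YAML strings use JSON-compatible escapes and that string arrays are always written with double-quoted items (so they are valid JSON too). parseValue and formatValue are hypothetical names, not SOMA's parser.

```typescript
// Parse the text after "key: " in a frontmatter line.
function parseValue(raw: string): unknown {
  const v = raw.trim();
  // Covers the documented { and [{ cases, plus (assumption) string arrays,
  // which in the documented ["a", "b"] form are valid JSON as well.
  if (v.charAt(0) === "{" || v.charAt(0) === "[") {
    return JSON.parse(v);
  }
  if (v === "true") return true;
  if (v === "false") return false;
  // Double-quoted strings (assumes JSON-compatible escapes).
  if (v.charAt(0) === '"') {
    return JSON.parse(v);
  }
  if (v !== "" && !isNaN(Number(v))) {
    return Number(v);
  }
  return v; // bare string such as decision
}

// Serialize a value so that parseValue() returns an equivalent value.
function formatValue(value: unknown): string {
  if (typeof value === "string") {
    // Strings that would read back as booleans or numbers must be quoted.
    const mimicsOtherType =
      value === "true" || value === "false" || (value.trim() !== "" && !isNaN(Number(value)));
    const isBare = /^[A-Za-z0-9_.-]+$/.test(value) && !mimicsOtherType;
    return isBare ? value : JSON.stringify(value);
  }
  if (typeof value === "number" || typeof value === "boolean") {
    return String(value);
  }
  // Objects and arrays (of strings or objects) are written as inline JSON.
  return JSON.stringify(value);
}
```

For example, formatValue({ author: "alice", version: 2 }) produces {"author":"alice","version":2}, the same inline form shown for metadata above.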

vault.update() vs writeToLayer()

| Method | Purpose | Layer behavior |
|---|---|---|
| vault.update(id, fields) | Patch existing entity fields | Rejects changes to the layer field |
| writeToLayer(vault, entity, layer, worker) | Create a new entity in a specific layer | Enforces worker permissions per layer |

// Update status — OK
await vault.update('my-entity', { status: 'superseded' });

// Try to change layer — REJECTED
await vault.update('my-entity', { layer: 'canon' });
// Error: LayerPermissionError — layer cannot be changed via update

// Create in a specific layer — permission checked
await writeToLayer(vault, newEntity, 'emerging', 'synthesizer');
// OK — synthesizer is authorized for L3

await writeToLayer(vault, newEntity, 'canon', 'synthesizer');
// Error: LayerPermissionError — only governance can write to canon
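The two rules in the table can be sketched as separate guards. This is a minimal sketch under stated assumptions: LayerPermissionError is modeled as a plain Error subclass, assertCanWrite and assertNoLayerChange are hypothetical names, and only the canon and emerging entries of the permission matrix come from this page. The archive entry is inferred from the mutation log example, and working is left empty because its writers are not documented here.

```typescript
// Hypothetical sketch of the two layer guards; not SOMA's implementation.
type Layer = "canon" | "emerging" | "working" | "archive";

class LayerPermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LayerPermissionError";
  }
}

// Which workers may create entities in each layer (assumed matrix).
const LAYER_WRITERS: { [L in Layer]: string[] } = {
  canon: ["governance"],     // documented: only governance writes canon
  emerging: ["synthesizer"], // documented as authorized; other writers unknown
  working: [],               // not documented on this page
  archive: ["harvester"],    // inferred from the mutation log example
};

// writeToLayer()-style guard: the worker must be allowed to write the target layer.
function assertCanWrite(worker: string, layer: Layer): void {
  if (LAYER_WRITERS[layer].indexOf(worker) === -1) {
    throw new LayerPermissionError(`${worker} may not write to ${layer}`);
  }
}

// vault.update()-style guard: any patch that includes "layer" is rejected,
// even if the value is unchanged (assumption).
function assertNoLayerChange(fields: { [key: string]: unknown }): void {
  if (Object.prototype.hasOwnProperty.call(fields, "layer")) {
    throw new LayerPermissionError("layer cannot be changed via update");
  }
}
```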

Mutation Log

The _mutations.jsonl file records every vault operation:

{"op":"create","id":"exec-alpha-001","type":"execution","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:00.000Z"}
{"op":"update","id":"exec-alpha-001","fields":["status"],"ts":"2026-03-21T01:05:00.000Z"}
{"op":"create","id":"decision-tool-fetch-alpha","type":"decision","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:01.000Z"}
{"op":"create","id":"retry-heavy-agents","type":"archetype","layer":"emerging","worker":"synthesizer","ts":"2026-03-21T02:00:00.000Z"}
{"op":"update","id":"retry-heavy-agents","fields":["status","ratified_by","ratified_at"],"ts":"2026-03-21T09:15:00.000Z"}

The log is auto-rotated at 10MB. It is append-only and is not read during normal operation; it exists to support future audit and replay.
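A minimal in-memory model of the log's append and rotation behavior. It assumes 10MB means 10 * 1024 * 1024 bytes and that rotation happens before a write that would cross the limit; InMemoryMutationLog and utf8Length are hypothetical names, and a real implementation would append to and rotate files on disk rather than arrays.

```typescript
// Record shape, following the example log lines above.
interface Mutation {
  op: "create" | "update" | "delete";
  id: string;
  ts: string;
  type?: string;
  layer?: string;
  worker?: string;
  fields?: string[];
}

// Assumption: "10MB" is read here as 10 * 1024 * 1024 bytes.
const ROTATE_AT_BYTES = 10 * 1024 * 1024;

// UTF-8 byte length of a string, counted without Node-specific APIs.
function utf8Length(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const code = s.charCodeAt(i);
    if (code < 0x80) bytes += 1;
    else if (code < 0x800) bytes += 2;
    else if (code >= 0xd800 && code <= 0xdbff && i + 1 < s.length &&
             s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff) {
      bytes += 4; // a surrogate pair encodes one 4-byte code point
      i++;
    } else bytes += 3;
  }
  return bytes;
}

class InMemoryMutationLog {
  private current: string[] = [];
  private currentBytes = 0;
  private readonly rotateAt: number;
  readonly archives: string[][] = [];

  constructor(rotateAt: number = ROTATE_AT_BYTES) {
    this.rotateAt = rotateAt;
  }

  // Append one JSONL line; entries are never modified once written.
  append(mutation: Mutation): void {
    const line = JSON.stringify(mutation) + "\n";
    const size = utf8Length(line);
    // Rotate before a write that would push the active log past the limit
    // (assumption: SOMA may instead check after writing).
    if (this.currentBytes > 0 && this.currentBytes + size > this.rotateAt) {
      this.archives.push(this.current);
      this.current = [];
      this.currentBytes = 0;
    }
    this.current.push(line);
    this.currentBytes += size;
  }

  activeLines(): string[] {
    return this.current.slice();
  }
}
```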