Vault Storage
The vault is SOMA's persistent storage layer. Entities are Markdown files with YAML frontmatter, organized by type in a flat directory structure.
Directory Structure
.soma/vault/
  agent/            Agent profiles (one per agent)
  execution/        Execution records from agent runs
  decision/         Inferred agent decisions (from graph structure)
  insight/          LLM-extracted observations and patterns
  policy/           Auto-generated and ratified guard policies
  archetype/        Cross-agent behavioral patterns
  assumption/       System beliefs about agent behavior
  constraint/       Identified operational limitations
  contradiction/    Conflicting positions between entries
  synthesis/        Cross-cluster pattern summaries
  _index.json       Fast-lookup index (includes layer field)
  _mutations.jsonl  Append-only audit log (auto-rotated at 10MB)
  _vault.lock       File lock for concurrent access
  _vectors.json     Vector embeddings (JSON backend, if used)
Each type directory contains .md files named by entity ID. The vault never nests deeper than one level.
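The one-level layout means an entity's location follows directly from its type and ID. A minimal sketch of that mapping, assuming a hypothetical helper named entityPath (not part of SOMA's documented API) and POSIX-style path separators:

```typescript
// Entity types, taken from the directory listing above.
const ENTITY_TYPES = [
  "agent", "execution", "decision", "insight", "policy",
  "archetype", "assumption", "constraint", "contradiction", "synthesis",
] as const;
type EntityType = (typeof ENTITY_TYPES)[number];

// Hypothetical helper: maps (type, id) to <vaultRoot>/<type>/<id>.md.
function entityPath(vaultRoot: string, type: EntityType, id: string): string {
  // Reject IDs that would break the one-level layout or hide the file.
  if (id.length === 0 || id.includes("/") || id.includes("\\") || id.startsWith(".")) {
    throw new Error(`invalid entity id: ${id}`);
  }
  return `${vaultRoot}/${type}/${id}.md`;
}
```

For example, entityPath(".soma/vault", "decision", "tool-choice-fetch-data-agent-alpha") yields a path inside the decision/ directory. The ID check is an illustrative safeguard, not documented behavior.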
Entity Format
Every entity is a Markdown file with YAML frontmatter containing structured metadata and a Markdown body with human-readable content.
---
type: decision
id: tool-choice-fetch-data-agent-alpha
name: "tool_choice: fetch-data (agent-alpha)"
status: active
layer: archive
source_worker: harvester
decision_type: tool_choice
choice: fetch-data
outcome: completed
agent_id: agent-alpha
graph_id: exec-123
trace_id: decision-exec-123-t1
confidence: medium
tags: ["graph-inferred", "tool_choice"]
created: "2026-03-21T01:00:00.000Z"
updated: "2026-03-21T01:00:00.000Z"
---
## tool_choice: fetch-data
Agent **agent-alpha** made a tool_choice decision.
- **Choice:** fetch-data
- **Outcome:** completed
Required Fields
All entities must have: type, id, name, status, layer, source_worker, created, updated.
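As a rough illustration of the format, the sketch below splits a raw entity file into frontmatter and body and reports which required fields are absent. The function names splitEntity and missingRequiredFields are hypothetical, and the key detection is deliberately simple (one top-level key per line); the vault's actual parser may differ.

```typescript
// Required frontmatter keys, as listed above.
const REQUIRED_FIELDS = [
  "type", "id", "name", "status", "layer", "source_worker", "created", "updated",
] as const;

// Splits a raw entity file into its frontmatter lines and Markdown body.
function splitEntity(raw: string): { frontmatter: string[]; body: string } {
  const lines = raw.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") {
    throw new Error("missing opening '---'");
  }
  // The first '---' after the opening line closes the frontmatter.
  const end = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (end === -1) {
    throw new Error("missing closing '---'");
  }
  return {
    frontmatter: lines.slice(1, end),
    body: lines.slice(end + 1).join("\n"),
  };
}

// Returns the required keys that do not appear in the frontmatter.
function missingRequiredFields(frontmatter: string[]): string[] {
  const keys = new Set(
    frontmatter
      .map((line) => line.match(/^([A-Za-z_][\w-]*):/)?.[1])
      .filter((key): key is string => key !== undefined),
  );
  return REQUIRED_FIELDS.filter((field) => !keys.has(field));
}
```

An entity is acceptable under this check when missingRequiredFields returns an empty array.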
Layer Field Values
| Layer Value | Numeric | Description |
|---|---|---|
| canon | L4 | Ratified organizational truth |
| emerging | L3 | Machine-proposed insights |
| working | L2 | Team-scoped ephemeral context |
| archive | L1 | Raw traces and history |
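The table can be expressed as a small lookup. This is a sketch only; the names Layer, LAYER_LEVEL, layerLabel, and isLayer are illustrative and not taken from SOMA's code.

```typescript
// Layer names and their numeric levels, as given in the table above.
type Layer = "archive" | "working" | "emerging" | "canon";

const LAYER_LEVEL: Record<Layer, 1 | 2 | 3 | 4> = {
  archive: 1,
  working: 2,
  emerging: 3,
  canon: 4,
};

// Formats a layer as its short label, e.g. "emerging" -> "L3".
function layerLabel(layer: Layer): string {
  return `L${LAYER_LEVEL[layer]}`;
}

// Type guard for values read from frontmatter or the index.
function isLayer(value: string): value is Layer {
  return Object.prototype.hasOwnProperty.call(LAYER_LEVEL, value);
}
```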
Entity Types and Statuses
| Type | Valid Statuses |
|---|---|
| agent, archetype | active, inactive, deprecated, proposed |
| execution | completed, failed, running, pending |
| insight | active, superseded, rejected |
| policy | active, draft, deprecated, enforcing |
| decision | active, superseded, reversed, flagged |
| assumption | active, validated, invalidated |
| constraint | active, resolved, deprecated |
| contradiction | active, resolved |
| synthesis | active, superseded |
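A status check against this table could look like the following sketch. The table contents come from this page; the VALID_STATUSES map and isValidStatus function are hypothetical names, and whether SOMA enforces statuses this way is an assumption.

```typescript
// Valid statuses per entity type, transcribed from the table above.
const VALID_STATUSES = new Map<string, readonly string[]>([
  ["agent", ["active", "inactive", "deprecated", "proposed"]],
  ["archetype", ["active", "inactive", "deprecated", "proposed"]],
  ["execution", ["completed", "failed", "running", "pending"]],
  ["insight", ["active", "superseded", "rejected"]],
  ["policy", ["active", "draft", "deprecated", "enforcing"]],
  ["decision", ["active", "superseded", "reversed", "flagged"]],
  ["assumption", ["active", "validated", "invalidated"]],
  ["constraint", ["active", "resolved", "deprecated"]],
  ["contradiction", ["active", "resolved"]],
  ["synthesis", ["active", "superseded"]],
]);

// Returns true only when the type is known and the status is allowed for it.
function isValidStatus(type: string, status: string): boolean {
  return VALID_STATUSES.get(type)?.includes(status) ?? false;
}
```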
Index Structure
The _index.json file provides fast lookups without scanning disk. It mirrors all entities with their key metadata:
{
"tool-choice-fetch-data-agent-alpha": {
"type": "decision",
"name": "tool_choice: fetch-data (agent-alpha)",
"status": "active",
"layer": "archive",
"tags": ["graph-inferred", "tool_choice"],
"created": "2026-03-21T01:00:00.000Z",
"updated": "2026-03-21T01:00:00.000Z"
},
"canon-max-retry-count": {
"type": "policy",
"name": "max-retry-count",
"status": "enforcing",
"layer": "canon",
"tags": ["governance", "retry"],
"created": "2026-03-18T10:00:00.000Z",
"updated": "2026-03-18T10:00:00.000Z"
}
}
The layer field in the index lets queryByLayer() filter at the index level instead of opening every file. On a vault with 100K entities, of which 1K are in L3 (emerging), a query for the emerging layer reads only ~1K files from disk, not 100K.
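The index-level filter can be sketched as below. This is not the real queryByLayer() implementation, whose signature is not shown on this page; IndexEntry, VaultIndex, and idsInLayer are illustrative names. The point is that only the returned IDs need to be read from disk.

```typescript
// Shape of one index entry, matching the _index.json example above.
interface IndexEntry {
  type: string;
  name: string;
  status: string;
  layer: string;
  tags: string[];
  created: string;
  updated: string;
}
type VaultIndex = Record<string, IndexEntry>;

// Selects entity IDs in a layer using only the in-memory index.
function idsInLayer(index: VaultIndex, layer: string): string[] {
  return Object.entries(index)
    .filter(([, entry]) => entry.layer === layer)
    .map(([id]) => id);
}
```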
Safety Features
| Feature | Mechanism | Details |
|---|---|---|
| File locking | O_EXCL atomic lock with PID-based stale detection | Lock file _vault.lock created atomically. Contains PID of holder. Stale locks (PID not running) auto-removed on startup. 5s timeout, 50ms retry interval. All mutations acquire the lock. |
| Disk space checks | statfsSync before every write | Rejects writes below 10MB available. If writeFileSync fails mid-write, temp file is cleaned up (no partial .md files). |
| Temp file cleanup | Orphan detection on startup | .tmp.* files in vault directory are removed when SOMA starts. |
| Index corruption recovery | Spot-check validation + full rebuild | On load, 10% of entries (min 1, max 50) are checked against disk. If >50% missing, index is discarded and rebuilt from all .md files. Invalid JSON triggers immediate rebuild. |
| Mutation log rotation | Size-based archival | _mutations.jsonl is rotated at 10MB. Append-only log of all create/update/delete ops with timestamps. Currently write-only (future audit/replay). |
| Worker fingerprinting | MD5 of index file | Each worker stores a vault fingerprint in its state file. If the fingerprint changes (vault reset or manual edit), the worker discards cached state and reprocesses from scratch. |
| Layer-safe updates | vault.update() rejects layer changes | The layer field cannot be changed via vault.update(). Layer transitions happen only through Governance (promote) or Decay (move to L1). |
| Circuit breakers | Per-worker create limits | Workers stop after 100 creates per run to prevent runaway loops. |
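Two of the mechanisms in the table are sketched below under stated assumptions: the spot-check sample size and the O_EXCL lock with PID-based stale detection. All function names are hypothetical. The rounding direction for the 10% sample is not specified above, so rounding up is an assumption, as is leaving an unreadable lock file in place.

```typescript
import * as fs from "node:fs";

// Spot-check size: 10% of entries, clamped to [1, 50]. Rounding up is assumed.
function spotCheckSampleSize(entryCount: number): number {
  return Math.min(50, Math.max(1, Math.ceil(entryCount / 10)));
}

// Returns true if a process with this PID currently exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0 only tests for existence
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Single acquisition attempt. The "wx" flag fails if the file already
// exists, which gives the atomic create-or-fail behavior of O_EXCL.
function tryAcquireLock(lockPath: string): boolean {
  try {
    fs.writeFileSync(lockPath, String(process.pid), { flag: "wx" });
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

// Startup cleanup: removes the lock only if its recorded PID is not running.
function removeStaleLock(lockPath: string): boolean {
  let content: string;
  try {
    content = fs.readFileSync(lockPath, "utf8");
  } catch {
    return false; // no lock file to inspect
  }
  const pid = Number.parseInt(content, 10);
  if (!Number.isInteger(pid) || isProcessAlive(pid)) return false;
  fs.unlinkSync(lockPath);
  return true;
}

// Releases a lock held by this process.
function releaseLock(lockPath: string): void {
  fs.unlinkSync(lockPath);
}
```

A caller following the table would retry tryAcquireLock every 50 ms and give up after 5 s.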
YAML Handling
The vault's YAML parser handles two categories of values differently to preserve round-trip integrity without a YAML library dependency.
Simple Values
Strings, numbers, booleans, and string arrays use standard YAML syntax:
type: decision
confidence: 0.82
active: true
tags: ["graph-inferred", "tool_choice"]
Complex Values
Nested objects and arrays of objects are serialized as inline JSON:
metadata: {"author":"alice","version":2}
evidence_links: [{"id":"trace-001","type":"execution"},{"id":"trace-002","type":"decision"}]
The parser detects values starting with { or [{ and parses them as JSON. This trade-off avoids YAML library complexity while maintaining lossless round-trips for all data types.
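The detection rule can be illustrated with a small round-trip sketch. This is not the vault's parser: parseValue, parseLine, and serializeValue are hypothetical names, and the handling of string arrays (parsed as JSON) and of quoting (any string that is not a plain identifier-like token is written as a JSON string) are assumptions made to keep the example lossless.

```typescript
type FrontmatterValue = string | number | boolean | string[] | object;

// Parses the text after "key:" into a typed value.
function parseValue(raw: string): FrontmatterValue {
  const text = raw.trim();
  // Complex values: inline JSON objects or arrays of objects.
  if (text.startsWith("{") || text.startsWith("[{")) return JSON.parse(text);
  // Assumption: string arrays such as ["a", "b"] are valid JSON too.
  if (text.startsWith("[")) return JSON.parse(text) as string[];
  if (text === "true") return true;
  if (text === "false") return false;
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  // Double-quoted strings are decoded as JSON strings.
  if (text.length >= 2 && text.startsWith('"') && text.endsWith('"')) {
    return JSON.parse(text) as string;
  }
  return text;
}

// Splits "key: value" at the first colon only, so values may contain colons.
function parseLine(line: string): [string, FrontmatterValue] {
  const sep = line.indexOf(":");
  if (sep === -1) throw new Error(`not a key/value line: ${line}`);
  return [line.slice(0, sep).trim(), parseValue(line.slice(sep + 1))];
}

// Serializes a value so that parseValue returns an equal value.
function serializeValue(value: FrontmatterValue): string {
  if (typeof value === "string") {
    const plain = /^[A-Za-z_][\w.-]*$/.test(value) && value !== "true" && value !== "false";
    return plain ? value : JSON.stringify(value);
  }
  if (typeof value === "number" || typeof value === "boolean") return String(value);
  return JSON.stringify(value); // arrays and nested objects as inline JSON
}
```

Quoting strings such as "true" or "0.82" on output is what keeps them from being read back as a boolean or a number.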
vault.update() vs writeToLayer()
| Method | Purpose | Layer behavior |
|---|---|---|
| vault.update(id, fields) | Patch existing entity fields | Rejects changes to the layer field |
| writeToLayer(entity, layer, worker) | Create new entity in a specific layer | Enforces worker permissions per layer |
// Update status — OK
await vault.update('my-entity', { status: 'superseded' });
// Try to change layer — REJECTED
await vault.update('my-entity', { layer: 'canon' });
// Error: LayerPermissionError — layer cannot be changed via update
// Create in a specific layer — permission checked
await writeToLayer(vault, newEntity, 'emerging', 'synthesizer');
// OK — synthesizer is authorized for L3
await writeToLayer(vault, newEntity, 'canon', 'synthesizer');
// Error: LayerPermissionError — only governance can write to canon
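The two guards behind these calls might be structured as in the sketch below. It is an illustration only: applyUpdate, assertCanWrite, and KNOWN_WRITERS are hypothetical, and the permission table contains just the two facts visible in the examples above (governance writes to canon, synthesizer may write to emerging). The full permission matrix is not documented here, so other layers are reported as unknown rather than guessed.

```typescript
class LayerPermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LayerPermissionError";
  }
}

type Fields = Record<string, unknown>;

// Patches an entity, refusing any change to its layer.
function applyUpdate(entity: Fields, fields: Fields): Fields {
  if ("layer" in fields && fields.layer !== entity.layer) {
    throw new LayerPermissionError("layer cannot be changed via update");
  }
  return { ...entity, ...fields };
}

// Partial table built only from the examples above (assumption: incomplete).
const KNOWN_WRITERS = new Map<string, readonly string[]>([
  ["canon", ["governance"]],
  ["emerging", ["synthesizer"]],
]);

// Throws if the worker is not known to be allowed to write to the layer.
function assertCanWrite(layer: string, worker: string): void {
  const allowed = KNOWN_WRITERS.get(layer);
  if (allowed === undefined) {
    throw new Error(`no permission data in this sketch for layer: ${layer}`);
  }
  if (!allowed.includes(worker)) {
    throw new LayerPermissionError(`${worker} cannot write to ${layer}`);
  }
}
```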
Mutation Log
The _mutations.jsonl file records every vault operation:
{"op":"create","id":"exec-alpha-001","type":"execution","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:00.000Z"}
{"op":"update","id":"exec-alpha-001","fields":["status"],"ts":"2026-03-21T01:05:00.000Z"}
{"op":"create","id":"decision-tool-fetch-alpha","type":"decision","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:01.000Z"}
{"op":"create","id":"retry-heavy-agents","type":"archetype","layer":"emerging","worker":"synthesizer","ts":"2026-03-21T02:00:00.000Z"}
{"op":"update","id":"retry-heavy-agents","fields":["status","ratified_by","ratified_at"],"ts":"2026-03-21T09:15:00.000Z"}
The log is auto-rotated at 10MB. It is append-only, and during normal operation it is never read back; it exists to support future audit and replay capabilities.
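Appending with size-based rotation can be sketched as follows. The function name appendMutation, the Mutation interface, the archive file naming, and the reading of "10MB" as 10 * 1024 * 1024 bytes are all assumptions made for illustration; SOMA's actual rotation scheme is not described on this page.

```typescript
import * as fs from "node:fs";

// Assumption: "10MB" is interpreted as 10 MiB.
const ROTATE_AT_BYTES = 10 * 1024 * 1024;

// Fields seen in the example log lines above.
interface Mutation {
  op: "create" | "update" | "delete";
  id: string;
  ts: string;
  type?: string;
  layer?: string;
  worker?: string;
  fields?: string[];
}

// Appends one JSON line, rotating the current log first if it is too large.
function appendMutation(
  logPath: string,
  mutation: Mutation,
  rotateAt: number = ROTATE_AT_BYTES,
): void {
  if (fs.existsSync(logPath) && fs.statSync(logPath).size >= rotateAt) {
    // Hypothetical archive name: the log path plus a filesystem-safe timestamp.
    const stamp = mutation.ts.replace(/[:.]/g, "-");
    fs.renameSync(logPath, `${logPath}.${stamp}`);
  }
  fs.appendFileSync(logPath, JSON.stringify(mutation) + "\n");
}
```

Each call writes exactly one line, so the file stays valid JSONL even when a rotation happens between two writes.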