Vault Storage

The vault is SOMA's persistent storage layer. Entities are Markdown files with YAML frontmatter, organized by type in a flat directory structure.

Directory Structure

.soma/vault/
  agent/              Agent profiles (one per agent)
  execution/          Execution records from agent runs
  decision/           Inferred agent decisions (from graph structure)
  insight/            LLM-extracted observations and patterns
  policy/             Auto-generated and ratified guard policies
  archetype/          Cross-agent behavioral patterns
  assumption/         System beliefs about agent behavior
  constraint/         Identified operational limitations
  contradiction/      Conflicting positions between entries
  synthesis/          Cross-cluster pattern summaries
  _index.json         Fast-lookup index (includes layer field)
  _mutations.jsonl    Append-only audit log (auto-rotated at 10MB)
  _vault.lock         File lock for concurrent access
  _vectors.json       Vector embeddings (JSON backend, if used)

Each type directory contains .md files named by entity ID. The vault never nests deeper than one level.
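A minimal TypeScript sketch of how an entity ID maps to its file under this one-level layout. The helper name `entityPath` and the ID rules it enforces (no path separators, no leading dot) are hypothetical illustrations, not SOMA's documented API.

```typescript
// Hypothetical helper: maps (type, id) to a vault path. Not part of SOMA's documented API.
const VAULT_ROOT = ".soma/vault";

const ENTITY_TYPES = [
  "agent", "execution", "decision", "insight", "policy",
  "archetype", "assumption", "constraint", "contradiction", "synthesis",
] as const;
type EntityType = (typeof ENTITY_TYPES)[number];

function isEntityType(value: string): value is EntityType {
  return (ENTITY_TYPES as readonly string[]).includes(value);
}

function entityPath(type: string, id: string): string {
  if (!isEntityType(type)) {
    throw new Error(`unknown entity type: ${type}`);
  }
  // Assumed rule: an ID must not add a directory level, so reject separators
  // and dot-prefixed names (which also covers "." and "..").
  if (id === "" || /[\/\\]/.test(id) || id.startsWith(".")) {
    throw new Error(`invalid entity id: ${JSON.stringify(id)}`);
  }
  return `${VAULT_ROOT}/${type}/${id}.md`;
}
```

For example, `entityPath("decision", "tool-choice-fetch-data-agent-alpha")` returns `.soma/vault/decision/tool-choice-fetch-data-agent-alpha.md`, while an ID such as `nested/id` is rejected.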

Entity Format

Every entity is a Markdown file with YAML frontmatter containing structured metadata and a Markdown body with human-readable content.

---
type: decision
id: tool-choice-fetch-data-agent-alpha
name: "tool_choice: fetch-data (agent-alpha)"
status: active
layer: archive
source_worker: harvester
decision_type: tool_choice
choice: fetch-data
outcome: completed
agent_id: agent-alpha
graph_id: exec-123
trace_id: decision-exec-123-t1
confidence: medium
tags: ["graph-inferred", "tool_choice"]
created: "2026-03-21T01:00:00.000Z"
updated: "2026-03-21T01:00:00.000Z"
---

## tool_choice: fetch-data

Agent **agent-alpha** made a tool_choice decision.
- **Choice:** fetch-data
- **Outcome:** completed
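The file layout above can be split into its two parts with a few lines of TypeScript. This is a sketch assuming `---` delimiter lines and one `key: value` pair per frontmatter line; `splitEntity` is a hypothetical name, and values are returned as raw text rather than decoded.

```typescript
// Hypothetical helper: splits an entity file into raw frontmatter values and the body.
// Assumes "---" delimiter lines and one "key: value" pair per frontmatter line.
interface RawEntity {
  frontmatter: Record<string, string>;
  body: string;
}

function splitEntity(text: string): RawEntity {
  const lines = text.replace(/\r\n/g, "\n").split("\n");
  if (lines[0] !== "---") {
    throw new Error("missing opening --- line");
  }
  const end = lines.indexOf("---", 1);
  if (end < 0) {
    throw new Error("missing closing --- line");
  }
  const frontmatter: Record<string, string> = {};
  for (const line of lines.slice(1, end)) {
    if (line.trim() === "") continue;
    const sep = line.indexOf(": "); // split on the first ": " so values may contain colons
    if (sep < 0) {
      throw new Error(`unexpected frontmatter line: ${line}`);
    }
    frontmatter[line.slice(0, sep)] = line.slice(sep + 2); // raw text, quotes kept
  }
  return { frontmatter, body: lines.slice(end + 1).join("\n").trim() };
}
```

Splitting on the first `": "` keeps values such as `"tool_choice: fetch-data (agent-alpha)"` intact.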

Required Fields

All entities must have: type, id, name, status, layer, source_worker, created, updated.
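A minimal check for these fields, written as a TypeScript sketch. `missingRequiredFields` is a hypothetical helper, and counting a present-but-empty value as missing is an assumption.

```typescript
// Sketch: reports which required frontmatter fields are absent.
const REQUIRED_FIELDS = [
  "type", "id", "name", "status", "layer", "source_worker", "created", "updated",
] as const;

// Assumption: a field that is present but empty counts as missing.
function missingRequiredFields(frontmatter: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS.filter(
    (field) => frontmatter[field] === undefined || frontmatter[field] === "",
  );
}
```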

Layer Field Values

Layer Value	Numeric	Description
`canon`	L4	Ratified organizational truth
`emerging`	L3	Machine-proposed insights
`working`	L2	Team-scoped ephemeral context
`archive`	L1	Raw traces and history
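Because frontmatter stores the layer name while the prose refers to L1 through L4, a small lookup table keeps the two in sync. A TypeScript sketch; `LAYER_LEVEL`, `layerAtLevel` and `isHigherLayer` are hypothetical names.

```typescript
// Sketch: the four layer names and their numeric levels from the table above.
type Layer = "canon" | "emerging" | "working" | "archive";

const LAYER_LEVEL: Record<Layer, number> = {
  archive: 1,
  working: 2,
  emerging: 3,
  canon: 4,
};

// Reverse lookup, e.g. 3 -> "emerging".
function layerAtLevel(level: number): Layer {
  const match = (Object.keys(LAYER_LEVEL) as Layer[]).find((l) => LAYER_LEVEL[l] === level);
  if (match === undefined) {
    throw new RangeError(`no layer at L${level}`);
  }
  return match;
}

// Hypothetical comparison helper: true when `a` sits above `b`.
function isHigherLayer(a: Layer, b: Layer): boolean {
  return LAYER_LEVEL[a] > LAYER_LEVEL[b];
}
```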

Entity Types and Statuses

Type	Valid Statuses
`agent`, `archetype`	`active`, `inactive`, `deprecated`, `proposed`
`execution`	`completed`, `failed`, `running`, `pending`
`insight`	`active`, `superseded`, `rejected`
`policy`	`active`, `draft`, `deprecated`, `enforcing`
`decision`	`active`, `superseded`, `reversed`, `flagged`
`assumption`	`active`, `validated`, `invalidated`
`constraint`	`active`, `resolved`, `deprecated`
`contradiction`	`active`, `resolved`
`synthesis`	`active`, `superseded`
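Checking a status against this table is a plain lookup. A TypeScript sketch; `VALID_STATUSES` and `isValidStatus` are hypothetical names, and treating an unknown type as having no valid statuses is an assumption.

```typescript
// Sketch of the status table as data. A Map avoids accidental hits on
// Object.prototype keys such as "toString".
const VALID_STATUSES = new Map<string, readonly string[]>([
  ["agent", ["active", "inactive", "deprecated", "proposed"]],
  ["archetype", ["active", "inactive", "deprecated", "proposed"]],
  ["execution", ["completed", "failed", "running", "pending"]],
  ["insight", ["active", "superseded", "rejected"]],
  ["policy", ["active", "draft", "deprecated", "enforcing"]],
  ["decision", ["active", "superseded", "reversed", "flagged"]],
  ["assumption", ["active", "validated", "invalidated"]],
  ["constraint", ["active", "resolved", "deprecated"]],
  ["contradiction", ["active", "resolved"]],
  ["synthesis", ["active", "superseded"]],
]);

// Assumption: an unknown entity type has no valid statuses.
function isValidStatus(type: string, status: string): boolean {
  return VALID_STATUSES.get(type)?.includes(status) ?? false;
}
```

Note that `execution` is the only type without an `active` status, so a generic "default to active" rule would be wrong for it.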

Index Structure

The `_index.json` file provides fast lookups without scanning the disk. It maps every entity ID to that entity's key metadata:

{
  "tool-choice-fetch-data-agent-alpha": {
    "type": "decision",
    "name": "tool_choice: fetch-data (agent-alpha)",
    "status": "active",
    "layer": "archive",
    "tags": ["graph-inferred", "tool_choice"],
    "created": "2026-03-21T01:00:00.000Z",
    "updated": "2026-03-21T01:00:00.000Z"
  },
  "canon-max-retry-count": {
    "type": "policy",
    "name": "max-retry-count",
    "status": "enforcing",
    "layer": "canon",
    "tags": ["governance", "retry"],
    "created": "2026-03-18T10:00:00.000Z",
    "updated": "2026-03-18T10:00:00.000Z"
  }
}

The `layer` field in the index lets `queryByLayer()` filter at the index level before touching any entity files. On a vault with 100K entities, of which 1K are in L3, an `emerging` query reads only ~1K files from disk rather than 100K.
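The filtering step can be sketched in TypeScript as follows. `IndexEntry` mirrors the `_index.json` example above; the `queryByLayer` signature and the caller-supplied `readEntity` callback are assumptions for illustration, not SOMA's actual API.

```typescript
// Sketch of index-level filtering; the signature is hypothetical.
type Layer = "canon" | "emerging" | "working" | "archive";

interface IndexEntry {
  type: string;
  name: string;
  status: string;
  layer: Layer;
  tags: string[];
  created: string;
  updated: string;
}

type VaultIndex = Record<string, IndexEntry>;

// Filter on the in-memory index first, then read only the matching files.
function queryByLayer<T>(
  index: VaultIndex,
  layer: Layer,
  readEntity: (type: string, id: string) => T, // caller-supplied disk read
): T[] {
  return Object.keys(index)
    .filter((id) => index[id].layer === layer)
    .map((id) => readEntity(index[id].type, id));
}
```

The disk read is passed in as a callback so the cost model is visible: it runs once per matching entry, never for entries in other layers.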

Safety Features

Feature	Mechanism	Details
File locking	`O_EXCL` atomic lock with PID-based stale detection	Lock file `_vault.lock` created atomically. Contains PID of holder. Stale locks (PID not running) auto-removed on startup. 5s timeout, 50ms retry interval. All mutations acquire the lock.
Disk space checks	`statfsSync` before every write	Rejects writes when less than 10MB of disk space is free. If `writeFileSync` fails mid-write, the temp file it was writing is removed, so no partial `.md` file is left behind.
Temp file cleanup	Orphan detection on startup	`.tmp.*` files in vault directory are removed when SOMA starts.
Index corruption recovery	Spot-check validation + full rebuild	On load, 10% of entries (min 1, max 50) are checked against disk. If >50% missing, index is discarded and rebuilt from all `.md` files. Invalid JSON triggers immediate rebuild.
Mutation log rotation	Size-based archival	`_mutations.jsonl` is rotated at 10MB. Append-only log of all create/update/delete ops with timestamps. Currently write-only (future audit/replay).
Worker fingerprinting	MD5 of index file	Each worker stores a vault fingerprint in its state file. If the fingerprint changes (vault reset or manual edit), the worker discards cached state and reprocesses from scratch.
Layer-safe updates	`vault.update()` rejects layer changes	The `layer` field cannot be changed via `vault.update()`. Layer transitions happen only through Governance (`promote`) or Decay (move to L1).
Circuit breakers	Per-worker create limits	Workers stop after 100 creates per run to prevent runaway loops.
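The index spot-check rule from the table can be sketched in TypeScript. Only the numbers come from the table (10% of entries, minimum 1, maximum 50, rebuild when more than half are missing); rounding the sample size up, the partial Fisher-Yates sampler, and the names `spotCheckSampleSize`, `sampleIds` and `indexNeedsRebuild` are assumptions. The invalid-JSON path is not modeled.

```typescript
// Sketch of the spot-check rule: sample 10% of index entries (min 1, max 50) and
// rebuild when more than half of the sample is missing on disk.
function spotCheckSampleSize(entryCount: number): number {
  if (entryCount <= 0) return 0;
  // Integer division avoids floating-point surprises; rounding up is assumed.
  return Math.min(50, Math.max(1, Math.ceil(entryCount / 10)));
}

// Partial Fisher-Yates shuffle: picks n distinct ids uniformly at random.
function sampleIds(ids: string[], n: number, rand: () => number = Math.random): string[] {
  const pool = ids.slice();
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rand() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

function indexNeedsRebuild(
  ids: string[],
  existsOnDisk: (id: string) => boolean, // e.g. a stat of the entity's .md file
  rand: () => number = Math.random,
): boolean {
  const n = spotCheckSampleSize(ids.length);
  if (n === 0) return false; // empty index: nothing to spot-check
  const missing = sampleIds(ids, n, rand).filter((id) => !existsOnDisk(id)).length;
  return missing / n > 0.5;
}
```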

YAML Handling

The vault's YAML parser handles two categories of values differently to preserve round-trip integrity without a YAML library dependency.

Simple Values

Strings, numbers, booleans, and string arrays use standard YAML syntax:

type: decision
confidence: 0.82
active: true
tags: ["graph-inferred", "tool_choice"]

Complex Values

Nested objects and arrays of objects are serialized as inline JSON:

metadata: {"author":"alice","version":2}
evidence_links: [{"id":"trace-001","type":"execution"},{"id":"trace-002","type":"decision"}]

The parser treats any value that starts with `{` or `[{` as JSON and parses it accordingly. This trade-off avoids the complexity of a YAML library while keeping round-trips lossless for every value type described above.
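The two value categories can be sketched as a small TypeScript parser and serializer. The function names are hypothetical, and several details are assumptions: string arrays are parsed as JSON (so their items must be double-quoted), quoted strings use JSON escaping, and strings containing `:`, `#` or a newline are always quoted.

```typescript
// Sketch of the value rules described above, one `key: value` line at a time.
function parseValue(raw: string): unknown {
  const v = raw.trim();
  // Complex values: inline JSON objects or arrays of objects.
  if (v.startsWith("{") || v.startsWith("[{")) return JSON.parse(v);
  // Simple string arrays such as ["a", "b"]; assumes double-quoted items.
  if (v.startsWith("[")) return JSON.parse(v) as string[];
  if (v === "true") return true;
  if (v === "false") return false;
  if (/^-?\d+(\.\d+)?([eE][+-]?\d+)?$/.test(v)) return Number(v);
  // Double-quoted strings use JSON escaping in this sketch.
  if (v.startsWith('"')) return JSON.parse(v) as string;
  return v; // bare string
}

// True if `bare` would be read back as exactly `value`.
function parsesBackTo(bare: string, value: string): boolean {
  try {
    return parseValue(bare) === value;
  } catch {
    return false;
  }
}

function serializeValue(value: unknown): string {
  if (typeof value === "boolean" || typeof value === "number") {
    return String(value); // NaN and Infinity are not handled in this sketch
  }
  if (typeof value === "string") {
    // Leave a string bare only when that is unambiguous and safe for YAML readers.
    const bareIsSafe = value !== "" && value === value.trim() && !/[:#\n]/.test(value);
    return bareIsSafe && parsesBackTo(value, value) ? value : JSON.stringify(value);
  }
  // Arrays (string arrays included) and objects become compact inline JSON.
  return JSON.stringify(value);
}

function parseLine(line: string): [string, unknown] {
  const i = line.indexOf(": ");
  if (i < 0) throw new Error(`expected "key: value", got: ${line}`);
  return [line.slice(0, i), parseValue(line.slice(i + 2))];
}
```

The serializer decides quoting by asking the parser: a string stays bare only if reading it back yields the same string, which is what keeps values like `"true"` or `"1.0"` from silently turning into a boolean or a number.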

`vault.update()` vs `writeToLayer()`

Method	Purpose	Layer behavior
`vault.update(id, fields)`	Patch fields of an existing entity	Rejects changes to the `layer` field
`writeToLayer(vault, entity, layer, worker)`	Create a new entity in a specific layer	Enforces worker permissions per layer

// Update status — OK
await vault.update('my-entity', { status: 'superseded' });

// Try to change layer — REJECTED
await vault.update('my-entity', { layer: 'canon' });
// Error: LayerPermissionError — layer cannot be changed via update

// Create in a specific layer — permission checked
await writeToLayer(vault, newEntity, 'emerging', 'synthesizer');
// OK — synthesizer is authorized for L3

await writeToLayer(vault, newEntity, 'canon', 'synthesizer');
// Error: LayerPermissionError — only governance can write to canon
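The two rules in the example above can be sketched as guard functions. This TypeScript sketch is hypothetical: `assertCanWrite`, `assertUpdateAllowed` and the `WRITERS` table are illustrative names. Only three rows are grounded in this page: the synthesizer writes to `emerging`, only governance writes to `canon`, and the harvester creates `archive` entries (per the mutation log). Layers the page does not cover are denied in this sketch.

```typescript
// Sketch of the two guards. Names and the permission table are hypothetical.
type Layer = "canon" | "emerging" | "working" | "archive";

class LayerPermissionError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "LayerPermissionError";
    // Keeps `instanceof` working when compiled to an ES5 target.
    Object.setPrototypeOf(this, LayerPermissionError.prototype);
  }
}

// Hypothetical table. Layers missing from it are denied in this sketch.
const WRITERS: Partial<Record<Layer, readonly string[]>> = {
  canon: ["governance"],
  emerging: ["synthesizer"],
  archive: ["harvester"],
};

function assertCanWrite(worker: string, layer: Layer): void {
  const allowed: readonly string[] = WRITERS[layer] ?? [];
  if (!allowed.includes(worker)) {
    throw new LayerPermissionError(`worker "${worker}" may not write to ${layer}`);
  }
}

// Mirrors the vault.update() rule: any patch that mentions `layer` is refused.
function assertUpdateAllowed(fields: Record<string, unknown>): void {
  if (Object.prototype.hasOwnProperty.call(fields, "layer")) {
    throw new LayerPermissionError("layer cannot be changed via update");
  }
}
```

Keeping the layer check out of the generic patch path means the only way to move an entity between layers is an explicit operation such as Governance promotion or Decay.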

Mutation Log

The _mutations.jsonl file records every vault operation:

{"op":"create","id":"exec-alpha-001","type":"execution","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:00.000Z"}
{"op":"create","id":"decision-tool-fetch-alpha","type":"decision","layer":"archive","worker":"harvester","ts":"2026-03-21T01:00:01.000Z"}
{"op":"update","id":"exec-alpha-001","fields":["status"],"ts":"2026-03-21T01:05:00.000Z"}
{"op":"create","id":"retry-heavy-agents","type":"archetype","layer":"emerging","worker":"synthesizer","ts":"2026-03-21T02:00:00.000Z"}
{"op":"update","id":"retry-heavy-agents","fields":["status","ratified_by","ratified_at"],"ts":"2026-03-21T09:15:00.000Z"}

The log is auto-rotated at 10MB. It is append-only and, during normal operation, write-only; it exists to support future audit and replay capabilities.
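Append-and-rotate can be sketched in memory in TypeScript. The record shape follows the example lines above; the `MutationLog` structure, the helper names, treating "10MB" as 10 MiB, and rotating before the append once the limit has been reached are all assumptions, since the exact rotation policy is not specified here.

```typescript
// Sketch of an append-only JSONL log with size-based rotation, kept in memory.
interface MutationRecord {
  op: "create" | "update" | "delete";
  id: string;
  type?: string;
  layer?: string;
  worker?: string;
  fields?: string[];
  ts: string;
}

interface MutationLog {
  active: string[];     // lines of the current _mutations.jsonl
  activeBytes: number;  // its size in bytes
  archived: string[][]; // rotated-out files, oldest first
}

const ROTATE_AT_BYTES = 10 * 1024 * 1024; // "10MB"; MiB is an assumption

// UTF-8 byte length of a string.
function utf8Length(s: string): number {
  let n = 0;
  for (const ch of s) {
    const cp = ch.codePointAt(0) as number;
    n += cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
  }
  return n;
}

// Assumed policy: rotate before appending once the active log has reached the limit.
function appendMutation(log: MutationLog, record: MutationRecord, limit = ROTATE_AT_BYTES): void {
  if (log.activeBytes >= limit) {
    log.archived.push(log.active);
    log.active = [];
    log.activeBytes = 0;
  }
  const line = JSON.stringify(record) + "\n";
  log.active.push(line);
  log.activeBytes += utf8Length(line);
}
```

Because `JSON.stringify` preserves insertion order of keys, building each record in the field order shown above reproduces the log lines exactly.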
