Configuration

SOMA is configured through the SomaConfig interface passed to the pipeline. Every field is optional and has a sensible default.

Full SomaConfig

interface SomaConfig {
  // Vault storage directory
  vaultDir?: string; // Default: '.soma/vault'

  // Vector store backend for Cartographer embeddings
  vectorStore?: VectorStore; // Default: JSON file backend

  // LLM function used by the Synthesizer for pattern extraction
  analysisFn?: AnalysisFn; // (prompt: string) => Promise<string>

  // Embedding function used by the Cartographer for semantic search
  embedFn?: EmbedFn; // (text: string) => Promise<number[]>

  // Inbox directory for file-based ingestion
  inboxDir?: string; // Default: '.soma/inbox'

  // Worker-specific configuration
  harvester?: HarvesterConfig; // { batchSize?: number }
  synthesizer?: SynthesizerConfig; // { minConfidence?: number, maxProposals?: number }
  cartographer?: CartographerConfig; // { clusterThreshold?: number }
  reconciler?: ReconcilerConfig; // { overlapThreshold?: number }

  // Decay timing for ephemeral layers
  decay?: DecayConfig;
}
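Because every field is optional, an empty object is a valid configuration. A minimal sketch of how the directory defaults might be filled in — the `withDefaults` helper below is illustrative only, not part of the SOMA API:

```typescript
// Illustrative only: how optional directory fields fall back to their defaults.
interface ResolvedDirs {
  vaultDir: string;
  inboxDir: string;
}

function withDefaults(config: { vaultDir?: string; inboxDir?: string }): ResolvedDirs {
  return {
    vaultDir: config.vaultDir ?? '.soma/vault',
    inboxDir: config.inboxDir ?? '.soma/inbox',
  };
}
```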

interface DecayConfig {
  l2DefaultDays: number; // Default: 14
  l3DefaultDays: number; // Default: 90
  teamDecayDays?: Record<string, number>; // Per-team L2 overrides
}

Decay Configuration Example

Override the default L2 expiry for specific teams:

const config: SomaConfig = {
  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21, // Platform team context lives 21 days
      'ml-ops': 7,    // ML-ops context is highly ephemeral
      'security': 30, // Security context persists longer
    }
  }
};

Teams not listed in teamDecayDays use l2DefaultDays. L3 entries use l3DefaultDays globally — there is no per-team override for L3.

Decay rules:

  • L1 and L4 entries never decay
  • Promoted or rejected L3 entries never decay
  • Reading an entry resets its decay timer (activity-based extension)
  • Decayed entries move to L1 with decayed_from metadata
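The per-team fallback and activity-based timer can be sketched as follows. `resolveL2DecayDays` and `isExpired` are hypothetical helpers written against the DecayConfig shape above, not functions exported by SOMA:

```typescript
interface DecayConfig {
  l2DefaultDays: number;
  l3DefaultDays: number;
  teamDecayDays?: Record<string, number>;
}

// L2 budget for a team; teams not listed fall back to l2DefaultDays.
function resolveL2DecayDays(decay: DecayConfig, team?: string): number {
  if (team && decay.teamDecayDays && team in decay.teamDecayDays) {
    return decay.teamDecayDays[team];
  }
  return decay.l2DefaultDays;
}

// Since reading an entry resets its timer, expiry is measured from the
// last read (or creation, if never read).
function isExpired(lastReadAt: Date, decayDays: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - lastReadAt.getTime();
  return ageMs > decayDays * 24 * 60 * 60 * 1000;
}
```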

LLM Providers

SOMA supports four LLM providers for the Synthesizer's analysisFn. Configure via CLI flags or environment variables.

| Provider   | CLI --provider | Env Var (API Key)  | Endpoint                     |
|------------|----------------|--------------------|------------------------------|
| OpenRouter | openrouter     | OPENROUTER_API_KEY | https://openrouter.ai/api/v1 |
| Anthropic  | anthropic      | ANTHROPIC_API_KEY  | https://api.anthropic.com/v1 |
| OpenAI     | openai         | OPENAI_API_KEY     | https://api.openai.com/v1    |
| Custom     | custom         | SOMA_API_KEY       | Set via --endpoint flag      |

The custom provider works with any OpenAI-compatible endpoint (vLLM, Ollama, LiteLLM, etc.):

soma run \
  --provider custom \
  --model llama-3.1-70b \
  --endpoint http://localhost:8080/v1 \
  --api-key $MY_LOCAL_KEY

API Key Resolution Order

SOMA resolves the API key in this order, using the first value found:

  1. --api-key CLI flag
  2. SOMA_API_KEY environment variable
  3. Provider-specific environment variable (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY)
  4. ~/.env file (dotenv format, key matching step 2 or 3)

# Option 1: CLI flag (highest priority)
soma run --provider openrouter --model anthropic/claude-sonnet-4 --api-key sk-or-...

# Option 2: SOMA_API_KEY works for any provider
export SOMA_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 3: Provider-specific env var
export OPENROUTER_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 4: ~/.env file
echo "OPENROUTER_API_KEY=sk-or-..." >> ~/.env
soma run --provider openrouter --model anthropic/claude-sonnet-4
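The resolution order above can be sketched like this. `resolveApiKey` is an illustrative helper, not SOMA's actual implementation, and the ~/.env step is simplified to a pre-parsed record rather than real file reading:

```typescript
// Maps each built-in provider to its dedicated environment variable.
const PROVIDER_ENV_VARS: Record<string, string> = {
  openrouter: 'OPENROUTER_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
};

// Returns the first key found: CLI flag, SOMA_API_KEY, the provider-specific
// env var, then the same keys from a parsed ~/.env file.
function resolveApiKey(
  provider: string,
  cliKey: string | undefined,
  env: Record<string, string | undefined>,
  dotenvFile: Record<string, string> = {},
): string | undefined {
  if (cliKey) return cliKey;
  if (env.SOMA_API_KEY) return env.SOMA_API_KEY;
  const providerVar = PROVIDER_ENV_VARS[provider];
  if (providerVar && env[providerVar]) return env[providerVar];
  return dotenvFile.SOMA_API_KEY ?? (providerVar ? dotenvFile[providerVar] : undefined);
}
```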

Vector Store Backends

The Cartographer uses a vector store for entity embeddings and semantic search. Three backends are available.

JSON File Backend (Default)

Zero-dependency file-based storage. Suitable for vaults up to ~10K entities.

import { JsonVectorStore } from '@soma/vector-store';

const config: SomaConfig = {
  vectorStore: new JsonVectorStore('.soma/vault/_vectors.json'),
};

Embeddings are stored in a single JSON file alongside the vault. No external services required.

LanceDB

Columnar storage with native vector search. Good for 10K-1M entities.

import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vectorStore: new LanceVectorStore('.soma/vault/_lance'),
};

Requires the @lancedb/lancedb package. Stores data in a local directory.

Milvus

Distributed vector database for large-scale deployments (1M+ entities).

import { MilvusVectorStore } from '@soma/vector-store-milvus';

const config: SomaConfig = {
  vectorStore: new MilvusVectorStore({
    address: 'localhost:19530',
    collection: 'soma_embeddings',
  }),
};

Requires a running Milvus instance and the @zilliz/milvus2-sdk-node package.

Backend Comparison

| Backend | Entities | Dependencies     | Latency | Persistence |
|---------|----------|------------------|---------|-------------|
| JSON    | < 10K    | None             | ~10ms   | File        |
| LanceDB | 10K - 1M | @lancedb/lancedb | ~2ms    | Directory   |
| Milvus  | 1M+      | Milvus server    | ~1ms    | External DB |

All backends implement the same VectorStore interface:

interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}
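As a reference point, here is a minimal in-memory implementation of that interface using cosine similarity. It is illustrative only — not one of the shipped backends — and assumes search results are ranked by descending similarity score:

```typescript
interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class InMemoryVectorStore implements VectorStore {
  private entries = new Map<string, { vector: number[]; metadata?: Record<string, unknown> }>();

  async upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void> {
    this.entries.set(id, { vector, metadata });
  }

  // Scores every stored vector against the query and returns the topK matches,
  // highest score first.
  async search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>> {
    const results = [...this.entries.entries()].map(([id, e]) => ({
      id,
      score: cosine(vector, e.vector),
    }));
    results.sort((a, b) => b.score - a.score);
    return results.slice(0, topK);
  }

  async remove(id: string): Promise<void> {
    this.entries.delete(id);
  }
}
```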

Full Configuration Example

A production-ready configuration combining all options:

import { SomaConfig } from '@soma/core';
import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vaultDir: '/data/soma/vault',
  inboxDir: '/data/soma/inbox',
  vectorStore: new LanceVectorStore('/data/soma/vectors'),

  analysisFn: async (prompt) => {
    const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'anthropic/claude-sonnet-4',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const json = await res.json();
    return json.choices[0].message.content;
  },

  embedFn: async (text) => {
    const res = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
      }),
    });
    const json = await res.json();
    return json.data[0].embedding;
  },

  harvester: { batchSize: 50 },
  synthesizer: { minConfidence: 0.3, maxProposals: 20 },
  cartographer: { clusterThreshold: 0.7 },
  reconciler: { overlapThreshold: 0.8 },

  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21,
      'ml-ops': 7,
    },
  },
};