Configuration

SOMA is configured through the SomaConfig interface passed to the pipeline. Every field is optional and has a sensible default.

Full SomaConfig

interface SomaConfig {
  // Vault storage directory
  vaultDir?: string; // Default: '.soma/vault'

  // Vector store backend for Cartographer embeddings
  vectorStore?: VectorStore; // Default: JSON file backend

  // LLM function used by the Synthesizer for pattern extraction
  analysisFn?: AnalysisFn; // (prompt: string) => Promise<string>

  // Embedding function used by the Cartographer for semantic search
  embedFn?: EmbedFn; // (text: string) => Promise<number[]>

  // Inbox directory for file-based ingestion
  inboxDir?: string; // Default: '.soma/inbox'

  // Worker-specific configuration
  harvester?: HarvesterConfig; // { batchSize?: number }
  synthesizer?: SynthesizerConfig; // { minConfidence?: number, maxProposals?: number }
  cartographer?: CartographerConfig; // { clusterThreshold?: number }
  reconciler?: ReconcilerConfig; // { overlapThreshold?: number }

  // Decay timing for ephemeral layers
  decay?: DecayConfig;
}
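Because every field is optional, an empty object is a valid configuration. A minimal sketch of how the directory defaults might be filled in — the `withDefaults` helper below is illustrative only, not part of the SOMA API:

```typescript
// Illustrative only: how optional directory fields fall back to their defaults.
interface ResolvedDirs {
  vaultDir: string;
  inboxDir: string;
}

function withDefaults(config: { vaultDir?: string; inboxDir?: string }): ResolvedDirs {
  return {
    vaultDir: config.vaultDir ?? '.soma/vault',
    inboxDir: config.inboxDir ?? '.soma/inbox',
  };
}
```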

interface DecayConfig {
  l2DefaultDays: number; // Default: 14
  l3DefaultDays: number; // Default: 90
  teamDecayDays?: Record<string, number>; // Per-team L2 overrides
}

Decay Configuration Example

Override the default L2 expiry for specific teams:

const config: SomaConfig = {
  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21, // Platform team context lives 21 days
      'ml-ops': 7,    // ML-ops context is highly ephemeral
      'security': 30, // Security context persists longer
    }
  }
};

Teams not listed in teamDecayDays use l2DefaultDays. L3 entries use l3DefaultDays globally — there is no per-team override for L3.

Decay rules:

  • L1 and L4 entries never decay
  • Promoted or rejected L3 entries never decay
  • Reading an entry resets its decay timer (activity-based extension)
  • Decayed entries move to L1 with decayed_from metadata
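The per-team fallback and activity-based timer can be sketched as follows. `resolveL2DecayDays` and `isExpired` are hypothetical helpers written against the DecayConfig shape above, not functions exported by SOMA:

```typescript
interface DecayConfig {
  l2DefaultDays: number;
  l3DefaultDays: number;
  teamDecayDays?: Record<string, number>;
}

// L2 budget for a team; teams not listed fall back to l2DefaultDays.
function resolveL2DecayDays(decay: DecayConfig, team?: string): number {
  if (team && decay.teamDecayDays && team in decay.teamDecayDays) {
    return decay.teamDecayDays[team];
  }
  return decay.l2DefaultDays;
}

// Since reading an entry resets its timer, expiry is measured from the
// last read (or creation, if never read).
function isExpired(lastReadAt: Date, decayDays: number, now: Date = new Date()): boolean {
  const ageMs = now.getTime() - lastReadAt.getTime();
  return ageMs > decayDays * 24 * 60 * 60 * 1000;
}
```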

LLM Providers

SOMA supports four LLM providers for the Synthesizer's analysisFn. Configure via CLI flags or environment variables.

| Provider   | CLI --provider | Env Var (API Key)  | Endpoint                     |
|------------|----------------|--------------------|------------------------------|
| OpenRouter | openrouter     | OPENROUTER_API_KEY | https://openrouter.ai/api/v1 |
| Anthropic  | anthropic      | ANTHROPIC_API_KEY  | https://api.anthropic.com/v1 |
| OpenAI     | openai         | OPENAI_API_KEY     | https://api.openai.com/v1    |
| Custom     | custom         | SOMA_API_KEY       | Set via --endpoint flag      |

The custom provider works with any OpenAI-compatible endpoint (vLLM, Ollama, LiteLLM, etc.):

soma run \
  --provider custom \
  --model llama-3.1-70b \
  --endpoint http://localhost:8080/v1 \
  --api-key $MY_LOCAL_KEY

API Key Resolution Order

SOMA resolves the API key in this order, using the first value found:

  1. --api-key CLI flag
  2. SOMA_API_KEY environment variable
  3. Provider-specific environment variable (OPENROUTER_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY)
  4. ~/.env file (dotenv format, key matching step 2 or 3)

# Option 1: CLI flag (highest priority)
soma run --provider openrouter --model anthropic/claude-sonnet-4 --api-key sk-or-...

# Option 2: SOMA_API_KEY works for any provider
export SOMA_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 3: Provider-specific env var
export OPENROUTER_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 4: ~/.env file
echo "OPENROUTER_API_KEY=sk-or-..." >> ~/.env
soma run --provider openrouter --model anthropic/claude-sonnet-4
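The resolution order above can be sketched like this. `resolveApiKey` is an illustrative helper, not SOMA's actual implementation, and the ~/.env step is simplified to a pre-parsed record rather than real file reading:

```typescript
// Maps each built-in provider to its dedicated environment variable.
const PROVIDER_ENV_VARS: Record<string, string> = {
  openrouter: 'OPENROUTER_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
};

// Returns the first key found: CLI flag, SOMA_API_KEY, the provider-specific
// env var, then the same keys from a parsed ~/.env file.
function resolveApiKey(
  provider: string,
  cliKey: string | undefined,
  env: Record<string, string | undefined>,
  dotenvFile: Record<string, string> = {},
): string | undefined {
  if (cliKey) return cliKey;
  if (env.SOMA_API_KEY) return env.SOMA_API_KEY;
  const providerVar = PROVIDER_ENV_VARS[provider];
  if (providerVar && env[providerVar]) return env[providerVar];
  return dotenvFile.SOMA_API_KEY ?? (providerVar ? dotenvFile[providerVar] : undefined);
}
```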

Vector Store Backends

The Cartographer uses a vector store for entity embeddings and semantic search. Three backends are available.

JSON File Backend (Default)

Zero-dependency file-based storage. Suitable for vaults up to ~10K entities.

import { JsonVectorStore } from '@soma/vector-store';

const config: SomaConfig = {
  vectorStore: new JsonVectorStore('.soma/vault/_vectors.json'),
};

Embeddings are stored in a single JSON file alongside the vault. No external services required.

LanceDB

Columnar storage with native vector search. Good for 10K-1M entities.

import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vectorStore: new LanceVectorStore('.soma/vault/_lance'),
};

Requires the @lancedb/lancedb package. Stores data in a local directory.

Milvus

Distributed vector database for large-scale deployments (1M+ entities).

import { MilvusVectorStore } from '@soma/vector-store-milvus';

const config: SomaConfig = {
  vectorStore: new MilvusVectorStore({
    address: 'localhost:19530',
    collection: 'soma_embeddings',
  }),
};

Requires a running Milvus instance and the @zilliz/milvus2-sdk-node package.

Backend Comparison

| Backend | Entities | Dependencies     | Latency | Persistence |
|---------|----------|------------------|---------|-------------|
| JSON    | < 10K    | None             | ~10ms   | File        |
| LanceDB | 10K - 1M | @lancedb/lancedb | ~2ms    | Directory   |
| Milvus  | 1M+      | Milvus server    | ~1ms    | External DB |

All backends implement the same VectorStore interface:

interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}
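As a reference point, here is a minimal in-memory implementation of that interface using cosine similarity. It is illustrative only — not one of the shipped backends — and assumes search results are ranked by descending similarity score:

```typescript
interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class InMemoryVectorStore implements VectorStore {
  private entries = new Map<string, { vector: number[]; metadata?: Record<string, unknown> }>();

  async upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void> {
    this.entries.set(id, { vector, metadata });
  }

  // Scores every stored vector against the query and returns the topK matches,
  // highest score first.
  async search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>> {
    const results = [...this.entries.entries()].map(([id, e]) => ({
      id,
      score: cosine(vector, e.vector),
    }));
    results.sort((a, b) => b.score - a.score);
    return results.slice(0, topK);
  }

  async remove(id: string): Promise<void> {
    this.entries.delete(id);
  }
}
```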

Full Configuration Example

A production-ready configuration combining all options:

import { SomaConfig } from '@soma/core';
import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vaultDir: '/data/soma/vault',
  inboxDir: '/data/soma/inbox',
  vectorStore: new LanceVectorStore('/data/soma/vectors'),

  analysisFn: async (prompt) => {
    const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'anthropic/claude-sonnet-4',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const json = await res.json();
    return json.choices[0].message.content;
  },

  embedFn: async (text) => {
    const res = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
      }),
    });
    const json = await res.json();
    return json.data[0].embedding;
  },

  harvester: { batchSize: 50 },
  synthesizer: { minConfidence: 0.3, maxProposals: 20 },
  cartographer: { clusterThreshold: 0.7 },
  reconciler: { overlapThreshold: 0.8 },

  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21,
      'ml-ops': 7,
    },
  },
};