# Configuration

SOMA is configured through the `SomaConfig` interface passed to the pipeline. Every field has a sensible default.

## Full SomaConfig
```typescript
interface SomaConfig {
  // Vault storage directory
  vaultDir?: string; // Default: '.soma/vault'

  // Vector store backend for Cartographer embeddings
  vectorStore?: VectorStore; // Default: JSON file backend

  // LLM function used by the Synthesizer for pattern extraction
  analysisFn?: AnalysisFn; // (prompt: string) => Promise<string>

  // Embedding function used by the Cartographer for semantic search
  embedFn?: EmbedFn; // (text: string) => Promise<number[]>

  // Inbox directory for file-based ingestion
  inboxDir?: string; // Default: '.soma/inbox'

  // Worker-specific configuration
  harvester?: HarvesterConfig; // { batchSize?: number }
  synthesizer?: SynthesizerConfig; // { minConfidence?: number, maxProposals?: number }
  cartographer?: CartographerConfig; // { clusterThreshold?: number }
  reconciler?: ReconcilerConfig; // { overlapThreshold?: number }

  // Decay timing for ephemeral layers
  decay?: DecayConfig;
}

interface DecayConfig {
  l2DefaultDays: number; // Default: 14
  l3DefaultDays: number; // Default: 90
  teamDecayDays?: Record<string, number>; // Per-team L2 overrides
}
```
## Decay Configuration Example

Override the default L2 expiry for specific teams:
```typescript
const config: SomaConfig = {
  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21, // Platform team context lives 21 days
      'ml-ops': 7,    // ML-ops context is highly ephemeral
      'security': 30, // Security context persists longer
    }
  }
};
```
Teams not listed in `teamDecayDays` use `l2DefaultDays`. L3 entries use `l3DefaultDays` globally; there is no per-team override for L3.
Decay rules:

- L1 and L4 entries never decay
- Promoted or rejected L3 entries never decay
- Reading an entry resets its decay timer (activity-based extension)
- Decayed entries move to L1 with `decayed_from` metadata
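The per-team fallback and the activity-based reset can be sketched as a small helper. This is an illustration only, not SOMA's internal code; `l2LifetimeDays`, `isExpired`, and the `lastReadAt` field are hypothetical names:

```typescript
interface DecayConfig {
  l2DefaultDays: number;
  l3DefaultDays: number;
  teamDecayDays?: Record<string, number>;
}

// Days an L2 entry lives for a given team: per-team override if present,
// otherwise l2DefaultDays.
function l2LifetimeDays(config: DecayConfig, team?: string): number {
  if (team && config.teamDecayDays && team in config.teamDecayDays) {
    return config.teamDecayDays[team];
  }
  return config.l2DefaultDays;
}

// An entry expires `lifetime` days after its *last read*, not its creation,
// because reading resets the decay timer (activity-based extension).
function isExpired(
  config: DecayConfig,
  team: string | undefined,
  lastReadAt: Date,
  now: Date,
): boolean {
  const lifetimeMs = l2LifetimeDays(config, team) * 24 * 60 * 60 * 1000;
  return now.getTime() - lastReadAt.getTime() > lifetimeMs;
}
```

With the example config above, an `ml-ops` entry last read nine days ago is already expired (7-day lifetime), while an unlisted team's entry survives until day 14.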
## LLM Providers

SOMA supports four LLM providers for the Synthesizer's `analysisFn`. Configure via CLI flags or environment variables.
| Provider | CLI `--provider` | Env Var (API Key) | Endpoint |
|---|---|---|---|
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | https://openrouter.ai/api/v1 |
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | https://api.anthropic.com/v1 |
| OpenAI | `openai` | `OPENAI_API_KEY` | https://api.openai.com/v1 |
| Custom | `custom` | `SOMA_API_KEY` | Set via `--endpoint` flag |
The `custom` provider works with any OpenAI-compatible endpoint (vLLM, Ollama, LiteLLM, etc.):

```sh
soma run \
  --provider custom \
  --model llama-3.1-70b \
  --endpoint http://localhost:8080/v1 \
  --api-key $MY_LOCAL_KEY
```
## API Key Resolution Order

SOMA resolves the API key in this order, using the first value found:

1. `--api-key` CLI flag
2. `SOMA_API_KEY` environment variable
3. Provider-specific environment variable (`OPENROUTER_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`)
4. `~/.env` file (dotenv format, key matching step 2 or 3)
```sh
# Option 1: CLI flag (highest priority)
soma run --provider openrouter --model anthropic/claude-sonnet-4 --api-key sk-or-...

# Option 2: SOMA_API_KEY works for any provider
export SOMA_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 3: Provider-specific env var
export OPENROUTER_API_KEY=sk-or-...
soma run --provider openrouter --model anthropic/claude-sonnet-4

# Option 4: ~/.env file
echo "OPENROUTER_API_KEY=sk-or-..." >> ~/.env
soma run --provider openrouter --model anthropic/claude-sonnet-4
```
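The four-step lookup amounts to a first-match-wins chain. A minimal sketch (not SOMA's actual implementation; the `ProviderName` type and the pre-parsed `dotEnv` map are assumptions):

```typescript
type ProviderName = 'openrouter' | 'anthropic' | 'openai' | 'custom';

const PROVIDER_ENV_VARS: Record<ProviderName, string> = {
  openrouter: 'OPENROUTER_API_KEY',
  anthropic: 'ANTHROPIC_API_KEY',
  openai: 'OPENAI_API_KEY',
  custom: 'SOMA_API_KEY',
};

// Returns the first key found: CLI flag, SOMA_API_KEY, provider-specific
// env var, then the same names looked up in an already-parsed ~/.env file.
function resolveApiKey(
  provider: ProviderName,
  cliKey: string | undefined,
  env: Record<string, string | undefined>,
  dotEnv: Record<string, string> = {},
): string | undefined {
  return (
    cliKey ??
    env['SOMA_API_KEY'] ??
    env[PROVIDER_ENV_VARS[provider]] ??
    dotEnv['SOMA_API_KEY'] ??
    dotEnv[PROVIDER_ENV_VARS[provider]]
  );
}
```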
## Vector Store Backends

The Cartographer uses a vector store for entity embeddings and semantic search. Three backends are available.
### JSON File Backend (Default)

Zero-dependency file-based storage. Suitable for vaults up to ~10K entities.

```typescript
import { JsonVectorStore } from '@soma/vector-store';

const config: SomaConfig = {
  vectorStore: new JsonVectorStore('.soma/vault/_vectors.json'),
};
```
Embeddings are stored in a single JSON file alongside the vault. No external services required.
### LanceDB

Columnar storage with native vector search. Good for 10K-1M entities.

```typescript
import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vectorStore: new LanceVectorStore('.soma/vault/_lance'),
};
```

Requires the `@lancedb/lancedb` package. Stores data in a local directory.
### Milvus

Distributed vector database for large-scale deployments (1M+ entities).

```typescript
import { MilvusVectorStore } from '@soma/vector-store-milvus';

const config: SomaConfig = {
  vectorStore: new MilvusVectorStore({
    address: 'localhost:19530',
    collection: 'soma_embeddings',
  }),
};
```

Requires a running Milvus instance and the `@zilliz/milvus2-sdk-node` package.
### Backend Comparison

| Backend | Entities | Dependencies | Latency | Persistence |
|---|---|---|---|---|
| JSON | < 10K | None | ~10ms | File |
| LanceDB | 10K - 1M | `@lancedb/lancedb` | ~2ms | Directory |
| Milvus | 1M+ | Milvus server | ~1ms | External DB |
All backends implement the same `VectorStore` interface:

```typescript
interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}
```
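A custom backend only needs those three methods. As a sketch, here is a minimal in-memory implementation that scores with cosine similarity (the metric is an assumption for illustration; SOMA's shipped backends may score differently):

```typescript
interface VectorStore {
  upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void>;
  search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>>;
  remove(id: string): Promise<void>;
}

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

class InMemoryVectorStore implements VectorStore {
  private entries = new Map<string, { vector: number[]; metadata?: Record<string, unknown> }>();

  async upsert(id: string, vector: number[], metadata?: Record<string, unknown>): Promise<void> {
    this.entries.set(id, { vector, metadata });
  }

  // Brute-force scan: score every stored vector, return the topK best matches.
  async search(vector: number[], topK: number): Promise<Array<{ id: string; score: number }>> {
    return [...this.entries]
      .map(([id, e]) => ({ id, score: cosine(vector, e.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  async remove(id: string): Promise<void> {
    this.entries.delete(id);
  }
}
```

A brute-force scan like this is O(n) per query, which is exactly why the JSON backend's practical ceiling sits around 10K entities while LanceDB and Milvus use native vector indexes.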
## Full Configuration Example

A production-ready configuration combining all options:
```typescript
import { SomaConfig } from '@soma/core';
import { LanceVectorStore } from '@soma/vector-store-lance';

const config: SomaConfig = {
  vaultDir: '/data/soma/vault',
  inboxDir: '/data/soma/inbox',
  vectorStore: new LanceVectorStore('/data/soma/vectors'),

  analysisFn: async (prompt) => {
    const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENROUTER_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'anthropic/claude-sonnet-4',
        messages: [{ role: 'user', content: prompt }],
      }),
    });
    const json = await res.json();
    return json.choices[0].message.content;
  },

  embedFn: async (text) => {
    const res = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
      }),
    });
    const json = await res.json();
    return json.data[0].embedding;
  },

  harvester: { batchSize: 50 },
  synthesizer: { minConfidence: 0.3, maxProposals: 20 },
  cartographer: { clusterThreshold: 0.7 },
  reconciler: { overlapThreshold: 0.8 },

  decay: {
    l2DefaultDays: 14,
    l3DefaultDays: 90,
    teamDecayDays: {
      'platform': 21,
      'ml-ops': 7,
    },
  },
};
```
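The `analysisFn` and `embedFn` above do no error handling: a non-2xx response or transient network failure would surface as a confusing parse error. One way to harden them is a generic retry wrapper; the retry count and backoff below are assumptions for illustration, not SOMA requirements:

```typescript
// Wrap any async call (e.g. an analysisFn or embedFn fetch) with
// exponential-backoff retries. Rethrows the last error once exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Back off 500ms, 1s, 2s, ... before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}
```

Inside the wrapped function, checking `res.ok` and throwing on failure turns HTTP errors into retryable exceptions as well.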