Memory System
Teleton uses a three-layer hybrid RAG architecture combining vector embeddings (sqlite-vec) with full-text search (FTS5) across a Knowledge Base, Telegram Feed, and Session state -- all stored in SQLite.
Overview
The memory system is organized into three distinct layers, each serving a different purpose:
| Layer | Contents | Persistence |
|---|---|---|
| Knowledge Base | Chunked Markdown documents (MEMORY.md, memory/*.md) | Permanent -- survives restarts |
| Telegram Feed | Archived messages, users, and chat metadata | Permanent -- grows over time |
| Sessions | Conversation state, message history, context window | Ephemeral -- compacted and summarized |
All data lives in a single SQLite database with WAL mode enabled for concurrent reads. When sqlite-vec is available, the system performs hybrid search -- merging FTS5 keyword scores with vector cosine similarity for every query.
Knowledge Base
Source Files
The knowledge base is built from Markdown files in the workspace:
- MEMORY.md -- the root memory document (in the workspace root)
- memory/*.md -- additional topic-specific memory files
Chunking Strategy
Documents are split into chunks for indexing with the following rules:
| Parameter | Value |
|---|---|
| Target chunk size | 500 characters |
| Maximum chunk size | 1000 characters |
| Boundary rules | Respects heading boundaries, code blocks, and list groups |
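The rules above can be sketched as follows. This is an illustrative outline, not the actual implementation: the function name is an assumption, and only heading boundaries are handled here (the real chunker also respects code blocks and list groups).

```python
# Illustrative sketch of the chunking rules: split on heading boundaries,
# pack sections toward the ~500-char target, and hard-split anything
# over the 1000-char maximum.
import re

TARGET, MAXIMUM = 500, 1000

def chunk_markdown(text: str) -> list[str]:
    # Split at heading boundaries so a chunk never straddles two sections.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    buf = ""
    for section in sections:
        if not section.strip():
            continue
        if len(buf) + len(section) <= TARGET:
            buf += section  # still under the 500-char target: keep packing
            continue
        if buf:
            chunks.append(buf)
            buf = ""
        # Hard-split oversized sections at the 1000-char maximum.
        while len(section) > MAXIMUM:
            chunks.append(section[:MAXIMUM])
            section = section[MAXIMUM:]
        buf = section
    if buf.strip():
        chunks.append(buf)
    return chunks
```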
Incremental Indexing
Each chunk is hashed with SHA-256. On re-index, only chunks whose hash has changed are re-embedded and written. Unchanged chunks are skipped, keeping re-indexing fast.
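The skip logic can be sketched like this (table and column names here are assumptions, not the real schema): a chunk is re-embedded only when its SHA-256 digest is not already on record.

```python
# Sketch of hash-based incremental indexing: re-embed a chunk only
# when its SHA-256 digest has changed since the last index run.
import hashlib
import sqlite3

def index_chunks(db: sqlite3.Connection, chunks: list[str]) -> int:
    db.execute("CREATE TABLE IF NOT EXISTS knowledge (hash TEXT PRIMARY KEY, text TEXT)")
    embedded = 0
    for text in chunks:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if db.execute("SELECT 1 FROM knowledge WHERE hash = ?", (digest,)).fetchone():
            continue  # unchanged chunk: skip the expensive embedding step
        # embed(text) would run here; this sketch only records the hash
        db.execute("INSERT INTO knowledge (hash, text) VALUES (?, ?)", (digest, text))
        embedded += 1
    db.commit()
    return embedded
```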
Knowledge Sources
Every knowledge entry is tagged with a source field:
| Source | Origin | Description |
|---|---|---|
| memory | Markdown files | Chunks parsed from MEMORY.md and memory/*.md |
| session | LLM summaries | Summaries generated during session compaction |
| learned | Agent interactions | Facts the agent picks up during conversations |
Embedding Providers
Configured via embedding.provider in config.yaml. Three options are available:
Local (default)
Runs entirely on-device using ONNX Runtime with the Xenova/all-MiniLM-L6-v2 model (384 dimensions). Zero API cost, works offline.
embedding:
  provider: "local"

Anthropic
Uses the Anthropic API for higher-quality embeddings. Requires a valid API key.
embedding:
  provider: "anthropic"

None
Disables vector search entirely. Only FTS5 keyword search is used. Simplest setup with no embedding dependencies.
embedding:
  provider: "none"

Embedding Cache
All computed embeddings are cached in the embedding_cache SQLite table to avoid redundant computation:
| Parameter | Value |
|---|---|
| TTL | 60 days |
| Max entries | 50,000 |
| Eviction policy | LRU (least recently used) |
Hybrid Search
Every search query runs through a two-stage pipeline that merges vector and keyword results:
# For each query:
1. Vector search -- cosine distance via sqlite-vec, top 30 candidates
2. Keyword search -- FTS5 BM25 ranking
3. Score merge -- final = 0.5 * vectorScore + 0.5 * keywordScore (memory retrieval weights; tool selection uses 0.6/0.4)
4. Filter -- discard results below 0.15 minimum score
5. Return -- top 10 results

If vector search is unavailable (provider set to none, or sqlite-vec failed to load), the system falls back to keyword-only search using FTS5.
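The merge and filter stages can be sketched as follows (the function shape is an assumption; the weights, floor, and result count come from the pipeline above):

```python
# Sketch of the score-merge stage: combine normalized vector and keyword
# scores per document id, drop anything below the 0.15 floor, return top 10.
VECTOR_WEIGHT, KEYWORD_WEIGHT = 0.5, 0.5  # memory retrieval; tool selection uses 0.6/0.4
MIN_SCORE, TOP_K = 0.15, 10

def merge(vector: dict[str, float], keyword: dict[str, float]) -> list[tuple[str, float]]:
    merged = {
        doc_id: VECTOR_WEIGHT * vector.get(doc_id, 0.0)
                + KEYWORD_WEIGHT * keyword.get(doc_id, 0.0)
        for doc_id in vector.keys() | keyword.keys()
    }
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= MIN_SCORE][:TOP_K]
```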
Telegram Feed
All incoming and outgoing Telegram messages are archived for later retrieval. The feed is stored across three tables:
| Table | Contents |
|---|---|
| tg_messages | Full message text, sender ID, chat ID, timestamp, optional embedding vector |
| tg_messages_fts | FTS5 index over message text for keyword search |
| tg_chats | Chat metadata (title, type, member count) |
| tg_users | User metadata (name, username, phone) |
When vector search is enabled, each message also gets an embedding stored alongside the text, allowing semantic search across the entire message archive.
Context Building (RAG)
When the agent processes an incoming message, context is assembled in four steps:
1. Fetch 10 most recent messages from the current chat
2. Hybrid search the Knowledge Base -- top 5 chunks
3. Hybrid search the Telegram Feed -- top 5 messages
4. Deduplicate results

The retrieved context is injected into the prompt in two labeled blocks:
[Relevant knowledge from memory]
... matched knowledge base chunks ...
[Relevant messages from Telegram feed]
... matched archived messages ...

Memory Tools
Two tools are exposed to the agent for explicit memory operations:
| Tool | Description |
|---|---|
| memory_write | Write content to persistent memory (MEMORY.md) or to the current daily log file |
| memory_read | Read from persistent memory or retrieve daily log entries |
The agent can use these tools proactively -- for example, saving important user preferences to MEMORY.md so they persist across sessions.
Daily Logs
A daily log file is automatically created for each day the agent is active:
~/.teleton/workspace/memory/{YYYY-MM-DD}.md

Daily logs contain:
- Session notes and conversation summaries
- Memory flushes from session compaction
- Milestone events and notable interactions
System prompt inclusion (DM only): The logs for yesterday and today are automatically included in the system prompt, capped at 100 lines each. This gives the agent short-term memory across restarts. Group chats do not receive daily log context.
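The inclusion rule can be sketched as follows (the function shape is an assumption; the path pattern and 100-line cap come from the text above): read yesterday's and today's logs, keeping at most the last 100 lines of each.

```python
# Sketch of DM-only daily-log inclusion: tail yesterday's and today's
# log files, capped at 100 lines each.
from datetime import date, timedelta
from pathlib import Path

MAX_LINES = 100
DEFAULT_LOG_DIR = Path("~/.teleton/workspace/memory").expanduser()

def daily_log_context(today: date, log_dir: Path = DEFAULT_LOG_DIR) -> str:
    parts = []
    for day in (today - timedelta(days=1), today):  # yesterday, then today
        path = log_dir / f"{day.isoformat()}.md"
        if path.exists():
            lines = path.read_text(encoding="utf-8").splitlines()
            parts.append("\n".join(lines[-MAX_LINES:]))  # keep only the last 100 lines
    return "\n\n".join(parts)
```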
Session Memory
Before a session is compacted or a daily reset occurs, the system preserves key information:
- An LLM generates a summary of the old session's conversation
- The summary is saved to memory (either persistent MEMORY.md or the daily log)
- The agent retains key facts, preferences, and context across session boundaries
This ensures continuity -- even after a context window reset, the agent remembers what matters.
Observation Masking
To save context window space, old tool results are compressed into a compact format:
[Tool: send_message - OK]
[Tool: search_messages - OK]
[Tool: get_balance - ERROR: insufficient funds]

| Rule | Detail |
|---|---|
| Last 10 results | Kept intact (full output preserved) |
| Error results | Always kept intact regardless of age |
| Older results | Compressed to [Tool: name - OK] |
| Size reduction | ~90% per masked result |
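The masking rules can be sketched like this (the message shape is an assumption): keep the last 10 results and all errors intact, and compress older successes to the one-line form.

```python
# Sketch of observation masking: recent results and errors stay intact;
# older successes collapse to "[Tool: name - OK]" (~90% smaller).
KEEP_RECENT = 10

def mask_observations(results: list[dict]) -> list[str]:
    masked = []
    for i, r in enumerate(results):
        recent = i >= len(results) - KEEP_RECENT
        if recent or r.get("error"):
            masked.append(r["output"])  # full output preserved
        else:
            masked.append(f"[Tool: {r['name']} - OK]")  # compressed
    return masked
```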
Context Compaction
When the conversation grows too large, automatic compaction kicks in:
| Threshold | Trigger | Action |
|---|---|---|
| 50% of context window | Soft warning | Memory flush warning -- agent is prompted to save important facts |
| 200+ messages or 75% of context window | Hard compaction | Full compaction cycle runs |
The compaction process:
- AI generates a summary of the old conversation
- Old messages are replaced with the summary
- The last 20 messages are kept intact
- A new session ID is assigned
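The two thresholds can be sketched as a single decision function (names are assumptions; the 50%/75% ratios and 200-message count come from the table above):

```python
# Sketch of the compaction trigger: soft warning at 50% of the context
# window, hard compaction at 200+ messages or 75% of the window.
def compaction_action(messages: int, tokens_used: int, context_window: int) -> str:
    ratio = tokens_used / context_window
    if messages >= 200 or ratio >= 0.75:
        return "hard_compaction"  # summarize, keep last 20 messages, new session ID
    if ratio >= 0.5:
        return "memory_flush_warning"  # prompt the agent to save important facts
    return "none"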
Privacy
Memory context injection follows strict privacy boundaries:
| Chat Type | Memory Context | Reason |
|---|---|---|
| Direct Messages | Full context included (MEMORY.md + daily logs) | Private 1:1 conversation, safe to include personal context |
| Group Chats | Own-chat feed RAG search (recent messages from that group), but not MEMORY.md, STRATEGY.md, or cross-chat search | Prevents cross-user information leakage |
This separation ensures that private notes, preferences, and personal information stored in memory are never exposed in group conversations.
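The boundary above can be sketched as a simple gate (the function and label names are assumptions, not the real API):

```python
# Sketch of the privacy gate: DMs get full memory context; group chats
# get only a feed search scoped to their own chat.
def allowed_context(chat_type: str) -> set[str]:
    if chat_type == "dm":
        # Private 1:1 chat: safe to include personal context.
        return {"MEMORY.md", "daily_logs", "knowledge_base", "feed_search"}
    # Group chat: own-chat feed only -- no MEMORY.md, STRATEGY.md,
    # or cross-chat search, to prevent cross-user information leakage.
    return {"own_chat_feed"}
```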
Configuration
Memory-related settings in config.yaml:
embedding:
  provider: "local" # "local" | "anthropic" | "none"
  model: null # Override default model (optional)

storage:
  sessions_file: "~/.teleton/sessions.json"
  memory_file: "~/.teleton/memory.json"
  history_limit: 100

| Key | Default | Description |
|---|---|---|
| embedding.provider | "local" | Embedding backend: local (ONNX), anthropic, or none |
| embedding.model | null | Override the default model for the chosen provider |
| storage.sessions_file | ~/.teleton/sessions.json | Path to session state file |
| storage.memory_file | ~/.teleton/memory.json | Path to memory metadata file |
| storage.history_limit | 100 | Maximum messages retained in raw history |
Database Tables
All data is stored in a single SQLite database (schema version 1.13.0). Key tables:
| Table | Purpose |
|---|---|
| meta | Schema metadata (stores current schema version) |
| knowledge | Knowledge base chunks (text, hash, source, embedding) |
| knowledge_fts | FTS5 index over knowledge chunks |
| knowledge_vec | sqlite-vec virtual table for vector similarity search over knowledge chunks |
| sessions | Conversation session state and history |
| tg_messages | Archived Telegram messages |
| tg_messages_fts | FTS5 index over Telegram messages |
| tg_messages_vec | sqlite-vec virtual table for vector similarity search over archived messages |
| tg_chats | Telegram chat metadata |
| tg_users | Telegram user metadata |
| embedding_cache | Cached embedding vectors (60-day TTL, 50k max, LRU) |
| exec_audit | Command execution audit log (tool, command, exit code, stdout/stderr, duration) |
| tool_index | Tool RAG index: tool name, description, and search text for semantic tool selection |
| tool_index_fts | FTS5 index over tool_index for keyword-based tool search |
| tool_config | Runtime tool configuration overrides (enabled, scope) set via admin commands |
| tasks | Scheduled and pending agent tasks |
| task_dependencies | Dependency graph between tasks |