Memory System
Teleton uses a three-layer hybrid RAG architecture combining vector embeddings (sqlite-vec) with full-text search (FTS5) across a Knowledge Base, Telegram Feed, and Session state -- all stored in SQLite.
Overview
The memory system is organized into three distinct layers, each serving a different purpose:
| Layer | Contents | Persistence |
|---|---|---|
| Knowledge Base | Chunked Markdown documents (MEMORY.md, memory/*.md) | Permanent -- survives restarts |
| Telegram Feed | Archived messages, users, and chat metadata | Permanent -- grows over time |
| Sessions | Conversation state, message history, context window | Ephemeral -- compacted and summarized |
All data lives in a single SQLite database with WAL mode enabled for concurrent reads. When sqlite-vec is available, the system performs hybrid search -- merging FTS5 keyword scores with vector cosine similarity for every query.
Knowledge Base
Source Files
The knowledge base is built from Markdown files in the workspace:
- MEMORY.md -- the root memory document (in the workspace root)
- memory/*.md -- additional topic-specific memory files
Chunking Strategy
Documents are split into chunks for indexing with the following rules:
| Parameter | Value |
|---|---|
| Target chunk size | 500 characters |
| Maximum chunk size | 1000 characters |
| Boundary rules | Respects heading boundaries, code blocks, and list groups |
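The rules above can be sketched as follows. This is an illustrative outline, not the actual implementation: the function name is an assumption, and only heading boundaries are handled here (the real chunker also respects code blocks and list groups).

```python
# Illustrative sketch of the chunking rules: split on heading boundaries,
# pack sections toward the ~500-char target, and hard-split anything
# over the 1000-char maximum.
import re

TARGET, MAXIMUM = 500, 1000

def chunk_markdown(text: str) -> list[str]:
    # Split at heading boundaries so a chunk never straddles two sections.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks: list[str] = []
    buf = ""
    for section in sections:
        if not section.strip():
            continue
        if len(buf) + len(section) <= TARGET:
            buf += section  # still under the 500-char target: keep packing
            continue
        if buf:
            chunks.append(buf)
            buf = ""
        # Hard-split oversized sections at the 1000-char maximum.
        while len(section) > MAXIMUM:
            chunks.append(section[:MAXIMUM])
            section = section[MAXIMUM:]
        buf = section
    if buf.strip():
        chunks.append(buf)
    return chunks
```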
Incremental Indexing
Each chunk is hashed with SHA-256. On re-index, only chunks whose hash has changed are re-embedded and written. Unchanged chunks are skipped, keeping re-indexing fast.
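The skip logic can be sketched like this (table and column names here are assumptions, not the real schema): a chunk is re-embedded only when its SHA-256 digest is not already on record.

```python
# Sketch of hash-based incremental indexing: re-embed a chunk only
# when its SHA-256 digest has changed since the last index run.
import hashlib
import sqlite3

def index_chunks(db: sqlite3.Connection, chunks: list[str]) -> int:
    db.execute("CREATE TABLE IF NOT EXISTS knowledge (hash TEXT PRIMARY KEY, text TEXT)")
    embedded = 0
    for text in chunks:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if db.execute("SELECT 1 FROM knowledge WHERE hash = ?", (digest,)).fetchone():
            continue  # unchanged chunk: skip the expensive embedding step
        # embed(text) would run here; this sketch only records the hash
        db.execute("INSERT INTO knowledge (hash, text) VALUES (?, ?)", (digest, text))
        embedded += 1
    db.commit()
    return embedded
```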
Knowledge Sources
Every knowledge entry is tagged with a source field:
| Source | Origin | Description |
|---|---|---|
| memory | Markdown files | Chunks parsed from MEMORY.md and memory/*.md |
| session | LLM summaries | Summaries generated during session compaction |
| learned | Agent interactions | Facts the agent picks up during conversations |
Embedding Providers
Configured via embedding.provider in config.yaml. Three options are available:
Local (default)
Runs entirely on-device using ONNX Runtime with the Xenova/all-MiniLM-L6-v2 model (384 dimensions). Zero API cost, works offline.
embedding:
  provider: "local"

Anthropic
Uses the Anthropic API for higher-quality embeddings. Requires a valid API key.
embedding:
  provider: "anthropic"

None
Disables vector search entirely. Only FTS5 keyword search is used. Simplest setup with no embedding dependencies.
embedding:
  provider: "none"

Embedding Cache
All computed embeddings are cached in the embedding_cache SQLite table to avoid redundant computation:
| Parameter | Value |
|---|---|
| TTL | 60 days |
| Max entries | 50,000 |
| Eviction policy | LRU (least recently used) |
Hybrid Search
Every search query runs through a two-stage pipeline that merges vector and keyword results:
# For each query:
1. Vector search -- cosine distance via sqlite-vec, top 30 candidates
2. Keyword search -- FTS5 BM25 ranking
3. Score merge -- final = 0.5 * vectorScore + 0.5 * keywordScore (memory retrieval weights; tool selection uses 0.6/0.4)
4. Filter -- discard results below 0.15 minimum score
5. Return -- top 10 results

If vector search is unavailable (provider set to none, or sqlite-vec failed to load), the system falls back to keyword-only search using FTS5.
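The merge and filter stages can be sketched as follows (the function shape is an assumption; the weights, floor, and result count come from the pipeline above):

```python
# Sketch of the score-merge stage: combine normalized vector and keyword
# scores per document id, drop anything below the 0.15 floor, return top 10.
VECTOR_WEIGHT, KEYWORD_WEIGHT = 0.5, 0.5  # memory retrieval; tool selection uses 0.6/0.4
MIN_SCORE, TOP_K = 0.15, 10

def merge(vector: dict[str, float], keyword: dict[str, float]) -> list[tuple[str, float]]:
    merged = {
        doc_id: VECTOR_WEIGHT * vector.get(doc_id, 0.0)
                + KEYWORD_WEIGHT * keyword.get(doc_id, 0.0)
        for doc_id in vector.keys() | keyword.keys()
    }
    ranked = sorted(merged.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= MIN_SCORE][:TOP_K]
```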
Telegram Feed
All incoming and outgoing Telegram messages are archived for later retrieval. The feed is stored across three tables:
| Table | Contents |
|---|---|
| tg_messages | Full message text, sender ID, chat ID, timestamp, optional embedding vector |
| tg_messages_fts | FTS5 index over message text for keyword search |
| tg_chats | Chat metadata (title, type, member count) |
| tg_users | User metadata (name, username, phone) |
When vector search is enabled, each message also gets an embedding stored alongside the text, allowing semantic search across the entire message archive.
Context Building (RAG)
When the agent processes an incoming message, context is assembled in four steps:
1. Fetch 10 most recent messages from the current chat
2. Hybrid search the Knowledge Base -- top 5 chunks
3. Hybrid search the Telegram Feed -- top 5 messages
4. Deduplicate results

The retrieved context is injected into the prompt in two labeled blocks:
[Relevant knowledge from memory]
... matched knowledge base chunks ...
[Relevant messages from Telegram feed]
... matched archived messages ...

Memory Tools
Two tools are exposed to the agent for explicit memory operations:
| Tool | Description |
|---|---|
| memory_write | Write content to persistent memory (MEMORY.md) or to the current daily log file |
| memory_read | Read from persistent memory or retrieve daily log entries |
The agent can use these tools proactively -- for example, saving important user preferences to MEMORY.md so they persist across sessions.
Daily Logs
A daily log file is automatically created for each day the agent is active:
~/.teleton/workspace/memory/{YYYY-MM-DD}.md

Daily logs contain:
- Session notes and conversation summaries
- Memory flushes from session compaction
- Milestone events and notable interactions
System prompt inclusion (DM only): The logs for yesterday and today are automatically included in the system prompt, capped at 100 lines each. This gives the agent short-term memory across restarts. Group chats do not receive daily log context.
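The inclusion rule can be sketched as follows (the function shape is an assumption; the path pattern and 100-line cap come from the text above): read yesterday's and today's logs, keeping at most the last 100 lines of each.

```python
# Sketch of DM-only daily-log inclusion: tail yesterday's and today's
# log files, capped at 100 lines each.
from datetime import date, timedelta
from pathlib import Path

MAX_LINES = 100
DEFAULT_LOG_DIR = Path("~/.teleton/workspace/memory").expanduser()

def daily_log_context(today: date, log_dir: Path = DEFAULT_LOG_DIR) -> str:
    parts = []
    for day in (today - timedelta(days=1), today):  # yesterday, then today
        path = log_dir / f"{day.isoformat()}.md"
        if path.exists():
            lines = path.read_text(encoding="utf-8").splitlines()
            parts.append("\n".join(lines[-MAX_LINES:]))  # keep only the last 100 lines
    return "\n\n".join(parts)
```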
Session Memory
Before a session is compacted or a daily reset occurs, the system preserves key information:
- An LLM generates a summary of the old session's conversation
- The summary is saved to memory (either persistent MEMORY.md or the daily log)
- The agent retains key facts, preferences, and context across session boundaries
This ensures continuity -- even after a context window reset, the agent remembers what matters.
Observation Masking
To save context window space, old tool results are compressed into a compact format:
[Tool: send_message - OK]
[Tool: search_messages - OK]
[Tool: get_balance - ERROR: insufficient funds]

| Rule | Detail |
|---|---|
| Last 10 results | Kept intact (full output preserved) |
| Error results | Always kept intact regardless of age |
| Older results | Compressed to [Tool: name - OK] |
| Size reduction | ~90% per masked result |
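The masking rules can be sketched like this (the message shape is an assumption): keep the last 10 results and all errors intact, and compress older successes to the one-line form.

```python
# Sketch of observation masking: recent results and errors stay intact;
# older successes collapse to "[Tool: name - OK]" (~90% smaller).
KEEP_RECENT = 10

def mask_observations(results: list[dict]) -> list[str]:
    masked = []
    for i, r in enumerate(results):
        recent = i >= len(results) - KEEP_RECENT
        if recent or r.get("error"):
            masked.append(r["output"])  # full output preserved
        else:
            masked.append(f"[Tool: {r['name']} - OK]")  # compressed
    return masked
```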
Context Compaction
When the conversation grows too large, automatic compaction kicks in:
| Threshold | Trigger | Action |
|---|---|---|
| 50% of context window | Soft warning | Memory flush warning -- agent is prompted to save important facts |
| 200+ messages or 75% of context window | Hard compaction | Full compaction cycle runs |
The compaction process:
- AI generates a summary of the old conversation
- Old messages are replaced with the summary
- The last 20 messages are kept intact
- A new session ID is assigned
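The two thresholds can be sketched as a single decision function (names are assumptions; the 50%/75% ratios and 200-message count come from the table above):

```python
# Sketch of the compaction trigger: soft warning at 50% of the context
# window, hard compaction at 200+ messages or 75% of the window.
def compaction_action(messages: int, tokens_used: int, context_window: int) -> str:
    ratio = tokens_used / context_window
    if messages >= 200 or ratio >= 0.75:
        return "hard_compaction"  # summarize, keep last 20 messages, new session ID
    if ratio >= 0.5:
        return "memory_flush_warning"  # prompt the agent to save important facts
    return "none"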
Privacy
Memory context injection follows strict privacy boundaries:
| Chat Type | Memory Context | Reason |
|---|---|---|
| Direct Messages | Full context included (MEMORY.md + daily logs) | Private 1:1 conversation, safe to include personal context |
| Group Chats | Own-chat feed RAG search (recent messages from that group), but not MEMORY.md, STRATEGY.md, or cross-chat search | Prevents cross-user information leakage |
This separation ensures that private notes, preferences, and personal information stored in memory are never exposed in group conversations.
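The boundary above can be sketched as a simple gate (the function and label names are assumptions, not the real API):

```python
# Sketch of the privacy gate: DMs get full memory context; group chats
# get only a feed search scoped to their own chat.
def allowed_context(chat_type: str) -> set[str]:
    if chat_type == "dm":
        # Private 1:1 chat: safe to include personal context.
        return {"MEMORY.md", "daily_logs", "knowledge_base", "feed_search"}
    # Group chat: own-chat feed only -- no MEMORY.md, STRATEGY.md,
    # or cross-chat search, to prevent cross-user information leakage.
    return {"own_chat_feed"}
```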
Configuration
Memory-related settings in config.yaml:
embedding:
  provider: "local" # "local" | "anthropic" | "none"
  model: null # Override default model (optional)

storage:
  sessions_file: "~/.teleton/sessions.json"
  memory_file: "~/.teleton/memory.json"
  history_limit: 100

| Key | Default | Description |
|---|---|---|
| embedding.provider | "local" | Embedding backend: local (ONNX), anthropic, or none |
| embedding.model | null | Override the default model for the chosen provider |
| storage.sessions_file | ~/.teleton/sessions.json | Path to session state file |
| storage.memory_file | ~/.teleton/memory.json | Path to memory metadata file |
| storage.history_limit | 100 | Maximum messages retained in raw history |
Database Tables
All data is stored in a single SQLite database (schema version 1.13.0). Key tables:
| Table | Purpose |
|---|---|
| meta | Schema metadata (stores current schema version) |
| knowledge | Knowledge base chunks (text, hash, source, embedding) |
| knowledge_fts | FTS5 index over knowledge chunks |
| knowledge_vec | sqlite-vec virtual table for vector similarity search over knowledge chunks |
| sessions | Conversation session state and history |
| tg_messages | Archived Telegram messages |
| tg_messages_fts | FTS5 index over Telegram messages |
| tg_messages_vec | sqlite-vec virtual table for vector similarity search over archived messages |
| tg_chats | Telegram chat metadata |
| tg_users | Telegram user metadata |
| embedding_cache | Cached embedding vectors (60-day TTL, 50k max, LRU) |
| exec_audit | Command execution audit log (tool, command, exit code, stdout/stderr, duration) |
| tool_index | Tool RAG index: tool name, description, and search text for semantic tool selection |
| tool_index_fts | FTS5 index over tool_index for keyword-based tool search |
| tool_config | Runtime tool configuration overrides (enabled, scope) set via admin commands |
| tasks | Scheduled and pending agent tasks |
| task_dependencies | Dependency graph between tasks |