
Agentic Loop

The definitive reference for Teleton's Think-Act-Observe reasoning loop -- from message ingestion through session management, context building, tool selection, iterative execution, and response formatting.

1. Overview

Teleton implements a Think-Act-Observe reasoning loop. When the agent receives a message it:

  1. Think. Reasons about the user's request, the conversation history, and the available tools.
  2. Act. Optionally calls one or more tools to gather information or perform actions.
  3. Observe. Inspects the tool results, then decides whether the task is complete or another iteration is needed.

This cycle repeats up to a configurable maximum number of iterations (default: 5). If the LLM produces a final text response or the limit is reached, the loop exits and the response is delivered.
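The cycle can be sketched as follows. All names here (LLM, ToolRunner, runAgentLoop) are illustrative placeholders, not Teleton's actual API:

```typescript
// Minimal sketch of the Think-Act-Observe cycle. Names are illustrative,
// not Teleton's real implementation.
interface ToolCall { name: string; args: Record<string, unknown> }
interface LLMResponse { text?: string; toolCalls: ToolCall[] }

type LLM = (transcript: string[]) => LLMResponse;
type ToolRunner = (call: ToolCall) => string;

function runAgentLoop(
  llm: LLM,
  runTool: ToolRunner,
  userMessage: string,
  maxIterations = 5,            // default iteration budget
): string {
  const transcript: string[] = [userMessage];
  for (let i = 0; i < maxIterations; i++) {
    const response = llm(transcript);              // Think
    if (response.toolCalls.length === 0) {
      return response.text ?? "";                  // final text -- loop exits
    }
    for (const call of response.toolCalls) {       // Act
      transcript.push(runTool(call));              // Observe: result feeds next iteration
    }
  }
  return "Iteration limit reached.";               // budget exhausted
}
```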

2. Message Entry Point

Every inbound Telegram message passes through a multi-stage pipeline before reaching the agent runtime:

  1. TelegramBridge.onNewMessage() -- GramJS fires the raw event.
  2. MessageDebouncer.enqueue() -- group messages are batched within a 1500 ms window; DMs and admin messages are dispatched immediately.
  3. MessageHandler policy checks:
    • DM policy -- allowlist, open, admin-only, or disabled.
    • Group policy -- open, allowlist, or disabled.
    • Rate-limit enforcement.
    • Mention requirement (groups only, if configured).
  4. If all checks pass, the message is forwarded to AgentRuntime.processMessage().
Simplified pipeline
GramJS event
  -> TelegramBridge.onNewMessage()
  -> MessageDebouncer.enqueue()          # 1500ms batch (groups) / immediate (DMs)
  -> MessageHandler                       # policy + rate-limit + mention checks
  -> AgentRuntime.processMessage()        # enters the agentic loop

3. Session Management

Each chat (identified by chatId) is mapped to a UUID-based session via getOrCreateSession(chatId).

Reset Policies

Policy        Trigger                               Default
daily_reset   Fires at a configured hour each day   4:00 AM
idle_expiry   Fires after N minutes of inactivity   1440 min (24 h)

Reset Procedure

  1. The old session transcript is summarized by the LLM.
  2. The summary is saved to long-term memory.
  3. A new session ID is generated and the conversation starts fresh.
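The two reset triggers can be sketched as a single check run before each message is handled (the function shape is hypothetical; times are passed in explicitly to keep the sketch deterministic):

```typescript
// Sketch of the daily_reset and idle_expiry triggers with their defaults.
// The function and its parameters are illustrative, not Teleton's actual API.
function shouldReset(
  lastActivityMs: number,   // epoch ms of the session's last message
  nowMs: number,            // epoch ms now
  nowHour: number,          // local hour of day, 0-23
  lastResetDay: string,     // e.g. "2026-02-19"
  today: string,            // e.g. "2026-02-20"
  resetHour = 4,            // daily_reset default: 4:00 AM
  idleMinutes = 1440,       // idle_expiry default: 24 h
): boolean {
  if (nowMs - lastActivityMs >= idleMinutes * 60_000) return true;  // idle_expiry
  return nowHour >= resetHour && lastResetDay !== today;            // daily_reset
}
```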

Transcript Storage

Every message and tool call is persisted as JSONL:

File path
~/.teleton/sessions/{sessionId}.jsonl
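JSONL means one JSON object per line, so entries can be appended without rewriting the file. A minimal sketch (the entry shape is illustrative, not Teleton's actual transcript schema):

```typescript
// Sketch of JSONL serialization for transcript entries. The TranscriptEntry
// shape is an assumption for illustration only.
interface TranscriptEntry { role: "user" | "assistant" | "tool"; content: string; ts: string }

function toJsonlLine(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";   // append this line to {sessionId}.jsonl
}

function parseJsonl(text: string): TranscriptEntry[] {
  return text.split("\n").filter(Boolean).map(line => JSON.parse(line));
}
```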

4. Context Building (RAG)

Before every LLM call the system assembles rich context from multiple sources:

  1. Recent messages -- the 10 most recent messages from the current chat.
  2. Knowledge base search -- hybrid search (vector + FTS5) returns the top 5 chunks.
  3. Telegram feed search -- hybrid search returns the top 5 messages from monitored feeds.
  4. Deduplication -- overlapping results are merged.
  5. Injection -- results are injected into the prompt as structured sections.
Hybrid search (tool selection) combines dense vector similarity with SQLite FTS5 keyword matching, weighted as 0.6 * vectorScore + 0.4 * keywordScore. Note: memory retrieval uses equal 0.5/0.5 weights — see Memory System.
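The weighted combination above is a one-liner; the same function covers memory retrieval by passing a 0.5 weight:

```typescript
// Hybrid score as described above: 0.6 * vector + 0.4 * keyword for tool
// selection; pass vectorWeight = 0.5 for memory retrieval's equal weighting.
function hybridScore(vectorScore: number, keywordScore: number, vectorWeight = 0.6): number {
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```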

5. System Prompt Construction

The system prompt is assembled dynamically by buildSystemPrompt(). The sections are appended in this order:

  1. Soul personality -- loaded from SOUL.md, or a built-in default if the file does not exist.
  2. Security rules -- loaded from SECURITY.md (if present).
  3. Strategy -- loaded from STRATEGY.md (if present, DM only).
  4. Workspace intro -- brief description of the workspace environment.
  5. Response format guidelines -- instructions on message length, Markdown usage, etc.
  6. Owner information -- the configured owner's identity.
  7. Memory context (DM only):
    • MEMORY.md -- up to 150 lines of persistent memory.
    • Daily logs -- yesterday + today, 100 lines each.
  8. Current user info -- username, ID, timezone offset.
  9. RAG search results -- the context built in step 4.
  10. Memory flush warning -- injected if the context is approaching token limits, prompting the agent to persist important information.
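The assembly above is ordered and conditional: sections whose source is missing (SECURITY.md absent, or a DM-only section in a group chat) contribute nothing. A rough sketch, with the Section type and heading style as assumptions:

```typescript
// Sketch of ordered, conditional prompt assembly. The Section type and the
// "##" heading convention are illustrative, not Teleton's actual format.
type Section = { title: string; body: string | null };   // null -> omitted

function buildSystemPrompt(sections: Section[]): string {
  return sections
    .filter(s => s.body !== null)          // drop absent/scope-excluded sections
    .map(s => `## ${s.title}\n${s.body}`)  // preserve the documented order
    .join("\n\n");
}
```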

6. Tool Selection

Two modes determine which tools the LLM sees:

All Tools (Tool RAG disabled)

Every tool that passes scope filtering is sent to the LLM. Simple but token-expensive with large tool registries.

Tool RAG (enabled)

Semantic search selects the most relevant tools for the current message:

  1. The user message is embedded as a vector.
  2. Hybrid search scores each tool (tool selection weights): 0.6 * vectorScore + 0.4 * keywordScore.
  3. The top-K tools are returned (default 25).
  4. Always-include patterns are preserved regardless of score: telegram_send_message, journal_*, workspace_*, web_*.
  5. Provider-specific tool limits are applied: Anthropic and claude-code accept an unlimited number of tools; all other providers are capped at 128 tools.
config.yaml
tool_rag:
  enabled: false     # toggle Tool RAG
  top_k: 25          # max tools returned by semantic search
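Steps 3-4 can be sketched as top-K selection followed by re-adding always-include matches. The "*" suffix wildcard handling is an assumption about how the patterns are matched:

```typescript
// Sketch of top-K tool selection with always-include patterns (steps 3-4).
// The wildcard semantics and function names are illustrative.
function matchesPattern(name: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? name.startsWith(pattern.slice(0, -1))
    : name === pattern;
}

function selectTools(
  scored: { name: string; score: number }[],
  topK: number,
  alwaysInclude: string[],
): string[] {
  const ranked = [...scored].sort((a, b) => b.score - a.score);
  const picked = ranked.slice(0, topK).map(t => t.name);
  for (const t of ranked.slice(topK)) {            // re-add low-scoring matches
    if (alwaysInclude.some(p => matchesPattern(t.name, p))) picked.push(t.name);
  }
  return picked;
}
```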

7. The Iteration Loop (Core)

This is the heart of the agent. The pseudocode below describes exactly what happens on each iteration:

Pseudocode -- agentic loop
iteration = 0

while iteration < max_agentic_iterations:        # default 5

    # 1. Mask old tool results to save context space
    maskOldToolResults(transcript)
    #    -> keep last 10 results intact
    #    -> keep error results intact
    #    -> replace older ones with "[Tool: name - OK]"

    # 2. Call LLM via pi-ai library
    response = llm.call(
        systemPrompt,
        transcript,          # user + assistant + tool messages
        tools                # selected tool definitions
    )
    # Provider-specific handling:
    #   - Cocoon: injects tool definitions into prompt text
    #   - Gemini: sanitizes JSON schemas for compatibility

    # 3. Handle errors
    if response.error == "context_overflow":
        archiveTranscript()
        resetSession()
        retry()
    if response.error == 429:       # rate limit
        exponentialBackoff(maxRetries=3)

    # 4. Process tool calls
    for toolCall in response.toolCalls:
        validate(toolCall, registry)
        checkScope(toolCall)         # dm-only, admin-only, etc.
        checkModulePermissions(toolCall)
        result = execute(toolCall, timeout=30_000)  # 30s timeout
        if result.size > 50KB:
            result = truncate(result)
        transcript.append(result)

    # 5. Decide: continue or break
    if response.stopReason == "toolUse" AND toolCalls.length > 0:
        iteration++
        continue                     # next iteration
    else:
        break                        # done -- return response
Iteration budget. The default of 5 iterations is adjustable between 1 and 50 at runtime via the /loop admin command.

8. Message Envelope Format

Every user message is wrapped in a structured envelope before being added to the transcript. The format varies by context:

Direct Message

DM envelope
[Telegram User (@username, id:123) +2h 2026-02-20 15:30 UTC] <user_message>Hello!</user_message>

Group Message

Group envelope
[Telegram Group (+5m 2026-02-20 15:30)] User: Hello everyone!

Media Message

Media envelope
[photo msg_id=456] [Telegram User (@username, id:123) +2h 2026-02-20 15:30 UTC] <user_message>Check this out</user_message>

The envelope encodes the sender's identity, timezone offset, timestamp, and any attached media type, giving the LLM full situational awareness.
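A builder reproducing the DM format above might look like this (the function itself is hypothetical; only the envelope format comes from the example):

```typescript
// Sketch of a DM envelope builder matching the documented format.
// The function name and parameters are illustrative.
function buildDmEnvelope(
  username: string,
  userId: number,
  tzOffset: string,     // e.g. "+2h"
  timestamp: string,    // e.g. "2026-02-20 15:30"
  text: string,
): string {
  return `[Telegram User (@${username}, id:${userId}) ${tzOffset} ${timestamp} UTC] ` +
    `<user_message>${text}</user_message>`;
}
```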

9. Observation Masking

As the conversation grows, old tool results are compressed to prevent context bloat:

  • The last 10 tool results are kept intact.
  • Error results are always kept intact (regardless of age).
  • All older results are replaced with a one-line summary:
    • Success: [Tool: name - OK]
    • Failure: [Tool: name - ERROR - summary]

This achieves approximately 90% size reduction per masked result while preserving the agent's awareness of what tools were called and whether they succeeded.
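The masking rules above can be sketched in a few lines (the ToolResult type is illustrative):

```typescript
// Sketch of observation masking: the last N results and all error results stay
// intact; older successes collapse to a one-line summary. Types are illustrative.
interface ToolResult { tool: string; ok: boolean; body: string }

function maskOldToolResults(results: ToolResult[], keepLast = 10): ToolResult[] {
  const cutoff = results.length - keepLast;
  return results.map((r, i) => {
    if (i >= cutoff || !r.ok) return r;                 // recent, or an error: keep intact
    return { ...r, body: `[Tool: ${r.tool} - OK]` };    // masked one-line summary
  });
}
```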

10. Context Compaction

When observation masking alone is not enough, full context compaction kicks in:

Preemptive compaction. Compaction also runs before the first LLM call if the loaded transcript already exceeds the configured token threshold. This prevents context overflow on the very first iteration of a resumed session.
  • Soft threshold (50% of context window) -- token count exceeds half the model's window. Action: inject a memory flush warning, prompting the agent to persist important facts to MEMORY.md.
  • Hard threshold (200+ messages or 75% of window) -- message count or token count crosses the limit. Action: full compaction (see below).

Full Compaction Process

  1. The LLM summarizes the entire conversation so far.
  2. Old messages are replaced with the summary.
  3. The last 20 messages are kept intact for continuity.
  4. A new session ID is generated.
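The soft/hard threshold decision can be sketched as a pure function over the current counts (constants taken from the thresholds above; the function itself is illustrative):

```typescript
// Sketch of the soft/hard compaction decision. Constants come from the
// documented thresholds; the function shape is an assumption.
type CompactionAction = "none" | "flush_warning" | "full_compaction";

function compactionAction(tokens: number, contextWindow: number, messageCount: number): CompactionAction {
  if (messageCount >= 200 || tokens >= 0.75 * contextWindow) return "full_compaction";  // hard
  if (tokens >= 0.5 * contextWindow) return "flush_warning";                            // soft
  return "none";
}
```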

11. Response Formatting

After the loop exits, the response is determined as follows:

  1. If the LLM produced a text response, it is returned to the user.
  2. If the telegram_send_message tool was used during the loop, the response is empty (the message was already delivered).
  3. If tool calls were made but no text was produced, a fallback message is returned.

After delivery, the session record is updated with the message count, model name, and provider used.
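The three-way decision above can be sketched as follows; the fallback wording is a placeholder, since the actual message text is not specified here:

```typescript
// Sketch of the three-way response decision. Names and the fallback text
// are illustrative placeholders.
function finalResponse(text: string | null, sentViaTool: boolean, madeToolCalls: boolean): string {
  if (text) return text;                         // 1. text response -> deliver it
  if (sentViaTool) return "";                    // 2. already sent via telegram_send_message
  if (madeToolCalls) return "(task completed)";  // 3. placeholder fallback message
  return "";
}
```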

12. Group vs DM Differences

The agent behaves differently depending on the chat type:

Aspect       Direct Message                   Group
Memory       Full (MEMORY.md + daily logs)    None (privacy)
Strategy     Included in system prompt        Excluded (privacy)
Tool scope   dm-only tools available          group-only tools available
Debounce     None (immediate dispatch)        1500 ms batching window
Mention      Not required                     Required if configured

13. Configuration Reference

Key knobs that control the agentic loop:

Parameter                      Default   Notes
agent.max_agentic_iterations   5         Range 1-50. Adjustable at runtime via /loop.
agent.max_tokens               4096      Max output tokens per LLM call.
agent.temperature              0.7       LLM sampling temperature.
telegram.debounce_ms           1500      Group message batching window.
tool_rag.enabled               false     Enable semantic tool selection.
tool_rag.top_k                 25        Max tools returned by Tool RAG.
Compaction: message limit      200       Hard compaction after 200 messages.
Compaction: token threshold    75%       Hard compaction at 75% of context window.
Compaction: keep last          20        Messages preserved after compaction.
config.yaml -- agentic loop settings
agent:
  max_agentic_iterations: 5
  max_tokens: 4096
  temperature: 0.7

telegram:
  debounce_ms: 1500

tool_rag:
  enabled: false
  top_k: 25

14. Error Handling

The loop is designed to recover gracefully from a range of failures:

Error                Detection                            Recovery
Context overflow     LLM returns a context-length error   Archive the transcript, reset the session, and retry the message.
Tool timeout         Execution exceeds 30 seconds         Return an error result to the LLM so it can reason about the failure.
Rate limit (429)     HTTP 429 from provider               Exponential backoff, up to 3 retries.
LLM provider error   Non-429 API error                    Retry once; if persistent, return a fallback error message.
Corrupt transcript   Malformed JSONL entries detected     Auto-sanitize: strip invalid entries and continue.
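The 429 recovery can be sketched as a doubling delay schedule; the 1 s base delay is an assumption, since only the retry count is specified above:

```typescript
// Sketch of exponential backoff for rate limits: doubling delays, up to 3
// retries. The 1 s base delay is an assumed starting point.
function backoffDelays(maxRetries = 3, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);  // 1s, 2s, 4s
}
```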