
Agentic Loop

The definitive reference for Teleton's Think-Act-Observe reasoning loop -- from message ingestion through session management, context building, tool selection, iterative execution, and response formatting.

1. Overview

Teleton implements a Think-Act-Observe reasoning loop. When the agent receives a message it:

  1. Think. Reasons about the user's request, the conversation history, and the available tools.
  2. Act. Optionally calls one or more tools to gather information or perform actions.
  3. Observe. Inspects the tool results, then decides whether the task is complete or another iteration is needed.

This cycle repeats up to a configurable maximum number of iterations (default: 5). If the LLM produces a final text response or the limit is reached, the loop exits and the response is delivered.
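The cycle can be sketched as follows. All names here (LLM, ToolRunner, runAgentLoop) are illustrative placeholders, not Teleton's actual API:

```typescript
// Minimal sketch of the Think-Act-Observe cycle. Names are illustrative,
// not Teleton's real implementation.
interface ToolCall { name: string; args: Record<string, unknown> }
interface LLMResponse { text?: string; toolCalls: ToolCall[] }

type LLM = (transcript: string[]) => LLMResponse;
type ToolRunner = (call: ToolCall) => string;

function runAgentLoop(
  llm: LLM,
  runTool: ToolRunner,
  userMessage: string,
  maxIterations = 5,            // default iteration budget
): string {
  const transcript: string[] = [userMessage];
  for (let i = 0; i < maxIterations; i++) {
    const response = llm(transcript);              // Think
    if (response.toolCalls.length === 0) {
      return response.text ?? "";                  // final text -- loop exits
    }
    for (const call of response.toolCalls) {       // Act
      transcript.push(runTool(call));              // Observe: result feeds next iteration
    }
  }
  return "Iteration limit reached.";               // budget exhausted
}
```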

2. Message Entry Point

Every inbound Telegram message passes through a multi-stage pipeline before reaching the agent runtime:

  1. TelegramBridge.onNewMessage() -- GramJS fires the raw event.
  2. MessageDebouncer.enqueue() -- group messages are batched within a 1500 ms window; DMs and admin messages are dispatched immediately.
  3. MessageHandler policy checks:
    • DM policy -- allowlist, open, admin-only, or disabled.
    • Group policy -- open, allowlist, or disabled.
    • Rate-limit enforcement.
    • Mention requirement (groups only, if configured).
  4. If all checks pass, the message is forwarded to AgentRuntime.processMessage().
Simplified pipeline
GramJS event
  -> TelegramBridge.onNewMessage()
  -> MessageDebouncer.enqueue()          # 1500ms batch (groups) / immediate (DMs)
  -> MessageHandler                       # policy + rate-limit + mention checks
  -> AgentRuntime.processMessage()        # enters the agentic loop

3. Session Management

Each chat (identified by chatId) is mapped to a UUID-based session via getOrCreateSession(chatId).

Reset Policies

Policy        Trigger                               Default
daily_reset   Fires at a configured hour each day   4:00 AM
idle_expiry   Fires after N minutes of inactivity   1440 min (24 h)

Reset Procedure

  1. The old session transcript is summarized by the LLM.
  2. The summary is saved to long-term memory.
  3. A new session ID is generated and the conversation starts fresh.
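The two reset triggers can be sketched as a single check run before each message is handled (the function shape is hypothetical; times are passed in explicitly to keep the sketch deterministic):

```typescript
// Sketch of the daily_reset and idle_expiry triggers with their defaults.
// The function and its parameters are illustrative, not Teleton's actual API.
function shouldReset(
  lastActivityMs: number,   // epoch ms of the session's last message
  nowMs: number,            // epoch ms now
  nowHour: number,          // local hour of day, 0-23
  lastResetDay: string,     // e.g. "2026-02-19"
  today: string,            // e.g. "2026-02-20"
  resetHour = 4,            // daily_reset default: 4:00 AM
  idleMinutes = 1440,       // idle_expiry default: 24 h
): boolean {
  if (nowMs - lastActivityMs >= idleMinutes * 60_000) return true;  // idle_expiry
  return nowHour >= resetHour && lastResetDay !== today;            // daily_reset
}
```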

Transcript Storage

Every message and tool call is persisted as JSONL:

File path
~/.teleton/sessions/{sessionId}.jsonl
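JSONL means one JSON object per line, so entries can be appended without rewriting the file. A minimal sketch (the entry shape is illustrative, not Teleton's actual transcript schema):

```typescript
// Sketch of JSONL serialization for transcript entries. The TranscriptEntry
// shape is an assumption for illustration only.
interface TranscriptEntry { role: "user" | "assistant" | "tool"; content: string; ts: string }

function toJsonlLine(entry: TranscriptEntry): string {
  return JSON.stringify(entry) + "\n";   // append this line to {sessionId}.jsonl
}

function parseJsonl(text: string): TranscriptEntry[] {
  return text.split("\n").filter(Boolean).map(line => JSON.parse(line));
}
```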

4. Context Building (RAG)

Before every LLM call the system assembles rich context from multiple sources:

  1. Recent messages -- the 10 most recent messages from the current chat.
  2. Knowledge base search -- hybrid search (vector + FTS5) returns the top 5 chunks.
  3. Telegram feed search -- hybrid search returns the top 5 messages from monitored feeds.
  4. Deduplication -- overlapping results are merged.
  5. Injection -- results are injected into the prompt as structured sections.
Hybrid search (tool selection) combines dense vector similarity with SQLite FTS5 keyword matching, weighted as 0.6 * vectorScore + 0.4 * keywordScore. Note: memory retrieval uses equal 0.5/0.5 weights — see Memory System.
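The weighted combination above is a one-liner; the same function covers memory retrieval by passing a 0.5 weight:

```typescript
// Hybrid score as described above: 0.6 * vector + 0.4 * keyword for tool
// selection; pass vectorWeight = 0.5 for memory retrieval's equal weighting.
function hybridScore(vectorScore: number, keywordScore: number, vectorWeight = 0.6): number {
  return vectorWeight * vectorScore + (1 - vectorWeight) * keywordScore;
}
```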

5. System Prompt Construction

The system prompt is assembled dynamically by buildSystemPrompt(). The sections are appended in this order:

  1. Soul personality -- loaded from SOUL.md, or a built-in default if the file does not exist.
  2. Security rules -- loaded from SECURITY.md (if present).
  3. Strategy -- loaded from STRATEGY.md (if present, DM only).
  4. Workspace intro -- brief description of the workspace environment.
  5. Response format guidelines -- instructions on message length, Markdown usage, etc.
  6. Owner information -- the configured owner's identity.
  7. Memory context (DM only):
    • MEMORY.md -- up to 150 lines of persistent memory.
    • Daily logs -- yesterday + today, 100 lines each.
  8. Current user info -- username, ID, timezone offset.
  9. RAG search results -- the context built in step 4.
  10. Memory flush warning -- injected if the context is approaching token limits, prompting the agent to persist important information.
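The assembly above is ordered and conditional: sections whose source is missing (SECURITY.md absent, or a DM-only section in a group chat) contribute nothing. A rough sketch, with the Section type and heading style as assumptions:

```typescript
// Sketch of ordered, conditional prompt assembly. The Section type and the
// "##" heading convention are illustrative, not Teleton's actual format.
type Section = { title: string; body: string | null };   // null -> omitted

function buildSystemPrompt(sections: Section[]): string {
  return sections
    .filter(s => s.body !== null)          // drop absent/scope-excluded sections
    .map(s => `## ${s.title}\n${s.body}`)  // preserve the documented order
    .join("\n\n");
}
```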

6. Tool Selection

Two modes determine which tools the LLM sees:

All Tools (Tool RAG disabled)

Every tool that passes scope filtering is sent to the LLM. Simple but token-expensive with large tool registries.

Tool RAG (enabled)

Semantic search selects the most relevant tools for the current message:

  1. The user message is embedded as a vector.
  2. Hybrid search scores each tool (tool selection weights): 0.6 * vectorScore + 0.4 * keywordScore.
  3. The top-K tools are returned (default 25).
  4. Always-include patterns are preserved regardless of score: telegram_send_message, journal_*, workspace_*, web_*.
  5. Provider-specific tool limits are applied: Anthropic and claude-code accept an unlimited number of tools; all other providers are capped at 128 tools.
config.yaml
tool_rag:
  enabled: false     # toggle Tool RAG
  top_k: 25          # max tools returned by semantic search
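Steps 3-4 can be sketched as top-K selection followed by re-adding always-include matches. The "*" suffix wildcard handling is an assumption about how the patterns are matched:

```typescript
// Sketch of top-K tool selection with always-include patterns (steps 3-4).
// The wildcard semantics and function names are illustrative.
function matchesPattern(name: string, pattern: string): boolean {
  return pattern.endsWith("*")
    ? name.startsWith(pattern.slice(0, -1))
    : name === pattern;
}

function selectTools(
  scored: { name: string; score: number }[],
  topK: number,
  alwaysInclude: string[],
): string[] {
  const ranked = [...scored].sort((a, b) => b.score - a.score);
  const picked = ranked.slice(0, topK).map(t => t.name);
  for (const t of ranked.slice(topK)) {            // re-add low-scoring matches
    if (alwaysInclude.some(p => matchesPattern(t.name, p))) picked.push(t.name);
  }
  return picked;
}
```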

7. The Iteration Loop (Core)

This is the heart of the agent. The pseudocode below describes exactly what happens on each iteration:

Pseudocode -- agentic loop
iteration = 0

while iteration < max_agentic_iterations:        # default 5

    # 1. Mask old tool results to save context space
    maskOldToolResults(transcript)
    #    -> keep last 10 results intact
    #    -> keep error results intact
    #    -> replace older ones with "[Tool: name - OK]"

    # 2. Call LLM via pi-ai library
    response = llm.call(
        systemPrompt,
        transcript,          # user + assistant + tool messages
        tools                # selected tool definitions
    )
    # Provider-specific handling:
    #   - Cocoon: injects tool definitions into prompt text
    #   - Gemini: sanitizes JSON schemas for compatibility

    # 3. Handle errors
    if response.error == "context_overflow":
        archiveTranscript()
        resetSession()
        retry()
    if response.error == 429:       # rate limit
        exponentialBackoff(maxRetries=3)

    # 4. Process tool calls
    for toolCall in response.toolCalls:
        validate(toolCall, registry)
        checkScope(toolCall)         # dm-only, admin-only, etc.
        checkModulePermissions(toolCall)
        result = execute(toolCall, timeout=30_000)  # 30s timeout
        if result.size > 50KB:
            result = truncate(result)
        transcript.append(result)

    # 5. Decide: continue or break
    if response.stopReason == "toolUse" AND toolCalls.length > 0:
        iteration++
        continue                     # next iteration
    else:
        break                        # done -- return response
Iteration budget. The default of 5 iterations is adjustable between 1 and 50 at runtime via the /loop admin command.

8. Message Envelope Format

Every user message is wrapped in a structured envelope before being added to the transcript. The format varies by context:

Direct Message

DM envelope
[Telegram User (@username, id:123) +2h 2026-02-20 15:30 UTC] <user_message>Hello!</user_message>

Group Message

Group envelope
[Telegram Group (+5m 2026-02-20 15:30)] User: Hello everyone!

Media Message

Media envelope
[photo msg_id=456] [Telegram User (@username, id:123) +2h 2026-02-20 15:30 UTC] <user_message>Check this out</user_message>

The envelope encodes the sender's identity, timezone offset, timestamp, and any attached media type, giving the LLM full situational awareness.
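A builder reproducing the DM format above might look like this (the function itself is hypothetical; only the envelope format comes from the example):

```typescript
// Sketch of a DM envelope builder matching the documented format.
// The function name and parameters are illustrative.
function buildDmEnvelope(
  username: string,
  userId: number,
  tzOffset: string,     // e.g. "+2h"
  timestamp: string,    // e.g. "2026-02-20 15:30"
  text: string,
): string {
  return `[Telegram User (@${username}, id:${userId}) ${tzOffset} ${timestamp} UTC] ` +
    `<user_message>${text}</user_message>`;
}
```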

9. Observation Masking

As the conversation grows, old tool results are compressed to prevent context bloat:

  • The last 10 tool results are kept intact.
  • Error results are always kept intact (regardless of age).
  • All older results are replaced with a one-line summary:
    • Success: [Tool: name - OK]
    • Failure: [Tool: name - ERROR - summary]

This achieves approximately 90% size reduction per masked result while preserving the agent's awareness of what tools were called and whether they succeeded.
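The masking rules above can be sketched in a few lines (the ToolResult type is illustrative):

```typescript
// Sketch of observation masking: the last N results and all error results stay
// intact; older successes collapse to a one-line summary. Types are illustrative.
interface ToolResult { tool: string; ok: boolean; body: string }

function maskOldToolResults(results: ToolResult[], keepLast = 10): ToolResult[] {
  const cutoff = results.length - keepLast;
  return results.map((r, i) => {
    if (i >= cutoff || !r.ok) return r;                 // recent, or an error: keep intact
    return { ...r, body: `[Tool: ${r.tool} - OK]` };    // masked one-line summary
  });
}
```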

10. Context Compaction

When observation masking alone is not enough, full context compaction kicks in:

Preemptive compaction. Compaction also runs before the first LLM call if the loaded transcript already exceeds the configured token threshold. This prevents context overflow on the very first iteration of a resumed session.
  • Soft threshold (50% of context window) -- token count exceeds half the model's window. Action: inject a memory flush warning, prompting the agent to persist important facts to MEMORY.md.
  • Hard threshold (200+ messages or 75% of window) -- message count or token count crosses the limit. Action: full compaction (see below).

Full Compaction Process

  1. The LLM summarizes the entire conversation so far.
  2. Old messages are replaced with the summary.
  3. The last 20 messages are kept intact for continuity.
  4. A new session ID is generated.
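The soft/hard threshold decision can be sketched as a pure function over the current counts (constants taken from the thresholds above; the function itself is illustrative):

```typescript
// Sketch of the soft/hard compaction decision. Constants come from the
// documented thresholds; the function shape is an assumption.
type CompactionAction = "none" | "flush_warning" | "full_compaction";

function compactionAction(tokens: number, contextWindow: number, messageCount: number): CompactionAction {
  if (messageCount >= 200 || tokens >= 0.75 * contextWindow) return "full_compaction";  // hard
  if (tokens >= 0.5 * contextWindow) return "flush_warning";                            // soft
  return "none";
}
```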

11. Response Formatting

After the loop exits, the response is determined as follows:

  1. If the LLM produced a text response, it is returned to the user.
  2. If the telegram_send_message tool was used during the loop, the response is empty (the message was already delivered).
  3. If tool calls were made but no text was produced, a fallback message is returned.

After delivery, the session record is updated with the message count, model name, and provider used.
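The three-way decision above can be sketched as follows; the fallback wording is a placeholder, since the actual message text is not specified here:

```typescript
// Sketch of the three-way response decision. Names and the fallback text
// are illustrative placeholders.
function finalResponse(text: string | null, sentViaTool: boolean, madeToolCalls: boolean): string {
  if (text) return text;                         // 1. text response -> deliver it
  if (sentViaTool) return "";                    // 2. already sent via telegram_send_message
  if (madeToolCalls) return "(task completed)";  // 3. placeholder fallback message
  return "";
}
```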

12. Group vs DM Differences

The agent behaves differently depending on the chat type:

Aspect       Direct Message                   Group
Memory       Full (MEMORY.md + daily logs)    None (privacy)
Strategy     Included in system prompt        Excluded (privacy)
Tool scope   dm-only tools available          group-only tools available
Debounce     None (immediate dispatch)        1500 ms batching window
Mention      Not required                     Required if configured

13. Configuration Reference

Key knobs that control the agentic loop:

Parameter                      Default   Notes
agent.max_agentic_iterations   5         Range 1-50. Adjustable at runtime via /loop.
agent.max_tokens               4096      Max output tokens per LLM call.
agent.temperature              0.7       LLM sampling temperature.
telegram.debounce_ms           1500      Group message batching window.
tool_rag.enabled               false     Enable semantic tool selection.
tool_rag.top_k                 25        Max tools returned by Tool RAG.
Compaction: message limit      200       Hard compaction after 200 messages.
Compaction: token threshold    75%       Hard compaction at 75% of context window.
Compaction: keep last          20        Messages preserved after compaction.
config.yaml -- agentic loop settings
agent:
  max_agentic_iterations: 5
  max_tokens: 4096
  temperature: 0.7

telegram:
  debounce_ms: 1500

tool_rag:
  enabled: false
  top_k: 25

14. Error Handling

The loop is designed to recover gracefully from a range of failures:

Error                Detection                            Recovery
Context overflow     LLM returns a context-length error   Archive the transcript, reset the session, and retry the message.
Tool timeout         Execution exceeds 30 seconds         Return an error result to the LLM so it can reason about the failure.
Rate limit (429)     HTTP 429 from provider               Exponential backoff, up to 3 retries.
LLM provider error   Non-429 API error                    Retry once; if persistent, return a fallback error message.
Corrupt transcript   Malformed JSONL entries detected     Auto-sanitize: strip invalid entries and continue.
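The 429 recovery can be sketched as a doubling delay schedule; the 1 s base delay is an assumption, since only the retry count is specified above:

```typescript
// Sketch of exponential backoff for rate limits: doubling delays, up to 3
// retries. The 1 s base delay is an assumed starting point.
function backoffDelays(maxRetries = 3, baseMs = 1000): number[] {
  return Array.from({ length: maxRetries }, (_, i) => baseMs * 2 ** i);  // 1s, 2s, 4s
}
```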