# Multi-LLM Providers

Fifteen LLM providers are supported through the pi-ai unified abstraction. Switch providers with a single config change.
## Provider Overview

All providers share the same configuration shape. Only `provider`, `model`, and (where required) `api_key` need to change when switching.
| Provider ID | Display Name | Default Model | Env Var | Tool Limit |
|---|---|---|---|---|
| `anthropic` | Anthropic (Claude) | `claude-opus-4-6` | `ANTHROPIC_API_KEY` | Unlimited |
| `claude-code` | Claude Code (Auto) | `claude-opus-4-6` | Auto-detected | Unlimited |
| `openai` | OpenAI (GPT-4o) | `gpt-4o` | `OPENAI_API_KEY` | 128 |
| `google` | Google (Gemini) | `gemini-2.5-flash` | `GOOGLE_API_KEY` | 128 |
| `xai` | xAI (Grok) | `grok-3` | `XAI_API_KEY` | 128 |
| `groq` | Groq | `llama-3.3-70b-versatile` | `GROQ_API_KEY` | 128 |
| `openrouter` | OpenRouter | `anthropic/claude-opus-4.5` | `OPENROUTER_API_KEY` | 128 |
| `moonshot` | Moonshot (Kimi K2.5) | `k2p5` | `MOONSHOT_API_KEY` | 128 |
| `mistral` | Mistral AI | `devstral-small-2507` | `MISTRAL_API_KEY` | 128 |
| `cerebras` | Cerebras | `qwen-3-235b-a22b-instruct-2507` | `CEREBRAS_API_KEY` | 128 |
| `zai` | ZAI (Zhipu) | `glm-4.7` | `ZAI_API_KEY` | 128 |
| `minimax` | MiniMax | `MiniMax-M2.5` | `MINIMAX_API_KEY` | 128 |
| `huggingface` | HuggingFace | `deepseek-ai/DeepSeek-V3.2` | `HF_TOKEN` | 128 |
| `cocoon` | Cocoon Network | `Qwen/Qwen3-32B` | None (pays in TON) | 128 |
| `local` | Local LLM | `auto` | None | 128 |
## Configuration

Switch providers by changing `agent.provider` in `config.yaml`. All other agent settings (memory, tools, Telegram) remain unchanged.

```yaml
agent:
  provider: openai       # change this one value
  model: gpt-4o          # pick a model for that provider
  api_key: ${OPENAI_API_KEY}
  max_tokens: 4096
  temperature: 0.7
```

## Utility Model

Each provider has a utility model: a cheaper, faster model used automatically for memory summarization and compaction. Teleton selects a sensible default per provider. Override it explicitly if needed:
```yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  utility_model: claude-haiku-4-5-20251001  # optional override
```

| Provider | Default Utility Model |
|---|---|
| `anthropic` | `claude-haiku-4-5-20251001` |
| `claude-code` | `claude-haiku-4-5-20251001` |
| `openai` | `gpt-4o-mini` |
| `google` | `gemini-2.0-flash-lite` |
| `xai` | `grok-3-mini-fast` |
| `groq` | `llama-3.1-8b-instant` |
| `openrouter` | `google/gemini-2.5-flash-lite` |
| `moonshot` | `k2p5` |
| `mistral` | `ministral-8b-latest` |
| `cerebras` | `llama3.1-8b` |
| `zai` | `glm-4.7-flash` |
| `minimax` | `MiniMax-M2` |
| `huggingface` | `Qwen/Qwen3-Next-80B-A3B-Instruct` |
| `cocoon` | `Qwen/Qwen3-32B` |
| `local` | `auto` |
## Tool Limits

The `anthropic` and `claude-code` providers allow unlimited tool calls per turn, ideal for complex agentic workflows that call many tools in sequence. All other providers are capped at 128 tool calls per turn, which is sufficient for the vast majority of use cases.
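Conceptually, the cap is a per-turn counter that only `anthropic` and `claude-code` are exempt from. A minimal sketch, using hypothetical helper names (the real enforcement lives inside pi-ai):

```typescript
// Sketch of the per-turn tool-call cap. Helper names are hypothetical;
// the actual enforcement is implemented inside pi-ai, not shown here.
const TOOL_LIMITS: Record<string, number> = {
  anthropic: Infinity,
  "claude-code": Infinity,
  default: 128, // every other provider
};

function toolLimitFor(provider: string): number {
  return TOOL_LIMITS[provider] ?? TOOL_LIMITS.default;
}

function canCallTool(provider: string, callsThisTurn: number): boolean {
  return callsThisTurn < toolLimitFor(provider);
}
```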
## Common Parameters

| Parameter | Type | Description |
|---|---|---|
| `provider` | string | One of the 15 provider IDs listed above. |
| `model` | string | Model ID for the selected provider. See each provider section for available models. |
| `api_key` | string | API key or token. Reference env vars with `${VAR_NAME}`. Not required for `claude-code`, `cocoon`, and `local`. |
| `utility_model` | string | Model used for summarization and memory compaction. Defaults to the provider's fast model. |
| `max_tokens` | number | Maximum output tokens per response. Default: `4096`. |
| `temperature` | number | Sampling temperature (0.0–1.0). Default: `0.7`. |
| `base_url` | string | Custom API endpoint. Required for `local`; optional for self-hosted deployments. |
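The `${VAR_NAME}` references are resolved from the environment when the config is loaded. A minimal sketch of that substitution, with a hypothetical helper name (Teleton's actual loader is not shown here):

```typescript
// Sketch of ${VAR_NAME} substitution in config values. The helper name is
// hypothetical; Teleton's real config loader may behave differently.
function resolveEnvRefs(
  value: string,
  env: Record<string, string | undefined>
): string {
  return value.replace(/\$\{([A-Za-z0-9_]+)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

For example, `resolveEnvRefs("${OPENAI_API_KEY}", process.env)` would yield the raw key, so secrets never need to be written into `config.yaml`.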
## Anthropic (Claude)

The default provider. Claude models offer unlimited tool calls, making them the best choice for complex multi-step agentic workflows. Key prefix: `sk-ant-api03-`.
```yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  api_key: ${ANTHROPIC_API_KEY}
  max_tokens: 4096
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `claude-opus-4-6` | Claude Opus 4.6 | Most capable, 1M ctx, $5/M |
| `claude-opus-4-5-20251101` | Claude Opus 4.5 | Previous gen, 200K ctx, $5/M |
| `claude-sonnet-4-6` | Claude Sonnet 4.6 | Balanced, 200K ctx, $3/M |
| `claude-haiku-4-5-20251001` | Claude Haiku 4.5 | Fast and cheap, $1/M (default utility model) |
Console: console.anthropic.com
## Claude Code (Auto)

A special variant of the Anthropic provider that automatically reads OAuth credentials from a local Claude Code installation. No API key is required; Teleton reads the token directly from disk and rotates it on expiry or 401 errors.
```yaml
agent:
  provider: claude-code
  model: claude-opus-4-6
  # api_key is optional — used only as fallback if auto-detection fails
```

Credential resolution order:

1. Return cached token if still valid
2. Linux / Windows: read `~/.claude/.credentials.json` (or `$CLAUDE_CONFIG_DIR/.credentials.json`)
3. macOS: read from Keychain (service `Claude Code-credentials`), fall back to the credentials file
4. If all else fails, use `api_key` from config
5. Throw if nothing works; run `claude login` to authenticate
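The steps above form a simple fall-through chain. A sketch under assumed shapes (the interface and helper names are hypothetical; the real resolver also handles Keychain access and token rotation):

```typescript
// Hypothetical sketch of the claude-code credential fall-through.
// Types and names are illustrative, not Teleton's actual implementation.
interface CredentialSources {
  cachedToken?: string;          // step 1: still-valid in-memory token
  credentialsFileToken?: string; // steps 2-3: credentials file or Keychain
  configApiKey?: string;         // step 4: api_key from config.yaml
}

function resolveClaudeCodeToken(sources: CredentialSources): string {
  const token =
    sources.cachedToken ??
    sources.credentialsFileToken ??
    sources.configApiKey;
  if (!token) {
    throw new Error("No Claude Code credentials found; run `claude login`");
  }
  return token;
}
```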
Same models as `anthropic`, same unlimited tool limit. OAuth tokens use the prefix `sk-ant-oat01-`.
Get Claude Code: claude.ai/code
## OpenAI

Access GPT-4o, GPT-5, o3, and other OpenAI models. Key prefix: `sk-`. Tool calls are capped at 128 per turn.
```yaml
agent:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `gpt-5` | GPT-5 | Most capable, 400K ctx, $1.25/M |
| `gpt-5-pro` | GPT-5 Pro | Extended thinking, 400K ctx |
| `gpt-5-mini` | GPT-5 Mini | Fast and cheap, 400K ctx |
| `gpt-5.1` | GPT-5.1 | Latest gen, 400K ctx |
| `gpt-4o` | GPT-4o | Balanced, 128K ctx, $2.50/M (default) |
| `gpt-4.1` | GPT-4.1 | 1M ctx, $2/M |
| `gpt-4.1-mini` | GPT-4.1 Mini | 1M ctx, cheap, $0.40/M (default utility model) |
| `o4-mini` | o4 Mini | Reasoning, fast, 200K ctx |
| `o3` | o3 | Reasoning, 200K ctx, $2/M |
| `codex-mini-latest` | Codex Mini | Coding specialist |
Console: platform.openai.com
## Google (Gemini)

Gemini models with context windows up to 1M tokens. No key prefix requirement. Tool calls are capped at 128 per turn.
```yaml
agent:
  provider: google
  model: gemini-2.5-flash
  api_key: ${GOOGLE_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `gemini-3-pro-preview` | Gemini 3 Pro | Preview, most capable |
| `gemini-3-flash-preview` | Gemini 3 Flash | Preview, fast |
| `gemini-2.5-pro` | Gemini 2.5 Pro | Stable, 1M ctx, $1.25/M |
| `gemini-2.5-flash` | Gemini 2.5 Flash | Fast, 1M ctx, $0.30/M (default) |
| `gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite | Ultra cheap, 1M ctx |
| `gemini-2.0-flash` | Gemini 2.0 Flash | Cheap, 1M ctx, $0.10/M (default utility model) |
Console: aistudio.google.com
## xAI (Grok)

Grok models from xAI with very large context windows and vision capabilities. Key prefix: `xai-`.
```yaml
agent:
  provider: xai
  model: grok-3
  api_key: ${XAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `grok-4-1-fast` | Grok 4.1 Fast | Latest, vision, 2M ctx |
| `grok-4-fast` | Grok 4 Fast | Vision, 2M ctx, $0.20/M |
| `grok-4` | Grok 4 | Reasoning, 256K ctx, $3/M |
| `grok-code-fast-1` | Grok Code | Coding specialist, fast |
| `grok-3` | Grok 3 | Stable, 131K ctx, $3/M (default) |
| `grok-3-mini-fast` | Grok 3 Mini Fast | Default utility model |
Console: console.x.ai
## Groq

Ultra-fast inference on open-source models via Groq's custom hardware. Key prefix: `gsk_`. Best for latency-sensitive applications.
```yaml
agent:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `meta-llama/llama-4-maverick-17b-128e-instruct` | Llama 4 Maverick | Vision, 131K ctx, $0.20/M |
| `qwen/qwen3-32b` | Qwen3 32B | Reasoning, 131K ctx, $0.29/M |
| `deepseek-r1-distill-llama-70b` | DeepSeek R1 70B | Reasoning, 131K ctx, $0.75/M |
| `llama-3.3-70b-versatile` | Llama 3.3 70B | General purpose, 131K ctx (default) |
| `llama-3.1-8b-instant` | Llama 3.1 8B Instant | Very fast, cheap (default utility model) |
Console: console.groq.com
## OpenRouter

Multi-model gateway: access hundreds of models from many providers with a single API key. Model IDs use the format `provider/model-name`. Key prefix: `sk-or-`.
```yaml
agent:
  provider: openrouter
  model: anthropic/claude-opus-4.5
  api_key: ${OPENROUTER_API_KEY}
```

Available models (selection):

| Model ID | Name | Notes |
|---|---|---|
| `anthropic/claude-opus-4.5` | Claude Opus 4.5 | 200K ctx, $5/M (default) |
| `anthropic/claude-sonnet-4-6` | Claude Sonnet 4.6 | 200K ctx, $3/M |
| `openai/gpt-5` | GPT-5 | 400K ctx, $1.25/M |
| `google/gemini-2.5-flash` | Gemini 2.5 Flash | 1M ctx, $0.30/M |
| `google/gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite | Default utility model |
| `deepseek/deepseek-r1` | DeepSeek R1 | Reasoning, 64K ctx, $0.70/M |
| `deepseek/deepseek-r1-0528` | DeepSeek R1 0528 | Reasoning, improved |
| `deepseek/deepseek-v3.2` | DeepSeek V3.2 | Latest general, 64K ctx |
| `qwen/qwen3-coder` | Qwen3 Coder | Coding specialist |
| `qwen/qwen3-235b-a22b` | Qwen3 235B | 235B params, MoE |
| `x-ai/grok-4` | Grok 4 | 256K ctx, $3/M |
| `perplexity/sonar-pro` | Perplexity Sonar Pro | Web search integrated |
Any model listed on openrouter.ai/models can be used directly as the `model` value.
Console: openrouter.ai
## Moonshot (Kimi K2.5)

Moonshot AI's Kimi K2.5 model, accessed via the kimi-coding API at api.kimi.com/coding. Multimodal, with a 262K context window. Key prefix: `sk-`.
```yaml
agent:
  provider: moonshot
  model: k2p5
  api_key: ${MOONSHOT_API_KEY}
```

Note: The model ID `k2p5` is the config alias for Kimi K2.5. The provider uses a specialized kimi-coding API endpoint internally; do not use generic Moonshot platform model IDs.

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `k2p5` | Kimi K2.5 | 262K ctx, multimodal (default and utility model) |
| `kimi-k2-thinking` | Kimi K2 Thinking | 262K ctx, reasoning mode |
Console: platform.moonshot.ai
## Mistral AI

Mistral models, including Devstral (coding-optimized) and Magistral (reasoning). No key prefix requirement.
```yaml
agent:
  provider: mistral
  model: devstral-small-2507
  api_key: ${MISTRAL_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `devstral-small-2507` | Devstral Small | Coding, 128K ctx, $0.10/M (default) |
| `devstral-medium-latest` | Devstral Medium | Coding, 262K ctx, $0.40/M |
| `mistral-large-latest` | Mistral Large | General, 128K ctx, $2/M |
| `magistral-small` | Magistral Small | Reasoning, 128K ctx, $0.50/M |
| `ministral-8b-latest` | Ministral 8B | Default utility model |
Console: console.mistral.ai
## Cerebras

High-speed inference on Cerebras' Wafer-Scale Engine hardware. Key prefix: `csk-`. Excellent throughput for large models.
```yaml
agent:
  provider: cerebras
  model: qwen-3-235b-a22b-instruct-2507
  api_key: ${CEREBRAS_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `qwen-3-235b-a22b-instruct-2507` | Qwen 3 235B | 131K ctx, $0.60/$1.20 (default) |
| `gpt-oss-120b` | GPT OSS 120B | Reasoning, 131K ctx, $0.25/M |
| `zai-glm-4.7` | ZAI GLM-4.7 | 131K ctx, $2.25/M |
| `llama3.1-8b` | Llama 3.1 8B | Fast and cheap, 32K ctx, $0.10/M (default utility model) |
Console: cloud.cerebras.ai
## ZAI (Zhipu)

GLM models from Zhipu AI (ZAI). Includes free flash variants with 200K context. No key prefix requirement.
```yaml
agent:
  provider: zai
  model: glm-4.7
  api_key: ${ZAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `glm-4.7` | GLM-4.7 | 204K ctx, $0.60/$2.20 (default) |
| `glm-5` | GLM-5 | Best quality, 204K ctx, $1.00/$3.20 |
| `glm-4.6` | GLM-4.6 | 204K ctx, $0.60/$2.20 |
| `glm-4.7-flash` | GLM-4.7 Flash | FREE, 200K ctx (default utility model) |
| `glm-4.5-flash` | GLM-4.5 Flash | FREE, 131K ctx |
| `glm-4.5v` | GLM-4.5V | Vision, 64K ctx, $0.60/$1.80 |
Console: z.ai
## MiniMax

MiniMax M2 and M2.5 series models with a 204K context window. No key prefix requirement.
```yaml
agent:
  provider: minimax
  model: MiniMax-M2.5
  api_key: ${MINIMAX_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `MiniMax-M2.5` | MiniMax M2.5 | 204K ctx, $0.30/$1.20 (default) |
| `MiniMax-M2.5-highspeed` | MiniMax M2.5 Fast | 204K ctx, higher throughput, $0.60/$2.40 |
| `MiniMax-M2.1` | MiniMax M2.1 | 204K ctx, $0.30/$1.20 |
| `MiniMax-M2` | MiniMax M2 | 196K ctx, $0.30/$1.20 (default utility model) |
Console: platform.minimax.io
## HuggingFace

Access models hosted on the HuggingFace Inference API. Model IDs use the `org/model-name` format. Token prefix: `hf_`.

Note: The environment variable is `HF_TOKEN`, not `HUGGINGFACE_API_KEY`.
```yaml
agent:
  provider: huggingface
  model: deepseek-ai/DeepSeek-V3.2
  api_key: ${HF_TOKEN}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `deepseek-ai/DeepSeek-V3.2` | DeepSeek V3.2 | 163K ctx, $0.28/$0.40 (default) |
| `deepseek-ai/DeepSeek-R1-0528` | DeepSeek R1 | Reasoning, 163K ctx, $3/$5 |
| `Qwen/Qwen3-235B-A22B-Thinking-2507` | Qwen3 235B | Reasoning, 262K ctx, $0.30/$3 |
| `Qwen/Qwen3-Coder-480B-A35B-Instruct` | Qwen3 Coder 480B | Coding, 262K ctx, $2/$2 |
| `Qwen/Qwen3-Next-80B-A3B-Instruct` | Qwen3 Next 80B | 262K ctx, $0.25/$1 (default utility model) |
| `moonshotai/Kimi-K2.5` | Kimi K2.5 | 262K ctx, $0.60/$3 |
| `zai-org/GLM-4.7-Flash` | GLM-4.7 Flash | FREE, 200K ctx |
| `zai-org/GLM-5` | GLM-5 | 202K ctx, $1/$3.20 |
Console: huggingface.co
## Cocoon Network (Decentralized)

Decentralized inference network; no API key is required and payments are made in TON. Requires the `cocoon-cli` proxy running locally; Teleton connects to it over a local HTTP port.

Architecture note: Cocoon uses XML-based tool injection rather than the standard JSON tool-calling API. This is handled transparently inside Teleton.
```yaml
agent:
  provider: cocoon
  model: Qwen/Qwen3-32B

cocoon:
  port: 10000  # cocoon-cli local proxy port (default: 10000)
```

Setup:

1. Install and start `cocoon-cli` from cocoon.network
2. Fund your Cocoon wallet with TON
3. Set `agent.provider: cocoon` in config; no API key needed
4. Configure `cocoon.port` if you changed the default proxy port
Available models depend on what Cocoon Network offers at runtime. `Qwen/Qwen3-32B` is the recommended default.
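To illustrate what "XML-based tool injection" means in practice, here is a hypothetical sketch of extracting a tool call from raw model output. The `<tool_call>` tag name and the JSON-arguments convention are assumptions for illustration only; Cocoon's actual wire format is internal to Teleton and may differ.

```typescript
// Hypothetical sketch only: extracts a <tool_call> block from model text and
// parses it into a { name, arguments } pair. Cocoon's real format may differ.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function extractToolCall(text: string): ParsedToolCall | null {
  const match = /<tool_call\s+name="([^"]+)">([\s\S]*?)<\/tool_call>/.exec(text);
  if (!match) return null;
  return { name: match[1], arguments: JSON.parse(match[2]) };
}
```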
Website: cocoon.network
## Local LLM

Run any OpenAI-compatible local model server: Ollama, vLLM, LM Studio, llama.cpp, or any other server that exposes the `/v1/chat/completions` endpoint. No API key or internet connection required.

`base_url` is required for this provider.
```yaml
# Ollama (default port 11434)
agent:
  provider: local
  model: llama3.2              # model tag as known to Ollama
  base_url: http://localhost:11434/v1
```

```yaml
# LM Studio (default port 1234)
agent:
  provider: local
  model: auto                  # auto = discovered at runtime
  base_url: http://localhost:1234/v1
```

```yaml
# vLLM or another OpenAI-compatible server (default port 8000)
agent:
  provider: local
  model: mistralai/Mistral-7B-Instruct-v0.3
  base_url: http://localhost:8000/v1
  api_key: none                # some servers require a placeholder key
```

Notes:

- Setting `model: auto` tells Teleton to query the server's `/v1/models` endpoint and use the first available model.
- Tool-calling support depends on the local model and server. Not all local models support function calling.
- For Ollama, pull the model first: `ollama pull llama3.2`
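The `model: auto` discovery step boils down to parsing the standard OpenAI-style `/v1/models` response and taking the first entry. A minimal sketch of that selection (the HTTP fetch is omitted and the helper name is hypothetical):

```typescript
// Sketch of picking the first model from an OpenAI-compatible /v1/models
// response body. The fetch itself is omitted; the helper name is hypothetical.
interface ModelsResponse {
  data: Array<{ id: string }>;
}

function pickFirstModel(body: ModelsResponse): string {
  if (body.data.length === 0) {
    throw new Error("Local server reports no models; load or pull one first");
  }
  return body.data[0].id;
}
```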
## Model Catalog

Teleton ships with a catalog of 80+ pre-defined model IDs across all 15 providers, used by the WebUI setup wizard and the CLI onboarding flow. You are not limited to this list; any model ID accepted by the provider's API can be set as `agent.model` directly in `config.yaml`.

The catalog is defined in `src/config/model-catalog.ts`. Each entry includes the model value, display name, and a short description with context window and approximate pricing.
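The entry shape can be sketched roughly as follows. The field names here are assumptions for illustration; see `src/config/model-catalog.ts` for the real definition:

```typescript
// Rough sketch of a catalog entry; the actual field names live in
// src/config/model-catalog.ts and may differ.
interface CatalogEntry {
  value: string;       // model ID passed to the provider API
  name: string;        // display name shown in the setup wizard
  description: string; // context window and approximate pricing
}

const example: CatalogEntry = {
  value: "gemini-2.5-flash",
  name: "Gemini 2.5 Flash",
  description: "Fast, 1M ctx, $0.30/M",
};

function findByValue(
  catalog: CatalogEntry[],
  value: string
): CatalogEntry | undefined {
  return catalog.find((entry) => entry.value === value);
}
```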
```yaml
agent:
  provider: openrouter
  model: nvidia/nemotron-nano-9b-v2  # any valid provider model ID works
```