# Multi-LLM Providers

Fifteen LLM providers are supported through the pi-ai unified abstraction. Switch providers with a single config change.
## Provider Overview

All providers share the same configuration shape. Only `provider`, `model`, and (where required) `api_key` need to change when switching.
| Provider ID | Display Name | Default Model | Env Var | Tool Limit |
|---|---|---|---|---|
| `anthropic` | Anthropic (Claude) | `claude-opus-4-6` | `ANTHROPIC_API_KEY` | Unlimited |
| `claude-code` | Claude Code (Auto) | `claude-opus-4-6` | Auto-detected | Unlimited |
| `openai` | OpenAI (GPT-4o) | `gpt-4o` | `OPENAI_API_KEY` | 128 |
| `google` | Google (Gemini) | `gemini-2.5-flash` | `GOOGLE_API_KEY` | 128 |
| `xai` | xAI (Grok) | `grok-3` | `XAI_API_KEY` | 128 |
| `groq` | Groq | `llama-3.3-70b-versatile` | `GROQ_API_KEY` | 128 |
| `openrouter` | OpenRouter | `anthropic/claude-opus-4.5` | `OPENROUTER_API_KEY` | 128 |
| `moonshot` | Moonshot (Kimi K2.5) | `k2p5` | `MOONSHOT_API_KEY` | 128 |
| `mistral` | Mistral AI | `devstral-small-2507` | `MISTRAL_API_KEY` | 128 |
| `cerebras` | Cerebras | `qwen-3-235b-a22b-instruct-2507` | `CEREBRAS_API_KEY` | 128 |
| `zai` | ZAI (Zhipu) | `glm-4.7` | `ZAI_API_KEY` | 128 |
| `minimax` | MiniMax | `MiniMax-M2.5` | `MINIMAX_API_KEY` | 128 |
| `huggingface` | HuggingFace | `deepseek-ai/DeepSeek-V3.2` | `HF_TOKEN` | 128 |
| `cocoon` | Cocoon Network | `Qwen/Qwen3-32B` | None (pays in TON) | 128 |
| `local` | Local LLM | `auto` | None | 128 |
## Configuration

Switch providers by changing `agent.provider` in `config.yaml`. All other agent settings (memory, tools, Telegram) remain unchanged.

```yaml
agent:
  provider: openai       # change this one value
  model: gpt-4o          # pick a model for that provider
  api_key: ${OPENAI_API_KEY}
  max_tokens: 4096
  temperature: 0.7
```

## Utility Model

Each provider has a utility model: a cheaper, faster model used automatically for memory summarization and compaction. Teleton selects a sensible default per provider. Override it explicitly if needed:
```yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  utility_model: claude-haiku-4-5-20251001  # optional override
```

| Provider | Default Utility Model |
|---|---|
| `anthropic` | `claude-haiku-4-5-20251001` |
| `claude-code` | `claude-haiku-4-5-20251001` |
| `openai` | `gpt-4o-mini` |
| `google` | `gemini-2.0-flash-lite` |
| `xai` | `grok-3-mini-fast` |
| `groq` | `llama-3.1-8b-instant` |
| `openrouter` | `google/gemini-2.5-flash-lite` |
| `moonshot` | `k2p5` |
| `mistral` | `ministral-8b-latest` |
| `cerebras` | `llama3.1-8b` |
| `zai` | `glm-4.7-flash` |
| `minimax` | `MiniMax-M2` |
| `huggingface` | `Qwen/Qwen3-Next-80B-A3B-Instruct` |
| `cocoon` | `Qwen/Qwen3-32B` |
| `local` | `auto` |
## Tool Limits

The `anthropic` and `claude-code` providers allow unlimited tool calls per turn, ideal for complex agentic workflows that call many tools in sequence. All other providers are capped at 128 tool calls per turn, which is sufficient for the vast majority of use cases.
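Conceptually, the cap is a per-turn counter that only `anthropic` and `claude-code` are exempt from. A minimal sketch, using hypothetical helper names (the real enforcement lives inside pi-ai):

```typescript
// Sketch of the per-turn tool-call cap. Helper names are hypothetical;
// the actual enforcement is implemented inside pi-ai, not shown here.
const TOOL_LIMITS: Record<string, number> = {
  anthropic: Infinity,
  "claude-code": Infinity,
  default: 128, // every other provider
};

function toolLimitFor(provider: string): number {
  return TOOL_LIMITS[provider] ?? TOOL_LIMITS.default;
}

function canCallTool(provider: string, callsThisTurn: number): boolean {
  return callsThisTurn < toolLimitFor(provider);
}
```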
## Common Parameters

| Parameter | Type | Description |
|---|---|---|
| `provider` | string | One of the 15 provider IDs listed above. |
| `model` | string | Model ID for the selected provider. See each provider section for available models. |
| `api_key` | string | API key or token. Reference env vars with `${VAR_NAME}`. Not required for `claude-code`, `cocoon`, and `local`. |
| `utility_model` | string | Model used for summarization and memory compaction. Defaults to the provider's fast model. |
| `max_tokens` | number | Maximum output tokens per response. Default: `4096`. |
| `temperature` | number | Sampling temperature (0.0–1.0). Default: `0.7`. |
| `base_url` | string | Custom API endpoint. Required for `local`; optional for self-hosted deployments. |
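The `${VAR_NAME}` references are resolved from the environment when the config is loaded. A minimal sketch of that substitution, with a hypothetical helper name (Teleton's actual loader is not shown here):

```typescript
// Sketch of ${VAR_NAME} substitution in config values. The helper name is
// hypothetical; Teleton's real config loader may behave differently.
function resolveEnvRefs(
  value: string,
  env: Record<string, string | undefined>
): string {
  return value.replace(/\$\{([A-Za-z0-9_]+)\}/g, (_match: string, name: string) => {
    const resolved = env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}
```

For example, `resolveEnvRefs("${OPENAI_API_KEY}", process.env)` would yield the raw key, so secrets never need to be written into `config.yaml`.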
## Anthropic (Claude)

The default provider. Claude models offer unlimited tool calls, making them the best choice for complex multi-step agentic workflows. Key prefix: `sk-ant-api03-`.
```yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  api_key: ${ANTHROPIC_API_KEY}
  max_tokens: 4096
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `claude-opus-4-6` | Claude Opus 4.6 | Most capable, 1M ctx, $5/M |
| `claude-opus-4-5-20251101` | Claude Opus 4.5 | Previous gen, 200K ctx, $5/M |
| `claude-sonnet-4-6` | Claude Sonnet 4.6 | Balanced, 200K ctx, $3/M |
| `claude-haiku-4-5-20251001` | Claude Haiku 4.5 | Fast and cheap, $1/M (default utility model) |
Console: console.anthropic.com
## Claude Code (Auto)

A special variant of the Anthropic provider that automatically reads OAuth credentials from a local Claude Code installation. No API key is required; Teleton reads the token directly from disk and rotates it on expiry or 401 errors.
```yaml
agent:
  provider: claude-code
  model: claude-opus-4-6
  # api_key is optional — used only as fallback if auto-detection fails
```

Credential resolution order:

1. Return cached token if still valid
2. Linux / Windows: read `~/.claude/.credentials.json` (or `$CLAUDE_CONFIG_DIR/.credentials.json`)
3. macOS: read from Keychain (service `Claude Code-credentials`), fall back to the credentials file
4. If all else fails, use `api_key` from config
5. Throw if nothing works; run `claude login` to authenticate
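The steps above form a simple fall-through chain. A sketch under assumed shapes (the interface and helper names are hypothetical; the real resolver also handles Keychain access and token rotation):

```typescript
// Hypothetical sketch of the claude-code credential fall-through.
// Types and names are illustrative, not Teleton's actual implementation.
interface CredentialSources {
  cachedToken?: string;          // step 1: still-valid in-memory token
  credentialsFileToken?: string; // steps 2-3: credentials file or Keychain
  configApiKey?: string;         // step 4: api_key from config.yaml
}

function resolveClaudeCodeToken(sources: CredentialSources): string {
  const token =
    sources.cachedToken ??
    sources.credentialsFileToken ??
    sources.configApiKey;
  if (!token) {
    throw new Error("No Claude Code credentials found; run `claude login`");
  }
  return token;
}
```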
Same models as `anthropic`, same unlimited tool limit. OAuth tokens use the prefix `sk-ant-oat01-`.
Get Claude Code: claude.ai/code
## OpenAI

Access GPT-4o, GPT-5, o3, and other OpenAI models. Key prefix: `sk-`. Tool calls are capped at 128 per turn.
```yaml
agent:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `gpt-5` | GPT-5 | Most capable, 400K ctx, $1.25/M |
| `gpt-5-pro` | GPT-5 Pro | Extended thinking, 400K ctx |
| `gpt-5-mini` | GPT-5 Mini | Fast and cheap, 400K ctx |
| `gpt-5.1` | GPT-5.1 | Latest gen, 400K ctx |
| `gpt-4o` | GPT-4o | Balanced, 128K ctx, $2.50/M (default) |
| `gpt-4.1` | GPT-4.1 | 1M ctx, $2/M |
| `gpt-4.1-mini` | GPT-4.1 Mini | 1M ctx, cheap, $0.40/M (default utility model) |
| `o4-mini` | o4 Mini | Reasoning, fast, 200K ctx |
| `o3` | o3 | Reasoning, 200K ctx, $2/M |
| `codex-mini-latest` | Codex Mini | Coding specialist |
Console: platform.openai.com
## Google (Gemini)

Gemini models with context windows up to 1M tokens. No key prefix requirement. Tool calls are capped at 128 per turn.
```yaml
agent:
  provider: google
  model: gemini-2.5-flash
  api_key: ${GOOGLE_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `gemini-3-pro-preview` | Gemini 3 Pro | Preview, most capable |
| `gemini-3-flash-preview` | Gemini 3 Flash | Preview, fast |
| `gemini-2.5-pro` | Gemini 2.5 Pro | Stable, 1M ctx, $1.25/M |
| `gemini-2.5-flash` | Gemini 2.5 Flash | Fast, 1M ctx, $0.30/M (default) |
| `gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite | Ultra cheap, 1M ctx |
| `gemini-2.0-flash` | Gemini 2.0 Flash | Cheap, 1M ctx, $0.10/M (default utility model) |
Console: aistudio.google.com
## xAI (Grok)

Grok models from xAI with very large context windows and vision capabilities. Key prefix: `xai-`.
```yaml
agent:
  provider: xai
  model: grok-3
  api_key: ${XAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `grok-4-1-fast` | Grok 4.1 Fast | Latest, vision, 2M ctx |
| `grok-4-fast` | Grok 4 Fast | Vision, 2M ctx, $0.20/M |
| `grok-4` | Grok 4 | Reasoning, 256K ctx, $3/M |
| `grok-code-fast-1` | Grok Code | Coding specialist, fast |
| `grok-3` | Grok 3 | Stable, 131K ctx, $3/M (default) |
| `grok-3-mini-fast` | Grok 3 Mini Fast | Default utility model |
Console: console.x.ai
## Groq

Ultra-fast inference on open-source models via Groq's custom hardware. Key prefix: `gsk_`. Best for latency-sensitive applications.
```yaml
agent:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `meta-llama/llama-4-maverick-17b-128e-instruct` | Llama 4 Maverick | Vision, 131K ctx, $0.20/M |
| `qwen/qwen3-32b` | Qwen3 32B | Reasoning, 131K ctx, $0.29/M |
| `deepseek-r1-distill-llama-70b` | DeepSeek R1 70B | Reasoning, 131K ctx, $0.75/M |
| `llama-3.3-70b-versatile` | Llama 3.3 70B | General purpose, 131K ctx (default) |
| `llama-3.1-8b-instant` | Llama 3.1 8B Instant | Very fast, cheap (default utility model) |
Console: console.groq.com
## OpenRouter

Multi-model gateway: access hundreds of models from many providers with a single API key. Model IDs use the format `provider/model-name`. Key prefix: `sk-or-`.
```yaml
agent:
  provider: openrouter
  model: anthropic/claude-opus-4.5
  api_key: ${OPENROUTER_API_KEY}
```

Available models (selection):

| Model ID | Name | Notes |
|---|---|---|
| `anthropic/claude-opus-4.5` | Claude Opus 4.5 | 200K ctx, $5/M (default) |
| `anthropic/claude-sonnet-4-6` | Claude Sonnet 4.6 | 200K ctx, $3/M |
| `openai/gpt-5` | GPT-5 | 400K ctx, $1.25/M |
| `google/gemini-2.5-flash` | Gemini 2.5 Flash | 1M ctx, $0.30/M |
| `google/gemini-2.5-flash-lite` | Gemini 2.5 Flash Lite | Default utility model |
| `deepseek/deepseek-r1` | DeepSeek R1 | Reasoning, 64K ctx, $0.70/M |
| `deepseek/deepseek-r1-0528` | DeepSeek R1 0528 | Reasoning, improved |
| `deepseek/deepseek-v3.2` | DeepSeek V3.2 | Latest general, 64K ctx |
| `qwen/qwen3-coder` | Qwen3 Coder | Coding specialist |
| `qwen/qwen3-235b-a22b` | Qwen3 235B | 235B params, MoE |
| `x-ai/grok-4` | Grok 4 | 256K ctx, $3/M |
| `perplexity/sonar-pro` | Perplexity Sonar Pro | Web search integrated |
Any model listed on openrouter.ai/models can be used directly as the `model` value.
Console: openrouter.ai
## Moonshot (Kimi K2.5)

Moonshot AI's Kimi K2.5 model, accessed via the kimi-coding API at api.kimi.com/coding. Multimodal, with a 262K context window. Key prefix: `sk-`.
```yaml
agent:
  provider: moonshot
  model: k2p5
  api_key: ${MOONSHOT_API_KEY}
```

Note: The model ID `k2p5` is the config alias for Kimi K2.5. The provider uses a specialized kimi-coding API endpoint internally; do not use generic Moonshot platform model IDs.

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `k2p5` | Kimi K2.5 | 262K ctx, multimodal (default and utility model) |
| `kimi-k2-thinking` | Kimi K2 Thinking | 262K ctx, reasoning mode |
Console: platform.moonshot.ai
## Mistral AI

Mistral models, including Devstral (coding-optimized) and Magistral (reasoning). No key prefix requirement.
```yaml
agent:
  provider: mistral
  model: devstral-small-2507
  api_key: ${MISTRAL_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `devstral-small-2507` | Devstral Small | Coding, 128K ctx, $0.10/M (default) |
| `devstral-medium-latest` | Devstral Medium | Coding, 262K ctx, $0.40/M |
| `mistral-large-latest` | Mistral Large | General, 128K ctx, $2/M |
| `magistral-small` | Magistral Small | Reasoning, 128K ctx, $0.50/M |
| `ministral-8b-latest` | Ministral 8B | Default utility model |
Console: console.mistral.ai
## Cerebras

High-speed inference on Cerebras' Wafer-Scale Engine hardware. Key prefix: `csk-`. Excellent throughput for large models.
```yaml
agent:
  provider: cerebras
  model: qwen-3-235b-a22b-instruct-2507
  api_key: ${CEREBRAS_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `qwen-3-235b-a22b-instruct-2507` | Qwen 3 235B | 131K ctx, $0.60/$1.20 (default) |
| `gpt-oss-120b` | GPT OSS 120B | Reasoning, 131K ctx, $0.25/M |
| `zai-glm-4.7` | ZAI GLM-4.7 | 131K ctx, $2.25/M |
| `llama3.1-8b` | Llama 3.1 8B | Fast and cheap, 32K ctx, $0.10/M (default utility model) |
Console: cloud.cerebras.ai
## ZAI (Zhipu)

GLM models from Zhipu AI (ZAI). Includes free flash variants with 200K context. No key prefix requirement.
```yaml
agent:
  provider: zai
  model: glm-4.7
  api_key: ${ZAI_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `glm-4.7` | GLM-4.7 | 204K ctx, $0.60/$2.20 (default) |
| `glm-5` | GLM-5 | Best quality, 204K ctx, $1.00/$3.20 |
| `glm-4.6` | GLM-4.6 | 204K ctx, $0.60/$2.20 |
| `glm-4.7-flash` | GLM-4.7 Flash | FREE, 200K ctx (default utility model) |
| `glm-4.5-flash` | GLM-4.5 Flash | FREE, 131K ctx |
| `glm-4.5v` | GLM-4.5V | Vision, 64K ctx, $0.60/$1.80 |
Console: z.ai
## MiniMax

MiniMax M2 and M2.5 series models with a 204K context window. No key prefix requirement.
```yaml
agent:
  provider: minimax
  model: MiniMax-M2.5
  api_key: ${MINIMAX_API_KEY}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `MiniMax-M2.5` | MiniMax M2.5 | 204K ctx, $0.30/$1.20 (default) |
| `MiniMax-M2.5-highspeed` | MiniMax M2.5 Fast | 204K ctx, higher throughput, $0.60/$2.40 |
| `MiniMax-M2.1` | MiniMax M2.1 | 204K ctx, $0.30/$1.20 |
| `MiniMax-M2` | MiniMax M2 | 196K ctx, $0.30/$1.20 (default utility model) |
Console: platform.minimax.io
## HuggingFace

Access models hosted on the HuggingFace Inference API. Model IDs use the `org/model-name` format. Token prefix: `hf_`.

Note: The environment variable is `HF_TOKEN`, not `HUGGINGFACE_API_KEY`.
```yaml
agent:
  provider: huggingface
  model: deepseek-ai/DeepSeek-V3.2
  api_key: ${HF_TOKEN}
```

Available models:

| Model ID | Name | Notes |
|---|---|---|
| `deepseek-ai/DeepSeek-V3.2` | DeepSeek V3.2 | 163K ctx, $0.28/$0.40 (default) |
| `deepseek-ai/DeepSeek-R1-0528` | DeepSeek R1 | Reasoning, 163K ctx, $3/$5 |
| `Qwen/Qwen3-235B-A22B-Thinking-2507` | Qwen3 235B | Reasoning, 262K ctx, $0.30/$3 |
| `Qwen/Qwen3-Coder-480B-A35B-Instruct` | Qwen3 Coder 480B | Coding, 262K ctx, $2/$2 |
| `Qwen/Qwen3-Next-80B-A3B-Instruct` | Qwen3 Next 80B | 262K ctx, $0.25/$1 (default utility model) |
| `moonshotai/Kimi-K2.5` | Kimi K2.5 | 262K ctx, $0.60/$3 |
| `zai-org/GLM-4.7-Flash` | GLM-4.7 Flash | FREE, 200K ctx |
| `zai-org/GLM-5` | GLM-5 | 202K ctx, $1/$3.20 |
Console: huggingface.co
## Cocoon Network (Decentralized)

Decentralized inference network; no API key is required and payments are made in TON. Requires the `cocoon-cli` proxy running locally; Teleton connects to it over a local HTTP port.

Architecture note: Cocoon uses XML-based tool injection rather than the standard JSON tool-calling API. This is handled transparently inside Teleton.
```yaml
agent:
  provider: cocoon
  model: Qwen/Qwen3-32B

cocoon:
  port: 10000  # cocoon-cli local proxy port (default: 10000)
```

Setup:

1. Install and start `cocoon-cli` from cocoon.network
2. Fund your Cocoon wallet with TON
3. Set `agent.provider: cocoon` in config; no API key needed
4. Configure `cocoon.port` if you changed the default proxy port
Available models depend on what Cocoon Network offers at runtime. `Qwen/Qwen3-32B` is the recommended default.
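To illustrate what "XML-based tool injection" means in practice, here is a hypothetical sketch of extracting a tool call from raw model output. The `<tool_call>` tag name and the JSON-arguments convention are assumptions for illustration only; Cocoon's actual wire format is internal to Teleton and may differ.

```typescript
// Hypothetical sketch only: extracts a <tool_call> block from model text and
// parses it into a { name, arguments } pair. Cocoon's real format may differ.
interface ParsedToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

function extractToolCall(text: string): ParsedToolCall | null {
  const match = /<tool_call\s+name="([^"]+)">([\s\S]*?)<\/tool_call>/.exec(text);
  if (!match) return null;
  return { name: match[1], arguments: JSON.parse(match[2]) };
}
```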
Website: cocoon.network
## Local LLM

Run any OpenAI-compatible local model server: Ollama, vLLM, LM Studio, llama.cpp, or any other server that exposes the `/v1/chat/completions` endpoint. No API key or internet connection required.

`base_url` is required for this provider.
```yaml
# Ollama (default port 11434)
agent:
  provider: local
  model: llama3.2              # model tag as known to Ollama
  base_url: http://localhost:11434/v1
```

```yaml
# LM Studio (default port 1234)
agent:
  provider: local
  model: auto                  # auto = discovered at runtime
  base_url: http://localhost:1234/v1
```

```yaml
# vLLM or another OpenAI-compatible server (default port 8000)
agent:
  provider: local
  model: mistralai/Mistral-7B-Instruct-v0.3
  base_url: http://localhost:8000/v1
  api_key: none                # some servers require a placeholder key
```

Notes:

- Setting `model: auto` tells Teleton to query the server's `/v1/models` endpoint and use the first available model.
- Tool-calling support depends on the local model and server. Not all local models support function calling.
- For Ollama, pull the model first: `ollama pull llama3.2`
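The `model: auto` discovery step boils down to parsing the standard OpenAI-style `/v1/models` response and taking the first entry. A minimal sketch of that selection (the HTTP fetch is omitted and the helper name is hypothetical):

```typescript
// Sketch of picking the first model from an OpenAI-compatible /v1/models
// response body. The fetch itself is omitted; the helper name is hypothetical.
interface ModelsResponse {
  data: Array<{ id: string }>;
}

function pickFirstModel(body: ModelsResponse): string {
  if (body.data.length === 0) {
    throw new Error("Local server reports no models; load or pull one first");
  }
  return body.data[0].id;
}
```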
## Model Catalog

Teleton ships with a catalog of 80+ pre-defined model IDs across all 15 providers, used by the WebUI setup wizard and the CLI onboarding flow. You are not limited to this list; any model ID accepted by the provider's API can be set as `agent.model` directly in `config.yaml`.

The catalog is defined in `src/config/model-catalog.ts`. Each entry includes the model value, display name, and a short description with context window and approximate pricing.
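The entry shape can be sketched roughly as follows. The field names here are assumptions for illustration; see `src/config/model-catalog.ts` for the real definition:

```typescript
// Rough sketch of a catalog entry; the actual field names live in
// src/config/model-catalog.ts and may differ.
interface CatalogEntry {
  value: string;       // model ID passed to the provider API
  name: string;        // display name shown in the setup wizard
  description: string; // context window and approximate pricing
}

const example: CatalogEntry = {
  value: "gemini-2.5-flash",
  name: "Gemini 2.5 Flash",
  description: "Fast, 1M ctx, $0.30/M",
};

function findByValue(
  catalog: CatalogEntry[],
  value: string
): CatalogEntry | undefined {
  return catalog.find((entry) => entry.value === value);
}
```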
```yaml
agent:
  provider: openrouter
  model: nvidia/nemotron-nano-9b-v2  # any valid provider model ID works
```