Multi-LLM Providers

Teleton supports 15 LLM providers through the pi-ai unified abstraction. Switch providers with a single config change.

Provider Overview

All providers share the same configuration shape. Only provider, model, and (where required) api_key need to change when switching.

| Provider ID | Display Name | Default Model | Env Var | Tool Limit |
| --- | --- | --- | --- | --- |
| anthropic | Anthropic (Claude) | claude-opus-4-6 | ANTHROPIC_API_KEY | Unlimited |
| claude-code | Claude Code (Auto) | claude-opus-4-6 | Auto-detected | Unlimited |
| openai | OpenAI (GPT-4o) | gpt-4o | OPENAI_API_KEY | 128 |
| google | Google (Gemini) | gemini-2.5-flash | GOOGLE_API_KEY | 128 |
| xai | xAI (Grok) | grok-3 | XAI_API_KEY | 128 |
| groq | Groq | llama-3.3-70b-versatile | GROQ_API_KEY | 128 |
| openrouter | OpenRouter | anthropic/claude-opus-4.5 | OPENROUTER_API_KEY | 128 |
| moonshot | Moonshot (Kimi K2.5) | k2p5 | MOONSHOT_API_KEY | 128 |
| mistral | Mistral AI | devstral-small-2507 | MISTRAL_API_KEY | 128 |
| cerebras | Cerebras | qwen-3-235b-a22b-instruct-2507 | CEREBRAS_API_KEY | 128 |
| zai | ZAI (Zhipu) | glm-4.7 | ZAI_API_KEY | 128 |
| minimax | MiniMax | MiniMax-M2.5 | MINIMAX_API_KEY | 128 |
| huggingface | HuggingFace | deepseek-ai/DeepSeek-V3.2 | HF_TOKEN | 128 |
| cocoon | Cocoon Network | Qwen/Qwen3-32B | None (pays in TON) | 128 |
| local | Local LLM | auto | None | 128 |

Configuration

Switch providers by changing agent.provider in config.yaml. All other agent settings (memory, tools, Telegram) remain unchanged.

config.yaml — switching providers
agent:
  provider: openai          # change this one value
  model: gpt-4o             # pick a model for that provider
  api_key: ${OPENAI_API_KEY}
  max_tokens: 4096
  temperature: 0.7

Utility Model

Each provider has a utility model — a cheaper, faster model used automatically for memory summarization and compaction. Teleton selects a sensible default per provider. Override it explicitly if needed:

config.yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  utility_model: claude-haiku-4-5-20251001  # optional override

| Provider | Default Utility Model |
| --- | --- |
| anthropic | claude-haiku-4-5-20251001 |
| claude-code | claude-haiku-4-5-20251001 |
| openai | gpt-4.1-mini |
| google | gemini-2.0-flash |
| xai | grok-3-mini-fast |
| groq | llama-3.1-8b-instant |
| openrouter | google/gemini-2.5-flash-lite |
| moonshot | k2p5 |
| mistral | ministral-8b-latest |
| cerebras | llama3.1-8b |
| zai | glm-4.7-flash |
| minimax | MiniMax-M2 |
| huggingface | Qwen/Qwen3-Next-80B-A3B-Instruct |
| cocoon | Qwen/Qwen3-32B |
| local | auto |

Tool Limits

Anthropic and Claude Code providers have unlimited tool calls per turn — ideal for complex agentic workflows that call many tools in sequence. All other providers are capped at 128 tool calls per turn, which is sufficient for the vast majority of use cases.
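
A minimal hypothetical sketch of how such a per-provider cap can be enforced in an agent loop; the names here are illustrative, not Teleton's actual internals.

tool-limits.ts (illustrative sketch)
// Hypothetical per-provider tool-call cap, not Teleton's actual code.
const UNLIMITED_PROVIDERS = new Set(["anthropic", "claude-code"]);
const DEFAULT_TOOL_LIMIT = 128;

function toolLimitFor(provider: string): number {
  return UNLIMITED_PROVIDERS.has(provider) ? Infinity : DEFAULT_TOOL_LIMIT;
}

// In the agent loop: stop dispatching tool calls once the cap is reached.
function canCallTool(provider: string, callsThisTurn: number): boolean {
  return callsThisTurn < toolLimitFor(provider);
}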

Common Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| provider | string | One of the 15 provider IDs listed above. |
| model | string | Model ID for the selected provider. See each provider section for available models. |
| api_key | string | API key or token. Reference env vars with ${VAR_NAME} (expansion sketched below this table). Not required for claude-code, cocoon, and local. |
| utility_model | string | Model used for summarization and memory compaction. Defaults to the provider's fast model. |
| max_tokens | number | Maximum output tokens per response. Default: 4096. |
| temperature | number | Sampling temperature (0.0–1.0). Default: 0.7. |
| base_url | string | Custom API endpoint. Required for local. Optional for self-hosted deployments. |
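
The ${VAR_NAME} references are expanded from the process environment when the config is loaded. A minimal sketch of that substitution, assuming a Node.js runtime; the function name is hypothetical.

env-expansion.ts (illustrative sketch)
function expandEnvVars(value: string): string {
  // Replace every ${VAR_NAME} occurrence with its environment value.
  return value.replace(/\$\{([A-Za-z0-9_]+)\}/g, (_match, name: string) => {
    const resolved = process.env[name];
    if (resolved === undefined) {
      throw new Error(`Environment variable ${name} is not set`);
    }
    return resolved;
  });
}

// expandEnvVars("${OPENAI_API_KEY}") returns the exported key at runtime.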

Anthropic (Claude)

The default provider. Claude models offer unlimited tool calls, making them the best choice for complex multi-step agentic workflows. Key prefix: sk-ant-api03-.

config.yaml
agent:
  provider: anthropic
  model: claude-opus-4-6
  api_key: ${ANTHROPIC_API_KEY}
  max_tokens: 4096

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| claude-opus-4-6 | Claude Opus 4.6 | Most capable, 1M ctx, $5/M |
| claude-opus-4-5-20251101 | Claude Opus 4.5 | Previous gen, 200K ctx, $5/M |
| claude-sonnet-4-6 | Claude Sonnet 4.6 | Balanced, 200K ctx, $3/M |
| claude-haiku-4-5-20251001 | Claude Haiku 4.5 | Fast & cheap, $1/M — default utility model |

Console: console.anthropic.com

Claude Code (Auto)

A special variant of the Anthropic provider that automatically reads OAuth credentials from a local Claude Code installation. No API key required — Teleton reads the token directly from disk and rotates it on expiry or 401 errors.

config.yaml
agent:
  provider: claude-code
  model: claude-opus-4-6
  # api_key is optional — used only as fallback if auto-detection fails

Credential resolution order (a condensed code sketch follows the list):

  1. Return cached token if still valid
  2. Linux / Windows: read ~/.claude/.credentials.json (or $CLAUDE_CONFIG_DIR/.credentials.json)
  3. macOS: read from Keychain (service Claude Code-credentials), fall back to credentials file
  4. If all else fails, use api_key from config
  5. Throw if nothing works — run claude login to authenticate
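
A condensed sketch of steps 2, 4, and 5 in TypeScript, assuming a Node.js runtime. The function name and the credentials JSON field names are assumptions; the caching and macOS Keychain steps are omitted.

claude-code-credentials.ts (illustrative sketch)
import { readFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

function readClaudeCodeToken(configApiKey?: string): string {
  // Step 2: credentials file under ~/.claude (or $CLAUDE_CONFIG_DIR)
  const dir = process.env.CLAUDE_CONFIG_DIR ?? join(homedir(), ".claude");
  try {
    const raw = readFileSync(join(dir, ".credentials.json"), "utf8");
    const creds = JSON.parse(raw);
    const token = creds?.claudeAiOauth?.accessToken; // assumed field names
    if (typeof token === "string") return token;
  } catch {
    // file missing or unreadable: fall through to the config fallback
  }
  // Step 4: api_key from config, if provided
  if (configApiKey) return configApiKey;
  // Step 5: nothing worked
  throw new Error("No Claude Code credentials found. Run `claude login`.");
}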

The claude-code provider supports the same models as anthropic and the same unlimited tool-call limit. OAuth tokens use the prefix sk-ant-oat01-.

Get Claude Code: claude.ai/code

OpenAI

Access GPT-4o, GPT-5, o3, and other OpenAI models. Key prefix: sk-. Tool calls capped at 128 per turn.

config.yaml
agent:
  provider: openai
  model: gpt-4o
  api_key: ${OPENAI_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| gpt-5 | GPT-5 | Most capable, 400K ctx, $1.25/M |
| gpt-5-pro | GPT-5 Pro | Extended thinking, 400K ctx |
| gpt-5-mini | GPT-5 Mini | Fast & cheap, 400K ctx |
| gpt-5.1 | GPT-5.1 | Latest gen, 400K ctx |
| gpt-4o | GPT-4o | Balanced, 128K ctx, $2.50/M — default |
| gpt-4.1 | GPT-4.1 | 1M ctx, $2/M |
| gpt-4.1-mini | GPT-4.1 Mini | 1M ctx, cheap, $0.40/M — default utility model |
| o4-mini | o4 Mini | Reasoning, fast, 200K ctx |
| o3 | o3 | Reasoning, 200K ctx, $2/M |
| codex-mini-latest | Codex Mini | Coding specialist |

Console: platform.openai.com

Google (Gemini)

Gemini models with context windows up to 1M tokens. No key prefix requirement. Tool calls capped at 128 per turn.

config.yaml
agent:
  provider: google
  model: gemini-2.5-flash
  api_key: ${GOOGLE_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| gemini-3-pro-preview | Gemini 3 Pro | Preview, most capable |
| gemini-3-flash-preview | Gemini 3 Flash | Preview, fast |
| gemini-2.5-pro | Gemini 2.5 Pro | Stable, 1M ctx, $1.25/M |
| gemini-2.5-flash | Gemini 2.5 Flash | Fast, 1M ctx, $0.30/M — default |
| gemini-2.5-flash-lite | Gemini 2.5 Flash Lite | Ultra cheap, 1M ctx |
| gemini-2.0-flash | Gemini 2.0 Flash | Cheap, 1M ctx, $0.10/M — default utility model |

Console: aistudio.google.com

xAI (Grok)

Grok models from xAI with very large context windows and vision capabilities. Key prefix: xai-.

config.yaml
agent:
  provider: xai
  model: grok-3
  api_key: ${XAI_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| grok-4-1-fast | Grok 4.1 Fast | Latest, vision, 2M ctx |
| grok-4-fast | Grok 4 Fast | Vision, 2M ctx, $0.20/M |
| grok-4 | Grok 4 | Reasoning, 256K ctx, $3/M |
| grok-code-fast-1 | Grok Code | Coding specialist, fast |
| grok-3 | Grok 3 | Stable, 131K ctx, $3/M — default |
| grok-3-mini-fast | Grok 3 Mini Fast | Default utility model |

Console: console.x.ai

Groq

Ultra-fast inference on open-source models via Groq's custom hardware. Key prefix: gsk_. Best for latency-sensitive applications.

config.yaml
agent:
  provider: groq
  model: llama-3.3-70b-versatile
  api_key: ${GROQ_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| meta-llama/llama-4-maverick-17b-128e-instruct | Llama 4 Maverick | Vision, 131K ctx, $0.20/M |
| qwen/qwen3-32b | Qwen3 32B | Reasoning, 131K ctx, $0.29/M |
| deepseek-r1-distill-llama-70b | DeepSeek R1 70B | Reasoning, 131K ctx, $0.75/M |
| llama-3.3-70b-versatile | Llama 3.3 70B | General purpose, 131K ctx — default |
| llama-3.1-8b-instant | Llama 3.1 8B Instant | Very fast, cheap — default utility model |

Console: console.groq.com

OpenRouter

Multi-model gateway — access hundreds of models from many providers with a single API key. Model IDs use the format provider/model-name. Key prefix: sk-or-.

config.yaml
agent:
  provider: openrouter
  model: anthropic/claude-opus-4.5
  api_key: ${OPENROUTER_API_KEY}

Available models (selection):

| Model ID | Name | Notes |
| --- | --- | --- |
| anthropic/claude-opus-4.5 | Claude Opus 4.5 | 200K ctx, $5/M — default |
| anthropic/claude-sonnet-4-6 | Claude Sonnet 4.6 | 200K ctx, $3/M |
| openai/gpt-5 | GPT-5 | 400K ctx, $1.25/M |
| google/gemini-2.5-flash | Gemini 2.5 Flash | 1M ctx, $0.30/M |
| google/gemini-2.5-flash-lite | Gemini 2.5 Flash Lite | Default utility model |
| deepseek/deepseek-r1 | DeepSeek R1 | Reasoning, 64K ctx, $0.70/M |
| deepseek/deepseek-r1-0528 | DeepSeek R1 0528 | Improved reasoning |
| deepseek/deepseek-v3.2 | DeepSeek V3.2 | Latest general, 64K ctx |
| qwen/qwen3-coder | Qwen3 Coder | Coding specialist |
| qwen/qwen3-235b-a22b | Qwen3 235B | 235B params, MoE |
| x-ai/grok-4 | Grok 4 | 256K ctx, $3/M |
| perplexity/sonar-pro | Perplexity Sonar Pro | Web search integrated |

Any model listed on openrouter.ai/models can be used directly as the model value.

Console: openrouter.ai

Moonshot (Kimi K2.5)

Moonshot AI's Kimi K2.5 model accessed via the kimi-coding API at api.kimi.com/coding. Multimodal with a 262K context window. Key prefix: sk-.

config.yaml
agent:
  provider: moonshot
  model: k2p5
  api_key: ${MOONSHOT_API_KEY}

Note: The model ID k2p5 is the config alias for Kimi K2.5. The provider uses a specialized kimi-coding API endpoint internally — do not use generic Moonshot platform model IDs.

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| k2p5 | Kimi K2.5 | 262K ctx, multimodal — default & utility |
| kimi-k2-thinking | Kimi K2 Thinking | 262K ctx, reasoning mode |

Console: platform.moonshot.ai

Mistral AI

Mistral models including Devstral (coding-optimized) and Magistral (reasoning). No key prefix requirement.

config.yaml
agent:
  provider: mistral
  model: devstral-small-2507
  api_key: ${MISTRAL_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| devstral-small-2507 | Devstral Small | Coding, 128K ctx, $0.10/M — default |
| devstral-medium-latest | Devstral Medium | Coding, 262K ctx, $0.40/M |
| mistral-large-latest | Mistral Large | General, 128K ctx, $2/M |
| magistral-small | Magistral Small | Reasoning, 128K ctx, $0.50/M |
| ministral-8b-latest | Ministral 8B | Default utility model |

Console: console.mistral.ai

Cerebras

High-speed inference on Cerebras' Wafer-Scale Engine hardware. Key prefix: csk-. Excellent throughput for large models.

config.yaml
agent:
  provider: cerebras
  model: qwen-3-235b-a22b-instruct-2507
  api_key: ${CEREBRAS_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| qwen-3-235b-a22b-instruct-2507 | Qwen 3 235B | 131K ctx, $0.60/$1.20 — default |
| gpt-oss-120b | GPT OSS 120B | Reasoning, 131K ctx, $0.25/M |
| zai-glm-4.7 | ZAI GLM-4.7 | 131K ctx, $2.25/M |
| llama3.1-8b | Llama 3.1 8B | Fast & cheap, 32K ctx, $0.10/M — default utility model |

Console: cloud.cerebras.ai

ZAI (Zhipu)

GLM models from Zhipu AI (ZAI). Features FREE flash variants with 200K context. No key prefix requirement.

config.yaml
agent:
  provider: zai
  model: glm-4.7
  api_key: ${ZAI_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| glm-4.7 | GLM-4.7 | 204K ctx, $0.60/$2.20 — default |
| glm-5 | GLM-5 | Best quality, 204K ctx, $1.00/$3.20 |
| glm-4.6 | GLM-4.6 | 204K ctx, $0.60/$2.20 |
| glm-4.7-flash | GLM-4.7 Flash | FREE, 200K ctx — default utility model |
| glm-4.5-flash | GLM-4.5 Flash | FREE, 131K ctx |
| glm-4.5v | GLM-4.5V | Vision, 64K ctx, $0.60/$1.80 |

Console: z.ai

MiniMax

MiniMax M2 and M2.5 series models with a 204K context window. No key prefix requirement.

config.yaml
agent:
  provider: minimax
  model: MiniMax-M2.5
  api_key: ${MINIMAX_API_KEY}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| MiniMax-M2.5 | MiniMax M2.5 | 204K ctx, $0.30/$1.20 — default |
| MiniMax-M2.5-highspeed | MiniMax M2.5 Fast | 204K ctx, higher throughput, $0.60/$2.40 |
| MiniMax-M2.1 | MiniMax M2.1 | 204K ctx, $0.30/$1.20 |
| MiniMax-M2 | MiniMax M2 | 196K ctx, $0.30/$1.20 — default utility model |

Console: platform.minimax.io

HuggingFace

Access models hosted on HuggingFace Inference API. Model IDs use the org/model-name format. Token prefix: hf_.

Note: The environment variable is HF_TOKEN, not HUGGINGFACE_API_KEY.

config.yaml
agent:
  provider: huggingface
  model: deepseek-ai/DeepSeek-V3.2
  api_key: ${HF_TOKEN}

Available models:

| Model ID | Name | Notes |
| --- | --- | --- |
| deepseek-ai/DeepSeek-V3.2 | DeepSeek V3.2 | 163K ctx, $0.28/$0.40 — default |
| deepseek-ai/DeepSeek-R1-0528 | DeepSeek R1 | Reasoning, 163K ctx, $3/$5 |
| Qwen/Qwen3-235B-A22B-Thinking-2507 | Qwen3 235B | Reasoning, 262K ctx, $0.30/$3 |
| Qwen/Qwen3-Coder-480B-A35B-Instruct | Qwen3 Coder 480B | Coding, 262K ctx, $2/$2 |
| Qwen/Qwen3-Next-80B-A3B-Instruct | Qwen3 Next 80B | 262K ctx, $0.25/$1 — default utility model |
| moonshotai/Kimi-K2.5 | Kimi K2.5 | 262K ctx, $0.60/$3 |
| zai-org/GLM-4.7-Flash | GLM-4.7 Flash | FREE, 200K ctx |
| zai-org/GLM-5 | GLM-5 | 202K ctx, $1/$3.20 |

Console: huggingface.co

Cocoon Network (Decentralized)

Decentralized inference network — no API key required. Payments are made in TON. Requires the cocoon-cli proxy running locally. Teleton connects to it over a local HTTP port.

Architecture note: Cocoon uses XML-based tool injection rather than the standard JSON tool calling API. This is handled transparently inside Teleton.
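
For readers unfamiliar with the pattern, the sketch below illustrates XML tool injection in general terms: tool definitions go into the prompt as XML, and calls are parsed back out of the raw completion text instead of a structured tool_calls field. The tag names and prompt shape are invented for illustration and are not Cocoon's or Teleton's actual wire format.

xml-tool-injection.ts (illustrative sketch)
// Invented tags for illustration only; the real format may differ.
const toolPrompt = `
To call a tool, emit:
<tool_call><name>TOOL_NAME</name><args>{"city": "..."}</args></tool_call>

Available tools:
<tool><name>get_weather</name><description>Current weather by city</description></tool>
`;

function parseToolCall(completion: string): { name: string; args: unknown } | null {
  const match = completion.match(
    /<tool_call><name>(.*?)<\/name><args>([\s\S]*?)<\/args><\/tool_call>/
  );
  return match ? { name: match[1], args: JSON.parse(match[2]) } : null;
}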

config.yaml
agent:
  provider: cocoon
  model: Qwen/Qwen3-32B

cocoon:
  port: 10000  # cocoon-cli local proxy port (default: 10000)

Setup:

  1. Install and start cocoon-cli from cocoon.network
  2. Fund your Cocoon wallet with TON
  3. Set agent.provider: cocoon in config — no API key needed
  4. Configure cocoon.port if you changed the default proxy port

Available models depend on what Cocoon Network offers at runtime. Qwen/Qwen3-32B is the recommended default.

Website: cocoon.network

Local LLM

Run any OpenAI-compatible local model server: Ollama, vLLM, LM Studio, llama.cpp, or any other server that exposes the /v1/chat/completions endpoint. No API key or internet connection required.

base_url is required for this provider.

config.yaml — Ollama
agent:
  provider: local
  model: llama3.2          # model tag as known to Ollama
  base_url: http://localhost:11434/v1

config.yaml — LM Studio
agent:
  provider: local
  model: auto              # auto = discovered at runtime
  base_url: http://localhost:1234/v1

config.yaml — vLLM
agent:
  provider: local
  model: mistralai/Mistral-7B-Instruct-v0.3
  base_url: http://localhost:8000/v1
  api_key: none            # some servers require a placeholder key

Notes:

  • Setting model: auto tells Teleton to query the server's /v1/models endpoint and use the first available model (sketched after this list).
  • Tool calling support depends on the local model and server. Not all local models support function calling.
  • For Ollama, pull the model first: ollama pull llama3.2
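
A minimal sketch of that discovery step against an OpenAI-compatible server. The function name is hypothetical; the response shape is the standard /v1/models payload.

model-discovery.ts (illustrative sketch)
// Sketch of `model: auto` resolution. baseUrl already includes the /v1
// suffix, matching the configs above.
async function discoverFirstModel(baseUrl: string): Promise<string> {
  const res = await fetch(`${baseUrl}/models`); // e.g. http://localhost:1234/v1/models
  if (!res.ok) throw new Error(`Model discovery failed: HTTP ${res.status}`);
  const body = (await res.json()) as { data: { id: string }[] };
  if (!body.data?.length) throw new Error("Server reports no available models");
  return body.data[0].id; // first available model wins
}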

Model Catalog

Teleton ships with a catalog of 80+ pre-defined model IDs across all 15 providers, used by the WebUI setup wizard and CLI onboard flow. You are not limited to this list — any model ID accepted by the provider's API can be set as agent.model directly in config.yaml.

The catalog is defined in src/config/model-catalog.ts. Each entry includes the model value, display name, and a short description with context window and approximate pricing.
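
A rough illustration of what one entry could look like; the actual field names in src/config/model-catalog.ts are assumptions here.

model-catalog.ts (hypothetical entry shape)
// Assumed shape based on the description above, not the real file.
interface CatalogEntry {
  value: string;       // model ID sent to the provider API
  name: string;        // display name shown by the WebUI wizard and CLI onboarding
  description: string; // context window and approximate pricing
}

const example: CatalogEntry = {
  value: "gemini-2.5-flash",
  name: "Gemini 2.5 Flash",
  description: "Fast, 1M ctx, $0.30/M",
};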

Using a model not in the catalog
agent:
  provider: openrouter
  model: nvidia/nemotron-nano-9b-v2  # any valid provider model ID works