Available on all Portkey plans.
The Messages API is Anthropic’s native format for interacting with Claude models. Portkey extends it to work with all providers — use the Anthropic SDK pointed at Portkey’s base URL, and switch between providers by changing the model string.

Quick Start

Use the Anthropic SDK with Portkey’s base URL. The @provider-slug/model format routes requests to the correct provider.
import anthropic

client = anthropic.Anthropic(
    api_key="PORTKEY_API_KEY",
    base_url="https://api.portkey.ai"
)

message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Explain quantum computing in simple terms"}]
)

print(message.content[0].text)
max_tokens is required for the Messages API. Switch the model string to route to any provider, e.g. @openai-provider/gpt-4o or @google-provider/gemini-2.0-flash.

How It Works

Portkey receives Messages API requests and translates them to each provider’s native format:
  • Anthropic — requests pass through directly
  • All other providers — Portkey’s adapter translates between Messages format and the provider’s native format
The response always comes back in Anthropic Messages format, regardless of which provider handles the request.
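For example, the same Messages-format request can be served by a different backend just by swapping the model string. A minimal sketch, reusing the client from the Quick Start (the provider slugs are placeholders for whatever you have configured):
# Same request body against two different backends; the response comes
# back in Anthropic Messages format in both cases.
for model in ["@anthropic-provider/claude-sonnet-4-5-20250929", "@openai-provider/gpt-4o"]:
    message = client.messages.create(
        model=model,
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(model, "->", message.content[0].text)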

System Prompt

Set a system prompt with the top-level system parameter (not inside messages):
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system="You are a pirate. Always respond in pirate speak.",
    messages=[{"role": "user", "content": "Say hello."}]
)
The system parameter also accepts an array of content blocks for prompt caching:
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system=[
        {"type": "text", "text": "You are an expert on this topic..."},
        {"type": "text", "text": "Here is the reference material...", "cache_control": {"type": "ephemeral"}}
    ],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)

Streaming

Stream responses with the SDK's messages.stream() helper (or by passing stream=True to messages.create); in cURL, set "stream": true in the request body.
with client.messages.stream(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about AI"}]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
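If you need the raw event stream instead of the text helper, pass stream=True to messages.create and handle the events yourself. A minimal sketch:
# stream=True returns an iterator of server-sent events rather than a helper object.
stream = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250929",
    max_tokens=1024,
    stream=True,
    messages=[{"role": "user", "content": "Write a haiku about AI"}]
)
for event in stream:
    # Text arrives in content_block_delta events carrying a text_delta payload.
    if event.type == "content_block_delta" and event.delta.type == "text_delta":
        print(event.delta.text, end="", flush=True)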

Tool Use

Define tools with name, description, and input_schema (the Messages API uses input_schema where Chat Completions uses parameters):
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=[{
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }]
)

for block in message.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}, Input: {block.input}")

Tool Results

Pass tool results back to continue the conversation. Tool results go in a user message with tool_result content blocks:
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": [
            {"type": "tool_use", "id": "tool_123", "name": "get_weather", "input": {"location": "Paris"}}
        ]},
        {"role": "user", "content": [
            {"type": "tool_result", "tool_use_id": "tool_123", "content": '{"temp": "22°C", "condition": "sunny"}'}
        ]}
    ],
    tools=[{
        "name": "get_weather",
        "description": "Get weather for a location",
        "input_schema": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
    }]
)

print(message.content[0].text)
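In practice the tool result comes from code you run locally. A minimal round-trip sketch; get_weather here is a hypothetical local function standing in for a real lookup:
import json

def get_weather(location: str) -> str:
    # Hypothetical stand-in for a real weather lookup.
    return json.dumps({"temp": "22°C", "condition": "sunny"})

tools = [{
    "name": "get_weather",
    "description": "Get weather for a location",
    "input_schema": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
}]
MODEL = "@anthropic-provider/claude-sonnet-4-5-20250929"
messages = [{"role": "user", "content": "What's the weather in Paris?"}]

response = client.messages.create(model=MODEL, max_tokens=1024, messages=messages, tools=tools)
while response.stop_reason == "tool_use":
    # Echo the assistant turn back, then answer each tool_use block with a tool_result.
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": block.id, "content": get_weather(**block.input)}
        for block in response.content if block.type == "tool_use"
    ]})
    response = client.messages.create(model=MODEL, max_tokens=1024, messages=messages, tools=tools)

print(response.content[0].text)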

Vision

Send images using content blocks. Supports both URLs and base64-encoded data.
# From URL
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "url", "url": "https://example.com/image.jpg"}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)

print(message.content[0].text)
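To send a local file instead, base64-encode it and use a base64 source block (image.jpg is a hypothetical local file):
import base64

# Read and encode the image; media_type must match the actual file format.
with open("image.jpg", "rb") as f:
    data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": data}},
            {"type": "text", "text": "Describe this image"}
        ]
    }]
)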

Extended Thinking

Enable extended thinking for complex reasoning tasks. Requires max_tokens greater than budget_tokens.
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Response: {block.text}")
Extended thinking output counts toward max_tokens. Set max_tokens high enough to accommodate both thinking and the final response.
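Thinking can also be streamed: thinking deltas arrive as content_block_delta events alongside text deltas. A minimal sketch:
with client.messages.stream(
    model="@anthropic-provider/claude-sonnet-4-5-20250929",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Analyze the implications of quantum computing on cryptography"}]
) as stream:
    for event in stream:
        if event.type == "content_block_delta":
            if event.delta.type == "thinking_delta":
                print(event.delta.thinking, end="", flush=True)  # reasoning tokens
            elif event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)      # final response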

Prompt Caching

Use cache_control on system prompts, messages, and tool definitions to cache frequently used content.
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": "You are an expert analyst. Here is a very long reference document...",
        "cache_control": {"type": "ephemeral"}
    }],
    messages=[{"role": "user", "content": "Summarize the key points"}]
)
Cached content is reused across requests, reducing latency and costs. Cache usage is reflected in the response usage object.
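You can confirm cache behavior from the usage fields on the response: cache_creation_input_tokens is nonzero on the request that writes the cache, cache_read_input_tokens on subsequent hits:
print(message.usage.cache_creation_input_tokens)  # tokens written to the cache
print(message.usage.cache_read_input_tokens)      # tokens served from the cache
print(message.usage.input_tokens)                 # uncached input tokens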

Multi-turn Conversations

Build conversations by passing the full message history. Messages must alternate between user and assistant roles.
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Hello Alice! How can I help you?"},
        {"role": "user", "content": "What is my name?"}
    ]
)

print(message.content[0].text)  # "Your name is Alice."
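A chat loop just appends each assistant reply to the history before the next user turn, which preserves the alternation. A minimal sketch:
history = []
for user_input in ["My name is Alice.", "What is my name?"]:
    history.append({"role": "user", "content": user_input})
    reply = client.messages.create(
        model="@anthropic-provider/claude-sonnet-4-5-20250929",
        max_tokens=1024,
        messages=history
    )
    # Append the assistant turn so the next request sees the full conversation.
    history.append({"role": "assistant", "content": reply.content})
    print(reply.content[0].text)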

Using with Portkey Features

The Messages API works with all Portkey gateway features. Pass Portkey-specific headers alongside the Anthropic request:
import anthropic

client = anthropic.Anthropic(
    api_key="PORTKEY_API_KEY",
    base_url="https://api.portkey.ai",
    default_headers={
        "x-portkey-config": "pp-config-xxx"  # Config with fallbacks, load balancing, etc.
    }
)

message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}]
)
  • Configs — Route, load balance, and set fallbacks
  • Caching — Cache responses for faster, cheaper calls
  • Guardrails — Add input/output guardrails
  • Observability — Full logging and tracing
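Headers can also be attached per request with the SDK's extra_headers option, e.g. to set a trace ID for observability (the trace ID value here is hypothetical):
message = client.messages.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250929",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={"x-portkey-trace-id": "checkout-flow-42"}  # hypothetical trace ID
)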
