Available on all Portkey plans.
Open Responses is an open-source specification for multi-provider, interoperable LLM interfaces based on the OpenAI Responses API. It defines a shared schema for calling language models, streaming results, and composing agentic workflows — independent of provider. Portkey is fully Open Responses compliant. The Responses API works with every provider and model in Portkey’s catalog — including Anthropic, Gemini, Bedrock, and 60+ other providers that don’t natively support it.

Why Responses API

The Responses API is becoming the standard for agentic AI:
  • Agentic loops — Models emit tool calls, receive results, and continue autonomously
  • Items as atomic units — Clear state machines for context management
  • Semantic streaming — Predictable, provider-agnostic streaming events
  • Unified tool calling — Consistent function calling interface across all providers
Previously, the Responses API only worked with OpenAI. Portkey extends it to all providers.

Quick Start

Send a Responses API request to any provider. Change the model string — the API format stays the same.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Explain quantum computing in simple terms"
)

print(response.output_text)
The same code works for any provider. Switch the @provider/model string to use OpenAI, Gemini, Groq, Bedrock, or any of the 3000+ supported models. See Model Catalog for setup.

Using the OpenAI SDK

The Portkey SDK is a superset of the OpenAI SDK, so portkey.responses.create() works identically. The OpenAI SDK also works directly with Portkey’s base URL:
from openai import OpenAI

client = OpenAI(
    api_key="PORTKEY_API_KEY",
    base_url="https://api.portkey.ai/v1"
)

response = client.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Explain quantum computing in simple terms"
)

print(response.output_text)

Text Generation

Instructions (System Prompt)

Set a system prompt with the instructions parameter, or pass a system-role message in the input array. Both work identically.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    instructions="You are a pirate. Always respond in pirate speak.",
    input="Say hello."
)
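The two forms can be sketched as equivalent request payloads — illustrative dicts rather than SDK calls; the instructions string simply becomes the first message in the input array:

```python
# Two equivalent ways to set the system prompt, shown as plain request
# payloads. Portkey treats both forms identically.

# 1. The dedicated `instructions` parameter:
with_instructions = {
    "model": "@anthropic-provider/claude-sonnet-4-5-20250514",
    "instructions": "You are a pirate. Always respond in pirate speak.",
    "input": "Say hello.",
}

# 2. A system-role message at the head of the input array:
with_system_message = {
    "model": "@anthropic-provider/claude-sonnet-4-5-20250514",
    "input": [
        {"role": "system", "content": "You are a pirate. Always respond in pirate speak."},
        {"role": "user", "content": "Say hello."},
    ],
}
```

Either payload can be passed to portkey.responses.create(**payload).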

Streaming

Enable streaming with stream=True.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

stream = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Write a haiku about AI",
    stream=True
)

for event in stream:
    if hasattr(event, 'delta'):
        print(event.delta, end="", flush=True)
Portkey’s adapter produces the same SSE event stream format (response.created, response.output_text.delta, response.completed, etc.) regardless of the underlying provider.
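As a sketch of consuming that event stream, the loop below dispatches on event type rather than duck-typing on delta. The dicts are illustrative stand-ins for the SDK's typed stream events; real events expose the same type and delta fields as attributes:

```python
# Illustrative stand-ins for the semantic SSE events named above.
sample_events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hello"},
    {"type": "response.output_text.delta", "delta": ", world"},
    {"type": "response.completed"},
]

# Collect only the text deltas; other event types mark lifecycle stages.
chunks = []
for event in sample_events:
    if event["type"] == "response.output_text.delta":
        chunks.append(event["delta"])

text = "".join(chunks)
print(text)  # Hello, world
```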

Multi-turn Conversations

Pass previous messages in the input array for multi-turn conversations. Two formats are supported: shorthand role + content messages, and full typed message items (the shape used in the Vision and File Inputs sections).
# Shorthand: just role + content
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input=[
        {"role": "user", "content": "My name is Alice."},
        {"role": "assistant", "content": "Hello Alice! How can I help you?"},
        {"role": "user", "content": "What is my name?"}
    ]
)
Supported roles: user, assistant, developer (maps to system), system, and tool.
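For reference, the same conversation in the full typed-item form — a sketch of the standard Responses API message-item shape, where each turn is an explicit item and content is a list of typed parts (assistant turns use output_text):

```python
# Full typed-item form of the conversation above. Each turn is an explicit
# message item; user parts are input_text, assistant parts are output_text.
typed_input = [
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "My name is Alice."}]},
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Hello Alice! How can I help you?"}]},
    {"type": "message", "role": "user",
     "content": [{"type": "input_text", "text": "What is my name?"}]},
]
```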

Generation Parameters

Control generation behavior with optional parameters:
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Write a creative story",
    max_output_tokens=2048,
    temperature=0.8,
    top_p=0.95,
    parallel_tool_calls=True
)
| Parameter | Type | Description |
| --- | --- | --- |
| max_output_tokens | integer | Maximum tokens in the response. If not set, uses the provider’s default |
| temperature | float | Sampling temperature (0–2). Higher = more creative |
| top_p | float | Nucleus sampling threshold (0–1) |
| parallel_tool_calls | boolean | Allow multiple tool calls in a single response (default: true) |
| user | string | End-user identifier for abuse tracking |
| metadata | object | Arbitrary metadata to attach to the response |

Tool Calling

Define tools with the Responses API function tool format. Works across all providers that support function calling.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="What's the weather in San Francisco?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }]
)

print(response.output)
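When the model decides to call the tool, response.output contains a function_call item — the same shape (name, call_id, arguments) used in the Function Call Results section below. A minimal sketch of reading it, using an illustrative output list in place of a real response:

```python
import json

# Illustrative stand-in for `response.output` when the model calls the tool.
sample_output = [
    {"type": "function_call", "name": "get_weather",
     "call_id": "call_123", "arguments": '{"location": "San Francisco"}'},
]

# Arguments arrive as a JSON string and must be parsed before use.
for item in sample_output:
    if item["type"] == "function_call":
        args = json.loads(item["arguments"])
        print(item["name"], args["location"])
```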

Tool Choice

Control tool usage with tool_choice:
| Value | Behavior |
| --- | --- |
| "auto" | Model decides whether to call a tool (default) |
| "none" | Model will not call any tools |
| "required" | Model must call at least one tool |
| {"type": "function", "name": "get_weather"} | Force a specific tool |
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="What's the weather in San Francisco?",
    tools=[...],
    tool_choice="required"
)

Function Call Results

Return function call results in a multi-turn flow with function_call and function_call_output items:
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input=[
        {"role": "user", "content": "What's the weather in Paris?"},
        {"type": "function_call", "name": "get_weather", "call_id": "call_123", "arguments": '{"location": "Paris"}'},
        {"type": "function_call_output", "call_id": "call_123", "output": '{"temp": "22°C", "condition": "sunny"}'}
    ],
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get weather for a location",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}, "required": ["location"]}
    }]
)

Input Types

Vision

Send images with the input_image content type. The optional detail parameter ("high", "low", "auto") controls processing fidelity.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_image", "image_url": "https://example.com/image.jpg", "detail": "high"},
            {"type": "input_text", "text": "Describe this image"}
        ]
    }]
)

File Inputs

Send files with the input_file content type. Pass file data as a base64-encoded data URL or reference an existing file by file_id.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_file", "filename": "report.pdf", "file_data": "data:application/pdf;base64,JVBERi0xLjQ..."},
            {"type": "input_text", "text": "Summarize this document"}
        ]
    }]
)
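The file_id variant mentioned above can be sketched as a content part that references an already-uploaded file instead of inlining base64 data (the id here is a placeholder):

```python
# Reference an existing file by id rather than embedding file_data.
# "file-abc123" is a placeholder, not a real file id.
file_part = {"type": "input_file", "file_id": "file-abc123"}
```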

Structured Output

Control output format with text.format. Supports json_schema for strict structured output and json_object for free-form JSON.

JSON Schema

response = portkey.responses.create(
    model="@openai-provider/gpt-4.1",
    input="Extract the name and age from: John is 30 years old.",
    text={
        "format": {
            "type": "json_schema",
            "name": "person",
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"}
                },
                "required": ["name", "age"]
            }
        }
    }
)
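With a strict json_schema format, response.output_text is a JSON string conforming to the schema, so it can be parsed directly. A sketch using an illustrative output string in place of a real response:

```python
import json

# Illustrative stand-in for `response.output_text` under the schema above.
output_text = '{"name": "John", "age": 30}'

person = json.loads(output_text)
print(person["name"], person["age"])  # John 30
```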

JSON Object

For free-form JSON output without a strict schema:
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="List 3 programming languages and their main use cases as JSON",
    text={"format": {"type": "json_object"}}
)

Reasoning and Thinking

Control reasoning with the unified reasoning parameter. Portkey maps this to each provider’s native thinking mechanism automatically.

Reasoning Effort

reasoning.effort controls how much the model reasons before responding. Works across OpenAI, Anthropic, and Gemini — Portkey translates to each provider’s native format.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Solve this step by step: What is 127 * 43?",
    reasoning={"effort": "high"}
)
Portkey maps reasoning.effort to each provider’s native thinking configuration:
Anthropic:

| Effort | Budget Tokens |
| --- | --- |
| low | 1,024 |
| medium | 8,192 |
| high | 16,384 |
| xhigh | 32,768 |

Gemini:

| Effort | Budget Tokens |
| --- | --- |
| low | 1,024 |
| medium | 8,192 |
| high | 24,576 |

OpenAI: passed through as reasoning_effort natively. No translation needed.

Anthropic Extended Thinking

For fine-grained control over Anthropic’s extended thinking, pass the thinking parameter directly with an exact budget_tokens value. This takes precedence over reasoning.effort if both are set.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Analyze the implications of quantum computing on cryptography",
    thinking={"type": "enabled", "budget_tokens": 10000}
)

Prompt Caching

Enable prompt caching with cache_control on content items and tools. Works with Anthropic and other compatible providers.
response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input=[{
        "role": "user",
        "content": [
            {"type": "input_text", "text": "Here is a long document to analyze...", "cache_control": {"type": "ephemeral"}},
            {"type": "input_text", "text": "Summarize the key points"}
        ]
    }]
)
Cache control also works on tool definitions:
tools=[{
    "type": "function",
    "name": "search",
    "description": "Search the knowledge base",
    "parameters": {"type": "object", "properties": {"query": {"type": "string"}}},
    "cache_control": {"type": "ephemeral"}
}]

Using with Portkey Features

The Responses API works with all Portkey gateway features. Pass features through headers or the config parameter:
portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    config="pp-config-xxx"  # Your Portkey config with fallbacks, load balancing, etc.
)

response = portkey.responses.create(
    model="@anthropic-provider/claude-sonnet-4-5-20250514",
    input="Hello!"
)

Provider Support

Portkey handles the Responses API in two ways depending on the provider:
  • Native providers — Requests pass through directly to the provider’s Responses API endpoint
  • Adapter providers — Portkey automatically translates between Responses API and Chat Completions formats
This translation is transparent — the response format is identical regardless of which provider handles the request.

Native providers: OpenAI, Azure OpenAI, Grok (x.ai), Groq, OpenRouter, Azure AI

Adapter providers: Anthropic, Google Gemini, Google Vertex AI, AWS Bedrock, Mistral AI, Together AI, and all other providers

All features documented on this page — text generation, streaming, tool calling, reasoning, vision, structured output, prompt caching, and multi-turn conversations — work with both native and adapter providers.

Native-Only Features

A few features require server-side state and are limited to native providers:
  • previous_response_id — Use multi-turn conversations to pass history in the input array instead
  • store — Silently ignored on adapter providers; responses are not persisted server-side
  • Retrieve / Delete — GET and DELETE on /v1/responses/:id are not available
  • Built-in tools — web_search, file_search, and computer_use are native-only. Use Remote MCP or custom function tools instead
Everything else — text generation, streaming, instructions, tool calling, structured output, reasoning, vision, file inputs, prompt caching, and multi-turn conversations — works with every provider.
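A minimal sketch of the previous_response_id workaround — carry the conversation yourself by appending each turn's output items to the next request's input array (the items below are illustrative stand-ins for response.output):

```python
# Start the history with the first user turn.
history = [{"role": "user", "content": "My name is Alice."}]

# After a call, extend the history with the items from `response.output`
# (an illustrative assistant message item stands in here).
prior_output = [
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Hello Alice!"}]},
]
history.extend(prior_output)

# The next request sends the full history plus the new user message as `input`.
history.append({"role": "user", "content": "What is my name?"})
```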

Reference

Supported Parameters

Complete list of parameters supported by the Responses API and how they map internally:
| Responses API Parameter | Chat Completions Equivalent | Notes |
| --- | --- | --- |
| input | messages | String, message array, or typed input items |
| instructions | System message (prepended) | Added as the first message with role: system |
| model | model | Use @provider/model format or set provider via headers |
| stream | stream | SSE events are translated to Responses API format |
| max_output_tokens | max_tokens | Only sent if explicitly set |
| temperature | temperature | Direct mapping |
| top_p | top_p | Direct mapping |
| tools | tools | Only the function type is supported |
| tool_choice | tool_choice | auto, none, required, or a specific function |
| parallel_tool_calls | parallel_tool_calls | Direct mapping |
| text.format | response_format | json_schema and json_object supported |
| reasoning.effort | Provider-specific | Mapped to thinking for Anthropic/Gemini, reasoning_effort for OpenAI |
| thinking | thinking | Anthropic-specific passthrough |
| top_logprobs | top_logprobs | Direct mapping |
| user | user | Direct mapping |
| metadata | metadata | Direct mapping |
| Content Type | Description |
| --- | --- |
| input_text | Text content (maps to text in Chat Completions) |
| input_image | Image URL with optional detail parameter |
| input_file | File with filename and file_data (base64 data URL) or file_id |
| function_call | Tool call record with name, call_id, and arguments |
| function_call_output | Tool result with call_id and output |
| Output Type | Description |
| --- | --- |
| message | Text response with output_text content |
| function_call | Tool call with name, call_id, arguments, and status |
| reasoning | Thinking/reasoning content with a summary array (from providers that support it) |

Last modified on February 12, 2026