Available on all Portkey plans.
## Why Responses API
The Responses API is becoming the standard for agentic AI:

- Agentic loops — Models emit tool calls, receive results, and continue autonomously
- Items as atomic units — Clear state machines for context management
- Semantic streaming — Predictable, provider-agnostic streaming events
- Unified tool calling — Consistent function calling interface across all providers
## Quick Start

Send a Responses API request to any provider. Change the `model` string — the API format stays the same.
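A minimal sketch of the request over raw HTTP, assuming the standard `https://api.portkey.ai/v1` base URL and `x-portkey-api-key` header (the model slug is illustrative; the call only fires when a real key is set):

```python
import json
import os
import urllib.request

# Request body in Responses API format; the model string picks the provider.
body = {
    "model": "@openai/gpt-4o",  # illustrative: any @provider/model from your catalog
    "input": "Write a one-line haiku about gateways.",
}

req = urllib.request.Request(
    "https://api.portkey.ai/v1/responses",
    data=json.dumps(body).encode(),
    headers={
        "Content-Type": "application/json",
        "x-portkey-api-key": os.environ.get("PORTKEY_API_KEY", ""),
    },
)

# Only send the request when a real key is configured.
if os.environ.get("PORTKEY_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["output"])
```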
The same code works for any provider. Switch the `@provider/model` string to use OpenAI, Gemini, Groq, Bedrock, or any of the 3000+ supported models. See Model Catalog for setup.

## Using the OpenAI SDK

The Portkey SDK is a superset of the OpenAI SDK, so `portkey.responses.create()` works identically. The OpenAI SDK also works directly with Portkey's base URL:
## Text Generation

### Instructions (System Prompt)

Set a system prompt with `instructions` or pass a system role message in the input array. Both work identically.
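The two equivalent forms, sketched as plain request bodies (model slug and prompts are illustrative):

```python
# Form 1: top-level instructions field.
with_instructions = {
    "model": "@openai/gpt-4o",
    "instructions": "You are a terse assistant. Answer in one sentence.",
    "input": "What is an API gateway?",
}

# Form 2: a system-role message at the start of the input array.
with_system_message = {
    "model": "@openai/gpt-4o",
    "input": [
        {"role": "system", "content": "You are a terse assistant. Answer in one sentence."},
        {"role": "user", "content": "What is an API gateway?"},
    ],
}
```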
### Streaming

Enable streaming with `stream: true`. The stream emits semantic Responses API events (`response.created`, `response.output_text.delta`, `response.completed`, etc.) regardless of the underlying provider.
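A sketch of consuming that stream: the sample events below mirror the semantic event names, and the loop accumulates text from `response.output_text.delta` events. With the SDK you would iterate the generator returned by a `stream: true` request the same way:

```python
# Sample events shaped like the semantic stream (plain dicts for
# illustration; SDK events expose the same fields as attributes).
events = [
    {"type": "response.created"},
    {"type": "response.output_text.delta", "delta": "Hello"},
    {"type": "response.output_text.delta", "delta": ", world"},
    {"type": "response.completed"},
]

text = ""
for event in events:
    if event["type"] == "response.output_text.delta":
        text += event["delta"]   # incremental text chunks
    elif event["type"] == "response.completed":
        print(text)              # full text once the stream ends
```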
### Multi-turn Conversations

Pass previous messages in the `input` array for multi-turn conversations. Two formats are supported. Supported roles are `user`, `assistant`, `developer` (maps to `system`), `system`, and `tool`.
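A sketch of a multi-turn `input` array using role/content messages (model and content are illustrative):

```python
request = {
    "model": "@anthropic/claude-sonnet-4",  # illustrative slug
    "input": [
        {"role": "system", "content": "You are a helpful travel assistant."},
        {"role": "user", "content": "Suggest a city for a long weekend."},
        {"role": "assistant", "content": "Lisbon is great: walkable and mild."},
        {"role": "user", "content": "What should I eat there?"},
    ],
}

# Each turn carries its role; "developer" is mapped to "system" for
# providers that do not support a developer role.
roles = [m["role"] for m in request["input"]]
```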
## Generation Parameters

Control generation behavior with optional parameters:

| Parameter | Type | Description |
|---|---|---|
| `max_output_tokens` | integer | Maximum tokens in the response. If not set, uses the provider's default |
| `temperature` | float | Sampling temperature (0–2). Higher = more creative |
| `top_p` | float | Nucleus sampling threshold (0–1) |
| `parallel_tool_calls` | boolean | Allow multiple tool calls in a single response (default: `true`) |
| `user` | string | End-user identifier for abuse tracking |
| `metadata` | object | Arbitrary metadata to attach to the response |
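For example (all values are illustrative):

```python
request = {
    "model": "@openai/gpt-4o",
    "input": "Name three uses for a paperclip.",
    "max_output_tokens": 200,      # cap the response length
    "temperature": 0.9,            # 0-2; higher = more creative
    "top_p": 0.95,                 # nucleus sampling threshold
    "parallel_tool_calls": True,   # the default
    "user": "user-1234",           # end-user id for abuse tracking
    "metadata": {"feature": "brainstorm"},
}
```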
## Tool Calling

Define tools with the Responses API `function` tool format. This works across all providers that support function calling.
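A sketch of the Responses API function tool shape (the `get_weather` tool is illustrative):

```python
# Responses API tools are flat: name/description/parameters sit at the
# top level of the tool object (no nested "function" wrapper as in
# Chat Completions).
tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    }
]

request = {
    "model": "@openai/gpt-4o",
    "input": "What's the weather in Paris?",
    "tools": tools,
}
```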
### Tool Choice

Control tool usage with `tool_choice`:

| Value | Behavior |
|---|---|
| `"auto"` | Model decides whether to call a tool (default) |
| `"none"` | Model will not call any tools |
| `"required"` | Model must call at least one tool |
| `{"type": "function", "name": "get_weather"}` | Force a specific tool |
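For instance, forcing a specific function by name (the `get_weather` tool is a hypothetical example):

```python
# String forms also work: "auto" (default), "none", "required".
request = {
    "model": "@openai/gpt-4o",
    "input": "What's the weather in Paris?",
    "tool_choice": {"type": "function", "name": "get_weather"},  # force one tool
}
```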
### Function Call Results

Return function call results in a multi-turn flow with `function_call` and `function_call_output` items:
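A sketch of the follow-up request: echo the model's `function_call` item back in `input`, then append a `function_call_output` with the matching `call_id` (ids and values are illustrative):

```python
follow_up_input = [
    {"role": "user", "content": "What's the weather in Paris?"},
    # The function_call item the model produced in the previous turn.
    {
        "type": "function_call",
        "call_id": "call_abc123",
        "name": "get_weather",
        "arguments": '{"city": "Paris"}',
    },
    # Your tool's result, linked to the call by call_id.
    {
        "type": "function_call_output",
        "call_id": "call_abc123",
        "output": '{"temp_c": 18, "conditions": "sunny"}',
    },
]
```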
## Input Types

### Vision

Send images with the `input_image` content type. The optional `detail` parameter (`"high"`, `"low"`, `"auto"`) controls processing fidelity.
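A sketch of an image message (URL and model are illustrative):

```python
request = {
    "model": "@openai/gpt-4o",
    "input": [
        {
            "role": "user",
            "content": [
                {"type": "input_text", "text": "What is in this image?"},
                {
                    "type": "input_image",
                    "image_url": "https://example.com/photo.jpg",
                    "detail": "auto",  # "high", "low", or "auto"
                },
            ],
        }
    ],
}
```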
### File Inputs

Send files with the `input_file` content type. Pass file data as a base64-encoded data URL or reference an existing file by `file_id`.
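A sketch that base64-encodes local bytes into a data URL (the PDF bytes are a stand-in for a real file read):

```python
import base64

pdf_bytes = b"%PDF-1.4 ..."  # stand-in for open("report.pdf", "rb").read()
data_url = "data:application/pdf;base64," + base64.b64encode(pdf_bytes).decode()

file_content = {
    "type": "input_file",
    "filename": "report.pdf",
    "file_data": data_url,
    # Or reference an uploaded file instead:
    # {"type": "input_file", "file_id": "file-..."}
}
```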
## Structured Output

Control output format with `text.format`. Supports `json_schema` for strict structured output and `json_object` for free-form JSON.

### JSON Schema
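A sketch of a strict `json_schema` format block (the schema itself is illustrative):

```python
request = {
    "model": "@openai/gpt-4o",
    "input": "Extract the city and country from: 'I live in Kyoto, Japan.'",
    "text": {
        "format": {
            "type": "json_schema",
            "name": "location",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "country": {"type": "string"},
                },
                "required": ["city", "country"],
                "additionalProperties": False,
            },
        }
    },
}
```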
### JSON Object

For free-form JSON output without a strict schema, set the format type to `json_object`.

## Reasoning and Thinking

Control reasoning with the unified `reasoning` parameter. Portkey maps this to each provider's native thinking mechanism automatically.
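For example (model slug is illustrative):

```python
request = {
    "model": "@anthropic/claude-sonnet-4",
    "input": "How many weighings to find the odd ball among 12?",
    "reasoning": {"effort": "medium"},  # low | medium | high (xhigh on Anthropic)
}
```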
### Reasoning Effort

`reasoning.effort` controls how much the model reasons before responding. It works across OpenAI, Anthropic, and Gemini — Portkey translates `reasoning.effort` to each provider's native thinking configuration:

**Anthropic** — maps to `thinking.budget_tokens`
| Effort | Budget Tokens |
|---|---|
| `low` | 1,024 |
| `medium` | 8,192 |
| `high` | 16,384 |
| `xhigh` | 32,768 |
**Gemini 2.5** — maps to `thinking_config.thinking_budget`
| Effort | Budget Tokens |
|---|---|
| `low` | 1,024 |
| `medium` | 8,192 |
| `high` | 24,576 |
**OpenAI o-series** — passed through as `reasoning_effort` natively. No translation needed.

### Anthropic Extended Thinking
For fine-grained control over Anthropic's extended thinking, pass the `thinking` parameter directly with an exact `budget_tokens` value. This takes precedence over `reasoning.effort` if both are set.
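A sketch with an explicit budget (values illustrative; Anthropic requires the output token cap to exceed the thinking budget):

```python
request = {
    "model": "@anthropic/claude-sonnet-4",
    "input": "Plan a 3-step proof outline.",
    "thinking": {"type": "enabled", "budget_tokens": 4096},  # exact budget
    "max_output_tokens": 8192,  # must be larger than budget_tokens
}
```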
## Prompt Caching

Enable prompt caching with `cache_control` on content items and tools. Works with Anthropic and other compatible providers.
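A sketch marking a large system prompt as cacheable, assuming Anthropic-style `cache_control` breakpoints on content items:

```python
LONG_CONTEXT = "You are a support agent. " * 200  # stand-in for a large shared prefix

request = {
    "model": "@anthropic/claude-sonnet-4",
    "input": [
        {
            "role": "system",
            "content": [
                {
                    "type": "input_text",
                    "text": LONG_CONTEXT,
                    "cache_control": {"type": "ephemeral"},  # cache up to here
                }
            ],
        },
        {"role": "user", "content": "Where is my order #123?"},
    ],
}
```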
## Using with Portkey Features

The Responses API works with all Portkey gateway features:

- **Configs** — Route, load balance, and set fallbacks
- **Caching** — Cache responses for faster, cheaper calls
- **Guardrails** — Input/output guardrails
- **Observability** — Full logging and tracing
## Provider Support

Portkey handles the Responses API in two ways depending on the provider:

- Native providers — Requests pass through directly to the provider's Responses API endpoint
- Adapter providers — Portkey automatically translates between Responses API and Chat Completions formats
### Native-Only Features

A few features require server-side state and are limited to native providers:

- `previous_response_id` — Use multi-turn conversations to pass history in the `input` array instead
- `store` — Silently ignored on adapter providers; responses are not persisted server-side
- Retrieve / Delete — `GET` and `DELETE` on `/v1/responses/:id` are not available
- Built-in tools — `web_search`, `file_search`, `computer_use` are native-only. Use Remote MCP or custom function tools instead
## Reference

### Supported Parameters

Complete list of parameters supported by the Responses API and how they map internally:

| Responses API Parameter | Chat Completions Equivalent | Notes |
|---|---|---|
| `input` | `messages` | String, message array, or typed input items |
| `instructions` | System message (prepended) | Added as the first message with `role: system` |
| `model` | `model` | Use `@provider/model` format or set provider via headers |
| `stream` | `stream` | SSE events are translated to Responses API format |
| `max_output_tokens` | `max_tokens` | Only sent if explicitly set |
| `temperature` | `temperature` | Direct mapping |
| `top_p` | `top_p` | Direct mapping |
| `tools` | `tools` | Only `function` type is supported |
| `tool_choice` | `tool_choice` | `auto`, `none`, `required`, or specific function |
| `parallel_tool_calls` | `parallel_tool_calls` | Direct mapping |
| `text.format` | `response_format` | `json_schema` and `json_object` supported |
| `reasoning.effort` | Provider-specific | Mapped to `thinking` for Anthropic/Gemini, `reasoning_effort` for OpenAI |
| `thinking` | `thinking` | Anthropic-specific passthrough |
| `top_logprobs` | `top_logprobs` | Direct mapping |
| `user` | `user` | Direct mapping |
| `metadata` | `metadata` | Direct mapping |
### Input Content Types

| Content Type | Description |
|---|---|
| `input_text` | Text content (maps to `text` in Chat Completions) |
| `input_image` | Image URL with optional `detail` parameter |
| `input_file` | File with `filename` and `file_data` (base64 data URL) or `file_id` |
| `function_call` | Tool call record with `name`, `call_id`, and `arguments` |
| `function_call_output` | Tool result with `call_id` and `output` |
### Response Output Types

| Output Type | Description |
|---|---|
| `message` | Text response with `output_text` content |
| `function_call` | Tool call with `name`, `call_id`, `arguments`, and `status` |
| `reasoning` | Thinking/reasoning content with `summary` array (from providers that support it) |
API Endpoints
- Create a Response —
POST /v1/responses - Retrieve a Response —
GET /v1/responses/{response_id} - Delete a Response —
DELETE /v1/responses/{response_id} - List Input Items —
GET /v1/responses/{response_id}/input_items

