OpenRouter Responses API Chat - Help & Guide

Quick Start

  1. Add OpenRouter API Key: Go to Settings → Add API Key section (get your key from openrouter.ai)
  2. Browse Models: Go to Settings → Browse OpenRouter Models → Click "Load Models" to see all available models
  3. Add Models: Click any model from the browser to auto-populate the Add New Model form, or manually add model configuration
  4. Select Models: Click the "Select Models" button and check the models you want to use. Use the number input to select multiple instances of the same model (1-10).
  5. Choose Mode & Iterations: In the model picker, select Parallel/Serial/Rotating/Autonomous and set how many iterations (1-100)
  6. Chat: Type your message and click Send!

Model Browser

The Browse OpenRouter Models section fetches the live model catalog from OpenRouter:

  • Load Models: Click to fetch all available models
  • Search: Filter models by name, ID, or description
  • Sort: Sort by date (newest first), name, or context length
  • Select: Click any model to auto-fill the Add New Model form

Each model shows context length, pricing, and release date to help you choose.
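The browser's search and sort behavior can be sketched roughly like this (the field names `id`, `name`, `description`, `context_length`, and `created` follow the general shape of OpenRouter's `/api/v1/models` response, but treat them as assumptions rather than the app's actual code):

```python
def filter_and_sort_models(models, query="", sort_by="date"):
    """Filter a model catalog by a search string and sort the result.

    `models` is a list of dicts shaped like entries from OpenRouter's
    /api/v1/models endpoint (field names assumed here).
    """
    q = query.lower()
    hits = [
        m for m in models
        if q in m.get("id", "").lower()
        or q in m.get("name", "").lower()
        or q in m.get("description", "").lower()
    ]
    keys = {
        "date": lambda m: -m.get("created", 0),           # newest first
        "name": lambda m: m.get("name", "").lower(),
        "context": lambda m: -m.get("context_length", 0)  # largest first
    }
    return sorted(hits, key=keys[sort_by])

catalog = [
    {"id": "openai/gpt-4o", "name": "GPT-4o", "description": "",
     "context_length": 128000, "created": 1715000000},
    {"id": "anthropic/claude-3.5-sonnet", "name": "Claude 3.5 Sonnet",
     "description": "", "context_length": 200000, "created": 1718000000},
]
print(filter_and_sort_models(catalog, "claude")[0]["id"])
```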

Model Configuration

Each model is defined by a JSON configuration with NO outer braces. Example:

"endpoint": "https://openrouter.ai/api/v1/responses",
"model_id": "anthropic/claude-3.5-sonnet",
"api_key_id": "123456",
"system_prompt": "",
"max_output_tokens": null,
"temperature": 1.0

Available OpenRouter Models

You can use any model from OpenRouter that supports the Responses API. Popular options include:

  • anthropic/claude-3.5-sonnet - Claude 3.5 Sonnet
  • openai/gpt-4o - GPT-4o
  • google/gemini-pro - Google Gemini Pro
  • meta-llama/llama-3-70b-instruct - Llama 3 70B
  • ...and many more at openrouter.ai/models

Per-Model System Prompts

Set "system_prompt": "Your prompt here" in each model's config to give it a unique personality.

Global System Prompt Override

Use the "Active System Prompt" dropdown in the model picker to temporarily override all models' prompts.

Multi-Model Modes

  • Parallel: All selected models respond independently to your message. Each only sees your messages, not other models' responses.
  • Serial: Models respond one after another. Each model sees all previous messages including other models' responses.
  • Rotating: Like Serial, but the order rotates each turn (1,2,3 → 3,1,2 → 2,3,1 → repeat).
  • Autonomous: Models converse with each other without human input. Click "Start Auto" to begin.

Iterations

You can set the number of iterations (1-100) to control how many times models reply:

  • Parallel: All models respond N times in parallel (e.g., 2 iterations = 2 rounds of all models responding simultaneously)
  • Serial: The complete sequence of models responds N times (e.g., 3 iterations with 2 models = model1, model2, model1, model2, model1, model2)
  • Rotating: The rotated sequence repeats N times with order continuing to rotate each iteration
  • Autonomous: Each model responds exactly N times (e.g., 10 iterations with 3 models = 30 total responses, 10 per model)
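As a rough sketch, the speaking order these modes and iteration counts produce could be computed like this (this mirrors the descriptions above, not the app's actual implementation):

```python
def speaking_order(models, mode, iterations):
    """Return the flat list of model turns for one send (sketch)."""
    turns = []
    order = list(models)
    for _ in range(iterations):
        if mode == "parallel":
            turns.append(list(order))        # one simultaneous round per iteration
        elif mode in ("serial", "autonomous"):
            turns.extend(order)              # fixed sequence, repeated
        elif mode == "rotating":
            turns.extend(order)
            order = order[-1:] + order[:-1]  # 1,2,3 -> 3,1,2 -> 2,3,1
    return turns

# 3 iterations with 2 models = model1, model2, repeated three times
print(speaking_order(["model1", "model2"], "serial", 3))
```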

Flattened Participant Structure

In multi-model conversations (Serial, Rotating, Autonomous), all participants are treated equally with no hierarchy:

  • Equal Participants: Your messages are prefixed with your identifier (e.g., "User said: hello"), just like model messages (e.g., "Claude said: hi").
  • No Special Treatment: Models receive no indication that you are different from other participants. Everyone is simply a named participant in the conversation.
  • Customizable Identity: Set your identifier in Settings → User Identifier (default: "Human"). Use any name you prefer to shape your role in the conversation.

Mode-Specific Behavior:

  • Parallel: Each model converses independently with you. Your messages are not prefixed since there's only one other participant.
  • Serial/Rotating: All participants (you and all models) are listed equally. Messages are prefixed with participant identifiers.
  • Autonomous: Models converse with each other without your participation.
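The prefixing rule above can be sketched as a small function (a sketch of the described behavior, not the app's code):

```python
def format_message(sender, text, mode):
    """Prefix a message with its sender's identifier in multi-model modes."""
    if mode == "parallel":
        return text  # one-on-one conversation: no prefix needed
    return f"{sender} said: {text}"

print(format_message("Human", "hello", "serial"))
```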

Multiple Instances: You can select multiple instances of the same model by using the number input next to each model in the picker. The model name is simply a chat identifier, so multiple instances of the same underlying model can participate under different names (e.g., "GPT-4 Debater 1" and "GPT-4 Debater 2").

About OpenRouter Responses API

This app uses OpenRouter's Responses API, providing access to 100+ AI models through a modern streaming API:

  • Anthropic: Claude models (Sonnet, Opus, Haiku)
  • OpenAI: GPT-4o, GPT-4, O1, and more
  • Google: Gemini models
  • Meta: Llama models
  • Mistral, Cohere, and many others

Visit openrouter.ai to get your API key and browse available models.

Advanced Features

Edit AI Responses

Click "Edit" button below any AI response to manually modify it before continuing the conversation.

Regenerate Responses

Click "Regenerate" to have the same model create a different response.

Continue Responses

Click "Continue" on any AI response to have the model extend/finish that response. Useful when a response was cut off or you want more detail.

Prefill Mode

Click "Prefill" to start typing the beginning of an AI response. The model will continue from where you left off. Great for guiding the model's response style or format.

Edit Names

Click on any participant name (user or model) in the chat to edit it. This lets you customize how participants are identified in the conversation.

Role Switching

Click "→ User" or "→ Assistant" buttons next to any message to convert it between user and assistant roles. This is useful for restructuring conversations or creating custom dialogue flows.

Edit User Messages

Click "Edit" button on your own messages to revise and branch the conversation from that point.

File Attachments

Click the "Attach" button to attach files. Supported file types:

  • Images: Sent as visual data to vision-capable models
  • PDFs: Text content is automatically extracted and sent to the model
  • Text files: Content is read and sent to the model (.txt, .md, .json, .csv, .xml, code files, etc.)
  • Other files: File name is mentioned (content extraction not supported)

Files are kept in context and will be included when you "Edit & Resend" messages.
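For text-like attachments, the "content is read and sent" step amounts to something like the following; the extension list and the `[Attached file ...]` wrapper format are illustrative assumptions, not the app's actual markup:

```python
TEXT_EXTENSIONS = {".txt", ".md", ".json", ".csv", ".xml"}

def attachment_to_text(filename, data=None):
    """Render one attachment as message text (sketch)."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    if data is not None and ext in TEXT_EXTENSIONS:
        return f"[Attached file {filename}]\n{data}"
    # Unsupported type: mention the file name only
    return f"[Attached file: {filename} (content not extracted)]"

print(attachment_to_text("notes.txt", "hello"))
```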

Image Generation

The application supports receiving images from image generation models:

  • Use models with "image" in their output modalities (check model browser or openrouter.ai/models)
  • Request image generation in your message (e.g., "draw a sunset")
  • Generated images appear automatically in the response
  • Multiple images per response are supported
  • Images are displayed inline below the text content

OpenRouter Responses API Parameters (All Optional)

Customize your model behavior with these parameters (set to null to use defaults):

Sampling Parameters
  • temperature (0.0-2.0): Randomness/creativity level. Lower = more deterministic, higher = more creative.
  • max_output_tokens: Maximum tokens in response. Set based on expected response length.
  • top_p (0.0-1.0): Nucleus sampling - limits choices to tokens whose probabilities sum to P.
  • top_k: Limits choices to top K most likely tokens. Value of 1 = always pick most likely.
  • frequency_penalty (-2.0 to 2.0): Reduce word repetition based on frequency in text.
  • presence_penalty (-2.0 to 2.0): Encourage topic diversity by penalizing tokens already present.
  • seed: Integer for deterministic/reproducible outputs (not guaranteed for all models).
  • stop: Array of sequences that will stop generation (e.g., ["END", "\n\n"]).
Tool Calling
  • tools: Array of function definitions the model can call. Each tool has type, name, description, and parameters.
  • tool_choice: Control tool usage - "auto" (model decides), "none" (disabled), "required" (must use), or specific tool object.
  • parallel_tool_calls: Boolean to allow/disallow simultaneous tool invocations.
Output Format
  • text: Configure output format. Options:
    • {"format": {"type": "text"}} - Default text output
    • {"format": {"type": "json_object"}} - JSON mode
    • {"format": {"type": "json_schema", "name": "...", "schema": {...}}} - Structured output
    • {"verbosity": "low"|"medium"|"high"} - Control response length
Reasoning/Extended Thinking
  • reasoning: Configure extended thinking for supported models (O1, Claude, etc.):
    • effort: "none" | "minimal" | "low" | "medium" | "high" | "xhigh"
    • summary: "auto" | "concise" | "detailed"
    • max_tokens: Limit reasoning tokens
    • enabled: Boolean to enable/disable
Provider Routing
  • provider: Control which providers handle your request:
    • allow_fallbacks: Boolean (default: true) - Allow fallback to other providers
    • require_parameters: Boolean - Only route to providers supporting all params
    • data_collection: "allow" | "deny" - Control data collection
    • zdr: Boolean - Restrict to Zero Data Retention endpoints
    • order: Array of provider slugs for preference order
    • only: Array of provider slugs to exclusively use
    • ignore: Array of provider slugs to avoid
    • quantizations: Array of acceptable quantizations ("int4", "int8", "fp8", "fp16", etc.)
    • sort: "price" | "throughput" | "latency" - How to rank providers
    • max_price: {prompt, completion, image, request} - Price limits per million tokens
Plugins
  • plugins: Array of optional plugins:
    • {"id": "web"} - Web search plugin with options: enabled, max_results, search_prompt, engine ("native"|"exa")
    • {"id": "moderation"} - Content moderation plugin
    • {"id": "file-parser"} - File parsing with pdf engine options ("mistral-ocr"|"pdf-text"|"native")
    • {"id": "response-healing"} - Auto-fix malformed responses
Tracking & Storage
  • user: End-user identifier (max 128 chars) for abuse tracking.
  • session_id: Unique ID (max 128 chars) for grouping related requests.
  • store: Boolean - Store conversation server-side for future reference.
  • metadata: Object with custom key-value string pairs for your tracking.
Other
  • timeout: Request timeout in milliseconds (default: 120000 = 2 minutes). Increase for reasoning models.
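Put together, a request body combining several of these optional parameters might look like the dict below; the values are purely illustrative, and openrouter.ai's own documentation is the authoritative source for the schema:

```python
# Illustrative Responses API request body (values are examples, not recommendations)
request_body = {
    "model": "anthropic/claude-3.5-sonnet",
    "input": [{"role": "user", "content": "Summarize quicksort."}],
    # Sampling
    "temperature": 0.7,
    "top_p": 0.9,
    "max_output_tokens": 1024,
    "stop": ["END"],
    # Reasoning / extended thinking
    "reasoning": {"effort": "medium", "summary": "auto"},
    # Provider routing
    "provider": {
        "allow_fallbacks": True,
        "sort": "price",
    },
    # Plugins
    "plugins": [{"id": "web", "max_results": 3}],
    # Tracking & storage
    "user": "user-123",
    "metadata": {"project": "demo"},
}
print(sorted(request_body))
```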

Data Storage

All data is stored locally in your browser using IndexedDB:

  • Storage Capacity: 50 MB to several GB (varies by browser), much larger than typical localStorage limits
  • Images Supported: Store conversations with multiple images without quota issues
  • Mobile Friendly: Works reliably on mobile devices including iPhone/iPad
  • Privacy: All data stays on your device, nothing sent to servers except OpenRouter API calls

Data Management

Export Data

Settings → Export Data creates a JSON file with all your models, API keys, prompts, and current conversation.

Import Data

Settings → Import Data loads a previously exported JSON file.

Clear History

History → Clear All History removes all saved conversations.

Tips & Best Practices

  • API Keys: Create separate API key entries for different projects/purposes
  • System Prompts: Save reusable prompts (e.g., "Coding Assistant", "Creative Writer")
  • Model Names: Use descriptive names like "GPT-4 Creative" vs "GPT-4 Analytical"
  • Temperature: Lower (0.0-0.7) for factual/coding, higher (0.8-1.5) for creative
  • Modes: Use Parallel for comparing responses, Serial for debate/discussion

Troubleshooting

Messages not sending: Check browser console (F12) for errors, verify API key is correct

Images not working: Ensure you're using a vision-capable model (Claude 3+, GPT-4V, Gemini Pro Vision). Images are always sent with the most recent message.

Mode not updating: Hard refresh (Ctrl+Shift+R or Cmd+Shift+R)

Model deleted: You can't regenerate messages that came from a model you have since deleted

Multiple images on mobile: IndexedDB storage handles conversations with multiple images without quota issues, even on mobile devices like iPhone

Source Code & License

This application is open source and available on GitHub:

https://github.com/mbbrinkman/ResponsesAPIChat

License: CC0 (Public Domain)

You are free to use, modify, distribute, and do basically whatever you want with this code. No attribution required.
