Service tiers
Different tiers of service allow you to balance availability, performance, and predictable costs based on your application’s needs.
We offer three service tiers:
- Priority Tier: Best for workflows deployed in production where time, availability, and predictable pricing are important
- Standard: Best for bursty traffic, or for when you’re trying a new idea
- Batch: Best for asynchronous workflows which can wait or benefit from being outside your normal capacity
Standard Tier
The standard tier is the default service tier for all API requests. Requests in this tier are prioritized alongside all other requests and observe best-effort availability.
Priority Tier
Requests in this tier are prioritized over all other requests to Anthropic. This prioritization allows us to provide a guarantee around the infrequency of “server overloaded” errors, even during peak times.
For more information, see Get started with Priority Tier
How requests get assigned tiers
When handling a request, Anthropic decides to assign a request to Priority Tier in the following scenarios:
- Your organization has sufficient priority tier capacity input tokens per minute
- Your organization has sufficient priority tier capacity output tokens per minute
Anthropic counts usage against Priority Tier capacity as follows:
Input Tokens
- Cache reads as 0.1 tokens per token read from the cache
- Cache writes as 1.25 tokens per token written to the cache with a 5 minute TTL
- Cache writes as 2.00 tokens per token written to the cache with a 1 hour TTL
- All other input tokens are 1 token per token
Output Tokens
- 1 token per token
Otherwise, requests proceed at standard tier.
Requests assigned Priority Tier pull from both the Priority Tier capacity and the regular rate limits. If servicing the request would exceed the rate limits, the request is declined.
Using service tiers
You can control which service tiers can be used for a request by setting the service_tier
parameter:
The service_tier
parameter accepts the following values:
"auto"
(default) - Uses the Priority Tier capacity if available, falling back to your other capacity if not"standard_only"
- Only use standard tier capacity, useful if you don’t want to use your Priority Tier capacity
The response usage
object also includes the service tier assigned to the request:
This allows you to determine which service tier was assigned to the request.
When requesting service_tier="auto"
with a model with a Priority Tier committment, these response headers provide insights:
Get started with Priority Tier
You may want to commit to Priority Tier capacity if you are interested in:
- Higher availability: 99.9% uptime SLA with prioritized computational resources
- Cost Control: Predictable spend and discounts for longer commitments
- Flexible overflow: Automatically falls back to standard tier when you exceed your committed capacity
Committing to Priority Tier will involve deciding:
- A number of input tokens per minute
- A number of output tokens per minute
- A committment duration (1, 3, 6, or 12 months)
- A specific model version
The ratio of input to output tokens you purchase matters. Sizing your Priority Tier capacity to align with your actual traffic patterns helps ensure you fully utilize all purchased tokens.
Supported models
Priority Tier is supported by:
- Claude Opus 4
- Claude Sonnet 4
- Claude Sonnet 3.7
- Claude Sonnet 3.5 (both versions)
- Claude Haiku 3.5
Check the model overview page for more details on our models.
How to access Priority Tier
To begin using Priority Tier:
- Contact sales to complete provisioning
- (Optional) Update your API requests to optionally set the
service_tier
parameter toauto
- Monitor your usage through response headers and the Anthropic Console