4 Tips to Improve P99 Latency

The difference between good and great often hinges on just a few milliseconds. 

Users demand nothing short of a lightning-fast experience: quick loading times and high availability. At the slightest hint of lag or downtime, many rush to sites like Down Detector to check whether the problem is isolated or widespread -- there's no patience for waiting. The pressure is on developers to meet and exceed these ever-rising user expectations.

Users are also pretty unforgiving when it comes to mobile apps. 53% of users will abandon an app as soon as load times exceed three seconds. And for many users, a single bad experience is all it takes -- 79% won't consider giving the app or service another try. 

What is P99 latency?

Latency refers to the delay before a transfer of data begins following an instruction for its transfer -- it's essentially the time it takes for a signal to travel from one point to another. 

The P99 latency metric is a valuable way to assess the responsiveness of web services and distributed systems. Put simply, it tells you the time within which the vast majority of requests (99 out of every 100) complete; only the slowest 1% take longer.

This figure helps you gauge the service experience most users actually get, letting you focus on optimizing performance for the majority rather than being skewed by a few extreme cases.

In contexts where geo-partitioning is applied (dividing data by geographic location to bring it closer to users and speed up access), P99 latency helps assess how well the system distributes and accesses data across regions -- for example, across AWS Regions and Availability Zones -- and whether users in each region are getting responsive service.

Key latency percentiles explained

Unlike average response times, which can mask outliers and extreme values, percentiles give a clearer picture of the user experience and system reliability. Here's a closer look at the most talked-about latency percentiles in the industry:

50th percentile (P50)

This indicates the middle point of your response times: 50% of requests are served faster than this value, and the other 50% are slower. It's a good indicator of typical performance but doesn't capture extremes.

90th percentile (P90)

This indicates that 90% of your response times are processed faster than this value. P90 starts to expose the tail end of your latency distribution.

95th percentile (P95)

At this level, you're looking at the speed at which 95% of requests are completed, with just 5% being slower. P95 gives insight into performance under more stressful conditions or peak loads. 

What is the difference between P90 and P99 latency?

The core difference between P90 and P99 latency lies in their approach to outliers and what they reveal about system performance. 

P90 reveals the upper limit of response times for most requests -- the worst case a typical user is likely to encounter -- with the slowest 10% of requests excluded.

P99 showcases the tail end of response times, indicating that 99% of requests are processed faster than this threshold, with only 1% being slower. These are the rare but impactful instances where response times deteriorate significantly. 

P99 latency formula

To find the P99 latency, you sort all recorded response times and take the value that 99% of requests are faster than. There's no single closed-form equation; the calculation consists of ordering the latencies of all requests from smallest to largest and reading off the value at the 99th-percentile position.

Here's a simple step-by-step way to calculate P99 latency:

  1. Measure the response time of every request your system or application handles -- from the moment the request is received to the moment the response is fully sent. Call the total number of requests N.

  2. Arrange all recorded response times in order from fastest to slowest.

  3. Calculate the position (P) in your sorted list that represents the 99th percentile using the following calculation:

P = (99/100) x N, where N is the total number of requests

If P isn't a whole number, round it up to the next whole number (the nearest-rank method) so it points at an actual entry in the sorted list.

Consider a list of N = 1,000 requests. Plugging into the formula, (99/100) x 1000 = 990, so the latency value at the 990th position in the sorted list is your P99 latency.
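Here's a minimal sketch of that procedure in Python. The latency values are synthetic stand-ins for real measurements, and the rounding-up step matches the nearest-rank method described above:

```python
import math
import random

# Synthetic response times in milliseconds -- stand-ins for real measurements:
# ~1,000 "normal" requests plus a handful of slow outliers.
random.seed(42)
latencies = [random.gauss(120, 30) for _ in range(1000)]
latencies += [random.gauss(800, 200) for _ in range(10)]

def percentile(samples, p):
    """Nearest-rank percentile: sort, compute P = (p/100) * N, round up."""
    ordered = sorted(samples)
    position = math.ceil((p / 100) * len(ordered))
    return ordered[position - 1]  # 1-based position -> 0-based index

print(f"P50: {percentile(latencies, 50):.1f} ms")
print(f"P90: {percentile(latencies, 90):.1f} ms")
print(f"P99: {percentile(latencies, 99):.1f} ms")
```

In production you'd feed in real request timings, and libraries such as NumPy (numpy.percentile) offer interpolating variants of the same calculation.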

Why do you need P99 latency?

Monitoring and optimizing P99 latency is about much more than just tracking a performance metric. It's about making a concerted effort to ensure that your application performs well on average and delivers consistently excellent experiences to almost all users in all situations. 

By improving P99 latency, you can align technical strategies with broader business goals, creating applications that are not just quick but reliably so. 

  • P99 latency can help you identify the less apparent bottlenecks in your system, whether they're in your database, network, or code. For example, if database queries are the bottleneck, P99 latency will spike during complex query execution or when accessing large datasets.

  • Aiming for a low P99 latency means you're committed to providing a fast and stable experience for practically everyone who uses your application. This can help retain users and attract new ones.

  • Optimizing your application to improve P99 latency can lead to more predictable server loads, improved performance, and potentially lower cloud costs by avoiding the need for infrastructure upgrades.

  • P99 latency is an excellent benchmark for testing your application's ability to handle sudden traffic spikes and other stress conditions.

  • For services bound by Service Level Agreements that specify response times, keeping P99 latency low is key for meeting these commitments and maintaining customer trust and brand integrity.

4 tips to improve P99 latency

1. Optimize network performance

Network performance optimization involves measures to reduce the latency inherent in data transmission over a network. This includes optimizing route paths, reducing packet loss, and managing congestion.

  • Implement a CDN (Content Delivery Network) -- CDNs are designed to minimize latency by caching content in multiple locations closer to your users. By deploying a CDN with global coverage, you effectively shorten the distance data needs to travel, thereby reducing delay.

  • Optimize TCP settings -- Adjust TCP window sizes and enable TCP Fast Open to reduce the number of round trips required to establish a connection and start transferring data (a minimal socket-level sketch follows this list). These optimizations may require consultation with network engineers to ensure they don't negatively impact other aspects of network performance.

  • Use Anycast routing -- Anycast routing is a network addressing and routing methodology that allows incoming requests to be directed to the nearest (in terms of routing distance) data center or server.
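To make the TCP Fast Open bullet concrete, here's a minimal Python sketch for a server socket. This is Linux-only, the port number is an arbitrary example, and the option only takes effect if the kernel permits it (e.g. sysctl net.ipv4.tcp_fastopen=3):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("0.0.0.0", 8080))  # port 8080 is an arbitrary example

# TCP_FASTOPEN is only exposed where the platform defines it (Linux).
# The option value is the maximum queue of pending Fast Open connections.
if hasattr(socket, "TCP_FASTOPEN"):
    server.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)

server.listen(128)
print("Listening on :8080 (TCP Fast Open requested where supported)")
```

Clients must also opt in (for example, curl's --tcp-fastopen flag), which is why these changes are best validated end to end with your network engineers.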

2. Leverage Control Plane for latency improvement

Control Plane is designed to help you manage and deploy containerized applications. It offers significant latency improvements by fine-tuning resource allocation and enabling automated scaling.

  • Automated Scalability -- Control Plane's serverless architecture and "Capacity AI" technology automatically scale resources to match demand without over-provisioning, directly reducing latency by preventing bottlenecks.

  • Global Virtual Clouds™ -- When using Control Plane, engineers can create an unlimited number of Global Virtual Clouds™ (GVC™). A GVC™ is simply a named collection of locations within one or more cloud regions -- or even on-prem environments -- and a single GVC™ can easily span multiple clouds. These locations can include regions from AWS, GCP, and Azure; secondary cloud providers such as Hetzner, Oracle, and Linode; and on-premises bare-metal machines or virtual machines (VMs).

  • Intelligent Workload Distribution -- Through geo-routing, Control Plane routes requests to the nearest healthy cluster or location. This lowers latency (to single-digit milliseconds) by providing data access from the closest point.

  • Universal Cloud Identity™ -- This unique technology allows workloads to leverage services across any cloud provider, enabling optimal placement and configuration of microservices to minimize latency, regardless of the underlying infrastructure.

3. Improve server and application performance

This stage involves optimizing both the hardware and software components of your server and applications to process requests more efficiently, thereby reducing response times.

  • Profile your application -- Use profiling tools to identify and optimize high-latency operations within your application, such as inefficient database queries or slow internal APIs.

  • Upgrade hardware -- Consider upgrading server components (such as SSDs for faster data access) or moving to more powerful compute instances if hardware limits are a bottleneck.

  • Strategic caching -- Use intelligent caching strategies to store frequently accessed data in memory, reducing the need to perform expensive operations like database reads or complex calculations on every request (a minimal sketch follows this list).
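To illustrate the caching bullet, here's a minimal in-memory TTL cache in Python. The load_user_profile function and the 30-second TTL are invented for the example; production systems more commonly use a shared cache such as Redis or Memcached:

```python
import time
from functools import wraps

def ttl_cache(ttl_seconds=60):
    """Minimal in-memory TTL cache decorator (illustrative; not thread-safe)."""
    def decorator(func):
        store = {}  # maps args -> (expiry timestamp, cached value)

        @wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            hit = store.get(args)
            if hit is not None and hit[0] > now:
                return hit[1]  # fresh hit: skip the expensive call entirely
            value = func(*args)  # miss or expired: do the slow work
            store[args] = (now + ttl_seconds, value)
            return value

        return wrapper
    return decorator

@ttl_cache(ttl_seconds=30)
def load_user_profile(user_id):
    time.sleep(0.2)  # stand-in for an expensive database read
    return {"id": user_id, "name": f"user-{user_id}"}

load_user_profile(42)  # slow: does the simulated database read
load_user_profile(42)  # fast: served from memory for the next 30 seconds
```

Cache hits like the second call are exactly what pull the slow tail of your latency distribution -- and with it, your P99 -- down.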

4. Reduce data transfer volumes

Minimizing the amount of data that must be transferred between the client and server can directly impact latency, as less data means faster transmission times.

  • Compress data -- Implement data compression (such as GZIP) for both static and dynamic content to reduce the size of the data being transferred (see the sketch after this list).

  • Optimize API calls -- Design your APIs to allow for batch requests or to support fetching only the necessary data, minimizing the overhead per request.

  • Use efficient data formats -- When designing APIs or transferring data between services, choose lightweight and easy-to-parse data formats (like JSON over XML).
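Here's a minimal sketch of the compression point using only Python's standard library; the payload is fabricated sample data. In a real service, the web server or framework typically applies this via Content-Encoding negotiation rather than hand-rolled application code:

```python
import gzip
import json

# Fabricated API response: 500 small, repetitive records (the kind of
# payload GZIP compresses especially well).
payload = json.dumps(
    [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(500)]
).encode("utf-8")

compressed = gzip.compress(payload)
print(f"raw: {len(payload)} bytes, gzipped: {len(compressed)} bytes "
      f"({100 * len(compressed) / len(payload):.0f}% of original)")

# Round-trip check: decompression restores the exact original bytes.
assert gzip.decompress(compressed) == payload
```

Fewer bytes on the wire means fewer packets and fewer round trips, which is exactly where tail latency tends to accumulate.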

Speed as a Service -- Improving Latency with Control Plane 

Milliseconds make the difference. A responsive, lag-free experience can dramatically increase user engagement, lower frustration, and set your services apart. Tackling P99 latency is part of your commitment to excellence, aiming for nearly all your user interactions to be as snappy as possible. We've unpacked the strategies and insights needed to get your application's responsiveness exactly where you want it. 

Control Plane is a powerful resource, providing the advanced tools and support needed to manage and deploy backend applications and microservices with reduced latency and incredible availability. 

Explore Control Plane to learn more!
