Monitoring Latency with Throughput

Higher throughput does not imply improved performance. This is a common misconception: when an application needs to support more users, you provide higher concurrency, and the results appear to show that the system can sustain higher throughput. If we look at the CloudWatch metrics for an RDS database, we can track DB Connections and DB Selects during a benchmark, but there is no easy way to track latency. We see that with more concurrent connections (red bar), we obtain higher select throughput (blue bar). This benchmark tests 2, 4, 8, 16, 32, 64, 96, 128, 160, and 192 threads.

RDS Throughput r6g.4xlarge - 192 threads

We get to a point where throughput maxes out, in this example due to a CPU bottleneck (green bar) that reaches 100% at 96 threads. Systems can also exhaust other resources, including disk IOPS or network throughput. The problem is that while adding more threads yields the same throughput, a more serious effect happens.

RDS Throughput r6g.4xlarge - CPU Threshold
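One way to locate this plateau in benchmark output is to find the smallest thread count whose throughput is already close to the peak. The following is a minimal sketch, not the method used for the charts above. The `saturation_point` helper, the 5% tolerance, and every throughput number in `HYPOTHETICAL_RESULTS` are assumptions invented for illustration; only the list of thread counts comes from the benchmark described here.

```python
from typing import Sequence, Tuple


def saturation_point(results: Sequence[Tuple[int, float]],
                     tolerance: float = 0.05) -> int:
    """Return the smallest thread count whose throughput is within
    `tolerance` (a fraction) of the peak throughput in `results`."""
    if not results:
        raise ValueError("results must not be empty")
    peak = max(qps for _, qps in results)
    for threads, qps in sorted(results):
        if qps >= peak * (1.0 - tolerance):
            return threads
    raise AssertionError("unreachable: the peak always satisfies the check")


# Made-up (threads, selects per second) pairs. These are NOT measurements;
# they were chosen so the plateau starts at 96 threads, as in the post.
HYPOTHETICAL_RESULTS = [
    (2, 1500.0), (4, 3000.0), (8, 6000.0), (16, 11500.0), (32, 21000.0),
    (64, 33000.0), (96, 40000.0), (128, 40500.0), (160, 40200.0),
    (192, 39800.0),
]

print("throughput saturates at", saturation_point(HYPOTHETICAL_RESULTS), "threads")
```

Any thread count beyond the value returned adds connections without adding throughput, which is the region where latency should be examined.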

What is missing from the data generally analyzed for a database via CloudWatch is the user experience, which you would find in an alternative monitoring or observability dashboard. User experience is the sum of many things, including the latency to execute SQL queries. If we evaluate just the SQL (something that EvalQL does), holding throughput equal while adding threads has a huge impact on the user experience. If you double the threads (96 to 192), the throughput is the same, but the average latency is 2x slower and the 95th percentile is 3x slower. The primary measure for my benchmarking strategy is latency, and the goal is to identify and better manage database concurrency.

RDS Throughput Latency Increase
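The 2x increase in average latency is what Little's Law predicts for a closed-loop benchmark: if each of N threads issues queries back to back with no think time, then N = throughput × mean latency, so at a fixed throughput the mean latency grows in proportion to the thread count. The sketch below assumes that closed-loop model. The `mean_latency_ms` helper and the `PLATEAU_QPS` value are hypothetical, and the throughput figure is not taken from the benchmark. Little's Law says nothing about percentiles, so the 3x change in the 95th percentile has to be measured.

```python
def mean_latency_ms(threads: int, throughput_qps: float) -> float:
    """Little's Law for a closed loop with no think time: R = N / X.

    threads        -- concurrent client threads (N)
    throughput_qps -- completed queries per second (X)
    Returns the mean response time (R) in milliseconds.
    """
    if throughput_qps <= 0:
        raise ValueError("throughput_qps must be positive")
    return threads / throughput_qps * 1000.0


# Hypothetical plateau throughput, chosen only for illustration.
PLATEAU_QPS = 40_000.0

for threads in (96, 192):
    latency = mean_latency_ms(threads, PLATEAU_QPS)
    print(f"{threads:>3} threads -> {latency:.2f} ms mean latency")

# Doubling the threads at constant throughput doubles the mean latency.
ratio = mean_latency_ms(192, PLATEAU_QPS) / mean_latency_ms(96, PLATEAU_QPS)
print(f"latency ratio: {ratio:.1f}x")
```

The ratio does not depend on the throughput value chosen, only on the thread counts, which is why extra concurrency past the saturation point translates directly into slower queries.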

It is widely accepted that user retention on a website in a competitive market segment depends on response time, and thus latency is critical. Just today I was on a website creating a personalized product to purchase, and the site’s slow performance led me to run another search and buy from a competitor instead.

Performance is about many different and overlapping measurements, with latency being a critical one. Web performance expert Sergey Chernyshev provides detailed insight into the impact of latency on website conversions with his UX Speed Calculator. The site has a detailed matrix of variables to tune, with 10 separate measurement outcomes to consider.

I think Sunny Bains at PingCAP summed it up succinctly when saying “…experience up to a 40% increase in query latency without proper scaling solutions.”

TiDB Scaling

Tagged with: Benchmarking Latency Throughput
