Everything fails, Monitor Everything

Ronald Bradford
May 19, 2007

From the recent [MySQL Conference][1] a number of things resonate strongly almost daily with me. These included:

Guy Kawasaki – Don’t let the bozos grind you down.
- Boy, the bozos have ground me down this week. I slept for 16 hrs today, the first day of solid rest in 3 weeks.
Paul Tuckfield – YouTube and his various caching tip insights.
- I’ve seen the promising results of Paul Tuckfield’s comment of pre-fetching for Replication written recently by Farhan .
Ramus – SSL is not secure — This still really scares me.
- How do I tell rather computer illiterate friends about running multiple browsers, clearing caches, never visiting SSL sites after other sites that are insecure etc.
Everything fails, Monitor Everything – Google

What I’ve been working on most briefly lately, and really want to be far more prepared everywhere I go is Monitor Everything.

It’s so easy on site to just do a vmstat 1 in one session and a mysqladmin -r -i extended-status | grep -v ” | 0 “ in another, and you may observe a trend, make some notes, say 25% CPU, 3000 Selects, 4000 Insert/Updates per second etc, but the problem is, the next day you don’t have actual figures to compare. What was the table_lock_waits yesterday, they seem high today.

I also only found a problem on a site when I graphed the results. I’ll give you a specific example. The average CPU for the system was 55%, the target was 50%. When graphing the CPU, it was plainly obvious something was not right. I could see with extremely regularity (and count 12 in one hour) a huge CPU spike for a second or two. It was so regular in the graph it was not possible it was random. So, after further investigation and testing, a 5 minute job on this production server (and not on previous testing servers) took 25% CPU for a second or two, and a huge amount of Page Faults. Did it effect the overall impact of the performance of the system. I don’t know, but it was a significant anonomoly that required investigation.

So, quite simply, always monitor and record so you can later reference, even if you don’t process the raw figures at the time. The question is then, “What do I monitor”. Answer, monitor everything.

The problem is with most monitoring, e.g. vmstat and mysqladmin is the lack of a timestamp for easy comparison. It’s really, really annoying that you can add this to the line output. The simple solution is to segment your data into both manageable chunks and consistent chunks.

Manageable chucks can be as easy as creating log files hourly, ensuring the start exactly at the top of the hour. Use a YYYYMMDD.HHMMSS naming standard for all files and you can never go wrong.
Consistent chunks is to ensure you start all manageable monitor (e.g. hourly) at the exact same time, so you can compare.

You need to monitor at least the following:

vmstat
mysqladmin extended-status
mysqladmin processlist
mysqladmin variables
mysqladmin -r -i [n] extended-status | grep -v ” | 0 “

I haven’t found an appropriate network monitoring, but you should also at that.

The issue here is frequency. Here are some guidelines. vmstat every 5 seconds. extended-status and processlist every 30 seconds, variables every hour, and extended-status differences is difficult, but it saves a lot of number crunching later for quick analysis. I do it every second, but not all the time, you need to work out a trigger to enable, or to say run it for 30 seconds every 15-30 minutes.

So in one hour I could have:

20070519.160000.os.vmstat.log
20070519.160000.mysql.variables.log
20070519.160000.mysql.status.log
20070519.160000.mysql.processlist.log
20070519.160500.mysql.status.increment.log, then 1610, 1620, 1630 etc

I have my own scripts for monitoring under development, and I’ve been revising slowly, particularly to be able to load data into a MySQL database so I can easily use SQL for analysis. One thing I actually do is parse files into CSV for easy loading.

There are two tools out there that I’m reviewing and you should look at. Mark Leith has written a Aggregated SHOW STATUS stat pack, and there is also tool called mysqlreport . These both go some what to ultimately what I want.

I haven’t used it yet, but I’ve seen and been very impressed with the simplicity of Munin for graphing. I really need to get some free time to get this operational.

So Monitor Everything and Graph Everything. Plenty of work to do here.

Tagged with: Databases MySQL MySQL Conference &Amp; Expo 2007

MySQL and Heatwave Summit Presentation

Ronald Bradford
April 30, 2025

Last week I had the opportunity to speak at the MySQL and Heatwave Summit in San Francisco. I discussed the impact of the new MySQL 8.0 default caching_sha2_password authentication, replacing the mysql_native_password authentication that was the default for approximately 20 of the 30 years that MySQL has existed.

Readyset QueryPilot Announcement

Ronald Bradford
April 22, 2025

At the MySQL and Heatwave Summit 2025 today, Readyset announced a new data systems architecture pattern named Readyset QueryPilot . This architecture which can front a MySQL or PostgreSQL database infrastructure, combines the enterprise-grade ProxySQL and Readyset caching with intelligent query monitoring and routing to help support applications scale and produce more predictable results with varied workloads.

More CPUs or Newer CPUs

Ronald Bradford
April 2, 2025

In a CPU-bound database workload, regardless of price, would you scale-up or scale-new? What if price was the driving factor, would you scale-up or scale-new? I am using as a baseline the first available AWS Graviton2 processor for RDS (r6g).

Everything fails, Monitor Everything

Related Posts

MySQL and Heatwave Summit Presentation

Readyset QueryPilot Announcement

More CPUs or Newer CPUs