What alert monitoring do you use?

More importantly, how often do you confirm access to your server and database with that alert monitoring?

With a client yesterday, the primary database server was still usable and serving connections for a while, but was not accessible via SSH to investigate performance issues. It eventually became unresponsive and required a physical reboot. With alert monitoring recording system availability only every 5 minutes, the delay in detecting the problem was simply too long.

This led to a discussion with more questions than answers, including:

  • How often should you ping your server(s), both internally and externally?
  • How often do you connect physically to your server for confirmation, e.g. an SSH keyed authentication test?
  • How often do you perform a physical database connection test?
  • How often do you do an end-to-end test, including an HTTP request through to a database query?

As with all of these, you also want to time these operations for any deviations.
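As a rough illustration of timing such checks, here is a minimal Python sketch using only the standard library. It is not what the client used. The host name, the ports (22 for SSH, 3306 for MySQL), the baseline and the 3x threshold are all assumptions for the example. Note that a plain TCP connect only shows that the port accepts connections; it does not prove that an SSH login or a database query would succeed, so treat it as the cheapest layer of the checks listed above.

```python
import socket
import time


def timed_tcp_check(host, port, timeout=5.0):
    """Try a TCP connect to host:port and return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        ok = False
    return ok, time.monotonic() - start


def is_deviation(elapsed, baseline, factor=3.0):
    """Flag a check that took more than `factor` times the usual duration."""
    return elapsed > baseline * factor


if __name__ == "__main__":
    # Hypothetical host, ports and baseline; replace with your own values.
    host = "db1.example.com"
    baseline = 0.05  # seconds, assumed typical connect time
    for name, port in (("ssh", 22), ("mysql", 3306)):
        ok, elapsed = timed_tcp_check(host, port, timeout=2.0)
        slow = is_deviation(elapsed, baseline)
        print(f"{name}: ok={ok} elapsed={elapsed:.3f}s slow={slow}")
```

Run from cron or a loop at an interval shorter than 5 minutes, a check like this records both failures and slowdowns, which is the deviation you want to alert on before the server stops responding entirely.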

I’ve created a very simple MySQL Alert Monitoring survey. I would appreciate your input.

Comments

  1. Sayan Chaliha says

    Hi Ronald,

    I use monyog, a MySQL and Linux monitoring tool from Webyog (http://www.webyog.com/). Well, actually, I have been using monyog for a couple of years now, and I’m quite satisfied with the software.

    The best part of monyog is that it is agent-less. Having to install nothing additional on the server machine saves a lot of hassle.

    I’ve configured MONyog to check on my server every minute… and in case there’s any trouble, I get alerts in my mailbox. Monyog also monitors OS metrics, and the web-based interface provides information in a graphical format which is very easy to interpret. Monyog handles my SSH keyed authentication tests as well.

  2. says

    We have a massive array of monitoring, including checks for MySQL. Every MySQL server gets things like “is the process up”, “can I connect”, “how many connections are there?”, etc. monitored. The output of “SHOW GLOBAL STATUS” is monitored in some cases.

    Other tests for individual things which have been seen as problems exist:

    – Memory usage
    – Long-running transactions (seen in SHOW ENGINE INNODB STATUS)
    – Queries taking too long (although the slow log is generally enabled)
    – MyISAM tables getting nearly full

    We also monitor replication, to check that it’s up to date AND that the data are in sync (using mk-table-checksum). We check the schema are in sync too.

    This all feeds back into our Hobbit monitoring system, which sends alerts to an on-call engineer who is available 24/7.

    We run a lot of mysql servers (100s), in different roles in the application – not all of them require all the monitors above but they all get some monitoring.

    Additionally the web application (which connects to some of the databases) is monitored both by our Hobbit and several third party testing companies – these also alert the on-call engineer.

  3. Vishal Rathi says

    Ronald,

    I am a great fan of “MONyog” (an agent-less monitoring tool) and have been using it for a long time now.

    I feel it’s a great tool and has saved many sleepless nights of mine.

    Its “Error Log Monitoring” feature monitors the MySQL error logs and sends alerts by mail or SNMP traps for the errors that are logged.

    Cheers for MONyog…
