I had the opportunity to review a client’s production Drupal installation recently. This is a new site and traffic is just starting to pick up. Drupal is a popular LAMP stack open source CMS system using the MySQL Database.
Unfortunately I don’t always have the chance to focus on one product when consulting, sometimes the time can be minutes to a few hours. Some observations from looking at Drupal.
Disk footprint
Presently, volume and content is of a low volume, but expecting to ramp up. I do however find 90% of disk volume in one table called ‘watchdog’;
+--------------+--------------+--------------+-------------+--------+
| table_schema | total_mb | data_mb | index_mb | tables |
+--------------+--------------+--------------+-------------+--------+
| xxxxx | 812.95555878 | 745.34520721 | 67.61035156 | 191 |
+--------------+--------------+--------------+-------------+--------+
+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| table_name | engine | row_format | table_rows | avg_row_length | total_mb | data_mb | index_mb |
+-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+
| watchdog | MyISAM | Dynamic | 63058 | 210 | 636.42242813 | 607.72516251 | 28.69726563 |
| cache_menu | MyISAM | Dynamic | 145 | 124892 | 25.33553696 | 25.32577133 | 0.00976563 |
| search_index | MyISAM | Dynamic | 472087 | 36 | 23.40134048 | 16.30759048 | 7.09375000 |
| comments | MyISAM | Dynamic | 98272 | 208 | 21.83272934 | 19.58272934 | 2.25000000 |
Investigating the content of the ‘watchdog’ table shows detailed logging. Drilling down just on the key ‘type’ records shows the following.
mysql> select message,count(*) from watchdog where type='page not found' group by message order by 2 desc limit 10;
+--------------------------------------+----------+
| message | count(*) |
+--------------------------------------+----------+
| content/images/loadingAnimation.gif | 17198 |
| see/images/loadingAnimation.gif | 6659 |
| images/loadingAnimation.gif | 6068 |
| node/images/loadingAnimation.gif | 2774 |
| favicon.ico | 1772 |
| sites/all/modules/coppa/coppa.js | 564 |
| users/images/loadingAnimation.gif | 365 |
| syndicate/google-analytics.com/ga.js | 295 |
| content/img_pos_funny_lowsrc.gif | 230 |
| content/google-analytics.com/ga.js | 208 |
+--------------------------------------+----------+
10 rows in set (2.42 sec)
Some 25% of rows is just the reporting one missing file. Correcting this one file cuts down a pile of unnecessary logging.
Repeating Queries
Looking at just 1 random second of SQL logging shows 1200+ SELECT statements.
355 are SELECT changed FROM node
$ grep would_you_rather drupal.1second.log
7 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
5 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
3 Query SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1
4 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
6 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
10 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
9 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
8 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1
9 Query SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1
There is plenty of information regarding monitoring the Slow Queries in MySQL, but I have also promoted that’s it not the slow queries that ultimately slow a system down, but the 1000’s of repeating fast queries.
MySQL of course has the Query Cache to assist, but this is a course grade solution, and a high volume read/write environment this is meaningless.
There is a clear need for either a application level caching, or a database redesign to pull rather then poll this information, however without more in depth review of Drupal I can not make any judgment calls.