I had the opportunity to review a client’s production Drupal installation recently. This is a new site and traffic is just starting to pick up. Drupal is a popular LAMP stack open source CMS system using the MySQL Database.
Unfortunately I don’t always have the chance to focus on one product when consulting , sometimes the time can be minutes to a few hours. Some observations from looking at Drupal.
Disk footprint
Presently, volume and content is of a low volume, but expecting to ramp up. I do however find 90% of disk volume in one table called ‘watchdog';
+--------------+--------------+--------------+-------------+--------+ | table_schema | total_mb | data_mb | index_mb | tables | +--------------+--------------+--------------+-------------+--------+ | xxxxx | 812.95555878 | 745.34520721 | 67.61035156 | 191 | +--------------+--------------+--------------+-------------+--------+ +-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+ | table_name | engine | row_format | table_rows | avg_row_length | total_mb | data_mb | index_mb | +-------------------------------------------+--------+------------+------------+----------------+--------------+--------------+-------------+ | watchdog | MyISAM | Dynamic | 63058 | 210 | 636.42242813 | 607.72516251 | 28.69726563 | | cache_menu | MyISAM | Dynamic | 145 | 124892 | 25.33553696 | 25.32577133 | 0.00976563 | | search_index | MyISAM | Dynamic | 472087 | 36 | 23.40134048 | 16.30759048 | 7.09375000 | | comments | MyISAM | Dynamic | 98272 | 208 | 21.83272934 | 19.58272934 | 2.25000000 |
Investigating the content of the ‘watchdog’ table shows detailed logging. Drilling down just on the key ‘type’ records shows the following.
mysql> select message,count(*) from watchdog where type='page not found' group by message order by 2 desc limit 10; +--------------------------------------+----------+ | message | count(*) | +--------------------------------------+----------+ | content/images/loadingAnimation.gif | 17198 | | see/images/loadingAnimation.gif | 6659 | | images/loadingAnimation.gif | 6068 | | node/images/loadingAnimation.gif | 2774 | | favicon.ico | 1772 | | sites/all/modules/coppa/coppa.js | 564 | | users/images/loadingAnimation.gif | 365 | | syndicate/google-analytics.com/ga.js | 295 | | content/img_pos_funny_lowsrc.gif | 230 | | content/google-analytics.com/ga.js | 208 | +--------------------------------------+----------+ 10 rows in set (2.42 sec)
Some 25% of rows is just the reporting one missing file. Correcting this one file cuts down a pile of unnecessary logging.
Repeating Queries
Looking at just 1 random second of SQL logging shows 1200+ SELECT statements.
355 are SELECT changed FROM node
$ grep would_you_rather drupal.1second.log 7 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 5 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 3 Query SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1 4 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 6 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 10 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 9 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 8 Query SELECT changed FROM node WHERE type='would_you_rather' AND STATUS=1 ORDER BY created DESC LIMIT 1 9 Query SELECT field_image_textarea_value AS value FROM content_type_would_you_rather WHERE vid = 24303 LIMIT 0, 1
There is plenty of information regarding monitoring the Slow Queries in MySQL, but I have also promoted that’s it not the slow queries that ultimately slow a system down, but the 1000’s of repeating fast queries.
MySQL of course has the Query Cache to assist, but this is a course grade solution, and a high volume read/write environment this is meaningless.
There is a clear need for either a application level caching, or a database redesign to pull rather then poll this information, however without more in depth review of Drupal I can not make any judgment calls.