I was reading only last week the notes from Wikipedia: Site Internals, Configuration and Code Examples, and Management Issues Tutorial by Domas Mituzas at the recent 2007 MySQL Conference. I didn’t attend this session, like a lot of sessions too much good stuff at the same time.
It’s obviously taken a while to catch up on my reading, but with the present MySQL 12 days of Scale-Out I thought I’ll complete my notes for all to see.
If you have never used Wikipedia well, why are you reading this, you should spend an hour there now. Alexa places Wikipedia in one of the top 10 visited sites on the Internet.
Wikipedia runs on the LAMP stack, powered by the MySQL database. Nothing new here, but how Wikipedia scales is. Some of the interesting points involved how a “Content Delivery Network” was build with components including Squid, Lighttpd, Memcached, LVS to improve caching. Appropriate caching is an important component to a successful scale-out infrastructure. An interesting quote however:
The common mistake is to believe that database is too slow and everything in it has to becached somewhere else. In scaled out environments reads are very efficient, and difference of time between efficient MySQL query and memcached request is negligible – both may execute in less than 1ms usually).
Sometimes memcached request may be more expensive than database query simply because it has to establish connection to a server, whereas database connection is usually
open.
Wikipedia has a developed an application Load Balancer. This offers a flexibility in efficient database use and is critical to any scale-out infrastructure. Combined with a good Database API and items such as the Pager class, allows you to write efficient index-based offsets pager (instead of ugly LIMIT 50 OFFSET 10000) syntax for example.
The main ideology in operating database servers is RAIS: – Redundant Array of Inexpensive/Independent/Instable[sic] Servers
- RAID0. Seems to provide additional performance/space. Having half of that capacity with an idea that a failure will degrade performance even more doesn’t sound like an efficient idea. Also, we do notice disk problems earlier. This disk configuration should be probably called AID, instead of RAID0.
- innodb_flush_log_at_trx_commit=0, tempted to do innodb_flush_method=nosync. If a server crashes for any reason, there’s not much need to look at its data consistency. Usually that server will need hardware repairs anyway. InnoDB transaction recovery would take half an hour or more. Failing over to another server will usually be much faster. In case of master failure, last properly replicated event is last event for whole environment. No ‘last transaction contains millions’ worries makes the management of such environment much easier – an idea often forgotten by web applications people.
The thing I found interesting in the RAIS mysql-node configuration was slave-skip-errors=0,1213,1158,1053,1007,1062
However, the greatest tip is “All database interaction is optimized around MySQL’s methods of reading the data.” This includes:
- Every query must have appropriate index for reads…
- Every query result should be sorted by index, not by filesorts. This means strict and predictable path of data access…
- Some fat-big-tables have extended covering indexing just on particular nodes…
- Queries prone to hit multiversioning troubles have to be rewritten accordingly…
Wikipedia is also clever in it’s sharding. A means to implement vertical and horizontal partition of data via the application for optimal scale-out. This comes down to designing your application correct from the start. Wikipedia considers it’s partition via:
- data segments
- tasks
- time
HiveDB (not used by wikipedia) is open source framework for horizontally partitioning MySQL systems. Well worth reviewing.
Wikipedia also makes use of compression. This works when your data can be compressed well like text. This improves performance, however analysis on other projects have shown this does place a CPU impact on the server so it is important to monitor and use appropriately.
Another clever approach is to move searching to tools more appropriate for this task, in this case Lucene. As with any scale-out it is important to leverage the power of appropriate tools for maximum benefit.
I have only summarized Domas’ notes. It’s well worth a detailed read.