MySQL uses disk. Ok, so everybody knew that. MySQL uses disk in two primary ways.
- Random I/O (Reading & Writing Data/Index blocks)
- Sequential I/O (Binary Log, InnoDB Redo Log)
Historically it has been best practice to separate these onto different spindles, and also to put the OS and tmp space on a third spindle. With commodity H/W that can easily be done, but today a lot of people use a SAN. Is this a good thing for a MySQL database?
That's a topic for much discussion at a later time; however, I'll add two points. A lot of SAN configurations are RAID 5, and RAID 10 is a better choice because it removes the requirement to calculate parity. Second, last week I observed a RAID disk failure and it took an incredibly long time for the disk to be rebuilt. Just how many SAN users out there have actually timed a disk rebuild on a loaded system and seen the impact on the system in general?
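As an aside, the spindle separation described above comes down to a handful of my.cnf settings. A minimal sketch (the /db, /binlog and /dbtmp paths are purely illustrative):
[mysqld]
# random I/O: data and index blocks on their own spindle
datadir                   = /db/mysql
# sequential I/O: binary log and InnoDB redo log on a second spindle
log-bin                   = /binlog/mysql-bin
innodb_log_group_home_dir = /binlog
# OS and tmp space on a third spindle
tmpdir                    = /dbtmp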
Back on topic, I don't have access to a wide variety of hardware, so this is where the community can help. For those who can spare 5 minutes and have some free disk space (< 5GB), here is an exercise.
Commands
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
$ ls -lh testfile.32k
$ rm testfile.32k
You should see something like this (FYI: from a 5400rpm laptop drive):
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 160.172 seconds, 20.5 MB/s
real 2m40.342s
user 0m0.120s
sys 0m15.277s
$ ls -lh testfile.32k
-rw-r--r-- 1 usr grp 3.1G 2007-06-28 10:02 testfile.32k
If your output doesn't provide the dd MB/s figure (like Solaris, for example), could you also run:
$ bc
3.2*1024/160
20
^D
NOTE: Replace 160 with the number of seconds from the real time (e.g. 2*60+40)
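Or do the whole conversion in one line (again assuming bc is available; plug in your own minutes and seconds):
$ echo "scale=1; 3.2*1024/(2*60+40)" | bc
20.4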
Of course I'm not collecting a lot of detail, like comparing different block sizes, or looking at iostat for existing and introduced load. I thought I'd ask an easy question to see what type of response and output I'd get.
If you want to add any additional information such as Drive Types & Speeds (e.g. SATA 5400rpm), RAID configuration or SAN configuration (e.g. RAID and connection type), OS and File System type that would be great, but completely optional.
If you would like to add your results anonymously, please email me directly.
Thanks in advance.
Jason McKay says
Here are my results with two different fiber-attached arrays (both Linux 2.6.9-42.0.3.ELsmp):
Fiber-attached array (RAID 5), ext3 filesystem:
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
real 0m20.191s
user 0m0.155s
sys 0m16.379s
$ ls -lh testfile.32k
-rw-r--r-- 1 root root 3.1G Jun 28 11:05 testfile.32k
Fiber-attached array (RAID 5), ext3 filesystem over LVM over DRBD:
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
real 0m30.538s
user 0m0.134s
sys 0m19.267s
$ ls -lh testfile.32k
-rw-r--r-- 1 root root 3.1G Jun 28 11:16 testfile.32k
Kevin Burton says
Putting the WAL on one disk and the InnoDB data files on another won't actually help. It's better to RAID 0 them (or partition the tables).
InnoDB can't write to the WAL and the data files at the same time, so you won't be performing concurrent work.
The binary log (and slow query log) are another story though.
Mark Atwood says
That just measures filesystem append speed.
The underlying engines tend to prefer preallocating the file space and then doing seeks/writes inside the file, since that's usually much faster than a filesystem append.
A set of tools that measure random seek-read and random seek-write numbers would be illuminating. However, they can't just be easily whipped off on the UNIX command line…
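Something like sysbench's fileio test covers that ground properly, but a very rough approximation is possible at the command line: preallocate a file, then issue synchronous 16k writes at random offsets inside it. A bash sketch (assumes GNU dd; the file size and write count are only illustrative):
$ dd if=/dev/zero of=testfile.rnd bs=1M count=512    # preallocate a 512MB file
$ time for i in $(seq 1 1000); do
    # one 16k synchronous write at a random offset within the 512MB file
    dd if=/dev/zero of=testfile.rnd bs=16k count=1 \
       seek=$((RANDOM % 32768)) conv=notrunc oflag=sync 2>/dev/null
  done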
Patrick Domack says
I agree with Mark Atwood. I normally use several runs of bonnie, and a few other testers, to check. Things other than the drives and RAID config affect the performance a lot too, like the amount of RAM on the RAID card and write caching. I prefer to stay away from RAID 5 because of the slowness during a drive failure. I still wonder about the NetApp RAID-DP setup.
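For what it's worth, a typical bonnie++ run for this kind of comparison looks something like the following (a sketch only; the directory, size and user are illustrative, and the size should be at least twice RAM so the page cache doesn't mask the disks):
$ bonnie++ -d /data/bench -s 8192 -n 0 -u nobody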
And here are the dd numbers from a 6-drive U160 15krpm RAID 10:
3276800000 bytes transferred in 77.806677 secs (42114638 bytes/sec)
real 1m17.834s
user 0m0.206s
sys 0m23.531s
-rw-r--r-- 1 root wheel 3.1G Jun 28 22:27 testfile.32k
Stewart Smith says
(all on XFS… on my /home… so a used file system)
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 122.893 seconds, 26.7 MB/s
real 2m3.181s
user 0m0.068s
sys 0m9.637s
yes, my laptop drive is teh roxor!
although it seemed tricky to repeat these numbers…
Although… 1MB blocksize and O_DIRECT:
3145728000 bytes (3.1 GB) copied, 119.849 seconds, 26.2 MB/s
real 1m59.871s
user 0m0.004s
sys 0m1.568s
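For reference, that run corresponds to something like the following (the file name is arbitrary, count=3000 is inferred from the 3.1GB figure, and oflag=direct needs GNU dd):
$ dd if=/dev/zero of=testfile.1m bs=1M count=3000 oflag=direct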
What’s more interesting is the rates at different parts of the disk… there is a difference.
Guillaume Lefranc says
Most good SAN configurations use RAID6 today (like NetApp, which I’ll be testing in a few days).
In the meantime, this is what we use (all these numbers are in production):
6-drive U300 SCSI 15k on an LVS aggregate, stripe size=1MB, XFS:
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 31.4418 seconds, 104 MB/s
real 0m31.449s
user 0m0.024s
sys 0m8.309s
—
2-drive SAS 10k on HW RAID 0, XFS:
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 47.756 seconds, 68.6 MB/s
real 0m47.795s
user 0m0.040s
sys 0m7.370s
I expect better numbers from the FC SAN.
Jean-Pierre Schwickerath says
Software RAID 10 with 32k chunks over 6 S-ATA 7200rpm drives attached to the on-board S-ATA connectors. Hardened Gentoo Linux 2.6.20 with an XFS filesystem, mounted with the noatime,logbufs=8 options (a rough command sketch for this layout follows the numbers):
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 18.2788 s, 179 MB/s
real 0m18.283s
user 0m0.192s
sys 0m12.301s
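For anyone wanting to reproduce a similar layout, the commands are roughly as follows (a sketch only; device names and the mount point are illustrative, and mdadm's --chunk is given in KB):
$ mdadm --create /dev/md0 --level=10 --chunk=32 --raid-devices=6 /dev/sd[b-g]1
$ mkfs.xfs /dev/md0
$ mount -o noatime,logbufs=8 /dev/md0 /data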
Aaron Conaway says
Looks like we're pushing about 269MB/s on an HP XP128 across a stack of HP SAN switches. The XP128 is set up with a whole mess of RAID 5 volumes presented to our Linux boxes, which use LVM to manage the volumes. We always have multipathing involved in essential systems (read: MySQL boxes) in an active/passive configuration.
If you do the math, though, we may actually be running into a bottleneck on the inter-switch trunks. Our trunks are running at 2Gbps.
269MB/s * 8bits/Byte = 2.152Gbps
—
$ sudo time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
0.06user 8.75system 0:12.17elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+180minor)pagefaults 0swaps
$ ll test*
-rw-r--r-- 1 root root 3276800000 Aug 9 10:06 testfile.32k
$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty’.
3.2*1024/12.17
269