What's your disk I/O throughput?

MySQL uses disk. OK, so everybody knew that. MySQL uses disk in two primary ways:

  • Random I/O (Reading & Writing Data/Index blocks)
  • Sequential I/O (Binary Log, InnoDB Redo Log)

Historically it has been best practice to separate these onto different spindles, and to put the OS and tmp space on a third spindle. With commodity hardware that is easily done, but today a lot of people use a SAN. Is this a good thing for a MySQL database?
That's a topic for much discussion at a later time, however I'll add two points. A lot of SAN configurations are RAID 5, and RAID 10 is a better choice because it removes the requirement to calculate parity. Second, last week I observed a RAID disk failure and it took an incredibly long time for the disk to be rebuilt. Just how many SAN users out there have actually timed a disk rebuild on a loaded system and seen the impact on the system in general?
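
If you do go the separate-spindle route, a minimal my.cnf sketch along these lines is one way to lay it out (the /data, /logs, /binlogs and /tmp mount points are purely illustrative, not a recommendation):

# Illustrative layout only: random I/O on one filesystem, sequential I/O on others
[mysqld]
datadir                   = /data/mysql         # data & index blocks (random I/O)
innodb_data_home_dir      = /data/mysql
innodb_log_group_home_dir = /logs/mysql         # InnoDB redo logs (sequential I/O)
log-bin                   = /binlogs/mysql-bin  # binary log (sequential I/O)
tmpdir                    = /tmp                # tmp space on a third spindle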

Back on topic, I don't have access to any variety of hardware, so this is where the community can help. If you can spare 5 minutes and have some free disk space (< 5GB), here is an exercise.

Commands

$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
$ ls -lh testfile.32k
$ rm testfile.32k

You should see something like this (FYI: from a 5400rpm laptop drive):


$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 160.172 seconds, 20.5 MB/s
real 2m40.342s
user 0m0.120s
sys 0m15.277s
$ ls -lh testfile.32k
-rw-r--r-- 1 usr grp 3.1G 2007-06-28 10:02 testfile.32k

If your dd output doesn't provide the MB/s figure (Solaris, for example), you could also run:

$ bc
3.2*1024/160
20
^D

NOTE: Replace 160 with the number of seconds from the real time (e.g. 2*60+40)
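
Or, if you'd rather skip the manual arithmetic entirely, a small shell sketch like this (same file name and block size as above) does the timing and the division for you:

#!/bin/sh
# Sketch only: same dd test as above, with MB/s worked out automatically
START=$(date +%s)      # note: date +%s may not exist on older Solaris date
dd if=/dev/zero count=100000 of=testfile.32k bs=32k
END=$(date +%s)
ELAPSED=$((END - START))
# 100000 x 32k = 3276800000 bytes = 3125 MB
echo "Approximate throughput: $((3125 / ELAPSED)) MB/s"
ls -lh testfile.32k
rm testfile.32k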

Of course I'm not collecting a lot of detail, such as comparing different block sizes, or looking at iostat for existing load versus the load this test introduces. I thought I'd ask an easy question to see what type of response and output I'd get.
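
If you do want to watch existing versus introduced load while the test runs, something as simple as this in a second terminal is enough (the 5-second interval is just a habit of mine):

$ iostat -x 5    # extended per-device statistics every 5 seconds
$ vmstat 5       # overall CPU, swap and I/O wait every 5 seconds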

If you want to add any additional information, such as drive type and speed (e.g. SATA 5400rpm), RAID configuration or SAN configuration (e.g. RAID level and connection type), or OS and file system type, that would be great, but it's completely optional.

If you would like to add your results anonymously, please email me directly.

Thanks in advance.

Comments

  1. says

    Here are my results with two different fiber attached arrays (both Linux 2.6.9-42.0.3.ELsmp):

    Fiber-attached array (RAID 5), ext3 filesystem:

    $ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out

    real 0m20.191s
    user 0m0.155s
    sys 0m16.379s
    $ ls -lh testfile.32k
    -rw-r--r-- 1 root root 3.1G Jun 28 11:05 testfile.32k

    Fiber-attached array (RAID 5), ext3 filesystem over LVM over DRBD:

    $ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out

    real 0m30.538s
    user 0m0.134s
    sys 0m19.267s
    $ ls -lh testfile.32k
    -rw-r--r-- 1 root root 3.1G Jun 28 11:16 testfile.32k

  2. says

    Putting the WAL on one disk and the InnoDB tables on another won't actually help. It's better to RAID 0 them (or partition the tables).

    InnoDB can't write to the WAL and the data files at the same time, so you won't be performing concurrent work.

    The binary log (and slow query log) are another story though.

  3. says

    That just measures filesystem append speed.

    The underlying engines tend to prefer preallocating the file space and then doing seeks/writes inside the file, since that's usually much faster than filesystem append.

    A set of tools that measure random seek-read and random seek-write numbers would be illuminating. However, they can't just be easily whipped off on the UNIX command line… :)
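
    A sysbench fileio run along these lines is one way to get those numbers (the file size, run time and thread count here are arbitrary choices):

    $ sysbench --test=fileio --file-total-size=4G prepare
    $ sysbench --test=fileio --file-total-size=4G --file-test-mode=rndrw \
        --max-time=60 --max-requests=0 --num-threads=4 run
    $ sysbench --test=fileio --file-total-size=4G cleanup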

  4. Patrick Domack says

    I agree with Mark Atwood, I normally use several runs of bonnie, and a few other testers to check. Things other than the drives and RAID config affect the performance a lot too, like the amount of RAM on the RAID card and write caching. I prefer to stay away from RAID 5 because of the slowness with a drive failure. I still wonder about the NetApp RAID-DP setup.

    This is a 6-drive U160 15krpm RAID 10:

    3276800000 bytes transferred in 77.806677 secs (42114638 bytes/sec)
    real 1m17.834s
    user 0m0.206s
    sys 0m23.531s
    -rw-r--r-- 1 root wheel 3.1G Jun 28 22:27 testfile.32k

  5. says

    (all on XFS… on my /home… so a used file system)

    time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out
    3276800000 bytes (3.3 GB) copied, 122.893 seconds, 26.7 MB/s

    real 2m3.181s
    user 0m0.068s
    sys 0m9.637s

    yes, my laptop drive is teh roxor!

    although it seemed tricky to repeat these numbers…

    Although… 1MB blocksize and O_DIRECT:
    3145728000 bytes (3.1 GB) copied, 119.849 seconds, 26.2 MB/s

    real 1m59.871s
    user 0m0.004s
    sys 0m1.568s
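
    For reference, a GNU dd invocation along these lines would match that byte total (the exact command used above isn't shown; oflag=direct is how GNU dd requests O_DIRECT):

    $ dd if=/dev/zero of=testfile.1m bs=1M count=3000 oflag=direct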

    What’s more interesting is the rates at different parts of the disk… there is a difference.

  6. says

    Most good SAN configurations use RAID6 today (like NetApp, which I’ll be testing in a few days).

    In the meantime, this is what we use (all these numbers are in production):

    6 drive U300 SCSI 15k on LVS aggregate stripe size=1MB, XFS

    time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out
    3276800000 bytes (3.3 GB) copied, 31.4418 seconds, 104 MB/s

    real 0m31.449s
    user 0m0.024s
    sys 0m8.309s

    2 drive SAS 10k on HW RAID 0, XFS:

    time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out
    3276800000 bytes (3.3 GB) copied, 47.756 seconds, 68.6 MB/s

    real 0m47.795s
    user 0m0.040s
    sys 0m7.370s

    I expect better numbers from the FC SAN.

  7. says

    Software RAID 10 with 32k chunks over 6 S-ATA 7200rpm drives attached to onboard S-ATA connectors. Hardened Gentoo Linux 2.6.20 with the XFS filesystem, mounted with the noatime,logbufs=8 options.

    time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out
    3276800000 bytes (3.3 GB) copied, 18.2788 s, 179 MB/s

    real 0m18.283s
    user 0m0.192s
    sys 0m12.301s
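
    For anyone wanting to reproduce a similar layout, the array and filesystem described above can be built roughly like this (device names and mount point are placeholders):

    $ mdadm --create /dev/md0 --level=10 --chunk=32 --raid-devices=6 /dev/sd[b-g]
    $ mkfs.xfs /dev/md0
    $ mount -o noatime,logbufs=8 /dev/md0 /data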

  8. Aaron Conaway says

    Looks like we're pushing about 269MB/s on an HP XP128 across a stack of HP SAN switches. The XP128 is set up with a whole mess of RAID 5 volumes presented to our Linux boxes, which use LVM to manage the volumes. We always have multipathing involved on essential systems (read: MySQL boxes) in an active/passive configuration.

    If you do the math, though, we may actually be running into a bottleneck on the inter-switch trunks. Our trunks are running at 2Gbps.

    269MB/s * 8bits/Byte = 2.152Gbps

    -----
    $ sudo time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
    100000+0 records in
    100000+0 records out
    0.06user 8.75system 0:12.17elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+180minor)pagefaults 0swaps
    $ ll test*
    -rw-r--r-- 1 root root 3276800000 Aug 9 10:06 testfile.32k
    $ bc
    bc 1.06
    Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
    This is free software with ABSOLUTELY NO WARRANTY.
    For details type `warranty'.
    3.2*1024/12.17
    269