MySQL uses disk. Ok, so everybody knew that. MySQL uses disk in two primary ways.
- Random I/O (Reading & Writing Data/Index blocks)
- Sequential I/O (Binary Log, InnoDB Redo Log)
Historically it has been best practice to separate these onto different spindles, and also to put the OS and tmp space on a third spindle. With commodity H/W that can easily be done, but today a lot of people use a SAN. Is this a good thing for a MySQL database?
That's a topic for much discussion at a later time; however, I'll add two points. A lot of SAN configurations are RAID 5, and RAID 10 is a better choice because it removes the requirement to calculate parity. Second, last week I observed a RAID disk failure and it took an incredibly long time for the disk to be rebuilt. Just how many SAN users out there have actually timed a disk rebuild on a loaded system and seen the impact on the system in general?
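As an aside, the spindle separation described above comes down to a handful of my.cnf settings. A minimal sketch (the /db, /binlog and /dbtmp paths are purely illustrative):
[mysqld]
# random I/O: data and index blocks on their own spindle
datadir                   = /db/mysql
# sequential I/O: binary log and InnoDB redo log on a second spindle
log-bin                   = /binlog/mysql-bin
innodb_log_group_home_dir = /binlog
# OS and tmp space on a third spindle
tmpdir                    = /dbtmp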
Back on topic, I don't have access to a wide variety of hardware, so this is where the community can help. For those who can spare 5 minutes and have some free disk space (< 5GB), here is an exercise.
Commands
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
$ ls -lh testfile.32k
$ rm testfile.32k
You should see something like this (FYI: from a 5400rpm laptop drive):
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 160.172 seconds, 20.5 MB/s
real 2m40.342s
user 0m0.120s
sys 0m15.277s
$ ls -lh testfile.32k
-rw-r--r-- 1 usr grp 3.1G 2007-06-28 10:02 testfile.32k
If your output doesn't provide the dd MB/s figure (like Solaris, for example), could you also run:
$ bc
3.2*1024/160
20
^D
NOTE: Replace 160 with the number of seconds from the real time (e.g. 2*60+40)
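Or do the whole conversion in one line (again assuming bc is available; plug in your own minutes and seconds):
$ echo "scale=1; 3.2*1024/(2*60+40)" | bc
20.4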
Of course I'm not collecting a lot of detail, like comparing different block sizes, or looking at iostat for existing and introduced load. I thought I'd ask an easy question to see what type of response and output I'd get.
If you want to add any additional information such as Drive Types & Speeds (e.g. SATA 5400rpm), RAID configuration or SAN configuration (e.g. RAID and connection type), OS and File System type that would be great, but completely optional.
If you would like to add your results anonymously, please email me directly.
Thanks in advance.
Jason McKay says
Here are my results with two different fiber-attached arrays (both Linux 2.6.9-42.0.3.ELsmp):
Fiber-attached array (RAID 5), ext3 filesystem:
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
real 0m20.191s
user 0m0.155s
sys 0m16.379s
$ ls -lh testfile.32k
-rw-r--r-- 1 root root 3.1G Jun 28 11:05 testfile.32k
Fiber-attached array (RAID 5), ext3 filesystem over LVM over DRBD:
$ time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
real 0m30.538s
user 0m0.134s
sys 0m19.267s
$ ls -lh testfile.32k
-rw-r--r-- 1 root root 3.1G Jun 28 11:16 testfile.32k
Kevin Burton says
Putting the WAL on one disk and the InnoDB data files on another won't actually help. It's better to RAID 0 them (or partition the tables).
InnoDB can't write to the WAL and the data files at the same time, so you won't be performing concurrent work.
The binary log (and slow query log) are another story though.
Mark Atwood says
That just measures filesystem append speed.
The underlying engines tend to prefer preallocating the file space and then doing seeks/writes inside the file, since that's usually much faster than a filesystem append.
A set of tools that measure random seek-read and random seek-write numbers would be illuminating. However, they can't just be easily whipped off on the UNIX command line…
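Something like sysbench's fileio test covers that ground properly, but a very rough approximation is possible at the command line: preallocate a file, then issue synchronous 16k writes at random offsets inside it. A bash sketch (assumes GNU dd; the file size and write count are only illustrative):
$ dd if=/dev/zero of=testfile.rnd bs=1M count=512    # preallocate a 512MB file
$ time for i in $(seq 1 1000); do
    # one 16k synchronous write at a random offset within the 512MB file
    dd if=/dev/zero of=testfile.rnd bs=16k count=1 \
       seek=$((RANDOM % 32768)) conv=notrunc oflag=sync 2>/dev/null
  done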
Patrick Domack says
I agree with Mark Atwood. I normally use several runs of bonnie, and a few other testers, to check. Things other than the drives and RAID config affect the performance a lot too, like the amount of RAM on the RAID card and write caching. I prefer to stay away from RAID 5 because of the slowness during a drive failure. I still wonder about the NetApp RAID-DP setup.
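For what it's worth, a typical bonnie++ run for this kind of comparison looks something like the following (a sketch only; the directory, size and user are illustrative, and the size should be at least twice RAM so the page cache doesn't mask the disks):
$ bonnie++ -d /data/bench -s 8192 -n 0 -u nobody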
And here are the dd numbers from a 6-drive U160 15krpm RAID 10:
3276800000 bytes transferred in 77.806677 secs (42114638 bytes/sec)
real 1m17.834s
user 0m0.206s
sys 0m23.531s
-rw-r--r-- 1 root wheel 3.1G Jun 28 22:27 testfile.32k
Stewart Smith says
(all on XFS… on my /home… so a used file system)
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 122.893 seconds, 26.7 MB/s
real 2m3.181s
user 0m0.068s
sys 0m9.637s
yes, my laptop drive is teh roxor!
although it seemed tricky to repeat these numbers…
Although… 1MB blocksize and O_DIRECT:
3145728000 bytes (3.1 GB) copied, 119.849 seconds, 26.2 MB/s
real 1m59.871s
user 0m0.004s
sys 0m1.568s
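For reference, that run corresponds to something like the following (the file name is arbitrary, count=3000 is inferred from the 3.1GB figure, and oflag=direct needs GNU dd):
$ dd if=/dev/zero of=testfile.1m bs=1M count=3000 oflag=direct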
What’s more interesting is the rates at different parts of the disk… there is a difference.
Guillaume Lefranc says
Most good SAN configurations use RAID6 today (like NetApp, which I’ll be testing in a few days).
In the meantime, this is what we use (all these numbers are in production):
6-drive U300 SCSI 15k on an LVS aggregate, stripe size=1MB, XFS:
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 31.4418 seconds, 104 MB/s
real 0m31.449s
user 0m0.024s
sys 0m8.309s
—
2-drive SAS 10k on HW RAID 0, XFS:
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 47.756 seconds, 68.6 MB/s
real 0m47.795s
user 0m0.040s
sys 0m7.370s
I expect better numbers from the FC SAN.
Jean-Pierre Schwickerath says
Software RAID 10 with 32k chunks over 6 S-ATA 7200rpm drives attached to the on-board S-ATA connectors. Hardened Gentoo Linux 2.6.20 with an XFS filesystem, mounted with the noatime,logbufs=8 options (a rough command sketch for this layout follows the numbers):
time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
3276800000 bytes (3.3 GB) copied, 18.2788 s, 179 MB/s
real 0m18.283s
user 0m0.192s
sys 0m12.301s
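For anyone wanting to reproduce a similar layout, the commands are roughly as follows (a sketch only; device names and the mount point are illustrative, and mdadm's --chunk is given in KB):
$ mdadm --create /dev/md0 --level=10 --chunk=32 --raid-devices=6 /dev/sd[b-g]1
$ mkfs.xfs /dev/md0
$ mount -o noatime,logbufs=8 /dev/md0 /data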
Aaron Conaway says
Looks like we're pushing about 269MB/s on an HP XP128 across a stack of HP SAN switches. The XP128 is set up with a whole mess of RAID 5 volumes presented to our Linux boxes, which use LVM to manage the volumes. We always have multipathing involved in essential systems (read: MySQL boxes) in an active/passive configuration.
If you do the math, though, we may actually be running into a bottleneck on the inter-switch trunks. Our trunks are running at 2Gbps.
269MB/s * 8bits/Byte = 2.152Gbps
—
$ sudo time dd if=/dev/zero count=100000 of=testfile.32k bs=32k
100000+0 records in
100000+0 records out
0.06user 8.75system 0:12.17elapsed 72%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+180minor)pagefaults 0swaps
$ ll test*
-rw-r--r-- 1 root root 3276800000 Aug 9 10:06 testfile.32k
$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty’.
3.2*1024/12.17
269