This post is an extension of the performance section of: http://www.emc.com/collateral/hardware/white-papers/h8129-performance-scaling-sql-wp.pdf. Writing a white paper requires some brevity lest it turn into unreadable minutia. Thank goodness for blogs.
Scalable shared database presents a really interesting lesson in I/O performance. As performance demands increase, the DB architect has numerous tools available in the performance toolkit. Humans like to file things into boxes, and database workloads are no exception. We like to separate database systems into one of two categories:
- OLTP (online transaction processing).
- Data warehouse.
An OLTP workload is understood to create a random I/O pattern, while data warehouse workloads are understood to be sequential. Life has a way of crushing preconceived notions. Allowing a group of analysts to query an OLTP server can turn a normally random workload into a combined random and sequential workload. This increase in workload, especially workload of a different type, can overwhelm the basic OLTP functions of the database.
The most basic way to ensure that the OLTP system can continue to function as it was originally designed is to split the reporting functions onto a separate, dedicated reporting server. Over the years I often used log shipping to keep this reporting server in sync with the master OLTP database server. Unfortunately, this required us to take the reporting server offline whenever updates were applied.
The analysts do not like their reports to go offline. Using scalable shared database, we can use the replication functions of the storage array to keep reporting servers in sync, and load balancing technology to obscure refresh operations from the reporting system users.
Note: When speaking of scalable shared database performance requirements, remember that by allowing the use of a single transaction server and many reporting servers, we are drastically increasing the performance load placed on the storage array.
As described in my earlier post, http://sqlvelocity.typepad.com/blog/2010/08/windows-performance-counters-for-windows.html, the servers and storage array form a closed performance system with measurable performance characteristics. Little's law gives us a formula to predict the performance of a closed system. A single database server can exert three different types of I/O patterns:
- Random.
- Sequential.
- Large block random (I have also called this excessive head seeking).
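Little's law relates the three quantities in a closed system: the number of outstanding requests equals throughput times response time (N = X × R). A minimal sketch of how this predicts disk throughput (the function name and numbers are illustrative, not from the paper):

```python
def littles_law_iops(outstanding_ios, response_time_ms):
    """Little's law, N = X * R, rearranged as X = N / R.

    Given a fixed number of outstanding I/O requests and an average
    response time, predict the throughput of the closed system in IOPS.
    """
    return outstanding_ios / (response_time_ms / 1000.0)

# A single outstanding I/O completing in 10 ms sustains 100 IOPS;
# keeping 4 I/O's in flight at the same latency sustains 400 IOPS.
print(littles_law_iops(1, 10.0))  # 100.0
print(littles_law_iops(4, 10.0))  # 400.0
```

This is why, later in the post, issuing more concurrent I/O to each disk raises throughput: the extra outstanding requests multiply directly into the IOPS prediction, at the cost of higher latency per request.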
The majority of deployed storage uses spinning disks. This presents an interesting problem. Spinning disks react differently to each of the major performance regimens. Understanding this performance is the key to designing a storage system that will achieve given performance requirements.
Modeling Disk Performance
To calculate random performance, start by dividing the rotational speed of the drive by 60 seconds. A 10,000 RPM drive rotates roughly 166 times every second; a 15,000 RPM drive rotates 250 times a second. Because the drive head takes some time to move between disk tracks, we should also account for the time this movement (or seek) takes. Luckily this can be tested, and both the average and maximum seek times are provided by drive manufacturers.
Modeling disk performance becomes viable with the proper algorithm. The following example will help to calculate the random read performance of a spinning hard disk.
- Start by looking up the average seek time of a given disk drive model.
- Calculate the maximum rotational latency (MRS) by dividing 60,000 (milliseconds per minute) by the drive's rotational speed (revolutions per minute): 60,000/RPM.
- To calculate the number of I/O's per second (IOPS), divide 1000 by the sum of the drive's average read seek time (ARST) and half the maximum rotational latency: 1000/(ARST + (MRS/2)).
For example, a 10K RPM drive with a 7 millisecond seek time will produce 100 I/O's per second:
- 1000/(7 + (6/2)) = 100 IOPS
A 15K drive with a 4 ms seek time will produce 166 I/O's per second:
- 1000/(4 + (4/2)) = 166 IOPS
Finally, a 7200 RPM 146GB SATA drive with an 11 ms seek time will produce 66 I/O's per second:
- 1000/(11 + (8.3/2)) = 66 IOPS
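The three steps above reduce to a one-line model. A minimal sketch in Python (the function name is mine, not from the paper):

```python
def random_read_iops(rpm, avg_seek_ms):
    """Model random read IOPS for a spinning disk:
    1000 / (average read seek time + half the maximum rotational latency)."""
    max_rotational_latency_ms = 60000.0 / rpm  # time for one full revolution
    return 1000.0 / (avg_seek_ms + max_rotational_latency_ms / 2.0)

print(round(random_read_iops(10000, 7.0)))  # 100
print(round(random_read_iops(15000, 4.0)))  # 167 (truncates to 166)
print(round(random_read_iops(7200, 11.0)))  # 66
```

The small differences from the hand calculations above come only from rounding the rotational latency before dividing.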
To keep the math really easy for my models, I estimate that a 10K drive will produce 100 I/O's per second and a 15K drive will produce 150 IOPS. This is an extremely conservative calculation that assumes only one I/O is being issued at a time. As more I/O requests are issued, performance will increase at the cost of latency. A good rule of thumb is to keep between 1 and 4 outstanding I/O's on each physical disk.
To increase random I/O we need to add more physical disks. If one 10K drive provides 100 I/O's per second and we need 8,000 I/O's per second, then 80 physical disks will be needed. If 80 disks' worth of space is not needed, that space will simply go unused: space must be traded for performance.
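The sizing rule is simple enough to sketch; the helper below is mine, rounding up so the target is always met:

```python
import math

def disks_needed(target_iops, iops_per_disk):
    """Physical disks required to satisfy a random I/O target,
    rounded up so the aggregate always meets or exceeds the target."""
    return math.ceil(target_iops / iops_per_disk)

print(disks_needed(8000, 100))  # 80 spindles at the 10K estimate
print(disks_needed(8000, 150))  # 54 spindles at the 15K estimate
```

Running the same target against the 150 IOPS estimate for 15K drives shows how a faster spindle trades disk count against cost.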
Sequential disk performance is much faster than random disk performance. Instead of being restricted by the time it takes to move the head between tracks, sequential performance is limited by the system interconnects between the disk and the server.
If we take a look at the 7200 RPM 146GB SATA drive listed in the random performance section above, it will only perform 66 random IOPS. When tested sequentially it will perform 1311 IOPS, almost 20 times more performance from the same device. When tested with 64KB reads, this 7200 RPM drive can only generate 4 MB per second when accessed randomly; a sequential access pattern will generate 81 MB per second.
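Converting an IOPS rate at a fixed transfer size into bandwidth is a single multiplication; a minimal sketch (helper name is mine):

```python
def throughput_mb_per_s(iops, io_size_kb):
    """Convert an IOPS rate at a fixed I/O size into MB per second."""
    return iops * io_size_kb / 1024.0

# The 7200 RPM SATA drive with 64KB reads:
print(round(throughput_mb_per_s(66, 64), 1))    # 4.1 MB/s random
print(round(throughput_mb_per_s(1311, 64), 1))  # 81.9 MB/s sequential
```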
The only way to increase random performance is to aggregate disks. But when a single sequentially accessed drive can perform faster than 10 randomly accessed disks, it becomes possible to decrease the number of disks in a system.
Scalable shared database performance
Planning a scalable shared database deployment will result in two distinct I/O performance pools. The primary database system, called the factory, will generate the first I/O profile. The reporting servers will generate the second I/O performance profile. As described earlier, the factory server will create a random, a sequential, or a large block random performance profile. Estimating this workload is tricky, as the exact workload is often a blend of the three workload types, and it often changes over time. I like to plan for the worst case: large block random I/O. Unfortunately, large block random operations perform far worse than a standard random workload. Take this chart as an example:
The experiment compared a standard random workload against many concurrent sequential streams. As more work is generated, the random workload's performance increases. Compare this with the many sequential workloads, which combine to form what is now a large block random workload: as the number of workloads increases, overall performance continues to decrease. Quickly the random performance will eclipse that of the large block random workload.
As stated in the paper, http://www.emc.com/collateral/hardware/white-papers/h8129-performance-scaling-sql-wp.pdf, the best solution for this performance profile is to use Enterprise Flash Drives (EFD's) utilizing SLC NAND flash technology (head over to Wikipedia for an excellent description of flash technology: http://en.wikipedia.org/wiki/Flash_memory). If you are not able to utilize flash technology, then plan for the absolute worst case scenario.
Using the equation 1000/(ARST + (MRS/2)), replace the average read seek time with the maximum read seek time (you can find the maximum read seek time from the drive manufacturer; it is also called the full stroke access time).
If we look at a Seagate 15K 450GB FC drive, it has an average seek time of 3.4 ms and a full stroke seek time of 6.6 ms (the MRS of a 15K drive is 4 ms):
1000/(3.4 + (4/2)) = 185 IOPS Random Read
1000/(6.6 + (4/2)) = 116 IOPS Maximum Seek Reads
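Plugging both seek times into the model from the earlier section, 1000/(seek + MRS/2), shows the worst-case penalty directly (a minimal sketch; the function name is mine):

```python
def spinning_disk_iops(seek_ms, rpm):
    """1000 / (seek time + half the maximum rotational latency)."""
    return 1000.0 / (seek_ms + (60000.0 / rpm) / 2.0)

# Seagate 15K 450GB FC: 3.4 ms average seek, 6.6 ms full stroke seek
print(round(spinning_disk_iops(3.4, 15000)))  # 185 IOPS, average case
print(round(spinning_disk_iops(6.6, 15000)))  # 116 IOPS, full stroke worst case
```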
As you can see, under worst-case seeks the performance threshold for a 15K drive will be closer to that of a 10K drive. Plan appropriately when configuring storage for the shared read-only volumes. Remember that more than one server will be sending I/O to the shared volume, so it needs enough performance to handle the load generated by all of the servers combined.
That wraps up the series on Scalable Shared Database. Please post questions.