Thursday, October 2, 2008

How to increase storage subsystem speed (without increasing disk rotation)

The primary reason that physical disks are several orders of magnitude slower than the registers and cache of a central processing unit is simple: one relies on physical motion, while the other relies on electrical signals propagating at speeds approaching the speed of light. The main limiting factors on the speed of a signal travelling through a semiconductor are the material itself and its fabrication size, commonly referred to as the fabrication “process”. That process in turn determines the physical limits of the solid-state circuitry.

The theoretical lower bound on access time is described by Taylor and Wheeler (1992)i as 2L/c, where L is the average distance to the memory and c is the speed of light; it is simply the round-trip time for a signal travelling at the universal speed limit.
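To put that bound in perspective, here is a minimal Python sketch that evaluates 2L/c for a few assumed, round-number signal-path lengths (real signals in copper or silicon propagate somewhat slower than c, so these are optimistic floors):

# Round-trip propagation bound 2L/c for assumed, illustrative distances.
C = 299_792_458  # speed of light in a vacuum, metres per second

def round_trip_bound(distance_m):
    """Minimum round-trip time (seconds) for a signal covering distance_m at c."""
    return 2 * distance_m / C

for label, distance_m in [("on-die cache (~1 cm)", 0.01),
                          ("memory module (~10 cm)", 0.10),
                          ("drive on a 1 m cable", 1.00)]:
    print(f"{label}: {round_trip_bound(distance_m) * 1e9:.2f} ns minimum")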


The primary limiting factors within a central processing unit are propagation delay and the distance to physical memory. The next limitation is the core oscillator that acts as the CPU’s clock: there are no moving components, so the physical properties of the medium itself impose the speed limit on the clock. If room-temperature superconductors become a reality, the only remaining limitation on compute speed will be the medium’s permittivity, and information will travel at close to the speed of light (c).


When a computing machine accesses physical storage, a section of memory mapped to the device’s controller serves as the interface; that interface in turn goes through the Hardware Abstraction Layer and operating system (including all of their dependencies) to locate the appropriate driver, which then issues the native commands to move information between core memory and non-volatile storage via the storage device controller. One order of magnitude of latency is introduced when core memory is used; further orders of magnitude are added between the controller and the storage device itself (Developer Shed, 2004)ii. Average core memory latency is measured in microseconds, whereas average hard drive latency is measured in milliseconds (Storage Review, 2005)iii.
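To make the gap concrete, here is a small Python sketch that compares assumed, round-number latency figures for each tier (loosely following the figures quoted above, not measurements) and prints how many times slower each tier is than a register access:

# Assumed latency ladder illustrating the orders-of-magnitude gap above.
latencies_s = {
    "CPU register / L1 cache": 1e-9,   # assumed ~1 ns
    "core (main) memory":      1e-6,   # assumed ~1 microsecond, per the text above
    "hard drive access":       10e-3,  # assumed ~10 ms of seek plus rotation
}

baseline = latencies_s["CPU register / L1 cache"]
for tier, latency in latencies_s.items():
    print(f"{tier}: {latency:.0e} s (~{latency / baseline:,.0f}x a register access)")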


Current methods for increasing storage seek, read and write performance include moving from parallel to serial buses (Dorian Cougias, 2003)iv; employing techniques such as command queuing (Patrick Schmid, 2004)v, which gives the drive a kind of “branch prediction”; increasing the drive’s local buffer size so less time is spent physically hunting for bits; and using multiple drives in a RAID configuration, via specialized hardware or software, to aggregate the available I/O bandwidth. This also includes “Storage Area Networks”, which in essence are local drives placed somewhere else with enormous amounts of input and output bandwidth between the drives and the computers that use them.

Reducing the latency of an operational system involves two kinds of improvements: the first family requires no hardware modification and falls into the category of software optimizations; the second requires hardware configuration and architectural changes.


Here is a list of known optimizations, drawn from my personal experience, that will improve storage latency on the Intel PC architecture:


1. Utilize a file system layout that puts the most frequently accessed files closest to the spindle

2. Utilize an optimized chunk size within the file system and its application. This is a delicate endeavor, and some debate its validity: vendors such as Google use GFS, whose file chunks are 64 MB in size, whereas NTFS clusters top out at 64 KB (4 KB is the default on most volumes); GFS is optimized for web crawling and reliability on commodity hardware (Ghemawat et al., 2003)vi.

3. Ensure that the drive has a dedicated bus; for parallel ATA systems this meant purchasing extra controller hardware, but it is standard in serial ATA and serial-attached SCSI storage controllers.

4. Update all controller driver software to the most current stable version available, including the drive’s and the controller’s firmware.

5. Within the software driver, offload as many storage calculations as possible to dedicated system hardware; these options are usually part of the driver settings or BIOS and may be implemented within either the system or the controller itself.

6. If multiple disks are available, configure a RAID array; depending on the application, two drives connected in a RAID 0 array may easily approach twice the read and write performance, with half the reliability (see the sketch after this list).

7. If a page file is used, ensure it has a static (fixed) size; otherwise, remove the page file unless it is required.
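As a rough illustration of item 6, the following Python sketch estimates RAID 0’s aggregate throughput and its reliability penalty; the per-drive throughput and failure rate are assumed values, not measurements:

# Hypothetical RAID 0 scaling: throughput adds across member drives
# (ignoring controller overhead), but the array fails if ANY member fails.
def raid0_estimate(drive_throughput_mb_s, annual_failure_rate, drives):
    """Return (aggregate MB/s, probability the whole array survives one year)."""
    aggregate = drive_throughput_mb_s * drives
    survival = (1 - annual_failure_rate) ** drives
    return aggregate, survival

throughput, survival = raid0_estimate(drive_throughput_mb_s=80.0,  # assumed per-drive figure
                                      annual_failure_rate=0.05,    # assumed per-drive figure
                                      drives=2)
print(f"Estimated aggregate throughput: {throughput:.0f} MB/s")
print(f"Estimated one-year survival probability: {survival:.2%}")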


Now for architectural changes that would improve storage latency. To increase overall system performance and reduce primary latency, the speed of the front-side bus and the core memory bandwidth are the first things every systems vendor works to improve; increasing the available core memory bandwidth and reducing its I/O latency improves storage latency because it is the first bottleneck in the system architecture. Architectural changes are relatively expensive compared with optimizations and usually take time to implement, since the storage vendor consortium must adopt them as manufacturing standards; unless they are developed by that consortium, or are more cost-effective than a current technology, they will not usually see widespread implementation.



The second place to increase available input/output (I/O) bandwidth is the drive’s bus itself. The current speed of Serial ATA is 300 MB/s (3 Gbps), achieved across a four-pin serial interface; the next generation of Serial ATA will be capable of 600 MB/s, or 6.0 Gbps (Leroy Davis, 2008)vii. The increased bus speed will require that future drives have larger local buffer memories and better command queuing. So the next methods to increase drive performance, without modifying the disk’s rotational speed, are as follows:
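The relationship between the 3 Gbps line rate and the 300 MB/s figure follows from SATA’s 8b/10b encoding, which puts 10 bits on the wire for every byte of data; a minimal sketch of that arithmetic, assuming only the published line rates:

# SATA usable data rate from the raw line rate: 8b/10b encoding means
# 10 line bits are transmitted for every 8 data bits (1 byte).
def sata_data_rate_mb_s(line_rate_gbps):
    """Usable data rate in MB/s for a given 8b/10b-encoded line rate."""
    return line_rate_gbps * 1e9 / 10 / 1e6  # 10 line bits per data byte

for generation, gbps in [("SATA 3.0 Gbps", 3.0), ("SATA 6.0 Gbps", 6.0)]:
    print(f"{generation}: about {sata_data_rate_mb_s(gbps):.0f} MB/s")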


1. Increase areal storage density and reduce platter diameter; utilize new magnetic substrates with higher potential-well resolution and smaller drive heads.

2. Increase the drive’s communications buffer size, preferably in powers of 2 (128MB, 256MB, 512MB…), thus reducing the number of seek, read and write commands actually issued to the platter (see the sketch after this list).

3. Increase the drive controller’s input/output bandwidth on both sides of the controller, e.g. from the drive to the controller and from the controller to core memory via the driver and operating system, including increasing the controller’s bus clock rate.
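To illustrate why item 2 helps: if some fraction of requests can be served from the on-drive buffer rather than from the platter, the expected access time drops sharply. A minimal Python sketch with assumed hit rates and latencies (not measured values):

# Expected access time when a fraction of requests hit the on-drive buffer
# instead of waiting on seeks and platter rotation. All figures are assumed.
BUFFER_LATENCY_MS = 0.1    # assumed: request served from on-drive RAM
PLATTER_LATENCY_MS = 12.0  # assumed: average seek plus rotational delay

def expected_access_ms(hit_rate):
    """Weighted average of buffer hits and platter accesses."""
    return hit_rate * BUFFER_LATENCY_MS + (1 - hit_rate) * PLATTER_LATENCY_MS

for hit_rate in (0.0, 0.25, 0.5, 0.9):
    print(f"buffer hit rate {hit_rate:.0%}: ~{expected_access_ms(hit_rate):.1f} ms per access")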

Although this discussion question refers to physical disk storage, there is an emerging trend toward non-volatile solid-state storage based on NAND flash technology; IBM also has an organic-substrate storage concept called the Millipede that has shown promise with areal densities far higher than physical disks (Vettiger et al., 1999)viii. Although advanced concepts such as holographic storage, AFM storage and others have been around for a long time, they are not yet cost-effective enough to be adopted as non-volatile storage solutions by industry.


i Edwin F. Taylor, John Archibald Wheeler (1992) Spacetime Physics, 2nd ed. United States: W.H. Freeman & Co


ii Jkbaseball, Developer Shed (2004-11-30) Effects of Memory Latency [Online] World Wide Web, Available From:

http://www.devhardware.com/c/a/Memory/Effects-of-Memory-Latency/

(Accessed on Oct 1st 2008)


iii Charles M Kozierok, Storage Review (2005) The PC Guide – Latency [Online] World Wide Web, Available From:

http://www.storagereview.com/guide2000/ref/hdd/perf/perf/spec/posLatency.html

(Accessed on Oct 1st 2008)


iv Dorian Cougias, Search Storage (2003) The advantages of Serial ATA over Parallel ATA [Online] World Wide Web, Available From:

http://searchstorage.techtarget.com/tip/1,289483,sid5_gci934874,00.html

(Accessed on Oct 1st 2008)

v Patrick Schmid, Tom's Hardware (Nov 16 2004) Can Command Queuing Turbo Charge SATA Hard Drives? [Online] World Wide Web, Available From:

http://www.tomshardware.com/reviews/command-queuing-turbo-charge-sata,922.html

(Accessed on Oct 1st 2008)

vi Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, 19th ACM Symposium on Operating Systems Principles (Lake George, New York, 2003) The Google File System [Online] World Wide Web, Available From:

http://labs.google.com/papers/gfs.html

(Accessed on Oct 2nd 2008)

vii Leroy Davis, Interface Bus (Sep 17 2008) Serial ATA Bus Description [Online] World Wide Web, Available From:

http://www.interfacebus.com/Design_Connector_Serial_ATA.html

(Accessed on Oct 1st 2008)

viii Vettiger et al., IBM Journal of Research and Development (1999) The “Millipede” – more than one thousand tips for the future of AFM data storage [Online] World Wide Web, Available From:

http://www.research.ibm.com/journal/rd/443/vettiger.html

(Accessed on Oct 2nd 2008)
