Wednesday 18 May 2011

Been told that you don’t need to defragment when adding a SAN environment into your network estate?

With electronic data growing massively today, the need for storage has never been greater. SANs are at the forefront of most storage solutions, and providers such as HP, IBM and NetApp, to name a few, supply the platforms to cater for this. Virtualisation and cloud solutions promise less hardware in this respect, but data storage, if held locally, will inevitably mean acquiring more hardware. That additional hardware is where the cost problem lies for many companies today.


Diagram of Disk I/O as it travels from Operating System to SAN LUN

It stands to reason, then, that making full use of your SAN's potential is vital. A common misconception about SAN storage environments is that they do not suffer from fragmentation-related issues. This is the “party line” handed out by many of the storage providers, and there are plausibly a couple of reasons for it: firstly, each SAN provider has its own proprietary logic for arranging blocks within the SAN environment; secondly, and more likely, the claim helps them sell more hardware to their customers.

To understand this a little better, every file system is a "virtual" disk, stacking one virtual component over another (i.e. one file system on top of another). What the vendor of a SAN file system does at the SAN file system level is irrelevant to what the Windows file system does at its own level: all Windows file systems fragment, regardless.

SANs typically employ a clustered/SAN file system to pool disk arrays into a virtualized storage volume. This is not Windows NTFS, but proprietary software provided by the SAN hardware or software vendor. Claims that "you do not need to defragment" may be misunderstood and incorrectly taken to refer to Windows NTFS; NTFS always needs to be defragmented. It is quite possible that you do not need to defragment the "SAN file system" itself. Tips, best practices and SAN I/O optimization methodologies should always be obtained from the respective SAN vendor.

SANs are block-level storage only; they do not know which I/Os relate to which files, so they cannot intelligently place the fragments of a file across multiple disks. The mass of separate read/write I/Os generated for fragmented files (almost certainly interspersed with other simultaneous reads and writes) is spread non-optimally across the disks in the SAN storage pool.
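
To put rough numbers on this, here is a minimal back-of-the-envelope sketch; the file size, maximum transfer size and fragment count are all illustrative assumptions, not measurements. The point it demonstrates is that a file in N fragments reaches the storage pool as at least N separate block requests, however those blocks happen to be placed.

```python
# Back-of-the-envelope model: every number below is a hypothetical assumption.
FILE_SIZE_MB = 64        # hypothetical file size
MAX_TRANSFER_KB = 1024   # hypothetical largest single transfer the OS will issue
FRAGMENTS = 2000         # hypothetical NTFS fragment count for the same file

# Contiguous case: the only thing splitting the read is the maximum transfer size.
contiguous_ios = (FILE_SIZE_MB * 1024) // MAX_TRANSFER_KB

# Fragmented case: no single request can span a gap in the file's on-disk
# layout, so there is at least one request per fragment.
fragmented_ios = max(FRAGMENTS, contiguous_ios)

print(f"Contiguous file:        ~{contiguous_ios} block requests reach the SAN")
print(f"File in {FRAGMENTS} fragments: ~{fragmented_ios} block requests reach the SAN")
```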

As for NTFS, it still fragments, and fragmentation causes the Windows OS to split the I/O requests for files sent to the SAN, creating a performance penalty. You can measure this with Windows' built-in PerfMon tool by watching the Split IO/Sec counter. You can also use the Avg. Disk Queue Length counter, provided you account for the number of physical spindles behind the volume.
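
These counters can also be collected from a script rather than the PerfMon GUI. Below is a minimal sketch, assuming a Windows host with the built-in typeperf.exe tool and Python available; the counter instances (_Total), the one-second interval and the 30-sample window are illustrative choices, not requirements.

```python
import csv
import subprocess

# PerfMon counter paths: Split IO/Sec counts I/Os that Windows had to break up
# (fragmentation is one cause); Avg. Disk Queue Length should be judged against
# the number of physical spindles behind the volume.
COUNTERS = [
    r"\LogicalDisk(_Total)\Split IO/Sec",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
]

# typeperf flags: -si = sample interval in seconds, -sc = number of samples.
proc = subprocess.run(
    ["typeperf", *COUNTERS, "-si", "1", "-sc", "30"],
    capture_output=True, text=True, check=True,
)

# typeperf writes CSV to stdout; keep only the quoted CSV lines and skip the
# status messages printed around them.
rows = list(csv.reader(line for line in proc.stdout.splitlines()
                       if line.startswith('"')))
header, samples = rows[0], rows[1:]

for col, name in enumerate(header[1:], start=1):
    values = []
    for row in samples:
        try:
            values.append(float(row[col]))
        except (ValueError, IndexError):
            pass  # a sample can be blank if the counter had no data yet
    if values:
        print(f"{name}: avg={sum(values) / len(values):.2f}  max={max(values):.2f}")
```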

The only solution offered by SAN vendors to address the split I/O problem is adding more spindles. That masks the problem by dispersing the I/Os across the additional disks, which means you will need to keep adding disks as the I/O bottleneck grows, and the cost keeps mounting over time.
The actual problem lies at the NTFS level: every fragment of a file requires a separate I/O to access it, so the less fragmented a file is, the fewer I/Os are needed to read or write it compared with when it is heavily fragmented.
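
To see this for a specific file, the sketch below is a minimal example, assuming Windows and Python with ctypes; the helper name and buffer size are illustrative choices of my own. It asks NTFS for the file's retrieval pointers (via the documented FSCTL_GET_RETRIEVAL_POINTERS control code) and reports the extent count, each extent being a separate run that costs at least one I/O to read. Very small files resident in the MFT will not return any extents.

```python
import ctypes
import ctypes.wintypes as wt
import struct
import sys

# Documented Win32/NTFS constants.
FSCTL_GET_RETRIEVAL_POINTERS = 0x00090073
GENERIC_READ = 0x80000000
FILE_SHARE_READ_WRITE = 0x00000003
OPEN_EXISTING = 3
INVALID_HANDLE_VALUE = ctypes.c_void_p(-1).value

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.CreateFileW.argtypes = [wt.LPCWSTR, wt.DWORD, wt.DWORD, wt.LPVOID,
                                 wt.DWORD, wt.DWORD, wt.HANDLE]
kernel32.CreateFileW.restype = wt.HANDLE
kernel32.DeviceIoControl.argtypes = [wt.HANDLE, wt.DWORD, wt.LPVOID, wt.DWORD,
                                     wt.LPVOID, wt.DWORD, wt.LPDWORD, wt.LPVOID]
kernel32.DeviceIoControl.restype = wt.BOOL
kernel32.CloseHandle.argtypes = [wt.HANDLE]
kernel32.CloseHandle.restype = wt.BOOL


def count_extents(path):
    """Return the number of extents (fragments) NTFS reports for the file."""
    handle = kernel32.CreateFileW(path, GENERIC_READ, FILE_SHARE_READ_WRITE,
                                  None, OPEN_EXISTING, 0, None)
    if handle in (None, INVALID_HANDLE_VALUE):
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        # Input: STARTING_VCN_INPUT_BUFFER, a single 64-bit VCN of zero.
        starting_vcn = ctypes.c_longlong(0)
        # Output: RETRIEVAL_POINTERS_BUFFER; 64 KB holds a few thousand extents,
        # enough for a sketch (a real tool would loop on ERROR_MORE_DATA).
        out_buf = ctypes.create_string_buffer(64 * 1024)
        returned = wt.DWORD(0)
        ok = kernel32.DeviceIoControl(handle, FSCTL_GET_RETRIEVAL_POINTERS,
                                      ctypes.byref(starting_vcn),
                                      ctypes.sizeof(starting_vcn),
                                      out_buf, ctypes.sizeof(out_buf),
                                      ctypes.byref(returned), None)
        if not ok:
            raise ctypes.WinError(ctypes.get_last_error())
        # The first DWORD of RETRIEVAL_POINTERS_BUFFER is ExtentCount.
        return struct.unpack_from("I", out_buf, 0)[0]
    finally:
        kernel32.CloseHandle(handle)


if __name__ == "__main__":
    extents = count_extents(sys.argv[1])
    print(f"{sys.argv[1]}: {extents} extent(s), so at least {extents} "
          f"separate I/O(s) to read the file end to end")
```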

For more information, see the Windows IT Pro whitepaper entitled “Maximize the Performance of your Windows SAN Infrastructure”.
