To: PMS Witch who wrote (16578), 2/20/2001 11:57:00 AM
From: PMS Witch
 
Disk fragmentation …

Recently, on this thread, a few participants have mentioned various levels of disk fragmentation. The posts hit that magic number where my interest was stimulated enough to look at the matter further, yet not so many that I became bored with the subject.

First example…

I have a partition devoted to storing images of my system. Each image is roughly 150meg. For discussion simplicity, I'll claim I have ten. When checking this partition, Norton reports zero fragmentation. This is no surprise, since these files are never deleted; thus, no holes ever appear for later filling with file fragments. I consider zero fragmentation the simplest case: all files stored in contiguous disk space. Simple!

Now, let's try something. I wipe the partition with the images, create a 75meg file, and store it at the beginning. Next, I store a tiny text file, say a grocery list. Following this, I delete the 75meg file and store a 150meg image. Now my image is split in half on the disk: the first 75meg occupies the space vacated by the 75meg junk file, the tiny file has to be written around, and the remaining 75meg of the image lands beyond it. I continue this process until I have ten images, all split in the middle. Then I delete my tiny files. I'm again left with ten huge files.

This time, when I check fragmentation, each and every file on this disk is split, so the disk gets reported as 100% fragmented. However, the disk contains 20 chunks of contiguous data of 75meg each. By any practical standard, this disk has been written very efficiently, but I wouldn't know it from the report. Running the defragging software against this disk will cause almost 1.5 gig of data to be moved for a profoundly tiny improvement in storage efficiency.
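
For anyone who wants to play along without sacrificing a partition, here's a rough Python sketch of that allocation pattern. This is not how FAT32 really lays out clusters; it's just a toy first-fit allocator with the made-up numbers from my example (one cell standing in for 1meg), but it shows the effect:

    # Toy first-fit allocator: the disk is a list of cells (1 cell ~ 1meg here),
    # each holding the name of the file stored there, or None if free.
    DISK_MB = 2000
    disk = [None] * DISK_MB

    def allocate(name, size):
        """Write `size` cells for `name` into the first free cells found."""
        written = 0
        for i in range(DISK_MB):
            if disk[i] is None:
                disk[i] = name
                written += 1
                if written == size:
                    return
        raise RuntimeError("disk full")

    def free(name):
        for i in range(DISK_MB):
            if disk[i] == name:
                disk[i] = None

    def extents(name):
        """Count the contiguous runs of cells belonging to `name`."""
        runs, prev = 0, False
        for cell in disk:
            cur = (cell == name)
            if cur and not prev:
                runs += 1
            prev = cur
        return runs

    # Repeat the junk-file trick ten times.
    for n in range(10):
        allocate("junk", 75)
        allocate(f"list{n}", 1)      # the tiny grocery list
        free("junk")
        allocate(f"image{n}", 150)   # first 75meg fill the hole, the rest lands past the list
    for n in range(10):
        free(f"list{n}")

    images = [f"image{n}" for n in range(10)]
    fragmented = sum(1 for f in images if extents(f) > 1)
    print("extents per image:", [extents(f) for f in images])          # ten files, two runs each
    print(f"report: {100 * fragmented // len(images)}% of files fragmented")

Ten files, twenty tidy 75meg runs, and a scary-looking 100% on the report.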

Next example…

Some people save all their e-mail. (A practice Microsoft probably wishes its employees hadn't followed!) Anyone sufficiently popular or important will eventually accumulate a huge pile. For discussion, let's assume 100,000 messages are saved on disk in separate files. Further, let's assume each is smaller than the disk's cluster size (mine is 4K), which means every one of those 100,000 files will be viewed as stored without fragmentation. Lastly, assume the system has 1,000 other files (mine has 1,500) and that the disk has never been defragmented. Obviously, the system files on this disk will be a real mess, scattered across all parts of the disk, and under normal inspection likely to show levels of fragmentation that qualify for the record books. But when the fragmentation report gets generated, the system counts 1,000 files in a mess and 100,000 files pristine. The report indicates 1% fragmentation. The user concludes her disk is in excellent shape.
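
The arithmetic behind that rosy number is worth spelling out. A minimal sketch, assuming the report is nothing more than fragmented files divided by total files, which is roughly all these reports amount to:

    # Per-file yes/no bookkeeping from the e-mail example.
    tiny_mail    = 100_000   # each under one 4K cluster, so never counted as fragmented
    system_files = 1_000     # assume every one of these is badly scattered
    reported = system_files / (tiny_mail + system_files)
    print(f"reported fragmentation: {reported:.1%}")   # about 1.0%; the disk 'looks' fine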

Next example…

Jumping back to the big files with splits in the middle, we see that one break causes the file to register as fragmented. But what if the file had 10 breaks, 100 breaks, or 1,000? Clearly, there are different levels of fragmentation, but currently they are measured in a binary manner: contiguous or fragmented. Also, as in my first example, the break was a one-cluster interruption. The disk would continue to spin, the head would remain on the same cylinder, and the loss would be the latency of one read. The impact would be minimal. If the data were spread over the disk's many surfaces, on various cylinders and sectors, the performance compromise would be severe as many seek times were added to many latencies. Yet in both cases, we'd have one file marked as fragmented, and that one file would be counted identically in the reported statistics no matter what its impact on storage efficiency.
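
To put numbers on that, here's a tiny sketch. The sizes are in 4K clusters and are invented to match my 150meg image; one file is broken exactly once, the other is shredded into roughly a thousand pieces, yet the yes/no test treats them the same:

    # Two files, both "fragmented" by the yes/no test, but worlds apart in practice.
    # Each list holds the sizes (in clusters) of a file's contiguous runs.
    image_one_split = [19200, 19200]        # 150meg image broken exactly once
    image_shredded  = [38] * 1000 + [400]   # same 38,400 clusters in ~1,000 pieces

    def is_fragmented(extents):
        return len(extents) > 1             # all that today's report records

    def extra_head_moves(extents):
        return len(extents) - 1             # jumps a contiguous file wouldn't need

    for name, ext in [("one split", image_one_split), ("shredded", image_shredded)]:
        print(name, "fragmented:", is_fragmented(ext), " extra head moves:", extra_head_moves(ext))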

Next example…

My disk images are written once, and with luck never used. If I need to restore my system a number of times from the same image, I’ve real problems. In short, the data just sits there and does nothing. Compare this with a program such as Excel on an accountant’s machine. (Excel, for those who may not know, is a spreadsheet program. It displays rows and columns of cells on the screen. Users type formulas into these cells, and Excel returns beeps and error messages.) The Excel program file is 5meg. Our accountant friend may execute this program over 100 times per day. Clearly, any fragmentation in this file would be 100 times more serious than in a relatively unused file. Again, this fragmentation would be reported as yes or no, with no weight given to its impact.
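
If per-file use were tracked, a weighting along these lines would capture the difference. The names and numbers below are invented, and as I note further down, Win98 doesn't record per-file read counts, so this is strictly wishful:

    # Hypothetical use-weighted cost: weight each file's breaks by how often it's read.
    files = [
        # (name, extents, reads per day)
        ("system image", 2,   0.01),   # written once, almost never read back
        ("EXCEL.EXE",    7, 100),      # our accountant runs it all day long
    ]

    def weighted_cost(files):
        return sum((extents - 1) * reads for _name, extents, reads in files)

    print("naive count of fragmented files:", sum(1 for _n, e, _r in files if e > 1))
    print("use-weighted cost (extra reads per day):", weighted_cost(files))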

A suggestion…

I think fragmentation should be reported in terms of the number of contiguous clusters used by files (CC) and the total number of clusters used by those same files (TC). In short: F = 1 - CC/TC, or as a percentage, F = (1 - CC/TC) x 100. Reporting this value would give users a clearer understanding of just how efficiently or poorly their data is stored on their disk. (This wouldn't help our accountant friend, because file use is disregarded.) Although Win98 tracks application use, as far as I know it doesn't track the use of individual files beyond last access date and time. This lack of data would make implementing any algorithm that considers file use difficult; therefore, I'll limit my wishes to the possible.
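
Here's one way to turn that into code. This is only my reading of my own definition: I count a cluster as contiguous if it immediately follows another cluster of the same file, and a real tool would read the allocation tables rather than take a list like this:

    # F = 1 - CC/TC, where CC = contiguous clusters, TC = total clusters in use.
    def fragmentation(extent_lists):
        """extent_lists: for each file, the sizes (in clusters) of its extents."""
        tc = sum(sum(ext) for ext in extent_lists)
        cc = sum(sum(ext) - len(ext) for ext in extent_lists)  # each extent contributes one non-contiguous start cluster
        return 1 - cc / tc

    # Ten images of 38,400 clusters, each split into two 19,200-cluster halves:
    images = [[19200, 19200]] * 10
    print(f"F = {fragmentation(images):.4%}")   # about 0.005%, not the 100% Norton reports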

Since the percentages reported are inconsistent with storage efficiency anyway, we may be better off treating them sceptically, or at least with less concern. I would suggest that when users begin to notice increased disk activity during routine work, or what sounds like extra clicks as the disk heads flutter back and forth, it may be time to defrag. If the system is performing its work in a reasonably acceptable manner, disk defragging effort could be more profitably applied elsewhere. I'm convinced that defragging disks has consumed far more time and effort than it has ever saved once the work is done.

Having said this, I still defrag, just less often.

Cheers, PW.