File system fragmentation
Solid-state drives do not physically seek, so their non-sequential data access is hundreds of times faster than that of drives with moving parts, making fragmentation less of an issue. Manually defragmenting solid-state storage is not recommended, because it can wear drives prematurely through unnecessary write–erase operations.[1]
Causes
When
As existing files are deleted or truncated, new regions of free space are created. When existing files are appended to, it is often impossible to resume the write exactly where the file used to end, as another file may already be allocated there; thus, a new fragment has to be allocated. As time goes on, and the same factors are continuously present, free space as well as frequently appended files tend to fragment more. Shorter regions of free space also mean that the file system is no longer able to allocate new files contiguously, and has to break them into fragments. This is especially true when the file system becomes full and large contiguous regions of free space are unavailable.
Example
The following example is a simplification of an otherwise complicated subject. Consider the following scenario: A new disk has had five files, named A, B, C, D and E, saved contiguously and sequentially in that order. Each file uses 10 blocks of space. (Here, the block size is unimportant.) The remainder of the disk space is one contiguous free region. Thus, additional files can be created and saved after the file E.
If the file B is deleted, a second region of ten blocks of free space is created, and the disk becomes fragmented. The empty space is simply left there, marked as available for later use, and used again as needed.[b] The file system could defragment the disk immediately after a deletion, but doing so would incur a severe performance penalty at unpredictable times.
Now, a new file called F, which requires seven blocks of space, can be placed into the first seven blocks of the newly freed space formerly holding the file B, and the three blocks following it will remain available. If another new file called G, which needs only three blocks, is added, it could then occupy the space after F and before C.
If subsequently F needs to be expanded, since the space immediately following it is occupied, there are three options for the file system:
- Adding a new block somewhere else and indicating that F has a second extent
- Moving files in the way of the expansion elsewhere, to allow F to remain contiguous
- Moving file F so it can be one contiguous file of the new, larger size
The second option is probably impractical for performance reasons, as is the third when the file is very large. The third option is impossible when there is no single contiguous free space large enough to hold the new file. Thus the usual practice is simply to create an extent somewhere else and chain the new extent onto the old one.
Material added to the end of file F would be part of the same extent. But if there is so much material that no room is available after the last extent, then another extent would have to be created, and so on. Eventually the file system has free segments in many places and some files may be spread over many extents. Access time for those files (or for all files) may become excessively long.
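The scenario above can be replayed with a short simulation. This is only a minimal sketch: the disk is modelled as a Python list of block owners, the 20-block free tail is an assumed size, and the helper names (`make_disk`, `delete`, `allocate`, `extents`) are hypothetical. The allocator always takes the lowest-numbered free blocks, which is a simplification and not the policy of any particular file system.

```python
FREE = "."  # marker for an unallocated block


def make_disk(tail_free=20):
    """Files A-E, ten blocks each, then one free region (size is an assumption)."""
    disk = []
    for name in "ABCDE":
        disk.extend([name] * 10)
    disk.extend([FREE] * tail_free)
    return disk


def delete(disk, name):
    """Mark every block of `name` as free; nothing is moved."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = FREE


def allocate(disk, name, count):
    """Give `name` the `count` lowest-numbered free blocks."""
    free = [i for i, owner in enumerate(disk) if owner == FREE][:count]
    if len(free) < count:
        raise RuntimeError("not enough free blocks")
    for i in free:
        disk[i] = name


def extents(disk, name):
    """Return the (start, length) runs of blocks owned by `name`."""
    runs, start = [], None
    for i, owner in enumerate(disk + [None]):  # sentinel closes a trailing run
        if owner == name and start is None:
            start = i
        elif owner != name and start is not None:
            runs.append((start, i - start))
            start = None
    return runs


disk = make_disk()
delete(disk, "B")        # frees blocks 10-19
allocate(disk, "F", 7)   # F takes blocks 10-16
allocate(disk, "G", 3)   # G takes blocks 17-19
allocate(disk, "F", 5)   # F grows: a second extent is created after E
allocate(disk, "F", 2)   # further growth lengthens that second extent
print("".join(disk))
print(extents(disk, "F"))
```

In this run, F ends up with two extents, one in the space formerly held by B and one after E, while G remains a single extent between F and C.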
Necessity
Some early file systems were unable to fragment files. One such example was the Acorn DFS file system used on the BBC Micro. Due to its inability to fragment files, the error message "can't extend" would at times appear, and the user would often be unable to save a file even if the disk had adequate space for it.
DFS used a very simple disk structure and
Standards of
Types
File system fragmentation may occur on several levels:
- Fragmentation within individual files
- Free space fragmentation
- The decrease of locality of reference between separate, but related files
- Fragmentation within the data structures or special files reserved for the file system itself
File fragmentation
Individual file fragmentation occurs when a single file has been broken into multiple pieces (called extents on extent-based file systems). While disk file systems attempt to keep individual files contiguous, this is often not possible without significant performance penalties. File system check and defragmentation tools typically only account for file fragmentation in their "fragmentation percentage" statistic.
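As an illustration of how such a statistic might be computed, the sketch below counts the share of files stored in more than one extent. The definition is an assumption made for the example: real tools each define their "fragmentation percentage" in their own way, and the function name and the `extent_map` layout are hypothetical.

```python
def fragmentation_percentage(extent_map):
    """Percentage of non-empty files that are stored in more than one extent.

    `extent_map` maps a file name to its list of (start, length) extents.
    """
    files = [runs for runs in extent_map.values() if runs]
    if not files:
        return 0.0
    fragmented = sum(1 for runs in files if len(runs) > 1)
    return 100.0 * fragmented / len(files)


# Hypothetical layout: only F is split into two extents.
layout = {
    "A": [(0, 10)],
    "F": [(10, 7), (50, 7)],
    "G": [(17, 3)],
    "C": [(20, 10)],
}
print(fragmentation_percentage(layout))
```

A measure of this kind says nothing about free space fragmentation or about how far apart related files are placed.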
Free space fragmentation
Free (unallocated) space fragmentation occurs when there are several unused areas of the file system where new files or metadata can be written to. Unwanted free space fragmentation is generally caused by deletion or truncation of files, but file systems may also intentionally insert fragments ("bubbles") of free space in order to facilitate extending nearby files (see preventing fragmentation below).
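One possible way to quantify free space fragmentation is to compare the largest free run with the total amount of free space. The sketch below assumes a hypothetical block bitmap in which `True` means "free"; the metric itself is an illustrative choice, not a standard definition.

```python
from itertools import groupby


def free_runs(bitmap):
    """Lengths of the maximal runs of free blocks (bitmap[i] is True when free)."""
    return [sum(1 for _ in run) for is_free, run in groupby(bitmap) if is_free]


def free_space_fragmentation(bitmap):
    """0.0 when all free space is one run; approaches 1.0 as it splinters."""
    runs = free_runs(bitmap)
    total = sum(runs)
    if total == 0:
        return 0.0
    return 1.0 - max(runs) / total


# Ten used blocks, a ten-block hole, thirty used blocks, twenty free blocks.
bitmap = [False] * 10 + [True] * 10 + [False] * 30 + [True] * 20
print(free_runs(bitmap))
print(free_space_fragmentation(bitmap))
```

With this measure, a new file of 25 blocks could not be stored contiguously in the example bitmap even though 30 blocks are free in total.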
File scattering
File scattering, also called related-file fragmentation or application-level (file) fragmentation, refers to the lack of locality of reference (within the storage medium) between related files. Unlike the previous two types of fragmentation, file scattering is a much vaguer concept, as it depends heavily on the access pattern of specific applications. This also makes objectively measuring or estimating it very difficult. However, it is arguably the most critical type of fragmentation, as studies have found that the most frequently accessed files tend to be small compared to available disk throughput per second.[4]
To avoid related file fragmentation and improve locality of reference (in this case called file contiguity), assumptions or active observations about the operation of applications have to be made. A very frequent assumption made is that it is worthwhile to keep smaller files within a single
Data structure fragmentation
The catalogs or indices used by a file system itself can also become fragmented over time, as the entries they contain are created, changed, or deleted. This is more of a concern when the volume contains a multitude of very small files than when a volume is filled with fewer larger files. Depending on the particular file system design, the files or regions containing that data may also become fragmented (as described above for 'regular' files), regardless of any fragmentation of the actual data records maintained within those files or regions.[5]
For some file systems (such as NTFS[c] and HFS/HFS Plus[6]), the collation/sorting/compaction needed to optimize this data cannot easily occur while the file system is in use.[7]
Negative consequences
File system fragmentation is more problematic with consumer-grade
In simple file system benchmarks, the fragmentation factor is often omitted, as realistic aging and fragmentation are difficult to model. Rather, for simplicity of comparison, file system benchmarks are often run on empty file systems. Thus, the results may vary heavily from real-life access patterns.[11]
Mitigation
Several techniques have been developed to fight fragmentation. They can usually be classified into two categories: preemptive and retroactive. Due to the difficulty of predicting access patterns, these techniques are most often heuristic in nature and may degrade performance under unexpected workloads.
Preventing fragmentation
Preemptive techniques attempt to keep fragmentation to a minimum at the time data is being written on the disk. The simplest is appending data to an existing fragment in place where possible, instead of allocating new blocks to a new fragment.
Many of today's file systems attempt to pre-allocate longer chunks, or chunks from different free space fragments, called extents, to files that are actively appended to. This largely avoids file fragmentation when several files are concurrently being appended to, thus preventing them from becoming excessively intertwined.[9]
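The effect of pre-allocation on concurrently appended files can be shown with a toy model. The sketch below is an assumption-laden simplification: the disk starts empty, files are appended to one block at a time in round-robin order, and `reserve` (a hypothetical parameter) is the number of contiguous blocks a file claims whenever its current reservation runs out. Real file systems also release unused reservations, which is not modelled here.

```python
def interleaved_appends(names, blocks_each, reserve):
    """Append one block at a time to each file in turn on an empty disk.

    `reserve` is the size of the contiguous chunk claimed when a file has
    no reserved space left (1 means no pre-allocation).
    Returns {name: number_of_extents}.
    """
    next_free = 0                           # blocks below this are claimed
    reserved = {n: (0, 0) for n in names}   # name -> (next block, reservation end)
    last_block = {n: None for n in names}
    extent_count = {n: 0 for n in names}
    for _ in range(blocks_each):
        for n in names:
            pos, end = reserved[n]
            if pos == end:                  # reservation exhausted: claim a chunk
                pos, end = next_free, next_free + reserve
                next_free = end
            if last_block[n] is None or pos != last_block[n] + 1:
                extent_count[n] += 1        # not adjacent to the previous block
            last_block[n] = pos
            reserved[n] = (pos + 1, end)
    return extent_count


print(interleaved_appends(["A", "B"], 16, reserve=1))
print(interleaved_appends(["A", "B"], 16, reserve=8))
```

Without pre-allocation, the two files alternate block by block and each ends up with 16 extents. With 8-block reservations, each file needs only 2.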
If the final size of a file subject to modification is known, storage for the entire file may be preallocated. For example, the
A relatively recent technique is
Defragmentation
Retroactive techniques attempt to reduce fragmentation, or the negative effects of fragmentation, after it has occurred. Many file systems provide
The
The now obsolete Commodore Amiga Smart File System (SFS) defragmented itself while the file system was in use. The defragmentation process was almost completely stateless (apart from the location it was working on), so it could be stopped and started instantly. During defragmentation, data integrity was ensured for both metadata and normal data.
See also
- List of defragmentation software
- FAT file fragmentation
- Disk compression
Notes
- ^ Some file systems, such as NTFS and ext2+, might preallocate empty contiguous regions for special purposes.
- ^ The practice of leaving the space occupied by deleted files largely undisturbed is why undelete programs were able to work; they simply recovered the file whose name had been deleted from the directory, but whose contents were still on disk.
- ^ NTFS reserves 12.5% of the volume for the 'MFT zone', but only until that space is needed by other files (i.e., if the volume ever becomes more than 87.5% full, an unfragmented MFT can no longer be guaranteed).[5]
References
- ^ Fisher, Ryan (2022-02-11). "Should I defrag my SSD?". PC Gamer. Archived from the original on 2022-02-18. Retrieved 2022-04-26.
- ^ http://www.8bs.com/hints/083.txt - Description of the can't extend error
- ^ http://8bs.com/mag/1to4/basegd1.txt - Possible data loss caused by the can't extend error
- .
- ^ a b "How NTFS reserves space for its Master File Table (MFT)". learn.microsoft.com. Microsoft. Retrieved 22 October 2022.
- ^ "DiskWarrior in Depth". Alsoft. Retrieved 22 October 2022.
- ^ "Maintaining Windows 2000 Peak Performance Through Defragmentation". learn.microsoft.com. Microsoft. Retrieved 22 October 2022.
- ^ Kryder, Mark H. (2006-04-03). Future Storage Technologies: A Look Beyond the Horizon (PDF). Storage Networking World conference. Seagate Technology. Archived from the original (PDF) on 17 July 2006.
- ^ Sun Microsystems, Inc. pp. 33–43. Retrieved 2006-12-14.
- ^ a b Hanselman, Scott (3 December 2014). "The real and complete story - Does Windows defragment your SSD?". Scott Hanselman's blog.
- ^ Smith, Keith Arnold (January 2001). "Workload-Specific File System Benchmarks" (PDF). Cambridge, Massachusetts: Harvard University. Archived from the original (PDF) on 2004-11-17. Retrieved 2006-12-14.
- ^ Layton, Jeffrey (29 March 2009). "From ext3 to ext4: An Interview with Theodore Ts'o". Linux Magazine. QuinStreet. Archived from the original on April 1, 2009.
- ^ Singh, Amit (May 2004). "Fragmentation in HFS Plus Volumes". Mac OS X Internals. Archived from the original on 2012-11-18. Retrieved 2009-10-27.
- San Diego, California: Silicon Graphics. Retrieved 2006-12-14.
- ^ Reiser, Hans (2006-02-06). "The Reiser4 Filesystem". Google TechTalks. Archived from the original on 19 May 2011. Retrieved 2006-12-14.
- ISBN 0321278542.
Further reading
- Smith, Keith; Seltzer, Margo. File Layout and File System Performance (PDF) (Paper). Harvard University.