diff options
Diffstat (limited to 'en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml')
-rw-r--r-- | en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml | 284 |
1 files changed, 284 insertions, 0 deletions
diff --git a/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml b/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml new file mode 100644 index 0000000000..a7dd93afa0 --- /dev/null +++ b/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml @@ -0,0 +1,284 @@ +<!DOCTYPE HTML PUBLIC "-//FreeBSD//DTD HTML 4.01 Transitional-Based Extension//EN" [ +<!ENTITY base CDATA "../.."> +<!ENTITY date "$FreeBSD: www/en/projects/bigdisk/index.sgml,v 1.14 2010/05/23 07:16:08 remko Exp $"> +<!ENTITY title "Large data storage in FreeBSD"> +<!ENTITY % navinclude.developers "INCLUDE"> + +<!-- Status levels --> +<!ENTITY status.na "<font color=green>N/A</font>"> +<!ENTITY status.done "<font color=green>Done</font>"> +<!ENTITY status.wip "<font color=blue>In progress</font>"> +<!ENTITY status.untested "<font color=orange>Needs testing</font>"> +<!ENTITY status.new "<font color=red>Not done</font>"> +<!ENTITY status.unknown "<font color=red>Unknown</font>"> + +<!-- The list of contributors was moved to a separate file so that it can + be used by other documents in the FreeBSD web site. --> + +<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers; + +]> + +<html> + &header; + + <h2>Contents</h2> + <ul> + <li><a href="#background">Purpose and background</a></li> + <li><a href="#testing">Testing large capacities</a></li> + <li><a href="#userland">Userland Tools Status</a></li> + <li><a href="#ifnet-status">Kernel Driver Status</a></li> + </ul> + + <a name="background"></a> + <h2>Purpose and background</h2> + <h3>The UFS filesystem</h3> + <p>When the UFS filesystem was introduced to BSD in 1982, its use of 32 + bit offsets and counters to address the storage was considered to be + ahead of its time. Since most fixed-disk storage devices use 512 byte + sectors, 32 bits allowed for 2 Terabytes of storage. That was an almost + un-imaginable quantity for the time. But now that 250 and 400 Gigabyte + disks are available at consumer prices, it's trivial to build a hardware + or software based storage array that can exceed 2TB for a few thousand + dollars.</p> + + <p>The UFS2 filesystem was introduced in 2003 as a replacement to the + original UFS and provides 64 bit counters and offsets. This allows for + files and filesystems to grow to 2^73 bytes (2^64 * 512) in size and + hopefully be sufficient for quite a long time. UFS2 largely solved + the storage size limits imposed by the filesystem. Unfortunately, many + tools and storage mechanisms still use or assume 32 bit values, often + keeping FreeBSD limited to 2TB.</p> + + <p>We need to ensure that FreeBSD supports large storage sizes and that + the benefits of UFS2 can actually be realized so that FreeBSD can remain + relevant in the enterprise world. This page describes known issues and + limits and provides a focus for further auditing, validation, and + fixing.</p> + + <h3>Limits on disk partitioning</h3> + <p>The first limit that is encountered is in disk partitioning. For x86 + and amd64 PC's, the FDISK MBR table is used by the BIOS to partition the + disk into logical extents and identify which partition ('slice' in FreeBSD + terms) to boot from. The MBR is defined to use 32 bit disk offsets, + and since it's an industry standard and interoperability is required, + there is nothing that can be done to change this. As long as booting a + PC requires the MBR, the boot slice in FreeBSD is going to be limited to + 2TB.</p> + + <p>The GPT partitioning scheme was introduced with the ia64 architecture + as an MBR replacement. It provides 64 bit offsets and allows for an + arbitrary number of partitions. It also provides a compatibility mode + with MBR where it can generate an MBR-compatible structure on the disk + for use with systems that don't understand GPT. However, to get the + full benefits for boot storage, the BIOS and the FreeBSD loader must + understand it. For secondary storage, GPT can be used by any + architecture regardless of BIOS or boot support.</p> + + <p>Many systems don't require an MBR or GPT, and even PCs don't require it + if booting and inter-operating with other OS's is not required. The next + limit that comes in, though, is with the BSD disklabel. This label + defines up to 8 partitions on a disk, MBR slice, or other storage extent + for filesystems and swap space. Unfortunately, the on-disk format of the + disk label again uses 32 bit quantities, so it is also limited to 2TB. + Fixing this would require creating a new format that is incompatible + with the old and would require an update to the FreeBSD boot loader. + This would complicate interoperability and the upgrade path. Also, if a + new format is going to be created, it should also address the 8 partition + limit that exists now. Given these requirements, it's tempting to just + adopt the GPT format instead for secondary storage partitioning.</p> + + <a name="testing"></a> + <h2>Testing large capacities</h2> + <p>Even though large drives are cheap, it still isn't always feasible or + economical to test on real hardware. Swap-backed memory disks, via the + md(4) driver, can provide a good substitute for some of the testing. + Backing with swap means that only the pages that are dirtied by data + are actually allocated, so a multi-terabyte storage can be simulated + with a minimal amount of physical RAM+swap. Note that this is less true with + UFS1 since it will initialize all of the inode blocks during newfs, + which will dirty quite a bit of data. But for UFS2, swap-backed md + has the potential for working well. Unfortunately, the kernel md driver + has a number of 32-bit size limits of its own that need to be fixed. + Details are provided below.</p> + + <p>It is still possible to avoid disklabels and MBRs for testing by + using newfs directly on the raw disk or md disk. Sysinstall can be + tested from a running system by just selecting Expert mode and just + performing the MBR and disklabel steps. Beware that sysinstall might + have other bugs that will wipe out your existing system, so care must + be taken here!</p> + + <a name="userland"></a> + <h2>Userland Tool Status</h2> + + <p>The following userland tools need auditing and testing for 64-bit + cleanliness:</p> + + <table class="tblbasic"> + <tr> + <th> Task </th> + <th> Responsible </th> + <th> Last updated </th> + <th> Status </th> + <th> Details </th> + </tr> + + <tr> + <td>newfs</td> + <td>&a.pjd;</td> + <td>19 Sept 2004</td> + <td>&status.done;</td> + <td>Handling of '-s' option was fixed. Newfs should be now fully usable + for large file systems.</td> + </tr> + + <tr> + <td>df</td> + <td> </td> + <td> </td> + <td>&status.new;</td> + <td>An audit is needed to make sure that all reported fields are + 64-bit clean. There are reports with certain fields being incorrect + or negative with NFS volumes, which could either be an NFS or df + problem.</td> + </tr> + + <tr> + <td>du</td> + <td>&a.pjd;</td> + <td>7 Jan 2005</td> + <td>&status.done;</td> + <td>Big files/directories handling was broken. It was fixed and du + should be now fully usable on large file systems with large + files/directories.</td> + </tr> + + <tr> + <td>growfs</td> + <td> </td> + <td>12 Sept 2004</td> + <td>&status.wip;</td> + <td>Growfs has problems with expanding to new cylinder groups. It also + initializes UFS2 inode blocks instead of leaving them for lazy + initialization. It also needs a 64-bit audit.</td> + </tr> + + <tr> + <td>sysinstall</td> + <td> </td> + <td> </td> + <td>&status.new;</td> + <td>A full audit is needed. Reports exist of problems with >1TB + partitions.</td> + </tr> + + <tr> + <td>fsck_ffs</td> + <td>&a.pb;</td> + <td>15 Jan 2005</td> + <td>&status.wip;</td> + <td>A full audit is needed. At least some printf format changes are + necessary.</td> + </tr> + + <tr> + <td>dump/restore</td> + <td> </td> + <td> </td> + <td>&status.new;</td> + <td>A full audit is needed. At least some printf format changes are + necessary in dump(8).</td> + </tr> + + <tr> + <td>fsdb</td> + <td> </td> + <td> </td> + <td>&status.new;</td> + <td>A full audit is needed. At least some printf format changes are + necessary.</td> + </tr> + + <tr> + <td>quota tools</td> + <td>&a.des; & &a.mckusick;</td> + <td> </td> + <td>&status.done;</td> + <td>Extensive changes are need. Disk quotas are currently + handled as 32-bit quantities, which limits the maximum + possible quota at 2TB. Two tasks are needed: 1) have the + current tools (kernel+userland, edquota for example) fail + gracefully when presented with 64-bit quantities and 2) + extend the quota file format and tools to 64-bit while + providing a compatibility mode and/or migration tools.</td> + </tr> + + </table> + + <a name="Kernel"></a> + <h2>Kernel Driver Status</h2> + + <p>Many storage peripherals simply are not designed to handle >2TB + capacities. For those that are, an audit should be done to verify + that their drivers handle the sizes correctly and pass those sizes + correctly to the rest of the kernel.</p> + + <table class="tblbasic"> + <tr> + <th> Task </th> + <th> Responsible </th> + <th> Last updated </th> + <th> Status </th> + <th> Details </th> + </tr> + + <tr> + <td>md</td> + <td>&a.pjd;</td> + <td>17 Sept 2004</td> + <td>&status.done;</td> + <td>Swap backed disks can now be created up to 16TB in size on i386. + This corresponds to 2^32*4096.</td> + </tr> + </table> + + <h2>Subsystem Status</h2> + <p>Some filesystem-related subsystems require testing with >2TB volumes, or + need to be adapted. The following areas have been identified:</p> + + <table class="tblbasic"> + <tr> + <th> Task </th> + <th> Responsible </th> + <th> Last updated </th> + <th> Status </th> + <th> Details </th> + </tr> + + <tr> + <td>snapshots</td> + <td>&a.pb;</td> + <td>15 Jan 2004</td> + <td>&status.wip;</td> + <td>Taking snapshots fails on filesystems >2TB, returning EFBIG + (on a 5TB filesystem) and subsequently crashing the system in + softupdates.</td> + </tr> + <tr> + <td>quotas</td> + <td>&a.des; & &a.mckusick;</td> + <td> </td> + <td>&status.done;</td> + <td>The quota subsystem handles 32-bit quantities, which + limits quotas to 2TB. Blockings of the syncer have been + observed while attempting to set quotas over that limit + (try 4000000000 KBytes as a hard limit in edquota(8) for + some uid, then create somes files owned by that uid). See + also the userland entry for quota tools.</td> + </tr> + </table> + + &footer; + </body> +</html> |