aboutsummaryrefslogtreecommitdiff
path: root/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml
diff options
context:
space:
mode:
Diffstat (limited to 'en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml')
-rw-r--r--en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml284
1 files changed, 284 insertions, 0 deletions
diff --git a/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml b/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml
new file mode 100644
index 0000000000..a7dd93afa0
--- /dev/null
+++ b/en_US.ISO8859-1/htdocs/projects/bigdisk/index.sgml
@@ -0,0 +1,284 @@
+<!DOCTYPE HTML PUBLIC "-//FreeBSD//DTD HTML 4.01 Transitional-Based Extension//EN" [
+<!ENTITY base CDATA "../..">
+<!ENTITY date "$FreeBSD: www/en/projects/bigdisk/index.sgml,v 1.14 2010/05/23 07:16:08 remko Exp $">
+<!ENTITY title "Large data storage in FreeBSD">
+<!ENTITY % navinclude.developers "INCLUDE">
+
+<!-- Status levels -->
+<!ENTITY status.na "<font color=green>N/A</font>">
+<!ENTITY status.done "<font color=green>Done</font>">
+<!ENTITY status.wip "<font color=blue>In progress</font>">
+<!ENTITY status.untested "<font color=orange>Needs testing</font>">
+<!ENTITY status.new "<font color=red>Not done</font>">
+<!ENTITY status.unknown "<font color=red>Unknown</font>">
+
+<!-- The list of contributors was moved to a separate file so that it can
+ be used by other documents in the FreeBSD web site. -->
+
+<!ENTITY % developers SYSTEM "../../developers.sgml"> %developers;
+
+]>
+
+<html>
+ &header;
+
+ <h2>Contents</h2>
+ <ul>
+ <li><a href="#background">Purpose and background</a></li>
+ <li><a href="#testing">Testing large capacities</a></li>
+ <li><a href="#userland">Userland Tools Status</a></li>
+ <li><a href="#ifnet-status">Kernel Driver Status</a></li>
+ </ul>
+
+ <a name="background"></a>
+ <h2>Purpose and background</h2>
+ <h3>The UFS filesystem</h3>
+ <p>When the UFS filesystem was introduced to BSD in 1982, its use of 32
+ bit offsets and counters to address the storage was considered to be
+ ahead of its time. Since most fixed-disk storage devices use 512 byte
+ sectors, 32 bits allowed for 2 Terabytes of storage. That was an almost
+ un-imaginable quantity for the time. But now that 250 and 400 Gigabyte
+ disks are available at consumer prices, it's trivial to build a hardware
+ or software based storage array that can exceed 2TB for a few thousand
+ dollars.</p>
+
+ <p>The UFS2 filesystem was introduced in 2003 as a replacement to the
+ original UFS and provides 64 bit counters and offsets. This allows for
+ files and filesystems to grow to 2^73 bytes (2^64 * 512) in size and
+ hopefully be sufficient for quite a long time. UFS2 largely solved
+ the storage size limits imposed by the filesystem. Unfortunately, many
+ tools and storage mechanisms still use or assume 32 bit values, often
+ keeping FreeBSD limited to 2TB.</p>
+
+ <p>We need to ensure that FreeBSD supports large storage sizes and that
+ the benefits of UFS2 can actually be realized so that FreeBSD can remain
+ relevant in the enterprise world. This page describes known issues and
+ limits and provides a focus for further auditing, validation, and
+ fixing.</p>
+
+ <h3>Limits on disk partitioning</h3>
+ <p>The first limit that is encountered is in disk partitioning. For x86
+ and amd64 PC's, the FDISK MBR table is used by the BIOS to partition the
+ disk into logical extents and identify which partition ('slice' in FreeBSD
+ terms) to boot from. The MBR is defined to use 32 bit disk offsets,
+ and since it's an industry standard and interoperability is required,
+ there is nothing that can be done to change this. As long as booting a
+ PC requires the MBR, the boot slice in FreeBSD is going to be limited to
+ 2TB.</p>
+
+ <p>The GPT partitioning scheme was introduced with the ia64 architecture
+ as an MBR replacement. It provides 64 bit offsets and allows for an
+ arbitrary number of partitions. It also provides a compatibility mode
+ with MBR where it can generate an MBR-compatible structure on the disk
+ for use with systems that don't understand GPT. However, to get the
+ full benefits for boot storage, the BIOS and the FreeBSD loader must
+ understand it. For secondary storage, GPT can be used by any
+ architecture regardless of BIOS or boot support.</p>
+
+ <p>Many systems don't require an MBR or GPT, and even PCs don't require it
+ if booting and inter-operating with other OS's is not required. The next
+ limit that comes in, though, is with the BSD disklabel. This label
+ defines up to 8 partitions on a disk, MBR slice, or other storage extent
+ for filesystems and swap space. Unfortunately, the on-disk format of the
+ disk label again uses 32 bit quantities, so it is also limited to 2TB.
+ Fixing this would require creating a new format that is incompatible
+ with the old and would require an update to the FreeBSD boot loader.
+ This would complicate interoperability and the upgrade path. Also, if a
+ new format is going to be created, it should also address the 8 partition
+ limit that exists now. Given these requirements, it's tempting to just
+ adopt the GPT format instead for secondary storage partitioning.</p>
+
+ <a name="testing"></a>
+ <h2>Testing large capacities</h2>
+ <p>Even though large drives are cheap, it still isn't always feasible or
+ economical to test on real hardware. Swap-backed memory disks, via the
+ md(4) driver, can provide a good substitute for some of the testing.
+ Backing with swap means that only the pages that are dirtied by data
+ are actually allocated, so a multi-terabyte storage can be simulated
+ with a minimal amount of physical RAM+swap. Note that this is less true with
+ UFS1 since it will initialize all of the inode blocks during newfs,
+ which will dirty quite a bit of data. But for UFS2, swap-backed md
+ has the potential for working well. Unfortunately, the kernel md driver
+ has a number of 32-bit size limits of its own that need to be fixed.
+ Details are provided below.</p>
+
+ <p>It is still possible to avoid disklabels and MBRs for testing by
+ using newfs directly on the raw disk or md disk. Sysinstall can be
+ tested from a running system by just selecting Expert mode and just
+ performing the MBR and disklabel steps. Beware that sysinstall might
+ have other bugs that will wipe out your existing system, so care must
+ be taken here!</p>
+
+ <a name="userland"></a>
+ <h2>Userland Tool Status</h2>
+
+ <p>The following userland tools need auditing and testing for 64-bit
+ cleanliness:</p>
+
+ <table class="tblbasic">
+ <tr>
+ <th> Task </th>
+ <th> Responsible </th>
+ <th> Last updated </th>
+ <th> Status </th>
+ <th> Details </th>
+ </tr>
+
+ <tr>
+ <td>newfs</td>
+ <td>&a.pjd;</td>
+ <td>19 Sept 2004</td>
+ <td>&status.done;</td>
+ <td>Handling of '-s' option was fixed. Newfs should be now fully usable
+ for large file systems.</td>
+ </tr>
+
+ <tr>
+ <td>df</td>
+ <td>&nbsp;</td>
+ <td>&nbsp;</td>
+ <td>&status.new;</td>
+ <td>An audit is needed to make sure that all reported fields are
+ 64-bit clean. There are reports with certain fields being incorrect
+ or negative with NFS volumes, which could either be an NFS or df
+ problem.</td>
+ </tr>
+
+ <tr>
+ <td>du</td>
+ <td>&a.pjd;</td>
+ <td>7 Jan 2005</td>
+ <td>&status.done;</td>
+ <td>Big files/directories handling was broken. It was fixed and du
+ should be now fully usable on large file systems with large
+ files/directories.</td>
+ </tr>
+
+ <tr>
+ <td>growfs</td>
+ <td>&nbsp;</td>
+ <td>12 Sept 2004</td>
+ <td>&status.wip;</td>
+ <td>Growfs has problems with expanding to new cylinder groups. It also
+ initializes UFS2 inode blocks instead of leaving them for lazy
+ initialization. It also needs a 64-bit audit.</td>
+ </tr>
+
+ <tr>
+ <td>sysinstall</td>
+ <td>&nbsp;</td>
+ <td>&nbsp;</td>
+ <td>&status.new;</td>
+ <td>A full audit is needed. Reports exist of problems with >1TB
+ partitions.</td>
+ </tr>
+
+ <tr>
+ <td>fsck_ffs</td>
+ <td>&a.pb;</td>
+ <td>15 Jan 2005</td>
+ <td>&status.wip;</td>
+ <td>A full audit is needed. At least some printf format changes are
+ necessary.</td>
+ </tr>
+
+ <tr>
+ <td>dump/restore</td>
+ <td>&nbsp;</td>
+ <td>&nbsp;</td>
+ <td>&status.new;</td>
+ <td>A full audit is needed. At least some printf format changes are
+ necessary in dump(8).</td>
+ </tr>
+
+ <tr>
+ <td>fsdb</td>
+ <td>&nbsp;</td>
+ <td>&nbsp;</td>
+ <td>&status.new;</td>
+ <td>A full audit is needed. At least some printf format changes are
+ necessary.</td>
+ </tr>
+
+ <tr>
+ <td>quota tools</td>
+ <td>&a.des; &amp; &a.mckusick;</td>
+ <td>&nbsp;</td>
+ <td>&status.done;</td>
+ <td>Extensive changes are need. Disk quotas are currently
+ handled as 32-bit quantities, which limits the maximum
+ possible quota at 2TB. Two tasks are needed: 1) have the
+ current tools (kernel+userland, edquota for example) fail
+ gracefully when presented with 64-bit quantities and 2)
+ extend the quota file format and tools to 64-bit while
+ providing a compatibility mode and/or migration tools.</td>
+ </tr>
+
+ </table>
+
+ <a name="Kernel"></a>
+ <h2>Kernel Driver Status</h2>
+
+ <p>Many storage peripherals simply are not designed to handle >2TB
+ capacities. For those that are, an audit should be done to verify
+ that their drivers handle the sizes correctly and pass those sizes
+ correctly to the rest of the kernel.</p>
+
+ <table class="tblbasic">
+ <tr>
+ <th> Task </th>
+ <th> Responsible </th>
+ <th> Last updated </th>
+ <th> Status </th>
+ <th> Details </th>
+ </tr>
+
+ <tr>
+ <td>md</td>
+ <td>&a.pjd;</td>
+ <td>17 Sept 2004</td>
+ <td>&status.done;</td>
+ <td>Swap backed disks can now be created up to 16TB in size on i386.
+ This corresponds to 2^32*4096.</td>
+ </tr>
+ </table>
+
+ <h2>Subsystem Status</h2>
+ <p>Some filesystem-related subsystems require testing with >2TB volumes, or
+ need to be adapted. The following areas have been identified:</p>
+
+ <table class="tblbasic">
+ <tr>
+ <th> Task </th>
+ <th> Responsible </th>
+ <th> Last updated </th>
+ <th> Status </th>
+ <th> Details </th>
+ </tr>
+
+ <tr>
+ <td>snapshots</td>
+ <td>&a.pb;</td>
+ <td>15 Jan 2004</td>
+ <td>&status.wip;</td>
+ <td>Taking snapshots fails on filesystems >2TB, returning EFBIG
+ (on a 5TB filesystem) and subsequently crashing the system in
+ softupdates.</td>
+ </tr>
+ <tr>
+ <td>quotas</td>
+ <td>&a.des; &amp; &a.mckusick;</td>
+ <td>&nbsp;</td>
+ <td>&status.done;</td>
+ <td>The quota subsystem handles 32-bit quantities, which
+ limits quotas to 2TB. Blockings of the syncer have been
+ observed while attempting to set quotas over that limit
+ (try 4000000000 KBytes as a hard limit in edquota(8) for
+ some uid, then create somes files owned by that uid). See
+ also the userland entry for quota tools.</td>
+ </tr>
+ </table>
+
+ &footer;
+ </body>
+</html>