diff options
authorKirk McKusick <mckusick@FreeBSD.org>2021-05-17 00:02:42 +0000
committerKirk McKusick <mckusick@FreeBSD.org>2021-05-17 00:04:11 +0000
commit9a2fac6ba65fbd14d37ccedbc2aec27a190128ea (patch)
parent46269d66ed02598c86756d81a7d7b27ec0f0c5cd (diff)
Fix handling of embedded symbolic links (and history lesson).
The original filesystem release (4.2BSD) had no embedded sysmlinks. Historically symbolic links were just a different type of file, so the content of the symbolic link was contained in a single disk block fragment. We observed that most symbolic links were short enough that they could fit in the area of the inode that normally holds the block pointers. So we created embedded symlinks where the content of the link was held in the inode's pointer area thus avoiding the need to seek and read a data fragment and reducing the pressure on the block cache. At the time we had only UFS1 with 32-bit block pointers, so the test for a fastlink was: di_size < (NDADDR + NIADDR) * sizeof(daddr_t) (where daddr_t would be ufs1_daddr_t today). When embedded symlinks were added, a spare field in the superblock with a known zero value became fs_maxsymlinklen. New filesystems set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus filesystems that preceeded this change always read from blocks (since fs->fs_maxsymlinklen == 0) and newer ones used embedded symlinks if they fit. Similarly symlinks created on pre-embedded symlink filesystems always spill into blocks while newer ones will embed if they fit. At the same time that the embedded symbolic links were added, the on-disk directory structure was changed splitting the former u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen. Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can be used to distinguish old directory formats. In retrospect that should have just been an added flag, but we did not realize we needed to know about that change until it was already in production. Code was split into ufs/ffs so that the log structured filesystem could use ufs functionality while doing its own disk layout. This meant that no ffs superblock fields could be used in the ufs code. Thus ffs superblock fields that were needed in ufs code had to be copied to fields in the mount structure. Since ufs_readlink needed to know if a link was embedded, fs_maxlinklen gets copied to mnt_maxsymlinklen. The kernel panic that arose to making this fix was triggered when a disk error created an inode of type symlink with no allocated data blocks but a large size. When readlink was called the uiomove was attempted which segment faulted. static int ufs_readlink(ap) struct vop_readlink_args /* { struct vnode *a_vp; struct uio *a_uio; struct ucred *a_cred; } */ *ap; { struct vnode *vp = ap->a_vp; struct inode *ip = VTOI(vp); doff_t isize; isize = ip->i_size; if ((isize < vp->v_mount->mnt_maxsymlinklen) || DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */ return (uiomove(SHORTLINK(ip), isize, ap->a_uio)); } return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred)); } The second part of the "if" statement that adds DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */ is problematic. It never appeared in BSD released by Berkeley because as noted above mnt_maxsymlinklen is 0 for old format filesystems, so will always fall through to the VOP_READ as it should. I had to dig back through `git blame' to find that Rodney Grimes added it as part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.'' He must have brought it across from an earlier FreeBSD. Unfortunately the source-control logs for FreeBSD up to the merger with the AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the agreement to let FreeBSD remain unencumbered, so I cannot pin-point where that line got added on the FreeBSD side. The one change needed here is that mnt_maxsymlinklen is declared as an `int' and should be changed to be `u_int64_t'. This discovery led us to check out the code that deletes symbolic links. Specifically if (vp->v_type == VLNK && (ip->i_size < vp->v_mount->mnt_maxsymlinklen || datablocks == 0)) { if (length != 0) panic("ffs_truncate: partial truncate of symlink"); bzero(SHORTLINK(ip), (u_int)ip->i_size); ip->i_size = 0; DIP_SET(ip, i_size, 0); UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE); if (needextclean) goto extclean; return (ffs_update(vp, waitforupdate)); } Here too our broken symlink inode with no data blocks allocated and a large size will segment fault as we are incorrectly using the test that we have no data blocks to decide that it is an embdedded symbolic link and attempting to bzero past the end of the inode. The test for datablocks == 0 is unnecessary as the test for ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right thing in all cases. The test for datablocks == 0 was added by David Greenman in this commit: Author: David Greenman <dg@FreeBSD.org> Date: Tue Aug 2 13:51:05 1994 +0000 Completed (hopefully) the kernel support for old style "fastlinks". Notes: svn path=/head/; revision=1821 I am guessing that he likely earlier added the incorrect test in the ufs_readlink code. I asked David if he had any recollection of why he made this change. Amazingly, he still had a recollection of why he had made a one-line change more than twenty years ago. And unsurpisingly it was because he had been stuck between a rock and a hard place. FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code base. Prior to that, there were three years of development in all areas of the kernel, including the filesystem code, from the combined set of people including Bill Jolitz, Patchkit contributors, and FreeBSD Project members. The compatibility issue at hand was caused by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite changes David had to find a way to provide compatibility with both the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite. He felt that these changes would provide compatibility with both systems. In his words: ``My recollection is that the 'FASTLINKS' symlinks support in FreeBSD-1.x, as implemented by Curt Mayer, worked differently than 4.4BSD. He used a spare field in the inode to duplicately store the length. When the 4.4BSD-Lite merge was done, the optimized symlinks support for existing filesystems (those that were initialized in FreeBSD-1.x) were broken due to the FFS on-disk structure of 4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to restore the backward compatibility with FreeBSD-1.x filesystems. I think it was the best that could be done in the somewhat urgent circumstances of the post Berkeley-USL settlement. Also, regarding Rod's massive commit with little explanation, some context: John Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to the 386 platform in just 10 days. It was by far the most intense hacking effort of my life. In addition to the porting of tons of FreeBSD-1 code, I think we wrote more than 30,000 lines of new code in that time to deal with the missing pieces and architectural changes of 4.4BSD-Lite. We didn't make many notes along the way. There was a lot of pressure to get something out to the rest of the developer community as fast as possible, so detailed discrete commits didn't happen - it all came as a giant wad, which is why Rod's commit message was worded the way it was.'' Reported by: Chuck Silvers Tested by: Chuck Silvers History by: David Greenman Lawrence MFC after: 1 week Sponsored by: Netflix
4 files changed, 5 insertions, 8 deletions
diff --git a/sys/kern/vfs_subr.c b/sys/kern/vfs_subr.c
index 2be3a81ad799..468bd21dae21 100644
--- a/sys/kern/vfs_subr.c
+++ b/sys/kern/vfs_subr.c
@@ -4469,7 +4469,8 @@ DB_SHOW_COMMAND(mount, db_show_mount)
db_printf(" mnt_writeopcount = %d (with %d in the struct)\n",
vfs_mount_fetch_counter(mp, MNT_COUNT_WRITEOPCOUNT), mp->mnt_writeopcount);
- db_printf(" mnt_maxsymlinklen = %d\n", mp->mnt_maxsymlinklen);
+ db_printf(" mnt_maxsymlinklen = %jd\n",
+ (uintmax_t)mp->mnt_maxsymlinklen);
db_printf(" mnt_iosize_max = %d\n", mp->mnt_iosize_max);
db_printf(" mnt_hashseed = %u\n", mp->mnt_hashseed);
db_printf(" mnt_lockref = %d (with %d in the struct)\n",
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index a6d750a1ff37..f341370ecd86 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -224,7 +224,7 @@ struct mount {
int mnt_writeopcount; /* (i) write syscalls pending */
struct vfsoptlist *mnt_opt; /* current mount options */
struct vfsoptlist *mnt_optnew; /* new options passed to fs */
- int mnt_maxsymlinklen; /* max size of short symlink */
+ uint64_t mnt_maxsymlinklen; /* max size of short symlink */
struct statfs mnt_stat; /* cache of filesystem stats */
struct ucred *mnt_cred; /* credentials of mounter */
void * mnt_data; /* private data */
diff --git a/sys/ufs/ffs/ffs_inode.c b/sys/ufs/ffs/ffs_inode.c
index 5598c7ba30e8..8fe7f7ab97de 100644
--- a/sys/ufs/ffs/ffs_inode.c
+++ b/sys/ufs/ffs/ffs_inode.c
@@ -329,9 +329,7 @@ ffs_truncate(vp, length, flags, cred)
if ((flags & IO_NORMAL) == 0)
return (0);
- if (vp->v_type == VLNK &&
- (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
- datablocks == 0)) {
+ if (vp->v_type == VLNK && ip->i_size < vp->v_mount->mnt_maxsymlinklen) {
if (length != 0)
panic("ffs_truncate: partial truncate of symlink");
diff --git a/sys/ufs/ufs/ufs_vnops.c b/sys/ufs/ufs/ufs_vnops.c
index 301c583291d1..70bf1a1d9036 100644
--- a/sys/ufs/ufs/ufs_vnops.c
+++ b/sys/ufs/ufs/ufs_vnops.c
@@ -2458,10 +2458,8 @@ ufs_readlink(ap)
doff_t isize;
isize = ip->i_size;
- if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
- DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
+ if (isize < vp->v_mount->mnt_maxsymlinklen)
return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
- }
return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));