aboutsummaryrefslogtreecommitdiff
path: root/lib/libsys/sendfile.2
diff options
context:
space:
mode:
Diffstat (limited to 'lib/libsys/sendfile.2')
-rw-r--r--lib/libsys/sendfile.2437
1 files changed, 437 insertions, 0 deletions
diff --git a/lib/libsys/sendfile.2 b/lib/libsys/sendfile.2
new file mode 100644
index 000000000000..6000e3e9828f
--- /dev/null
+++ b/lib/libsys/sendfile.2
@@ -0,0 +1,437 @@
+.\" Copyright (c) 2003, David G. Lawrence
+.\" All rights reserved.
+.\"
+.\" Redistribution and use in source and binary forms, with or without
+.\" modification, are permitted provided that the following conditions
+.\" are met:
+.\" 1. Redistributions of source code must retain the above copyright
+.\" notice unmodified, this list of conditions, and the following
+.\" disclaimer.
+.\" 2. Redistributions in binary form must reproduce the above copyright
+.\" notice, this list of conditions and the following disclaimer in the
+.\" documentation and/or other materials provided with the distribution.
+.\"
+.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
+.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+.\" SUCH DAMAGE.
+.\"
+.Dd June 24, 2025
+.Dt SENDFILE 2
+.Os
+.Sh NAME
+.Nm sendfile
+.Nd send a file to a socket
+.Sh LIBRARY
+.Lb libc
+.Sh SYNOPSIS
+.In sys/types.h
+.In sys/socket.h
+.In sys/uio.h
+.Ft int
+.Fo sendfile
+.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
+.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
+.Fc
+.Sh DESCRIPTION
+The
+.Fn sendfile
+system call
+sends a regular file or shared memory object specified by descriptor
+.Fa fd
+out a stream socket specified by descriptor
+.Fa s .
+.Pp
+The
+.Fa offset
+argument specifies where to begin in the file.
+Should
+.Fa offset
+fall beyond the end of file, the system will return
+success and report 0 bytes sent as described below.
+The
+.Fa nbytes
+argument specifies how many bytes of the file should be sent, with 0 having the special
+meaning of send until the end of file has been reached.
+.Pp
+An optional header and/or trailer can be sent before and after the file data by specifying
+a pointer to a
+.Vt "struct sf_hdtr" ,
+which has the following structure:
+.Pp
+.Bd -literal -offset indent -compact
+struct sf_hdtr {
+ struct iovec *headers; /* pointer to header iovecs */
+ int hdr_cnt; /* number of header iovecs */
+ struct iovec *trailers; /* pointer to trailer iovecs */
+ int trl_cnt; /* number of trailer iovecs */
+};
+.Ed
+.Pp
+The
+.Fa headers
+and
+.Fa trailers
+pointers, if
+.Pf non- Dv NULL ,
+point to arrays of
+.Vt "struct iovec"
+structures.
+See the
+.Fn writev
+system call for information on the iovec structure.
+The number of iovecs in these
+arrays is specified by
+.Fa hdr_cnt
+and
+.Fa trl_cnt .
+.Pp
+If
+.Pf non- Dv NULL ,
+the system will write the total number of bytes sent on the socket to the
+variable pointed to by
+.Fa sbytes .
+.Pp
+The least significant 16 bits of
+.Fa flags
+argument is a bitmap of these values:
+.Bl -tag -offset indent -width "SF_USER_READAHEAD"
+.It Dv SF_NODISKIO
+This flag causes
+.Nm
+to return
+.Er EBUSY
+instead of blocking when a busy page is encountered.
+This rare situation can happen if some other process is now working
+with the same region of the file.
+It is advised to retry the operation after a short period.
+.Pp
+Note that in older
+.Fx
+versions the
+.Dv SF_NODISKIO
+had slightly different notion.
+The flag prevented
+.Nm
+to run I/O operations in case if an invalid (not cached) page is encountered,
+thus avoiding blocking on I/O.
+Starting with
+.Fx 11
+.Nm
+sending files off the
+.Xr ffs 4
+filesystem does not block on I/O
+(see
+.Sx IMPLEMENTATION NOTES
+), so the condition no longer applies.
+However, it is safe if an application utilizes
+.Dv SF_NODISKIO
+and on
+.Er EBUSY
+performs the same action as it did in
+older
+.Fx
+versions, e.g.,
+.Xr aio_read 2 ,
+.Xr read 2
+or
+.Nm
+in a different context.
+.It Dv SF_NOCACHE
+The data sent to socket will not be cached by the virtual memory system,
+and will be freed directly to the pool of free pages.
+.It Dv SF_USER_READAHEAD
+.Nm
+has some internal heuristics to do readahead when sending data.
+This flag forces
+.Nm
+to override any heuristically calculated readahead and use exactly the
+application specified readahead.
+See
+.Sx SETTING READAHEAD
+for more details on readahead.
+.El
+.Pp
+When using a socket marked for non-blocking I/O,
+.Fn sendfile
+may send fewer bytes than requested.
+In this case, the number of bytes successfully
+written is returned in
+.Fa *sbytes
+(if specified),
+and the error
+.Er EAGAIN
+is returned.
+.Sh SETTING READAHEAD
+.Nm
+uses internal heuristics based on request size and file system layout
+to do readahead.
+Additionally application may request extra readahead.
+The most significant 16 bits of
+.Fa flags
+specify amount of pages that
+.Nm
+may read ahead when reading the file.
+A macro
+.Fn SF_FLAGS
+is provided to combine readahead amount and flags.
+An example showing specifying readahead of 16 pages and
+.Dv SF_NOCACHE
+flag:
+.Pp
+.Bd -literal -offset indent -compact
+ SF_FLAGS(16, SF_NOCACHE)
+.Ed
+.Pp
+.Nm
+will use either application specified readahead or internally calculated,
+whichever is bigger.
+Setting flag
+.Dv SF_USER_READAHEAD
+would turn off any heuristics and set maximum possible readahead length to
+the number of pages specified via flags.
+.Sh IMPLEMENTATION NOTES
+The
+.Fx
+implementation of
+.Fn sendfile
+does not block on disk I/O when it sends a file off the
+.Xr ffs 4
+filesystem.
+The syscall returns success before the actual I/O completes, and data
+is put into the socket later unattended.
+However, the order of data in the socket is preserved, so it is safe
+to do further writes to the socket.
+.Pp
+The
+.Fx
+implementation of
+.Fn sendfile
+is "zero-copy", meaning that it has been optimized so that copying of the file data is avoided.
+.Sh TUNING
+.Ss physical paging buffers
+.Fn sendfile
+uses vnode pager to read file pages into memory.
+The pager uses a pool of physical buffers to run its I/O operations.
+When system runs out of pbufs, sendfile will block and report state
+.Dq Li zonelimit .
+Size of the pool can be tuned with
+.Va vm.vnode_pbufs
+.Xr loader.conf 5
+tunable and can be checked with
+.Xr sysctl 8
+OID of the same name at runtime.
+.Ss sendfile(2) buffers
+On some architectures, this system call internally uses a special
+.Fn sendfile
+buffer
+.Pq Vt "struct sf_buf"
+to handle sending file data to the client.
+If the sending socket is
+blocking, and there are not enough
+.Fn sendfile
+buffers available,
+.Fn sendfile
+will block and report a state of
+.Dq Li sfbufa .
+If the sending socket is non-blocking and there are not enough
+.Fn sendfile
+buffers available, the call will block and wait for the
+necessary buffers to become available before finishing the call.
+.Pp
+The number of
+.Vt sf_buf Ns 's
+allocated should be proportional to the number of nmbclusters used to
+send data to a client via
+.Fn sendfile .
+Tune accordingly to avoid blocking!
+Busy installations that make extensive use of
+.Fn sendfile
+may want to increase these values to be inline with their
+.Va kern.ipc.nmbclusters
+(see
+.Xr tuning 7
+for details).
+.Pp
+The number of
+.Fn sendfile
+buffers available is determined at boot time by either the
+.Va kern.ipc.nsfbufs
+.Xr loader.conf 5
+variable or the
+.Dv NSFBUFS
+kernel configuration tunable.
+The number of
+.Fn sendfile
+buffers scales with
+.Va kern.maxusers .
+The
+.Va kern.ipc.nsfbufsused
+and
+.Va kern.ipc.nsfbufspeak
+read-only
+.Xr sysctl 8
+variables show current and peak
+.Fn sendfile
+buffers usage respectively.
+These values may also be viewed through
+.Nm netstat Fl m .
+.Pp
+If
+.Xr sysctl 8
+OID
+.Va kern.ipc.nsfbufs
+doesn't exist, your architecture does not need to use
+.Fn sendfile
+buffers because their task can be efficiently performed
+by the generic virtual memory structures.
+.Sh RETURN VALUES
+.Rv -std sendfile
+.Sh ERRORS
+.Bl -tag -width Er
+.It Bq Er EAGAIN
+The socket is marked for non-blocking I/O and not all data was sent due to
+the socket buffer being filled.
+If specified, the number of bytes successfully sent will be returned in
+.Fa *sbytes .
+.It Bq Er EBADF
+The
+.Fa fd
+argument
+is not a valid file descriptor.
+.It Bq Er EBADF
+The
+.Fa s
+argument
+is not a valid socket descriptor.
+.It Bq Er EBUSY
+A busy page was encountered and
+.Dv SF_NODISKIO
+had been specified.
+Partial data may have been sent.
+.It Bq Er EFAULT
+An invalid address was specified for an argument.
+.It Bq Er EINTR
+A signal interrupted
+.Fn sendfile
+before it could be completed.
+If specified, the number
+of bytes successfully sent will be returned in
+.Fa *sbytes .
+.It Bq Er EINVAL
+The
+.Fa fd
+argument
+is not a regular file.
+.It Bq Er EINVAL
+The
+.Fa s
+argument
+is not a SOCK_STREAM type socket.
+.It Bq Er EINVAL
+The
+.Fa offset
+argument
+is negative.
+.It Bq Er EIO
+An error occurred while reading from
+.Fa fd .
+.It Bq Er EINTEGRITY
+Corrupted data was detected while reading from
+.Fa fd .
+.It Bq Er ENOTCAPABLE
+The
+.Fa fd
+or the
+.Fa s
+argument has insufficient rights.
+.It Bq Er ENOBUFS
+The system was unable to allocate an internal buffer.
+.It Bq Er ENOTCONN
+The
+.Fa s
+argument
+points to an unconnected socket.
+.It Bq Er ENOTSOCK
+The
+.Fa s
+argument
+is not a socket.
+.It Bq Er EOPNOTSUPP
+The file system for descriptor
+.Fa fd
+does not support
+.Fn sendfile .
+.It Bq Er EPIPE
+The socket peer has closed the connection.
+.El
+.Sh SEE ALSO
+.Xr netstat 1 ,
+.Xr open 2 ,
+.Xr send 2 ,
+.Xr socket 2 ,
+.Xr writev 2 ,
+.Xr loader.conf 5 ,
+.Xr tuning 7 ,
+.Xr sysctl 8
+.Rs
+.%A K. Elmeleegy
+.%A A. Chanda
+.%A A. L. Cox
+.%A W. Zwaenepoel
+.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
+.%J The Proceedings of the 2005 USENIX Annual Technical Conference
+.%P pp 223-236
+.%D 2005
+.Re
+.Sh HISTORY
+The
+.Fn sendfile
+system call
+first appeared in
+.Fx 3.0 .
+This manual page first appeared in
+.Fx 3.1 .
+In
+.Fx 10
+support for sending shared memory descriptors had been introduced.
+In
+.Fx 11
+a non-blocking implementation had been introduced.
+.Sh AUTHORS
+The initial implementation of
+.Fn sendfile
+system call
+and this manual page were written by
+.An David G. Lawrence Aq Mt dg@dglawrence.com .
+The
+.Fx 11
+implementation was written by
+.An Gleb Smirnoff Aq Mt glebius@FreeBSD.org .
+.Sh BUGS
+The
+.Fn sendfile
+system call will not fail, i.e., return
+.Dv -1
+and set
+.Va errno
+to
+.Er EFAULT ,
+if provided an invalid address for
+.Fa sbytes .
+The
+.Fn sendfile
+system call does not support SCTP sockets,
+it will return
+.Dv -1
+and set
+.Va errno
+to
+.Er EINVAL .