author: John Baldwin <jhb@FreeBSD.org> 2019-08-27 00:01:56 +0000
committer: John Baldwin <jhb@FreeBSD.org> 2019-08-27 00:01:56 +0000
commit: b2e60773c6b015f06fcd71510cd20a91eb43bcaa
tree: 7e88792669a12b900d38f75be531ec5f19459f8b
parent: a70e17eecad468b652f0e201bf47c4dd65cddfc5
Add kernel-side support for in-kernel TLS.
KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotiation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys.

Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type.

At present, rekeying is not supported, though the in-kernel framework should be able to support it.

KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session.

KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS.

Software TLS marks mbufs holding socket data as not ready via M_NOTREADY, similar to sendfile(2), when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile(2), sendfile_iodone() calls ktls_enqueue() instead of pru_ready(), leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY flag is set when invoking pru_send() along with invoking ktls_enqueue().

A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption.

(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.)

KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators.

Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs.

ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.)

ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover or lacp with flowid enabled.

Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes.

In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS.

Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.

KTLS is compiled into the kernel via the KERN_TLS kernel option.

This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload.

Reviewed by: gallatin, hselasky, rrs
Obtained from: Netflix
Sponsored by: Netflix, Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21277
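
As a rough illustration of what "placed into TLS frames" means on the wire, the following userland sketch fills in the 5-byte TLS record header (content type, two version bytes, big-endian length) and counts how many records a write would need for a given maximum payload. It is not the kernel's ktls_frame(); rec_write_header() and rec_count() are hypothetical names made up for this example, and the length passed in is simply the number of bytes that follow the header, so any per-record cipher overhead (nonce, tag, padding) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_HDR_LEN        5   /* type (1) + version (2) + length (2) */
#define REC_TYPE_APP_DATA  23  /* TLS application_data content type */

/*
 * Hypothetical helper: write one TLS record header into hdr.
 * body_len is the number of bytes that follow the header on the wire.
 * Returns 0 on success, -1 if body_len does not fit the 16-bit field.
 */
static int
rec_write_header(uint8_t hdr[REC_HDR_LEN], uint8_t type, uint8_t vmajor,
    uint8_t vminor, size_t body_len)
{
	if (body_len > UINT16_MAX)
		return (-1);
	hdr[0] = type;
	hdr[1] = vmajor;                      /* 3 for TLS 1.0 - 1.2 */
	hdr[2] = vminor;                      /* 1, 2, or 3 respectively */
	hdr[3] = (uint8_t)(body_len >> 8);    /* length, high byte */
	hdr[4] = (uint8_t)(body_len & 0xff);  /* length, low byte */
	return (0);
}

/*
 * Hypothetical helper: number of records needed to carry total bytes
 * when each record holds at most max_payload bytes of plaintext.
 */
static size_t
rec_count(size_t total, size_t max_payload)
{
	if (max_payload == 0)
		return (0);
	return ((total + max_payload - 1) / max_payload);
}
```

With the default kern.ipc.tls.maxlen of 16384 from this commit, a 40000-byte write would be split into rec_count(40000, 16384) records, each preceded by its own header.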
Notes: svn path=/head/; revision=351522
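
The commit message says that records with a non-default type are sent via sendmsg(2) with a TLS_SET_RECORD_TYPE control message whose payload is a single byte. A minimal userland sketch of building that control message follows. Several things here are assumptions rather than facts taken from this commit: the helper name set_record_type() is hypothetical, the use of IPPROTO_TCP as the cmsg level is assumed, <sys/ktls.h> is assumed to be the header that provides the constant on FreeBSD, and the fallback #define exists only so the sketch compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#ifdef __FreeBSD__
#include <sys/ktls.h>	/* assumed location of TLS_SET_RECORD_TYPE */
#endif

/* Placeholder value so the sketch builds without the FreeBSD header. */
#ifndef TLS_SET_RECORD_TYPE
#define TLS_SET_RECORD_TYPE 1
#endif

/*
 * Hypothetical helper: attach a one-byte record-type control message
 * to msg, using caller-supplied storage cbuf of cbuflen bytes.
 * cbuf must be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
set_record_type(struct msghdr *msg, void *cbuf, size_t cbuflen,
    uint8_t rtype)
{
	struct cmsghdr *cm;

	if (cbuflen < CMSG_SPACE(sizeof(rtype)))
		return (-1);
	memset(cbuf, 0, cbuflen);
	msg->msg_control = cbuf;
	msg->msg_controllen = CMSG_SPACE(sizeof(rtype));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = IPPROTO_TCP;		/* assumed level */
	cm->cmsg_type = TLS_SET_RECORD_TYPE;
	cm->cmsg_len = CMSG_LEN(sizeof(rtype));
	memcpy(CMSG_DATA(cm), &rtype, sizeof(rtype));
	return (0);
}
```

A caller would point msg_iov at the record payload (for example a handshake message, TLS content type 22), call set_record_type(), and then pass the msghdr to sendmsg(2) on the KTLS-enabled socket.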
-rw-r--r-- sbin/ifconfig/ifconfig.8 | 24
-rw-r--r-- sbin/ifconfig/ifconfig.c | 4
-rw-r--r-- share/man/man4/tcp.4 | 47
-rw-r--r-- sys/conf/NOTES | 4
-rw-r--r-- sys/conf/files | 1
-rw-r--r-- sys/conf/options | 1
-rw-r--r-- sys/kern/kern_mbuf.c | 25
-rw-r--r-- sys/kern/kern_sendfile.c | 101
-rw-r--r-- sys/kern/uipc_ktls.c | 1450
-rw-r--r-- sys/kern/uipc_sockbuf.c | 22
-rw-r--r-- sys/kern/uipc_socket.c | 96
-rw-r--r-- sys/modules/Makefile | 2
-rw-r--r-- sys/modules/ktls_ocf/Makefile | 8
-rw-r--r-- sys/net/ieee8023ad_lacp.c | 3
-rw-r--r-- sys/net/ieee8023ad_lacp.h | 2
-rw-r--r-- sys/net/if.h | 3
-rw-r--r-- sys/net/if_lagg.c | 11
-rw-r--r-- sys/net/if_var.h | 11
-rw-r--r-- sys/net/if_vlan.c | 25
-rw-r--r-- sys/netinet/ip_output.c | 36
-rw-r--r-- sys/netinet/tcp.h | 12
-rw-r--r-- sys/netinet/tcp_output.c | 58
-rw-r--r-- sys/netinet/tcp_stacks/rack.c | 11
-rw-r--r-- sys/netinet/tcp_subr.c | 118
-rw-r--r-- sys/netinet/tcp_usrreq.c | 35
-rw-r--r-- sys/netinet/tcp_var.h | 2
-rw-r--r-- sys/netinet6/ip6_output.c | 36
-rw-r--r-- sys/opencrypto/ktls_ocf.c | 308
-rw-r--r-- sys/sys/ktls.h | 194
-rw-r--r-- sys/sys/mbuf.h | 19
-rw-r--r-- sys/sys/param.h | 2
-rw-r--r-- sys/sys/sockbuf.h | 5
-rw-r--r-- tools/tools/switch_tls/Makefile | 6
-rw-r--r-- tools/tools/switch_tls/switch_tls.c | 381
34 files changed, 3015 insertions, 48 deletions
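
The new sys/kern/uipc_ktls.c keeps the pluggable software backends on a list ordered by priority: ktls_crypto_backend_register() inserts a backend before the first entry with a lower prio, or at the tail if there is none. The userland sketch below reproduces only that ordering rule with a plain singly linked list. The struct and function names are hypothetical stand-ins for this example, and the kernel's locking and API-version check are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ktls_crypto_backend. */
struct backend {
	const char *name;
	int prio;
	struct backend *next;
};

/*
 * Insert be so the list stays sorted by descending prio. A new entry
 * goes in front of the first existing entry with a strictly lower
 * prio, so backends of equal prio keep their registration order.
 */
static void
backend_insert(struct backend **head, struct backend *be)
{
	struct backend **pp = head;

	/* Skip every entry whose prio is at least as high as ours. */
	while (*pp != NULL && (*pp)->prio >= be->prio)
		pp = &(*pp)->next;
	be->next = *pp;
	*pp = be;
}
```

Registering backends with priorities 5, 10, and 5 in that order yields the list 10, 5 (first registered), 5 (second registered), which matches the insert-before-lower rule in the diff. A sorted list like this would let a consumer walk the backends from highest to lowest priority.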
diff --git a/sbin/ifconfig/ifconfig.8 b/sbin/ifconfig/ifconfig.8
index b34af49f72fc..3b6d29657e80 100644
--- a/sbin/ifconfig/ifconfig.8
+++ b/sbin/ifconfig/ifconfig.8
@@ -28,7 +28,7 @@
.\" From: @(#)ifconfig.8 8.3 (Berkeley) 1/5/94
.\" $FreeBSD$
.\"
-.Dd August 15, 2019
+.Dd August 26, 2019
.Dt IFCONFIG 8
.Os
.Sh NAME
@@ -538,6 +538,28 @@ large receive offloading, enable LRO on the interface.
If the driver supports
.Xr tcp 4
large receive offloading, disable LRO on the interface.
+.It Cm txtls
+Transmit TLS offload encrypts Transport Layer Security (TLS) records and
+segments the encrypted record into one or more
+.Xr tcp 4
+segments over either
+.Xr ip 4
+or
+.Xr ip6 4 .
+If the driver supports transmit TLS offload,
+enable transmit TLS offload on the interface.
+Some drivers may not be able to support transmit TLS offload for
+.Xr ip 4
+and
+.Xr ip6 4
+packets, so they may enable only one of them.
+.It Fl txtls
+If the driver supports transmit TLS offload,
+disable transmit TLS offload on the interface.
+It will always disable TLS for
+.Xr ip 4
+and
+.Xr ip6 4 .
.It Cm nomap
If the driver supports unmapped network buffers,
enable them on the interface.
diff --git a/sbin/ifconfig/ifconfig.c b/sbin/ifconfig/ifconfig.c
index b1e4aa947c5a..78fc6a7e1e1e 100644
--- a/sbin/ifconfig/ifconfig.c
+++ b/sbin/ifconfig/ifconfig.c
@@ -1257,7 +1257,7 @@ unsetifdescr(const char *val, int value, int s, const struct afswtch *afp)
"\020\1RXCSUM\2TXCSUM\3NETCONS\4VLAN_MTU\5VLAN_HWTAGGING\6JUMBO_MTU\7POLLING" \
"\10VLAN_HWCSUM\11TSO4\12TSO6\13LRO\14WOL_UCAST\15WOL_MCAST\16WOL_MAGIC" \
"\17TOE4\20TOE6\21VLAN_HWFILTER\23VLAN_HWTSO\24LINKSTATE\25NETMAP" \
-"\26RXCSUM_IPV6\27TXCSUM_IPV6\31TXRTLMT\32HWRXTSTMP\33NOMAP"
+"\26RXCSUM_IPV6\27TXCSUM_IPV6\31TXRTLMT\32HWRXTSTMP\33NOMAP\34TXTLS4\35TXTLS6"
/*
* Print the status of the interface. If an address family was
@@ -1585,6 +1585,8 @@ static struct cmd basic_cmds[] = {
DEF_CMD("-toe", -IFCAP_TOE, setifcap),
DEF_CMD("lro", IFCAP_LRO, setifcap),
DEF_CMD("-lro", -IFCAP_LRO, setifcap),
+ DEF_CMD("txtls", IFCAP_TXTLS, setifcap),
+ DEF_CMD("-txtls", -IFCAP_TXTLS, setifcap),
DEF_CMD("wol", IFCAP_WOL, setifcap),
DEF_CMD("-wol", -IFCAP_WOL, setifcap),
DEF_CMD("wol_ucast", IFCAP_WOL_UCAST, setifcap),
diff --git a/share/man/man4/tcp.4 b/share/man/man4/tcp.4
index 58ea54e2561b..e1545c1a0161 100644
--- a/share/man/man4/tcp.4
+++ b/share/man/man4/tcp.4
@@ -34,7 +34,7 @@
.\" From: @(#)tcp.4 8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
-.Dd July 23, 2019
+.Dd August 26, 2019
.Dt TCP 4
.Os
.Sh NAME
@@ -293,6 +293,51 @@ If an SADB entry cannot be found for the destination,
the system does not send any outgoing segments and drops any inbound segments.
.Pp
Each dropped segment is taken into account in the TCP protocol statistics.
+.It Dv TCP_TXTLS_ENABLE
+Enable in-kernel Transport Layer Security (TLS) for data written to this
+socket.
+The
+.Vt struct tls_so_enable
+argument defines the encryption and authentication algorithms and keys
+used to encrypt the socket data as well as the maximum TLS record
+payload size.
+.Pp
+All data written to this socket will be encapsulated in TLS records
+and subsequently encrypted.
+By default all data written to this socket is treated as application data.
+Individual TLS records with a type other than application data
+(for example, handshake messages),
+may be transmitted by invoking
+.Xr sendmsg 2
+with a custom TLS record type set in a
+.Dv TLS_SET_RECORD_TYPE
+control message.
+The payload of this control message is a single byte holding the desired
+TLS record type.
+.Pp
+Data read from this socket will still be encrypted and must be parsed by
+a TLS-aware consumer.
+.Pp
+At present, only a single key may be set on a socket.
+As such, users of this option must disable rekeying.
+.It Dv TCP_TXTLS_MODE
+The integer argument can be used to get or set the current TLS mode of a
+socket.
+Setting the mode can only be used to toggle between software and NIC TLS after
+TLS has been initially enabled via the
+.Dv TCP_TXTLS_ENABLE
+option.
+The available modes are:
+.Bl -tag -width "Dv TCP_TLS_MODE_IFNET"
+.It Dv TCP_TLS_MODE_NONE
+In-kernel TLS framing and encryption is not enabled for this socket.
+.It Dv TCP_TLS_MODE_SW
+TLS records are encrypted by the kernel prior to placing the data in the
+socket buffer.
+Typically this encryption is performed in software.
+.It Dv TCP_TLS_MODE_IFNET
+TLS records are encrypted by the network interface card (NIC).
+.El
.El
.Pp
The option level for the
diff --git a/sys/conf/NOTES b/sys/conf/NOTES
index 0d58cd3a57f3..66b12ce6a46f 100644
--- a/sys/conf/NOTES
+++ b/sys/conf/NOTES
@@ -654,6 +654,10 @@ options IPSEC #IP security (requires device crypto)
options IPSEC_SUPPORT
#options IPSEC_DEBUG #debug for IP security
+
+# TLS framing and encryption of data transmitted over TCP sockets.
+options KERN_TLS # TLS transmit offload
+
#
# SMB/CIFS requester
# NETSMB enables support for SMB protocol, it requires LIBMCHAIN and LIBICONV
diff --git a/sys/conf/files b/sys/conf/files
index fd84e0a9878b..fc01f5551f11 100644
--- a/sys/conf/files
+++ b/sys/conf/files
@@ -3862,6 +3862,7 @@ kern/tty_ttydisc.c standard
kern/uipc_accf.c standard
kern/uipc_debug.c optional ddb
kern/uipc_domain.c standard
+kern/uipc_ktls.c optional kern_tls
kern/uipc_mbuf.c standard
kern/uipc_mbuf2.c standard
kern/uipc_mbufhash.c standard
diff --git a/sys/conf/options b/sys/conf/options
index 18cb43d2703b..6957a2d236ef 100644
--- a/sys/conf/options
+++ b/sys/conf/options
@@ -435,6 +435,7 @@ IPSEC opt_ipsec.h
IPSEC_DEBUG opt_ipsec.h
IPSEC_SUPPORT opt_ipsec.h
IPSTEALTH
+KERN_TLS
KRPC
LIBALIAS
LIBMCHAIN
diff --git a/sys/kern/kern_mbuf.c b/sys/kern/kern_mbuf.c
index f331b8a11877..dff4c2c0a56c 100644
--- a/sys/kern/kern_mbuf.c
+++ b/sys/kern/kern_mbuf.c
@@ -31,6 +31,7 @@
__FBSDID("$FreeBSD$");
#include "opt_param.h"
+#include "opt_kern_tls.h"
#include <sys/param.h>
#include <sys/conf.h>
@@ -41,10 +42,12 @@ __FBSDID("$FreeBSD$");
#include <sys/domain.h>
#include <sys/eventhandler.h>
#include <sys/kernel.h>
+#include <sys/ktls.h>
#include <sys/limits.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/protosw.h>
+#include <sys/refcount.h>
#include <sys/sf_buf.h>
#include <sys/smp.h>
#include <sys/socket.h>
@@ -112,10 +115,10 @@ int nmbjumbop; /* limits number of page size jumbo clusters */
int nmbjumbo9; /* limits number of 9k jumbo clusters */
int nmbjumbo16; /* limits number of 16k jumbo clusters */
-bool mb_use_ext_pgs; /* use EXT_PGS mbufs for sendfile */
+bool mb_use_ext_pgs; /* use EXT_PGS mbufs for sendfile & TLS */
SYSCTL_BOOL(_kern_ipc, OID_AUTO, mb_use_ext_pgs, CTLFLAG_RWTUN,
&mb_use_ext_pgs, 0,
- "Use unmapped mbufs for sendfile(2)");
+ "Use unmapped mbufs for sendfile(2) and TLS offload");
static quad_t maxmbufmem; /* overall real memory limit for all mbufs */
@@ -1281,13 +1284,27 @@ mb_free_ext(struct mbuf *m)
uma_zfree(zone_jumbo16, m->m_ext.ext_buf);
uma_zfree(zone_mbuf, mref);
break;
- case EXT_PGS:
+ case EXT_PGS: {
+#ifdef KERN_TLS
+ struct mbuf_ext_pgs *pgs;
+ struct ktls_session *tls;
+#endif
+
KASSERT(mref->m_ext.ext_free != NULL,
("%s: ext_free not set", __func__));
mref->m_ext.ext_free(mref);
- uma_zfree(zone_extpgs, mref->m_ext.ext_pgs);
+#ifdef KERN_TLS
+ pgs = mref->m_ext.ext_pgs;
+ tls = pgs->tls;
+ if (tls != NULL &&
+ !refcount_release_if_not_last(&tls->refcount))
+ ktls_enqueue_to_free(pgs);
+ else
+#endif
+ uma_zfree(zone_extpgs, mref->m_ext.ext_pgs);
uma_zfree(zone_mbuf, mref);
break;
+ }
case EXT_SFBUF:
case EXT_NET_DRV:
case EXT_MOD_TYPE:
diff --git a/sys/kern/kern_sendfile.c b/sys/kern/kern_sendfile.c
index 9674e74fcd37..5c8dfa7e5b41 100644
--- a/sys/kern/kern_sendfile.c
+++ b/sys/kern/kern_sendfile.c
@@ -30,12 +30,15 @@
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
+#include "opt_kern_tls.h"
+
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/capsicum.h>
#include <sys/kernel.h>
#include <netinet/in.h>
#include <sys/lock.h>
+#include <sys/ktls.h>
#include <sys/mutex.h>
#include <sys/sysproto.h>
#include <sys/malloc.h>
@@ -85,6 +88,7 @@ struct sf_io {
int npages;
struct socket *so;
struct mbuf *m;
+ struct ktls_session *tls;
vm_page_t pa[];
};
@@ -262,6 +266,15 @@ sendfile_iodone(void *arg, vm_page_t *pg, int count, int error)
if (!refcount_release(&sfio->nios))
return;
+#ifdef INVARIANTS
+ if ((sfio->m->m_flags & M_EXT) != 0 &&
+ sfio->m->m_ext.ext_type == EXT_PGS)
+ KASSERT(sfio->tls == sfio->m->m_ext.ext_pgs->tls,
+ ("TLS session mismatch"));
+ else
+ KASSERT(sfio->tls == NULL,
+ ("non-ext_pgs mbuf with TLS session"));
+#endif
CURVNET_SET(so->so_vnet);
if (sfio->error) {
/*
@@ -279,12 +292,29 @@ sendfile_iodone(void *arg, vm_page_t *pg, int count, int error)
so->so_error = EIO;
mb_free_notready(sfio->m, sfio->npages);
+#ifdef KERN_TLS
+ } else if (sfio->tls != NULL && sfio->tls->sw_encrypt != NULL) {
+ /*
+ * I/O operation is complete, but we still need to
+ * encrypt. We cannot do this in the interrupt thread
+ * of the disk controller, so forward the mbufs to a
+ * different thread.
+ *
+ * Donate the socket reference from sfio rather
+ * than explicitly invoking soref().
+ */
+ ktls_enqueue(sfio->m, so, sfio->npages);
+ goto out_with_ref;
+#endif
} else
(void)(so->so_proto->pr_usrreqs->pru_ready)(so, sfio->m,
sfio->npages);
SOCK_LOCK(so);
sorele(so);
+#ifdef KERN_TLS
+out_with_ref:
+#endif
CURVNET_RESTORE();
free(sfio, M_TEMP);
}
@@ -526,6 +556,9 @@ vn_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
struct vnode *vp;
struct vm_object *obj;
struct socket *so;
+#ifdef KERN_TLS
+ struct ktls_session *tls;
+#endif
struct mbuf_ext_pgs *ext_pgs;
struct mbuf *m, *mh, *mhtail;
struct sf_buf *sf;
@@ -534,12 +567,18 @@ vn_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
struct vattr va;
off_t off, sbytes, rem, obj_size;
int bsize, error, ext_pgs_idx, hdrlen, max_pgs, softerr;
+#ifdef KERN_TLS
+ int tls_enq_cnt;
+#endif
bool use_ext_pgs;
obj = NULL;
so = NULL;
m = mh = NULL;
sfs = NULL;
+#ifdef KERN_TLS
+ tls = NULL;
+#endif
hdrlen = sbytes = 0;
softerr = 0;
use_ext_pgs = false;
@@ -576,6 +615,9 @@ vn_sendfile(struct file *fp, int sockfd, struct uio *hdr_uio,
* we implement that, but possibly shouldn't.
*/
(void)sblock(&so->so_snd, SBL_WAIT | SBL_NOINTR);
+#ifdef KERN_TLS
+ tls = ktls_hold(so->so_snd.sb_tls_info);
+#endif
/*
* Loop through the pages of the file, starting with the requested
@@ -669,7 +711,14 @@ retry_space:
if (hdr_uio != NULL && hdr_uio->uio_resid > 0) {
hdr_uio->uio_td = td;
hdr_uio->uio_rw = UIO_WRITE;
- mh = m_uiotombuf(hdr_uio, M_WAITOK, space, 0, 0);
+#ifdef KERN_TLS
+ if (tls != NULL)
+ mh = m_uiotombuf(hdr_uio, M_WAITOK, space,
+ tls->params.max_frame_len, M_NOMAP);
+ else
+#endif
+ mh = m_uiotombuf(hdr_uio, M_WAITOK,
+ space, 0, 0);
hdrlen = m_length(mh, &mhtail);
space -= hdrlen;
/*
@@ -743,6 +792,15 @@ retry_space:
sfio->so = so;
sfio->error = 0;
+#ifdef KERN_TLS
+ /*
+ * This doesn't use ktls_hold() because sfio->m will
+ * also have a reference on 'tls' that will be valid
+ * for all of sfio's lifetime.
+ */
+ sfio->tls = tls;
+#endif
+
error = sendfile_swapin(obj, sfio, &nios, off, space, npages,
rhpages, flags);
if (error != 0) {
@@ -763,11 +821,22 @@ retry_space:
* bufs are restricted to TCP as that is what has been
* tested. In particular, unmapped mbufs have not
* been tested with UNIX-domain sockets.
+ *
+ * TLS frames always require unmapped mbufs.
*/
- if (mb_use_ext_pgs &&
- so->so_proto->pr_protocol == IPPROTO_TCP) {
+ if ((mb_use_ext_pgs &&
+ so->so_proto->pr_protocol == IPPROTO_TCP)
+#ifdef KERN_TLS
+ || tls != NULL
+#endif
+ ) {
use_ext_pgs = true;
- max_pgs = MBUF_PEXT_MAX_PGS;
+#ifdef KERN_TLS
+ if (tls != NULL)
+ max_pgs = num_pages(tls->params.max_frame_len);
+ else
+#endif
+ max_pgs = MBUF_PEXT_MAX_PGS;
/* Start at last index, to wrap on first use. */
ext_pgs_idx = max_pgs - 1;
@@ -946,6 +1015,14 @@ prepend_header:
__func__, m_length(m, NULL), space, hdrlen));
CURVNET_SET(so->so_vnet);
+#ifdef KERN_TLS
+ if (tls != NULL) {
+ error = ktls_frame(m, tls, &tls_enq_cnt,
+ TLS_RLTYPE_APP);
+ if (error != 0)
+ goto done;
+ }
+#endif
if (nios == 0) {
/*
* If sendfile_swapin() didn't initiate any I/Os,
@@ -954,8 +1031,16 @@ prepend_header:
* PRUS_NOTREADY flag.
*/
free(sfio, M_TEMP);
- error = (*so->so_proto->pr_usrreqs->pru_send)
- (so, 0, m, NULL, NULL, td);
+#ifdef KERN_TLS
+ if (tls != NULL && tls->sw_encrypt != NULL) {
+ error = (*so->so_proto->pr_usrreqs->pru_send)
+ (so, PRUS_NOTREADY, m, NULL, NULL, td);
+ soref(so);
+ ktls_enqueue(m, so, tls_enq_cnt);
+ } else
+#endif
+ error = (*so->so_proto->pr_usrreqs->pru_send)
+ (so, 0, m, NULL, NULL, td);
} else {
sfio->npages = npages;
soref(so);
@@ -1019,6 +1104,10 @@ out:
mtx_destroy(&sfs->mtx);
free(sfs, M_TEMP);
}
+#ifdef KERN_TLS
+ if (tls != NULL)
+ ktls_free(tls);
+#endif
if (error == ERESTART)
error = EINTR;
diff --git a/sys/kern/uipc_ktls.c b/sys/kern/uipc_ktls.c
new file mode 100644
index 000000000000..62838a356f55
--- /dev/null
+++ b/sys/kern/uipc_ktls.c
@@ -0,0 +1,1450 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2014-2019 Netflix Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include "opt_inet.h"
+#include "opt_inet6.h"
+#include "opt_rss.h"
+
+#include <sys/param.h>
+#include <sys/kernel.h>
+#include <sys/ktls.h>
+#include <sys/lock.h>
+#include <sys/mbuf.h>
+#include <sys/mutex.h>
+#include <sys/rmlock.h>
+#include <sys/proc.h>
+#include <sys/protosw.h>
+#include <sys/refcount.h>
+#include <sys/smp.h>
+#include <sys/socket.h>
+#include <sys/socketvar.h>
+#include <sys/sysctl.h>
+#include <sys/taskqueue.h>
+#include <sys/kthread.h>
+#include <sys/uio.h>
+#include <sys/vmmeter.h>
+#if defined(__aarch64__) || defined(__amd64__) || defined(__i386__)
+#include <machine/pcb.h>
+#endif
+#include <machine/vmparam.h>
+#ifdef RSS
+#include <net/netisr.h>
+#include <net/rss_config.h>
+#endif
+#if defined(INET) || defined(INET6)
+#include <netinet/in.h>
+#include <netinet/in_pcb.h>
+#endif
+#include <netinet/tcp_var.h>
+#include <opencrypto/xform.h>
+#include <vm/uma_dbg.h>
+#include <vm/vm.h>
+#include <vm/vm_pageout.h>
+#include <vm/vm_page.h>
+
+struct ktls_wq {
+ struct mtx mtx;
+ STAILQ_HEAD(, mbuf_ext_pgs) head;
+ bool running;
+} __aligned(CACHE_LINE_SIZE);
+
+static struct ktls_wq *ktls_wq;
+static struct proc *ktls_proc;
+LIST_HEAD(, ktls_crypto_backend) ktls_backends;
+static struct rmlock ktls_backends_lock;
+static uma_zone_t ktls_session_zone;
+static uint16_t ktls_cpuid_lookup[MAXCPU];
+
+SYSCTL_NODE(_kern_ipc, OID_AUTO, tls, CTLFLAG_RW, 0,
+ "Kernel TLS offload");
+SYSCTL_NODE(_kern_ipc_tls, OID_AUTO, stats, CTLFLAG_RW, 0,
+ "Kernel TLS offload stats");
+
+static int ktls_allow_unload;
+SYSCTL_INT(_kern_ipc_tls, OID_AUTO, allow_unload, CTLFLAG_RDTUN,
+ &ktls_allow_unload, 0, "Allow software crypto modules to unload");
+
+#ifdef RSS
+static int ktls_bind_threads = 1;
+#else
+static int ktls_bind_threads;
+#endif
+SYSCTL_INT(_kern_ipc_tls, OID_AUTO, bind_threads, CTLFLAG_RDTUN,
+ &ktls_bind_threads, 0,
+ "Bind crypto threads to cores or domains at boot");
+
+static u_int ktls_maxlen = 16384;
+SYSCTL_UINT(_kern_ipc_tls, OID_AUTO, maxlen, CTLFLAG_RWTUN,
+ &ktls_maxlen, 0, "Maximum TLS record size");
+
+static int ktls_number_threads;
+SYSCTL_INT(_kern_ipc_tls_stats, OID_AUTO, threads, CTLFLAG_RD,
+ &ktls_number_threads, 0,
+ "Number of TLS threads in thread-pool");
+
+static bool ktls_offload_enable;
+SYSCTL_BOOL(_kern_ipc_tls, OID_AUTO, enable, CTLFLAG_RW,
+ &ktls_offload_enable, 0,
+ "Enable support for kernel TLS offload");
+
+static bool ktls_cbc_enable = true;
+SYSCTL_BOOL(_kern_ipc_tls, OID_AUTO, cbc_enable, CTLFLAG_RW,
+ &ktls_cbc_enable, 1,
+ "Enable Support of AES-CBC crypto for kernel TLS");
+
+static counter_u64_t ktls_tasks_active;
+SYSCTL_COUNTER_U64(_kern_ipc_tls, OID_AUTO, tasks_active, CTLFLAG_RD,
+ &ktls_tasks_active, "Number of active tasks");
+
+static counter_u64_t ktls_cnt_on;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, so_inqueue, CTLFLAG_RD,
+ &ktls_cnt_on, "Number of TLS records in queue to tasks for SW crypto");
+
+static counter_u64_t ktls_offload_total;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, offload_total,
+ CTLFLAG_RD, &ktls_offload_total,
+ "Total successful TLS setups (parameters set)");
+
+static counter_u64_t ktls_offload_enable_calls;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, enable_calls,
+ CTLFLAG_RD, &ktls_offload_enable_calls,
+ "Total number of TLS enable calls made");
+
+static counter_u64_t ktls_offload_active;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, active, CTLFLAG_RD,
+ &ktls_offload_active, "Total Active TLS sessions");
+
+static counter_u64_t ktls_offload_failed_crypto;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, failed_crypto, CTLFLAG_RD,
+ &ktls_offload_failed_crypto, "Total TLS crypto failures");
+
+static counter_u64_t ktls_switch_to_ifnet;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, switch_to_ifnet, CTLFLAG_RD,
+ &ktls_switch_to_ifnet, "TLS sessions switched from SW to ifnet");
+
+static counter_u64_t ktls_switch_to_sw;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, switch_to_sw, CTLFLAG_RD,
+ &ktls_switch_to_sw, "TLS sessions switched from ifnet to SW");
+
+static counter_u64_t ktls_switch_failed;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, switch_failed, CTLFLAG_RD,
+ &ktls_switch_failed, "TLS sessions unable to switch between SW and ifnet");
+
+SYSCTL_NODE(_kern_ipc_tls, OID_AUTO, sw, CTLFLAG_RD, 0,
+ "Software TLS session stats");
+SYSCTL_NODE(_kern_ipc_tls, OID_AUTO, ifnet, CTLFLAG_RD, 0,
+ "Hardware (ifnet) TLS session stats");
+
+static counter_u64_t ktls_sw_cbc;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_sw, OID_AUTO, cbc, CTLFLAG_RD, &ktls_sw_cbc,
+ "Active number of software TLS sessions using AES-CBC");
+
+static counter_u64_t ktls_sw_gcm;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_sw, OID_AUTO, gcm, CTLFLAG_RD, &ktls_sw_gcm,
+ "Active number of software TLS sessions using AES-GCM");
+
+static counter_u64_t ktls_ifnet_cbc;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_ifnet, OID_AUTO, cbc, CTLFLAG_RD,
+ &ktls_ifnet_cbc,
+ "Active number of ifnet TLS sessions using AES-CBC");
+
+static counter_u64_t ktls_ifnet_gcm;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_ifnet, OID_AUTO, gcm, CTLFLAG_RD,
+ &ktls_ifnet_gcm,
+ "Active number of ifnet TLS sessions using AES-GCM");
+
+static counter_u64_t ktls_ifnet_reset;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_ifnet, OID_AUTO, reset, CTLFLAG_RD,
+ &ktls_ifnet_reset, "TLS sessions updated to a new ifnet send tag");
+
+static counter_u64_t ktls_ifnet_reset_dropped;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_ifnet, OID_AUTO, reset_dropped, CTLFLAG_RD,
+ &ktls_ifnet_reset_dropped,
+ "TLS sessions dropped after failing to update ifnet send tag");
+
+static counter_u64_t ktls_ifnet_reset_failed;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_ifnet, OID_AUTO, reset_failed, CTLFLAG_RD,
+ &ktls_ifnet_reset_failed,
+ "TLS sessions that failed to allocate a new ifnet send tag");
+
+static int ktls_ifnet_permitted;
+SYSCTL_UINT(_kern_ipc_tls_ifnet, OID_AUTO, permitted, CTLFLAG_RWTUN,
+ &ktls_ifnet_permitted, 1,
+ "Whether to permit hardware (ifnet) TLS sessions");
+
+static MALLOC_DEFINE(M_KTLS, "ktls", "Kernel TLS");
+
+static void ktls_cleanup(struct ktls_session *tls);
+#if defined(INET) || defined(INET6)
+static void ktls_reset_send_tag(void *context, int pending);
+#endif
+static void ktls_work_thread(void *ctx);
+
+int
+ktls_crypto_backend_register(struct ktls_crypto_backend *be)
+{
+ struct ktls_crypto_backend *curr_be, *tmp;
+
+ if (be->api_version != KTLS_API_VERSION) {
+ printf("KTLS: API version mismatch (%d vs %d) for %s\n",
+ be->api_version, KTLS_API_VERSION,
+ be->name);
+ return (EINVAL);
+ }
+
+ rm_wlock(&ktls_backends_lock);
+ printf("KTLS: Registering crypto method %s with prio %d\n",
+ be->name, be->prio);
+ if (LIST_EMPTY(&ktls_backends)) {
+ LIST_INSERT_HEAD(&ktls_backends, be, next);
+ } else {
+ LIST_FOREACH_SAFE(curr_be, &ktls_backends, next, tmp) {
+ if (curr_be->prio < be->prio) {
+ LIST_INSERT_BEFORE(curr_be, be, next);
+ break;
+ }
+ if (LIST_NEXT(curr_be, next) == NULL) {
+ LIST_INSERT_AFTER(curr_be, be, next);
+ break;
+ }
+ }
+ }
+ rm_wunlock(&ktls_backends_lock);
+ return (0);
+}
+
+int
+ktls_crypto_backend_deregister(struct ktls_crypto_backend *be)
+{
+ struct ktls_crypto_backend *tmp;
+
+ /*
+ * Don't error if the backend isn't registered. This permits
+ * MOD_UNLOAD handlers to use this function unconditionally.
+ */
+ rm_wlock(&ktls_backends_lock);
+ LIST_FOREACH(tmp, &ktls_backends, next) {
+ if (tmp == be)
+ break;
+ }
+ if (tmp == NULL) {
+ rm_wunlock(&ktls_backends_lock);
+ return (0);
+ }
+
+ if (!ktls_allow_unload) {
+ rm_wunlock(&ktls_backends_lock);
+ printf(
+ "KTLS: Deregistering crypto method %s is not supported\n",
+ be->name);
+ return (EBUSY);
+ }
+
+ if (be->use_count) {
+ rm_wunlock(&ktls_backends_lock);
+ return (EBUSY);
+ }
+
+ LIST_REMOVE(be, next);
+ rm_wunlock(&ktls_backends_lock);
+ return (0);
+}
+
+#if defined(INET) || defined(INET6)
+static uint16_t
+ktls_get_cpu(struct socket *so)
+{
+ struct inpcb *inp;
+ uint16_t cpuid;
+
+ inp = sotoinpcb(so);
+#ifdef RSS
+ cpuid = rss_hash2cpuid(inp->inp_flowid, inp->inp_flowtype);
+ if (cpuid != NETISR_CPUID_NONE)
+ return (cpuid);
+#endif
+ /*
+ * Just use the flowid to shard connections in a repeatable
+ * fashion. Note that some crypto backends rely on the
+ * serialization provided by having the same connection use
+ * the same queue.
+ */
+ cpuid = ktls_cpuid_lookup[inp->inp_flowid % ktls_number_threads];
+ return (cpuid);
+}
+#endif
+
+static void
+ktls_init(void *dummy __unused)
+{
+ struct thread *td;
+ struct pcpu *pc;
+ cpuset_t mask;
+ int error, i;
+
+ ktls_tasks_active = counter_u64_alloc(M_WAITOK);
+ ktls_cnt_on = counter_u64_alloc(M_WAITOK);
+ ktls_offload_total = counter_u64_alloc(M_WAITOK);
+ ktls_offload_enable_calls = counter_u64_alloc(M_WAITOK);
+ ktls_offload_active = counter_u64_alloc(M_WAITOK);
+ ktls_offload_failed_crypto = counter_u64_alloc(M_WAITOK);
+ ktls_switch_to_ifnet = counter_u64_alloc(M_WAITOK);
+ ktls_switch_to_sw = counter_u64_alloc(M_WAITOK);
+ ktls_switch_failed = counter_u64_alloc(M_WAITOK);
+ ktls_sw_cbc = counter_u64_alloc(M_WAITOK);
+ ktls_sw_gcm = counter_u64_alloc(M_WAITOK);
+ ktls_ifnet_cbc = counter_u64_alloc(M_WAITOK);
+ ktls_ifnet_gcm = counter_u64_alloc(M_WAITOK);
+ ktls_ifnet_reset = counter_u64_alloc(M_WAITOK);
+ ktls_ifnet_reset_dropped = counter_u64_alloc(M_WAITOK);
+ ktls_ifnet_reset_failed = counter_u64_alloc(M_WAITOK);
+
+ rm_init(&ktls_backends_lock, "ktls backends");
+ LIST_INIT(&ktls_backends);
+
+ ktls_wq = malloc(sizeof(*ktls_wq) * (mp_maxid + 1), M_KTLS,
+ M_WAITOK | M_ZERO);
+
+ ktls_session_zone = uma_zcreate("ktls_session",
+ sizeof(struct ktls_session),
+#ifdef INVARIANTS
+ trash_ctor, trash_dtor, trash_init, trash_fini,
+#else
+ NULL, NULL, NULL, NULL,
+#endif
+ UMA_ALIGN_CACHE, 0);
+
+ /*
+ * Initialize the workqueues to run the TLS work. We create a
+ * work queue for each CPU.
+ */
+ CPU_FOREACH(i) {
+ STAILQ_INIT(&ktls_wq[i].head);
+ mtx_init(&ktls_wq[i].mtx, "ktls work queue", NULL, MTX_DEF);
+ error = kproc_kthread_add(ktls_work_thread, &ktls_wq[i],
+ &ktls_proc, &td, 0, 0, "KTLS", "ktls_thr_%d", i);
+ if (error)
+ panic("Can't add KTLS thread %d error %d", i, error);
+
+ /*
+ * Bind threads to cores. If ktls_bind_threads is >
+ * 1, then we bind to the NUMA domain.
+ */
+ if (ktls_bind_threads) {
+ if (ktls_bind_threads > 1) {
+ pc = pcpu_find(i);
+ CPU_COPY(&cpuset_domain[pc->pc_domain], &mask);
+ } else {
+ CPU_SETOF(i, &mask);
+ }
+ error = cpuset_setthread(td->td_tid, &mask);
+ if (error)
+ panic(
+ "Unable to bind KTLS thread for CPU %d error %d",
+ i, error);
+ }
+ ktls_cpuid_lookup[ktls_number_threads] = i;
+ ktls_number_threads++;
+ }
+ printf("KTLS: Initialized %d threads\n", ktls_number_threads);
+}
+SYSINIT(ktls, SI_SUB_SMP + 1, SI_ORDER_ANY, ktls_init, NULL);
+
+#if defined(INET) || defined(INET6)
+static int
+ktls_create_session(struct socket *so, struct tls_enable *en,
+ struct ktls_session **tlsp)
+{
+ struct ktls_session *tls;
+ int error;
+
+ /* Only TLS 1.0 - 1.2 are supported. */
+ if (en->tls_vmajor != TLS_MAJOR_VER_ONE)
+ return (EINVAL);
+ if (en->tls_vminor < TLS_MINOR_VER_ZERO ||
+ en->tls_vminor > TLS_MINOR_VER_TWO)
+ return (EINVAL);
+
+ if (en->auth_key_len < 0 || en->auth_key_len > TLS_MAX_PARAM_SIZE)
+ return (EINVAL);
+ if (en->cipher_key_len < 0 || en->cipher_key_len > TLS_MAX_PARAM_SIZE)
+ return (EINVAL);
+ if (en->iv_len < 0 || en->iv_len > TLS_MAX_PARAM_SIZE)
+ return (EINVAL);
+
+ /* All supported algorithms require a cipher key. */
+ if (en->cipher_key_len == 0)
+ return (EINVAL);
+
+ /* No flags are currently supported. */
+ if (en->flags != 0)
+ return (EINVAL);
+
+ /* Common checks for supported algorithms. */
+ switch (en->cipher_algorithm) {
+ case CRYPTO_AES_NIST_GCM_16:
+ /*
+ * auth_algorithm isn't used, but permit GMAC values
+ * for compatibility.
+ */
+ switch (en->auth_algorithm) {
+ case 0:
+ case CRYPTO_AES_128_NIST_GMAC:
+ case CRYPTO_AES_192_NIST_GMAC:
+ case CRYPTO_AES_256_NIST_GMAC:
+ break;
+ default:
+ return (EINVAL);
+ }
+ if (en->auth_key_len != 0)
+ return (EINVAL);
+ if (en->iv_len != TLS_AEAD_GCM_LEN)
+ return (EINVAL);
+ break;
+ case CRYPTO_AES_CBC:
+ switch (en->auth_algorithm) {
+ case CRYPTO_SHA1_HMAC:
+ /*
+ * TLS 1.0 requires an implicit IV. TLS 1.1+
+ * all use explicit IVs.
+ */
+ if (en->tls_vminor == TLS_MINOR_VER_ZERO) {
+ if (en->iv_len != TLS_CBC_IMPLICIT_IV_LEN)
+ return (EINVAL);
+ break;
+ }
+
+ /* FALLTHROUGH */
+ case CRYPTO_SHA2_256_HMAC:
+ case CRYPTO_SHA2_384_HMAC:
+ /* Ignore any supplied IV. */
+ en->iv_len = 0;
+ break;
+ default:
+ return (EINVAL);
+ }
+ if (en->auth_key_len == 0)
+ return (EINVAL);
+ break;
+ default:
+ return (EINVAL);
+ }
+
+ tls = uma_zalloc(ktls_session_zone, M_WAITOK | M_ZERO);
+
+ counter_u64_add(ktls_offload_active, 1);
+
+ refcount_init(&tls->refcount, 1);
+ TASK_INIT(&tls->reset_tag_task, 0, ktls_reset_send_tag, tls);
+
+ tls->wq_index = ktls_get_cpu(so);
+
+ tls->params.cipher_algorithm = en->cipher_algorithm;
+ tls->params.auth_algorithm = en->auth_algorithm;
+ tls->params.tls_vmajor = en->tls_vmajor;
+ tls->params.tls_vminor = en->tls_vminor;
+ tls->params.flags = en->flags;
+ tls->params.max_frame_len = min(TLS_MAX_MSG_SIZE_V10_2, ktls_maxlen);
+
+ /* Set the header and trailer lengths. */
+ tls->params.tls_hlen = sizeof(struct tls_record_layer);
+ switch (en->cipher_algorithm) {
+ case CRYPTO_AES_NIST_GCM_16:
+ tls->params.tls_hlen += 8;
+ tls->params.tls_tlen = AES_GMAC_HASH_LEN;
+ tls->params.tls_bs = 1;
+ break;
+ case CRYPTO_AES_CBC:
+ switch (en->auth_algorithm) {
+ case CRYPTO_SHA1_HMAC:
+ if (en->tls_vminor == TLS_MINOR_VER_ZERO) {
+ /* Implicit IV, no nonce. */
+ } else {
+ tls->params.tls_hlen += AES_BLOCK_LEN;
+ }
+ tls->params.tls_tlen = AES_BLOCK_LEN +
+ SHA1_HASH_LEN;
+ break;
+ case CRYPTO_SHA2_256_HMAC:
+ tls->params.tls_hlen += AES_BLOCK_LEN;
+ tls->params.tls_tlen = AES_BLOCK_LEN +
+ SHA2_256_HASH_LEN;
+ break;
+ case CRYPTO_SHA2_384_HMAC:
+ tls->params.tls_hlen += AES_BLOCK_LEN;
+ tls->params.tls_tlen = AES_BLOCK_LEN +
+ SHA2_384_HASH_LEN;
+ break;
+ default:
+ panic("invalid hmac");
+ }
+ tls->params.tls_bs = AES_BLOCK_LEN;
+ break;
+ default:
+ panic("invalid cipher");
+ }
+
+ KASSERT(tls->params.tls_hlen <= MBUF_PEXT_HDR_LEN,
+ ("TLS header length too long: %d", tls->params.tls_hlen));
+ KASSERT(tls->params.tls_tlen <= MBUF_PEXT_TRAIL_LEN,
+ ("TLS trailer length too long: %d", tls->params.tls_tlen));
+
+ if (en->auth_key_len != 0) {
+ tls->params.auth_key_len = en->auth_key_len;
+ tls->params.auth_key = malloc(en->auth_key_len, M_KTLS,
+ M_WAITOK);
+ error = copyin(en->auth_key, tls->params.auth_key,
+ en->auth_key_len);
+ if (error)
+ goto out;
+ }
+
+ tls->params.cipher_key_len = en->cipher_key_len;
+ tls->params.cipher_key = malloc(en->cipher_key_len, M_KTLS, M_WAITOK);
+ error = copyin(en->cipher_key, tls->params.cipher_key,
+ en->cipher_key_len);
+ if (error)
+ goto out;
+
+ /*
+ * This holds the implicit portion of the nonce for GCM and
+ * the initial implicit IV for TLS 1.0. The explicit portions
+ * of the IV are generated in ktls_frame() and ktls_seq().
+ */
+ if (en->iv_len != 0) {
+ MPASS(en->iv_len <= sizeof(tls->params.iv));
+ tls->params.iv_len = en->iv_len;
+ error = copyin(en->iv, tls->params.iv, en->iv_len);
+ if (error)
+ goto out;
+ }
+
+ *tlsp = tls;
+ return (0);
+
+out:
+ ktls_cleanup(tls);
+ return (error);
+}
+
+static struct ktls_session *
+ktls_clone_session(struct ktls_session *tls)
+{
+ struct ktls_session *tls_new;
+
+ tls_new = uma_zalloc(ktls_session_zone, M_WAITOK | M_ZERO);
+
+ counter_u64_add(ktls_offload_active, 1);
+
+ refcount_init(&tls_new->refcount, 1);
+
+ /* Copy fields from existing session. */
+ tls_new->params = tls->params;
+ tls_new->wq_index = tls->wq_index;
+
+ /* Deep copy keys. */
+ if (tls_new->params.auth_key != NULL) {
+ tls_new->params.auth_key = malloc(tls->params.auth_key_len,
+ M_KTLS, M_WAITOK);
+ memcpy(tls_new->params.auth_key, tls->params.auth_key,
+ tls->params.auth_key_len);
+ }
+
+ tls_new->params.cipher_key = malloc(tls->params.cipher_key_len, M_KTLS,
+ M_WAITOK);
+ memcpy(tls_new->params.cipher_key, tls->params.cipher_key,
+ tls->params.cipher_key_len);
+
+ return (tls_new);
+}
+#endif
+
+static void
+ktls_cleanup(struct ktls_session *tls)
+{
+
+ counter_u64_add(ktls_offload_active, -1);
+ if (tls->free != NULL) {
+ MPASS(tls->be != NULL);
+ switch (tls->params.cipher_algorithm) {
+ case CRYPTO_AES_CBC:
+ counter_u64_add(ktls_sw_cbc, -1);
+ break;
+ case CRYPTO_AES_NIST_GCM_16:
+ counter_u64_add(ktls_sw_gcm, -1);
+ break;
+ }
+ tls->free(tls);
+ } else if (tls->snd_tag != NULL) {
+ switch (tls->params.cipher_algorithm) {
+ case CRYPTO_AES_CBC:
+ counter_u64_add(ktls_ifnet_cbc, -1);
+ break;
+ case CRYPTO_AES_NIST_GCM_16:
+ counter_u64_add(ktls_ifnet_gcm, -1);
+ break;
+ }
+ m_snd_tag_rele(tls->snd_tag);
+ }
+ if (tls->params.auth_key != NULL) {
+ explicit_bzero(tls->params.auth_key, tls->params.auth_key_len);
+ free(tls->params.auth_key, M_KTLS);
+ tls->params.auth_key = NULL;
+ tls->params.auth_key_len = 0;
+ }
+ if (tls->params.cipher_key != NULL) {
+ explicit_bzero(tls->params.cipher_key,
+ tls->params.cipher_key_len);
+ free(tls->params.cipher_key, M_KTLS);
+ tls->params.cipher_key = NULL;
+ tls->params.cipher_key_len = 0;
+ }
+ explicit_bzero(tls->params.iv, sizeof(tls->params.iv));
+}
+
+#if defined(INET) || defined(INET6)
+/*
+ * Common code used when first enabling ifnet TLS on a connection or
+ * when allocating a new ifnet TLS session due to a routing change.
+ * This function allocates a new TLS send tag on whatever interface
+ * the connection is currently routed over.
+ */
+static int
+ktls_alloc_snd_tag(struct inpcb *inp, struct ktls_session *tls, bool force,
+ struct m_snd_tag **mstp)
+{
+ union if_snd_tag_alloc_params params;
+ struct ifnet *ifp;
+ struct rtentry *rt;
+ struct tcpcb *tp;
+ int error;
+
+ INP_RLOCK(inp);
+ if (inp->inp_flags2 & INP_FREED) {
+ INP_RUNLOCK(inp);
+ return (ECONNRESET);
+ }
+ if (inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) {
+ INP_RUNLOCK(inp);
+ return (ECONNRESET);
+ }
+ if (inp->inp_socket == NULL) {
+ INP_RUNLOCK(inp);
+ return (ECONNRESET);
+ }
+ tp = intotcpcb(inp);
+
+ /*
+ * Check administrative controls on ifnet TLS to determine if
+ * ifnet TLS should be denied.
+ *
+ * - Always permit 'force' requests.
+ * - ktls_ifnet_permitted == 0: always deny.
+ */
+ if (!force && ktls_ifnet_permitted == 0) {
+ INP_RUNLOCK(inp);
+ return (ENXIO);
+ }
+
+ /*
+ * XXX: Use the cached route in the inpcb to find the
+ * interface. This should perhaps instead use
+ * rtalloc1_fib(dst, 0, 0, fibnum). Since KTLS is only
+ * enabled after a connection has completed key negotiation in
+ * userland, the cached route will be present in practice.
+ */
+ rt = inp->inp_route.ro_rt;
+ if (rt == NULL || rt->rt_ifp == NULL) {
+ INP_RUNLOCK(inp);
+ return (ENXIO);
+ }
+ ifp = rt->rt_ifp;
+ if_ref(ifp);
+
+ params.hdr.type = IF_SND_TAG_TYPE_TLS;
+ params.hdr.flowid = inp->inp_flowid;
+ params.hdr.flowtype = inp->inp_flowtype;
+ params.tls.inp = inp;
+ params.tls.tls = tls;
+ INP_RUNLOCK(inp);
+
+ if (ifp->if_snd_tag_alloc == NULL) {
+ error = EOPNOTSUPP;
+ goto out;
+ }
+ if ((ifp->if_capenable & IFCAP_NOMAP) == 0) {
+ error = EOPNOTSUPP;
+ goto out;
+ }
+ if (inp->inp_vflag & INP_IPV6) {
+ if ((ifp->if_capenable & IFCAP_TXTLS6) == 0) {
+ error = EOPNOTSUPP;
+ goto out;
+ }
+ } else {
+ if ((ifp->if_capenable & IFCAP_TXTLS4) == 0) {
+ error = EOPNOTSUPP;
+ goto out;
+ }
+ }
+ error = ifp->if_snd_tag_alloc(ifp, &params, mstp);
+out:
+ if_rele(ifp);
+ return (error);
+}
+
+static int
+ktls_try_ifnet(struct socket *so, struct ktls_session *tls, bool force)
+{
+ struct m_snd_tag *mst;
+ int error;
+
+ error = ktls_alloc_snd_tag(so->so_pcb, tls, force, &mst);
+ if (error == 0) {
+ tls->snd_tag = mst;
+ switch (tls->params.cipher_algorithm) {
+ case CRYPTO_AES_CBC:
+ counter_u64_add(ktls_ifnet_cbc, 1);
+ break;
+ case CRYPTO_AES_NIST_GCM_16:
+ counter_u64_add(ktls_ifnet_gcm, 1);
+ break;
+ }
+ }
+ return (error);
+}
+
+static int
+ktls_try_sw(struct socket *so, struct ktls_session *tls)
+{
+ struct rm_priotracker prio;
+ struct ktls_crypto_backend *be;
+
+ /*
+ * Choose the best software crypto backend. Backends are
+ * stored in sorted priority order (largest value == most
+ * important at the head of the list), so this just stops on
+ * the first backend that claims the session by returning
+ * success.
+ */
+ if (ktls_allow_unload)
+ rm_rlock(&ktls_backends_lock, &prio);
+ LIST_FOREACH(be, &ktls_backends, next) {
+ if (be->try(so, tls) == 0)
+ break;
+ KASSERT(tls->cipher == NULL,
+ ("ktls backend leaked a cipher pointer"));
+ }
+ if (be != NULL) {
+ if (ktls_allow_unload)
+ be->use_count++;
+ tls->be = be;
+ }
+ if (ktls_allow_unload)
+ rm_runlock(&ktls_backends_lock, &prio);
+ if (be == NULL)
+ return (EOPNOTSUPP);
+ switch (tls->params.cipher_algorithm) {
+ case CRYPTO_AES_CBC:
+ counter_u64_add(ktls_sw_cbc, 1);
+ break;
+ case CRYPTO_AES_NIST_GCM_16:
+ counter_u64_add(ktls_sw_gcm, 1);
+ break;
+ }
+ return (0);
+}
+
+int
+ktls_enable_tx(struct socket *so, struct tls_enable *en)
+{
+ struct ktls_session *tls;
+ int error;
+
+ if (!ktls_offload_enable)
+ return (ENOTSUP);
+
+ counter_u64_add(ktls_offload_enable_calls, 1);
+
+ /*
+ * This should always be true since only the TCP socket option
+ * invokes this function.
+ */
+ if (so->so_proto->pr_protocol != IPPROTO_TCP)
+ return (EINVAL);
+
+ /*
+ * XXX: Don't overwrite existing sessions. We should permit
+ * this to support rekeying in the future.
+ */
+ if (so->so_snd.sb_tls_info != NULL)
+ return (EALREADY);
+
+ if (en->cipher_algorithm == CRYPTO_AES_CBC && !ktls_cbc_enable)
+ return (ENOTSUP);
+
+ /* TLS requires ext pgs */
+ if (mb_use_ext_pgs == 0)
+ return (ENXIO);
+
+ error = ktls_create_session(so, en, &tls);
+ if (error)
+ return (error);
+
+ /* Prefer ifnet TLS over software TLS. */
+ error = ktls_try_ifnet(so, tls, false);
+ if (error)
+ error = ktls_try_sw(so, tls);
+
+ if (error) {
+ ktls_cleanup(tls);
+ return (error);
+ }
+
+ error = sblock(&so->so_snd, SBL_WAIT);
+ if (error) {
+ ktls_cleanup(tls);
+ return (error);
+ }
+
+ SOCKBUF_LOCK(&so->so_snd);
+ so->so_snd.sb_tls_info = tls;
+ if (tls->sw_encrypt == NULL)
+ so->so_snd.sb_flags |= SB_TLS_IFNET;
+ SOCKBUF_UNLOCK(&so->so_snd);
+ sbunlock(&so->so_snd);
+
+ counter_u64_add(ktls_offload_total, 1);
+
+ return (0);
+}
+
+int
+ktls_get_tx_mode(struct socket *so)
+{
+ struct ktls_session *tls;
+ struct inpcb *inp;
+ int mode;
+
+ inp = so->so_pcb;
+ INP_WLOCK_ASSERT(inp);
+ SOCKBUF_LOCK(&so->so_snd);
+ tls = so->so_snd.sb_tls_info;
+ if (tls == NULL)
+ mode = TCP_TLS_MODE_NONE;
+ else if (tls->sw_encrypt != NULL)
+ mode = TCP_TLS_MODE_SW;
+ else
+ mode = TCP_TLS_MODE_IFNET;
+ SOCKBUF_UNLOCK(&so->so_snd);
+ return (mode);
+}
+
+/*
+ * Switch between SW and ifnet TLS sessions as requested.
+ */
+int
+ktls_set_tx_mode(struct socket *so, int mode)
+{
+ struct ktls_session *tls, *tls_new;
+ struct inpcb *inp;
+ int error;
+
+ MPASS(mode == TCP_TLS_MODE_SW || mode == TCP_TLS_MODE_IFNET);
+
+ inp = so->so_pcb;
+ INP_WLOCK_ASSERT(inp);
+ SOCKBUF_LOCK(&so->so_snd);
+ tls = so->so_snd.sb_tls_info;
+ if (tls == NULL) {
+ SOCKBUF_UNLOCK(&so->so_snd);
+ return (0);
+ }
+
+ if ((tls->sw_encrypt != NULL && mode == TCP_TLS_MODE_SW) ||
+ (tls->sw_encrypt == NULL && mode == TCP_TLS_MODE_IFNET)) {
+ SOCKBUF_UNLOCK(&so->so_snd);
+ return (0);
+ }
+
+ tls = ktls_hold(tls);
+ SOCKBUF_UNLOCK(&so->so_snd);
+ INP_WUNLOCK(inp);
+
+ tls_new = ktls_clone_session(tls);
+
+ if (mode == TCP_TLS_MODE_IFNET)
+ error = ktls_try_ifnet(so, tls_new, true);
+ else
+ error = ktls_try_sw(so, tls_new);
+ if (error) {
+ counter_u64_add(ktls_switch_failed, 1);
+ ktls_free(tls_new);
+ ktls_free(tls);
+ INP_WLOCK(inp);
+ return (error);
+ }
+
+ error = sblock(&so->so_snd, SBL_WAIT);
+ if (error) {
+ counter_u64_add(ktls_switch_failed, 1);
+ ktls_free(tls_new);
+ ktls_free(tls);
+ INP_WLOCK(inp);
+ return (error);
+ }
+
+ /*
+ * If we raced with another session change, keep the existing
+ * session.
+ */
+ if (tls != so->so_snd.sb_tls_info) {
+ counter_u64_add(ktls_switch_failed, 1);
+ sbunlock(&so->so_snd);
+ ktls_free(tls_new);
+ ktls_free(tls);
+ INP_WLOCK(inp);
+ return (EBUSY);
+ }
+
+ SOCKBUF_LOCK(&so->so_snd);
+ so->so_snd.sb_tls_info = tls_new;
+ if (tls_new->sw_encrypt == NULL)
+ so->so_snd.sb_flags |= SB_TLS_IFNET;
+ SOCKBUF_UNLOCK(&so->so_snd);
+ sbunlock(&so->so_snd);
+
+ /*
+ * Drop two references on 'tls'. The first is for the
+ * ktls_hold() above. The second drops the reference from the
+ * socket buffer.
+ */
+ KASSERT(tls->refcount >= 2, ("too few references on old session"));
+ ktls_free(tls);
+ ktls_free(tls);
+
+ if (mode == TCP_TLS_MODE_IFNET)
+ counter_u64_add(ktls_switch_to_ifnet, 1);
+ else
+ counter_u64_add(ktls_switch_to_sw, 1);
+
+ INP_WLOCK(inp);
+ return (0);
+}
+
+/*
+ * Try to allocate a new TLS send tag. This task is scheduled when
+ * ip_output detects a route change while trying to transmit a packet
+ * holding a TLS record. If a new tag is allocated, replace the tag
+ * in the TLS session. Subsequent packets on the connection will use
+ * the new tag. If a new tag cannot be allocated, drop the
+ * connection.
+ */
+static void
+ktls_reset_send_tag(void *context, int pending)
+{
+ struct epoch_tracker et;
+ struct ktls_session *tls;
+ struct m_snd_tag *old, *new;
+ struct inpcb *inp;
+ struct tcpcb *tp;
+ int error;
+
+ MPASS(pending == 1);
+
+ tls = context;
+ inp = tls->inp;
+
+ /*
+ * Free the old tag first before allocating a new one.
+ * ip[6]_output_send() will treat a NULL send tag the same as
+ * an ifp mismatch and drop packets until a new tag is
+ * allocated.
+ *
+ * Write-lock the INP when changing tls->snd_tag since
+ * ip[6]_output_send() holds a read-lock when reading the
+ * pointer.
+ */
+ INP_WLOCK(inp);
+ old = tls->snd_tag;
+ tls->snd_tag = NULL;
+ INP_WUNLOCK(inp);
+ if (old != NULL)
+ m_snd_tag_rele(old);
+
+ error = ktls_alloc_snd_tag(inp, tls, true, &new);
+
+ if (error == 0) {
+ INP_WLOCK(inp);
+ tls->snd_tag = new;
+ mtx_pool_lock(mtxpool_sleep, tls);
+ tls->reset_pending = false;
+ mtx_pool_unlock(mtxpool_sleep, tls);
+ if (!in_pcbrele_wlocked(inp))
+ INP_WUNLOCK(inp);
+
+ counter_u64_add(ktls_ifnet_reset, 1);
+
+ /*
+ * XXX: Should we kick tcp_output explicitly now that
+ * the send tag is fixed or just rely on timers?
+ */
+ } else {
+ INP_INFO_RLOCK_ET(&V_tcbinfo, et);
+ INP_WLOCK(inp);
+ if (!in_pcbrele_wlocked(inp)) {
+ if (!(inp->inp_flags & INP_TIMEWAIT) &&
+ !(inp->inp_flags & INP_DROPPED)) {
+ tp = intotcpcb(inp);
+ tp = tcp_drop(tp, ECONNABORTED);
+ if (tp != NULL)
+ INP_WUNLOCK(inp);
+ counter_u64_add(ktls_ifnet_reset_dropped, 1);
+ } else
+ INP_WUNLOCK(inp);
+ }
+ INP_INFO_RUNLOCK_ET(&V_tcbinfo, et);
+
+ counter_u64_add(ktls_ifnet_reset_failed, 1);
+
+ /*
+ * Leave reset_pending true to avoid future tasks while
+ * the socket goes away.
+ */
+ }
+
+ ktls_free(tls);
+}
+
+int
+ktls_output_eagain(struct inpcb *inp, struct ktls_session *tls)
+{
+
+ if (inp == NULL)
+ return (ENOBUFS);
+
+ INP_LOCK_ASSERT(inp);
+
+ /*
+ * See if we should schedule a task to update the send tag for
+ * this session.
+ */
+ mtx_pool_lock(mtxpool_sleep, tls);
+ if (!tls->reset_pending) {
+ (void) ktls_hold(tls);
+ in_pcbref(inp);
+ tls->inp = inp;
+ tls->reset_pending = true;
+ taskqueue_enqueue(taskqueue_thread, &tls->reset_tag_task);
+ }
+ mtx_pool_unlock(mtxpool_sleep, tls);
+ return (ENOBUFS);
+}
+#endif
+
+void
+ktls_destroy(struct ktls_session *tls)
+{
+ struct rm_priotracker prio;
+
+ ktls_cleanup(tls);
+ if (tls->be != NULL && ktls_allow_unload) {
+ rm_rlock(&ktls_backends_lock, &prio);
+ tls->be->use_count--;
+ rm_runlock(&ktls_backends_lock, &prio);
+ }
+ uma_zfree(ktls_session_zone, tls);
+}
+
+void
+ktls_seq(struct sockbuf *sb, struct mbuf *m)
+{
+ struct mbuf_ext_pgs *pgs;
+ struct tls_record_layer *tlshdr;
+ uint64_t seqno;
+
+ for (; m != NULL; m = m->m_next) {
+ KASSERT((m->m_flags & M_NOMAP) != 0,
+ ("ktls_seq: mapped mbuf %p", m));
+
+ pgs = m->m_ext.ext_pgs;
+ pgs->seqno = sb->sb_tls_seqno;
+
+ /*
+ * Store the sequence number in the TLS header as the
+ * explicit part of the IV for GCM.
+ */
+ if (pgs->tls->params.cipher_algorithm ==
+ CRYPTO_AES_NIST_GCM_16) {
+ tlshdr = (void *)pgs->hdr;
+ seqno = htobe64(pgs->seqno);
+ memcpy(tlshdr + 1, &seqno, sizeof(seqno));
+ }
+ sb->sb_tls_seqno++;
+ }
+}
+
+/*
+ * Add TLS framing (headers and trailers) to a chain of mbufs. Each
+ * mbuf in the chain must be an unmapped mbuf. The payload of the
+ * mbuf must be populated with the payload of each TLS record.
+ *
+ * The record_type argument specifies the TLS record type used when
+ * populating the TLS header.
+ *
+ * The enq_cnt argument on return is set to the number of pages of
+ * payload data for this entire chain that need to be encrypted via SW
+ * encryption. The returned value should be passed to ktls_enqueue
+ * when scheduling encryption of this chain of mbufs.
+ */
+int
+ktls_frame(struct mbuf *top, struct ktls_session *tls, int *enq_cnt,
+ uint8_t record_type)
+{
+ struct tls_record_layer *tlshdr;
+ struct mbuf *m;
+ struct mbuf_ext_pgs *pgs;
+ uint16_t tls_len;
+ int maxlen;
+
+ maxlen = tls->params.max_frame_len;
+ *enq_cnt = 0;
+ for (m = top; m != NULL; m = m->m_next) {
+ /*
+ * All mbufs in the chain should be non-empty TLS
+ * records whose payload does not exceed the maximum
+ * frame length.
+ */
+ if (m->m_len > maxlen || m->m_len == 0)
+ return (EINVAL);
+ tls_len = m->m_len;
+
+ /*
+ * TLS frames require unmapped mbufs to store session
+ * info.
+ */
+ KASSERT((m->m_flags & M_NOMAP) != 0,
+ ("ktls_frame: mapped mbuf %p (top = %p)\n", m, top));
+
+ pgs = m->m_ext.ext_pgs;
+
+ /* Save a reference to the session. */
+ pgs->tls = ktls_hold(tls);
+
+ pgs->hdr_len = tls->params.tls_hlen;
+ pgs->trail_len = tls->params.tls_tlen;
+ if (tls->params.cipher_algorithm == CRYPTO_AES_CBC) {
+ int bs, delta;
+
+ /*
+ * AES-CBC pads messages to a multiple of the
+ * block size. Note that the padding is
+ * applied after the digest and the encryption
+ * is done on the "plaintext || mac || padding".
+ * At least one byte of padding is always
+ * present.
+ *
+ * Compute the final trailer length assuming
+ * at most one block of padding.
+ * tls->params.tls_tlen is the maximum
+ * possible trailer length (padding + digest).
+ * delta holds the number of excess padding
+ * bytes if the maximum were used. Those
+ * extra bytes are removed.
+ */
+ bs = tls->params.tls_bs;
+ delta = (tls_len + tls->params.tls_tlen) & (bs - 1);
+ pgs->trail_len -= delta;
+ }
+ m->m_len += pgs->hdr_len + pgs->trail_len;
+
+ /* Populate the TLS header. */
+ tlshdr = (void *)pgs->hdr;
+ tlshdr->tls_vmajor = tls->params.tls_vmajor;
+ tlshdr->tls_vminor = tls->params.tls_vminor;
+ tlshdr->tls_type = record_type;
+ tlshdr->tls_length = htons(m->m_len - sizeof(*tlshdr));
+
+ /*
+ * For GCM, the sequence number is stored in the
+ * header by ktls_seq(). For CBC, a random nonce is
+ * inserted for TLS 1.1+.
+ */
+ if (tls->params.cipher_algorithm == CRYPTO_AES_CBC &&
+ tls->params.tls_vminor >= TLS_MINOR_VER_ONE)
+ arc4rand(tlshdr + 1, AES_BLOCK_LEN, 0);
+
+ /*
+ * When using SW encryption, mark the mbuf not ready.
+ * It will be marked ready via sbready() after the
+ * record has been encrypted.
+ *
+ * When using ifnet TLS, unencrypted TLS records are
+ * sent down the stack to the NIC.
+ */
+ if (tls->sw_encrypt != NULL) {
+ m->m_flags |= M_NOTREADY;
+ pgs->nrdy = pgs->npgs;
+ *enq_cnt += pgs->npgs;
+ }
+ }
+ return (0);
+}
+
+void
+ktls_enqueue_to_free(struct mbuf_ext_pgs *pgs)
+{
+ struct ktls_wq *wq;
+ bool running;
+
+ /* Mark it for freeing. */
+ pgs->mbuf = NULL;
+ wq = &ktls_wq[pgs->tls->wq_index];
+ mtx_lock(&wq->mtx);
+ STAILQ_INSERT_TAIL(&wq->head, pgs, stailq);
+ running = wq->running;
+ mtx_unlock(&wq->mtx);
+ if (!running)
+ wakeup(wq);
+}
+
+void
+ktls_enqueue(struct mbuf *m, struct socket *so, int page_count)
+{
+ struct mbuf_ext_pgs *pgs;
+ struct ktls_wq *wq;
+ bool running;
+
+ KASSERT(((m->m_flags & (M_NOMAP | M_NOTREADY)) ==
+ (M_NOMAP | M_NOTREADY)),
+ ("ktls_enqueue: %p not unready & nomap mbuf\n", m));
+ KASSERT(page_count != 0, ("enqueueing TLS mbuf with zero page count"));
+
+ pgs = m->m_ext.ext_pgs;
+
+ KASSERT(pgs->tls->sw_encrypt != NULL, ("ifnet TLS mbuf"));
+
+ pgs->enc_cnt = page_count;
+ pgs->mbuf = m;
+
+ /*
+ * Save a pointer to the socket. The caller is responsible
+ * for taking an additional reference via soref().
+ */
+ pgs->so = so;
+
+ wq = &ktls_wq[pgs->tls->wq_index];
+ mtx_lock(&wq->mtx);
+ STAILQ_INSERT_TAIL(&wq->head, pgs, stailq);
+ running = wq->running;
+ mtx_unlock(&wq->mtx);
+ if (!running)
+ wakeup(wq);
+ counter_u64_add(ktls_cnt_on, 1);
+}
+
+static __noinline void
+ktls_encrypt(struct mbuf_ext_pgs *pgs)
+{
+ struct ktls_session *tls;
+ struct socket *so;
+ struct mbuf *m, *top;
+ vm_paddr_t parray[1 + btoc(TLS_MAX_MSG_SIZE_V10_2)];
+ struct iovec src_iov[1 + btoc(TLS_MAX_MSG_SIZE_V10_2)];
+ struct iovec dst_iov[1 + btoc(TLS_MAX_MSG_SIZE_V10_2)];
+ vm_page_t pg;
+ int error, i, len, npages, off, total_pages;
+ bool is_anon;
+
+ so = pgs->so;
+ tls = pgs->tls;
+ top = pgs->mbuf;
+ KASSERT(tls != NULL, ("tls = NULL, top = %p, pgs = %p\n", top, pgs));
+ KASSERT(so != NULL, ("so = NULL, top = %p, pgs = %p\n", top, pgs));
+#ifdef INVARIANTS
+ pgs->so = NULL;
+ pgs->mbuf = NULL;
+#endif
+ total_pages = pgs->enc_cnt;
+ npages = 0;
+
+ /*
+ * Encrypt the TLS records in the chain of mbufs starting with
+ * 'top'. 'total_pages' gives us a total count of pages and is
+ * used to know when we have finished encrypting the TLS
+ * records originally queued with 'top'.
+ *
+ * NB: These mbufs are queued in the socket buffer and
+ * 'm_next' is traversing the mbufs in the socket buffer. The
+ * socket buffer lock is not held while traversing this chain.
+ * Since the mbufs are all marked M_NOTREADY their 'm_next'
+ * pointers should be stable. However, the 'm_next' of the
+ * last mbuf encrypted is not necessarily NULL. It can point
+ * to other mbufs appended while 'top' was on the TLS work
+ * queue.
+ *
+ * Each mbuf holds an entire TLS record.
+ */
+ error = 0;
+ for (m = top; npages != total_pages; m = m->m_next) {
+ pgs = m->m_ext.ext_pgs;
+
+ KASSERT(pgs->tls == tls,
+ ("different TLS sessions in a single mbuf chain: %p vs %p",
+ tls, pgs->tls));
+ KASSERT((m->m_flags & (M_NOMAP | M_NOTREADY)) ==
+ (M_NOMAP | M_NOTREADY),
+ ("%p not unready & nomap mbuf (top = %p)\n", m, top));
+ KASSERT(npages + pgs->npgs <= total_pages,
+ ("page count mismatch: top %p, total_pages %d, m %p", top,
+ total_pages, m));
+
+ /*
+ * Generate source and destination iovecs to pass to
+ * the SW encryption backend. For writable mbufs, the
+ * destination iovec is a copy of the source and
+ * encryption is done in place. For file-backed mbufs
+ * (from sendfile), anonymous wired pages are
+ * allocated and assigned to the destination iovec.
+ */
+ is_anon = M_WRITABLE(m);
+
+ off = pgs->first_pg_off;
+ for (i = 0; i < pgs->npgs; i++, off = 0) {
+ len = mbuf_ext_pg_len(pgs, i, off);
+ src_iov[i].iov_len = len;
+ src_iov[i].iov_base =
+ (char *)(void *)PHYS_TO_DMAP(pgs->pa[i]) + off;
+
+ if (is_anon) {
+ dst_iov[i].iov_base = src_iov[i].iov_base;
+ dst_iov[i].iov_len = src_iov[i].iov_len;
+ continue;
+ }
+retry_page:
+ pg = vm_page_alloc(NULL, 0, VM_ALLOC_NORMAL |
+ VM_ALLOC_NOOBJ | VM_ALLOC_NODUMP | VM_ALLOC_WIRED);
+ if (pg == NULL) {
+ vm_wait(NULL);
+ goto retry_page;
+ }
+ parray[i] = VM_PAGE_TO_PHYS(pg);
+ dst_iov[i].iov_base =
+ (char *)(void *)PHYS_TO_DMAP(parray[i]) + off;
+ dst_iov[i].iov_len = len;
+ }
+
+ npages += i;
+
+ error = (*tls->sw_encrypt)(tls,
+ (const struct tls_record_layer *)pgs->hdr,
+ pgs->trail, src_iov, dst_iov, i, pgs->seqno);
+ if (error) {
+ counter_u64_add(ktls_offload_failed_crypto, 1);
+ break;
+ }
+
+ /*
+ * For file-backed mbufs, release the file-backed
+ * pages and replace them in the ext_pgs array with
+ * the anonymous wired pages allocated above.
+ */
+ if (!is_anon) {
+ /* Free the old pages. */
+ m->m_ext.ext_free(m);
+
+ /* Replace them with the new pages. */
+ for (i = 0; i < pgs->npgs; i++)
+ pgs->pa[i] = parray[i];
+
+ /* Use the basic free routine. */
+ m->m_ext.ext_free = mb_free_mext_pgs;
+ }
+
+ /*
+ * Drop a reference to the session now that it is no
+ * longer needed. Existing code depends on encrypted
+ * records having no associated session vs
+ * yet-to-be-encrypted records having an associated
+ * session.
+ */
+ pgs->tls = NULL;
+ ktls_free(tls);
+ }
+
+ CURVNET_SET(so->so_vnet);
+ if (error == 0) {
+ (void)(*so->so_proto->pr_usrreqs->pru_ready)(so, top, npages);
+ } else {
+ so->so_proto->pr_usrreqs->pru_abort(so);
+ so->so_error = EIO;
+ mb_free_notready(top, total_pages);
+ }
+
+ SOCK_LOCK(so);
+ sorele(so);
+ CURVNET_RESTORE();
+}
+
+static void
+ktls_work_thread(void *ctx)
+{
+ struct ktls_wq *wq = ctx;
+ struct mbuf_ext_pgs *p, *n;
+ struct ktls_session *tls;
+ STAILQ_HEAD(, mbuf_ext_pgs) local_head;
+
+#if defined(__aarch64__) || defined(__amd64__) || defined(__i386__)
+ fpu_kern_thread(0);
+#endif
+ for (;;) {
+ mtx_lock(&wq->mtx);
+ while (STAILQ_EMPTY(&wq->head)) {
+ wq->running = false;
+ mtx_sleep(wq, &wq->mtx, 0, "-", 0);
+ wq->running = true;
+ }
+
+ STAILQ_INIT(&local_head);
+ STAILQ_CONCAT(&local_head, &wq->head);
+ mtx_unlock(&wq->mtx);
+
+ STAILQ_FOREACH_SAFE(p, &local_head, stailq, n) {
+ if (p->mbuf != NULL) {
+ ktls_encrypt(p);
+ counter_u64_add(ktls_cnt_on, -1);
+ } else {
+ tls = p->tls;
+ ktls_free(tls);
+ uma_zfree(zone_extpgs, p);
+ }
+ }
+ }
+}
diff --git a/sys/kern/uipc_sockbuf.c b/sys/kern/uipc_sockbuf.c
index 32145179109a..3a43ee873791 100644
--- a/sys/kern/uipc_sockbuf.c
+++ b/sys/kern/uipc_sockbuf.c
@@ -34,11 +34,13 @@
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
+#include "opt_kern_tls.h"
#include "opt_param.h"
#include <sys/param.h>
#include <sys/aio.h> /* for aio_swake proto */
#include <sys/kernel.h>
+#include <sys/ktls.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
@@ -112,7 +114,8 @@ sbready_compress(struct sockbuf *sb, struct mbuf *m0, struct mbuf *end)
MPASS((m->m_flags & M_NOTREADY) == 0);
/* Compress small unmapped mbufs into plain mbufs. */
- if ((m->m_flags & M_NOMAP) && m->m_len <= MLEN) {
+ if ((m->m_flags & M_NOMAP) && m->m_len <= MLEN &&
+ !mbuf_has_tls_session(m)) {
MPASS(m->m_flags & M_EXT);
ext_size = m->m_ext.ext_size;
if (mb_unmapped_compress(m) == 0) {
@@ -133,6 +136,8 @@ sbready_compress(struct sockbuf *sb, struct mbuf *m0, struct mbuf *end)
while ((n != NULL) && (n != end) && (m->m_flags & M_EOR) == 0 &&
M_WRITABLE(m) &&
(m->m_flags & M_NOMAP) == 0 &&
+ !mbuf_has_tls_session(n) &&
+ !mbuf_has_tls_session(m) &&
n->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
n->m_len <= M_TRAILINGSPACE(m) &&
m->m_type == n->m_type) {
@@ -668,6 +673,11 @@ sbdestroy(struct sockbuf *sb, struct socket *so)
{
sbrelease_internal(sb, so);
+#ifdef KERN_TLS
+ if (sb->sb_tls_info != NULL)
+ ktls_free(sb->sb_tls_info);
+ sb->sb_tls_info = NULL;
+#endif
}
/*
@@ -831,6 +841,11 @@ sbappendstream_locked(struct sockbuf *sb, struct mbuf *m, int flags)
SBLASTMBUFCHK(sb);
+#ifdef KERN_TLS
+ if (sb->sb_tls_info != NULL)
+ ktls_seq(sb, m);
+#endif
+
/* Remove all packet headers and mbuf tags to get a pure data chain. */
m_demote(m, 1, flags & PRUS_NOTREADY ? M_NOTREADY : 0);
@@ -1134,6 +1149,8 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n)
((sb->sb_flags & SB_NOCOALESCE) == 0) &&
!(m->m_flags & M_NOTREADY) &&
!(n->m_flags & (M_NOTREADY | M_NOMAP)) &&
+ !mbuf_has_tls_session(m) &&
+ !mbuf_has_tls_session(n) &&
m->m_len <= MCLBYTES / 4 && /* XXX: Don't copy too much */
m->m_len <= M_TRAILINGSPACE(n) &&
n->m_type == m->m_type) {
@@ -1149,7 +1166,8 @@ sbcompress(struct sockbuf *sb, struct mbuf *m, struct mbuf *n)
continue;
}
if (m->m_len <= MLEN && (m->m_flags & M_NOMAP) &&
- (m->m_flags & M_NOTREADY) == 0)
+ (m->m_flags & M_NOTREADY) == 0 &&
+ !mbuf_has_tls_session(m))
(void)mb_unmapped_compress(m);
if (n)
n->m_next = m;
diff --git a/sys/kern/uipc_socket.c b/sys/kern/uipc_socket.c
index 98ab98d8f61c..1f480462db2a 100644
--- a/sys/kern/uipc_socket.c
+++ b/sys/kern/uipc_socket.c
@@ -107,6 +107,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
+#include "opt_kern_tls.h"
#include "opt_sctp.h"
#include <sys/param.h>
@@ -123,6 +124,7 @@ __FBSDID("$FreeBSD$");
#include <sys/hhook.h>
#include <sys/kernel.h>
#include <sys/khelp.h>
+#include <sys/ktls.h>
#include <sys/event.h>
#include <sys/eventhandler.h>
#include <sys/poll.h>
@@ -141,6 +143,7 @@ __FBSDID("$FreeBSD$");
#include <sys/jail.h>
#include <sys/syslog.h>
#include <netinet/in.h>
+#include <netinet/tcp.h>
#include <net/vnet.h>
@@ -1442,7 +1445,15 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
ssize_t resid;
int clen = 0, error, dontroute;
int atomic = sosendallatonce(so) || top;
-
+ int pru_flag;
+#ifdef KERN_TLS
+ struct ktls_session *tls;
+ int tls_enq_cnt, tls_pruflag;
+ uint8_t tls_rtype;
+
+ tls = NULL;
+ tls_rtype = TLS_RLTYPE_APP;
+#endif
if (uio != NULL)
resid = uio->uio_resid;
else
@@ -1474,6 +1485,28 @@ sosend_generic(struct socket *so, struct sockaddr *addr, struct uio *uio,
if (error)
goto out;
+#ifdef KERN_TLS
+ tls_pruflag = 0;
+ tls = ktls_hold(so->so_snd.sb_tls_info);
+ if (tls != NULL) {
+ if (tls->sw_encrypt != NULL)
+ tls_pruflag = PRUS_NOTREADY;
+
+ if (control != NULL) {
+ struct cmsghdr *cm = mtod(control, struct cmsghdr *);
+
+ if (clen >= sizeof(*cm) &&
+ cm->cmsg_type == TLS_SET_RECORD_TYPE) {
+ tls_rtype = *((uint8_t *)CMSG_DATA(cm));
+ clen = 0;
+ m_freem(control);
+ control = NULL;
+ atomic = 1;
+ }
+ }
+ }
+#endif
+
restart:
do {
SOCKBUF_LOCK(&so->so_snd);
@@ -1551,10 +1584,27 @@ restart:
* is a workaround to prevent protocol send
* methods to panic.
*/
- top = m_uiotombuf(uio, M_WAITOK, space,
- (atomic ? max_hdr : 0),
- (atomic ? M_PKTHDR : 0) |
- ((flags & MSG_EOR) ? M_EOR : 0));
+#ifdef KERN_TLS
+ if (tls != NULL) {
+ top = m_uiotombuf(uio, M_WAITOK, space,
+ tls->params.max_frame_len,
+ M_NOMAP |
+ ((flags & MSG_EOR) ? M_EOR : 0));
+ if (top != NULL) {
+ error = ktls_frame(top, tls,
+ &tls_enq_cnt, tls_rtype);
+ if (error) {
+ m_freem(top);
+ goto release;
+ }
+ }
+ tls_rtype = TLS_RLTYPE_APP;
+ } else
+#endif
+ top = m_uiotombuf(uio, M_WAITOK, space,
+ (atomic ? max_hdr : 0),
+ (atomic ? M_PKTHDR : 0) |
+ ((flags & MSG_EOR) ? M_EOR : 0));
if (top == NULL) {
error = EFAULT; /* only possible error */
goto release;
@@ -1578,8 +1628,8 @@ restart:
* this.
*/
VNET_SO_ASSERT(so);
- error = (*so->so_proto->pr_usrreqs->pru_send)(so,
- (flags & MSG_OOB) ? PRUS_OOB :
+
+ pru_flag = (flags & MSG_OOB) ? PRUS_OOB :
/*
* If the user set MSG_EOF, the protocol understands
* this flag and nothing left to send then use
@@ -1591,13 +1641,37 @@ restart:
PRUS_EOF :
/* If there is more to send set PRUS_MORETOCOME. */
(flags & MSG_MORETOCOME) ||
- (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0,
- top, addr, control, td);
+ (resid > 0 && space > 0) ? PRUS_MORETOCOME : 0;
+
+#ifdef KERN_TLS
+ pru_flag |= tls_pruflag;
+#endif
+
+ error = (*so->so_proto->pr_usrreqs->pru_send)(so,
+ pru_flag, top, addr, control, td);
+
if (dontroute) {
SOCK_LOCK(so);
so->so_options &= ~SO_DONTROUTE;
SOCK_UNLOCK(so);
}
+
+#ifdef KERN_TLS
+ if (tls != NULL && tls->sw_encrypt != NULL) {
+ /*
+ * Note that error is intentionally
+ * ignored.
+ *
+ * Like sendfile(), we rely on the
+ * completion routine (pru_ready())
+ * to free the mbufs in the event that
+ * pru_send() encountered an error and
+ * did not append them to the sockbuf.
+ */
+ soref(so);
+ ktls_enqueue(top, so, tls_enq_cnt);
+ }
+#endif
clen = 0;
control = NULL;
top = NULL;
@@ -1609,6 +1683,10 @@ restart:
release:
sbunlock(&so->so_snd);
out:
+#ifdef KERN_TLS
+ if (tls != NULL)
+ ktls_free(tls);
+#endif
if (top != NULL)
m_freem(top);
if (control != NULL)
diff --git a/sys/modules/Makefile b/sys/modules/Makefile
index 999fedd2e5f0..f5c0995af6a0 100644
--- a/sys/modules/Makefile
+++ b/sys/modules/Makefile
@@ -200,6 +200,7 @@ SUBDIR= \
khelp \
krpc \
ksyms \
+ ${_ktls_ocf} \
le \
lge \
libalias \
@@ -412,6 +413,7 @@ _crypto= crypto
_cryptodev= cryptodev
_random_fortuna=random_fortuna
_random_other= random_other
+_ktls_ocf= ktls_ocf
.endif
.endif
diff --git a/sys/modules/ktls_ocf/Makefile b/sys/modules/ktls_ocf/Makefile
new file mode 100644
index 000000000000..01e6fe87177b
--- /dev/null
+++ b/sys/modules/ktls_ocf/Makefile
@@ -0,0 +1,8 @@
+# $FreeBSD$
+
+.PATH: ${SRCTOP}/sys/opencrypto
+
+KMOD= ktls_ocf
+SRCS= ktls_ocf.c
+
+.include <bsd.kmod.mk>
diff --git a/sys/net/ieee8023ad_lacp.c b/sys/net/ieee8023ad_lacp.c
index b6f41b204f9c..7358b7cfa5e0 100644
--- a/sys/net/ieee8023ad_lacp.c
+++ b/sys/net/ieee8023ad_lacp.c
@@ -32,6 +32,7 @@
#include <sys/cdefs.h>
__FBSDID("$FreeBSD$");
+#include "opt_kern_tls.h"
#include "opt_ratelimit.h"
#include <sys/param.h>
@@ -882,7 +883,7 @@ lacp_select_tx_port(struct lagg_softc *sc, struct mbuf *m)
return (lp->lp_lagg);
}
-#ifdef RATELIMIT
+#if defined(RATELIMIT) || defined(KERN_TLS)
struct lagg_port *
lacp_select_tx_port_by_hash(struct lagg_softc *sc, uint32_t flowid)
{
diff --git a/sys/net/ieee8023ad_lacp.h b/sys/net/ieee8023ad_lacp.h
index 8d6438c1ec59..b6a0860ff1e0 100644
--- a/sys/net/ieee8023ad_lacp.h
+++ b/sys/net/ieee8023ad_lacp.h
@@ -293,7 +293,7 @@ struct lacp_softc {
struct mbuf *lacp_input(struct lagg_port *, struct mbuf *);
struct lagg_port *lacp_select_tx_port(struct lagg_softc *, struct mbuf *);
-#ifdef RATELIMIT
+#if defined(RATELIMIT) || defined(KERN_TLS)
struct lagg_port *lacp_select_tx_port_by_hash(struct lagg_softc *, uint32_t);
#endif
void lacp_attach(struct lagg_softc *);
diff --git a/sys/net/if.h b/sys/net/if.h
index 3c22a408f45c..add4df55a3d9 100644
--- a/sys/net/if.h
+++ b/sys/net/if.h
@@ -247,6 +247,8 @@ struct if_data {
#define IFCAP_TXRTLMT 0x1000000 /* hardware supports TX rate limiting */
#define IFCAP_HWRXTSTMP 0x2000000 /* hardware rx timestamping */
#define IFCAP_NOMAP 0x4000000 /* can TX unmapped mbufs */
+#define IFCAP_TXTLS4 0x8000000 /* can do TLS encryption and segmentation for TCP */
+#define IFCAP_TXTLS6 0x10000000 /* can do TLS encryption and segmentation for TCP6 */
#define IFCAP_HWCSUM_IPV6 (IFCAP_RXCSUM_IPV6 | IFCAP_TXCSUM_IPV6)
@@ -254,6 +256,7 @@ struct if_data {
#define IFCAP_TSO (IFCAP_TSO4 | IFCAP_TSO6)
#define IFCAP_WOL (IFCAP_WOL_UCAST | IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC)
#define IFCAP_TOE (IFCAP_TOE4 | IFCAP_TOE6)
+#define IFCAP_TXTLS (IFCAP_TXTLS4 | IFCAP_TXTLS6)
#define IFCAP_CANTCHANGE (IFCAP_NETMAP)
diff --git a/sys/net/if_lagg.c b/sys/net/if_lagg.c
index 911f9c0cdbb6..d9f1ac47d137 100644
--- a/sys/net/if_lagg.c
+++ b/sys/net/if_lagg.c
@@ -23,6 +23,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
+#include "opt_kern_tls.h"
#include "opt_ratelimit.h"
#include <sys/param.h>
@@ -135,7 +136,7 @@ static void lagg_port2req(struct lagg_port *, struct lagg_reqport *);
static void lagg_init(void *);
static void lagg_stop(struct lagg_softc *);
static int lagg_ioctl(struct ifnet *, u_long, caddr_t);
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
static int lagg_snd_tag_alloc(struct ifnet *,
union if_snd_tag_alloc_params *,
struct m_snd_tag **);
@@ -534,7 +535,7 @@ lagg_clone_create(struct if_clone *ifc, int unit, caddr_t params)
ifp->if_ioctl = lagg_ioctl;
ifp->if_get_counter = lagg_get_counter;
ifp->if_flags = IFF_SIMPLEX | IFF_BROADCAST | IFF_MULTICAST;
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
ifp->if_snd_tag_alloc = lagg_snd_tag_alloc;
ifp->if_snd_tag_modify = lagg_snd_tag_modify;
ifp->if_snd_tag_query = lagg_snd_tag_query;
@@ -1550,7 +1551,7 @@ lagg_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
return (error);
}
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
static inline struct lagg_snd_tag *
mst_to_lst(struct m_snd_tag *mst)
{
@@ -1811,7 +1812,7 @@ lagg_transmit(struct ifnet *ifp, struct mbuf *m)
struct lagg_softc *sc = (struct lagg_softc *)ifp->if_softc;
int error;
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
if (m->m_pkthdr.csum_flags & CSUM_SND_TAG)
MPASS(m->m_pkthdr.snd_tag->ifp == ifp);
#endif
@@ -2007,7 +2008,7 @@ int
lagg_enqueue(struct ifnet *ifp, struct mbuf *m)
{
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
if (m->m_pkthdr.csum_flags & CSUM_SND_TAG) {
struct lagg_snd_tag *lst;
struct m_snd_tag *mst;
diff --git a/sys/net/if_var.h b/sys/net/if_var.h
index 1e81e481f8eb..94581357e011 100644
--- a/sys/net/if_var.h
+++ b/sys/net/if_var.h
@@ -188,11 +188,13 @@ struct if_encap_req {
* m_snd_tag" comes from the network driver and it is free to allocate
* as much additional space as it wants for its own use.
*/
+struct ktls_session;
struct m_snd_tag;
#define IF_SND_TAG_TYPE_RATE_LIMIT 0
#define IF_SND_TAG_TYPE_UNLIMITED 1
-#define IF_SND_TAG_TYPE_MAX 2
+#define IF_SND_TAG_TYPE_TLS 2
+#define IF_SND_TAG_TYPE_MAX 3
struct if_snd_tag_alloc_header {
uint32_t type; /* send tag type, see IF_SND_TAG_XXX */
@@ -207,6 +209,12 @@ struct if_snd_tag_alloc_rate_limit {
uint32_t reserved; /* alignment */
};
+struct if_snd_tag_alloc_tls {
+ struct if_snd_tag_alloc_header hdr;
+ struct inpcb *inp;
+ const struct ktls_session *tls;
+};
+
struct if_snd_tag_rate_limit_params {
uint64_t max_rate; /* in bytes/s */
uint32_t queue_level; /* 0 (empty) .. 65535 (full) */
@@ -219,6 +227,7 @@ union if_snd_tag_alloc_params {
struct if_snd_tag_alloc_header hdr;
struct if_snd_tag_alloc_rate_limit rate_limit;
struct if_snd_tag_alloc_rate_limit unlimited;
+ struct if_snd_tag_alloc_tls tls;
};
union if_snd_tag_modify_params {
diff --git a/sys/net/if_vlan.c b/sys/net/if_vlan.c
index 681d651e7796..5571277f0923 100644
--- a/sys/net/if_vlan.c
+++ b/sys/net/if_vlan.c
@@ -46,6 +46,7 @@
__FBSDID("$FreeBSD$");
#include "opt_inet.h"
+#include "opt_kern_tls.h"
#include "opt_vlan.h"
#include "opt_ratelimit.h"
@@ -103,7 +104,7 @@ struct ifvlantrunk {
int refcnt;
};
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
struct vlan_snd_tag {
struct m_snd_tag com;
struct m_snd_tag *tag;
@@ -278,7 +279,7 @@ static void trunk_destroy(struct ifvlantrunk *trunk);
static void vlan_init(void *foo);
static void vlan_input(struct ifnet *ifp, struct mbuf *m);
static int vlan_ioctl(struct ifnet *ifp, u_long cmd, caddr_t addr);
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
static int vlan_snd_tag_alloc(struct ifnet *,
union if_snd_tag_alloc_params *, struct m_snd_tag **);
static int vlan_snd_tag_modify(struct m_snd_tag *,
@@ -1064,7 +1065,7 @@ vlan_clone_create(struct if_clone *ifc, char *name, size_t len, caddr_t params)
ifp->if_transmit = vlan_transmit;
ifp->if_qflush = vlan_qflush;
ifp->if_ioctl = vlan_ioctl;
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
ifp->if_snd_tag_alloc = vlan_snd_tag_alloc;
ifp->if_snd_tag_modify = vlan_snd_tag_modify;
ifp->if_snd_tag_query = vlan_snd_tag_query;
@@ -1157,7 +1158,7 @@ vlan_transmit(struct ifnet *ifp, struct mbuf *m)
BPF_MTAP(ifp, m);
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
if (m->m_pkthdr.csum_flags & CSUM_SND_TAG) {
struct vlan_snd_tag *vst;
struct m_snd_tag *mst;
@@ -1741,6 +1742,20 @@ vlan_capabilities(struct ifvlan *ifv)
cap |= (p->if_capabilities & IFCAP_NOMAP);
ena |= (mena & IFCAP_NOMAP);
+ /*
+ * If the parent interface can offload encryption and segmentation
+ * of TLS records over TCP, propagate its capability to the VLAN
+ * interface.
+ *
+ * All TLS drivers in the tree today can deal with VLANs. If
+ * this ever changes, then a new IFCAP_VLAN_TXTLS can be
+ * defined.
+ */
+ if (p->if_capabilities & IFCAP_TXTLS)
+ cap |= p->if_capabilities & IFCAP_TXTLS;
+ if (p->if_capenable & IFCAP_TXTLS)
+ ena |= mena & IFCAP_TXTLS;
+
ifp->if_capabilities = cap;
ifp->if_capenable = ena;
ifp->if_hwassist = hwa;
@@ -1972,7 +1987,7 @@ vlan_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data)
return (error);
}
-#ifdef RATELIMIT
+#if defined(KERN_TLS) || defined(RATELIMIT)
static int
vlan_snd_tag_alloc(struct ifnet *ifp,
union if_snd_tag_alloc_params *params,
diff --git a/sys/netinet/ip_output.c b/sys/netinet/ip_output.c
index 223262003086..085040f25e64 100644
--- a/sys/netinet/ip_output.c
+++ b/sys/netinet/ip_output.c
@@ -36,6 +36,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_ipsec.h"
+#include "opt_kern_tls.h"
#include "opt_mbuf_stress_test.h"
#include "opt_mpath.h"
#include "opt_ratelimit.h"
@@ -46,6 +47,7 @@ __FBSDID("$FreeBSD$");
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/kernel.h>
+#include <sys/ktls.h>
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
@@ -212,14 +214,39 @@ static int
ip_output_send(struct inpcb *inp, struct ifnet *ifp, struct mbuf *m,
const struct sockaddr_in *gw, struct route *ro)
{
+#ifdef KERN_TLS
+ struct ktls_session *tls = NULL;
+#endif
struct m_snd_tag *mst;
int error;
MPASS((m->m_pkthdr.csum_flags & CSUM_SND_TAG) == 0);
mst = NULL;
+#ifdef KERN_TLS
+ /*
+ * If this is an unencrypted TLS record, save a reference to
+ * the record. This local reference is used to call
+ * ktls_output_eagain after the mbuf has been freed (thus
+ * dropping the mbuf's reference) in if_output.
+ */
+ if (m->m_next != NULL && mbuf_has_tls_session(m->m_next)) {
+ tls = ktls_hold(m->m_next->m_ext.ext_pgs->tls);
+ mst = tls->snd_tag;
+
+ /*
+ * If a TLS session doesn't have a valid tag, it must
+ * have had an earlier ifp mismatch, so drop this
+ * packet.
+ */
+ if (mst == NULL) {
+ error = EAGAIN;
+ goto done;
+ }
+ }
+#endif
#ifdef RATELIMIT
- if (inp != NULL) {
+ if (inp != NULL && mst == NULL) {
if ((inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) != 0 ||
(inp->inp_snd_tag != NULL &&
inp->inp_snd_tag->ifp != ifp))
@@ -246,6 +273,13 @@ ip_output_send(struct inpcb *inp, struct ifnet *ifp, struct mbuf *m,
done:
/* Check for route change invalidating send tags. */
+#ifdef KERN_TLS
+ if (tls != NULL) {
+ if (error == EAGAIN)
+ error = ktls_output_eagain(inp, tls);
+ ktls_free(tls);
+ }
+#endif
#ifdef RATELIMIT
if (error == EAGAIN)
in_pcboutput_eagain(inp);
diff --git a/sys/netinet/tcp.h b/sys/netinet/tcp.h
index 6531decb0bfe..508d4b5fbc17 100644
--- a/sys/netinet/tcp.h
+++ b/sys/netinet/tcp.h
@@ -174,6 +174,8 @@ struct tcphdr {
#define TCP_LOGDUMP 37 /* dump connection log events to device */
#define TCP_LOGDUMPID 38 /* dump events from connections with same ID to
device */
+#define TCP_TXTLS_ENABLE 39 /* TLS framing and encryption for transmit */
+#define TCP_TXTLS_MODE 40 /* Transmit TLS mode */
#define TCP_CONGESTION 64 /* get/set congestion control algorithm */
#define TCP_CCALGOOPT 65 /* get/set cc algorithm specific options */
#define TCP_DELACK 72 /* socket option for delayed ack */
@@ -350,4 +352,14 @@ struct tcp_function_set {
uint32_t pcbcnt;
};
+/* TLS modes for TCP_TXTLS_MODE */
+#define TCP_TLS_MODE_NONE 0
+#define TCP_TLS_MODE_SW 1
+#define TCP_TLS_MODE_IFNET 2
+
+/*
+ * TCP Control message types
+ */
+#define TLS_SET_RECORD_TYPE 1
+
#endif /* !_NETINET_TCP_H_ */
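
Note on the TLS_SET_RECORD_TYPE control message: the sosend_generic() hunk earlier in this diff reads a single uint8_t from CMSG_DATA() of the first control message when its cmsg_type is TLS_SET_RECORD_TYPE. A minimal userland sketch of building such a message might look like the following. The helper name tls_prepare_record and the union tls_cmsg_buf are hypothetical, and the use of IPPROTO_TCP as cmsg_level is an assumption, since the hunk shown here does not check the level.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value taken from the netinet/tcp.h hunk; only defined if not already visible. */
#ifndef TLS_SET_RECORD_TYPE
#define TLS_SET_RECORD_TYPE 1
#endif

/* Control buffer sized and aligned for one cmsghdr carrying a single byte. */
union tls_cmsg_buf {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(uint8_t))];
};

/*
 * Hypothetical helper: fill in msg so that a later sendmsg(2) call
 * would send data[0..len) as one TLS record of type rtype.
 */
static void
tls_prepare_record(struct msghdr *msg, struct iovec *iov,
    union tls_cmsg_buf *cbuf, void *data, size_t len, uint8_t rtype)
{
	struct cmsghdr *cm;

	memset(msg, 0, sizeof(*msg));
	memset(cbuf, 0, sizeof(*cbuf));

	iov->iov_base = data;
	iov->iov_len = len;
	msg->msg_iov = iov;
	msg->msg_iovlen = 1;
	msg->msg_control = cbuf->buf;
	msg->msg_controllen = sizeof(cbuf->buf);

	cm = CMSG_FIRSTHDR(msg);
	assert(cm != NULL);
	cm->cmsg_level = IPPROTO_TCP;	/* assumption: level is not checked in the hunk */
	cm->cmsg_type = TLS_SET_RECORD_TYPE;
	cm->cmsg_len = CMSG_LEN(sizeof(rtype));
	*(uint8_t *)CMSG_DATA(cm) = rtype;	/* kernel reads exactly one byte */
}
```

A caller would pass the prepared msghdr to sendmsg(2) on a socket that already has TCP_TXTLS_ENABLE set, for example with rtype 22 (the TLS handshake content type). Per the hunk, the kernel then treats the write as atomic and resets the record type to application data after the record has been framed.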
diff --git a/sys/netinet/tcp_output.c b/sys/netinet/tcp_output.c
index f329703c3ffb..8904da5fe2d4 100644
--- a/sys/netinet/tcp_output.c
+++ b/sys/netinet/tcp_output.c
@@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_ipsec.h"
+#include "opt_kern_tls.h"
#include "opt_tcpdebug.h"
#include <sys/param.h>
@@ -46,6 +47,9 @@ __FBSDID("$FreeBSD$");
#include <sys/hhook.h>
#endif
#include <sys/kernel.h>
+#ifdef KERN_TLS
+#include <sys/ktls.h>
+#endif
#include <sys/lock.h>
#include <sys/mbuf.h>
#include <sys/mutex.h>
@@ -219,6 +223,11 @@ tcp_output(struct tcpcb *tp)
isipv6 = (tp->t_inpcb->inp_vflag & INP_IPV6) != 0;
#endif
+#ifdef KERN_TLS
+ const bool hw_tls = (so->so_snd.sb_flags & SB_TLS_IFNET) != 0;
+#else
+ const bool hw_tls = false;
+#endif
INP_WLOCK_ASSERT(tp->t_inpcb);
@@ -1000,7 +1009,7 @@ send:
* to the offset in the socket buffer chain.
*/
mb = sbsndptr_noadv(&so->so_snd, off, &moff);
- if (len <= MHLEN - hdrlen - max_linkhdr) {
+ if (len <= MHLEN - hdrlen - max_linkhdr && !hw_tls) {
m_copydata(mb, moff, len,
mtod(m, caddr_t) + hdrlen);
if (SEQ_LT(tp->snd_nxt, tp->snd_max))
@@ -1013,7 +1022,7 @@ send:
msb = &so->so_snd;
m->m_next = tcp_m_copym(mb, moff,
&len, if_hw_tsomaxsegcount,
- if_hw_tsomaxsegsize, msb);
+ if_hw_tsomaxsegsize, msb, hw_tls);
if (len <= (tp->t_maxseg - optlen)) {
/*
* Must have ran out of mbufs for the copy
@@ -1816,8 +1825,12 @@ tcp_addoptions(struct tcpopt *to, u_char *optp)
*/
struct mbuf *
tcp_m_copym(struct mbuf *m, int32_t off0, int32_t *plen,
- int32_t seglimit, int32_t segsize, struct sockbuf *sb)
+ int32_t seglimit, int32_t segsize, struct sockbuf *sb, bool hw_tls)
{
+#ifdef KERN_TLS
+ struct ktls_session *tls, *ntls;
+ struct mbuf *start;
+#endif
struct mbuf *n, **np;
struct mbuf *top;
int32_t off = off0;
@@ -1849,6 +1862,13 @@ tcp_m_copym(struct mbuf *m, int32_t off0, int32_t *plen,
np = &top;
top = NULL;
pkthdrlen = NULL;
+#ifdef KERN_TLS
+ if (m->m_flags & M_NOMAP)
+ tls = m->m_ext.ext_pgs->tls;
+ else
+ tls = NULL;
+ start = m;
+#endif
while (len > 0) {
if (m == NULL) {
KASSERT(len == M_COPYALL,
@@ -1858,6 +1878,38 @@ tcp_m_copym(struct mbuf *m, int32_t off0, int32_t *plen,
*pkthdrlen = len_cp;
break;
}
+#ifdef KERN_TLS
+ if (hw_tls) {
+ if (m->m_flags & M_NOMAP)
+ ntls = m->m_ext.ext_pgs->tls;
+ else
+ ntls = NULL;
+
+ /*
+ * Avoid mixing TLS records with handshake
+ * data or TLS records from different
+ * sessions.
+ */
+ if (tls != ntls) {
+ MPASS(m != start);
+ *plen = len_cp;
+ if (pkthdrlen != NULL)
+ *pkthdrlen = len_cp;
+ break;
+ }
+
+ /*
+ * Don't end a send in the middle of a TLS
+ * record if it spans multiple TLS records.
+ */
+ if (tls != NULL && (m != start) && len < m->m_len) {
+ *plen = len_cp;
+ if (pkthdrlen != NULL)
+ *pkthdrlen = len_cp;
+ break;
+ }
+ }
+#endif
mlen = min(len, m->m_len - off);
if (seglimit) {
/*
diff --git a/sys/netinet/tcp_stacks/rack.c b/sys/netinet/tcp_stacks/rack.c
index 77adead5b100..21080532f946 100644
--- a/sys/netinet/tcp_stacks/rack.c
+++ b/sys/netinet/tcp_stacks/rack.c
@@ -6971,6 +6971,12 @@ rack_output(struct tcpcb *tp)
struct ip6_hdr *ip6 = NULL;
int32_t isipv6;
#endif
+#ifdef KERN_TLS
+ const bool hw_tls = (so->so_snd.sb_flags & SB_TLS_IFNET) != 0;
+#else
+ const bool hw_tls = false;
+#endif
+
/* setup and take the cache hits here */
rack = (struct tcp_rack *)tp->t_fb_ptr;
inp = rack->rc_inp;
@@ -7946,7 +7952,7 @@ send:
* sb_offset in the socket buffer chain.
*/
mb = sbsndptr_noadv(sb, sb_offset, &moff);
- if (len <= MHLEN - hdrlen - max_linkhdr) {
+ if (len <= MHLEN - hdrlen - max_linkhdr && !hw_tls) {
m_copydata(mb, moff, (int)len,
mtod(m, caddr_t)+hdrlen);
if (SEQ_LT(tp->snd_nxt, tp->snd_max))
@@ -7960,7 +7966,8 @@ send:
else
msb = sb;
m->m_next = tcp_m_copym(/*tp, */ mb, moff, &len,
- if_hw_tsomaxsegcount, if_hw_tsomaxsegsize, msb /*, 0, NULL*/);
+ if_hw_tsomaxsegcount, if_hw_tsomaxsegsize, msb,
+ hw_tls /*, NULL */);
if (len <= (tp->t_maxseg - optlen)) {
/*
* Must have ran out of mbufs for the copy
diff --git a/sys/netinet/tcp_subr.c b/sys/netinet/tcp_subr.c
index 9b4ce70a045f..c8d101596f23 100644
--- a/sys/netinet/tcp_subr.c
+++ b/sys/netinet/tcp_subr.c
@@ -37,6 +37,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_ipsec.h"
+#include "opt_kern_tls.h"
#include "opt_tcpdebug.h"
#include <sys/param.h>
@@ -50,6 +51,9 @@ __FBSDID("$FreeBSD$");
#ifdef TCP_HHOOK
#include <sys/khelp.h>
#endif
+#ifdef KERN_TLS
+#include <sys/ktls.h>
+#endif
#include <sys/sysctl.h>
#include <sys/jail.h>
#include <sys/malloc.h>
@@ -3076,6 +3080,120 @@ SYSCTL_PROC(_net_inet_tcp, TCPCTL_DROP, drop,
CTLFLAG_VNET | CTLTYPE_STRUCT | CTLFLAG_WR | CTLFLAG_SKIP, NULL,
0, sysctl_drop, "", "Drop TCP connection");
+#ifdef KERN_TLS
+static int
+sysctl_switch_tls(SYSCTL_HANDLER_ARGS)
+{
+ /* addrs[0] is a foreign socket, addrs[1] is a local one. */
+ struct sockaddr_storage addrs[2];
+ struct inpcb *inp;
+ struct sockaddr_in *fin, *lin;
+ struct epoch_tracker et;
+#ifdef INET6
+ struct sockaddr_in6 *fin6, *lin6;
+#endif
+ int error;
+
+ inp = NULL;
+ fin = lin = NULL;
+#ifdef INET6
+ fin6 = lin6 = NULL;
+#endif
+ error = 0;
+
+ if (req->oldptr != NULL || req->oldlen != 0)
+ return (EINVAL);
+ if (req->newptr == NULL)
+ return (EPERM);
+ if (req->newlen < sizeof(addrs))
+ return (ENOMEM);
+ error = SYSCTL_IN(req, &addrs, sizeof(addrs));
+ if (error)
+ return (error);
+
+ switch (addrs[0].ss_family) {
+#ifdef INET6
+ case AF_INET6:
+ fin6 = (struct sockaddr_in6 *)&addrs[0];
+ lin6 = (struct sockaddr_in6 *)&addrs[1];
+ if (fin6->sin6_len != sizeof(struct sockaddr_in6) ||
+ lin6->sin6_len != sizeof(struct sockaddr_in6))
+ return (EINVAL);
+ if (IN6_IS_ADDR_V4MAPPED(&fin6->sin6_addr)) {
+ if (!IN6_IS_ADDR_V4MAPPED(&lin6->sin6_addr))
+ return (EINVAL);
+ in6_sin6_2_sin_in_sock((struct sockaddr *)&addrs[0]);
+ in6_sin6_2_sin_in_sock((struct sockaddr *)&addrs[1]);
+ fin = (struct sockaddr_in *)&addrs[0];
+ lin = (struct sockaddr_in *)&addrs[1];
+ break;
+ }
+ error = sa6_embedscope(fin6, V_ip6_use_defzone);
+ if (error)
+ return (error);
+ error = sa6_embedscope(lin6, V_ip6_use_defzone);
+ if (error)
+ return (error);
+ break;
+#endif
+#ifdef INET
+ case AF_INET:
+ fin = (struct sockaddr_in *)&addrs[0];
+ lin = (struct sockaddr_in *)&addrs[1];
+ if (fin->sin_len != sizeof(struct sockaddr_in) ||
+ lin->sin_len != sizeof(struct sockaddr_in))
+ return (EINVAL);
+ break;
+#endif
+ default:
+ return (EINVAL);
+ }
+ INP_INFO_RLOCK_ET(&V_tcbinfo, et);
+ switch (addrs[0].ss_family) {
+#ifdef INET6
+ case AF_INET6:
+ inp = in6_pcblookup(&V_tcbinfo, &fin6->sin6_addr,
+ fin6->sin6_port, &lin6->sin6_addr, lin6->sin6_port,
+ INPLOOKUP_WLOCKPCB, NULL);
+ break;
+#endif
+#ifdef INET
+ case AF_INET:
+ inp = in_pcblookup(&V_tcbinfo, fin->sin_addr, fin->sin_port,
+ lin->sin_addr, lin->sin_port, INPLOOKUP_WLOCKPCB, NULL);
+ break;
+#endif
+ }
+ INP_INFO_RUNLOCK_ET(&V_tcbinfo, et);
+ if (inp != NULL) {
+ if ((inp->inp_flags & (INP_TIMEWAIT | INP_DROPPED)) != 0 ||
+ inp->inp_socket == NULL) {
+ error = ECONNRESET;
+ INP_WUNLOCK(inp);
+ } else {
+ struct socket *so;
+
+ so = inp->inp_socket;
+ soref(so);
+ error = ktls_set_tx_mode(so,
+ arg2 == 0 ? TCP_TLS_MODE_SW : TCP_TLS_MODE_IFNET);
+ INP_WUNLOCK(inp);
+ SOCK_LOCK(so);
+ sorele(so);
+ }
+ } else
+ error = ESRCH;
+ return (error);
+}
+
+SYSCTL_PROC(_net_inet_tcp, OID_AUTO, switch_to_sw_tls,
+ CTLFLAG_VNET | CTLTYPE_STRUCT | CTLFLAG_WR | CTLFLAG_SKIP, NULL,
+ 0, sysctl_switch_tls, "", "Switch TCP connection to SW TLS");
+SYSCTL_PROC(_net_inet_tcp, OID_AUTO, switch_to_ifnet_tls,
+ CTLFLAG_VNET | CTLTYPE_STRUCT | CTLFLAG_WR | CTLFLAG_SKIP, NULL,
+ 1, sysctl_switch_tls, "", "Switch TCP connection to ifnet TLS");
+#endif
+
/*
* Generate a standardized TCP log line for use throughout the
* tcp subsystem. Memory allocation is done with M_NOWAIT to
diff --git a/sys/netinet/tcp_usrreq.c b/sys/netinet/tcp_usrreq.c
index 10e21d3a892d..4812912d7d7c 100644
--- a/sys/netinet/tcp_usrreq.c
+++ b/sys/netinet/tcp_usrreq.c
@@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_ipsec.h"
+#include "opt_kern_tls.h"
#include "opt_tcpdebug.h"
#include <sys/param.h>
@@ -52,6 +53,7 @@ __FBSDID("$FreeBSD$");
#include <sys/malloc.h>
#include <sys/refcount.h>
#include <sys/kernel.h>
+#include <sys/ktls.h>
#include <sys/sysctl.h>
#include <sys/mbuf.h>
#ifdef INET6
@@ -1755,6 +1757,9 @@ tcp_default_ctloutput(struct socket *so, struct sockopt *sopt, struct inpcb *inp
int error, opt, optval;
u_int ui;
struct tcp_info ti;
+#ifdef KERN_TLS
+ struct tls_enable tls;
+#endif
struct cc_algo *algo;
char *pbuf, buf[TCP_LOG_ID_LEN];
size_t len;
@@ -1917,6 +1922,29 @@ unlock_and_done:
INP_WUNLOCK(inp);
break;
+#ifdef KERN_TLS
+ case TCP_TXTLS_ENABLE:
+ INP_WUNLOCK(inp);
+ error = sooptcopyin(sopt, &tls, sizeof(tls),
+ sizeof(tls));
+ if (error)
+ break;
+ error = ktls_enable_tx(so, &tls);
+ break;
+ case TCP_TXTLS_MODE:
+ INP_WUNLOCK(inp);
+ error = sooptcopyin(sopt, &ui, sizeof(ui), sizeof(ui));
+ if (error)
+ return (error);
+ if (ui != TCP_TLS_MODE_SW && ui != TCP_TLS_MODE_IFNET)
+ return (EINVAL);
+
+ INP_WLOCK_RECHECK(inp);
+ error = ktls_set_tx_mode(so, ui);
+ INP_WUNLOCK(inp);
+ break;
+#endif
+
case TCP_KEEPIDLE:
case TCP_KEEPINTVL:
case TCP_KEEPINIT:
@@ -2198,6 +2226,13 @@ unlock_and_done:
error = EINVAL;
break;
#endif
+#ifdef KERN_TLS
+ case TCP_TXTLS_MODE:
+ optval = ktls_get_tx_mode(so);
+ INP_WUNLOCK(inp);
+ error = sooptcopyout(sopt, &optval, sizeof(optval));
+ break;
+#endif
default:
INP_WUNLOCK(inp);
error = ENOPROTOOPT;
diff --git a/sys/netinet/tcp_var.h b/sys/netinet/tcp_var.h
index a36da12b6817..dfc6d150eed1 100644
--- a/sys/netinet/tcp_var.h
+++ b/sys/netinet/tcp_var.h
@@ -952,7 +952,7 @@ uint32_t tcp_compute_initwnd(uint32_t);
void tcp_sndbuf_autoscale(struct tcpcb *, struct socket *, uint32_t);
struct mbuf *
tcp_m_copym(struct mbuf *m, int32_t off0, int32_t *plen,
- int32_t seglimit, int32_t segsize, struct sockbuf *sb);
+ int32_t seglimit, int32_t segsize, struct sockbuf *sb, bool hw_tls);
static inline void
diff --git a/sys/netinet6/ip6_output.c b/sys/netinet6/ip6_output.c
index 013042147618..08dffde61721 100644
--- a/sys/netinet6/ip6_output.c
+++ b/sys/netinet6/ip6_output.c
@@ -68,6 +68,7 @@ __FBSDID("$FreeBSD$");
#include "opt_inet.h"
#include "opt_inet6.h"
#include "opt_ipsec.h"
+#include "opt_kern_tls.h"
#include "opt_ratelimit.h"
#include "opt_route.h"
#include "opt_rss.h"
@@ -75,6 +76,7 @@ __FBSDID("$FreeBSD$");
#include <sys/param.h>
#include <sys/kernel.h>
+#include <sys/ktls.h>
#include <sys/malloc.h>
#include <sys/mbuf.h>
#include <sys/errno.h>
@@ -280,14 +282,39 @@ static int
ip6_output_send(struct inpcb *inp, struct ifnet *ifp, struct ifnet *origifp,
struct mbuf *m, struct sockaddr_in6 *dst, struct route_in6 *ro)
{
+#ifdef KERN_TLS
+ struct ktls_session *tls = NULL;
+#endif
struct m_snd_tag *mst;
int error;
MPASS((m->m_pkthdr.csum_flags & CSUM_SND_TAG) == 0);
mst = NULL;
+#ifdef KERN_TLS
+ /*
+ * If this is an unencrypted TLS record, save a reference to
+ * the record. This local reference is used to call
+ * ktls_output_eagain after the mbuf has been freed (thus
+ * dropping the mbuf's reference) in if_output.
+ */
+ if (m->m_next != NULL && mbuf_has_tls_session(m->m_next)) {
+ tls = ktls_hold(m->m_next->m_ext.ext_pgs->tls);
+ mst = tls->snd_tag;
+
+ /*
+ * If a TLS session doesn't have a valid tag, it must
+ * have had an earlier ifp mismatch, so drop this
+ * packet.
+ */
+ if (mst == NULL) {
+ error = EAGAIN;
+ goto done;
+ }
+ }
+#endif
#ifdef RATELIMIT
- if (inp != NULL) {
+ if (inp != NULL && mst == NULL) {
if ((inp->inp_flags2 & INP_RATE_LIMIT_CHANGED) != 0 ||
(inp->inp_snd_tag != NULL &&
inp->inp_snd_tag->ifp != ifp))
@@ -314,6 +341,13 @@ ip6_output_send(struct inpcb *inp, struct ifnet *ifp, struct ifnet *origifp,
done:
/* Check for route change invalidating send tags. */
+#ifdef KERN_TLS
+ if (tls != NULL) {
+ if (error == EAGAIN)
+ error = ktls_output_eagain(inp, tls);
+ ktls_free(tls);
+ }
+#endif
#ifdef RATELIMIT
if (error == EAGAIN)
in_pcboutput_eagain(inp);
diff --git a/sys/opencrypto/ktls_ocf.c b/sys/opencrypto/ktls_ocf.c
new file mode 100644
index 000000000000..953fc1c9b6e3
--- /dev/null
+++ b/sys/opencrypto/ktls_ocf.c
@@ -0,0 +1,308 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause
+ *
+ * Copyright (c) 2019 Netflix Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/systm.h>
+#include <sys/counter.h>
+#include <sys/endian.h>
+#include <sys/kernel.h>
+#include <sys/ktls.h>
+#include <sys/lock.h>
+#include <sys/malloc.h>
+#include <sys/module.h>
+#include <sys/mutex.h>
+#include <sys/sysctl.h>
+#include <sys/uio.h>
+#include <opencrypto/cryptodev.h>
+
+struct ocf_session {
+ crypto_session_t sid;
+ int crda_alg;
+ struct mtx lock;
+};
+
+struct ocf_operation {
+ struct ocf_session *os;
+ bool done;
+ struct iovec iov[0];
+};
+
+static MALLOC_DEFINE(M_KTLS_OCF, "ktls_ocf", "OCF KTLS");
+
+SYSCTL_DECL(_kern_ipc_tls);
+SYSCTL_DECL(_kern_ipc_tls_stats);
+
+static counter_u64_t ocf_gcm_crypts;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, ocf_gcm_crypts, CTLFLAG_RD,
+ &ocf_gcm_crypts,
+ "Total number of OCF GCM encryption operations");
+
+static counter_u64_t ocf_retries;
+SYSCTL_COUNTER_U64(_kern_ipc_tls_stats, OID_AUTO, ocf_retries, CTLFLAG_RD,
+ &ocf_retries,
+ "Number of OCF encryption operation retries");
+
+static int
+ktls_ocf_callback(struct cryptop *crp)
+{
+ struct ocf_operation *oo;
+
+ oo = crp->crp_opaque;
+ mtx_lock(&oo->os->lock);
+ oo->done = true;
+ mtx_unlock(&oo->os->lock);
+ wakeup(oo);
+ return (0);
+}
+
+static int
+ktls_ocf_encrypt(struct ktls_session *tls, const struct tls_record_layer *hdr,
+ uint8_t *trailer, struct iovec *iniov, struct iovec *outiov, int iovcnt,
+ uint64_t seqno)
+{
+ struct uio uio;
+ struct tls_aead_data ad;
+ struct tls_nonce_data nd;
+ struct cryptodesc *crde, *crda;
+ struct cryptop *crp;
+ struct ocf_session *os;
+ struct ocf_operation *oo;
+ struct iovec *iov;
+ int i, error;
+ uint16_t tls_comp_len;
+
+ os = tls->cipher;
+
+ oo = malloc(sizeof(*oo) + (iovcnt + 2) * sizeof(*iov), M_KTLS_OCF,
+ M_WAITOK | M_ZERO);
+ oo->os = os;
+ iov = oo->iov;
+
+ crp = crypto_getreq(2);
+ if (crp == NULL) {
+ free(oo, M_KTLS_OCF);
+ return (ENOMEM);
+ }
+
+ /* Set up the IV. */
+ memcpy(nd.fixed, tls->params.iv, TLS_AEAD_GCM_LEN);
+ memcpy(&nd.seq, hdr + 1, sizeof(nd.seq));
+
+ /* Set up the AAD. */
+ tls_comp_len = ntohs(hdr->tls_length) -
+ (AES_GMAC_HASH_LEN + sizeof(nd.seq));
+ ad.seq = htobe64(seqno);
+ ad.type = hdr->tls_type;
+ ad.tls_vmajor = hdr->tls_vmajor;
+ ad.tls_vminor = hdr->tls_vminor;
+ ad.tls_length = htons(tls_comp_len);
+ iov[0].iov_base = &ad;
+ iov[0].iov_len = sizeof(ad);
+ uio.uio_resid = sizeof(ad);
+
+ /*
+ * OCF always does encryption in place, so copy the data if
+ * needed. Ugh.
+ */
+ for (i = 0; i < iovcnt; i++) {
+ iov[i + 1] = outiov[i];
+ if (iniov[i].iov_base != outiov[i].iov_base)
+ memcpy(outiov[i].iov_base, iniov[i].iov_base,
+ outiov[i].iov_len);
+ uio.uio_resid += outiov[i].iov_len;
+ }
+
+ iov[iovcnt + 1].iov_base = trailer;
+ iov[iovcnt + 1].iov_len = AES_GMAC_HASH_LEN;
+ uio.uio_resid += AES_GMAC_HASH_LEN;
+
+ uio.uio_iov = iov;
+ uio.uio_iovcnt = iovcnt + 2;
+ uio.uio_offset = 0;
+ uio.uio_segflg = UIO_SYSSPACE;
+ uio.uio_td = curthread;
+
+ crp->crp_session = os->sid;
+ crp->crp_flags = CRYPTO_F_IOV | CRYPTO_F_CBIMM;
+ crp->crp_uio = &uio;
+ crp->crp_ilen = uio.uio_resid;
+ crp->crp_opaque = oo;
+ crp->crp_callback = ktls_ocf_callback;
+
+ crde = crp->crp_desc;
+ crda = crde->crd_next;
+
+ crda->crd_alg = os->crda_alg;
+ crda->crd_skip = 0;
+ crda->crd_len = sizeof(ad);
+ crda->crd_inject = crp->crp_ilen - AES_GMAC_HASH_LEN;
+
+ crde->crd_alg = CRYPTO_AES_NIST_GCM_16;
+ crde->crd_skip = sizeof(ad);
+ crde->crd_len = crp->crp_ilen - (sizeof(ad) + AES_GMAC_HASH_LEN);
+ crde->crd_flags = CRD_F_ENCRYPT | CRD_F_IV_EXPLICIT | CRD_F_IV_PRESENT;
+ memcpy(crde->crd_iv, &nd, sizeof(nd));
+
+ counter_u64_add(ocf_gcm_crypts, 1);
+ for (;;) {
+ error = crypto_dispatch(crp);
+ if (error)
+ break;
+
+ mtx_lock(&os->lock);
+ while (!oo->done)
+ mtx_sleep(oo, &os->lock, 0, "ocfktls", 0);
+ mtx_unlock(&os->lock);
+
+ if (crp->crp_etype != EAGAIN) {
+ error = crp->crp_etype;
+ break;
+ }
+
+ crp->crp_etype = 0;
+ crp->crp_flags &= ~CRYPTO_F_DONE;
+ oo->done = false;
+ counter_u64_add(ocf_retries, 1);
+ }
+
+ crypto_freereq(crp);
+ free(oo, M_KTLS_OCF);
+ return (error);
+}
+
+static void
+ktls_ocf_free(struct ktls_session *tls)
+{
+ struct ocf_session *os;
+
+ os = tls->cipher;
+ mtx_destroy(&os->lock);
+ explicit_bzero(os, sizeof(*os));
+ free(os, M_KTLS_OCF);
+}
+
+static int
+ktls_ocf_try(struct socket *so, struct ktls_session *tls)
+{
+ struct cryptoini cria, crie;
+ struct ocf_session *os;
+ int error;
+
+ memset(&cria, 0, sizeof(cria));
+ memset(&crie, 0, sizeof(crie));
+
+ switch (tls->params.cipher_algorithm) {
+ case CRYPTO_AES_NIST_GCM_16:
+ if (tls->params.iv_len != TLS_AEAD_GCM_LEN)
+ return (EINVAL);
+ switch (tls->params.cipher_key_len) {
+ case 128 / 8:
+ cria.cri_alg = CRYPTO_AES_128_NIST_GMAC;
+ break;
+ case 256 / 8:
+ cria.cri_alg = CRYPTO_AES_256_NIST_GMAC;
+ break;
+ default:
+ return (EINVAL);
+ }
+ cria.cri_key = tls->params.cipher_key;
+ cria.cri_klen = tls->params.cipher_key_len * 8;
+ break;
+ default:
+ return (EPROTONOSUPPORT);
+ }
+
+ /* Only TLS 1.1 and TLS 1.2 are currently supported. */
+ if (tls->params.tls_vmajor != TLS_MAJOR_VER_ONE ||
+ tls->params.tls_vminor < TLS_MINOR_VER_ONE ||
+ tls->params.tls_vminor > TLS_MINOR_VER_TWO)
+ return (EPROTONOSUPPORT);
+
+ os = malloc(sizeof(*os), M_KTLS_OCF, M_NOWAIT | M_ZERO);
+ if (os == NULL)
+ return (ENOMEM);
+
+ crie.cri_alg = tls->params.cipher_algorithm;
+ crie.cri_key = tls->params.cipher_key;
+ crie.cri_klen = tls->params.cipher_key_len * 8;
+
+ crie.cri_next = &cria;
+ error = crypto_newsession(&os->sid, &crie,
+ CRYPTO_FLAG_HARDWARE | CRYPTO_FLAG_SOFTWARE);
+ if (error) {
+ free(os, M_KTLS_OCF);
+ return (error);
+ }
+
+ os->crda_alg = cria.cri_alg;
+ mtx_init(&os->lock, "ktls_ocf", NULL, MTX_DEF);
+ tls->cipher = os;
+ tls->sw_encrypt = ktls_ocf_encrypt;
+ tls->free = ktls_ocf_free;
+ return (0);
+}
+
+struct ktls_crypto_backend ocf_backend = {
+ .name = "OCF",
+ .prio = 5,
+ .api_version = KTLS_API_VERSION,
+ .try = ktls_ocf_try,
+};
+
+static int
+ktls_ocf_modevent(module_t mod, int what, void *arg)
+{
+ int error;
+
+ switch (what) {
+ case MOD_LOAD:
+ ocf_gcm_crypts = counter_u64_alloc(M_WAITOK);
+ ocf_retries = counter_u64_alloc(M_WAITOK);
+ return (ktls_crypto_backend_register(&ocf_backend));
+ case MOD_UNLOAD:
+ error = ktls_crypto_backend_deregister(&ocf_backend);
+ if (error)
+ return (error);
+ counter_u64_free(ocf_gcm_crypts);
+ counter_u64_free(ocf_retries);
+ return (0);
+ default:
+ return (EOPNOTSUPP);
+ }
+}
+
+static moduledata_t ktls_ocf_moduledata = {
+ "ktls_ocf",
+ ktls_ocf_modevent,
+ NULL
+};
+
+DECLARE_MODULE(ktls_ocf, ktls_ocf_moduledata, SI_SUB_PROTO_END, SI_ORDER_ANY);
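Reviewer note on ktls_ocf_try(): the parameter checks it performs for AES-GCM can be summarized in a small userland sketch. This is only an illustration under stated assumptions, not the kernel code. The sk_ names and SK_ constants are hypothetical stand-ins that mirror values from sys/sys/ktls.h, and the sketch assumes the cipher has already been identified as AES-GCM (the real function returns EPROTONOSUPPORT for any other algorithm).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins mirroring constants from sys/sys/ktls.h. */
#define SK_TLS_MAJOR_VER_ONE 3
#define SK_TLS_MINOR_VER_ONE 2   /* TLS 1.1 is (3, 2) on the wire */
#define SK_TLS_MINOR_VER_TWO 3   /* TLS 1.2 is (3, 3) on the wire */
#define SK_TLS_AEAD_GCM_LEN  4   /* implicit part of the GCM nonce */

/*
 * Return 0 if the parameters would pass the checks shown in
 * ktls_ocf_try() for AES-GCM, or an errno value otherwise.
 * Lengths are in bytes, as in struct tls_enable.
 */
static int
sk_check_gcm_params(int key_len, int iv_len, uint8_t vmajor, uint8_t vminor)
{
    assert(key_len >= 0 && iv_len >= 0);

    /* The implicit IV must be exactly the 4-byte GCM salt. */
    if (iv_len != SK_TLS_AEAD_GCM_LEN)
        return (EINVAL);

    /* Only 128-bit and 256-bit AES keys are accepted. */
    if (key_len != 128 / 8 && key_len != 256 / 8)
        return (EINVAL);

    /* Only TLS 1.1 and TLS 1.2 are accepted. */
    if (vmajor != SK_TLS_MAJOR_VER_ONE ||
        vminor < SK_TLS_MINOR_VER_ONE ||
        vminor > SK_TLS_MINOR_VER_TWO)
        return (EPROTONOSUPPORT);

    return (0);
}
```

For example, sk_check_gcm_params(16, 4, 3, 3) describes a TLS 1.2 session with a 128-bit key and is accepted, while a TLS 1.0 session (3, 1) is rejected with EPROTONOSUPPORT.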
diff --git a/sys/sys/ktls.h b/sys/sys/ktls.h
new file mode 100644
index 000000000000..079d4448bd8d
--- /dev/null
+++ b/sys/sys/ktls.h
@@ -0,0 +1,194 @@
+/*-
+ * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
+ *
+ * Copyright (c) 2014-2019 Netflix Inc.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in the
+ * documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
+ * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * $FreeBSD$
+ */
+#ifndef _SYS_KTLS_H_
+#define _SYS_KTLS_H_
+
+#include <sys/refcount.h>
+#include <sys/_task.h>
+
+struct tls_record_layer {
+ uint8_t tls_type;
+ uint8_t tls_vmajor;
+ uint8_t tls_vminor;
+ uint16_t tls_length;
+ uint8_t tls_data[0];
+} __attribute__ ((packed));
+
+#define TLS_MAX_MSG_SIZE_V10_2 16384
+#define TLS_MAX_PARAM_SIZE 1024 /* Max key/mac/iv in sockopt */
+#define TLS_AEAD_GCM_LEN 4
+#define TLS_CBC_IMPLICIT_IV_LEN 16
+
+/* Type values for the record layer */
+#define TLS_RLTYPE_APP 23
+
+/*
+ * Nonce for GCM.
+ */
+struct tls_nonce_data {
+ uint8_t fixed[TLS_AEAD_GCM_LEN];
+ uint64_t seq;
+} __packed;
+
+/*
+ * AEAD additional data format per RFC 5246, section 6.2.3.3.
+ */
+struct tls_aead_data {
+ uint64_t seq; /* In network order */
+ uint8_t type;
+ uint8_t tls_vmajor;
+ uint8_t tls_vminor;
+ uint16_t tls_length;
+} __packed;
+
+/*
+ * Stream Cipher MAC additional data input. This does not match the
+ * exact data on the wire (the sequence number is not placed on the
+ * wire, and any explicit IV after the record header is not covered by
+ * the MAC).
+ */
+struct tls_mac_data {
+ uint64_t seq;
+ uint8_t type;
+ uint8_t tls_vmajor;
+ uint8_t tls_vminor;
+ uint16_t tls_length;
+} __packed;
+
+#define TLS_MAJOR_VER_ONE 3
+#define TLS_MINOR_VER_ZERO 1 /* 3, 1 */
+#define TLS_MINOR_VER_ONE 2 /* 3, 2 */
+#define TLS_MINOR_VER_TWO 3 /* 3, 3 */
+
+/* For TCP_TXTLS_ENABLE */
+struct tls_enable {
+ const uint8_t *cipher_key;
+ const uint8_t *iv; /* Implicit IV. */
+ const uint8_t *auth_key;
+ int cipher_algorithm; /* e.g. CRYPTO_AES_CBC */
+ int cipher_key_len;
+ int iv_len;
+ int auth_algorithm; /* e.g. CRYPTO_SHA2_256_HMAC */
+ int auth_key_len;
+ int flags;
+ uint8_t tls_vmajor;
+ uint8_t tls_vminor;
+};
+
+struct tls_session_params {
+ uint8_t *cipher_key;
+ uint8_t *auth_key;
+ uint8_t iv[TLS_CBC_IMPLICIT_IV_LEN];
+ int cipher_algorithm;
+ int auth_algorithm;
+ uint16_t cipher_key_len;
+ uint16_t iv_len;
+ uint16_t auth_key_len;
+ uint16_t max_frame_len;
+ uint8_t tls_vmajor;
+ uint8_t tls_vminor;
+ uint8_t tls_hlen;
+ uint8_t tls_tlen;
+ uint8_t tls_bs;
+ uint8_t flags;
+};
+
+#ifdef _KERNEL
+
+#define KTLS_API_VERSION 5
+
+struct iovec;
+struct ktls_session;
+struct m_snd_tag;
+struct mbuf;
+struct mbuf_ext_pgs;
+struct sockbuf;
+struct socket;
+
+struct ktls_crypto_backend {
+ LIST_ENTRY(ktls_crypto_backend) next;
+ int (*try)(struct socket *so, struct ktls_session *tls);
+ int prio;
+ int api_version;
+ int use_count;
+ const char *name;
+};
+
+struct ktls_session {
+ int (*sw_encrypt)(struct ktls_session *tls,
+ const struct tls_record_layer *hdr, uint8_t *trailer,
+ struct iovec *src, struct iovec *dst, int iovcnt,
+ uint64_t seqno);
+ union {
+ void *cipher;
+ struct m_snd_tag *snd_tag;
+ };
+ struct ktls_crypto_backend *be;
+ void (*free)(struct ktls_session *tls);
+ struct tls_session_params params;
+ u_int wq_index;
+ volatile u_int refcount;
+
+ struct task reset_tag_task;
+ struct inpcb *inp;
+ bool reset_pending;
+} __aligned(CACHE_LINE_SIZE);
+
+int ktls_crypto_backend_register(struct ktls_crypto_backend *be);
+int ktls_crypto_backend_deregister(struct ktls_crypto_backend *be);
+int ktls_enable_tx(struct socket *so, struct tls_enable *en);
+void ktls_destroy(struct ktls_session *tls);
+int ktls_frame(struct mbuf *m, struct ktls_session *tls, int *enqueue_cnt,
+ uint8_t record_type);
+void ktls_seq(struct sockbuf *sb, struct mbuf *m);
+void ktls_enqueue(struct mbuf *m, struct socket *so, int page_count);
+void ktls_enqueue_to_free(struct mbuf_ext_pgs *pgs);
+int ktls_set_tx_mode(struct socket *so, int mode);
+int ktls_get_tx_mode(struct socket *so);
+int ktls_output_eagain(struct inpcb *inp, struct ktls_session *tls);
+
+static inline struct ktls_session *
+ktls_hold(struct ktls_session *tls)
+{
+
+ if (tls != NULL)
+ refcount_acquire(&tls->refcount);
+ return (tls);
+}
+
+static inline void
+ktls_free(struct ktls_session *tls)
+{
+
+ if (refcount_release(&tls->refcount))
+ ktls_destroy(tls);
+}
+
+#endif /* _KERNEL */
+#endif /* !_SYS_KTLS_H_ */
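Reviewer note on struct tls_aead_data and struct tls_nonce_data: the byte layout of both packed structures can be shown with a portable userland sketch. The sketch_ names are hypothetical and not part of the patch. Two assumptions are made: the length field of the additional data is the plaintext length in network order, and the 8 bytes after the fixed IV carry the record sequence number in network order (the usual TLS 1.2 AES-GCM convention; the header itself only names the field seq).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GCM_FIXED_LEN 4    /* mirrors TLS_AEAD_GCM_LEN */
#define SKETCH_AAD_LEN       13   /* size of packed struct tls_aead_data */
#define SKETCH_NONCE_LEN     12   /* size of packed struct tls_nonce_data */

/* Store a 64-bit value in big-endian (network) order. */
static void
sketch_put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/*
 * Fill a buffer laid out like struct tls_aead_data:
 * seq (8 bytes), type, vmajor, vminor, length (2 bytes).
 */
static void
sketch_build_aad(uint8_t out[SKETCH_AAD_LEN], uint64_t seqno, uint8_t type,
    uint8_t vmajor, uint8_t vminor, uint16_t payload_len)
{
    assert(out != NULL);
    sketch_put_be64(out, seqno);
    out[8] = type;
    out[9] = vmajor;
    out[10] = vminor;
    out[11] = (uint8_t)(payload_len >> 8);
    out[12] = (uint8_t)(payload_len & 0xff);
}

/*
 * Fill a buffer laid out like struct tls_nonce_data:
 * 4-byte fixed IV followed by an 8-byte per-record value
 * (assumed here to be the sequence number).
 */
static void
sketch_build_nonce(uint8_t out[SKETCH_NONCE_LEN],
    const uint8_t fixed[SKETCH_GCM_FIXED_LEN], uint64_t seqno)
{
    assert(out != NULL && fixed != NULL);
    memcpy(out, fixed, SKETCH_GCM_FIXED_LEN);
    sketch_put_be64(out + SKETCH_GCM_FIXED_LEN, seqno);
}
```

Writing the bytes explicitly avoids any dependence on host endianness or on compiler support for packed structures, which is why the sketch does not cast the buffers to the kernel structures.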
diff --git a/sys/sys/mbuf.h b/sys/sys/mbuf.h
index 46710f614114..796a77d791b4 100644
--- a/sys/sys/mbuf.h
+++ b/sys/sys/mbuf.h
@@ -301,6 +301,7 @@ struct mbuf {
};
};
+struct ktls_session;
struct socket;
/*
@@ -344,7 +345,7 @@ struct mbuf_ext_pgs {
uint16_t last_pg_len; /* Length of last page */
vm_paddr_t pa[MBUF_PEXT_MAX_PGS]; /* phys addrs of pages */
char hdr[MBUF_PEXT_HDR_LEN]; /* TLS header */
- void *tls; /* TLS session */
+ struct ktls_session *tls; /* TLS session */
#if defined(__i386__) || \
(defined(__powerpc__) && !defined(__powerpc64__) && defined(BOOKE))
/*
@@ -357,9 +358,10 @@ struct mbuf_ext_pgs {
char trail[MBUF_PEXT_TRAIL_LEN]; /* TLS trailer */
struct {
struct socket *so;
- void *mbuf;
+ struct mbuf *mbuf;
uint64_t seqno;
STAILQ_ENTRY(mbuf_ext_pgs) stailq;
+ int enc_cnt;
};
};
};
@@ -1506,5 +1508,18 @@ void netdump_mbuf_dump(void);
void netdump_mbuf_reinit(int nmbuf, int nclust, int clsize);
#endif
+static inline bool
+mbuf_has_tls_session(struct mbuf *m)
+{
+
+ if (m->m_flags & M_NOMAP) {
+ MBUF_EXT_PGS_ASSERT(m);
+ if (m->m_ext.ext_pgs->tls != NULL) {
+ return (true);
+ }
+ }
+ return (false);
+}
+
#endif /* _KERNEL */
#endif /* !_SYS_MBUF_H_ */
diff --git a/sys/sys/param.h b/sys/sys/param.h
index f6c6616900ad..3ad5835b5ca9 100644
--- a/sys/sys/param.h
+++ b/sys/sys/param.h
@@ -60,7 +60,7 @@
* in the range 5 to 9.
*/
#undef __FreeBSD_version
-#define __FreeBSD_version 1300041 /* Master, propagated to newvers */
+#define __FreeBSD_version 1300042 /* Master, propagated to newvers */
/*
* __FreeBSD_kernel__ indicates that this system uses the kernel of FreeBSD,
diff --git a/sys/sys/sockbuf.h b/sys/sys/sockbuf.h
index 7a2cc7a7d641..eb14ea8ee8ff 100644
--- a/sys/sys/sockbuf.h
+++ b/sys/sys/sockbuf.h
@@ -50,6 +50,7 @@
#define SB_AUTOSIZE 0x800 /* automatically size socket buffer */
#define SB_STOP 0x1000 /* backpressure indicator */
#define SB_AIO_RUNNING 0x2000 /* AIO operation running */
+#define SB_TLS_IFNET 0x4000 /* has used / is using ifnet KTLS */
#define SBS_CANTSENDMORE 0x0010 /* can't send more data to peer */
#define SBS_CANTRCVMORE 0x0020 /* can't receive more data from peer */
@@ -63,6 +64,7 @@
#define SB_MAX (2*1024*1024) /* default for max chars in sockbuf */
+struct ktls_session;
struct mbuf;
struct sockaddr;
struct socket;
@@ -74,6 +76,7 @@ struct selinfo;
*
* Locking key to struct sockbuf:
* (a) locked by SOCKBUF_LOCK().
+ * (b) locked by sblock().
*/
struct sockbuf {
struct mtx sb_mtx; /* sockbuf lock */
@@ -98,6 +101,8 @@ struct sockbuf {
u_int sb_ctl; /* (a) non-data chars in buffer */
int sb_lowat; /* (a) low water mark */
sbintime_t sb_timeo; /* (a) timeout for read/write */
+ uint64_t sb_tls_seqno; /* (a) TLS seqno */
+ struct ktls_session *sb_tls_info; /* (a + b) TLS state */
short sb_flags; /* (a) flags, see above */
int (*sb_upcall)(struct socket *, void *, int); /* (a) */
void *sb_upcallarg; /* (a) */
diff --git a/tools/tools/switch_tls/Makefile b/tools/tools/switch_tls/Makefile
new file mode 100644
index 000000000000..be50ebd654e9
--- /dev/null
+++ b/tools/tools/switch_tls/Makefile
@@ -0,0 +1,6 @@
+# $FreeBSD$
+
+PROG= switch_tls
+MAN=
+
+.include <bsd.prog.mk>
diff --git a/tools/tools/switch_tls/switch_tls.c b/tools/tools/switch_tls/switch_tls.c
new file mode 100644
index 000000000000..788926bfb92a
--- /dev/null
+++ b/tools/tools/switch_tls/switch_tls.c
@@ -0,0 +1,381 @@
+/* $OpenBSD: tcpdrop.c,v 1.4 2004/05/22 23:55:22 deraadt Exp $ */
+
+/*-
+ * Copyright (c) 2009 Juli Mallett <jmallett@FreeBSD.org>
+ * Copyright (c) 2004 Markus Friedl <markus@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <sys/cdefs.h>
+__FBSDID("$FreeBSD$");
+
+#include <sys/param.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <sys/socketvar.h>
+#include <sys/sysctl.h>
+
+#include <netinet/in.h>
+#include <netinet/in_pcb.h>
+#define TCPSTATES
+#include <netinet/tcp_fsm.h>
+#include <netinet/tcp_var.h>
+
+#include <err.h>
+#include <netdb.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#define TCPDROP_FOREIGN 0
+#define TCPDROP_LOCAL 1
+
+#define SW_TLS 0
+#define IFNET_TLS 1
+
+struct host_service {
+ char hs_host[NI_MAXHOST];
+ char hs_service[NI_MAXSERV];
+};
+
+static bool tcpswitch_list_commands = false;
+
+static char *findport(const char *);
+static struct xinpgen *getxpcblist(const char *);
+static void sockinfo(const struct sockaddr *, struct host_service *);
+static bool tcpswitch(const struct sockaddr *, const struct sockaddr *, int);
+static bool tcpswitchall(const char *, int);
+static bool tcpswitchbyname(const char *, const char *, const char *,
+ const char *, int);
+static bool tcpswitchconn(const struct in_conninfo *, int);
+static void usage(void);
+
+/*
+ * Switch a TCP connection between software TLS and ifnet TLS.
+ */
+int
+main(int argc, char *argv[])
+{
+ char stack[TCP_FUNCTION_NAME_LEN_MAX];
+ char *lport, *fport;
+ bool switchall, switchallstack;
+ int ch, mode;
+
+ switchall = false;
+ switchallstack = false;
+ stack[0] = '\0';
+ mode = SW_TLS;
+
+ while ((ch = getopt(argc, argv, "ailS:s")) != -1) {
+ switch (ch) {
+ case 'a':
+ switchall = true;
+ break;
+ case 'i':
+ mode = IFNET_TLS;
+ break;
+ case 'l':
+ tcpswitch_list_commands = true;
+ break;
+ case 'S':
+ switchallstack = true;
+ strlcpy(stack, optarg, sizeof(stack));
+ break;
+ case 's':
+ mode = SW_TLS;
+ break;
+ default:
+ usage();
+ }
+ }
+ argc -= optind;
+ argv += optind;
+
+ if (switchall && switchallstack)
+ usage();
+ if (switchall || switchallstack) {
+ if (argc != 0)
+ usage();
+ if (!tcpswitchall(stack, mode))
+ exit(1);
+ exit(0);
+ }
+
+ if ((argc != 2 && argc != 4) || tcpswitch_list_commands)
+ usage();
+
+ if (argc == 2) {
+ lport = findport(argv[0]);
+ fport = findport(argv[1]);
+ if (lport == NULL || lport[1] == '\0' || fport == NULL ||
+ fport[1] == '\0')
+ usage();
+ *lport++ = '\0';
+ *fport++ = '\0';
+ if (!tcpswitchbyname(argv[0], lport, argv[1], fport, mode))
+ exit(1);
+ } else if (!tcpswitchbyname(argv[0], argv[1], argv[2], argv[3], mode))
+ exit(1);
+
+ exit(0);
+}
+
+static char *
+findport(const char *arg)
+{
+ char *dot, *colon;
+
+ /* A strrspn() or strrpbrk() would be nice. */
+ dot = strrchr(arg, '.');
+ colon = strrchr(arg, ':');
+ if (dot == NULL)
+ return (colon);
+ if (colon == NULL)
+ return (dot);
+ if (dot < colon)
+ return (colon);
+ else
+ return (dot);
+}
+
+static struct xinpgen *
+getxpcblist(const char *name)
+{
+ struct xinpgen *xinp;
+ size_t len;
+ int rv;
+
+ len = 0;
+ rv = sysctlbyname(name, NULL, &len, NULL, 0);
+ if (rv == -1)
+ err(1, "sysctlbyname %s", name);
+
+ if (len == 0)
+ errx(1, "%s is empty", name);
+
+ xinp = malloc(len);
+ if (xinp == NULL)
+ errx(1, "malloc failed");
+
+ rv = sysctlbyname(name, xinp, &len, NULL, 0);
+ if (rv == -1)
+ err(1, "sysctlbyname %s", name);
+
+ return (xinp);
+}
+
+static void
+sockinfo(const struct sockaddr *sa, struct host_service *hs)
+{
+ static const int flags = NI_NUMERICHOST | NI_NUMERICSERV;
+ int rv;
+
+ rv = getnameinfo(sa, sa->sa_len, hs->hs_host, sizeof hs->hs_host,
+ hs->hs_service, sizeof hs->hs_service, flags);
+ if (rv != 0)
+ errx(1, "getnameinfo: %s", gai_strerror(rv));
+}
+
+static bool
+tcpswitch(const struct sockaddr *lsa, const struct sockaddr *fsa, int mode)
+{
+ struct host_service local, foreign;
+ struct sockaddr_storage addrs[2];
+ int rv;
+
+ memcpy(&addrs[TCPDROP_FOREIGN], fsa, fsa->sa_len);
+ memcpy(&addrs[TCPDROP_LOCAL], lsa, lsa->sa_len);
+
+ sockinfo(lsa, &local);
+ sockinfo(fsa, &foreign);
+
+ if (tcpswitch_list_commands) {
+ printf("switch_tls %s %s %s %s %s\n",
+ mode == SW_TLS ? "-s" : "-i",
+ local.hs_host, local.hs_service,
+ foreign.hs_host, foreign.hs_service);
+ return (true);
+ }
+
+ rv = sysctlbyname(mode == SW_TLS ? "net.inet.tcp.switch_to_sw_tls" :
+ "net.inet.tcp.switch_to_ifnet_tls", NULL, NULL, &addrs,
+ sizeof addrs);
+ if (rv == -1) {
+ warn("%s %s %s %s", local.hs_host, local.hs_service,
+ foreign.hs_host, foreign.hs_service);
+ return (false);
+ }
+ printf("%s %s %s %s: switched\n", local.hs_host, local.hs_service,
+ foreign.hs_host, foreign.hs_service);
+ return (true);
+}
+
+static bool
+tcpswitchall(const char *stack, int mode)
+{
+ struct xinpgen *head, *xinp;
+ struct xtcpcb *xtp;
+ struct xinpcb *xip;
+ bool ok;
+
+ ok = true;
+
+ head = getxpcblist("net.inet.tcp.pcblist");
+
+#define XINP_NEXT(xinp) \
+ ((struct xinpgen *)(uintptr_t)((uintptr_t)(xinp) + (xinp)->xig_len))
+
+ for (xinp = XINP_NEXT(head); xinp->xig_len > sizeof *xinp;
+ xinp = XINP_NEXT(xinp)) {
+ xtp = (struct xtcpcb *)xinp;
+ xip = &xtp->xt_inp;
+
+ /*
+ * XXX
+ * Check protocol, support just v4 or v6, etc.
+ */
+
+ /* Ignore PCBs which were freed during copyout. */
+ if (xip->inp_gencnt > head->xig_gen)
+ continue;
+
+ /* Skip listening sockets. */
+ if (xtp->t_state == TCPS_LISTEN)
+ continue;
+
+ /* If requested, skip sockets not having the requested stack. */
+ if (stack[0] != '\0' &&
+ strncmp(xtp->xt_stack, stack, TCP_FUNCTION_NAME_LEN_MAX))
+ continue;
+
+ if (!tcpswitchconn(&xip->inp_inc, mode))
+ ok = false;
+ }
+ free(head);
+
+ return (ok);
+}
+
+static bool
+tcpswitchbyname(const char *lhost, const char *lport, const char *fhost,
+ const char *fport, int mode)
+{
+ static const struct addrinfo hints = {
+ /*
+ * Look for streams in all domains.
+ */
+ .ai_family = AF_UNSPEC,
+ .ai_socktype = SOCK_STREAM,
+ };
+ struct addrinfo *ail, *local, *aif, *foreign;
+ int error;
+ bool ok, infamily;
+
+ error = getaddrinfo(lhost, lport, &hints, &local);
+ if (error != 0)
+ errx(1, "getaddrinfo: %s port %s: %s", lhost, lport,
+ gai_strerror(error));
+
+ error = getaddrinfo(fhost, fport, &hints, &foreign);
+ if (error != 0) {
+ freeaddrinfo(local); /* XXX gratuitous */
+ errx(1, "getaddrinfo: %s port %s: %s", fhost, fport,
+ gai_strerror(error));
+ }
+
+ ok = true;
+ infamily = false;
+
+ /*
+ * Try every combination of local and foreign address pairs.
+ */
+ for (ail = local; ail != NULL; ail = ail->ai_next) {
+ for (aif = foreign; aif != NULL; aif = aif->ai_next) {
+ if (ail->ai_family != aif->ai_family)
+ continue;
+ infamily = true;
+ if (!tcpswitch(ail->ai_addr, aif->ai_addr, mode))
+ ok = false;
+ }
+ }
+
+ if (!infamily) {
+ warnx("%s %s %s %s: different address families", lhost, lport,
+ fhost, fport);
+ ok = false;
+ }
+
+ freeaddrinfo(local);
+ freeaddrinfo(foreign);
+
+ return (ok);
+}
+
+static bool
+tcpswitchconn(const struct in_conninfo *inc, int mode)
+{
+ struct sockaddr *local, *foreign;
+ struct sockaddr_in6 sin6[2];
+ struct sockaddr_in sin4[2];
+
+ if ((inc->inc_flags & INC_ISIPV6) != 0) {
+ memset(sin6, 0, sizeof sin6);
+
+ sin6[TCPDROP_LOCAL].sin6_len = sizeof sin6[TCPDROP_LOCAL];
+ sin6[TCPDROP_LOCAL].sin6_family = AF_INET6;
+ sin6[TCPDROP_LOCAL].sin6_port = inc->inc_lport;
+ memcpy(&sin6[TCPDROP_LOCAL].sin6_addr, &inc->inc6_laddr,
+ sizeof inc->inc6_laddr);
+ local = (struct sockaddr *)&sin6[TCPDROP_LOCAL];
+
+ sin6[TCPDROP_FOREIGN].sin6_len = sizeof sin6[TCPDROP_FOREIGN];
+ sin6[TCPDROP_FOREIGN].sin6_family = AF_INET6;
+ sin6[TCPDROP_FOREIGN].sin6_port = inc->inc_fport;
+ memcpy(&sin6[TCPDROP_FOREIGN].sin6_addr, &inc->inc6_faddr,
+ sizeof inc->inc6_faddr);
+ foreign = (struct sockaddr *)&sin6[TCPDROP_FOREIGN];
+ } else {
+ memset(sin4, 0, sizeof sin4);
+
+ sin4[TCPDROP_LOCAL].sin_len = sizeof sin4[TCPDROP_LOCAL];
+ sin4[TCPDROP_LOCAL].sin_family = AF_INET;
+ sin4[TCPDROP_LOCAL].sin_port = inc->inc_lport;
+ memcpy(&sin4[TCPDROP_LOCAL].sin_addr, &inc->inc_laddr,
+ sizeof inc->inc_laddr);
+ local = (struct sockaddr *)&sin4[TCPDROP_LOCAL];
+
+ sin4[TCPDROP_FOREIGN].sin_len = sizeof sin4[TCPDROP_FOREIGN];
+ sin4[TCPDROP_FOREIGN].sin_family = AF_INET;
+ sin4[TCPDROP_FOREIGN].sin_port = inc->inc_fport;
+ memcpy(&sin4[TCPDROP_FOREIGN].sin_addr, &inc->inc_faddr,
+ sizeof inc->inc_faddr);
+ foreign = (struct sockaddr *)&sin4[TCPDROP_FOREIGN];
+ }
+
+ return (tcpswitch(local, foreign, mode));
+}
+
+static void
+usage(void)
+{
+ fprintf(stderr,
+"usage: switch_tls [-i | -s] local-address local-port foreign-address foreign-port\n"
+" switch_tls [-i | -s] local-address:local-port foreign-address:foreign-port\n"
+" switch_tls [-i | -s] local-address.local-port foreign-address.foreign-port\n"
+" switch_tls [-l | -i | -s] -a\n"
+" switch_tls [-l | -i | -s] -S stack\n");
+ exit(1);
+}
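Reviewer note on the two-argument form of switch_tls: main() uses findport() to locate the last '.' or ':' in each argument and then splits the string in place. A minimal standalone sketch of that splitting step follows. The sk_ helper names are hypothetical and not part of the tool; the logic is modeled on findport() and on the checks in main().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the last '.' or ':' in arg, or NULL if neither occurs. */
static char *
sk_last_separator(char *arg)
{
    char *dot, *colon;

    assert(arg != NULL);
    dot = strrchr(arg, '.');
    colon = strrchr(arg, ':');
    if (dot == NULL)
        return (colon);
    if (colon == NULL)
        return (dot);
    return (dot < colon ? colon : dot);
}

/*
 * Split "host:port" or "host.port" in place at the last separator.
 * Returns 0 on success, or -1 if no separator is found or the port
 * part is empty (the cases where main() calls usage()).
 */
static int
sk_split_hostport(char *arg, char **host, char **port)
{
    char *sep;

    sep = sk_last_separator(arg);
    if (sep == NULL || sep[1] == '\0')
        return (-1);
    *sep = '\0';          /* terminate the host part */
    *host = arg;
    *port = sep + 1;      /* port starts after the separator */
    return (0);
}
```

Because the split happens at the last separator, an IPv6 address written as 2001:db8::1.443 yields host 2001:db8::1 and port 443, whereas a bare IPv4 address such as 192.0.2.1 with no port would be split at its final dot.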