path: root/sys/netipsec
Columns: commit message, author, date, files changed, lines changed
* Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
  Author: Alexander V. Chernikov, Date: 2021-02-08, Files: 1, Lines: -5/+5
  The wrong version of the change was pushed inadvertently.
  This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
* SO_RERROR indicates that receive buffer overflows should be handled as errors.
  Author: Alexander V. Chernikov, Date: 2021-02-08, Files: 1, Lines: -5/+5
  Historically, receive buffer overflows have been ignored, and programs could not tell whether they had missed messages or whether messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default.
  This is especially important for programs that use route(4) to keep in sync with the system. If we lose a message, we need to reload the full system state; otherwise the behavior from that point on is undefined and can lead to chasing bogus bug reports.
* opencrypto: Introduce crypto_dispatch_async()
  Author: Mark Johnston, Date: 2021-02-08, Files: 2, Lines: -12/+16
  Currently, OpenCrypto consumers can request asynchronous dispatch by setting a flag in the cryptop. (Currently only IPsec may do this.) I think this is a bit confusing: we (conditionally) set cryptop flags to request async dispatch, and then crypto_dispatch() immediately examines those flags to see if the consumer wants async dispatch. The flag names are also confusing, since they don't specify what "async" applies to: dispatch or completion.
  Add a new KPI, crypto_dispatch_async(), rather than encoding the requested dispatch type in each cryptop. crypto_dispatch_async() falls back to crypto_dispatch() if the session's driver provides asynchronous dispatch. Get rid of CRYPTOP_ASYNC() and CRYPTOP_ASYNC_KEEPORDER().
  Similarly, add crypto_dispatch_batch() to request processing of a tailq of cryptops, rather than encoding the scheduling policy using cryptop flags. Convert GELI, the only user of this interface (disabled by default), to use the new interface.
  Add CRYPTO_SESS_SYNC(), which can be used by consumers to determine whether crypto requests will be dispatched synchronously. This is just a helper macro. Use it instead of looking at cap flags directly.
  Fix style in crypto_done(). Also get rid of CRYPTO_RETW_EMPTY() and just check the relevant queues directly. This could result in some unnecessary wakeups, but I think it's very uncommon to be using more than one queue per worker in a given workload, so checking all three queues is a waste of cycles.
  Reviewed by: jhb
  Sponsored by: Ampere Computing
  Submitted by: Klara, Inc.
  MFC after: 2 weeks
  Differential Revision: https://reviews.freebsd.org/D28194
* Convert unmapped mbufs before computing checksums in IPsec.
  Author: John Baldwin, Date: 2021-01-19, Files: 1, Lines: -2/+27
  This is similar to the logic used in ip_output() to convert mbufs prior to computing checksums. Unmapped mbufs can be sent when using sendfile() over IPsec or using KTLS over IPsec.
  Reported by: Sony Arpita Das @ Chelsio QA
  Reviewed by: np
  Sponsored by: Chelsio
  Differential Revision: https://reviews.freebsd.org/D28187
* Trigger soft lifetime expiration on sequence number
  Author: Marcin Wojtas, Date: 2020-10-16, Files: 1, Lines: -1/+6
  This patch adds a limit on the sequence number at 80% of UINT32_MAX. When the sequence number reaches this limit, the kernel sends an SADB_EXPIRE message to the IKE daemon, which is responsible for performing rekeying.
  Submitted by: Patryk Duda <pdk@semihalf.com>
  Reviewed by: ae
  Differential revision: https://reviews.freebsd.org/D22370
  Obtained from: Semihalf
  Sponsored by: Stormshield
  Notes: svn path=/head/; revision=366759
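A minimal C sketch of the threshold described above, assuming the limit is exactly 80% of UINT32_MAX rounded down. `SEQ_SOFT_LIMIT` and `seq_soft_expired()` are hypothetical names chosen for illustration; this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: 80% of UINT32_MAX, computed in 64-bit arithmetic
 * so the multiplication cannot overflow (4294967295 * 80 / 100 = 3435973836). */
static const uint32_t SEQ_SOFT_LIMIT =
    (uint32_t)((uint64_t)UINT32_MAX * 80 / 100);

/* Hypothetical check: true once the outbound sequence number has reached the
 * soft limit, i.e. the point at which a soft SADB_EXPIRE would be sent so the
 * IKE daemon can rekey before the 32-bit counter is exhausted. */
static bool seq_soft_expired(uint32_t seq)
{
    return seq >= SEQ_SOFT_LIMIT;
}
```

Leaving 20% of the sequence space as headroom gives the IKE daemon time to negotiate a new SA while the old one keeps carrying traffic.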
* Add support for IPsec ESN and pass relevant information to crypto layer
  Author: Marcin Wojtas, Date: 2020-10-16, Files: 3, Lines: -14/+122
  Implement support for including the IPsec ESN (Extended Sequence Number) in both encrypt-then-authenticate mode (e.g. AES-CBC and SHA256) and combined mode (e.g. AES-GCM). Both the ESP and AH protocols are updated. Additionally, pass the relevant ESN information to the crypto layer.
  For the ETA mode, the ESN is stored in a separate crp_esn buffer, because the high-order 32 bits of the sequence number are appended after the Next Header (RFC 4303).
  For the AEAD modes, the high-order 32 bits of the sequence number (see e.g. RFC 4106, Section 5, AAD Construction) are included as part of crp_aad (SPI + ESN (high-order 32 bits) + sequence number (low-order 32 bits)).
  Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>, Patryk Duda <pdk@semihalf.com>
  Reviewed by: jhb, gnn
  Differential revision: https://reviews.freebsd.org/D22369
  Obtained from: Semihalf
  Sponsored by: Stormshield
  Notes: svn path=/head/; revision=366758
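The AEAD case can be illustrated with a small C sketch that lays out the AAD bytes in network byte order. `esp_build_aad()` and `put_be32()` are hypothetical helpers; the layout follows RFC 4106, Section 5, and the sketch does not reproduce how the kernel fills crp_aad.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build the ESP AAD for an AEAD transform such as AES-GCM:
 *   without ESN: SPI || Seq (low 32 bits)                       -> 8 bytes
 *   with ESN:    SPI || Seq (high 32 bits) || Seq (low 32 bits) -> 12 bytes
 * `spi` is given in host byte order; returns the AAD length in bytes. */
static size_t esp_build_aad(uint8_t aad[12], uint32_t spi, uint64_t seq, int esn)
{
    put_be32(aad, spi);
    if (esn) {
        put_be32(aad + 4, (uint32_t)(seq >> 32));
        put_be32(aad + 8, (uint32_t)seq);
        return 12;
    }
    put_be32(aad + 4, (uint32_t)seq);
    return 8;
}
```

Only the low 32 bits travel in the ESP header; including the high half in the AAD means both ends must agree on it for the tag to verify.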
* Implement anti-replay algorithm with ESN support
  Author: Marcin Wojtas, Date: 2020-10-16, Files: 6, Lines: -96/+222
  As RFC 4304 describes, it is the anti-replay algorithm's responsibility to provide the appropriate value of the Extended Sequence Number. This patch introduces an anti-replay algorithm with ESN support based on RFC 4304; however, to avoid performance regressions, the window implementation is based on RFC 6479, which was already implemented in FreeBSD.
  To keep things clean and improve code readability, the window implementation is kept in separate functions.
  Submitted by: Grzegorz Jaszczyk <jaz@semihalf.com>, Patryk Duda <pdk@semihalf.com>
  Reviewed by: jhb
  Differential revision: https://reviews.freebsd.org/D22367
  Obtained from: Semihalf
  Sponsored by: Stormshield
  Notes: svn path=/head/; revision=366757
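The step where the receiver reconstructs the high-order sequence bits can be sketched in C. The sketch follows the case analysis in RFC 4303, Appendix A2.2, not FreeBSD's code; `esn_infer_seqh()` is a hypothetical name, and the RFC 6479 bitmap window is omitted.

```c
#include <stdint.h>

/* Infer the high-order 32 bits (Seqh) of a received ESN.
 *   th, tl : high/low halves of the highest sequence number accepted so far
 *   seql   : low 32 bits carried in the received packet
 *   w      : anti-replay window size (must be >= 1)
 * Low-half arithmetic is modulo 2^32, which uint32_t provides. */
static uint32_t esn_infer_seqh(uint32_t th, uint32_t tl, uint32_t seql, uint32_t w)
{
    uint32_t bottom = tl - w + 1;   /* low half of the window's bottom edge */

    if (tl >= w - 1) {
        /* Case A: the window lies within one 2^32 subspace.  A packet
         * below the window is assumed to have wrapped into the next one. */
        return (seql >= bottom) ? th : th + 1;
    }
    /* Case B: the window straddles a subspace boundary, so `bottom`
     * wrapped around; large seql values belong to the previous subspace. */
    return (seql >= bottom) ? th - 1 : th;
}
```

The inferred Seqh is then used for ICV verification; only after the packet authenticates should the window be updated.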
* net: clean up empty lines in .c and .h files
  Author: Mateusz Guzik, Date: 2020-09-01, Files: 8, Lines: -17/+2
  Notes: svn path=/head/; revision=365071
* Simplify IPsec transform-specific teardown.
  Author: John Baldwin, Date: 2020-06-25, Files: 6, Lines: -37/+18
  - Rename the teardown callback from 'zeroize' to 'cleanup', since this no longer zeroes keys.
  - Change the callback return type to void. Nothing checked the return value and it was always zero.
  - Don't have esp call into ah, since it no longer needs to depend on this to clear the auth key. Instead, both are now private and self-contained.
  Reviewed by: delphij
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D25443
  Notes: svn path=/head/; revision=362636
* Enter and exit the network epoch for async IPsec callbacks.
  Author: John Baldwin, Date: 2020-06-25, Files: 2, Lines: -6/+23
  When an IPsec packet has been encrypted or decrypted, the next step in the packet's traversal through the network stack is invoked from a crypto worker thread, not from the original calling thread. These threads need to enter the network epoch before passing packets down to IP output routines or up to transport protocols.
  Reviewed by: ae
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D25444
  Notes: svn path=/head/; revision=362635
* Use zfree() to explicitly zero IPsec keys.
  Author: John Baldwin, Date: 2020-06-25, Files: 4, Lines: -19/+4
  Reviewed by: delphij
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D25442
  Notes: svn path=/head/; revision=362632
* Add the SCTP_SUPPORT kernel option.
  Author: Mark Johnston, Date: 2020-06-18, Files: 1, Lines: -3/+3
  This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation.
  Discussed with: tuexen
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=362338
* Consistently include opt_ipsec.h for consumers of <netipsec/ipsec.h>.
  Author: John Baldwin, Date: 2020-05-29, Files: 5, Lines: -5/+5
  This fixes ipsec.ko to include all of IPSEC_DEBUG.
  Reviewed by: imp
  MFC after: 2 weeks
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D25046
  Notes: svn path=/head/; revision=361633
* Fix AES-CTR compatibility issue in ipsec
  Author: Marcin Wojtas, Date: 2020-05-26, Files: 1, Lines: -1/+12
  r361390 decreased the block size of AES-CTR from 16 to 1. Because of that, the ESP payload is no longer aligned to 16 bytes before being encrypted and sent. This is a good change, since RFC 3686 specifies that the last block doesn't need to be aligned.
  However, since FreeBSD before r361390 couldn't decrypt partial blocks encrypted with AES-CTR, we need to enforce 16-byte alignment in order to preserve compatibility. Add a sysctl (on by default) to control it.
  Submitted by: Kornel Duleba <mindal@semihalf.com>
  Reviewed by: jhb
  Obtained from: Semihalf
  Sponsored by: Stormshield
  Differential Revision: https://reviews.freebsd.org/D24999
  Notes: svn path=/head/; revision=361507
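The effect of the alignment choice on ESP padding can be sketched in C. The padding rule (payload, padding, and the 2-byte Pad Length/Next Header trailer must end on an alignment boundary, at least 4 bytes per RFC 4303) is standard; the 4-versus-16 policy in `esp_ctr_align()` is an assumption about what "compat on/off" means, and both helpers are hypothetical.

```c
#include <stddef.h>

/* Number of ESP padding bytes so that
 * payload + padding + 2 trailer bytes (Pad Length, Next Header)
 * is a multiple of `align`. */
static size_t esp_pad_len(size_t payload_len, size_t align)
{
    size_t used = payload_len + 2;
    return (align - used % align) % align;
}

/* Assumed policy: AES-CTR has a 1-byte block, so RFC 4303's 4-byte minimum
 * applies; the compatibility setting pads to the full 16-byte AES block
 * instead, which older receivers that cannot decrypt partial blocks need. */
static size_t esp_ctr_align(int compat_16)
{
    return compat_16 ? 16 : 4;
}
```

For a 100-byte payload this yields 2 padding bytes without the compatibility setting and 10 with it.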
* Add support for optional separate output buffers to in-kernel crypto.
  Author: John Baldwin, Date: 2020-05-25, Files: 3, Lines: -24/+12
  Some crypto consumers, such as GELI and KTLS for file-backed sendfile, need to store their output in a separate buffer from the input. Currently these consumers copy the contents of the input buffer into the output buffer and queue an in-place crypto operation on the output buffer. Using a separate output buffer avoids this copy.
  - Create a new 'struct crypto_buffer' describing a crypto buffer containing a type and type-specific fields. crp_ilen is gone; instead, buffers that use a flat kernel buffer have a cb_buf_len field for their length. The length of other buffer types is inferred from the backing store (e.g. uio_resid for a uio). Requests now have two such structures: crp_buf for the input buffer, and crp_obuf for the output buffer.
  - Consumers now use helper functions (crypto_use_*, e.g. crypto_use_mbuf()) to configure the input buffer. If an output buffer is not configured, the request still modifies the input buffer in-place. A consumer uses a second set of helper functions (crypto_use_output_*) to configure an output buffer.
  - Consumers must request support for separate output buffers when creating a crypto session via the CSP_F_SEPARATE_OUTPUT flag, and are only permitted to queue a request with a separate output buffer on sessions with this flag set. Existing drivers already reject sessions with unknown flags, so this permits drivers to be modified to support this extension without requiring all drivers to change.
  - Several data-related functions now have matching versions that operate on an explicit buffer (e.g. crypto_apply_buf, crypto_contiguous_subsegment_buf, bus_dma_load_crp_buf).
  - Most of the existing data-related functions operate on the input buffer. However, crypto_copyback always writes to the output buffer if a request uses a separate output buffer.
  - For the regions in input/output buffers, the following conventions are followed:
    - AAD and IV are always present in input only, and their fields are offsets into the input buffer.
    - payload is always present in both buffers. If a request uses a separate output buffer, it must set a new crp_payload_start_output field to the offset of the payload in the output buffer.
    - digest is in the input buffer for verify operations, and in the output buffer for compute operations. crp_digest_start is relative to the appropriate buffer.
  - Add a crypto buffer cursor abstraction. This is a more general form of some bits in the cryptosoft driver that tried to always use uio's. However, compared to the original code, this avoids rewalking the uio iovec array for requests with multiple vectors. It also avoids allocating an iovec array for mbufs and populating it, by instead walking the mbuf chain directly.
  - Update the cryptosoft(4) driver to support separate output buffers, making use of the cursor abstraction.
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D24545
  Notes: svn path=/head/; revision=361481
* Don't pass bogus keys down for NULL algorithms.
  Author: John Baldwin, Date: 2020-05-02, Files: 2, Lines: -5/+9
  The changes in r359374 added various sanity checks in sessions and requests created by crypto consumers, in part to permit backend drivers to make assumptions instead of duplicating checks for various edge cases. One of the new checks was to reject sessions which provide a pointer to a key while claiming the key is zero bits long. IPsec ESP tripped over this, as it passes along whatever key is provided for NULL, including a pointer to a zero-length key when an empty string ("") is used with setkey(8).
  One option would be to teach the IPsec key layer to not allocate keys of zero length, but I went with a simpler fix of just not passing any keys down and always using a key length of zero for NULL algorithms.
  PR: 245832
  Reported by: CI
  Notes: svn path=/head/; revision=360560
* Remove support for IPsec algorithms deprecated in r348205 and r360202.
  Author: John Baldwin, Date: 2020-05-02, Files: 5, Lines: -69/+1
  Examples of deprecated algorithms in manual pages and sample configs are updated where relevant. I removed the one example of combining ESP and AH (vs. using a cipher and auth in ESP), as RFC 8221 says this combination is NOT RECOMMENDED.
  Specifically, this removes support for the following ciphers:
  - des-cbc
  - 3des-cbc
  - blowfish-cbc
  - cast128-cbc
  - des-deriv
  - des-32iv
  - camellia-cbc
  This also removes support for the following authentication algorithms:
  - hmac-md5
  - keyed-md5
  - keyed-sha1
  - hmac-ripemd160
  Reviewed by: cem, gnn (older versions)
  Relnotes: yes
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D24342
  Notes: svn path=/head/; revision=360557
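A configuration-linting tool might want to flag the removed names before loading a policy. A minimal C sketch, assuming algorithm names are spelled exactly as in the lists above; `ipsec_alg_removed()` and `name_in()` are hypothetical helpers, not part of setkey(8) or the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Algorithm names exactly as listed in this commit message. */
static const char *const removed_ciphers[] = {
    "des-cbc", "3des-cbc", "blowfish-cbc", "cast128-cbc",
    "des-deriv", "des-32iv", "camellia-cbc",
};
static const char *const removed_auth[] = {
    "hmac-md5", "keyed-md5", "keyed-sha1", "hmac-ripemd160",
};

#define NELEM(a) (sizeof(a) / sizeof((a)[0]))

/* Linear search over a small, fixed table of names. */
static bool name_in(const char *const *list, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], name) == 0)
            return true;
    return false;
}

/* Hypothetical lint check: true if `name` is one of the removed algorithms. */
static bool ipsec_alg_removed(const char *name)
{
    return name_in(removed_ciphers, NELEM(removed_ciphers), name) ||
           name_in(removed_auth, NELEM(removed_auth), name);
}
```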
* Fix name of 3DES cipher in deprecation warning.
  Author: John Baldwin, Date: 2020-04-22, Files: 1, Lines: -1/+1
  Submitted by: cem
  MFC after: 1 week
  Notes: svn path=/head/; revision=360206
* Deprecate 3des support in IPsec for FreeBSD 13.
  Author: John Baldwin, Date: 2020-04-22, Files: 1, Lines: -1/+5
  RFC 8221 does not ban 3des outright, as it does the algorithms deprecated for 13 in r348205, but it is listed as a SHOULD NOT and will likely be a MUST NOT by the time 13 ships.
  Discussed with: bjk
  MFC after: 1 week
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D24341
  Notes: svn path=/head/; revision=360202
* Update comments about IVs used in IPsec ESP.
  Author: John Baldwin, Date: 2020-04-20, Files: 1, Lines: -16/+30
  Add some prose and a diagram describing the layout of the cipher IV for AES-CTR and AES-GCM and how it relates to the ESP IV stored in the packet after the ESP header.
  Also, remove an XXX comment about the initial block counter value used for AES-CTR in esp_output, as the current code matches the RFC (and the equivalent code in esp_input didn't have the XXX comment).
  Discussed with: cem
  Notes: svn path=/head/; revision=360137
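The IV layouts mentioned above can be sketched in C. The byte layouts come from RFC 3686 (AES-CTR in ESP) and RFC 4106 (AES-GCM in ESP); the function names are hypothetical and the sketch is not the kernel's esp_output code.

```c
#include <stdint.h>
#include <string.h>

/* RFC 3686 AES-CTR counter block (16 bytes):
 *   nonce (4 bytes, the last 4 bytes of the keying material)
 *   || ESP IV (8 bytes, carried in the packet after the ESP header)
 *   || block counter (4 bytes, big-endian; the first block uses 1). */
static void aes_ctr_counter_block(uint8_t out[16], const uint8_t nonce[4],
                                  const uint8_t esp_iv[8], uint32_t counter)
{
    memcpy(out, nonce, 4);
    memcpy(out + 4, esp_iv, 8);
    out[12] = (uint8_t)(counter >> 24);
    out[13] = (uint8_t)(counter >> 16);
    out[14] = (uint8_t)(counter >> 8);
    out[15] = (uint8_t)counter;
}

/* RFC 4106 AES-GCM nonce (12 bytes): salt (4 bytes, the last 4 bytes of
 * the keying material) || ESP IV (8 bytes). */
static void aes_gcm_nonce(uint8_t out[12], const uint8_t salt[4],
                          const uint8_t esp_iv[8])
{
    memcpy(out, salt, 4);
    memcpy(out + 4, esp_iv, 8);
}
```

In both cases only the 8-byte ESP IV is sent on the wire; the 4-byte nonce or salt is derived from the key and never transmitted.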
* Generate IVs directly in esp_output.
  Author: John Baldwin, Date: 2020-04-20, Files: 1, Lines: -4/+4
  This is the only place that uses CRYPTO_F_IV_GENERATE. All crypto drivers currently duplicate the same boilerplate code to handle this case. Doing the generation directly removes complexity from drivers. It also simplifies support for separate input and output buffers.
  Reviewed by: cem
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D24449
  Notes: svn path=/head/; revision=360135
* Refactor driver and consumer interfaces for OCF (in-kernel crypto).
  Author: John Baldwin, Date: 2020-03-27, Files: 4, Lines: -194/+141
  - The linked list of cryptoini structures used in session initialization is replaced with a new flat structure: struct crypto_session_params. This structure includes a new mode that defines how the other fields should be interpreted. Available modes include:
    - COMPRESS (for compression/decompression)
    - CIPHER (for simple encryption/decryption)
    - DIGEST (computing and verifying digests)
    - AEAD (combined auth and encryption such as AES-GCM and AES-CCM)
    - ETA (combined auth and encryption using encrypt-then-authenticate)
    Additional modes could be added in the future (e.g. if we wanted to support TLS MtE for AES-CBC in the kernel, we could add a new mode for that. TLS modes might also affect how AAD is interpreted, etc.)
    The flat structure also includes the key lengths and algorithms as before. However, code doesn't have to walk the linked list and switch on the algorithm to determine which key is the auth key vs. the encryption key. The 'csp_auth_*' fields are always used for auth keys and settings, and 'csp_cipher_*' for cipher. (Compression algorithms are stored in csp_cipher_alg.)
  - Drivers no longer register a list of supported algorithms. This doesn't quite work when you factor in modes (e.g. a driver might support both AES-CBC and SHA2-256-HMAC separately but not combined for ETA). Instead, a new 'crypto_probesession' method has been added to the kobj interface for symmetric crypto drivers. This method returns a negative value on success (similar to how device_probe works), and the crypto framework uses this value to pick the "best" driver. There are three constants for hardware (e.g. ccr), accelerated software (e.g. aesni), and plain software (cryptosoft) that give preference in that order. One effect of this is that if you request only hardware when creating a new session, you will no longer get a session using accelerated software. Another effect is that the default setting to disallow software crypto via /dev/crypto now disables accelerated software.
    Once a driver is chosen, 'crypto_newsession' is invoked as before.
  - Crypto operations are now solely described by the flat 'cryptop' structure. The linked list of descriptors has been removed.
    A separate enum has been added to describe the type of data buffer in use instead of using CRYPTO_F_* flags, to make it easier to add more types in the future if needed (e.g. wired userspace buffers for zero-copy). It will also make it easier to re-introduce separate input and output buffers (in-kernel TLS would benefit from this).
    Try to make the flags related to IV handling less insane:
    - CRYPTO_F_IV_SEPARATE means that the IV is stored in the 'crp_iv' member of the operation structure. If this flag is not set, the IV is stored in the data buffer at the 'crp_iv_start' offset.
    - CRYPTO_F_IV_GENERATE means that a random IV should be generated and stored into the data buffer. This cannot be used with CRYPTO_F_IV_SEPARATE.
    If a consumer wants to deal with explicit vs. implicit IVs, etc., it can always generate the IV however it needs, store partial IVs in the buffer and the full IV/nonce in crp_iv, and set CRYPTO_F_IV_SEPARATE.
    The layout of the buffer is now described via fields in cryptop. crp_aad_start and crp_aad_length define the boundaries of any AAD. Previously, with GCM and CCM you defined an auth crd with this range, but for ETA your auth crd had to span both the AAD and plaintext (and they had to be adjacent).
    crp_payload_start and crp_payload_length define the boundaries of the plaintext/ciphertext. Modes that only do a single operation (COMPRESS, CIPHER, DIGEST) should only use this region and leave the AAD region empty.
    If a digest is present (or should be generated), its starting location is marked by crp_digest_start.
    Instead of using the CRD_F_ENCRYPT flag to determine the direction of the operation, cryptop now includes an 'op' field defining the operation to perform. For digests I've added a new VERIFY digest mode, which assumes a digest is present in the input and fails the request with EBADMSG if it doesn't match the internally-computed digest. GCM and CCM already assumed this, and the new AEAD mode requires this for decryption. The new ETA mode now also requires this for decryption, so IPsec and GELI no longer do their own authentication verification. Simple DIGEST operations can also do this, though there are no in-tree consumers.
    To eventually support some refcounting to close races, the session cookie is now passed to crypto_getop(), and clients should no longer set crp_session directly.
  - Asymmetric crypto operation structures should be allocated via crypto_getkreq() and freed via crypto_freekreq(). This permits the crypto layer to track open asym requests and close races with a driver trying to unregister while asym requests are in flight.
  - crypto_copyback, crypto_copydata, crypto_apply, and crypto_contiguous_subsegment now accept the 'crp' object as the first parameter instead of individual members. This makes it easier to deal with different buffer types in the future as well as separate input and output buffers. It's also simpler for driver writers to use.
  - bus_dmamap_load_crp() loads a DMA mapping for a crypto buffer. This understands the various types of buffers so that drivers that use DMA do not have to be aware of different buffer types.
  - Helper routines now exist to build an auth context for HMAC IPAD and OPAD. This reduces some duplicated work among drivers.
  - Key buffers are now treated as const throughout the framework and in device drivers. However, session key buffers provided when a session is created are expected to remain alive for the duration of the session.
  - GCM and CCM sessions now only specify a cipher algorithm and a cipher key. The redundant auth information is not needed or used.
  - For cryptosoft, split up the code a bit such that the 'process' callback now invokes a function pointer in the session. This function pointer is set based on the mode (in effect), though it simplifies a few edge cases that would otherwise be in the switch in 'process'. It does split up GCM vs. CCM, which I think is more readable even if there is some duplication.
  - I changed /dev/crypto to support GMAC requests using CRYPTO_AES_NIST_GMAC as an auth algorithm and updated cryptocheck to work with it.
  - Combined cipher and auth sessions via /dev/crypto now always use ETA mode. The COP_F_CIPHER_FIRST flag is now a no-op that is ignored. This was actually documented as being true in crypto(4) before, but the code had not implemented this before I added the CIPHER_FIRST flag.
  - I have not yet updated /dev/crypto to be aware of explicit modes for sessions. I will probably do that at some point in the future, as well as teach it about IV/nonce and tag lengths for AEAD so we can support all of the NIST KAT tests for GCM and CCM.
  - I've split up the existing crypto.9 manpage into several pages, many of which are written from scratch.
  - I have converted all drivers and consumers in the tree and verified that they compile, but I have not tested all of them. I have tested the following drivers:
    - cryptosoft
    - aesni (AES only)
    - blake2
    - ccr
    and the following consumers:
    - cryptodev
    - IPsec
    - ktls_ocf
    - GELI (lightly)
    I have not tested the following:
    - ccp
    - aesni with sha
    - hifn
    - kgssapi_krb5
    - ubsec
    - padlock
    - safe
    - armv8_crypto (aarch64)
    - glxsb (i386)
    - sec (ppc)
    - cesa (armv7)
    - cryptocteon (mips64)
    - nlmsec (mips64)
  Discussed with: cem
  Relnotes: yes
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D23677
  Notes: svn path=/head/; revision=359374
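The device_probe-style driver selection described above can be modeled in a few lines of C. This is a sketch under stated assumptions: the `PROBE_*` constants and their values are illustrative stand-ins for the three preference classes, and `pick_driver()` is a hypothetical function, not the framework's code.

```c
#include <stddef.h>

/* Illustrative preference classes: closer to zero means more preferred.
 * The names and values here are assumptions for the sketch. */
enum {
    PROBE_HARDWARE       = -100,
    PROBE_ACCEL_SOFTWARE = -200,
    PROBE_SOFTWARE       = -500,
};

/* Each entry is one driver's probe result for the same session parameters:
 * zero or negative means "I can handle it" (higher is better, as with
 * device_probe()); a positive value (an errno) means "not supported".
 * Returns the index of the chosen driver, or -1 if none accepted. */
static int pick_driver(const int probe_results[], size_t n)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        int r = probe_results[i];
        if (r > 0)
            continue;                       /* driver declined */
        if (best < 0 || r > probe_results[best])
            best = (int)i;                  /* better preference class */
    }
    return best;
}
```

Because a driver probes the whole parameter set, it can decline a combination (such as a particular ETA pairing) even if it supports each algorithm on its own.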
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)
  Author: Pawel Biernacki, Date: 2020-02-26, Files: 3, Lines: -6/+11
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.
  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
* netipsec: fix a mismatched uma_zfree -> uma_zfree_pcpu
  Author: Mateusz Guzik, Date: 2020-02-12, Files: 1, Lines: -1/+1
  PR: 244077
  Reported by: lwhsu
  Fixes: r357805 ("amd64: store per-cpu allocations subtracted by __pcpu")
  Notes: svn path=/head/; revision=357842
* Fix m_pullup() problem after removing PULLDOWN_TESTs and KAME EXT_* macros.
  Author: Bjoern A. Zeeb, Date: 2019-12-01, Files: 2, Lines: -12/+16
  r354748-354750 replaced the KAME macros with m_pulldown() calls. Contrary to the rest of the network stack, m_len checks before m_pulldown() were not put in place (see r354748).
  Put these m_len checks in place for now (to go along with the style of the network stack since the initial commits). These are not put in for performance but to avoid an error scenario (even though it will also help performance at the moment, as it avoids allocating an extra mbuf; not because of the unconditional function call).
  The observed error case went like this:
  (1) An mbuf with M_EXT arrives and we call m_pullup() unconditionally on it.
  (2) m_pullup() will call m_get() unless the requested length is larger than MHLEN (in which case it will m_freem() the perfectly fine mbuf), and migrate the requested length of data and pkthdr into the new mbuf.
  (3) If m_get() succeeds, a further m_pullup() call going over MHLEN will fail.
  This was observed with failing auto-configuration, as an RA packet of 200 bytes exceeded MHLEN and the m_pullup() called from nd6_ra_input() dropped the mbuf.
  (Re-)adding the m_len checks before m_pullup() calls avoids this problem with mbufs using external storage for now.
  MFC after: 3 weeks
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=355254
* Add support for dummy ESP packets with next header field equal to IPPROTO_NONE.
  Author: Andrey V. Elsukov, Date: 2019-11-27, Files: 1, Lines: -0/+7
  According to RFC 4303, Section 2.6, they should be silently dropped.
  Submitted by: aurelien.cazuc.external_stormshield.eu
  MFC after: 10 days
  Sponsored by: Stormshield
  Differential Revision: https://reviews.freebsd.org/D22557
  Notes: svn path=/head/; revision=355129
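A minimal C sketch of the receive-side check, operating on an already decrypted ESP payload whose last two bytes are the Pad Length and Next Header fields (RFC 4303). `esp_check_trailer()` and the verdict enum are hypothetical; 59 is the IANA protocol number for "No Next Header" (IPPROTO_NONE).

```c
#include <stddef.h>
#include <stdint.h>

#define ESP_NEXTHDR_NONE 59   /* IPPROTO_NONE: no next header */

enum esp_verdict { ESP_DELIVER, ESP_DROP_DUMMY, ESP_MALFORMED };

/* Inspect the trailer at the end of a decrypted ESP payload:
 *   ... payload | padding | Pad Length (1 byte) | Next Header (1 byte)
 * Dummy packets are dropped silently; otherwise *payload_len receives the
 * length of the inner payload. */
static enum esp_verdict esp_check_trailer(const uint8_t *plain, size_t len,
                                          size_t *payload_len)
{
    if (len < 2)
        return ESP_MALFORMED;

    uint8_t padlen = plain[len - 2];
    uint8_t nxt = plain[len - 1];

    if ((size_t)padlen + 2 > len)
        return ESP_MALFORMED;           /* padding claims more than we have */
    if (nxt == ESP_NEXTHDR_NONE)
        return ESP_DROP_DUMMY;          /* traffic-flow-confidentiality filler */

    *payload_len = len - 2 - padlen;
    return ESP_DELIVER;
}
```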
* netinet*: replace IP6_EXTHDR_GET()
  Author: Bjoern A. Zeeb, Date: 2019-11-15, Files: 2, Lines: -5/+12
  In a few places we have IP6_EXTHDR_GET() left in upper-layer protocols. The IP6_EXTHDR_GET() macro might perform an m_pulldown() in case the data fragment is not contiguous.
  Convert these last remaining instances into m_pullup()s instead. In CARP, for example, we call m_pullup() a few lines later anyway; the IPsec code coming from OpenBSD would otherwise have done the m_pullup() and copies the data a bit later anyway, so pulling it in seems no better or worse.
  Note: this leaves very few m_pulldown() cases behind in the tree, and we might want to consider removing them as well to make mbuf management easier again on a path to variable-size mbufs, especially given that m_pulldown() still has an issue with not re-checking M_WRITEABLE().
  Reviewed by: gallatin
  MFC after: 8 weeks
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D22335
  Notes: svn path=/head/; revision=354749
* Widen NET_EPOCH coverage.
  Author: Gleb Smirnoff, Date: 2019-10-07, Files: 1, Lines: -0/+4
  When epoch(9) was introduced to the network stack, it was basically dropped in place of the existing locking, which was mutexes and rwlocks. For the sake of performance, mutex-covered areas were kept as small as possible, and so were the epoch-covered areas that replaced them.
  However, epoch doesn't introduce any contention; it just delays memory reclamation. So there is no point in minimising epoch-covered areas for performance. Meanwhile, entering/exiting the epoch also has a non-zero CPU cost, so doing this less often is a win.
  Not least, there is code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside the network epoch. This makes coding both the input and output paths much easier.
  On the output path we already enter the epoch quite early: in ip_output() and in ip6_output().
  This patch does the same for the input path. All ISR processing, network-related callouts, and other ways of injecting packets into the network stack shall be performed in net_epoch. Any leaf function that walks the network configuration now asserts the epoch.
  The tricky part is configuration code paths: ioctls and sysctls. They also call into leaf functions, so some need to be changed.
  This patch introduces more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note that, unlike a lock recursion, an epoch recursion is safe and just wastes a bit of resources.
  Reviewed by: gallatin, hselasky, cy, adrian, kristof
  Differential Revision: https://reviews.freebsd.org/D19111
  Notes: svn path=/head/; revision=353292
* Fix broken window replay check that allowed old packets to be accepted.
  Author: Fabien Thomas, Date: 2019-09-06, Files: 1, Lines: -0/+2
  This was introduced in r309144.
  Submitted by: Jean-Francois HREN <jean-francois.hren@stormshield.eu>
  Approved by: ae@
  MFC after: 3 days
  Notes: svn path=/head/; revision=351935
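For context, a classic sliding-window replay check can be sketched in C. This is a minimal 64-packet window without ESN, in the style of RFC 4303, Section 3.4.3; it is not the FreeBSD implementation (which uses an RFC 6479 bitmap), and `struct replay_window` and `replay_check_update()` are hypothetical. The "too old" test is exactly the kind of boundary a broken check can get wrong.

```c
#include <stdbool.h>
#include <stdint.h>

/* `top` is the highest sequence number accepted so far; bit i of `bitmap`
 * records whether sequence number top - i has been seen. */
struct replay_window {
    uint32_t top;
    uint64_t bitmap;
};

/* Returns true and records `seq` if it is new and inside (or ahead of) the
 * window; returns false for replays and for packets older than the window. */
static bool replay_check_update(struct replay_window *w, uint32_t seq)
{
    if (seq == 0)
        return false;                       /* sequence number 0 is never sent */

    if (seq > w->top) {                     /* advances the window */
        uint32_t shift = seq - w->top;
        w->bitmap = (shift >= 64) ? 1 : (w->bitmap << shift) | 1;
        w->top = seq;
        return true;
    }

    uint32_t diff = w->top - seq;
    if (diff >= 64)
        return false;                       /* too old: left of the window */
    if (w->bitmap & ((uint64_t)1 << diff))
        return false;                       /* replay */
    w->bitmap |= (uint64_t)1 << diff;
    return true;
}
```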
* Add missing newline in several log messages.
  Author: Andrey V. Elsukov, Date: 2019-08-09, Files: 1, Lines: -6/+6
  PR: 239694
  MFC after: 1 week
  Notes: svn path=/head/; revision=350816
* netipsec key_register: check for M_NOWAIT alloc failure
  Author: Ryan Libby, Date: 2019-06-25, Files: 1, Lines: -1/+1
  Reviewed by: ae, cem
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D20742
  Notes: svn path=/head/; revision=349373
* Make the warning intervals for deprecated crypto algorithms tunable.
  Author: John Baldwin, Date: 2019-06-11, Files: 4, Lines: -10/+15
  New sysctls/tunables can now set the interval (in seconds) between rate-limited crypto warnings. The new sysctls are:
  - kern.cryptodev_warn_interval for /dev/crypto
  - net.inet.ipsec.crypto_warn_interval for IPsec
  - kern.kgssapi_warn_interval for KGSSAPI
  Reviewed by: cem
  MFC after: 1 month
  Relnotes: yes
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D20555
  Notes: svn path=/head/; revision=348970
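The idea of an interval-limited warning can be shown with a small userland C analogue. This sketch does not claim to mirror the kernel's rate-limiting code; `struct warn_limiter` and `warn_allowed()` are hypothetical, and the current time is passed in explicitly so the policy is easy to exercise.

```c
#include <stdbool.h>
#include <time.h>

/* Emit at most one warning per `interval` seconds. */
struct warn_limiter {
    time_t last;      /* time of the last emitted warning, or 0 if none yet */
    int interval;     /* seconds between warnings (the tunable value) */
};

/* Returns true if a warning may be printed at time `now`, and records it. */
static bool warn_allowed(struct warn_limiter *wl, time_t now)
{
    if (wl->last != 0 && now - wl->last < wl->interval)
        return false;               /* still inside the quiet period */
    wl->last = now;
    return true;
}
```

With an interval of 60, a warning at t=1000 suppresses another at t=1030 but allows one at t=1060.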
* Add deprecation warnings for IPsec algorithms deprecated in RFC 8221.
  Author: John Baldwin, Date: 2019-05-23, Files: 2, Lines: -0/+45
  All of these algorithms are either explicitly marked MUST NOT, or they are implicitly MUST NOTs by virtue of not being included in IETF's list of protocols at all despite having assignments from IANA.
  Specifically, this adds warnings for the following ciphers:
  - des-cbc
  - blowfish-cbc
  - cast128-cbc
  - des-deriv
  - des-32iv
  - camellia-cbc
  Warnings for the following authentication algorithms are also added:
  - hmac-md5
  - keyed-md5
  - keyed-sha1
  - hmac-ripemd160
  Reviewed by: cem, gnn
  MFC after: 3 days
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D20340
  Notes: svn path=/head/; revision=348205
* Replace read_random(9) with more appropriate arc4rand(9) KPIs
  Author: Conrad Meyer, Date: 2019-04-04, Files: 3, Lines: -27/+2
  Reviewed by: ae, delphij
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D19760
  Notes: svn path=/head/; revision=345865
* Remove unused argument to priv_check_cred.
  Author: Mateusz Guzik, Date: 2018-12-11, Files: 1, Lines: -1/+1
  Patch mostly generated with Coccinelle:
      @@
      expression E1,E2;
      @@

      - priv_check_cred(E1,E2,0)
      + priv_check_cred(E1,E2)
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341827
* Add sadb_x_sa2 extension to SADB_ACQUIRE requests.
  Author: Andrey V. Elsukov, Date: 2018-10-21, Files: 1, Lines: -1/+15
  SADB_ACQUIRE requests are sent by the kernel when a security policy doesn't have a corresponding security association for an outbound packet. The IKE daemon usually registers its handler for such messages, and when the kernel asks for an SA it can handle this request. Now such requests will contain additional fields that can help the IKE daemon create the SA.
  IKE can now create SAs using only information from the SADB_ACQUIRE request; this is useful when many if_ipsec(4) interfaces are in use and IKE doesn't track the security policies that were installed by the kernel.
  Obtained from: Yandex LLC
  MFC after: 3 weeks
  Sponsored by: Yandex LLC
  Notes: svn path=/head/; revision=339533
* Fix witness warning in xform_init(). (Andrey V. Elsukov, 2018-09-26, 4 files, -90/+87)
Do not call crypto_newsession() while holding the xforms_lock mutex. Release the mutex before invoking crypto_newsession(), and use the ipsec_kmod_enter()/ipsec_kmod_exit() functions to protect against accessing the memory of an unloaded kernel module.

Move xform-related functions into subr_ipsec.c to be able to use the ipsec_kmod_* functions. Also build the ipsec_kmod_* functions unconditionally, since they are now always used by the IPsec code.

Add an xf_cntr field to struct xformsw; it is used by the ipsec_kmod_* functions. Also constify the xf_name field, since it is not expected to be modified.

Approved by: re (kib)
Differential Revision: https://reviews.freebsd.org/D17302
Notes: svn path=/head/; revision=338945
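The pattern here is "pin under the lock, drop the lock, do the sleepable work, unpin". A userland sketch using pthreads, loosely modeled on the description above: `struct xform`, `xform_enter()`/`xform_exit()` and `sleepable_newsession()` are hypothetical stand-ins for `struct xformsw`, `ipsec_kmod_enter()`/`ipsec_kmod_exit()` and `crypto_newsession()`, not their real implementations.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct xform {
    const char *xf_name;    /* const: never modified after registration */
    unsigned    xf_cntr;    /* in-flight users; unload must wait for 0 */
    bool        xf_loaded;  /* false once module unload has started */
};

static pthread_mutex_t xforms_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pin the xform so its module cannot be unloaded while in use. */
static int
xform_enter(struct xform *xf)
{
    int error = 0;

    pthread_mutex_lock(&xforms_lock);
    if (xf->xf_loaded)
        xf->xf_cntr++;
    else
        error = ENXIO;
    pthread_mutex_unlock(&xforms_lock);
    return (error);
}

/* Drop the pin taken by xform_enter(). */
static void
xform_exit(struct xform *xf)
{
    pthread_mutex_lock(&xforms_lock);
    assert(xf->xf_cntr > 0);
    xf->xf_cntr--;
    pthread_mutex_unlock(&xforms_lock);
}

/* Stand-in for crypto_newsession(): may sleep, so no mutex may be held. */
static int
sleepable_newsession(const struct xform *xf)
{
    return (xf->xf_name != NULL ? 0 : EINVAL);
}

static int
xform_init_sketch(struct xform *xf)
{
    int error = xform_enter(xf);

    if (error != 0)
        return (error);
    error = sleepable_newsession(xf);  /* xforms_lock is NOT held here */
    xform_exit(xf);
    return (error);
}
```

The counter, not the mutex, is what keeps the xform alive across the sleepable call, which is why the mutex can be released safely.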
* Use the new VNET_DEFINE_STATIC macro when we are defining static VNET variables. (Andrew Turner, 2018-07-24, 3 files, -42/+42)
Reviewed by: bz
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D16147
Notes: svn path=/head/; revision=336676
* OpenCrypto: Convert sessions to opaque handles instead of integers (Conrad Meyer, 2018-07-18, 4 files, -28/+26)
Track session objects in the framework, and pass handles between the framework (OCF), consumers, and drivers. Avoid redundancy and complexity in individual drivers by allocating session memory in the framework and providing it to drivers in ::newsession().

Session handles are no longer integers with information encoded in various high bits. Uses of the CRYPTO_SESID2FOO() macros should be replaced with the appropriate crypto_ses2foo() function on the opaque session handle.

Convert OCF drivers (in particular, cryptosoft, as well as myriad others) to the opaque handle interface. Discard existing session tracking as much as possible (quick pass). There may be additional code ripe for deletion.

Convert OCF consumers (ipsec, geom_eli, krb5, cryptodev) to the handle-style interface. The conversion is largely mechanical.

The change is documented in crypto.9.

Inspired by https://lists.freebsd.org/pipermail/freebsd-arch/2018-January/018835.html

No objection from: ae (ipsec portion)
Reported by: jhb
Notes: svn path=/head/; revision=336439
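A sketch of what "opaque handle plus accessor" means compared with bit-encoded integers. All names below (`ses_create()`, `ses2hid()`, `ses2softc()`, the driver hook type) are hypothetical illustrations of the crypto_ses2foo() style, not the real OCF API. In a real header only the `typedef` would be public; the struct definition would live in the framework's source file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Consumers see only this pointer type. */
typedef struct crypto_session *crypto_session_t;

/* Framework-private definition (visible here only because this is one file). */
struct crypto_session {
    int   hid;    /* driver id; formerly packed into high bits of an integer */
    void *softc;  /* driver-private memory, allocated by the framework */
};

/* Hypothetical driver hook: initializes framework-provided memory. */
typedef int (*drv_newsession_t)(void *softc);

static crypto_session_t
ses_create(int hid, size_t softc_size, drv_newsession_t newsession)
{
    crypto_session_t cses = calloc(1, sizeof(*cses));

    if (cses == NULL)
        return (NULL);
    cses->softc = calloc(1, softc_size);
    if (cses->softc == NULL || newsession(cses->softc) != 0) {
        free(cses->softc);
        free(cses);
        return (NULL);
    }
    cses->hid = hid;
    return (cses);
}

/* Accessors replace macros that decoded bits out of an integer id. */
static int   ses2hid(crypto_session_t cses)   { return (cses->hid); }
static void *ses2softc(crypto_session_t cses) { return (cses->softc); }

static void
ses_destroy(crypto_session_t cses)
{
    if (cses == NULL)
        return;
    free(cses->softc);
    free(cses);
}

/* Example driver hook used in the usage below. */
static int
demo_newsession(void *softc)
{
    *(int *)softc = 1;
    return (0);
}
```

Because drivers receive already-allocated memory, each driver no longer needs its own session table or id allocator.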
* OCF: Add a typedef for session identifiers (Conrad Meyer, 2018-07-13, 7 files, -15/+18)
No functional change.

This should ease the transition from an integer session identifier model to an opaque pointer model.

Notes: svn path=/head/; revision=336269
* fix locking within tcp_ipsec_pcbctl() to match ipsec4_pcbctl(), ipsec4_pcbctl() (Sean Bruno, 2018-07-04, 1 file, -7/+9)
The IPSEC_PCBCTL() functions, which include tcp_ipsec_pcbctl(), ipsec4_pcbctl(), and ipsec6_pcbctl(), should all have matching locking semantics. ipsec4_pcbctl() and ipsec6_pcbctl() expect the inp to be unlocked on entry and exit and appear to be correctly implemented as such. However, tcp_ipsec_pcbctl() had different semantics; this patch fixes them.

Submitted by: Jason Eggleston <jason@eggnet.com>
MFH: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D14623
Notes: svn path=/head/; revision=335962
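The "unlocked on entry and exit" contract can be checked mechanically. This userland sketch uses an error-checking pthread mutex, which reports `EDEADLK` when the calling thread already owns it, so a caller that violates the contract is detected. `struct pcb` and `pcbctl_set_policy()` are hypothetical stand-ins for the inpcb and the IPSEC_PCBCTL() functions, which in the kernel would use lock assertions instead.

```c
#define _XOPEN_SOURCE 700  /* for PTHREAD_MUTEX_ERRORCHECK */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

struct pcb {
    pthread_mutex_t mtx;
    int             policy;
};

static int
pcb_init(struct pcb *p)
{
    pthread_mutexattr_t attr;
    int error;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    error = pthread_mutex_init(&p->mtx, &attr);
    pthread_mutexattr_destroy(&attr);
    p->policy = 0;
    return (error);
}

/* Contract: called with the pcb unlocked, returns with it unlocked. */
static int
pcbctl_set_policy(struct pcb *p, int policy)
{
    int error = pthread_mutex_lock(&p->mtx);

    if (error != 0)
        return (error);  /* EDEADLK: the caller already held the lock */
    p->policy = policy;
    pthread_mutex_unlock(&p->mtx);
    return (0);
}
```

Giving every function in the family the same contract means callers such as the setsockopt path can invoke any of them without knowing which protocol handles the request.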
* r335795 build fix: make static functions static (Ed Maste, 2018-06-29, 1 file, -2/+2)
Otherwise, -Werror,-Wmissing-prototypes makes this an error.

MFC with: 335795
Sponsored by: The FreeBSD Foundation
Notes: svn path=/head/; revision=335796
* Make debug output produced by the `setkey -x` command more human-readable. (Andrey V. Elsukov, 2018-06-29, 1 file, -3/+84)
Add text names of SADB message types and extension headers to the output.

Obtained from: Yandex LLC
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D16036
Notes: svn path=/head/; revision=335795
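Turning numeric SADB message types into names is a bounds-checked table lookup. The sketch covers only the base types 0 through 10 defined in RFC 2367; the kernel's table presumably also covers the KAME `SADB_X_*` extensions, which here fall back to "UNKNOWN". `sadb_msgtype_name()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SADB message types 0..10 as defined in RFC 2367. */
static const char *const sadb_msgtype_names[] = {
    "SADB_RESERVED", "SADB_GETSPI", "SADB_UPDATE",   "SADB_ADD",
    "SADB_DELETE",   "SADB_GET",    "SADB_ACQUIRE",  "SADB_REGISTER",
    "SADB_EXPIRE",   "SADB_FLUSH",  "SADB_DUMP",
};

/* Map a message type to its name, guarding against out-of-range values. */
static const char *
sadb_msgtype_name(uint8_t type)
{
    if (type < sizeof(sadb_msgtype_names) / sizeof(sadb_msgtype_names[0]))
        return (sadb_msgtype_names[type]);
    return ("UNKNOWN");
}
```

A debug printer can then emit something like `type=6 (SADB_ACQUIRE)` instead of a bare number.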
* uma: implement provisional api for per-cpu zones (Mateusz Guzik, 2018-06-08, 1 file, -2/+2)
Per-cpu zone allocations are very rarely done compared to regular zones. The intent is to avoid pessimizing the latter case with per-cpu specific code.

In particular, contrary to the claim in r334824, M_ZERO is sometimes being used for such zones. But the zeroing method is completely different, and branching on it in the fast path for regular zones is a waste of time.

Notes: svn path=/head/; revision=334858
* Rework IP encapsulation handling code. (Andrey V. Elsukov, 2018-06-05, 1 file, -37/+27)
Currently it has several disadvantages:
- it uses a single mutex to protect internal structures. The mutex is used by both the data path and the control path, so there is no parallelism at all.
- it uses a single list to keep encap handlers for both the INET and INET6 families.
- struct encaptab keeps unneeded information (src, dst, masks, protosw) that isn't used by code in the source tree.
- matches are prioritized, and when many tunneling interfaces are registered, the encapcheck handler of each interface is invoked for each packet. The search takes O(n) for n interfaces. All this work is done with the exclusive lock held.

What this patch includes:
- the datapath is converted to be lockless using the epoch(9) KPI.
- struct encaptab is now linked using CK_LIST.
- all unused fields are removed from struct encaptab. Several new fields are added: min_length is the minimum packet length that the encapsulation handler expects to see; exact_match is the maximum number of bits that an encapsulation handler can return when it wants to consume a packet.
- IPv6 and IPv4 handlers are stored in separate lists.
- a new "encap_lookup_t" method is added, which will be used later. It is intended to speed up lookup of the needed interface when gif(4)/gre(4) have many interfaces.
- the need to use the protosw structure is eliminated. Only its pr_input method was used, so I don't see the need to keep using it.
- the encap_input_t method is changed to avoid using mbuf tags to store the softc pointer. Now it is passed directly through the encap_input_t method. The encap_getarg() function is removed.
- all sockaddr structures and the code that uses them are removed. We don't have any code in the tree that uses them. All consumers use the encap_attach_func() method, which relies on invoking encapcheck() to determine the needed handler.
- struct encap_config is introduced; it contains the parameters of the encap handler that is going to be registered by the encap_attach() function.
- encap handlers are stored in lists ordered by exact_match value, so handlers that need more bits to match are checked first, and if the encapcheck method returns the exact_match value, the search is stopped.
- all current consumers are changed to use the new KPI.

Reviewed by: mmacy
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D15617
Notes: svn path=/head/; revision=334671
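The ordered-list lookup with early termination can be sketched as below. This is a simplified single-threaded model that omits epoch(9) and CK_LIST; `struct pkt`, `struct encap_handler`, `encap_lookup()` and the two example check functions are hypothetical, only loosely following the `min_length`/`exact_match` fields described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet summary; a real handler would inspect the mbuf. */
struct pkt { unsigned src, dst; int len; };

typedef int (*encap_check_t)(const struct pkt *, void *arg);

struct encap_handler {
    int           min_length;   /* skip packets shorter than this */
    int           exact_match;  /* max bits check() can ever return */
    encap_check_t check;        /* returns matched bits, 0 = no match */
    void         *arg;          /* per-tunnel state, passed directly */
};

/* 'tab' must be sorted by exact_match, highest first. */
static const struct encap_handler *
encap_lookup(const struct encap_handler *tab, size_t n, const struct pkt *p)
{
    const struct encap_handler *best = NULL;
    int bestmatch = 0;

    for (size_t i = 0; i < n; i++) {
        const struct encap_handler *h = &tab[i];

        assert(i == 0 || tab[i - 1].exact_match >= h->exact_match);
        if (p->len < h->min_length)
            continue;
        int match = h->check(p, h->arg);
        if (match > bestmatch) {
            bestmatch = match;
            best = h;
            /* Later handlers can return at most this many bits: stop. */
            if (match == h->exact_match)
                break;
        }
    }
    return (best);
}

/* Example: a point-to-point tunnel matches src and dst (64 bits). */
static int
check_p2p(const struct pkt *p, void *arg)
{
    const struct pkt *want = arg;
    return ((p->src == want->src && p->dst == want->dst) ? 64 : 0);
}

/* Example: a wildcard tunnel matches only dst (32 bits). */
static int
check_dst(const struct pkt *p, void *arg)
{
    const struct pkt *want = arg;
    return (p->dst == want->dst ? 32 : 0);
}
```

Sorting by `exact_match` is what makes the early `break` safe: once a handler reports a full match, no later handler can report more bits.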
* Correctly handle the padding for IPv6-AH, as specified by RFC4302 (Conrad Meyer, 2018-06-04, 1 file, -20/+36)
The RFC specifies that under IPv6 the complete AH header must be 64-bit aligned, and under IPv4, 32-bit aligned. Prior to this change, we (along with other BSDs and MacOS) had violated this requirement.

This makes it possible to set up IPv6-AH between Linux and BSD, and also probably between Windows and BSD.

PR: 222684
Reported and tested by: Jason Mader <jasonmader AT gmail.com>
Obtained from: NetBSD xform_ah.c 1.105 (b939fe2483972eb43d71bf990cfb7f26dece7839 NetBSD/src on GH) by Maxime Villard
MFC after: 35.2731 hours
Relnotes: probably (breaks ipv6 compat with older FreeBSD/NetBSD/MacOS)
Sponsored by: Dell EMC Isilon
Notes: svn path=/head/; revision=334625
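The alignment rule reduces to simple arithmetic. In RFC 4302 the AH header has a 12-byte fixed part (Next Header, Payload Len, Reserved, SPI, Sequence Number) followed by the ICV, so the padding is whatever rounds 12 plus the ICV length up to a multiple of 8 (IPv6) or 4 (IPv4). The helper names below are hypothetical, not the code in xform_ah.c. With a 16-byte ICV (for example HMAC-SHA-256-128), the header is 28 bytes: already aligned for IPv4, but 4 bytes short of 64-bit alignment for IPv6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Next Header, Payload Len, Reserved, SPI, Sequence Number (RFC 4302). */
#define AH_FIXED_HDR_LEN 12

/* Padding bytes needed after an ICV of 'icvlen' bytes. */
static size_t
ah_icv_padding(size_t icvlen, bool ipv6)
{
    size_t align = ipv6 ? 8 : 4;
    size_t total = AH_FIXED_HDR_LEN + icvlen;

    return ((align - total % align) % align);
}

/* Payload Len field: AH length in 32-bit words, minus 2. */
static size_t
ah_payload_len(size_t icvlen, bool ipv6)
{
    size_t total = AH_FIXED_HDR_LEN + icvlen + ah_icv_padding(icvlen, ipv6);

    return (total / 4 - 2);
}
```

A 12-byte ICV such as HMAC-SHA1-96 gives a 24-byte header that is aligned either way, so the difference only shows up with ICV lengths that are not a multiple of 8 once the fixed part is added.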
* Temporarily disable SPDCACHE statistic accounting until a proper fix is committed. (Andrey V. Elsukov, 2018-05-28, 1 file, -2/+2)
This fixes the kernel build without option IPSEC.

Notes: svn path=/head/; revision=334278
* netipsec/!VIMAGE: don't declare/define spdcache_destroy on non-VIMAGE builds (Matt Macy, 2018-05-24, 1 file, -2/+4)
This breaks MIPS compiles in universe.

Notes: svn path=/head/; revision=334194
* Add a SPD cache to speed up lookups. (Fabien Thomas, 2018-05-22, 2 files, -22/+258)
When large SPDs are used, we face two problems:
- too many CPU cycles are spent during the linear searches in the SPD for each packet
- too much contention on multi-socket systems, since we use a single shared lock.

Main changes:
- added the sysctl tree 'net.key.spdcache' to control the SPD cache (disabled by default).
- cache the sp indexes that are used to perform SP lookups.
- use a range of dedicated mutexes to protect the cache lines.

Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D15050
Notes: svn path=/head/; revision=334054
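"A range of dedicated mutexes to protect the cache lines" is lock striping: each mutex guards a fixed subset of lines, so lookups that hash to different stripes never contend. A direct-mapped userland sketch, assuming a simple 5-tuple key; the sizes, `struct sp_key`, the hash, and the function names are all hypothetical, and invalidation on SPD changes and the sysctl enable switch are not modeled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define SPDCACHE_SIZE  256  /* cache lines (hypothetical) */
#define SPDCACHE_NLOCK 16   /* mutexes; each guards SIZE/NLOCK lines */

/* Hypothetical key standing in for a packet's SP index. */
struct sp_key {
    uint32_t src, dst;
    uint16_t sport, dport;
    uint8_t  proto;
};

struct spdcache_line {
    bool          valid;
    struct sp_key key;
    int           policy_id;  /* cached result of the slow linear search */
};

static struct spdcache_line spdcache[SPDCACHE_SIZE];
static pthread_mutex_t spdcache_locks[SPDCACHE_NLOCK];

static void
spdcache_init(void)
{
    for (int i = 0; i < SPDCACHE_NLOCK; i++)
        pthread_mutex_init(&spdcache_locks[i], NULL);
}

/* Hash fields individually so struct padding never affects the result. */
static uint32_t
sp_hash(const struct sp_key *k)
{
    uint32_t h = k->src * 2654435761u;

    h ^= k->dst * 2246822519u;
    h ^= ((uint32_t)k->sport << 16 | k->dport) * 3266489917u;
    h ^= k->proto;
    return (h ^ (h >> 16));
}

static bool
sp_key_eq(const struct sp_key *a, const struct sp_key *b)
{
    return (a->src == b->src && a->dst == b->dst && a->sport == b->sport &&
        a->dport == b->dport && a->proto == b->proto);
}

/* Lines i, i + NLOCK, i + 2 * NLOCK, ... share one mutex. */
static pthread_mutex_t *
line_lock(uint32_t line)
{
    return (&spdcache_locks[line % SPDCACHE_NLOCK]);
}

static bool
spdcache_lookup(const struct sp_key *k, int *policy_id)
{
    uint32_t line = sp_hash(k) % SPDCACHE_SIZE;
    bool hit;

    pthread_mutex_lock(line_lock(line));
    hit = spdcache[line].valid && sp_key_eq(&spdcache[line].key, k);
    if (hit)
        *policy_id = spdcache[line].policy_id;
    pthread_mutex_unlock(line_lock(line));
    return (hit);
}

/* Direct-mapped: a new entry replaces whatever occupied the line. */
static void
spdcache_update(const struct sp_key *k, int policy_id)
{
    uint32_t line = sp_hash(k) % SPDCACHE_SIZE;

    pthread_mutex_lock(line_lock(line));
    spdcache[line].valid = true;
    spdcache[line].key = *k;
    spdcache[line].policy_id = policy_id;
    pthread_mutex_unlock(line_lock(line));
}
```

On a miss the caller would fall back to the linear SPD search and then call `spdcache_update()` with the result.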
* Merge r1.22-1.23 from NetBSD: (Andrey V. Elsukov, 2018-04-26, 1 file, -6/+10)
Don't assume M_PKTHDR is set only on the first mbuf of the chain.

The check is replaced by (m1 != m), which is equivalent to the previous code: we want to modify m->m_pkthdr.len only when 'm' was not passed in m_adj().

Fix a pretty bad mistake that has always been there:

    m_adj(m1, -(m1->m_len - roff));
    if (m1 != m)
        m->m_pkthdr.len -= (m1->m_len - roff);

This is wrong: m_adj() will modify m1->m_len, so we're using a wrong value when manually adjusting m->m_pkthdr.len.

Reported by: Maxime Villard <max at m00nbsd dot net>
Obtained from: NetBSD
MFC after: 1 week
Notes: svn path=/head/; revision=333016
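The bug is a read-after-modify ordering problem, and the fix is to compute the trim amount once before calling the function that changes it. A toy model with only the fields involved: `struct toy_mbuf` and `toy_adj_tail()` are hypothetical stand-ins for the mbuf and `m_adj()`, and like `m_adj()` the helper updates the packet-header length only when called on the chain head, which is why the manual adjustment is guarded by `m1 != m`.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mbuf {
    int  len;         /* like m_len: bytes in this buffer */
    int  pkthdr_len;  /* like m_pkthdr.len: meaningful on the head only */
    bool head;        /* like M_PKTHDR */
};

/* Like m_adj(m, -n): trim n bytes from the tail of this buffer. */
static void
toy_adj_tail(struct toy_mbuf *m, int n)
{
    assert(n >= 0 && n <= m->len);
    m->len -= n;
    if (m->head)
        m->pkthdr_len -= n;
}

/* Buggy order: m1->len is re-read after toy_adj_tail() changed it. */
static void
trim_buggy(struct toy_mbuf *m, struct toy_mbuf *m1, int roff)
{
    toy_adj_tail(m1, m1->len - roff);
    if (m1 != m)
        m->pkthdr_len -= (m1->len - roff);  /* now always 0 */
}

/* Fixed order: compute the trim amount once, before modifying m1. */
static void
trim_fixed(struct toy_mbuf *m, struct toy_mbuf *m1, int roff)
{
    int trim = m1->len - roff;

    toy_adj_tail(m1, trim);
    if (m1 != m)
        m->pkthdr_len -= trim;
}
```

For a 160-byte packet split into 100 + 60 bytes with `roff = 20` in the second buffer, the fixed version leaves a 120-byte packet, while the buggy one trims the buffer but leaves the header length at 160.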