path: root/sys/ofed
Commit message (Author, Age; Files, Lines)
* kern: net: remove TCP_LINGERTIME (Kyle Evans, 2021-02-19; 1 file, -2/+0)
  TCP_LINGERTIME can be traced back to BSD 4.4 Lite and perhaps beyond, in exactly the same form that it appears here modulo slightly different context. It used to be the case that there was a single pr_usrreq method with requests dispatched to it; these exact two lines appeared in tcp_usrreq's PRU_ATTACH handling.
  The only purpose of this that I can find is to cause surprising behavior on accepted connections. Newly-created sockets will never hit these paths as one cannot set SO_LINGER prior to socket(2). If SO_LINGER is set on a listening socket and inherited, one would expect the timeout to be inherited rather than changed arbitrarily like this -- noting that SO_LINGER is nonsense on a listening socket beyond inheritance, since they cannot be 'connected' by definition.
  Neither Illumos nor Linux reset the timer like this based on testing and inspection of Illumos, and testing of Linux.
  Reviewed by: rscheff, tuexen
  Differential Revision: https://reviews.freebsd.org/D28265
* Fix mismerge in OFED update (Ryan Stone, 2021-02-04; 1 file, -0/+2)
  When OFED was upgraded to Linux v4.9, a bunch of Linux-specific netlink changes were dropped. Unfortunately, there was a mismerge in this process and as a result ib_sa_cancel_query() would fail to cancel an outstanding MAD. This was causing rdma_destroy_id() to hang indefinitely waiting for the MAD to complete and release the final reference.
  Sponsored by: Dell Inc.
  Differential Revision: https://reviews.freebsd.org/D28421
  Reviewed by: hselasky, kib
  MFC after: 2 months
* Update user access region, UAR, APIs in the core in mlx5core. (Hans Petter Selasky, 2021-01-08; 1 file, -8/+28)
  This change includes several changes, listed below, all related to UAR. UAR is a special PCI memory area where the so-called doorbell register and blue flame register live. Blue flame is a feature for sending small packets more efficiently via a PCI memory page, instead of using PCI DMA.
  - All structures and functions named xxx_uuars were renamed into xxx_bfreg.
  - Remove partially implemented Blueflame support from mlx5en(4) and mlx5ib.
  - Implement blue flame register allocator.
  - Use blue flame register allocator in mlx5ib.
  - A common UAR page is now allocated by the core to support doorbell register writes for all of mlx5en and mlx5ib, instead of allocating one UAR per sendqueue.
  - Add support for DEVX query UAR.
  - Add support for 4K UAR for libmlx5.
  Linux commits:
    7c043e908a74ae0a935037cdd984d0cb89b2b970
    2f5ff26478adaff5ed9b7ad4079d6a710b5f27e7
    0b80c14f009758cefeed0edff4f9141957964211
    30aa60b3bd12bd79b5324b7b595bd3446ab24b52
    5fe9dec0d045437e48f112b8fa705197bd7bc3c0
    0118717583cda6f4f36092853ad0345e8150b286
    a6d51b68611e98f05042ada662aed5dbe3279c1e
  MFC after: 1 week
  Sponsored by: Mellanox Technologies // NVIDIA Networking
* Fix for referencing file via its vnode in ibcore. (Hans Petter Selasky, 2020-11-02; 1 file, -43/+39)
  Use the native vnode lookup functions, instead of going via the LinuxKPI, because the file referenced is typically created outside the LinuxKPI, and the LinuxKPI's fdget() can only resolve file descriptor numbers which were created by itself. The vnode pointer is used as an identifier to identify XRCD handles which are sharing resources. This patch fixes the so-called XRCD support in ibcore for FreeBSD. Refer to ibv_open_xrcd(3) for more information on how the file descriptor argument is used.
  Reviewed by: kib@
  MFC after: 1 week
  Sponsored by: Mellanox Technologies // NVIDIA Networking
  Notes: svn path=/head/; revision=367269
* Factor out generic IP over infiniband, IPoIB, definitions and code (Hans Petter Selasky, 2020-10-22; 4 files, -375/+33)
  into net/if_infiniband.c and net/infiniband.h. No functional change intended.
  Differential Revision: https://reviews.freebsd.org/D26254
  Reviewed by: melifaro@
  MFC after: 1 week
  Sponsored by: Mellanox Technologies // NVIDIA Networking
  Notes: svn path=/head/; revision=366930
* Allow IP over IB to work with multiple FIBs. (Ravi Pokala, 2020-10-13; 1 file, -0/+2)
  Call M_SETFIB() to make sure the IPoIB packet is directed to the correct interface-specific FIB. This was sufficient to allow general-purpose routing using the default FIB, and a separate FIB for routing between IPoIB on ib0 and IPoEthernet on mce0.
  Reviewed by: hselasky
  Obtained from: Anmol Kumar <anmolk at panasas dot com>
  MFC after: 1 week
  Sponsored by: Panasas
  Differential Revision: https://reviews.freebsd.org/D25239
  Notes: svn path=/head/; revision=366686
* infiniband: Appease Coverity (Eric van Gyzen, 2020-08-31; 4 files, -17/+13)
  Coverity claims the call to rdma_gid2ip in cma_igmp_send overwrites addr. Use a consistent definition of sockaddr to prevent detections and code changes in the future.
  Submitted by: bret_ketchum@dell.com
  Reported by: Coverity
  Reviewed by: hselasky, kib
  MFC after: 2 weeks
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D26229
  Notes: svn path=/head/; revision=364997
* Infiniband clients must be attached and detached in a specific order in ibcore. (Hans Petter Selasky, 2020-07-06; 10 files, -19/+36)
  Currently the linking order of the infiniband, IB, modules decides in which order the clients are attached and detached. For example one IB client may use resources from another IB client. This can lead to a potential deadlock at shutdown. For example if the ipoib is unregistered after the ib_multicast client is detached, then if ipoib is using multicast addresses a deadlock may happen, because ib_multicast will wait for all its resources to be freed before returning from the remove method.
  Fix this by using module_xxx_order() instead of module_xxx().
  Differential Revision: https://reviews.freebsd.org/D23973
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=362953
* Convert OFED rtable interactions to the new routing KPI. (Alexander V. Chernikov, 2020-04-15; 2 files, -82/+59)
  Reviewed by: hselasky
  Differential Revision: https://reviews.freebsd.org/D24387
  Notes: svn path=/head/; revision=359966
* Fix for double unlock in ipoib. (Hans Petter Selasky, 2020-03-16; 1 file, -1/+0)
  The ipoib_unicast_send() function is not supposed to unlock the priv lock.
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=359014
* Fix some whitespace issues in ipoib. (Hans Petter Selasky, 2020-03-06; 1 file, -3/+3)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=358694
* Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many) (Pawel Biernacki, 2020-02-26; 1 file, -2/+4)
  r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are still not MPSAFE (or already are but aren’t properly marked). Use it in preparation for a general review of all nodes.
  This is a non-functional change that adds annotations to SYSCTL_NODE and SYSCTL_PROC nodes using one of the soon-to-be-required flags.
  Mark all obvious cases as MPSAFE. All entries that haven't been marked as MPSAFE before are by default marked as NEEDGIANT.
  Approved by: kib (mentor, blanket)
  Commented by: kib, gallatin, melifaro
  Differential Revision: https://reviews.freebsd.org/D23718
  Notes: svn path=/head/; revision=358333
* Make sure the VNET is properly set when reaping mbufs in ipoib. (Hans Petter Selasky, 2020-01-11; 1 file, -0/+4)
  Else the following panic may happen:
    panic()
    icmp_error()
    ipoib_cm_mb_reap()
    linux_work_fn()
    taskqueue_run_locked()
    taskqueue_thread_loop()
    fork_exit()
    fork_trampoline()
  Submitted by: Andreas Kempe <kempe@lysator.liu.se>
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=356633
* Prevent potential underflow in ibcore. (Hans Petter Selasky, 2019-11-15; 1 file, -1/+1)
  Linux commit: a9018adfde809d44e71189b984fa61cc89682b5e
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=354728
* Correct MR length field to be 64-bit in ibcore. (Hans Petter Selasky, 2019-11-15; 1 file, -1/+1)
  Linux commit: edd31551148c09608feee6b8756ad148d550ee3b
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=354727
* VLAN_TRUNKDEV() requires epochification in ibcore after r353292. (Hans Petter Selasky, 2019-10-16; 1 file, -3/+7)
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=353633
* Replace rdma_is_upper_dev_rcu() with rdma_vlan_dev_real_dev() in ibcore. (Hans Petter Selasky, 2019-10-16; 2 files, -13/+1)
  This reduces the number of references to VLAN_TRUNKDEV() in ibcore. Currently only VLAN is supported as a child interface in FreeBSD. Remove superfluous RCU locking.
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=353632
* VLAN_DEVAT() requires epochification in ipoib after r353292. (Hans Petter Selasky, 2019-10-16; 1 file, -0/+6)
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=353631
* Fix missing epochification of the ibcore code after r353292. (Hans Petter Selasky, 2019-10-15; 1 file, -1/+4)
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=353547
* Fix missing epochification of the ipoib code after r353292. (Hans Petter Selasky, 2019-10-15; 3 files, -0/+12)
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=353546
* Convert to if_foreach_llmaddr() KPI. (Gleb Smirnoff, 2019-10-14; 1 file, -66/+75)
  Reviewed by: hselasky
  Notes: svn path=/head/; revision=353504
* Widen NET_EPOCH coverage. (Gleb Smirnoff, 2019-10-07; 1 file, -0/+3)
  When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas.
  However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win.
  Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier.
  On output path we already enter epoch quite early - in the ip_output(), in the ip6_output(). This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch.
  Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed.
  This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources.
  Reviewed by: gallatin, hselasky, cy, adrian, kristof
  Differential Revision: https://reviews.freebsd.org/D19111
  Notes: svn path=/head/; revision=353292
* Make sure the transmit loop doesn't get starved in ipoib. (Hans Petter Selasky, 2019-10-02; 4 files, -14/+29)
  When the software send queue gets filled up, callbacks to if_transmit will stop. Make sure the transmit callback routine checks the send queue and outputs any remaining mbufs. Else the remaining mbufs may simply sit in the output queue blocking the transmit path.
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=352955
* OFED: Fix accidental double-copy of rdma_sdp.h in r351176 (Conrad Meyer, 2019-08-18; 1 file, -78/+0)
  The mistake came about like this: the first attempt to commit was blocked by a pre-commit hook due to missing SVN tags. svn revert doesn't delete new files, I guess. While reapplying the fixed diff, the non-empty target file was just concatenated with the new contents? Ugh. :-(
  Notes: svn path=/head/; revision=351180
* OFED: Unbreak SDP support in ibcore (Conrad Meyer, 2019-08-17; 3 files, -62/+373)
  This regression was introduced in the r326169 Linux v4.9 Infiniband upgrade. Restore the functionality.
  Reviewed by: hselasky
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D21298
  Notes: svn path=/head/; revision=351176
* SDP: Fix brain-o from r351162 (Conrad Meyer, 2019-08-17; 1 file, -1/+1)
  Lost in translation between different SDP stacks.
  Reported by: hselasky
  Notes: svn path=/head/; revision=351169
* OFED: Fix ib_mad.h ib_user_mad.h include to match new uapi path (Conrad Meyer, 2019-08-17; 1 file, -1/+1)
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=351163
* SDP: Add a dbg() on QP events (Conrad Meyer, 2019-08-17; 1 file, -0/+5)
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=351162
* SDP: Also log a nice status string in RX WC error dbg() (Conrad Meyer, 2019-08-17; 1 file, -2/+3)
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=351161
* SDP: Include nice string names for raw event numbers in a dbg() (Conrad Meyer, 2019-08-17; 1 file, -1/+2)
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=351160
* SDP: SYSCTL_DECL SDP-wide sysctl node in header (Conrad Meyer, 2019-08-17; 2 files, -1/+2)
  This allows use of the shared _net_inet_sdp in more than one compilation unit. (Nothing in-tree uses this today, but some of Isilon's out-of-tree SDP enhancements add sysctls below the node.)
  Sponsored by: Dell EMC Isilon
  Notes: svn path=/head/; revision=351159
* Fix prio vs. nonprio tagged traffic in RDMACM (Slava Shwartsman, 2019-06-04; 1 file, -3/+17)
  In the current RDMACM implementation, the RDMACM server will not find a GID index when the request was prio-tagged and the server is non prio-tagged, and vice versa. According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be considered as untagged. Treat RDMACM requests the same.
  Reviewed by: hselasky, kib
  MFC after: 3 Days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=348601
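The matching rule in the entry above can be illustrated with a minimal sketch. It is not the code from the commit: the marker `SKETCH_VLAN_NONE` and both helper names are hypothetical, chosen only to show VLAN ID 0 (priority-tagged) being treated the same as untagged when two sides are compared.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker meaning "no VLAN tag"; not taken from the commit. */
#define SKETCH_VLAN_NONE 0xffffu

/* Map a priority-tagged frame (VLAN ID 0) onto the untagged marker. */
static uint16_t
sketch_normalize_vid(uint16_t vid)
{
	return (vid == 0 ? SKETCH_VLAN_NONE : vid);
}

/* Two VLAN IDs match if they are equal after normalization. */
static bool
sketch_vid_match(uint16_t request_vid, uint16_t local_vid)
{
	return (sketch_normalize_vid(request_vid) ==
	    sketch_normalize_vid(local_vid));
}
```

With this rule a priority-tagged request matches an untagged listener and the reverse, while a request on a real VLAN still only matches the same VLAN.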
* Include eventhandler.h in more compilation units (Conrad Meyer, 2019-05-21; 3 files, -0/+3)
  This was enumerated with exhaustive search for sys/eventhandler.h includes, cross-referenced against EVENTHANDLER_* usage with the comm(1) utility. Manual checking was performed to avoid redundant includes in some drivers where a common os_bsd.h (for example) included sys/eventhandler.h indirectly, but it is possible some of these are redundant with driver-specific headers in ways I didn't notice.
  (These CUs did not show up as missing eventhandler.h in tinderbox.)
  X-MFC-With: r347984
  Notes: svn path=/head/; revision=348026
* Add new rates to ibcore. (Hans Petter Selasky, 2019-05-08; 2 files, -1/+27)
  Add the new rates that were added to the Infiniband specification as part of HDR and 2x support.
  Submitted by: slavash@
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=347301
* Handle IB_EVENT_DEVICE_FATAL event in ipoib. (Hans Petter Selasky, 2019-05-08; 1 file, -1/+2)
  Perform flush if IB_EVENT_DEVICE_FATAL was received.
  Submitted by: slavash@
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=347291
* Fix endless loop in ipoib_poll(). (Hans Petter Selasky, 2019-05-08; 1 file, -1/+1)
  ib_req_notify_cq may return a negative value, which indicates a failure. In the case of an uncorrectable error, we will end up in an endless loop. Fix that by going to another loop with poll_more only if there is anything left to poll.
  Submitted by: slavash@
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=347278
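The loop condition described in the entry above can be modelled outside the kernel. The sketch below uses hypothetical stand-ins (`struct sketch_cq`, `sketch_poll()`, `sketch_rearm()`) rather than the real ipoib code; it assumes the rearm step reports a negative value on error, zero when armed, and a positive value when more completions may be waiting.

```c
/* Hypothetical stand-in for a completion queue. */
struct sketch_cq {
	int pending;	/* completions waiting to be polled */
	int error;	/* negative when rearming fails persistently */
	int polls;	/* number of poll passes performed */
};

/* Take everything that is currently pending; returns how many were taken. */
static int
sketch_poll(struct sketch_cq *cq)
{
	int n = cq->pending;

	cq->pending = 0;
	cq->polls++;
	return (n);
}

/* Rearm: <0 on error, 0 when armed, >0 when more work may be waiting. */
static int
sketch_rearm(struct sketch_cq *cq)
{
	if (cq->error != 0)
		return (cq->error);
	return (cq->pending > 0 ? 1 : 0);
}

static int
sketch_poll_loop(struct sketch_cq *cq)
{
	int total = 0;

poll_more:
	total += sketch_poll(cq);
	/* Testing "!= 0" here would spin forever on a persistent error. */
	if (sketch_rearm(cq) > 0)
		goto poll_more;
	return (total);
}
```

The point of the sketch is the `> 0` comparison: a negative return ends the loop instead of retrying.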
* Make sure to error out when arming the CQ fails in ibcore. (Hans Petter Selasky, 2019-05-08; 1 file, -3/+7)
  MFC after: 3 days
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=347257
* Mechanical cleanup of epoch(9) usage in network stack. (Gleb Smirnoff, 2019-01-09; 2 files, -4/+7)
  - Remove macros that covertly create epoch_tracker on thread stack. Such macros are quite unsafe, e.g. will produce buggy code if the same macro is used in embedded scopes. Explicitly declare epoch_tracker always.
  - Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
  This is a non-functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively.
  Discussed with: jtl, gallatin
  Notes: svn path=/head/; revision=342872
* Support MSG_DONTWAIT in send*(2). (Mark Johnston, 2019-01-04; 1 file, -1/+2)
  As it does for recv*(2), MSG_DONTWAIT indicates that the call should not block, returning EAGAIN instead. Linux and OpenBSD both implement this, so the change makes porting easier, especially since we do not return EINVAL or so when unrecognized flags are specified.
  Submitted by: Greg V <greg@unrelenting.technology>
  Reviewed by: tuexen
  MFC after: 1 week
  Differential Revision: https://reviews.freebsd.org/D18728
  Notes: svn path=/head/; revision=342768
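From userland, the behavior described in the entry above looks roughly like the sketch below. It is an illustration on an AF_UNIX socket pair, not on an SDP socket, and the helper name `sketch_send_until_full()` is made up for this example. It keeps sending with MSG_DONTWAIT until the kernel refuses instead of blocking, then reports the errno it saw.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Fill the send buffer of a connected socket pair without ever blocking.
 * Returns the errno reported by the first send() that fails, or 0 if the
 * bounded loop ends without a failure.
 */
static int
sketch_send_until_full(void)
{
	int sv[2];
	char buf[4096];
	int err = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return (errno);
	memset(buf, 0, sizeof(buf));

	/* Nobody reads from sv[1], so the buffer eventually fills up. */
	for (int i = 0; i < 100000; i++) {
		if (send(sv[0], buf, sizeof(buf), MSG_DONTWAIT) == -1) {
			err = errno;
			break;
		}
	}
	close(sv[0]);
	close(sv[1]);
	return (err);
}
```

A caller would normally treat EAGAIN (or EWOULDBLOCK, which may be the same value) as "try again later", for example after poll(2) reports the socket writable.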
* ipoib: Notify on modify QP failure only when relevant (Slava Shwartsman, 2018-12-05; 1 file, -1/+25)
  Modify QP can fail and it can be acceptable, like when moving from RST to ERR state; all the rest are not acceptable and a message to the log should be printed.
  The current code prints on all failures and many messages like: "Failed to modify QP to ERROR state" appear, even when supported by the state machine of the QP object.
  Linux commit: 5dc78ad1904db597bdb4427f3ead437aae86f54c
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341538
* ipoib: increase the non-cm queue length (Slava Shwartsman, 2018-12-05; 1 file, -2/+2)
  When a packet needs fragmentation, it might generate more than 3 fragments. With the queue length 3, all fragments are generated faster than the queue is drained, which effectively drops the fourth and later fragments on the floor.
  Submitted by: kib@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341537
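The drop behavior described in the entry above can be reproduced with a toy bounded queue. Everything in this sketch is hypothetical (the struct, the helper names, and the larger limit used for comparison); it only models a burst of fragments arriving before anything is drained.

```c
#include <stddef.h>

/* Toy bounded queue: counts entries and drops, stores no data. */
struct sketch_queue {
	size_t len;	/* entries currently queued */
	size_t max;	/* queue length limit */
	size_t drops;	/* entries rejected because the queue was full */
};

static int
sketch_enqueue(struct sketch_queue *q)
{
	if (q->len >= q->max) {
		q->drops++;
		return (-1);
	}
	q->len++;
	return (0);
}

/* Enqueue a burst of fragments with no draining in between. */
static size_t
sketch_send_fragments(struct sketch_queue *q, size_t nfrags)
{
	for (size_t i = 0; i < nfrags; i++)
		(void)sketch_enqueue(q);
	return (q->drops);
}
```

With a limit of 3, a packet split into 5 fragments loses 2 of them, so it can never be reassembled; a limit at least as large as the burst loses none.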
* ipoib: Don't do a light flush when MTU is unchanged. (Slava Shwartsman, 2018-12-05; 1 file, -3/+5)
  When changing the MTU of ibX network interfaces, check that the MTU was really changed before requesting an update of the multicast rules. Else we might go into an infinite loop joining and leaving ibX multicast groups towards the opensm master interface.
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341536
* ipoib: correct setting MTU from inside ipoib(4). (Slava Shwartsman, 2018-12-05; 3 files, -13/+39)
  It is not enough to set ifnet->if_mtu to change the interface MTU. System saves the MTU for route in the radix tree, and route cache keeps the interface MTU as well. Since addition of the multicast group causes recalculation of MTU, even bringing the interface up changes MTU from 4042 to 1500, which makes the system configuration inconsistent. Worse, ip_output() prefers route MTU over interface MTU, so large packets are not fragmented and dropped on the floor.
  Fix it for ipoib(4) using the same approach (or hack) as was applied for if_tun/if_tap in r339012. Thanks to bz@ for giving the hint.
  Submitted by: kib@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341535
* ibcore: Fix clearing of bound device interface. (Slava Shwartsman, 2018-12-05; 1 file, -2/+7)
  Binding to a loopback device is not allowed. Make sure the destination device address is global by clearing the bound device interface. Only do this conditionally, else link local addresses won't work.
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341534
* ibcore: ip6_dev_find() needs to know the scope ID. (Slava Shwartsman, 2018-12-05; 2 files, -3/+4)
  Else the wrong network device can be returned for link-local addresses.
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341533
* ibcore: Fix sleeping in atomic when RoCE is used (Slava Shwartsman, 2018-12-05; 1 file, -19/+44)
  A couple of places in the CM do
      spin_lock_irq(&cm_id_priv->lock);
      ...
      if (cm_alloc_response_msg(work->port, work->mad_recv_wc, &msg))
  However when the underlying transport is RoCE, this leads to a sleeping function being called with the lock held - the callchain is
      cm_alloc_response_msg() -> ib_create_ah_from_wc() -> ib_init_ah_from_wc() -> rdma_addr_find_l2_eth_by_grh() -> rdma_resolve_ip()
  and rdma_resolve_ip() starts out by doing
      req = kzalloc(sizeof *req, GFP_KERNEL);
  not to mention rdma_addr_find_l2_eth_by_grh() doing
      wait_for_completion(&ctx.comp);
  to wait for the task that rdma_resolve_ip() queues up.
  Fix this by moving the AH creation out of the lock.
  Linux commit: c76161181193985087cd716fdf69b5cb6cf9ee85
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341532
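The shape of the fix in the entry above, doing the potentially sleeping step before taking the lock, can be shown with a small userland model. This is only a sketch: `struct sketch_cm`, the flag standing in for the spin lock, and all helper names are hypothetical, and `malloc()` stands in for the sleeping allocation and address resolution.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy connection state; lock_held models "a spin lock is held". */
struct sketch_cm {
	int lock_held;
	void *msg;	/* stands in for the response message / AH */
};

static void
sketch_lock(struct sketch_cm *cm)
{
	cm->lock_held = 1;
}

static void
sketch_unlock(struct sketch_cm *cm)
{
	cm->lock_held = 0;
}

/* May sleep in the real code, so it must never run with the lock held. */
static void *
sketch_alloc_response(struct sketch_cm *cm)
{
	assert(!cm->lock_held);
	return (malloc(64));
}

static int
sketch_handle_work(struct sketch_cm *cm)
{
	/* Do the step that may sleep first, outside the lock. */
	void *msg = sketch_alloc_response(cm);

	if (msg == NULL)
		return (-1);

	/* Only the short state update happens under the lock. */
	sketch_lock(cm);
	cm->msg = msg;
	sketch_unlock(cm);
	return (0);
}
```

Calling `sketch_alloc_response()` between `sketch_lock()` and `sketch_unlock()` would trip the assertion, which is the userland analogue of the sleeping-in-atomic report.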
* ibcore: Add missing unref of netdevice. (Slava Shwartsman, 2018-12-05; 1 file, -0/+1)
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341531
* ibcore: Fix loopback with rdma-cm. (Slava Shwartsman, 2018-12-05; 1 file, -7/+26)
  Trying to validate loopback fails because rtalloc1() resolves system local addresses to the loopback network interface, lo0. Fix this by explicitly checking for loopback during validation of the source and destination network address. If the source address belongs to a local network interface and is equal to the destination address, there is no need to run the destination address through rtalloc1().
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341530
* ibcore: Make sure all VNETs are scanned for VLAN interfaces. (Slava Shwartsman, 2018-12-05; 1 file, -5/+10)
  The master network interface and the VLANs may reside in different VNETs. Make sure that all VNETs are searched when scanning for GID entries.
  Submitted by: netapp
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341529
* ibcore: Always check return value from ib_init_ah_from_wc(). (Slava Shwartsman, 2018-12-05; 2 files, -16/+26)
  This prevents code from accepting RoCEv1 connections when only RoCEv2 is enabled and vice versa.
  Linux commit: 0c4386ec77cfcd0ccbdbe8c2e67dd3a49b2a4c7f
  Submitted by: hselasky@
  Approved by: hselasky (mentor)
  MFC after: 1 week
  Sponsored by: Mellanox Technologies
  Notes: svn path=/head/; revision=341528