path: root/sys
Commit message | Author | Age | Files | Lines
* ixl(4): Fix VLAN HW filtering | Krzysztof Galazka | 2021-02-04 | 11 | -313/+487
    The X700 family of controllers has a limited number of available VLAN
    HW filters. The driver did not properly handle the case where a user
    assigned more VLANs to an interface that already had all filters in
    use. Fix that by disabling HW filtering when it is impossible to
    create filters for all requested VLANs. Keep track of registered
    VLANs using a bitstring to be able to re-enable HW filtering when the
    number of requested VLANs drops below the limit.

    Also switch all allocations to use the M_IXL malloc type to ease
    detecting memory leaks in the driver.

    Reviewed by: erj
    Tested by: gowtham.kumar.ks@intel.com
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D28137
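The fallback described in this commit can be modelled in user space roughly as follows. This is only a sketch and not the ixl(4) code: `ex_vlan_state`, the `ex_vlan_*` helpers, and `EX_HW_FILTER_LIMIT` are hypothetical names, and the limit of 4 is an arbitrary value chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EX_VLAN_COUNT      4096 /* VLAN IDs are 12 bits wide */
#define EX_HW_FILTER_LIMIT 4    /* hypothetical limit, for illustration only */

struct ex_vlan_state {
	uint8_t  bits[EX_VLAN_COUNT / 8]; /* one bit per registered VLAN ID */
	unsigned count;                   /* number of bits currently set */
	bool     hw_filtering;            /* true while all VLANs fit in HW filters */
};

static void
ex_vlan_init(struct ex_vlan_state *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof(*s));
	s->hw_filtering = true;
}

/* Re-evaluate the mode after every change to the set of VLANs. */
static void
ex_vlan_update_mode(struct ex_vlan_state *s)
{
	s->hw_filtering = (s->count <= EX_HW_FILTER_LIMIT);
}

static void
ex_vlan_register(struct ex_vlan_state *s, uint16_t vid)
{
	if (vid >= EX_VLAN_COUNT)
		return;
	uint8_t mask = (uint8_t)(1u << (vid % 8));
	if ((s->bits[vid / 8] & mask) != 0)
		return;                   /* already registered */
	s->bits[vid / 8] |= mask;
	s->count++;
	ex_vlan_update_mode(s);
}

static void
ex_vlan_unregister(struct ex_vlan_state *s, uint16_t vid)
{
	if (vid >= EX_VLAN_COUNT)
		return;
	uint8_t mask = (uint8_t)(1u << (vid % 8));
	if ((s->bits[vid / 8] & mask) == 0)
		return;                   /* was not registered */
	s->bits[vid / 8] &= (uint8_t)~mask;
	s->count--;
	ex_vlan_update_mode(s);
}
```

Because the bitmap remembers every requested VLAN even while hardware filtering is off, the model can switch back to hardware filtering as soon as the count fits again.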
* Add a comment notifying that "device axp" requires miibus for build. | Muhammad Moinur Rahman | 2021-02-04 | 2 | -1/+3
    Although miibus is not required when the RJ-45 interface is not in
    use, it is still a build-time dependency.

    Reviewed by: imp, manu, rajesh1.kumar@amd.com
    Approved by: imp, manu, rajesh1.kumar@amd.com
    Differential Revision: https://reviews.freebsd.org/D28465
* Fix mismerge in OFED update | Ryan Stone | 2021-02-04 | 1 | -0/+2
    When OFED was upgraded to Linux v4.9, a bunch of Linux-specific
    netlink changes were dropped. Unfortunately, there was a mismerge in
    this process and as a result ib_sa_cancel_query() would fail to
    cancel an outstanding MAD. This was causing rdma_destroy_id() to hang
    indefinitely waiting for the MAD to complete and release the final
    reference.

    Sponsored by: Dell Inc.
    Differential Revision: https://reviews.freebsd.org/D28421
    Reviewed by: hselasky, kib
    MFC after: 2 months
* Fix race condition in linuxkpi workqueue | Ryan Stone | 2021-02-04 | 1 | -22/+27
    Consider the following scenario:

    1. A delayed_work struct in the WORK_ST_TIMER state.
    2. Thread A calls mod_delayed_work()
    3. Thread B (a callout thread) simultaneously calls
       linux_delayed_work_timer_fn()

    The following sequence of events is possible:

    A: Call linux_cancel_delayed_work()
    A: Change state from TIMER to CANCEL
    B: Change state from CANCEL to TASK
    B: taskqueue_enqueue() the task
    A: taskqueue_cancel() the task
    A: Call linux_queue_delayed_work_on(). This is a no-op because the
       state is WORK_ST_TASK.

    As a result, the delayed_work struct will never be invoked. This is
    causing address resolution in ib_addr.c to stop permanently, as it
    never tries to reschedule a task that it thinks is already scheduled.

    Fix this by introducing locking into the cancel path (which
    corresponds with the lock held while the callout runs). This will
    prevent the callout from changing the state of the task until the
    cancel is complete, preventing the race.

    Differential Revision: https://reviews.freebsd.org/D28420
    Reviewed by: hselasky
    MFC after: 2 months
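The idea behind the fix, serializing the cancel path and the timer callback with one lock, can be illustrated with a simplified user-space model. This is a sketch under stated assumptions rather than the linuxkpi implementation: the three-state machine, the `ex_*` names, and the use of a pthread mutex in place of the callout lock are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, simplified states; the real state machine has more. */
enum ex_work_state { EX_ST_IDLE, EX_ST_TIMER, EX_ST_TASK };

struct ex_delayed_work {
	pthread_mutex_t    lock;     /* stands in for the lock held by the callout */
	enum ex_work_state state;
	int                enqueued; /* 1 while on the (imaginary) taskqueue */
};

static void
ex_work_init(struct ex_delayed_work *w)
{
	int error = pthread_mutex_init(&w->lock, NULL);

	assert(error == 0);
	(void)error;
	w->state = EX_ST_IDLE;
	w->enqueued = 0;
}

/* Arm the timer; a no-op if the work is already pending. */
static bool
ex_queue(struct ex_delayed_work *w)
{
	bool armed = false;

	pthread_mutex_lock(&w->lock);
	if (w->state == EX_ST_IDLE) {
		w->state = EX_ST_TIMER;
		armed = true;
	}
	pthread_mutex_unlock(&w->lock);
	return (armed);
}

/* Timer callback: enqueue the task only if the timer is still armed. */
static void
ex_timer_fn(struct ex_delayed_work *w)
{
	pthread_mutex_lock(&w->lock);
	if (w->state == EX_ST_TIMER) {
		w->state = EX_ST_TASK;
		w->enqueued = 1;
	}
	pthread_mutex_unlock(&w->lock);
}

/*
 * Cancel: the whole transition happens under the lock, so the timer
 * callback cannot change the state halfway through.
 */
static bool
ex_cancel(struct ex_delayed_work *w)
{
	bool cancelled = false;

	pthread_mutex_lock(&w->lock);
	if (w->state != EX_ST_IDLE) {
		w->state = EX_ST_IDLE;
		w->enqueued = 0;
		cancelled = true;
	}
	pthread_mutex_unlock(&w->lock);
	return (cancelled);
}
```

In this model a callback that fires after the cancel finds the work idle and does nothing, so a subsequent requeue always arms the timer again.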
* [POWERPC64BE] add mrsas driver to GENERIC64 | Alfredo Dal'Ava Junior | 2021-02-04 | 1 | -0/+1
    Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
    Reviewed by: luporl, alfredo, kadesai (on email)
    Sponsored by: Eldorado Research Institute (eldorado.org.br)
    Differential Revision: https://reviews.freebsd.org/D26531
* [POWERPC64BE] mrsas: add big-endian support | Alfredo Dal'Ava Junior | 2021-02-04 | 4 | -201/+510
    Add endianness conversions in order to support big-endian platforms.

    Submitted by: Andre Fernando da Silva <andre.silva@eldorado.org.br>
    Reviewed by: luporl, alfredo, kadesai (on email)
    Sponsored by: Eldorado Research Institute (eldorado.org.br)
    Differential Revision: https://reviews.freebsd.org/D26531
* Add a VM flag to prevent reclaim on a failed contig allocation | Ryan Stone | 2021-02-03 | 3 | -2/+11
    If an M_WAITOK contig alloc fails, the VM subsystem will try to
    reclaim contiguous memory twice before actually failing the request.
    On a system with 64GB of RAM I've observed this take 400-500ms before
    it finally gives up, and I believe that this will only be worse on
    systems with even more memory.

    In certain contexts this delay is extremely harmful, so add a flag
    that will skip reclaim for allocation requests to allow those paths
    to opt out of doing an expensive reclaim.

    Sponsored by: Dell Inc
    Differential Revision: https://reviews.freebsd.org/D28422
    Reviewed by: markj, kib
* dwmmc: Multiple busdma fixes. | Michal Meloun | 2021-02-03 | 1 | -15/+32
    - Limit the maximum segment size to 2048 bytes. Although dwmmc
      supports a buffer fragment with a maximum length of 4095 bytes, use
      the nearest lower power of two as the maximum fragment size.
      Otherwise, busdma creates excessive buffer fragments.
    - Fix an off-by-one error in the computation of the maximum data
      transfer length.
    - In addition, reserve two DMA descriptors that can be used by busdma
      bouncing. The beginning or end of the buffer can be misaligned.
    - Don’t ignore errors passed to the bus_dmamap_load() callback
      function.
    - In theory, a DMA engine may be running at the time when the next
      DMA descriptor is constructed. Create a full DMA descriptor before
      the OWN bit is set.

    MFC after: 2 weeks
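The arithmetic behind the first item (4095 bytes supported, 2048 bytes used) is simply rounding down to a power of two. A minimal sketch, using a hypothetical helper `ex_pow2_floor()` that is not part of the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Return the largest power of two that is less than or equal to n. */
static uint32_t
ex_pow2_floor(uint32_t n)
{
	uint32_t p = 1;

	assert(n > 0);
	/* Comparing against n / 2 keeps p from overflowing on large n. */
	while (p <= n / 2)
		p *= 2;
	return (p);
}
```

For the 4095-byte hardware limit quoted above, this yields the 2048-byte maximum segment size.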
* linux: remove locks around callout_drain in timerfd_close() | shu | 2021-02-03 | 1 | -2/+0
    The lock around callout_drain() is unnecessary and may cause a
    deadlock when one closes a timer descriptor during timer execution.

    Reviewed By: delphij
    Submitted By: ankohuu_outlook.com (Shunchao Hu)
    Differential Revision: https://reviews.freebsd.org/D28148
* Revert "Reimplement strlen" | Mateusz Guzik | 2021-02-03 | 1 | -25/+54
    This reverts commit 710e45c4b8539d028877769f1a4ec088c48fb5f1.

    It breaks for some corner cases on big endian ppc64. Given the stage
    of the release process it is best to revert for now.

    Reported by: jhibbits
* linux: make timerfd_settime(2) set expirations count to zero | shu | 2021-02-03 | 1 | -0/+1
    On Linux, read(2) from a timerfd file descriptor returns an unsigned
    8-byte integer (uint64_t) containing the number of expirations that
    have occurred, if the timer has already expired one or more times
    since its settings were last modified using timerfd_settime(), or
    since the last successful read(2). That is to say, once we do a read
    or call timerfd_settime(), the timer fd's expiration count should be
    zero.

    Some Linux applications create a timerfd and add it to epoll in LT
    (level-triggered) mode; when an event arrives, they call
    timerfd_settime() instead of read() to stop the event source from
    triggering. On FreeBSD, timerfd_settime(2) didn't set the count to
    zero, which caused high CPU utilization.

    Submitted by: ankohuu_outlook.com (Shunchao Hu)
    Differential Revision: https://reviews.freebsd.org/D28231
* Expose clang's alignment builtins and use them for roundup2/rounddown2 | Alex Richardson | 2021-02-03 | 2 | -2/+21
    This makes roundup2/rounddown2 type- and const-preserving and allows
    using it on pointer types without casting to uintptr_t first. Not
    performing pointer-to-integer conversions also helps the compiler's
    optimization passes and can therefore result in better code
    generation. When using it with integer values there should be no
    change other than the compiler checking that the alignment value is a
    valid power-of-two.

    I originally implemented these builtins for CHERI a few years ago and
    they have been very useful for CheriBSD. However, they are also
    useful for non-CHERI code so I was able to upstream them for Clang
    10.0.

    Rationale from the clang documentation:

      Clang provides builtins to support checking and adjusting alignment
      of pointers and integers. These builtins can be used to avoid
      relying on implementation-defined behavior of arithmetic on
      integers derived from pointers. Additionally, these builtins retain
      type information and, unlike bitwise arithmetic, they can perform
      semantic checking on the alignment value.

    There is also a feature request for GCC, so GCC may also support it
    in the future: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98641

    Reviewed By: brooks, jhb, imp
    Differential Revision: https://reviews.freebsd.org/D28332
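A rough sketch of how such macros can be layered on the builtins follows. It is not the code from this commit: `EX_ROUNDUP2`, `EX_ROUNDDOWN2`, and `ex_align_offset()` are hypothetical names, and the fallback branch is the conventional mask arithmetic for compilers that lack `__builtin_align_up` and `__builtin_align_down`.

```c
#include <assert.h>
#include <stddef.h>

/* Compilers without __has_builtin are treated as lacking the builtins. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_align_up) && __has_builtin(__builtin_align_down)
/* Clang 10 and later: type-preserving, also accepts pointer operands. */
#define EX_ROUNDUP2(x, y)   __builtin_align_up((x), (y))
#define EX_ROUNDDOWN2(x, y) __builtin_align_down((x), (y))
#else
/* Fallback: integer-only mask arithmetic; y must be a power of two. */
#define EX_ROUNDUP2(x, y)   (((x) + ((y) - 1)) & ~((y) - 1))
#define EX_ROUNDDOWN2(x, y) ((x) & ~((y) - 1))
#endif

/* Example use: round a byte offset up to the next aligned boundary. */
static size_t
ex_align_offset(size_t off, size_t align)
{
	/* A power of two has exactly one bit set. */
	assert(align != 0 && (align & (align - 1)) == 0);
	return (EX_ROUNDUP2(off, align));
}
```

With the builtin branch, a pointer argument keeps its pointer type; with the fallback branch, callers would still need the cast to `uintptr_t` that the commit message mentions.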
* arm64: Initialize VFP control register. | Michal Meloun | 2021-02-03 | 4 | -0/+32
    The RW fields in this register reset to architecturally unknown
    values, so initialize these to the proper rounding and denormal mode.

    MFC after: 1 week
* Always clamp curve25519 keys prior to use. | Peter Grehan | 2021-02-03 | 1 | -0/+1
    This fixes an issue where a private key contained bits that should
    have been cleared by the clamping process, but were passed through to
    the scalar multiplication routine and resulted in an invalid public
    key.

    Issue diagnosed (and an initial fix proposed) by shamaz.mazum in PR
    252894.

    This fix suggested by Jason Donenfeld.

    PR: 252894
    Reported by: shamaz.mazum
    Reviewed by: dch
    MFC after: 3 days
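For reference, the standard X25519 clamping step touches three bit fields of the 32-byte little-endian scalar. The sketch below shows that operation only; `ex_curve25519_clamp()` is a hypothetical name and is not the function changed by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_CURVE25519_KEY_SIZE 32

/* Clamp a little-endian Curve25519 private scalar in place. */
static void
ex_curve25519_clamp(uint8_t key[EX_CURVE25519_KEY_SIZE])
{
	assert(key != NULL);
	key[0]  &= 248; /* clear the three lowest bits */
	key[31] &= 127; /* clear the highest bit */
	key[31] |= 64;  /* set the second-highest bit */
}
```

Clamping is idempotent, so applying it unconditionally before every use is harmless for keys that were already clamped.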
* Enable multipath routing by default. | Alexander V. Chernikov | 2021-02-03 | 1 | -1/+1
    ROUTE_MPATH was added to the GENERIC kernel in r368648. According to
    the plan in D27428, it was enabled with `net.route.multipath` sysctl
    set to 0. Given enough time has passed, this change enables route
    multipath by default. The goal is to ship FreeBSD 13 with multipath
    turned on.

    Reviewed By: donner, olivier
    MFC after: 1 week
    Differential Revision: https://reviews.freebsd.org/D28423
* Correct description for kern.proc.proc_td | Ed Maste | 2021-02-02 | 1 | -1/+1
    kern.proc.proc_td returns the process table with an entry for each
    thread. Previously the description included "no threads", presumably
    a cut-and-pasteo in 2648efa621748.

    Description suggested by PauAmma.

    PR: 253146
    MFC after: 3 days
    Sponsored by: The FreeBSD Foundation
* Allow setting alias port ranges in libalias and ipfw. This will allow a system | Neel Chauhan | 2021-02-02 | 5 | -3/+41
    to be a true RFC 6598 NAT444 setup, where each network segment (e.g.
    user, subnet) can have their own dedicated port aliasing ranges.

    Reviewed by: donner, kp
    Approved by: 0mp (mentor), donner, kp
    Differential Revision: https://reviews.freebsd.org/D23450
* Make DataSN counter of solicited Data-Out local. | Alexander Motin | 2021-02-02 | 2 | -6/+5
    DataSN for solicited Data-Out is per-R2T. Since we handle a whole R2T
    in one go, we don't need to store it anywhere, especially in a global
    per-command structure. This may allow us to handle multiple R2Ts per
    command at once, if we decide to, or maybe relax locking.

    Rename the second use of that field to io_referenced_task_tag.

    MFC after: 1 month
* cache: fix trailing slash support in face of permission problems | Mateusz Guzik | 2021-02-02 | 1 | -0/+10
    Reported by: Johan Hendriks <joh.hendriks gmail.com>
    Tested by: kevans
* WITH_OFED build option: fix | Konstantin Belousov | 2021-02-02 | 1 | -1/+1
    Userspace has had the OFED build enabled for quite some time, but the
    kernel modules have not. This is a useless configuration because any
    userspace IB code requires kernel support. So enable the modules
    build by default.

    Move WITH_OFED to WITHOUT_OFED since defaults are now enabled.

    Reviewed by: emaste, hselasky, kevans
    MFC after: 3 days
    Sponsored by: NVidia Networking / Mellanox Technologies
    Differential Revision: https://reviews.freebsd.org/D28460
* tests/sys/kern/crc32: Check for SSE4.2 before using it | Alex Richardson | 2021-02-02 | 2 | -6/+17
    This avoids a SIGILL when running these tests on QEMU (which defaults
    to a basic amd64 CPU without SSE4.2).

    This commit also tests the table-based implementations in addition to
    testing the hw-accelerated crc32 versions.

    Reviewed By: cem, kib, markj
    Differential Revision: https://reviews.freebsd.org/D28395
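The general pattern, probing the CPU before touching an SSE4.2 code path and keeping a software implementation that can always be tested, might look like the sketch below. It does not reproduce the FreeBSD test: `ex_have_sse42()` and `ex_crc32c_sw()` are hypothetical names, the detection relies on the GCC/Clang `__builtin_cpu_supports` builtin, and a simple bit-at-a-time CRC-32C (the Castagnoli polynomial computed by the SSE4.2 `crc32` instruction) stands in for the table-based versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Report whether the running CPU advertises SSE4.2. */
static bool
ex_have_sse42(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
	return (__builtin_cpu_supports("sse4.2") != 0);
#else
	return (false); /* not x86, or no builtin available: assume absent */
#endif
}

/* Bit-at-a-time CRC-32C using the reflected polynomial 0x82f63b78. */
static uint32_t
ex_crc32c_sw(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xffffffffu;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xffffffffu);
}
```

A test harness built this way would run the software checks everywhere and only compare against the hardware version when `ex_have_sse42()` returns true.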
* bhyve/ioapic: improve the tracking of IRR bit | Roger Pau Monné | 2021-02-02 | 1 | -4/+18
    One common method of EOI'ing an interrupt at the IO-APIC level is to
    switch the pin to edge triggering mode and then back into level mode.
    That would cause the IRR bit to be cleared and thus further
    interrupts to be injected. FreeBSD does indeed use that method if the
    IO-APIC EOI register is not supported.

    The bhyve IO-APIC emulation code didn't clear the IRR bit when doing
    that switch, and was also missing acknowledging the IRR state when
    trying to inject an interrupt in vioapic_send_intr.

    Reviewed by: grehan
    Differential revision: https://reviews.freebsd.org/D28238
* bhyve/ioapic: only account for asserted line in level mode | Roger Pau Monné | 2021-02-02 | 1 | -0/+2
    After modifying a redirection entry, only try to inject an interrupt
    if the pin is in level mode. Pins in edge mode shouldn't take into
    account the line assert status, as they are triggered by edge
    changes, not the line status itself.

    Reviewed by: grehan
    Differential revision: https://reviews.freebsd.org/D28237
* bhyve/vioapic: remove an extra pin masked check | Roger Pau Monné | 2021-02-02 | 1 | -3/+1
    vioapic_send_intr already checks whether the pin is masked before
    injecting the interrupt, so there's no need to do it in vioapic_write
    as well.

    No functional change intended.

    Reviewed by: grehan
    Differential revision: https://reviews.freebsd.org/D28236
* Replace the redundant MENTAT macro with SOLARIS. | Cy Schubert | 2021-02-02 | 11 | -44/+42
    MENTAT and SOLARIS are synonymous. Remove the extraneous duplicate
    macro.

    MFC after: 1 week
* Indentation cleanup resulting from the cleanup of #ifdefs. | Cy Schubert | 2021-02-02 | 7 | -216/+215
    The conscious decision was made not to perform any indentation or
    whitespace cleanup while cleaning out old redundant #ifdefs. The
    reason for this was to avoid confusing future readers of history and
    diffs with cosmetic changes, making bisection of any possible bugs
    introduced more difficult. This commit cleans up the whitespace
    detritus left behind from the previous #ifdef cleanup commits.

    MFC after: 1 week
* Retire the K&R/STD C __P prototype declarations. | Cy Schubert | 2021-02-02 | 41 | -1016/+1010
    In the old days, when K&R C and STD C were each in use, a workaround
    (read hack) was required to allow the same code to work on each
    without modification. All C compilers now support STD C. We can
    finally put the __P prototype to rest.

    MFC after: 1 week
* vt: parse_font_info_static should set refcount, not parse_font_info | Toomas Soome | 2021-02-01 | 1 | -2/+8
    As we get started with no memory allocator, we set up static font
    data for font passed by loader (if there is any). At this time, we
    also must set refcount 1, and refcount will get incremented in
    cnprobe() callback.

    At some point the memory allocator will be available, and we will set
    up properly allocated font data, but we should not disturb the
    refcount.

    PR: 253147
* zfs: update zfs_config.h to match OpenZFS gf11b09dec | Martin Matuska | 2021-02-01 | 1 | -34/+97
    Update zfs_config.h to match the latest merge in FreeBSD.

    The version string is declared as 2.0.0-FreeBSD_gf11b09dec to provide
    more information about the loaded module:

    - the OpenZFS version in base is 2.0
    - we are using the in-tree module ("FreeBSD")
    - the last merged OpenZFS git revision ("gf11b09dec")

    With future merges the git revision tag should be updated.

    As we are merging from the OpenZFS master branch and already include
    features like dRAID, referencing patchlevel releases (2.0.1, 2.0.2)
    is pointless.

    Reviewed by: freqlabs
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D28447
* iflib: Free resources in a consistent order during detach | Sai Rajesh Tallamraju | 2021-02-01 | 3 | -23/+20
    Memory and PCI resources are freed in no particular order. This could
    cause use-after-frees when detaching following a failed attach. For
    instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but
    iflib_tqg_detach() attempts to access this array. Similarly, adapter
    queues get freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to
    access adapter queues to free PCI resources.

    MFC after: 2 weeks
    Sponsored by: NetApp, Inc.
    Differential Revision: https://reviews.freebsd.org/D27634
* bridge: fix STP roles and protos strings | Jonah Caplan | 2021-02-01 | 1 | -6/+6
    Add the missing commas that got lost in e5539fb618cc7.

    PR: 252532
    Reviewed by: kp@, donner@, freqlabs@
    MFC after: 3 days
    Differential Revision: https://reviews.freebsd.org/D28425
* arm64: Improve DDB backtrace support | Jessica Clarke | 2021-02-01 | 7 | -41/+67
    The existing implementation relies on each trap handler saving a
    normal stack frame record, which is a waste of time and space when
    we're already saving a trapframe to the stack. It's also wrong as it
    currently saves LR not ELR.

    Instead of patching it up, rewrite it based on the RISC-V
    implementation with inspiration from the amd64 implementation for how
    to handle vectored traps to provide an improved implementation. This
    includes compressing the information down to one line like other
    architectures rather than the highly-verbose old form that repeats
    itself by printing LR and FP in one frame only to print them as PC
    and SP in the next. It also includes printing out actually useful
    information about the traps that occurred, though FAR is not saved in
    the trapframe so we cannot print it (in general it can be clobbered
    between when the trap happened and now), only ESR.

    The AAPCS also allows the stack frame record to be located anywhere
    in the frame, not just the top, so the caller's SP is not at a fixed
    offset from the callee's FP like on almost all other architectures in
    existence. This means there is no way to derive the caller's SP in
    the unwinder, and so we have to drop that bit of (unused) state
    everywhere.

    Reviewed by: jhb, markj
    Differential Revision: https://reviews.freebsd.org/D28026
* Fix LINT kernel builds after 1a714ff20419. | Hans Petter Selasky | 2021-02-01 | 2 | -30/+8
    MFC after: 1 week
    Discussed with: rrs@
    Differential Revision: https://reviews.freebsd.org/D28357
    Sponsored by: Mellanox Technologies // NVIDIA Networking
* sctp: small cleanup, no functional change intended. | Michael Tuexen | 2021-02-01 | 1 | -4/+2
    MFC after: 3 days
* zfs: remove incomplete ifdefs for lockless symlink support | Mateusz Guzik | 2021-02-01 | 1 | -8/+0
    This will be handled differently upstream and merged later.
* cxgbe(4): Fixes to tx coalescing. | Navdeep Parhar | 2021-02-01 | 4 | -13/+62
    - The behavior implemented in r362905 resulted in delayed
      transmission of packets in some cases, causing performance issues.
      Use a different heuristic to predict tx requests.
    - Add a tunable/sysctl (hw.cxgbe.tx_coalesce) to disable tx
      coalescing entirely. It can be changed at any time. There is no
      change in default behavior.
* mips: fix NLM platforms breakage caused by e0a0a3ef | Oleksandr Tymoshenko | 2021-02-01 | 1 | -0/+18
    NetLogic platforms have their own implementation of
    cpu_init_interrupts. Apply the same logic to it as to intr_machdep.c.

    PR: 253051
* x86: use compiler intrinsics for bswap* | Mateusz Guzik | 2021-02-01 | 1 | -59/+3
* amd64: use compiler intrinsics for bsf* and bsr* | Mateusz Guzik | 2021-02-01 | 1 | -32/+4
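As an illustration of what replacing hand-written bit-scan assembly with intrinsics can look like, the sketch below expresses bit-scan-forward and bit-scan-reverse with the GCC/Clang builtins `__builtin_ctz` and `__builtin_clz`. The `ex_bsf32()` and `ex_bsr32()` wrappers are hypothetical and are not the definitions from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the lowest set bit (bit-scan forward); mask must be non-zero. */
static inline int
ex_bsf32(uint32_t mask)
{
	assert(mask != 0); /* the builtin is undefined for zero */
	return (__builtin_ctz(mask));
}

/* Index of the highest set bit (bit-scan reverse); mask must be non-zero. */
static inline int
ex_bsr32(uint32_t mask)
{
	assert(mask != 0); /* the builtin is undefined for zero */
	return (31 - __builtin_clz(mask));
}
```

Using builtins instead of inline assembly lets the compiler constant-fold calls with known arguments and pick the best instruction for the target.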
* cache: add delayed degenerate path handling | Mateusz Guzik | 2021-02-01 | 1 | -32/+25
* cache: move hash computation into the parsing loop | Mateusz Guzik | 2021-02-01 | 1 | -3/+42
* sctp: improve input validation | Michael Tuexen | 2021-01-31 | 1 | -38/+62
    Improve the handling of INIT chunks in specific scenarios and report
    an appropriate error cause. Thanks to Anatoly Korniltsev for
    reporting the issue for the userland stack.

    MFC after: 3 days
* mips: fix early kernel panic when setting up interrupt counters | Oleksandr Tymoshenko | 2021-01-31 | 2 | -36/+22
    Commit 248f0ca converted intrcnt and intrnames from u_long[] and
    char[] to u_long* and char* respectively, but for non-INTRNG mips
    these symbols were defined in a .S file as pre-allocated static
    arrays, so the problem wasn't caught at compile time. Conversion from
    an array to a pointer requires pointer initialization and it wasn't
    done for MIPS, so whatever happened to be at the beginning of the
    intrcnt[] array was used as a pointer value.

    Move intrcnt/intrnames to C code and allocate them dynamically,
    although with a fixed size at the moment.

    Reviewed by: emaste
    PR: 253051
    Differential Revision: https://reviews.freebsd.org/D28424
    MFC after: 1 day
* msdosfs: fix vnode leak with msdosfs_rename() | Edward Tomasz Napierala | 2021-01-31 | 1 | -0/+8
    This could happen when failing due to a disappearing source file.

    Reviewed By: kib
    Tested by: pho
    Sponsored by: NetApp, Inc.
    Sponsored by: Klara, Inc.
    Differential Revision: https://reviews.freebsd.org/D27338
* msdosfs: fix double unlock if the source file disappears | Edward Tomasz Napierala | 2021-01-31 | 1 | -1/+0
    We would unlock fvp here, only to unlock it again below, just before
    "bad".

    Reviewed By: kib
    Tested by: pho
    Sponsored by: NetApp, Inc.
    Sponsored by: Klara, Inc.
    Differential Revision: https://reviews.freebsd.org/D27339
* cxgb(4): Remove assumption of physically contiguous mbufs. | Alexander Motin | 2021-01-31 | 3 | -36/+5
    Investigation of iSCSI target data corruption reports led me to
    discover that cxgb(4) expects mbufs to be physically contiguous,
    which is no longer true after I started using m_extaddref() in
    software iSCSI for large zero-copy transmissions. In case of
    fragmented memory the driver transmitted garbage from pages following
    the first one due to simple use of pmap_kextract() for the first
    pointer instead of proper bus_dmamap_load_mbuf_sg().

    It seems this was done as an optimization many years ago, and at the
    very least it is wrong in a world of IOMMUs. This patch just removes
    that optimization, and also limits packet coalescing for mbufs
    crossing a page boundary, which likewise depended on the assumption
    of one segment per packet.

    MFC after: 3 days
    Sponsored by: iXsystems, Inc.
    Reviewed by: mmacy, np
    Differential revision: https://reviews.freebsd.org/D28428
* amd64: move memcmp checks upfront | Mateusz Guzik | 2021-01-31 | 1 | -23/+29
    This is a tradeoff which saves jumps for smaller sizes while making
    the 8-16 range slower (roughly in line with the other cases).

    Tested with glibc test suite.

    For example size 3 (most common with vfs namecache) (ops/s):
    before: 407086026
    after:  461391995

    The regressed range of 8-16 (with 8 as example):
    before: 540850489
    after:  461671032
* cache: add trailing slash support | Mateusz Guzik | 2021-01-31 | 1 | -43/+184
    Tested by: pho
* cache: handle NOFOLLOW requests for symlinks | Mateusz Guzik | 2021-01-31 | 1 | -5/+24
    Tested by: pho
* Use process fib for inet/inet6 fib_algo sysctls. | Alexander V. Chernikov | 2021-01-31 | 1 | -2/+2
    This allows setting/querying the fib algo for non-default fibs.

    MFC after: 3 days