path: root/sys
Commit message | Author | Age | Files | Lines
...
* fusefs: fix VOP_ADVLOCK with SEEK_END | Alan Somers | 2022-10-19 | 1 | -2/+31
| | | | | | | | | | | When the user specifies SEEK_END, unlike SEEK_CUR, VOP_ADVLOCK must adjust lock offsets itself. Sort-of related to bug 266886. MFC after: 2 weeks Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D37040
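The offset adjustment described above can be illustrated with a minimal user-space sketch. It is not the fusefs code: the helper name `lock_start_to_absolute` and its parameters are hypothetical, and it assumes the caller has already resolved SEEK_CUR, as the commit message implies.

```c
#include <stdint.h>
#include <stdio.h>	/* SEEK_SET, SEEK_CUR, SEEK_END */

/*
 * Hypothetical helper (not taken from fusefs): turn a lock's starting
 * offset into an absolute file offset.  Returns 0 on success and -1 if
 * the whence value is unsupported or the result is out of range.
 */
static int
lock_start_to_absolute(int whence, int64_t start, int64_t file_size,
    int64_t *abs_start)
{
	int64_t base;

	switch (whence) {
	case SEEK_SET:
		base = 0;
		break;
	case SEEK_END:
		/* Offsets are relative to the end of the file. */
		base = file_size;
		break;
	default:
		/* SEEK_CUR is assumed to be resolved by the caller. */
		return (-1);
	}
	if (start < -base)		/* would land before offset 0 */
		return (-1);
	if (start > INT64_MAX - base)	/* would overflow */
		return (-1);
	*abs_start = base + start;
	return (0);
}
```

With a 100-byte file, a SEEK_END lock starting at -10 resolves to absolute offset 90.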
* tcp: style the struct tcpcb definition | Gleb Smirnoff | 2022-10-19 | 1 | -26/+26
| | | | | | | | - Use C99 types uintXX_t instead of u_intXX_t. - Try to make space/tab usage a little bit more consistent. - Shorten comments to fit into 80 chars. Not a functional change, just making future changes easier to read.
* nfsd: Make the pNFS server update Change for Setxattr/Rmxattr | Rick Macklem | 2022-10-18 | 1 | -2/+47
| | | | | | | | | | | | | | | | When the NFS server does the Setxattr or Rmxattr operation, the Change attribute (va_filerev) needs to be updated. Without this patch, that was not happening for the pNFS server configuration. This patch does a Setattr against the DS file to make the Change attribute change. This bug was discovered during a recent IETF NFSv4 testing event, where the Change attribute wasn't changed in the operation reply. MFC after: 1 month
* mbuf: don't include lock.h conditionally | Gleb Smirnoff | 2022-10-18 | 1 | -9/+0
| | | | | | | | Using keywords from opt_global.h in the system headers makes it possible to create cryptic kernel build failures that depend on the options used in the kernel config and are very hard to debug and understand. Fixes: 063d8114650c025240604b5c6df9358355fc98f4
* geom_part: Check number of GPT entries and size of GPT entry | Zhenlei Huang | 2022-10-18 | 1 | -4/+27
| | | | | | | | | | | | | | | | | | | | | | | The current specification has no upper limit on the number of partition entries or on the size of a partition entry. In 799eac8c3df597179bbb3b078362f3ff03993a1a Andrey V. Elsukov limited the maximum number of GPT entries to 4k, but only for the write routine (gpart create). When attaching disks that have a number of GPT entries exceeding the limit, or disks with a large partition entry size, it is still possible to exhaust kernel memory. 1. Reuse the limit of the maximum number of partition entries. 2. Limit the maximum size of a GPT entry to 1k. In the current specification (2.10) the size of a GPT entry is 128 * 2^n where n >= 0, and the size - 128 is reserved. 1k should be sufficient for the foreseeable future. PR: 266548 Discussed with: imp Reviewed by: markj MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D36717
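The two limits above can be sketched as a single header sanity check. This is an illustration under assumptions, not the geom_part code: the function and macro names are hypothetical, "4k" is taken to mean 4096, and the power-of-two test is derived from the "128 * 2^n" rule quoted from the specification rather than from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits; the commit message only says "4k" and "1k". */
#define	SKETCH_GPT_MAX_ENTRIES	4096u
#define	SKETCH_GPT_MIN_ENTSZ	128u
#define	SKETCH_GPT_MAX_ENTSZ	1024u

/*
 * Return true if the entry count and entry size read from a GPT header
 * are within the limits, so that entries * entsz cannot exhaust memory.
 */
static bool
gpt_hdr_sizes_ok(uint32_t entries, uint32_t entsz)
{
	if (entries > SKETCH_GPT_MAX_ENTRIES)
		return (false);
	if (entsz < SKETCH_GPT_MIN_ENTSZ || entsz > SKETCH_GPT_MAX_ENTSZ)
		return (false);
	/* 128 * 2^n with n >= 0 means a power of two no smaller than 128. */
	return ((entsz & (entsz - 1)) == 0);
}
```

Under these assumed limits the worst-case table is 4096 * 1024 bytes, that is 4 MiB.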
* ofw: add BUS_GET_DEVICE_PATH interface to openfirm/fdt, somewhat incomplete. | Takanori Watanabe | 2022-10-18 | 7 | -0/+22
| | | | | | | | | | | This adds the BUS_GET_DEVICE_PATH interface, which shows the device tree of openfirm/fdt. In qemu-system-arm64 with "virt" machine with device-tree firmware, % devctl getpath OFW cpu0 Reviewed by: andrew Differential Revision: https://reviews.freebsd.org/D37031
* amd64: Add FIRECRACKER kernel configuration | Colin Percival | 2022-10-18 | 1 | -0/+197
| | | | | | | | | This kernel configuration supports the Firecracker VMM environment. Relnotes: FreeBSD can now run inside the Firecracker VMM via the amd64 FIRECRACKER kernel configuration. Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36672
* PVH: Set bootmethod to PVH | Colin Percival | 2022-10-18 | 1 | -1/+1
| | | | | | | | | | Now that we can PVH boot on a non-Xen hypervisor, we shouldn't set machdep.bootmethod to "XEN". Instead, set it to "PVH"; there are other ways to discern the hypervisor. Reviewed by: royger Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36191
* PVH: support whitespace cmdline splitting | Colin Percival | 2022-10-18 | 1 | -1/+1
| | | | | | | | | | | | For historical reasons, Xen kernel command lines have options separated by commas. Every other FreeBSD platform uses whitespace; this is also necessary in PVH in order to support the Firecracker VMM. Allow options to be separated by any combination of commas and whitespace. Reviewed by: imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36190
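Splitting on "any combination of commas and whitespace" can be sketched in portable C as follows. This is not the PVH boot code: `split_opts` is a hypothetical helper, and the exact delimiter set (comma, space, tab, CR, LF) is an assumption.

```c
#include <string.h>

/*
 * Hypothetical helper: split a command line in place on any run of
 * commas and whitespace.  Stores up to max token pointers in out and
 * returns the number of tokens stored.  Empty tokens are skipped.
 */
static size_t
split_opts(char *s, char **out, size_t max)
{
	static const char delims[] = ", \t\r\n";	/* assumed set */
	size_t n = 0;

	for (;;) {
		s += strspn(s, delims);		/* skip leading separators */
		if (*s == '\0' || n == max)
			break;
		out[n++] = s;
		s += strcspn(s, delims);	/* advance to end of token */
		if (*s == '\0')
			break;
		*s++ = '\0';			/* terminate the token */
	}
	return (n);
}
```

Because runs of separators collapse, both the Xen style `a,b` and the whitespace style `a b` yield the same tokens, as does a mix such as `a, b`.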
* x86: Distinguish Xen from non-Xen PVH boots | Colin Percival | 2022-10-18 | 1 | -24/+68
| | | | | | | | | | | | | | | | | | | | | | | The PVH boot protocol, introduced by Xen, is now used by some non-Xen platforms (e.g. the Firecracker VM) as well. In order to accommodate these, we use CPUID to detect Xen and only perform Xen-specific setup when running on that platform. The "isxen" function duplicates some work done by identcpu.c later in the boot process; but we need it here since this is the very first C code which runs when PVH booting (even before hammer_time). In many places the existing code had xc_printf(...); HYPERVISOR_shutdown(SHUTDOWN_crash); making use of Xen functionality to print a message and shut down; in the places where this idiom can be reached in the non-Xen case, we replace this idiom with a CRASH(...) macro which calls those in the Xen case and halts in the non-Xen case. Reviewed by: royger Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D35801
* x86: Add support for PVH version 1 memmap | Colin Percival | 2022-10-18 | 1 | -2/+47
| | | | | | | | | | | | | | Version 0 of PVH booting uses a Xen hypercall to retrieve the system memory map; in version 1 the memory map can be provided via the start_info structure. Using the memory map from the version 1 start_info structure allows FreeBSD to use PVH booting on systems other than Xen, e.g. on the Firecracker VM. Reviewed by: royger Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D35800
* x86: Add MPTABLE_LINUX_BUG_COMPAT option | Colin Percival | 2022-10-18 | 3 | -0/+56
| | | | | | | | | | | | | | | | | | | Linux has two bugs in its handling of the x86 MP table: 1. It assumes that there is always 640 kB of base memory, and looks for the MP table in the top kB of this even if the memory map indicates that memory location does not exist. 2. It ignores the entry_count field and instead iterates through the MP table by scanning until it runs out of bytes in the table. The Firecracker VM (and probably other related VMs) relies on both of these bugs. With the MPTABLE_LINUX_BUG_COMPAT option, we search for the MP table at address 639k even if that isn't in the memory map; and replace a zeroed entry_count with a value computed from scanning the table until we run out of table bytes. Reviewed by: imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D35799
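Computing an entry count by scanning "until we run out of table bytes" can be sketched as below. This is an illustration, not the mptable code: the function name is hypothetical and it operates on a plain byte buffer holding only the base table entries. The entry sizes follow the MP specification layout, where processor entries (type 0) are 20 bytes and bus, I/O APIC, I/O interrupt and local interrupt entries (types 1 to 4) are 8 bytes each.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: count the base entries in an MP configuration
 * table by walking the entry area until the bytes run out.  The first
 * byte of every entry is its type.
 */
static int
mpt_count_entries(const uint8_t *entries, size_t len)
{
	size_t off = 0, sz;
	int count = 0;

	while (off < len) {
		switch (entries[off]) {
		case 0:			/* processor */
			sz = 20;
			break;
		case 1:			/* bus */
		case 2:			/* I/O APIC */
		case 3:			/* I/O interrupt */
		case 4:			/* local interrupt */
			sz = 8;
			break;
		default:		/* unknown type: stop scanning */
			return (count);
		}
		if (len - off < sz)	/* truncated final entry */
			break;
		off += sz;
		count++;
	}
	return (count);
}
```

The resulting value is what would stand in for a zeroed entry_count field.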
* Add NO_LEGACY_PCIB kernel option to i386, amd64 | Colin Percival | 2022-10-18 | 3 | -0/+8
| | | | | | | | | | | | | On systems without a PCI bus, legacy_pcib_identify by default creates one anyway: legacy_pcib_identify: no bridge found, adding pcib0 anyway This commit adds a kernel option NO_LEGACY_PCIB which disables this, allowing systems to be fully PCI-free. Reviewed by: imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D35798
* ns8250: Check if flush via FCR succeeded | Colin Percival | 2022-10-18 | 1 | -0/+19
| | | | | | | | | | | | | The emulated UART in the Firecracker VMM (aka the implementation in the rust-vmm/vm-superio project) includes FIFOs but does not implement the FCR register, which is used by ns8250_flush to flush the FIFOs. Check the LSR to see if there is still data in the FIFOs and call ns8250_drain if necessary. Discussed with: emaste, imp, jrtc27 Sponsored by: https://patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36979
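The LSR check described above can be sketched as a pure function over the register value. This is not the ns8250 driver code: the function and the `SK_` constant names are hypothetical, and only the two LSR bit positions (bit 0 for received data ready, bit 6 for transmitter empty) come from the standard 16550 register layout.

```c
#include <stdint.h>

/* 16550 Line Status Register bits; the names are local to this sketch. */
#define	SK_LSR_RXRDY	0x01	/* bit 0: received data is available */
#define	SK_LSR_TEMT	0x40	/* bit 6: transmitter completely empty */

/* Hypothetical flush/drain selectors. */
#define	SK_RECEIVER	0x1
#define	SK_TRANSMITTER	0x2

/*
 * Given the LSR value read after an FCR flush attempt, return the set
 * of FIFOs that still hold data and therefore need an explicit drain.
 */
static int
drain_needed_after_flush(uint8_t lsr, int what)
{
	int drain = 0;

	if ((what & SK_RECEIVER) && (lsr & SK_LSR_RXRDY))
		drain |= SK_RECEIVER;
	if ((what & SK_TRANSMITTER) && !(lsr & SK_LSR_TEMT))
		drain |= SK_TRANSMITTER;
	return (drain);
}
```

On a UART that implements the FCR, the result would normally be 0 after a flush, so the extra drain is only taken on emulations that ignore the register.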
* vtblk: Use busdma | Colin Percival | 2022-10-18 | 1 | -17/+144
| | | | | | | | | | | | | | | | | We assume that bus addresses from busdma are the same thing as "physical" addresses in the Virtio specification; this seems to be true for the platforms on which Virtio is currently supported. For block devices which are limited to a single data segment per I/O, we request PAGE_SIZE alignment from busdma; this is necessary in order to support unaligned I/Os from userland, which may cross a boundary between two non-physically-contiguous pages. On devices which support more data segments per I/O, we retain the existing behaviour of limiting I/Os to (max data segs - 1) * PAGE_SIZE. Reviewed by: bryanv Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36667
* vtblk: Include pointer to softc in request | Colin Percival | 2022-10-18 | 1 | -6/+13
| | | | | | | | No functional change intended. Reviewed by: bryanv, imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36666
* vtblk: Requeue inside vtblk_request_execute | Colin Percival | 2022-10-18 | 1 | -3/+5
| | | | | | | | | | | | | | | | Most virtio_blk requests are launched from vtblk_startio; prior to this commit, if vtblk_request_execute failed (e.g. due to a lack of space on the virtio queue) vtblk_startio would requeue the request to be reattempted later. Add a flag "vbr_requeue_on_error" to requests and perform the requeuing from inside vtblk_request_execute instead. No functional change intended. Reviewed by: bryanv, imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36665
* vtblk: Make vtblk_request_execute return void. | Colin Percival | 2022-10-18 | 1 | -11/+21
| | | | | | | | | | | The error, if any, now gets stashed in the request structure. (Step 1 of reworking this driver to use busdma.) No functional change intended. Reviewed by: bryanv, imp Sponsored by: https://www.patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36664
* virtio_mmio: Support command-line parameters | Colin Percival | 2022-10-18 | 2 | -0/+151
| | | | | | | | | | | | | | | | | | | | The Virtio MMIO bus driver was added in 2014 with support for devices exposed via FDT; in 2018 support was added to discover Virtio MMIO devices via ACPI tables, as in QEMU. The Firecracker VMM eschews both FDT and ACPI, instead presenting device information via kernel command line arguments of the form virtio_mmio.device=<parameters>. These command line parameters get converted into kernel environment variables; this adds support for parsing those variables and attaching virtio_mmio children to nexus. There is a case to be made that it would be cleaner to have a new "cmdlinebus" attached to nexus and virtio_mmio children attached to that. A future commit might do that. Discussed with: imp, jrtc27 Sponsored by: https://patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36189
* kern: Support duplicate variables in early kenv | Colin Percival | 2022-10-18 | 1 | -17/+51
| | | | | | | | | | | | | | | | | | | | | | | | Some virtual machines pass virtio MMIO device parameters via the kernel command line as a series of virtio_mmio.device=<parameters> options. These get translated into FreeBSD kernel environment variables; but unfortunately they all use the same variable name, which resulted in all but the first such parameter being ignored when the dynamic kernel environment is set up from the initial environment buffers. With this commit, duplicate environment settings will instead be stored as ${name}_1, ${name}_2... ${name}_9999. In the unlikely event that the same variable is set over 10000 times before the dynamic kernel environment is set up, we panic. Variable settings after the dynamic environment is initialized continue to override the previously-set value; the change is limited to the very early kernel boot (prior to SI_SUB_KMEM + 1) and changes behaviour from "ignore" to "store with a different name" only. Reviewed by: imp Feedback from: kevans Sponsored by: https://patreon.com/cperciva Differential Revision: https://reviews.freebsd.org/D36187
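The renaming scheme for duplicates can be sketched as follows. This is a user-space illustration rather than the kenv code: `kenv_dup_name` is a hypothetical helper, and it reports an error where the commit message says the kernel panics.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the name under which the dup-th duplicate
 * of a variable is stored.  dup 0 is the first setting and keeps the
 * plain name; duplicates 1..9999 become name_1 .. name_9999.
 * Returns 0 on success, -1 if dup is out of range or buf is too small.
 */
static int
kenv_dup_name(char *buf, size_t buflen, const char *name, unsigned int dup)
{
	int n;

	if (dup > 9999)		/* the kernel panics in this case */
		return (-1);
	if (dup == 0)
		n = snprintf(buf, buflen, "%s", name);
	else
		n = snprintf(buf, buflen, "%s_%u", name, dup);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);
	return (0);
}
```

Three `virtio_mmio.device=` options on the command line would therefore be stored as `virtio_mmio.device`, `virtio_mmio.device_1` and `virtio_mmio.device_2`.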
* dpaa2: fix build without WITNESS | Gleb Smirnoff | 2022-10-18 | 1 | -0/+1
| | | | | Using mutex(9) requires including <sys/lock.h> per manual page. With WITNESS the header was cryptically included via dpaa_ni.h -> mbuf.h.
* dpaa2: fix standalone module build | Gleb Smirnoff | 2022-10-18 | 1 | -0/+2
* dpaa2: fix build without FDT | Gleb Smirnoff | 2022-10-18 | 1 | -1/+3
* iflib: Introduce v2 of TX Queue Select Functionality | Eric Joyner | 2022-10-17 | 3 | -25/+185
| | | | | | | | | | | | | | | | | | | | | | | | | | For v2, iflib will parse packet headers before queueing a packet. This commit also adds a new field in the structure that holds parsed header information from packets; it stores the IP ToS/traffic class field found in the IPv4/IPv6 header. To help, it will only partially parse header packets before queueing them by using a new header parsing function that does less than the current parsing header function; for our purposes we only need up to the minimal IP header in order to get the IP ToS information and don't need to pull up more data. For now, v1 and v2 co-exist in this patch; v1 still offers a less-invasive method where none of the packet is parsed in iflib before queueing. This also bumps the sys/param.h version. Signed-off-by: Eric Joyner <erj@FreeBSD.org> Tested by: IntelNetworking MFC after: 3 days Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D34742
* linuxkpi: retire now-unused MIPS support | Ed Maste | 2022-10-17 | 2 | -5/+2
| | | | | | Reviewed by: bz, manu Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D37023
* carp: fix regression panic from ccd69bd573f | Gleb Smirnoff | 2022-10-17 | 1 | -1/+2
| | | | | Reported & tested by: Oleg Ginzburg <olevole olevole.ru> Fixes: ccd69bd573f185308e7652190ff64b50f7fba381
* ksched: correct return code for invalid priority | Ali Abdallah | 2022-10-17 | 1 | -1/+1
| | | | | | | | | | | By convention, EINVAL is returned when validating arguments, not EPERM. This matches the documented behaviour of sched_setscheduler(3), and that of SCHED_OTHER. PR: 227735 MFC after: 1 week Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D37021
* Fix mpr(4) panic during a firmware update. | Kenneth D. Merry | 2022-10-17 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | Issue Description: The RequestCredits field of IOCFacts got changed between the Phase23 firmware to Phase24 firmware. So as part of firmware update operation, driver has to free the resources & pools which are created with the Phase23 Firmware's IOCFacts data (i.e. during driver load time) and has to reallocate the resources and pools using Phase24's IOCFacts data. Here driver has freed the interrupts but missed to reallocate the interrupts and hence config page read operation is getting timed out and controller is going for recursive reinit (controller reset) operations and leading to kernel panic. Fix: Reallocate the interrupts if the interrupts are disabled as part of firmware update/downgrade operation. Submitted by: Sreekanth Ready <sreekanth.reddy@broadcom.com> Tested by: ken MFC after: 3 days
* if_ovpn(4): implement ioctl() to set if_flags | Gert Doering | 2022-10-17 | 2 | -0/+43
| | | | | | | | Fully working openvpn(8) --iroute support needs real subnet config on ovpn(4) interfaces (IFF_BROADCAST), while client-side/p2p configs need IFF_POINTOPOINT setting. So make this configurable. Reviewed by: kp
* fusefs: After successful F_GETLK, l_whence should be SEEK_SET | Alan Somers | 2022-10-17 | 1 | -0/+1
| | | | | | | | PR: 266886 Reported by: John Millikin <jmillikin@gmail.com> MFC after: 2 weeks Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D37014
* if_ovpn: fix use-after-free | Kristof Provost | 2022-10-17 | 1 | -2/+3
| | | | | | | | | | ovpn_encrypt_tx_cb() calls ovpn_encap() to transmit a packet, then adds the length of the packet to the "tunnel_bytes_sent" counter. However, after ovpn_encap() returns 0, the mbuf chain may have been freed, so the load of m->m_pkthdr.len may be a use-after-free. Reported by: markj Sponsored by: Rubicon Communications, LLC ("Netgate")
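The general shape of this fix, reading the length before handing the buffer to a function that may free it, can be sketched in user space. The types and functions here (`struct pkt`, `pkt_transmit`, `send_and_count`) are hypothetical stand-ins, not the if_ovpn code or the mbuf API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf chain with a packet length. */
struct pkt {
	size_t	len;
};

/*
 * Stand-in for a transmit routine that takes ownership of the packet:
 * once it returns 0, the packet may already have been freed.
 */
static int
pkt_transmit(struct pkt *p)
{
	free(p);
	return (0);
}

/* Transmit a packet and account for its bytes without touching freed memory. */
static void
send_and_count(struct pkt *p, size_t *bytes_sent)
{
	size_t len = p->len;	/* read while we still own the packet */

	if (pkt_transmit(p) == 0)
		*bytes_sent += len;	/* p must not be dereferenced here */
}
```

The buggy ordering would instead read `p->len` after `pkt_transmit()` returned, which is the load the commit message identifies as a possible use-after-free.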
* Revert "unbound: Vendor import 1.17.0" | Cy Schubert | 2022-10-16 | 1 | -0/+4
| | | | | | | This reverts commit 64d318ea98b7c59f5567d47a9a8474887d8b5cb8, reversing changes made to 8063dc03202fad7d6bdf34976bc8556fa3f23fa1. Revert a mismerge which reversed 8063dc03202fad7d6bdf34976bc8556fa3f23fa1.
* unbound: Vendor import 1.17.0 | Cy Schubert | 2022-10-16 | 1 | -4/+0
| | | | | | | | Added ACL per interface, proxy protocol and bug fixes. Announcement: https://nlnetlabs.nl/news/2022/Oct/13/unbound-1.17.0-released/ Merge commit '643f9a0581e8aac7eb790ced1164748939829826' into new_merge
* nfsd: Make Setxattr/Removexattr NFSv4.2 ops IO_SYNC | Rick Macklem | 2022-10-16 | 1 | -0/+4
| | | | | | | | | | | | | | | | | | | | | | | When the NFS server does Setxattr or Removexattr, the operations must be done IO_SYNC. If a server crashes/reboots immediately after replying it must have the extended attribute changes. Since UFS does extended attributes asynchronously by default and there is no "ioflag" argument in the VOP calls, follow the VOP calls with VOP_FSYNC(), to ensure the operation has been done synchronously. This was found by inspection while investigating a bug discovered during a recent IETF NFSv4 testing event, where the Change attribute wasn't changed in the operation reply. This bug will take further work for ZFS and the pNFS server configuration, but is now fixed for a non-pNFS UFS exported file system. MFC after: 1 month
* kern_intr: Check for NULL event in intr_destroy() | Mitchell Horne | 2022-10-15 | 1 | -0/+3
| | | | | | | | | It likely won't happen, but is consistent with the other functions of this KPI. Reviewed by: imp, jhb MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D33479
* nfscl: Fix the NFSv4.0 mount so that it does not crash | Rick Macklem | 2022-10-15 | 1 | -8/+8
| | | | | | | | | | | | | | Commit efe58855f3ea modifies IN_LOOPBACK() so that it uses a VNET variable. Without this patch, nfscl_getmyip() uses IN_LOOPBACK() when the VNET is not set and crashes the system. nfscl_getmyip() is only called when a NFSv4.0 (not NFSv4.1/4.2) mount is done. This patch re-organizes nfscl_getmyip() so that IN_LOOPBACK() is before the CURVNET_RESTORE() macro, to avoid the crashes. Reviewed by: karels, zlei.huang_gmail.com Differential Revision: https://reviews.freebsd.org/D37008
* if_me: Use dedicated network privilege | Zhenlei Huang | 2022-10-15 | 3 | -1/+3
| | | | | | | Separate if_me privileges from if_gif. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D36691
* pf: fix LINT-NOINET6 build | Kristof Provost | 2022-10-15 | 1 | -2/+6
* clnt_vc.c: Replace msleep() with pause() to avoid assert panic | Rick Macklem | 2022-10-14 | 1 | -3/+3
| | | | | | | | | | | | | | | | An msleep() in clnt_vc.c used a global "fake_wchan" wchan argument along with the mutex in a CLIENT structure. As such, it was possible to use different mutexes for the same wchan and cause a panic assert. Since this is in a rarely executed code path, the assert panic was only recently observed. Since "fake_wchan" never gets a wakeup, this msleep() can be replaced with a pause() to avoid the panic assert, which is what this patch does. Reviewed by: kib, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D36977
* Add initial DPAA2 support | Dmitry Salychev | 2022-10-14 | 33 | -0/+17601
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DPAA2 is a hardware-level networking architecture found in some NXP SoCs which contain hardware blocks including Management Complex (MC, a command interface to manipulate DPAA2 objects), Wire Rate I/O processor (WRIOP, packets distribution, queuing, drop decisions), Queues and Buffers Manager (QBMan, Rx/Tx queues control, Rx buffer pools) and others. The Management Complex runs NXP-supplied firmware which provides DPAA2 objects as an abstraction layer over those blocks to simplify access to the underlying hardware. Each DPAA2 object has its own driver (to perform an initialization at least) and will be visible as a separate device in the device tree. Two new drivers (dpaa2_mc and dpaa2_rc) act like firmware buses in order to form a hierarchy of the DPAA2 devices: acpiX (or simplebusX) dpaa2_mcX dpaa2_rcX dpaa2_mcp0 ... dpaa2_mcpN dpaa2_bpX dpaa2_macX dpaa2_io0 ... dpaa2_ioM dpaa2_niX dpaa2_mc is supposed to be a root of the hierarchy, comes in ACPI and FDT flavours and implements helper interfaces to allocate and assign bus resources, MSI and "managed" DPAA2 devices (NXP treats some of the objects as resources for the other DPAA2 objects to let them function properly). Almost all of the DPAA2 objects are assigned to the resource containers (dpaa2_rc) to implement isolation. The initial implementation focuses on the DPAA2 network interface to be operational. It is the most complex object in terms of dependencies which uses I/O objects to transmit/receive packets. Approved by: bz (mentor) Tested by: manu, bz MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D36638
* kinst: Clarify a comment in the trampoline allocator | Mark Johnston | 2022-10-14 | 1 | -4/+5
| | | | Fixes: f0bc4ed144fc ("kinst: Initial revision")
* kinst: Remove an unused constant | Mark Johnston | 2022-10-14 | 1 | -5/+0
| | | | | | This was left over after a rework of the trampoline allocator. Fixes: f0bc4ed144fc ("kinst: Initial revision")
* vmm: validate icr value | Corvin Köhne | 2022-10-14 | 1 | -3/+88
| | | | | | | | | | Not all combinations of icr values are allowed. Neither Intel nor AMD document what happens when an invalid value is written to the icr. Ignore the IPI. So, the guest will note that the IPI wasn't delivered. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D36946 Sponsored by: Beckhoff Automation GmbH & Co. KG
* vmm: increase vlapic version | Corvin Köhne | 2022-10-14 | 1 | -1/+1
| | | | | | | | | | | | | | | | | | | | | | | | macOS panics on apic versions lower than 0x14. See https://opensource.apple.com/source/xnu/xnu-7195.81.3/osfmk/i386/lapic_native.c.auto.html Additionally, an upcoming commit will validate the icr values written by the guest. Older Intel processors allow some different combinations than the newer ones. AMD documents that only the newer combinations are allowed. So, bumping the version allows us to avoid a differentiation between AMD and Intel. Intel documents that newer processors than the P6 are using the new combinations. Sadly, Intel does not document which apic version belongs to those processors. Linux identifies newer apics by a version larger or equal to 0x14. Intel and AMD allow apic version between 0x10 and 0x15. So, using 0x14 seems to be fine. See https://github.com/torvalds/linux/blob/3eba620e7bd772a0c7dc91966cb107872b54a910/arch/x86/kernel/apic/apic.c#L238 Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D36945 Sponsored by: Beckhoff Automation GmbH & Co. KG
* vmm: permit some IPIs to be handled by userspace | Corvin Köhne | 2022-10-14 | 7 | -70/+161
| | | | | | | | | | | Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland. INIT and STARTUP IPIs are now returned to userland. Due to backward compatibility reasons, a new capability is added for enabling VM_EXITCODE_IPI. Reviewed by: jhb Differential Revision: https://reviews.freebsd.org/D35623 Sponsored by: Beckhoff Automation GmbH & Co. KG
* pf: apply the network stack's ICMP rate limiting to ICMP errors sent by pf | Kristof Provost | 2022-10-14 | 3 | -2/+28
| | | | | | PR: 266477 Event: Aberdeen Hackathon 2022 Differential Revision: https://reviews.freebsd.org/D36903
* netinet6: trim overly long lines in GET_PKTOPT_VAR(), fit into 80 chars | Gleb Smirnoff | 2022-10-13 | 1 | -23/+23
* inpcb: provide in_pcbremhash() to reduce copy-paste | Gleb Smirnoff | 2022-10-13 | 1 | -34/+27
* vinum/geom_vinum_var.h: Fix missing linefeed in license. | Pedro F. Giffuni | 2022-10-13 | 1 | -1/+2
| | | | | | | The license is still standard BSD-4-clause and the text is unmodified but add a missing linefeed for readability. No functional change.
* sctp: improve sending of ABORT packets in response to INIT-ACKs | Michael Tuexen | 2022-10-12 | 1 | -1/+4
| | | | | | | | Ensure that the initiate tag of the INIT-ACK chunk is used as the verification tag of the packet containing the ABORT chunk. Reported by: Suganya Dharma MFC after: 1 week