| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Signed-off-by: Elyes Haouas <ehaouas@noos.fr>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/885
|
|
|
|
|
| |
Reviewed by: kib (older version)
Differential Revision: https://reviews.freebsd.org/D39921
|
|
|
|
|
|
|
|
| |
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With clang it expands to "inline"; clang in practice may inline
externally visible functions even without the hint. So just remove the
hints and let the compiler decide.
No functional change intended. pmap.o is identical before and after
this patch.
Reviewed by: alc
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D42446
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The sysctl knob 'vm.pmap.allow_2m_x_ept' is loader tunable and have
public document entry in security(7) but is fetched from kernel
environment 'hw.allow_2m_x_ept'. That is inconsistent and obscure.
As there is public security advisory FreeBSD-SA-19:25.mcepsc [1],
people may refer to it and use 'hw.allow_2m_x_ept', let's keep old
name for compatibility.
[1] https://www.freebsd.org/security/advisories/FreeBSD-SA-19:25.mcepsc.asc
Reviewed by: kib
Fixes: c08973d09c95 Workaround for Intel SKL002/SKL012S errata
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42311
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a man page for pmap_kextract(9), with alias to vtophys(9). This man
page is based on pmap_extract(9).
Add it as cross reference in pmap(9), and add comments above the
function implementations.
Co-authored-by: Graham Perrin <grahamperrin@gmail.com>
Co-authored-by: mhorne
Sponsored by: The FreeBSD Foundation
Pull Request: https://github.com/freebsd/freebsd-src/pull/827
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reverts the changes made in D19670 and fixes the original
issue by allocating and prepopulating a leaf page table page for wired
userspace 2M pages.
The original issue is an edge case that creates an unmapped, wired
region in userspace. Subsequent faults on this region can trigger wired
superpage creation, which leads to a panic in pmap_demote_pde_locked()
as the pmap does not create a leaf page table page for the wired
superpage. D19670 fixed this by disallowing preemptive creation of
wired superpage mappings, but that fix is currently interfering with an
ongoing effort of speeding up vm_map_wire for large, contiguous entries
(e.g. bhyve wiring guest memory).
Reviewed by: alc, markj
Sponsored by: Google, Inc. (GSoC 2023)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D41132
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For amd64, i386, arm, and riscv, i.e. all architectures except arm64,
the custom implementation is provided since we maintain the bitmask of
active CPUs anyway.
Arm64 uses somewhat naive iteration over CPUs and match current vmspace'
pmap with the argument. It is not guaranteed that vmspace->pmap is the
same as the active pmap, but the inaccuracy should be toleratable.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32360
|
|
|
|
| |
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Because KASAN shadows the kernel image itself (KMSAN currently does
not), a shadow mapping of the boot stack must be created very early
during boot. pmap_san_enter() reserves a fixed number of pages for the
purpose of creating and mapping this shadow region.
After commit 789df254cc9e ("amd64: Use a larger boot stack"), it could
happen that this reservation is insufficient; this happens when
bootstack crosses a PAGE_SHIFT + KASAN_SHADOW_SCALE_SHIFT boundary.
Update the calculation to take into account the new size of the boot
stack.
Fixes: 789df254cc9e ("amd64: Use a larger boot stack")
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels. Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work. Specifically, I
see occasional unexplicable crashes in userspace when PCIDs are enabled.
Apply a minimal patch to fix the problem. While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D41230
|
|
|
|
|
| |
Recent changes to the pctrie code make it necessary to initialize the
kernel pmap's rangeset for PKU.
|
|
|
|
|
|
|
|
|
|
| |
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.
Reviewed by: alc, kib, markj
Differential Revision: https://reviews.freebsd.org/D40971
|
|
|
|
|
|
|
|
|
|
| |
The function pmap_pde_ept_executable() should not be conditionally
compiled based on VM_NRESERVLEVEL. It is required indirectly by
pmap_enter(..., psind=1) even when reservation-based allocation is
disabled at compile time.
Reviewed by: alc
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
|
| |
Since pmap_ps_enabled() is true by default, check it inside of
pmap_promote_pde() instead of at every call site.
Modify pmap_promote_pde() to return true if the promotion succeeded and
false otherwise. Use this return value in a couple places.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D40744
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Stop requiring all of the PTEs to have the accessed bit set for superpage
promotion to occur. Given that change, add support for promotion to
pmap_enter_quick(), which does not set the accessed bit in the PTE that
it creates.
Since the final mapping within a superpage-aligned and sized region of a
memory-mapped file is typically created by a call to pmap_enter_quick(),
we now achieve promotions in circumstances where they did not occur
before, for example, the X server's read-only mapping of libLLVM-15.so.
See also https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40478
|
|
|
|
|
|
|
|
| |
Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have
tslog annotations in files which might be built from userland (i.e. in
subr_boot.c, which is built as part of the boot loader).
This reverts commit 59588a546f55523d6fd37ab42eb08b719311d7d6.
|
|
|
|
|
|
|
|
|
|
|
|
| |
The change to subr_boot.c broke the libsa build because the TSLOG
macros have their own definitions for the boot loader -- I didn't
realize that the loader code used subr_boot.c.
I'm currently testing a fix and I'll revert this revert once I'm
satisfied that everything works, but I don't want to leave the
tree broken for too long.
This reverts commit 469cfa3c30ee7a5ddeb597d0a8c3e7cac909b27a.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
* 535 us in kmem_malloc
* 450 us in pmap_zero_page
* 1720 us in pmap_growkernel
* 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
* 430 us in cpu_setregs calling load_cr0
Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40327
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
pmap_zero_page is responsible for 4.6 ms of the 25.0 ms of boot time.
This is not in fact time spent zeroing pages though; almost all of
that time is spent in a first-touch penalty, presumably due to the
host Linux kernel faulting in backing pages one by one.
There's probably a way to improve that by teaching Firecracker to
fault in all the VM's pages from the start rather than having them
faulted in one at a time, but that's outside of FreeBSD's control.
This commit adds a TSLOG_PAGEZERO option which enables TSLOG on the
amd64 pmap_zero_page function; it's a separate option (turned off
by default even if TSLOG is enabled) since zeroing pages happens
enough that it can easily fill the TSLOG buffer and prevent other
timing information from being recorded.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40326
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
hammer_time takes roughly 2740 us:
* 55 us in xen_pvh_parse_preload_data
* 20 us in boot_parse_cmdline_delim
* 20 us in boot_env_to_howto
* 15 us in identify_hypervisor
* 1320 us in link_elf_reloc
* 1310 us in relocate_file1 handling ef->rela
* 25 us in init_param1
* 30 us in dpcpu_init
* 355 us in initializecpu
* 255 us in initializecpu calling load_cr4
* 425 us in getmemsize
* 280 us in pmap_bootstrap
* 205 us in create_pagetables
* 10 us in init_param2
* 25 us in pci_early_quirks
* 60 us in cninit
* 90 us in kdb_init
* 105 us in msgbufinit
* 20 us in fpuinit
* 205 us elsewhere in hammer_time
Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and
load_cr4 loads the CR4 register, both of which trap to the hypervisor)
but others may deserve attention.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40325
|
|
|
|
|
| |
Reported by: peterj
Sponsored by: The FreeBSD Foundation
|
|
|
|
|
| |
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D39920
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP. Other
pages are allocated after we know the number of cpus and their
assignments to the domains.
PCPUs are not accessed until they are initialized, which happens on AP
startup.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
|
|
|
|
|
|
|
|
|
|
| |
The APs pcpu area is zeroed in init_secondary() by pcpu_init(), so the
early initialization in pmap_bootstrap() is nop.
Fixes: 42f722e721cd010ae5759a4b0d3b7b93c2b9cad2ESC
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase. Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.
Suggested by: mjg
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
|
|
|
|
|
|
|
|
|
| |
Cpuid is used to index the pmap->pm_pcids array only.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
|
|
|
|
|
|
|
| |
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
|
|
|
|
|
|
|
|
|
| |
to initialize pm_pcids array for a new user pmap
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
|
|
|
|
|
|
|
|
|
| |
and rename the structure to pmap_pcid.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39890
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact. Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.
For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().
Reported by: andrew
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39556
|
|
|
|
| |
Requested by: jhb
|
|
|
|
|
|
|
|
|
|
|
| |
In particular, do not enable the workaround if INVPCID is not supported
by the core.
Reported by: "Chen, Alvin W" <Weike.Chen@Dell.com>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37940
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If processor prefetches neighboring TLB entries to the one being accessed
(as some have been reported to do), then the spin lock does not prevent
the situation described in the "AMD64 Architecture Programmer's Manual
Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency
Considerations".
Reported and reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems. The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.
Reviewed by: alc (previous version)
Discussed with: emaste, markj
Tested by: karels
PR: 261169, 266145
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37770
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On amd64, don't abort promotion due to a missing accessed bit in a
mapping before possibly write protecting that mapping. Previously,
in some cases, we might not repromote after madvise(MADV_FREE) because
there was no write fault to trigger the repromotion. Conversely, on
arm64, don't pointlessly, yet harmlessly, write protect physical pages
that aren't part of the physical superpage.
Don't count aborted promotions due to explicit promotion prohibition
(arm64) or hardware errata (amd64) as ordinary promotion failures.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36916
|
|
|
|
|
|
|
|
|
|
| |
Use it and several other vm_page_*_valid() functions in more places.
Suggested and reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D37024
|
|
|
|
|
|
|
| |
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D36919
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.
Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an used pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).
Suggested by: kib
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36685
|
|
|
|
|
|
| |
There is no such error code.
Fixes: 1d5ebad06c20b ("pmap: optimize MADV_WILLNEED on existing superpages")
|
|
|
|
|
|
|
|
|
|
| |
Specifically, avoid pointless calls to pmap_enter_quick_locked() when
madvise(MADV_WILLNEED) is applied to an existing superpage mapping.
Reported by: mhorne
Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36801
|
|
|
|
|
|
|
|
|
|
|
|
| |
This assertion can be triggered by usermode since vm_map_madvise()
doesn't force advice to be applied to an entire largepage mapping. I
can't see any reason not to permit it, however, since MADV_DONTNEED and
_FREE are advisory and we can simply do nothing when a 1GB mapping is
encountered.
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36675
|
|
|
|
|
|
|
|
|
| |
This code path can be triggered by applying MADV_WILLNEED to a 1GB
mapping.
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D36674
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
pmap_growkernel() may be called when mapping a region above KERNBASE,
typically for a kernel module. If we have enough PTPs left over from
bootstrap, pmap_growkernel() does nothing. However, it's possible to
run out, and in this case pmap_growkernel() will try to grow the kernel
map all the way from kernel_vm_end to somewhere past KERNBASE, which can
easily run the system out of memory. This happens with large kernel
modules such as the nvidia GPU driver. There is also a WIP dtrace
provider which needs to map KVA in the region above KERNBASE (to provide
trampolines which allow a copy of traced kernel instruction to be
executed), and its allocations could potentially trigger this scenario.
This change modifies pmap_growkernel() to manage the two regions
separately, allowing them to grow independently. The end of the
KERNBASE region is tracked by modifying "nkpt".
PR: 265019
Reviewed by: alc, imp, kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36673
|
|
|
|
|
|
| |
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36549
|
|
|
|
|
|
|
|
| |
This matches the return type of pmap_mapdev/bios.
Reviewed by: kib, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36548
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When attempting to promote 4KB user-space mappings to a 2MB user-space
mapping, the address of the struct vm_page representing the page table
page that contains the 4KB mappings is already known to the caller.
Pass that address to the promotion function rather than making the
promotion function recompute it, which on arm64 entails iteration over
the vm_phys_segs array by PHYS_TO_VM_PAGE(). And, while I'm here,
eliminate unnecessary arithmetic from the calculation of the first PTE's
address on arm64.
MFC after: 1 week
|
|
|
|
|
|
|
|
|
|
| |
This applies one of the changes from
5567d6b4419b02a2099527228b1a51cc55a5b47d to other architectures
besides arm64.
Reviewed by: kib
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D36263
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With clang 15, the following -Werror warning is produced:
sys/amd64/amd64/pmap.c:8274:22: error: variable 'freed' set but not used [-Werror,-Wunused-but-set-variable]
int allfree, field, freed, i, idx;
^
The 'freed' variable is only used when PV_STATS is defined. Ensure it is
only declared and set in that case.
MFC after: 3 days
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the kernel is compiled with -asan-stack=true, the address sanitizer
will emit inline accesses to the shadow map. In other words, some
shadow map accesses are not intercepted by the KASAN runtime, so they
cannot be disabled even if the runtime is not yet initialized by
kasan_init() at the end of hammer_time().
This went unnoticed because the loader will initialize all PML4 entries
of the bootstrap page table to point to the same PDP page, so early
shadow map accesses do not raise a page fault, though they are silently
corrupting memory. In fact, when the loader does not copy the staging
area, we do get a page fault since in that case only the first and last
PML4Es are populated by the loader. But due to another bug, the loader
always treated KASAN kernels as non-relocatable and thus always copied
the staging area.
It is not really practical to annotate hammer_time() and all callees
with __nosanitizeaddress, so instead add some early initialization which
creates a shadow for the boot stack used by hammer_time(). This is only
needed by KASAN, not by KMSAN, but the shared pmap code handles both.
Reported by: mhorne
Reviewed by: kib
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35449
|