aboutsummaryrefslogtreecommitdiff
path: root/sys/amd64/include/md_var.h
Commit message (Collapse)AuthorAgeFilesLines
* sys: Remove $FreeBSD$: two-line .h patternWarner Losh2023-08-161-2/+0
| | | | Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
* amd64: be more precise when enabling the AlderLake small core PCID workaroundKonstantin Belousov2023-01-051-0/+1
| | | | | | | | | | | In particular, do not enable the workaround if INVPCID is not supported by the core. Reported by: "Chen, Alvin W" <Weike.Chen@Dell.com> Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37940
* kasan: Create a shadow for the bootstack prior to hammer_time()Mark Johnston2022-06-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When the kernel is compiled with -asan-stack=true, the address sanitizer will emit inline accesses to the shadow map. In other words, some shadow map accesses are not intercepted by the KASAN runtime, so they cannot be disabled even if the runtime is not yet initialized by kasan_init() at the end of hammer_time(). This went unnoticed because the loader will initialize all PML4 entries of the bootstrap page table to point to the same PDP page, so early shadow map accesses do not raise a page fault, though they are silently corrupting memory. In fact, when the loader does not copy the staging area, we do get a page fault since in that case only the first and last PML4Es are populated by the loader. But due to another bug, the loader always treated KASAN kernels as non-relocatable and thus always copied the staging area. It is not really practical to annotate hammer_time() and all callees with __nosanitizeaddress, so instead add some early initialization which creates a shadow for the boot stack used by hammer_time(). This is only needed by KASAN, not by KMSAN, but the shared pmap code handles both. Reported by: mhorne Reviewed by: kib MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35449
* amd64: -m32 support for machine/md_var.hBrooks Davis2022-06-131-0/+6
| | | | | | | | | | Install the i386 md_var.h under /usr/include/i386 on amd64 and include when targeting i386. This is a mostly kernel-only header required by procstat's ZFS support. It is pulled in by the i386 machine/counter.h. Reviewed by: jhb, imp
* Unstaticize {get,set}_fpcontext() on amd64Edward Tomasz Napierala2022-01-041-0/+5
| | | | | | | This will be used to fix Linux signal delivery. Discussed With: kib Sponsored By: EPSRC
* amd64: do not assume that kernel is loaded at 2M physicalKonstantin Belousov2021-07-311-5/+2
| | | | | | | | | | | | | | | Allow any 2M aligned contiguous location below 4G for the staging area location. It should still be mapped by loader at KERNBASE. The assumption kernel makes about loader->kernel handoff with regard to the MMU programming are explicitly listed at the beginning of hammer_time(), where kernphys is calculated. Now kernphys is the variable instead of symbol designating the physical address. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31121
* amd64: make efi_boot globalKonstantin Belousov2021-07-241-0/+2
| | | | | | | Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31121
* amd64: retire sse2_pagezeroMateusz Guzik2021-01-301-1/+0
| | | | | | | | All page zeroing is using temporal stores with rep movs*, the routine is unused for several years. Should a need arise for zeroing using non-temporal stores, a more optimized variant can be implemented with a more descriptive name.
* amd64: Store full 64bit of FIP/FDP for 64bit processes when using XSAVE.Konstantin Belousov2020-10-031-0/+1
| | | | | | | | | | | | | | | | | If current process is 64bit, use rex-prefixed version of XSAVE (XSAVE64). If current process is 32bit and CPU supports saving segment registers cs/ds in the FPU save area, use non-prefixed variant of XSAVE. Reported and tested by: Michał Górny <mgorny@mgorny@moritz.systems> PR: 250043 Reviewed by: emaste, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26643 Notes: svn path=/head/; revision=366417
* Move ctx_switch_xsave declaration to amd64 md_var.h.Konstantin Belousov2020-10-031-0/+1
| | | | | | | | Sponsored by: The FreeBSD Foundation MFC after: 3 days Notes: svn path=/head/; revision=366415
* Move vm_page_dump bitset array definition to MI codeD Scott Phillips2020-09-211-1/+0
| | | | | | | | | | | | | | | | | | | These definitions were repeated by all architectures, with small variations. Consolidate the common definitons in machine independent code and use bitset(9) macros for manipulation. Many opportunities for deduplication remain in the machine dependent minidump logic. The only intended functional change is increasing the bit index type to vm_pindex_t, allowing the indexing of pages with address of 8 TiB and greater. Reviewed by: kib, markj Approved by: scottl (implicit) MFC after: 1 week Sponsored by: Ampere Computing, Inc. Differential Revision: https://reviews.freebsd.org/D26129 Notes: svn path=/head/; revision=365977
* amd64 pmap: LA57 AKA 5-level pagingKonstantin Belousov2020-08-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since LA57 was moved to the main SDM document with revision 072, it seems that we should have a support for it, and silicons are coming. This patch makes pmap support both LA48 and LA57 hardware. The selection of page table level is done at startup, kernel always receives control from loader with 4-level paging. It is not clear how UEFI spec would adapt LA57, for instance it could hand out control in LA57 mode sometimes. To switch from LA48 to LA57 requires turning off long mode, requesting LA57 in CR4, then re-entering long mode. This is somewhat delicate and done in pmap_bootstrap_la57(). AP startup in LA57 mode is much easier, we only need to toggle a bit in CR4 and load right value in CR3. I decided to not change kernel map for now. Single PML5 entry is created that points to the existing kernel_pml4 (KML4Phys) page, and a pml5 entry to create our recursive mapping for vtopte()/vtopde(). This decision is motivated by the fact that we cannot overcommit for KVA, so large space there is unusable until machines start providing wider physical memory addressing. Another reason is that I do not want to break our fragile autotuning, so the KVA expansion is not included into this first step. Nice side effect is that minidumps are compatible. On the other hand, (very) large address space is definitely immediately useful for some userspace applications. For userspace, numbering of pte entries (or page table pages) is always done for 5-level structures even if we operate in 4-level mode. The pmap_is_la57() function is added to report the mode of the specified pmap, this is done not to allow simultaneous 4-/5-levels (which is not allowed by hw), but to accomodate for EPT which has separate level control and in principle might not allow 5-leve EPT despite x86 paging supports it. Anyway, it does not seems critical to have 5-level EPT support now. Tested by: pho (LA48 hardware) Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25273 Notes: svn path=/head/; revision=364527
* amd64: move pcb out of kstack to struct thread.Konstantin Belousov2019-10-251-0/+1
| | | | | | | | | | | | | | | | | | | | This saves 320 bytes of the precious stack space. The only negative aspect of the change I can think of is that the struct thread increased by 320 bytes obviously, and that 320 bytes are not swapped out anymore. I believe the freed stack space is much more important than that. Also, current struct thread size is 1392 bytes on amd64, so UMA will allocate two thread structures per (4KB) slab, which leaves a space for pcb without increasing zone memory use. Reviewed by: alc, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D22138 Notes: svn path=/head/; revision=354095
* amd64: rework PCPU allocationKonstantin Belousov2019-08-241-0/+3
| | | | | | | | | | | | | | | | | | Move pcpu KVA out of .bss into dynamically allocated VA at pmap_bootstrap(). This avoids demoting superpage mapping .data/.bss. Also it makes possible to use pmap_qenter() for installation of domain-local pcpu page on NUMA configs. Refactor pcpu and IST initialization by moving it to helper functions. Reviewed by: markj Tested by: pho Discussed with: jeff Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21320 Notes: svn path=/head/; revision=351457
* Add pci_early function to detect Intel stolen memory.Konstantin Belousov2018-10-311-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some Intel devices BIOS does not properly reserve memory (called "stolen memory") for the GPU. If the stolen memory is claimed by the OS, functions that depend on stolen memory (like frame buffer compression) can't be used. A function called pci_early_quirks that is called before the virtual memory system is started was added. In Linux, this PCI early quirks function iterates through all PCI slots to check for any device that require quirks. While this more generic solution is preferable I only ported the Intel graphics specific parts because I think my implementation would be too similar to Linux GPL'd solution after looking at the Linux code too much. The code regarding Intel graphics stolen memory was ported from Linux. In the case of Intel graphics stolen memory this pci_early_quirks will read the stolen memory base and size from north bridge registers. The values are stored in global variables that is later read by linuxkpi_gplv2. Linuxkpi stores these values in a Linux-specific structure that is read by the drm driver. Relevant linuxkpi code is here: https://github.com/FreeBSDDesktop/kms-drm/blob/drm-v4.16/linuxkpi/gplv2/src/linux_compat.c#L37 For now, only amd64 arch is suppor ted since that is the only arch supported by the new drm drivers. I was told that Intel GPUs are always located on 0:2:0 so these values are hard coded for now. Note that the structure and early execution of the detection code is not required in its current form, but we expect that the code will be added shortly which fixes the potential BIOS bugs by reserving the stolen range in phys_avail[]. This must be done as early as possible to avoid conflicts with the potential usage of the memory in kernel. Submitted by: Johannes Lundberg <johalun0@gmail.com> Reviewed by: bwidawsk, imp MFC after: 1 week Differential revision: https://reviews.freebsd.org/D16719 Differential revision: https://reviews.freebsd.org/D17775 Notes: svn path=/head/; revision=339979
* amd64: flush L1 data cache on syscall return with an error.Konstantin Belousov2018-10-201-0/+4
| | | | | | | | | | | | | | | | | The knob allows to select the flushing mode or turn it off/on. The idea, as well as the list of the ignored syscall errors, were taken from https://www.openwall.com/lists/kernel-hardening/2018/10/11/10 . I was not able to measure statistically significant difference between flush enabled vs disabled using syscall_timing getuid. Reviewed by: bwidawsk Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D17536 Notes: svn path=/head/; revision=339507
* Update L1TF workaround to sustain L1D pollution from NMI.Konstantin Belousov2018-08-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Current mitigation for L1TF in bhyve flushes L1D either by an explicit WRMSR command, or by software reading enough uninteresting data to fully populate all lines of L1D. If NMI occurs after either of methods is completed, but before VM entry, L1D becomes polluted with the cache lines touched by NMI handlers. There is no interesting data which NMI accesses, but something sensitive might be co-located on the same cache line, and then L1TF exposes that to a rogue guest. Use VM entry MSR load list to ensure atomicity of L1D cache and VM entry if updated microcode was loaded. If only software flush method is available, try to help the bhyve sw flusher by also flushing L1D on NMI exit to kernel mode. Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D16790 Notes: svn path=/head/; revision=338068
* Add Intel Spec Store Bypass Disable control.Konstantin Belousov2018-05-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Speculative Store Bypass (SSB) is a speculative execution side channel vulnerability identified by Jann Horn of Google Project Zero (GPZ) and Ken Johnson of the Microsoft Security Response Center (MSRC) https://bugs.chromium.org/p/project-zero/issues/detail?id=1528. Updated Intel microcode introduces a MSR bit to disable SSB as a mitigation for the vulnerability. Introduce a sysctl hw.spec_store_bypass_disable to provide global control over the SSBD bit, akin to the existing sysctl that controls IBRS. The sysctl can be set to one of three values: 0: off 1: on 2: auto Future work will enable applications to control SSBD on a per-process basis (when it is not enabled globally). SSBD bit detection and control was verified with prerelease microcode. Security: CVE-2018-3639 Tested by: emaste (previous version, without updated microcode) Sponsored by: The FreeBSD Foundation MFC after: 3 days Notes: svn path=/head/; revision=334005
* Move the CR0.WP manipulation KPI to x86.Konstantin Belousov2018-03-201-2/+0
| | | | | | | | | | | This should allow to avoid some #ifdefs in the common x86/ code. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=331258
* Provide KPI for handling of rw/ro kernel text.Konstantin Belousov2018-03-201-0/+2
| | | | | | | | | | | | | | | | This is a pure syntax patch to create an interface to enable and later restore write access to the kernel text and other read-only mapped regions. It is in line with e.g. vm_fault_disable_pagefaults() by allowing the nesting. Discussed with: Peter Lei <peter.lei@ieee.org> Reviewed by: jtl Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14768 Notes: svn path=/head/; revision=331252
* IBRS support, AKA Spectre hardware mitigation.Konstantin Belousov2018-01-311-0/+1
| | | | | | | | | | | | | | | | | | | It is coded according to the Intel document 336996-001, reading of the patches posted on lkml, and some additional consultations with Intel. For existing processors, you need a microcode update which adds IBRS CPU features, and to manually enable it by setting the tunable/sysctl hw.ibrs_disable to 0. Current status can be checked in sysctl hw.ibrs_active. The mitigation might be inactive if the CPU feature is not patched in, or if CPU reports that IBRS use is not required, by IA32_ARCH_CAP_IBRS_ALL bit. Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D14029 Notes: svn path=/head/; revision=328625
* Move the kernphys declaration to machine/md_var.h.Konstantin Belousov2018-01-181-0/+6
| | | | | | | | | | | | Apparently machinde/cpu.h is supposed to contain MD implementations of MI interfaces. Also, remove kernphys declaration from machdep.c, since it is already provided by md_var.h. Requested and reviewed by: bde MFC after: 13 days Notes: svn path=/head/; revision=328128
* Move the hardware setup for fast syscalls into a common function.Konstantin Belousov2018-01-111-0/+1
| | | | | | | | | Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=327818
* sys: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326023
* Lower the amd64 shared page, which contains the signal trampoline,Don Lewis2017-08-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | from the top of user memory to one page lower on machines with the Ryzen (AMD Family 17h) CPU. This pushes ps_strings and the stack down by one page as well. On Ryzen there is some sort of interaction between code running at the top of user memory address space and interrupts that can cause FreeBSD to either hang or silently reset. This sounds similar to the problem found with DragonFly BSD that was fixed with this commit: https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20 but our signal trampoline location was already lower than the address that DragonFly moved their signal trampoline to. It also does not appear to be related to SMT as described here: https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498 "Hi, Matt Dillon here. Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations. In a nutshell, for any given hyperthread pair, if one hyperthread is in a cpu-bound loop of any kind (can be in user mode), and the other hyperthread is returning from an interrupt via IRETQ, the hyperthread issuing the IRETQ can stall indefinitely until the other hyperthread with the cpu-bound loop pauses (aka HLT until next interrupt). After this situation occurs, the system appears to destabilize. The situation does not occur if the cpu-bound loop is on a different core than the core doing the IRETQ. The %rip the IRETQ returns to (e.g. userland %rip address) matters a *LOT*. The problem occurs more often with high %rip addresses such as near the top of the user stack, which is where DragonFly's signal trampoline traditionally resides. So a user program taking a signal on one thread while another thread is cpu-bound can cause this behavior. Changing the location of the signal trampoline makes it more difficult to reproduce the problem. I have not been because the able to completely mitigate it. When a cpu-thread stalls in this manner it appears to stall INSIDE the microcode for IRETQ. It doesn't make it to the return pc, and the cpu thread cannot take any IPIs or other hardware interrupts while in this state." since the system instability has been observed on FreeBSD with SMT disabled. Interrupts to appear to play a factor since running a signal-intensive process on the first CPU core, which handles most of the interrupts on my machine, is far more likely to trigger the problem than running such a process on any other core. Also lower sv_maxuser to prevent a malicious user from using mmap() to load and execute code in the top page of user memory that was made available when the shared page was moved down. Make the same changes to the 64-bit Linux emulator. PR: 219399 Reported by: nbe@renzel.net Reviewed by: kib Reviewed by: dchagin (previous version) Tested by: nbe@renzel.net (earlier version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11780 Notes: svn path=/head/; revision=321899
* On amd64, declare sse2_pagezero() and start using it again, but onlyBruce Evans2016-08-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for zeroing pages in idle where nontemporal writes are clearly best. This is almost a no-op since zeroing in idle works does nothing good and is off by default. Fix END() statement forgotten in previous commit. Align the loop in sse2_pagezero(). Since it writes to main memory, the loop doesn't have to be very carefully written to keep up. Unrolling it was considered useless or harmful and was not done on i386, but that was too careless. Timing for i386: the loop was not unrolled at all, and moved only 4 bytes/iteration. So on a 2GHz CPU, it needed to run at 2 cycles/ iteration to keep up with a memory speed of just 4GB/sec. But when it crossed a 16-byte boundary, on old CPUs it ran at 3 cycles/ iteration so it gave a maximum speed of 2.67GB/sec and couldn't even keep up with PC3200 memory. Fix the alignment so that it keep up with 4GB/sec memory, and unroll once to get nearer to 8GB/sec. Further unrolling might be useless or harmful since it would prevent the loop fitting in 16-bytes. My test system with an old CPU and old DDR1 only needed 5+ GB/sec. My test system with a new CPU and DDR3 doesn't need any changes to keep up ~16GB/sec. Timing for amd64: with 8-byte accesses and newer faster CPUs it is easy to reach 16GB/sec but not so easy to go much faster. The alignment doesn't matter much if the CPU is not very old. The loop was already unrolled 4 times, but needs 32 bytes and uses a fancy method that doesn't work for 2-way unrolling in 16 bytes. Just align it to 32-bytes. Notes: svn path=/head/; revision=305004
* Type of the interrupt handlers on x86 cannot be expressed in C.Konstantin Belousov2016-03-291-3/+0
| | | | | | | | | | | Simplify and unify placeholder type definitions. Reviewed by: jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D5771 Notes: svn path=/head/; revision=297399
* Remove redundant ctx_switch_xsave declaration in sys/amd64/include/md_var.hEnji Cooper2015-12-221-1/+0
| | | | | | | | | | | | | | | This variable was added to sys/x86/include/x86_var.h recently. This unbreaks building kernel source that #includes both md_var.h and x86_var.h with gcc 4.2.1 on amd64 Differential Revision: https://reviews.freebsd.org/D4686 Reviewed by: kib X-MFC with: r291949 Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=292619
* Merge common parts of i386 and amd64 md_var.h and smp.h intoKonstantin Belousov2015-12-071-72/+5
| | | | | | | | | | | new headers x86/include x86_var.h and x86_smp.h. Reviewed by: emaste, jhb Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D4358 Notes: svn path=/head/; revision=291949
* Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUIDKonstantin Belousov2015-08-031-0/+1
| | | | | | | | | | | | | | | | | reported, on APs. We already did this on BSP. Otherwise, the userspace software which depends on the features reported by the high CPUID levels is misbehaving. In particular, AVX detection is non-functional, depending on which CPU thread happens to execute when doing CPUID. Another victim is the libthr signal handlers interposer, which needs to save full FPU extended state. Reported and tested by: Andre Meiser <ortadur@web.de> Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=286228
* Update print_INTEL_TLB() by the tag values from the Intel SDMKonstantin Belousov2015-06-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rev. 55. The modern CPUs cache and TLB descriptions looked quite questionable without the update, e.g. Haswell i7 4770S reported: Data TLB: 4 KB pages, 4-way set associative, 64 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line After the update, the report is: Data TLB: 1 GByte pages, 4-way set associative, 4 entries Data TLB: 4 KB pages, 4-way set associative, 64 entries Instruction TLB: 2M/4M pages, fully associative, 8 entries Instruction TLB: 4KByte pages, 8-way set associative, 64 entries 64-Byte prefetching Shared 2nd-Level TLB: 4 KByte/2MByte pages, 8-way associative, 1024 entries L2 cache: 256 kbytes, 8-way associative, 64 bytes/line Some tags were apparently removed from the table 3-21, Vol. 2A. Keep them around, but add a comment stating the removal. Update the format line for cpu_stdext_feature according to the bits from the SDM rev.55. It appears that Haswells do not store %cs and %ds values in the FPU save area. Store content of the %ecx register from the CPUID leaf 0x7 subleaf 0 as cpu_stdext_feature2 and print defined bits from it, again acording to SDM rev. 55. Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=284104
* If x86 CPU implementation of the MWAIT instruction reasonablyKonstantin Belousov2015-05-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | interacts with interrupts, query ACPI and use MWAIT for entrance into Cx sleep states. Support C1 "I/O then halt" mode. See Intel' document 302223-007 "Intelб╝ Processor Vendor-Specific ACPI Interface Specification" for description. Move the acpi_cpu_c1() function into x86/cpu_machdep.c and use it instead of inlining "sti; hlt" sequence in several places. In the acpi(4) man page, besides documenting the dev.cpu.N.cx_methods sysctl, correct the names for dev.cpu.N.{cx_usage,cx_lowest,cx_supported} sysctls. Both jkim and avg have some other patches implementing the mwait functionality; this work is unrelated. Linux does not rely on the ACPI to provide correct tables describing Cx modes. Instead, the driver has pre-defined knowledge of the CPU models, it was supplied by Intel. Tested by: pho (previous versions) Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=282678
* Move some common code from sys/amd64/amd64/machdep.c andKonstantin Belousov2015-04-221-0/+1
| | | | | | | | | | | sys/i386/i386/machdep.c to new file sys/x86/x86/cpu_machdep.c. Most of the code is related to the idle handling. Discussed with: pluknet Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=281851
* For x86, read MAXPHYADDR, defined in SDM vol 3 4.1.4 Enumeration of PagingKonstantin Belousov2015-01-121-0/+1
| | | | | | | | | | | | Features by CPUID as CPUID.80000008H:EAX[7:0], into variable cpu_maxphyaddr. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=277047
* Rework virtual machine hypervisor detection.John Baldwin2014-10-281-0/+2
| | | | | | | | | | | | | | | | | | - Move the existing code to x86/x86/identcpu.c since it is x86-specific. - If the CPUID2_HV flag is set, assume a hypervisor is present and query the 0x40000000 leaf to determine the hypervisor vendor ID. Export the vendor ID and the highest supported hypervisor CPUID leaf via hv_vendor[] and hv_high variables, respectively. The hv_vendor[] array is also exported via the hw.hv_vendor sysctl. - Merge the VMWare detection code from tsc.c into the new probe in identcpu.c. Add a VM_GUEST_VMWARE to identify vmware and use that in the TSC code to identify VMWare. Differential Revision: https://reviews.freebsd.org/D1010 Reviewed by: delphij, jkim, neel Notes: svn path=/head/; revision=273800
* Pass up the error status of minidumpsys() to its callers.Mark Johnston2014-10-081-1/+1
| | | | | | | | | PR: 193761 Submitted by: Conrad Meyer <conrad.meyer@isilon.com> Sponsored by: EMC / Isilon Storage Division Notes: svn path=/head/; revision=272766
* - Move prototypes for various functions into out of C files and intoJohn Baldwin2014-09-041-0/+3
| | | | | | | | | | | | | <machine/md_var.h>. - Move some CPU-related variables out of i386/i386/identcpu.c to initcpu.c to match amd64. - Move the declaration of has_f00f_hack out of identcpu.c to machdep.c. - Remove a misleading comment from i386/i386/initcpu.c (locore zeros the BSS before it calls identify_cpu()) and remove explicit zero assignments to reduce the diff with amd64. Notes: svn path=/head/; revision=271076
* Move fpusave() wrapper for suspend hander to sys/amd64/amd64/fpu.c.Jung-uk Kim2014-03-041-1/+0
| | | | | | | Inspired by: jhb Notes: svn path=/head/; revision=262752
* x86: detect mwait capabilities and extensions, when presentAndriy Gapon2013-07-281-0/+3
| | | | | | | | Reviewed by: kib (earlier amd64-only version) MFC after: 2 weeks Notes: svn path=/head/; revision=253747
* Fix the hardware watchpoints on SMP amd64. Load the updated %drKonstantin Belousov2013-05-211-0/+1
| | | | | | | | | | | | | | registers also on other CPUs, besides the CPU which happens to execute the ddb. The debugging registers are stored in the pcpu area, together with the command which is executed by the IPI stop handler upon resume. Reviewed by: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=250851
* Provide the reading and display of the Standard Extended Features,Konstantin Belousov2012-11-011-0/+1
| | | | | | | | | | | introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers. Tested by: avg, Michael Moll <kvedulv@kvedulv.de> MFC after: 2 weeks Notes: svn path=/head/; revision=242432
* Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usageKonstantin Belousov2012-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out. The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise. For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd. MFC after: 1 month Notes: svn path=/head/; revision=238450
* Add support for the extended FPU states on amd64, both for nativeKonstantin Belousov2012-01-211-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 64bit and 32bit ABIs. As a side-effect, it enables AVX on capable CPUs. In particular: - Query the CPU support for XSAVE, list of the supported extensions and the required size of FPU save area. The hw.use_xsave tunable is provided for disabling XSAVE, and hw.xsave_mask may be used to select the enabled extensions. - Remove the FPU save area from PCB and dynamically allocate the (run-time sized) user save area on the top of the kernel stack, right above the PCB. Reorganize the thread0 PCB initialization to postpone it after BSP is queried for save area size. - The dumppcb, stoppcbs and susppcbs now do not carry the FPU state as well. FPU state is only useful for suspend, where it is saved in dynamically allocated suspfpusave area. - Use XSAVE and XRSTOR to save/restore FPU state, if supported and enabled. - Define new mcontext_t flag _MC_HASFPXSTATE, indicating that mcontext_t has a valid pointer to out-of-struct extended FPU state. Signal handlers are supplied with stack-allocated fpu state. The sigreturn(2) and setcontext(2) syscall honour the flag, allowing the signal handlers to inspect and manipilate extended state in the interrupted context. - The getcontext(2) never returns extended state, since there is no place in the fixed-sized mcontext_t to place variable-sized save area. And, since mcontext_t is embedded into ucontext_t, makes it impossible to fix in a reasonable way. Instead of extending getcontext(2) syscall, provide a sysarch(2) facility to query extended FPU state. - Add ptrace(2) support for getting and setting extended state; while there, implement missed PT_I386_{GET,SET}XMMREGS for 32bit binaries. - Change fpu_kern KPI to not expose struct fpu_kern_ctx layout to consumers, making it opaque. Internally, struct fpu_kern_ctx now contains a space for the extended state. Convert in-kernel consumers of fpu_kern KPI both on i386 and amd64. First version of the support for AVX was submitted by Tim Bird <tim.bird am sony com> on behalf of Sony. This version was written from scratch. Tested by: pho (previous version), Yamagi Burmeister <lists yamagi org> MFC after: 1 month Notes: svn path=/head/; revision=230426
* Put amd64_syscall() prototype in md_var.h.Konstantin Belousov2011-09-151-0/+1
| | | | | | | | | | Requested by: jhb Reviewed by: alc, jhb Approved by: re (bz) MFC after: 2 weeks Notes: svn path=/head/; revision=225576
* Handle a case when non-canonical address is loaded into the fsbase orKonstantin Belousov2010-04-101-0/+4
| | | | | | | | | gsbase MSR. MFC after: 3 days Notes: svn path=/head/; revision=206459
* Implement AMD's recommended workaround for Erratum 383 on Family 10hAlan Cox2010-03-091-0/+1
| | | | | | | | | | | | | | | | processors. With this workaround, superpage promotion can be re-enabled under virtualization. Moreover, machine check exceptions can safely be enabled when FreeBSD is running natively on Family 10h processors. Most of the credit should go to Andriy Gapon for diagnosing the error and working with Borislav Petkov at AMD to document it. Andriy also reviewed and tested my patches. Discussed with: jhb MFC after: 3 weeks Notes: svn path=/head/; revision=204907
* Amd64 init_secondary() calls initializecpu() while curthread is stillKonstantin Belousov2009-11-131-0/+1
| | | | | | | | | | | | | | | | | | | not properly set up. r199067 added the call to TUNABLE_INT_FETCH() to initializecpu() that results in hang because AP are started when kernel environment is already dynamic and thus needs to acquire mutex, that is too early in AP start sequence to work. Extract the code that should be executed only once, because it sets up global variables, from initializecpu() to initializecpucache(), and call the later only from hammer_time() executed on BSP. Now, TUNABLE_INT_FETCH() is done only once at BSP at the early boot stage. In collaboration with: Mykola Dzham <freebsd levsha org ua> Reviewed by: jhb Tested by: ed, battlez Notes: svn path=/head/; revision=199253
* When the page caching attributes are changed, after new mapping isKonstantin Belousov2009-07-221-0/+1
| | | | | | | | | | | | | | | | | | established, OS shall flush the caches on all processors that may have used the mapping previously. This operation is not needed if processors support self-snooping. If not, but clflush instruction is implemented on the CPU, series of the clflush can be used on the mapping region. Otherwise, we have to flush the whole cache. The later operation is very expensive, and AMD-made CPUs do not have self-snooping. Implement cache flush for remapped region by using clflush for amd64, when supported by CPU. Proposed and reviewed by: alc Approved by: re (kensmith) Notes: svn path=/head/; revision=195820
* Save and restore segment registers on amd64 when entering and leavingKonstantin Belousov2009-04-011-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the kernel on amd64. Fill and read segment registers for mcontext and signals. Handle traps caused by restoration of the invalidated selectors. Implement user-mode creation and manipulation of the process-specific LDT descriptors for amd64, see sysarch(2). Implement support for TSS i/o port access permission bitmap for amd64. Context-switch LDT and TSS. Do not save and restore segment registers on the context switch, that is handled by kernel enter/leave trampolines now. Remove segment restore code from the signal trampolines for freebsd/amd64, freebsd/ia32 and linux/i386 for the same reason. Implement amd64-specific compat shims for sysarch. Linuxolator (temporary ?) switched to use gsbase for thread_area pointer. TODO: Currently, gdb is not adapted to show segment registers from struct reg. Also, no machine-depended ptrace command is added to set segment registers for debugged process. In collaboration with: pho Discussed with: peter Reviewed by: jhb Linuxolator tested by: dchagin Notes: svn path=/head/; revision=190620
* Add basic amd64 support for VIA Nano processors.Jung-uk Kim2009-01-121-0/+2
| | | | Notes: svn path=/head/; revision=187109