path: root/sys/amd64/amd64/trap.c
Commit message | Author | Age | Files | Lines
* Get rid of sa->narg. It serves no purpose; use sa->callp->sy_narg instead. | Edward Tomasz Napierala | 2020-09-27 | 1 | -7/+7
  Reviewed by: kib
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D26458
  Notes: svn path=/head/; revision=366205
* amd64: clean up empty lines in .c and .h files | Mateusz Guzik | 2020-09-01 | 1 | -1/+0
  Notes: svn path=/head/; revision=365067
* Restore workaround for sysret fault on non-canonical address after LA57. | Konstantin Belousov | 2020-08-24 | 1 | -1/+2
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=364734
* Untie nmi_handle_intr() from DEV_ISA. | Alexander Motin | 2020-07-22 | 1 | -4/+0
  The only part of nmi_handle_intr() depending on ISA is isa_nmi(), which is already wrapped. Entering debugger on NMI does not really depend on ISA.
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=363431
* Retire procfs-based process debugging. | John Baldwin | 2020-04-01 | 1 | -1/+0
  Modern debuggers and process tracers use ptrace() rather than procfs for debugging. ptrace() has a superset of the functionality available via procfs, and new debugging features are only added to ptrace(). While the two debugging services share some fields in struct proc, they each use dedicated fields and separate code. This results in extra complexity to support a feature that hasn't been enabled in the default install for several years.
  PR: 244939 (exp-run)
  Reviewed by: kib, mjg (earlier version)
  Relnotes: yes
  Differential Revision: https://reviews.freebsd.org/D23837
  Notes: svn path=/head/; revision=359530
* amd64: remove no longer needed atomic_load_ptr casts | Mateusz Guzik | 2020-02-14 | 1 | -2/+2
  Notes: svn path=/head/; revision=357942
* amd64: only check for error != 0 in the inlined part of l1d flush check | Mateusz Guzik | 2020-02-14 | 1 | -8/+15
  this replaces the following near the syscall exit:
    cmp $0x39,%rax
    ja 0xffffffff8108f82c
    movabs $0x200001800060005,%rcx
    bt %rax,%rcx
    jae 0xffffffff8108f82c
  with:
    test %edi,%edi
    jne 0xffffffff8091a49c
  Notes: svn path=/head/; revision=357914
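The old inlined sequence quoted in this commit is a range check followed by a bit test against a 64-bit mask, while the new one only asks whether the error is non-zero. The sketch below restates both checks in C. The mask and the 0x39 bound are copied from the disassembly; the function and macro names are hypothetical and are not taken from trap.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask and upper bound copied from the disassembly in the commit message. */
#define ERR_MASK 0x200001800060005ULL
#define ERR_MAX  0x39

/*
 * Hypothetical helper: true when `err` is one of the values whose bit is
 * set in ERR_MASK. This mirrors the "cmp/ja" range check followed by the
 * "bt/jae" bit test. The range check also keeps the shift count below 64.
 */
static bool
err_in_mask(uint64_t err)
{
	return (err <= ERR_MAX && ((ERR_MASK >> err) & 1) != 0);
}

/* The replacement inlined check ("test/jne"): is there any error at all? */
static bool
err_nonzero(int err)
{
	return (err != 0);
}
```

The set bits of the mask are 0, 2, 17, 18, 35, 36 and 57, so the old inline code had to materialize a 64-bit constant on every syscall exit. The new form leaves only a single test in the hot path.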
* amd64: remove redundant sa->code assignment from cpu_fetch_syscall_args_fallback | Mateusz Guzik | 2020-02-11 | 1 | -2/+0
  It is already set in the only caller.
  Notes: svn path=/head/; revision=357767
* Reimplement stack capture of running threads on i386 and amd64. | Mark Johnston | 2020-01-31 | 1 | -6/+0
  After r355784 the td_oncpu field is no longer synchronized by the thread lock, so the stack capture interrupt cannot be delivered precisely. Fix this using a loop which drops the thread lock and restarts if the wrong thread was sampled from the stack capture interrupt handler.
  Change the implementation to use a regular interrupt instead of an NMI. Now that we drop the thread lock, there is no advantage to the latter.
  Simplify the KPIs. Remove stack_save_td_running() and add a return value to stack_save_td(). On platforms that do not support stack capture of running threads, stack_save_td() returns EOPNOTSUPP. If the target thread is running in user mode, stack_save_td() returns EBUSY.
  Reviewed by: kib
  Reported by: mjg, pho
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D23355
  Notes: svn path=/head/; revision=357334
* amd64: move GDT into PCPU area. | Konstantin Belousov | 2019-11-12 | 1 | -2/+3
  Reviewed by: jhb, markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D22302
  Notes: svn path=/head/; revision=354646
* Improve MD page fault handlers. | Konstantin Belousov | 2019-09-27 | 1 | -62/+44
  Centralize calculation of signal and ucode delivered on unhandled page fault in new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in consistent MI way.
  This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations.
  Change the delivered signal for accesses to mapped area after the backing object was truncated. According to POSIX description for mmap(2):
    The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.
    An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition.
  Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures.
  For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation.
  vm_fault_hold() is renamed to vm_fault().
  Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware.
  PR: 211924
  Reviewed by: alc
  Discussed with: jilles, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D21566
  Notes: svn path=/head/; revision=352807
* Don't pass error from syscallenter() to syscallret(). | John Baldwin | 2019-07-15 | 1 | -4/+3
  syscallret() doesn't use error anymore. Fix a few other places to permit removing the return value from syscallenter() entirely.
  - Remove a duplicated assertion from arm's syscall().
  - Use td_errno for amd64_syscall_ret_flush_l1d.
  Reviewed by: kib
  MFC after: 1 month
  Sponsored by: DARPA
  Differential Revision: https://reviews.freebsd.org/D2090
  Notes: svn path=/head/; revision=350013
* Make trap_msg array constant as well. | Konstantin Belousov | 2019-06-08 | 1 | -1/+1
  Suggested by: tijl
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=348813
* amd64 trap.c: Modernize syntax around trap_msg[]. | Konstantin Belousov | 2019-06-08 | 1 | -45/+39
  Convert the array to use C99 initializers. Make it constant. Replace MAX_TRAP_MSG with nitems().
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=348798
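The conversion described in this commit can be illustrated with a small table. This is only a sketch of the C99 designated-initializer pattern plus an nitems()-style bound. The three trap numbers and message strings are illustrative stand-ins, and trap_name() is a hypothetical helper, not code from trap.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative trap numbers; the real definitions live in <machine/trap.h>. */
#define T_PRIVINFLT 1
#define T_BPTFLT    3
#define T_PAGEFLT   12

/* Element count of a true array, in the spirit of FreeBSD's nitems(). */
#define nitems(x) (sizeof(x) / sizeof((x)[0]))

/* C99 designated initializers: slots that are not named are implicitly NULL. */
static const char *const trap_msg[] = {
	[T_PRIVINFLT] = "privileged instruction fault",
	[T_BPTFLT] = "breakpoint instruction fault",
	[T_PAGEFLT] = "page fault",
};

/* Hypothetical lookup helper; the bound comes from nitems(), not a macro. */
static const char *
trap_name(size_t type)
{
	if (type < nitems(trap_msg) && trap_msg[type] != NULL)
		return (trap_msg[type]);
	return ("unknown/reserved trap");
}
```

Because the array size is derived from the highest index used, adding an entry cannot leave a separately maintained MAX_TRAP_MSG-style constant out of date.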
* Fix a race between fasttrap and the user breakpoint handler. | Mark Johnston | 2019-06-06 | 1 | -6/+27
  When disabling the last enabled userspace probe, fasttrap clears the function pointers which hook in to the breakpoint handler. If a traced thread hit a fasttrap breakpoint before it was removed, we must ensure that it is able to call the hook; otherwise fasttrap will not consume the trap and SIGTRAP will be delivered to the thread.
  Synchronize with such threads by ensuring that they load the hook pointer with interrupts disabled, and by completing an SMP rendezvous after removing breakpoints and before clearing the pointers.
  Reported by: Alexander Alexeev <Alexander.Alexeev@dell.com>
  Tested by: Alexander Alexeev (earlier version)
  Reviewed by: cem, kib
  MFC after: 1 week
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D20526
  Notes: svn path=/head/; revision=348742
* amd64 pmap: rework delayed invalidation, removing global mutex. | Konstantin Belousov | 2019-05-16 | 1 | -1/+1
  For machines having the cmpxchg16b instruction, i.e. everything but very early Athlons, provide a lockless implementation of delayed invalidation.
  The implementation maintains a lock-less singly-linked list with the trick from the T.L. Harris article about volatile mark of the elements being removed. Double-CAS is used to atomically update both link and generation. A new thread starting DI appends itself to the end of the queue, setting the generation to the generation of the last element +1. On DI finish, the thread donates its generation to the previous element. The generation of the fake head of the list is the last passed DI generation. Basically, the implementation is a queued spinlock but without spinlock.
  Many thanks both to Peter Holm and Mark Johnson for keeping with me while I produced intermediate versions of the patch.
  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 month
  MFC note: td_md.md_invl_gen should go to the end of struct thread
  Differential revision: https://reviews.freebsd.org/D19630
  Notes: svn path=/head/; revision=347695
* Fix formatting. | Mark Johnston | 2019-05-14 | 1 | -2/+2
  MFC after: 3 days
  Notes: svn path=/head/; revision=347564
* Add kernel support for Intel userspace protection keys feature on Skylake Xeons. | Konstantin Belousov | 2019-02-20 | 1 | -0/+15
  See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the RDPKRU and WRPKRU instructions.
  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D18893
  Notes: svn path=/head/; revision=344353
* amd64: add defines and decode protection keys and SGX page faults reasons. | Konstantin Belousov | 2019-02-20 | 1 | -1/+3
  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D18893
  Notes: svn path=/head/; revision=344352
* Remove iBCS2, part2: general kernel | Mateusz Guzik | 2018-12-19 | 1 | -5/+0
  Reviewed by: kib (previous version)
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=342243
* Don't enter DDB for fatal traps before panic by default. | John Baldwin | 2018-11-01 | 1 | -1/+1
  Add a new 'debugger_on_trap' knob separate from 'debugger_on_panic' and make the calls to kdb_trap() in MD fatal trap handlers prior to calling panic() conditional on this new knob instead of 'debugger_on_panic'. Disable the new knob by default. Developers who wish to recover from a fatal fault by adjusting saved register state and retrying the faulting instruction can still do so by enabling the new knob. However, for the more common case this makes the user experience for panics due to a fatal fault match the user experience for other panics, e.g. 'c' in DDB will generate a crash dump and reboot the system rather than being stuck in an infinite loop of fatal fault messages and DDB prompts.
  Reviewed by: kib, avg
  MFC after: 2 months
  Sponsored by: Chelsio Communications
  Differential Revision: https://reviews.freebsd.org/D17768
  Notes: svn path=/head/; revision=340020
* amd64: flush L1 data cache on syscall return with an error. | Konstantin Belousov | 2018-10-20 | 1 | -0/+80
  The knob allows selecting the flushing mode or turning it off/on. The idea, as well as the list of the ignored syscall errors, were taken from https://www.openwall.com/lists/kernel-hardening/2018/10/11/10 .
  I was not able to measure a statistically significant difference between flush enabled vs disabled using syscall_timing getuid.
  Reviewed by: bwidawsk
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D17536
  Notes: svn path=/head/; revision=339507
* amd64: partially depessimize cpu_fetch_syscall_args and cpu_set_syscall_retval | Mateusz Guzik | 2018-10-13 | 1 | -14/+46
  The vast majority of syscalls take 6 or fewer arguments. Move handling of other cases to a fallback function. Similarly, special casing for _syscall and __syscall magic syscalls is moved away.
  Return is almost always 0. The change replaces 3 branches with 1 in the common case. Also the 'frame' variable convinces clang not to reload it on each access.
  Reviewed by: kib
  Approved by: re (gjb)
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D17542
  Notes: svn path=/head/; revision=339349
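The split into an inlined common case and an out-of-line fallback can be sketched as follows. This is a simplified userspace model under stated assumptions, not the kernel code: the struct, the function names, the MAXARGS limit and the plain memcpy() from a `stack` array (standing in for fetching extra arguments from user memory) are all hypothetical.

```c
#include <assert.h>
#include <string.h>

#define NREGARGS 6	/* arguments assumed to arrive in registers */
#define MAXARGS  8	/* hypothetical upper bound for this sketch */

struct args {
	long v[MAXARGS];
};

/* Rare case, kept out of the hot path: pull the extra arguments too. */
static int
fetch_args_fallback(struct args *a, const long *regs, const long *stack,
    int narg)
{
	if (narg > MAXARGS)
		return (-1);
	memcpy(a->v, regs, NREGARGS * sizeof(long));
	memcpy(a->v + NREGARGS, stack, (narg - NREGARGS) * sizeof(long));
	return (0);
}

/* Common case: one branch, then a copy whose size is a compile-time constant. */
static int
fetch_args(struct args *a, const long *regs, const long *stack, int narg)
{
	if (narg > NREGARGS)
		return (fetch_args_fallback(a, regs, stack, narg));
	memcpy(a->v, regs, NREGARGS * sizeof(long));
	return (0);
}
```

Copying all six register slots even when fewer are used keeps the common path free of size-dependent branches, which is the same idea as the earlier bcopy cleanup further down this log.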
* Don't clear DR6 for debug exceptions from userland. | John Baldwin | 2018-09-27 | 1 | -6/+0
  This reverts part of r333368. The attempt to clear DR6 was occurring too soon as trapsignal() does not pause to let the debugger notice the SIGTRAP and query DR6. The signal exchange does not occur until much later during ast(). As a result, GDB was no longer recognizing hardware breakpoints and watchpoints on x86. In addition, any userland programs that want to inspect DR6 in a SIGTRAP handler don't have a way to do this if we clear DR6 in the exception handler.
  Instead of relying on the kernel to clear DR6, debuggers will have to explicitly clear it after a trace trap (which they needed to do on older kernels anyway).
  Reviewed by: kib
  Approved by: re (delphij)
  MFC after: 3 days
  Differential Revision: https://reviews.freebsd.org/D17319
  Notes: svn path=/head/; revision=338976
* Make the PTI violation check follow the style of the SMAP check. | Konstantin Belousov | 2018-09-17 | 1 | -5/+12
  No functional changes.
  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (rgrimes)
  Differential revision: https://reviews.freebsd.org/D17181
  Notes: svn path=/head/; revision=338711
* Remove unneeded new line from the panic string. | Konstantin Belousov | 2018-09-16 | 1 | -1/+1
  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (rgrimes)
  Differential revision: https://reviews.freebsd.org/D17181
  Notes: svn path=/head/; revision=338699
* Swap order of dereferencing PCPU curpmap and checking for usermode in trap_pfault() KPTI violation check. | Konstantin Belousov | 2018-09-02 | 1 | -1/+1
  EFI RT may set curpmap to NULL for the duration of the call for some machines (PCID but no INVPCID). Since apparently EFI RT code must be ready for exceptions from the calls, avoid dereferencing curpmap until we know that this call does not come from usermode.
  Reviewed by: kevans
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (rgrimes)
  Differential revision: https://reviews.freebsd.org/D16972
  Notes: svn path=/head/; revision=338434
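The reordering in this commit relies on C's short-circuit evaluation of `&&`. The following sketch shows the idea in isolation. The reduced struct, the `ucr3` field name, the NO_CR3 constant and the function name are assumptions made for illustration and do not reproduce the real trap_pfault() condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for the per-CPU current pmap. */
struct pmap {
	unsigned long ucr3;
};

#define NO_CR3 (~0UL)	/* stand-in for a "no user page table" marker */

/*
 * `curpmap` may be NULL while `usermode` is false (e.g. during an EFI
 * runtime call), so the usermode flag is tested first. The && operator
 * short-circuits, so the pointer is never dereferenced in that case.
 */
static bool
pti_check_applies(bool usermode, const struct pmap *curpmap)
{
	return (usermode && curpmap->ucr3 != NO_CR3);
}
```

With the operands in the opposite order, a fault taken while curpmap is NULL would dereference the pointer before the usermode test had a chance to reject it.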
* Update L1TF workaround to sustain L1D pollution from NMI. | Konstantin Belousov | 2018-08-19 | 1 | -0/+14
  Current mitigation for L1TF in bhyve flushes L1D either by an explicit WRMSR command, or by software reading enough uninteresting data to fully populate all lines of L1D. If NMI occurs after either of the methods is completed, but before VM entry, L1D becomes polluted with the cache lines touched by NMI handlers. There is no interesting data which NMI accesses, but something sensitive might be co-located on the same cache line, and then L1TF exposes that to a rogue guest.
  Use VM entry MSR load list to ensure atomicity of L1D cache and VM entry if updated microcode was loaded. If only software flush method is available, try to help the bhyve sw flusher by also flushing L1D on NMI exit to kernel mode.
  Suggested by and discussed with: Andrew Cooper <andrew.cooper3@citrix.com>
  Reviewed by: jhb
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D16790
  Notes: svn path=/head/; revision=338068
* Use SMAP on amd64. | Konstantin Belousov | 2018-07-29 | 1 | -1/+23
  Ifuncs selectors dispatch copyin(9) family to the suitable variant, to set rflags.AC around userspace access. Rflags.AC bit is cleared in all kernel entry points unconditionally even on machines not supporting SMAP.
  Reviewed by: jhb
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D13838
  Notes: svn path=/head/; revision=336876
* Don't bother looking for non-executable pages when a process is excluded from PTI. | Tycho Nightingale | 2018-06-08 | 1 | -1/+2
  Reviewed by: kib
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D15708
  Notes: svn path=/head/; revision=334856
* hwpmc: simplify calling convention for hwpmc interrupt handling | Matt Macy | 2018-06-08 | 1 | -1/+1
  pmc_process_interrupt takes 5 arguments when only 3 are needed. cpu is always available in curcpu and inuserspace can always be derived from the passed trapframe.
  While facially a reasonable cleanup, this change was motivated by the need to work around a compiler bug.
    core2_intr(cpu, tf) ->
      pmc_process_interrupt(cpu, ring, pmc, tf, inuserspace) ->
        pmc_add_sample(cpu, ring, pm, tf, inuserspace)
  In the process of optimizing the tail call the tf pointer was getting clobbered:
    (kgdb) up
    at /storage/mmacy/devel/freebsd/sys/dev/hwpmc/hwpmc_mod.c:4709
    4709 pmc_save_kernel_callchain(ps->ps_pc,
    (kgdb) up
    1205 error = pmc_process_interrupt(cpu, PMC_HR, pm, tf,
  resulting in a crash in pmc_save_kernel_callchain.
  Notes: svn path=/head/; revision=334827
* x86: stop unconditionally clearing PSL_T on the trace trap. | Konstantin Belousov | 2018-05-23 | 1 | -2/+8
  We certainly should clear PSL_T when calling the SIGTRAP signal handler, which is already done by all x86 sendsig(9) ABI code. On the other hand, there is no obvious reason why PSL_T needs to be cleared when returning from the signal handler. For instance, Linux allows userspace to set PSL_T and keep tracing enabled for the desired period. There are userspace programs which would use PSL_T if we make it possible, for instance sbcl.
  Remember if PSL_T was set by PT_STEP or PT_SETSTEP by means of the TDB_STEP flag, and only clear it when the flag is set.
  Discussed with: Ali Mashtizadeh
  Reviewed by: jhb (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D15054
  Notes: svn path=/head/; revision=334122
* Cleanups related to debug exceptions on x86. | John Baldwin | 2018-05-22 | 1 | -25/+30
  - Add constants for fields in DR6 and the reserved fields in DR7. Use these constants instead of magic numbers in most places that use DR6 and DR7.
  - Refer to T_TRCTRAP as "debug exception" rather than a "trace trap" as it is not just for trace exceptions.
  - Always read DR6 for debug exceptions and only clear TF in the flags register for user exceptions where DR6.BS is set.
  - Clear DR6 before returning from a debug exception handler as recommended by the SDM dating all the way back to the 386. This allows debuggers to determine the cause of each exception. For kernel traps, clear DR6 in the T_TRCTRAP case and pass DR6 by value to other parts of the handler (namely, user_dbreg_trap()). For user traps, wait until after trapsignal to clear DR6 so that userland debuggers can read DR6 via PT_GETDBREGS while the thread is stopped in trapsignal().
  Reviewed by: kib, rgrimes
  MFC after: 1 month
  Differential Revision: https://reviews.freebsd.org/D15189
  Notes: svn path=/head/; revision=334009
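The third item in the list above (only clear TF for user exceptions where DR6.BS is set) can be sketched with named constants in place of magic numbers. The bit positions are architectural (BS is bit 14 of DR6, TF is bit 8 of the flags register); the macro and function names below are hypothetical and are not claimed to match the constants this commit added.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR6_BS    (UINT64_C(1) << 14)	/* DR6: single-step status bit */
#define RFLAGS_TF (UINT64_C(1) << 8)	/* flags register: trap flag */

/*
 * Hypothetical helper: the flags value to resume with after a debug
 * exception. TF is cleared only for a user-mode exception that was
 * actually caused by single-stepping (DR6.BS set).
 */
static uint64_t
rflags_after_debug_exception(uint64_t rflags, uint64_t dr6, bool usermode)
{
	if (usermode && (dr6 & DR6_BS) != 0)
		rflags &= ~RFLAGS_TF;
	return (rflags);
}
```

A watchpoint or hardware breakpoint hit sets one of the low DR6 bits instead of BS, so in this model it leaves TF untouched.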
* Prepare DB# handler for deferred trigger of watchpoints. | Konstantin Belousov | 2018-05-08 | 1 | -0/+51
  Since pop %ss/mov %ss instructions defer all interrupts and exceptions for the next instruction, it is possible that the userspace watchpoint trap executes on the first instruction of the kernel entry for syscall/bpt.
  In this case, DB# should be treated similarly to NMI: on amd64 we must always load GSBASE even if the trap comes from kernel mode, and load the kernel page table root into %cr3. Moreover, the trap must use the dedicated stack, because we are still on the user stack when trapped on syscall entry.
  For i386, we must reload %cr3. The syscall instruction is not configured, so there is no issue with executing on user stack when trapping.
  Due to some CPU errata it is not always possible to detect that the userspace watchpoint triggered by inspecting %dr6. In trap(), compare the trap %rip with the known unsafe entry points and if matched pretend that the watchpoint did not fire at all.
  Thank you to the MSRC Incident Response Team, and in particular Greg Lenti and Nate Warfield, for coordinating the response to this issue across multiple vendors.
  Thanks to Computer Recycling at The Working Center of Kitchener for making hardware available to allow us to test the patch on additional CPU families.
  Reviewed by: jhb
  Discussed with: Matthew Dillon
  Tested by: emaste
  Sponsored by: The FreeBSD Foundation
  Security: CVE-2018-8897
  Security: FreeBSD-SA-18:06.debugreg
  Notes: svn path=/head/; revision=333368
* amd64: stop asserting params != NULL in the syscall path | Mateusz Guzik | 2018-05-07 | 1 | -2/+1
  The parameter is effectively controllable by userspace. It does not matter what it is set to as it is being passed to copyin - worst case the operation will just fail.
  While here stop computing it unless it is going to be used.
  Noted by: dillon@backplane.com
  Notes: svn path=/head/; revision=333337
* amd64: syscall path bcopy -> memcpy | Mateusz Guzik | 2018-05-04 | 1 | -1/+1
  Notes: svn path=/head/; revision=333266
* amd64: get rid of the pessimized bcopy in syscall arg copy | Mateusz Guzik | 2018-05-04 | 1 | -1/+1
  The code was unnecessarily conditionally copying either 5 or 6 args. It can blindly copy 6, which also means the size is known at compilation time and the operation can be depessimized.
  Note the entire syscall handling code is rather slow.
  Tested on Skylake, sample result for getppid (calls/s):
    without pti: 7310106 -> 10653569
    with pti: 3304843 -> 4148306
  Some syscalls (like read) did not show any difference, others typically have very modest wins.
  Notes: svn path=/head/; revision=333241
* Expand the checks for UCR3 == PMAP_NO_CR3 to enable processes to be excluded from PTI. | Tycho Nightingale | 2018-04-27 | 1 | -3/+5
  Reviewed by: kib
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D15100
  Notes: svn path=/head/; revision=333059
* set kdb_why to "trap" when calling kdb_trap from trap_fatal | Andriy Gapon | 2018-04-19 | 1 | -2/+9
  This will allow hooking a ddb script to the "kdb.enter.trap" event. Previously there was no specific name for this event, so it could only be handled by either "kdb.enter.unknown" or "kdb.enter.default" hooks. Both are very unspecific.
  Having a specific event is useful because the fatal trap condition is very similar to panic but it has an additional property that the current stack frame is the frame where the trap occurred. So, both a register dump and a stack bottom dump have additional information that can help analyze the problem.
  I have added the event only on architectures that have trap_fatal() function defined. I haven't looked at other architectures. Their maintainers can add support for the event later.
  Sample script:
    kdb.enter.trap=bt; show reg; x/aS $rsp,20; x/agx $rsp,20
  Reviewed by: kib, jhb, markj
  MFC after: 11 days
  Sponsored by: Panzura
  Differential Revision: https://reviews.freebsd.org/D15093
  Notes: svn path=/head/; revision=332752
* don't check for kdb reentry in trap_fatal(), it's impossible | Andriy Gapon | 2018-04-18 | 1 | -1/+1
  trap() checks for it earlier and calls kdb_reentry().
  Discussed with: jhb
  MFC after: 12 days
  Sponsored by: Panzura
  Notes: svn path=/head/; revision=332730
* Remove very old and unused signal information codes. | John Baldwin | 2018-03-27 | 1 | -2/+3
  These have been supplanted by the MI signal information codes in <sys/signal.h> since 7.0. The FPE_*_TRAP ones were deprecated even earlier in 1999.
  PR: 226579 (exp-run)
  Reviewed by: kib
  Differential Revision: https://reviews.freebsd.org/D14637
  Notes: svn path=/head/; revision=331650
* Use correct symbol name in r328202. | Konstantin Belousov | 2018-01-20 | 1 | -2/+2
  Sponsored by: The FreeBSD Foundation
  MFC after: 11 days
  Notes: svn path=/head/; revision=328205
* Use predefined symbol for the CR3.PCID mask. | Konstantin Belousov | 2018-01-20 | 1 | -2/+2
  Sponsored by: The FreeBSD Foundation
  MFC after: 11 days
  Notes: svn path=/head/; revision=328202
* PTI: Trap if we returned to userspace with kernel (full) page table still active. | Konstantin Belousov | 2018-01-19 | 1 | -0/+11
  Map userspace portion of VA in the PTI kernel-mode page table as non-executable. This way, if we ever miss reloading ucr3 into %cr3 on the return to usermode, the process traps instead of executing in a potentially vulnerable setup. Catch the condition of such trap and verify user-mode %cr3, which is saved by page fault handler.
  I picked up this trick from an article about the Linux implementation.
  Reviewed by: alc, markj (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 12 days
  Differential revision: https://reviews.freebsd.org/D13956
  Notes: svn path=/head/; revision=328177
* Use a dedicated per-CPU stack for machine check exceptions. | John Baldwin | 2018-01-18 | 1 | -5/+0
  Similar to NMIs, machine check exceptions can fire at any time and are not masked by IF. This means that machine checks can fire when the kstack is too deep to hold a trap frame, or at critical sections in trap handlers when a user %gs is used with a kernel %cs. Use the same strategy used for NMIs of using a dedicated per-CPU stack configured in IST 3. Store the CPU's pcpu pointer at the top of the stack so that the machine check handler can reliably find the proper value for %gs (also borrowed from NMIs).
  This should also fix a similar issue with PTI with a MC# occurring while the CPU is executing on the trampoline stack.
  While here, bypass trap() entirely and just call mca_intr(). This avoids a bogus call to kdb_reenter() (there's no reason to try to reenter kdb if a MC# is raised).
  Reviewed by: kib
  Tested by: avg (on AMD without PTI)
  Differential Revision: https://reviews.freebsd.org/D13962
  Notes: svn path=/head/; revision=328157
* PTI for amd64. | Konstantin Belousov | 2018-01-17 | 1 | -0/+19
  The implementation of the Kernel Page Table Isolation (KPTI) for amd64, first version. It provides a workaround for the 'meltdown' vulnerability. PTI is turned off by default for now, enable with the loader tunable vm.pmap.pti=1.
  The pmap page table is split into kernel-mode table and user-mode table. Kernel-mode table is identical to the non-PTI table, while usermode table is obtained from kernel table by leaving userspace mappings intact, but only leaving the following parts of the kernel mapped:
    - kernel text (but not modules text)
    - PCPU
    - GDT/IDT/user LDT/task structures
    - IST stacks for NMI and doublefault handlers.
  Kernel switches to user page table before returning to usermode, and restores full kernel page table on the entry. Initial kernel-mode stack for PTI trampoline is allocated in PCPU, it is only 16 qwords. Kernel entry trampoline switches page tables, then the hardware trap frame is copied to the normal kstack, and execution continues.
  IST stacks are kept mapped and no trampoline is needed for NMI/doublefault, but of course page table switch is performed.
  On return to usermode, the trampoline is used again, iret frame is copied to the trampoline stack, page tables are switched and iretq is executed. The case of iretq faulting due to the invalid usermode context is tricky, since the frame for fault is appended to the trampoline frame. Besides copying the fault frame and original (corrupted) frame to kstack, the fault frame must be patched to make it look as if the fault occurred on the kstack, see the comment in doret_iret detection code in trap().
  Currently kernel pages which are mapped during trampoline operation are identical for all pmaps. They are registered using pmap_pti_add_kva(). Besides initial registrations done during boot, LDT and non-common TSS segments are registered if user requested their use. In principle, they can be installed into kernel page table per pmap with some work. Similarly, PCPU can be hidden from userspace mapping using trampoline PCPU page, but again I do not see much benefits besides complexity.
  PDPE pages for the kernel half of the user page tables are pre-allocated during boot because we need to know pml4 entries which are copied to the top-level paging structure page, in advance on a new pmap creation. I enforce this to avoid iterating over all the existing pmaps if a new PDPE page is needed for PTI kernel mappings. The iteration is a known problematic operation on i386.
  The need to flush hidden kernel translations on the switch to user mode makes global tables (PG_G) meaningless and even harmful, so PG_G use is disabled for PTI case. Our existing use of PCID is incompatible with PTI and is automatically disabled if PTI is enabled. PCID can be forced on only for developer's benefit.
  MCE is known to be broken, it requires IST stack to operate completely correctly even for non-PTI case, and absolutely needs dedicated IST stack because MCE delivery while trampoline has not switched from PTI stack is fatal. The fix is pending.
  Reviewed by: markj (partially)
  Tested by: pho (previous version)
  Discussed with: jeff, jhb
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=328083
* Avoid re-check of usermode condition. | Konstantin Belousov | 2018-01-01 | 1 | -3/+1
  It does not change anything in the behavior of trap_pfault(), while eliminating obfuscation of jumping to the code which checks for the condition reversed of the goto cause.
  Also avoid force-initializing the rv variable, since it is now only accessed after storing vm_fault() return value.
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D13725
  Notes: svn path=/head/; revision=327472
* Pass the trap frame to fasttrap hooks. | Mark Johnston | 2017-12-11 | 1 | -7/+2
  The DTrace fasttrap entry points expect a struct reg containing the register values of the calling thread. Perform the conversion in fasttrap rather than in the trap handler: this reduces the number of ifdefs and avoids wasting stack space for traps that don't involve DTrace.
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=326774
* spdx: initial adoption of licensing ID tags. | Pedro F. Giffuni | 2017-11-18 | 1 | -0/+2
  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
  Initially, only tag files that use BSD 4-Clause "Original" license.
  RelNotes: yes
  Differential Revision: https://reviews.freebsd.org/D13133
  Notes: svn path=/head/; revision=325966
* Simplify the code. | Konstantin Belousov | 2017-08-20 | 1 | -3/+2
  Noted by: Oliver Pinter
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=322723