path: root/sys/kern
Commit message | Author | Age | Files | Lines
* Extend libteken to support CJK fullwidth characters.Ed Schouten2013-12-201-1/+1
| | | | | | | | | | | | | | | | Introduce a new formatting bit (TF_CJK_RIGHT) that is set when putting a cell that is the right part of a CJK fullwidth character. This will allow drivers like vt(9) to support fullwidth characters properly. emaste@ has a patch to extend vt(9)'s font handling to increase the number of Unicode -> glyph maps from 2 ({normal,bold}) to 4 ({normal,bold} x {left,right}). This will need to use this formatting bit to determine whether to draw the left or right glyph. Reviewed by: emaste Notes: svn path=/head/; revision=259667
* Move list of ttys handling from the allocating procedures, to theGleb Smirnoff2013-12-201-10/+10
| | | | | | | | | | device creation stage. A device creation can fail, and in that case an entry already on the list will be freed. Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=259663
* Fix compilation on 32 bit architectures and use INT64_MAX instead ofStefan Eßer2013-12-191-3/+6
| | | | | | | LONG_MAX for the upper bound check. Notes: svn path=/head/; revision=259633
* Fix overflow for timeout values of more than 68 years, which is the maximumStefan Eßer2013-12-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | covered by sbintime (LONG_MAX seconds). Some programs use timeout values in excess of 1000 years. The conversion to sbintime caused wrap-around on overflow, which resulted in short or negative timeout values. This caused long delays on sockets opened by affected programs (e.g. OpenSSH). Kernels compiled without -fno-strict-overflow were not affected, apparently because the compiler tested the sign of the timeout value before performing the multiplication that led to overflow. When the -fno-strict-overflow option was added to CFLAGS, this optimization was disabled and the test was performed on the result of the multiplication. Negative products were caught and resulted in EINVAL being returned, but wrap-around to positive values just shortened the timeout value to the residue of the result that could be represented by sbintime. The fix is to cap the timeout values at the maximum that can be represented by sbintime, which is 2^31 - 1 seconds or more than 68 years. After this change, the kernel can be compiled with -fno-strict-overflow with no ill effects. MFC after: 3 days Notes: svn path=/head/; revision=259609
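The capping described in this commit can be illustrated with a minimal user-space sketch. It assumes sbintime is a signed 64-bit value with 32 integer bits of seconds and 32 fractional bits; the names `sbt_t`, `SBT_ONE_SECOND` and `secs_to_sbt_capped` are hypothetical and are not the kernel's own code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's sbintime_t (32.32 fixed point). */
typedef int64_t sbt_t;

#define SBT_ONE_SECOND ((sbt_t)1 << 32)
#define SBT_MAX_SECS   INT32_MAX        /* 2^31 - 1 seconds, about 68 years */

/*
 * Convert whole seconds to the fixed-point format.  The range is checked
 * before multiplying, so the product can never wrap around.
 * Returns -1 for a negative timeout (the caller would report EINVAL).
 */
static sbt_t
secs_to_sbt_capped(int64_t secs)
{
	if (secs < 0)
		return (-1);
	if (secs > SBT_MAX_SECS)
		return (INT64_MAX);     /* cap instead of overflowing */
	return (secs * SBT_ONE_SECOND);
}
```

Checking the operand rather than the product is the point of the sketch: the result no longer depends on how the compiler treats signed overflow.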
* Invoke the kld_* event handlers from linker_load_file() andMark Johnston2013-12-191-24/+14
| | | | | | | | | | | | | | | | | linker_unload_file() rather than kern_kldload() and kern_kldunload(). This ensures that the handlers are invoked for files that are loaded/unloaded automatically as dependencies. Previously, they were only invoked for files loaded by a user. As a side effect, the kld_load and kld_unload handlers are now invoked with the kernel linker lock exclusively held. Reported by: avg Reviewed by: jhb MFC after: 2 weeks Notes: svn path=/head/; revision=259587
* - Rename tty_makedev() into tty_makedevf() and make it capableGleb Smirnoff2013-12-181-36/+73
| | | | | | | | | | | | | | | | | | | | | | to fail and return error. - Use make_dev_p() in tty_makedevf() instead of make_dev_cred(). - Always pass MAKEDEV_CHECKNAME flag. - Optionally pass MAKEDEV_REF flag. - Provide macro for compatibility with old API. This fixes races with simultaneous creation and destruction of ttys, and makes it possible to call tty_makedevf() from device cloners. A race in tty_watermarks() still exists, since the latter drops lock for M_WAITOK allocation. This will be addressed in a separate commit. Reviewed by: kib Sponsored by: Nginx, Inc. Notes: svn path=/head/; revision=259549
* The fasttrap fork handler is responsible for removing tracepoints in theMark Johnston2013-12-181-5/+5
| | | | | | | | | | | | | | | | | | | | | child process that were inherited from its parent. However, this should not be done in the case of a vfork, since the fork handler ends up removing the tracepoints from the shared vm space, and userland DTrace probes in the parent will no longer fire as a result. Now the child of a vfork may trigger userland DTrace probes enabled in its parent, so modify the fasttrap probe handler to handle this case and handle the child process in the same way that it would handle the traced process. In particular, if one traces function foo() in a process that vforks, and the child calls foo(), fasttrap will treat this call as having come from the parent. This is the behaviour of the upstream code. While here, add #ifdef guards to some code that isn't present upstream. MFC after: 1 month Notes: svn path=/head/; revision=259535
* If vn_open_vnode() succeeded in opening the vnode, but subsequentKonstantin Belousov2013-12-171-0/+3
| | | | | | | | | | | | advisory lock cannot be obtained, prevent double-close of the vnode in vn_close() called from the fdrop(), by resetting the file's f_ops methods. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=259522
* Fix copy/paste typo.Andrey V. Elsukov2013-12-171-1/+1
| | | | | | | MFC after: 1 week Notes: svn path=/head/; revision=259520
* - Assert for not leaking readers rw locks counter on userland return.Attilio Rao2013-12-172-0/+6
| | | | | | | | | - Use a correct spin_cnt for KDTRACE_HOOK case in rw read lock. Sponsored by: EMC / Isilon storage division Notes: svn path=/head/; revision=259509
* Remove the invariants stuff I copy/paste'd from the mbuf code whenAdrian Chadd2013-12-171-7/+1
| | | | | | | | | | | setting up the UMA zone. This should (a) be correct(er) and (b) it should build on non-amd64. Pointed out by: glebius Notes: svn path=/head/; revision=259489
* Migrate the sendfile_sync struct to use a UMA zone rather than M_TEMP.Adrian Chadd2013-12-161-2/+22
| | | | | | | | | | This allows it to be better tracked as well as being able to leverage UMA for more interesting/useful behaviour at a later date. Sponsored by: Netflix, Inc. Notes: svn path=/head/; revision=259475
* Fix periodic per-CPU timers startup on boot.Alexander Motin2013-12-161-1/+2
| | | | | | | | Reported by: neel MFC after: 2 weeks Notes: svn path=/head/; revision=259464
* Properly drain the TTY when both revoke(2) and close(2) end up closingMarcel Moolenaar2013-12-161-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the TTY. In such a case, ttydev_close() is called multiple times and each time, t_revokecnt is incremented and cv_broadcast() is called for both the t_outwait and t_inwait condition variables. Let's say revoke(2) comes in first and gets to call tty_drain() from ttydev_leave(). Let's say that the revoke comes from init(8) as the result of running "shutdown -r now". Since shutdown prints various messages to the console before announcing that the machine will reboot immediately, let's also say that the output queue is not empty and that tty_drain() has something to do. Let's assume this all happens on a 9600 baud serial console, so it takes a while to drain. The shutdown command will exit(2) and as such will end up closing stdout. Let's say this close will come in second, bump t_revokecnt and call tty_wakeup(). This has tty_wait() return prematurely and the next thing that will happen is that the thread doing revoke(2) will flush the TTY. Since the drain wasn't complete, the flush will effectively drop whatever is left in t_outq. This change takes into account that tty_drain() will return ERESTART due to the fact that t_revokecnt was bumped and in that case simply calls tty_drain() again. The thread in question is already performing the close so it can safely finish draining the TTY before destroying the TTY structure. Now all messages from shutdown will be printed on the serial console. Obtained from: Juniper Networks, Inc. Notes: svn path=/head/; revision=259441
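The retry-on-restart idea in this commit can be sketched with a small user-space model. Everything here is hypothetical: `struct fake_tty`, `fake_drain()` and the `DRAIN_*` codes merely stand in for the TTY structure, tty_drain() and ERESTART, and do not reproduce the kernel code.

```c
/* Hypothetical model of a TTY with queued output and pending wakeups. */
struct fake_tty {
	int outq;           /* bytes still waiting to be written */
	int interruptions;  /* premature wakeups still to be delivered */
};

enum { DRAIN_OK = 0, DRAIN_RESTART = 1 };

/* Write out queued bytes; give up early if a wakeup interrupts the wait. */
static int
fake_drain(struct fake_tty *tp)
{
	while (tp->outq > 0) {
		if (tp->interruptions > 0) {
			tp->interruptions--;
			return (DRAIN_RESTART);
		}
		tp->outq--;
	}
	return (DRAIN_OK);
}

/* The closing thread keeps draining until the queue is really empty. */
static int
drain_fully(struct fake_tty *tp)
{
	int error;

	do {
		error = fake_drain(tp);
	} while (error == DRAIN_RESTART);
	return (error);
}
```

Without the loop, a single premature wakeup would let the caller go on to flush while `outq` is still non-empty, which is the lost-output symptom the commit message describes.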
* Regenerate after r259438.Pawel Jakub Dawidek2013-12-151-16/+16
| | | | Notes: svn path=/head/; revision=259439
* Fix syscalls that can be loaded as kernel modules - they were not givenPawel Jakub Dawidek2013-12-151-1/+1
| | | | | | | | | the flag allowing to call them from capability mode sandbox. Noticed by: David Drysdale <drysdale@google.com> Notes: svn path=/head/; revision=259438
* Regenerate after r259436.Pawel Jakub Dawidek2013-12-151-1/+1
| | | | Notes: svn path=/head/; revision=259437
* Allow for pselect(2) in capability mode.Pawel Jakub Dawidek2013-12-151-1/+2
| | | | | | | Noticed by: David Drysdale <drysdale@google.com> Notes: svn path=/head/; revision=259436
* Forgot to regenerate after r257736.Pawel Jakub Dawidek2013-12-151-1/+1
| | | | Notes: svn path=/head/; revision=259435
* proc exit: don't take PROC_LOCK while freeing rlimitsMateusz Guzik2013-12-151-2/+0
| | | | | | | | | | Code wishing to check rlimits of some process should check whether it is exiting first, which current consumers do. MFC after: 2 weeks Notes: svn path=/head/; revision=259407
* rlimit: avoid unnecessary copying of rlimitsMateusz Guzik2013-12-131-6/+16
| | | | | | | | | If refcount is 1 just modify rlimits in place. MFC after: 2 weeks Notes: svn path=/head/; revision=259331
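A minimal sketch of the "modify in place when the refcount is 1" rule, assuming a simple reference-counted limits structure. The names `struct limits`, `NLIMITS` and `limits_writable()` are hypothetical and are not the kernel's API.

```c
#include <stdlib.h>
#include <string.h>

#define NLIMITS 4

/* Hypothetical reference-counted resource-limit set. */
struct limits {
	int  refcnt;
	long value[NLIMITS];
};

/*
 * Return a limits structure the caller may modify.  A sole owner gets
 * the original back, so nothing is copied; a shared structure is
 * duplicated and the caller's reference moves to the private copy.
 * Returns NULL if the copy cannot be allocated.
 */
static struct limits *
limits_writable(struct limits *lim)
{
	struct limits *copy;

	if (lim->refcnt == 1)
		return (lim);
	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return (NULL);
	memcpy(copy, lim, sizeof(*copy));
	copy->refcnt = 1;
	lim->refcnt--;
	return (copy);
}
```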
* rlimit: add and utilize lim_sharedMateusz Guzik2013-12-131-1/+11
| | | | | | | MFC after: 2 weeks Notes: svn path=/head/; revision=259330
* Create own free list for each of the first 32 possible allocation sizes.Alexander Motin2013-12-111-9/+17
| | | | | | | | | | | | | | | | | | | | | | | In case of 4K allocation quantum that means for allocations up to 128K. With growth of memory fragmentation these lists may grow to quite large sizes (tens and hundreds of thousands of items). Having in one list items of different sizes in worst case may require full linear list traversal, that may be very expensive. Having lists for items of single size means that unless the user specifies some alignment or border requirements (that are very rare cases) the first item found on the list should satisfy the request. While running SPEC NFS benchmark on top of ZFS on 24-core machine with 84GB RAM this change reduces CPU time spent in vmem_xalloc() from 8% and lock congestion spinning around it from 20% to invisible levels. And all of that comes at the cost of just 26 more pointers per vmem instance. If at some point our kernel starts to actively use KVA allocations with odd sizes above 128K, something may need to be done to bigger lists also. Notes: svn path=/head/; revision=259232
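The list selection can be sketched as follows. The first 32 sizes (in units of the allocation quantum) each get their own list, as the commit describes; grouping larger sizes by power of two is an assumption made for this sketch, and the names `OPT_ORDER`, `OPT_VALUE` and `freelist_index()` are hypothetical.

```c
#include <stddef.h>

#define OPT_ORDER 5                    /* log2 of the exact-size list count */
#define OPT_VALUE (1u << OPT_ORDER)    /* 32 exact-size lists */

/* Position of the highest set bit, i.e. floor(log2(v)) for v > 0. */
static unsigned
floor_log2(size_t v)
{
	unsigned n = 0;

	while (v >>= 1)
		n++;
	return (n);
}

/*
 * Map an allocation size to a free-list index.  The size must be at
 * least one quantum.  Sizes of 1..32 quanta map to lists 0..31, one
 * size per list; larger sizes share one list per power of two
 * (assumption of this sketch).
 */
static unsigned
freelist_index(size_t size, unsigned quantum_shift)
{
	size_t qsize = size >> quantum_shift;

	if (qsize <= OPT_VALUE)
		return ((unsigned)(qsize - 1));
	return (OPT_VALUE + (floor_log2(qsize) - OPT_ORDER));
}
```

With a 4K quantum (`quantum_shift` of 12), a 128K request is 32 quanta and lands on the last exact-size list, so any item found there fits without scanning.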
* Fix detection of EOF in kern_physio(). If bio_length was clipped byKonstantin Belousov2013-12-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | the excess code in g_io_check(), bio_resid is also truncated by g_io_deliver(). As result, bufdonebio() assigns truncated value to the buffer b_resid field. Use the residual bio_completed to calculate buffer b_resid from b_bcount in bufdonebio(), instead of bio_resid, calculated from bio_length in g_io_deliver(). The issue is seemingly caused by the code rearrange into g_io_check(), which is not present in stable/10. The change still looks as the useful change to have in 10 nevertheless. Reported by: Stefan Hegnauer <stefan.hegnauer@gmx.ch> Tested by: pho, Stefan Hegnauer <stefan.hegnauer@gmx.ch> Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=259200
* Merge VT(9) project (a.k.a. newcons).Aleksandr Rybalko2013-12-051-0/+602
| | | | | | | | | | Reviewed by: nwhitehorn MFC_to_10_after: re approval Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=259016
* Make panic_reboot_wait_time static.Colin Percival2013-12-051-1/+1
| | | | | | | Submitted by: jhb Notes: svn path=/head/; revision=258956
* Rename sysctl kern.supported_abis to kern.supported_archs, since it givesNathan Whitehorn2013-12-041-3/+3
| | | | | | | the set of MACHINE_ARCH values that can be run. Notes: svn path=/head/; revision=258928
* Break the loop once we know we have the SYF_CAPENABLED flag.Pawel Jakub Dawidek2013-12-041-0/+1
| | | | Notes: svn path=/head/; revision=258900
* Add a new sysctl / loader tunable kern.panic_reboot_wait_time whichColin Percival2013-12-031-4/+9
| | | | | | | | | defaults to PANIC_REBOOT_WAIT_TIME (a long-existing kernel config setting). Use this now-variable value in place of the defined constant to control how long the system waits after a panic before rebooting. Notes: svn path=/head/; revision=258893
* Fix an off-by-one error in r228960. The maximum priority delta providedJohn Baldwin2013-12-031-1/+1
| | | | | | | | | | | | | by SCHED_PRI_TICKS should be SCHED_PRI_RANGE - 1 so that the resulting priority value (before nice adjustment) is between SCHED_PRI_MIN and SCHED_PRI_MAX, inclusive. Submitted by: kib Reported by: pho MFC after: 1 week Notes: svn path=/head/; revision=258869
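The inclusive-range arithmetic behind this fix can be shown in isolation. This sketch does not reproduce the scheduler's actual SCHED_PRI_TICKS formula; `PRI_MIN`, `PRI_MAX` and `pri_from_delta()` are hypothetical, with arbitrary example bounds.

```c
/* Hypothetical priority band, inclusive at both ends. */
#define PRI_MIN   120
#define PRI_MAX   223
#define PRI_RANGE (PRI_MAX - PRI_MIN + 1)   /* number of distinct values */

/*
 * Turn a delta into a priority.  The largest usable delta is
 * PRI_RANGE - 1; allowing PRI_RANGE itself would yield PRI_MAX + 1,
 * which is the off-by-one described above.
 */
static int
pri_from_delta(int delta)
{
	if (delta < 0)
		delta = 0;
	if (delta > PRI_RANGE - 1)
		delta = PRI_RANGE - 1;
	return (PRI_MIN + delta);
}
```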
* Add new sysctl, kern.supported_abis, containing the list of FreeBSDNathan Whitehorn2013-12-021-0/+7
| | | | | | | | | | | | | | | | | | | | | MACHINE_ARCH values whose binaries this kernel can run. This patch provides a feature requested for implementing pkgng ABI identifiers in a robust way. The list is designed to indicate whether, say, an i386 package can be run on the current system. If kern.supported_abis contains "i386", then the answer is yes. Otherwise, the answer is no. At the moment, this only supports MACHINE_ARCH and MACHINE_ARCH32. As we gain support for more interesting combinations, this needs to become more flexible, possibly through the sysent framework, along with the hw.machine_arch emulation immediately preceding this code in kern_mib.c. Reviewed by: imp MFC after: 3 days Notes: svn path=/head/; revision=258819
* Remove unused variable.Gleb Smirnoff2013-12-011-2/+0
| | | | Notes: svn path=/head/; revision=258812
* Migrate the sendfile_sync structure into a public(ish) API in preparationAdrian Chadd2013-12-012-37/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for extending and reusing it. The sendfile_sync wrapper is mostly just a "mbuf transaction" wrapper, used to indicate that the backing store for a group of mbufs has completed. It's only being used by sendfile for now and it's only implementing a sleep/wakeup rendezvous. However, there are other potential signaling paths (kqueue) and other potential uses (socket zero-copy write) where the same mechanism would also be useful. So, with that in mind: * extract the sendfile_sync code out into sf_sync_*() methods * teach the sf_sync_alloc method about the current config flag - it will eventually know about kqueue. * move the sendfile_sync code out of do_sendfile() - the only thing it now knows about is the sfs pointer. The guts of the sync rendezvous (setup, rendezvous/wait, free) is now done in the syscall wrapper. * .. and teach the 32-bit compat sendfile call the same. This should be a no-op. It's primarily preparation work for teaching the sendfile_sync about kqueue notification. Tested: * Peter Holm's sendfile stress / regression scripts Sponsored by: Netflix, Inc. Notes: svn path=/head/; revision=258788
* Make process descriptors standard part of the kernel. rwhod(8) alreadyPawel Jakub Dawidek2013-11-305-49/+0
| | | | | | | | | | | requires process descriptors to work and having PROCDESC in GENERIC seems not enough, especially that we hope to have more and more consumers in the base. MFC after: 3 days Notes: svn path=/head/; revision=258768
* jail_v0.ip_number was always in host byte order. This was handledPeter Wemm2013-11-281-1/+1
| | | | | | | | | | | | in one of the many layers of indirection and shims through stable/7 in jail_handle_ips(). When it was cleaned up and unified through kern_jail() for 8.x, the byte order swap was lost. This only matters for ancient binaries that call jail(2) themselves internally. Notes: svn path=/head/; revision=258718
* add taskqueue_drain_allAndriy Gapon2013-11-281-0/+30
| | | | | | | | | | | | | This API has semantics similar to that of taskqueue_drain but acts on all tasks that might be queued or running on a taskqueue. A caller must ensure that no new tasks are being enqueued otherwise this call would be totally meaningless. For example, if the tasks are enqueued by an interrupt filter then its interrupt must be disabled. MFC after: 10 days Notes: svn path=/head/; revision=258713
* Add an kinfo sysctl to retrieve signal trampoline location for theKonstantin Belousov2013-11-261-0/+58
| | | | | | | | | | | | | | given process. Note that the correctness of the trampoline length returned for ABIs which do not use shared page depends on the correctness of the struct sysvec sv_szsigcodebase member, which will be fixed on as-need basis. Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=258661
* use saner calculations in should_yieldAndriy Gapon2013-11-261-1/+1
| | | | | | | | | This is based on feedback from bde. MFC after: 6 days Notes: svn path=/head/; revision=258648
* sdt: add support for solaris/illumos style DTRACE_PROBE macrosAndriy Gapon2013-11-261-0/+2
| | | | | | | | | | | | | | | | | | | | The new macros are implemented in terms of SDT_PROBE_DEFINE and SDT_PROBE. Probes defined in this way will appear under SDT provider named "sdt". Parameter types are exposed via SDT_PROBE_ARGTYPE. This is something that illumos does not have by default. This kind of SDT probes is already present in ZFS code, so those probes will now be available if KDTRACE_HOOKS options is enabled. A potential future illumos compatibility enhancement is to encode a provider name as a prefix in a probe name. Reviewed by: markj MFC after: 3 weeks X-MFC after: r258622 Notes: svn path=/head/; revision=258625
* dtrace sdt: remove the ugly sname parameter of SDT_PROBE_DEFINEAndriy Gapon2013-11-2618-112/+112
| | | | | | | | | | | In its stead use the Solaris / illumos approach of emulating '-' (dash) in probe names with '__' (two consecutive underscores). Reviewed by: markj MFC after: 3 weeks Notes: svn path=/head/; revision=258622
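The naming convention can be illustrated with a small string helper. This is only a sketch of the '__' to '-' mapping; `sdt_display_name()` is a hypothetical function, not part of the SDT code.

```c
#include <stddef.h>

/*
 * Copy a probe name from its C-identifier form to its displayed form,
 * turning each pair of consecutive underscores into a single dash.
 * The output is always NUL-terminated and truncated to fit outlen.
 */
static void
sdt_display_name(const char *in, char *out, size_t outlen)
{
	size_t i = 0, j = 0;

	if (outlen == 0)
		return;
	while (in[i] != '\0' && j + 1 < outlen) {
		if (in[i] == '_' && in[i + 1] == '_') {
			out[j++] = '-';
			i += 2;
		} else {
			out[j++] = in[i++];
		}
	}
	out[j] = '\0';
}
```

A probe written as `exec__success` in C source would then be shown as `exec-success`, while single underscores are left untouched.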
* Refactor out the sendfile copyout in order to make vn_sendfile()Adrian Chadd2013-11-261-3/+6
| | | | | | | | | | | | | | | | | | callable from the kernel. Right now vn_sendfile() can't be called from anything other than a syscall handler _and_ return the number of bytes queued. This simply moves the copyout() to do_sendfile() so that any kernel code can initiate vn_sendfile() outside of a syscall context. Tested: * tiny little sendfile program spitting things out a tcp socket Sponsored by: Netflix, Inc. Notes: svn path=/head/; revision=258613
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debuggingAttilio Rao2013-11-2530-42/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency on opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip Notes: svn path=/head/; revision=258541
* Revert back to use int for the page counts. In vn_io_fault(), the i/oKonstantin Belousov2013-11-201-11/+9
| | | | | | | | | | | | | | | is chunked to pieces limited by integer io_hold_cnt tunable, while vm_fault_quick_hold_pages() takes integer max_count as the upper bound. Rearrange the checks to correctly handle overflowing address arithmetic. Submitted by: bde Tested by: pho Discussed with: alc MFC after: 1 week Notes: svn path=/head/; revision=258365
* taskqueue_cancel: garbage collect a write-only variableAndriy Gapon2013-11-191-2/+0
| | | | | | | MFC after: 3 days Notes: svn path=/head/; revision=258354
* Fix siginfo_t.si_status for wait6/waitid/SIGCHLD.Jilles Tjoelker2013-11-172-12/+18
| | | | | | | | | | | | | | | | | | Per POSIX, si_status should contain the value passed to exit() for si_code==CLD_EXITED and the signal number for other si_code. This was incorrect for CLD_EXITED and CLD_DUMPED. This is still not fully POSIX-compliant (Austin group issue #594 says that the full value passed to exit() shall be returned via si_status, not just the low 8 bits) but is sufficient for a si_status-related test in libnih (upstart, Debian/kFreeBSD). PR: kern/184002 Reported by: Dmitrijs Ledkovs Tested by: Dmitrijs Ledkovs Notes: svn path=/head/; revision=258281
* Replace CAP_POLL_EVENT and CAP_POST_EVENT capability rights (which I hadPawel Jakub Dawidek2013-11-153-10/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | a very hard time to fully understand) with much more intuitive rights: CAP_EVENT - when set on descriptor, the descriptor can be monitored with syscalls like select(2), poll(2), kevent(2). CAP_KQUEUE_EVENT - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the eventlist argument set to non-NULL value; in other words the given kqueue descriptor can be used to monitor other descriptors. CAP_KQUEUE_CHANGE - When set on a kqueue descriptor, the kevent(2) syscall can be called on this kqueue with the changelist argument set to non-NULL value; in other words it allows to modify events monitored with the given kqueue descriptor. Add alias CAP_KQUEUE, which allows for both CAP_KQUEUE_EVENT and CAP_KQUEUE_CHANGE. Add backward compatibility define CAP_POLL_EVENT which is equal to CAP_EVENT. Sponsored by: The FreeBSD Foundation MFC after: 3 days Notes: svn path=/head/; revision=258181
* Don't allow vfs.lorunningspace or vfs.hirunningspace to be set suchJohn Baldwin2013-11-151-2/+33
| | | | | | | | | | that lorunningspace is greater than hirunningspace as the system performs terribly if it is mistuned in this fashion. MFC after: 1 week Notes: svn path=/head/; revision=258174
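The cross-check between the two tunables can be sketched as a pair of setters that refuse an inconsistent value. `struct runningspace`, `set_lo()` and `set_hi()` are hypothetical names; the real change lives in sysctl handlers, which this user-space sketch does not reproduce.

```c
#include <errno.h>

/* Hypothetical pair of watermarks that must satisfy lo <= hi. */
struct runningspace {
	long lo;
	long hi;
};

/* Reject a low watermark above the current high watermark. */
static int
set_lo(struct runningspace *rs, long value)
{
	if (value > rs->hi)
		return (EINVAL);
	rs->lo = value;
	return (0);
}

/* Reject a high watermark below the current low watermark. */
static int
set_hi(struct runningspace *rs, long value)
{
	if (value < rs->lo)
		return (EINVAL);
	rs->hi = value;
	return (0);
}
```

One consequence of validating each setter against the other value is that raising both watermarks requires setting the high one first, and lowering both requires setting the low one first.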
* Change cap_rights_merge(3) and cap_rights_remove(3) to return pointerPawel Jakub Dawidek2013-11-141-4/+12
| | | | | | | | | | | to the destination cap_rights_t structure. This already matches manual page. MFC after: 3 days Notes: svn path=/head/; revision=258149
* Add a note that this file is compiled as part of the kernel and libc.Pawel Jakub Dawidek2013-11-141-0/+4
| | | | | | | | Requested by: kib MFC after: 3 days Notes: svn path=/head/; revision=258148
* Fix a very bad typo from r248887.Gleb Smirnoff2013-11-141-0/+1
| | | | | | | Submitted by: art Notes: svn path=/head/; revision=258128