path: root/sys/vm/vm_object.h
* Add a blocking counter KPI. (Mark Johnston, 2020-02-28; 1 file, -4/+5)

  refcount(9) was recently extended to support waiting on a refcount to drop to zero, as this was needed for a lockless VM object paging-in-progress counter. However, this adds overhead to all uses of refcount(9) and doesn't really match traditional refcounting semantics: once a counter has dropped to zero, the protected object may be freed at any point and it is not safe to dereference the counter. This change removes that extension and instead adds a new set of KPIs, blockcount_*, for use by VM object PIP and busy.

  Reviewed by: jeff, kib, mjg
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential Revision: https://reviews.freebsd.org/D23723
  Notes: svn path=/head/; revision=358432
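  The blockcount_* KPI separates "wait until the count of blockers reaches zero" from reference counting. As a rough illustration of the semantics only, not the kernel implementation or its real API, here is a minimal userland model built on C11 atomics; the bc_* names are invented for the sketch, and the wait is simulated with a yield loop where the kernel version sleeps.

      #include <stdatomic.h>
      #include <sched.h>

      /*
       * Illustrative model of a blocking counter: acquire/release track
       * in-flight operations (e.g. paging in progress), and wait spins
       * until the count drains to zero.
       */
      typedef struct { atomic_uint count; } blockcount_model_t;

      static void bc_init(blockcount_model_t *bc, unsigned v) {
              atomic_init(&bc->count, v);
      }
      static void bc_acquire(blockcount_model_t *bc) {
              atomic_fetch_add_explicit(&bc->count, 1, memory_order_acquire);
      }
      static void bc_release(blockcount_model_t *bc) {
              atomic_fetch_sub_explicit(&bc->count, 1, memory_order_release);
      }
      static void bc_wait(blockcount_model_t *bc) {
              while (atomic_load_explicit(&bc->count, memory_order_acquire) != 0)
                      sched_yield();  /* the real KPI sleeps/wakes instead */
      }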
* sys/vm: quiet -Wwrite-strings (Ryan Libby, 2020-02-23; 1 file, -2/+2)

  Discussed with: kib
  Reviewed by: markj
  Differential Revision: https://reviews.freebsd.org/D23796
  Notes: svn path=/head/; revision=358256
* Enable vm_object_mightbedirty() and vm_object_page_clean() for swap (Konstantin Belousov, 2020-02-04; 1 file, -2/+9)

  objects backing tmpfs vnodes' data. The clean scan is limited to only removing write permissions from the mapped pages of the objects. This fixes the issue that tmpfs vnode mtime is not updated by writes to the mmapped area after the initial page-in.

  Noted by: mjg
  Reviewed by: markj
  Discussed with: jeff
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Differential revision: https://reviews.freebsd.org/D23432
  Notes: svn path=/head/; revision=357514
* Don't hold the object lock while calling getpages. (Jeff Roberson, 2020-01-19; 1 file, -0/+7)

  The vnode pager does not want the object lock held. Moving this out allows further object lock scope reduction in callers. While here add some missing paging in progress calls and an assert. The object handle is now protected explicitly with pip.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D23033
  Notes: svn path=/head/; revision=356902
* Make collapse synchronization more explicit and allow it to complete during (Jeff Roberson, 2020-01-19; 1 file, -0/+2)

  paging.

  Shadow objects are marked with a COLLAPSING flag while they are collapsing with their backing object. This gives us an explicit test rather than overloading paging-in-progress. While a split is ongoing we mark an object with SPLIT. These two operations will modify the swap tree, so they must be serialized, and swap_pager_getpages() can now directly detect these conditions and page more conservatively.

  Callers to vm_object_collapse() now will reliably wait for a collapse to finish so that the backing chain is as short as possible before other decisions are made that may inflate the object chain, for example split, coalesce, etc. It is now safe to run fault concurrently with collapse. It is safe to increase or decrease paging in progress with no lock so long as there is another valid ref on increase.

  This change makes collapse more reliable as a secondary benefit. The primary benefit is making it safe to drop the object lock much earlier in fault or never acquire it at all.

  This was tested with a new shadow chain test script that uncovered long-standing bugs and will be integrated with stress2.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D22908
  Notes: svn path=/head/; revision=356886
* Store the bottom of the shadow chain in OBJ_ANON object->handle member. (Konstantin Belousov, 2019-12-01; 1 file, -2/+4)

  The handle value is stable for all shadow objects in the inheritance chain. This makes it possible to avoid descending the shadow chain to get to the bottom of it in vm_map_entry_set_vnode_text(), and eliminates the corresponding object relocking, which appeared to be contended.

  Change vm_object_allocate_anon() and vm_object_shadow() to handle more of the cred/charge initialization for the new shadow object, in addition to setting up the handle.

  Reported by: jeff
  Reviewed by: alc (previous version), jeff (previous version)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D22541
  Notes: svn path=/head/; revision=355270
* Only keep anonymous objects on shadow lists. This eliminates locking of (Jeff Roberson, 2019-11-20; 1 file, -0/+1)

  globally visible objects when they are part of a backing chain.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D22423
  Notes: svn path=/head/; revision=354871
* Simplify anonymous memory handling with an OBJ_ANON flag. This eliminates (Jeff Roberson, 2019-11-19; 1 file, -1/+2)

  redundant complicated checks and additional locking required only for anonymous memory. Introduce vm_object_allocate_anon() to create these objects. DEFAULT and SWAP objects now have the correct settings for non-anonymous consumers, so individual consumers need not modify the default flags to create super-pages and avoid ONEMAPPING/NOSPLIT.

  Reviewed by: alc, dougm, kib, markj
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22119
  Notes: svn path=/head/; revision=354869
* Replace OBJ_MIGHTBEDIRTY with a system using atomics. Remove the TMPFS_DIRTY (Jeff Roberson, 2019-10-29; 1 file, -3/+10)

  flag and use the same system.

  This enables further fault locking improvements by allowing more faults to proceed with a shared lock.

  Reviewed by: kib
  Tested by: pho
  Differential Revision: https://reviews.freebsd.org/D22116
  Notes: svn path=/head/; revision=354158
* Use atomics and a shared object lock to protect the object reference count. (Jeff Roberson, 2019-10-29; 1 file, -1/+1)

  Certain consumers still need to guarantee a stable reference, so we cannot switch entirely to atomics yet. Exclusive lock holders can still modify and examine the refcount without using the ref API.

  Reviewed by: kib
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21598
  Notes: svn path=/head/; revision=354157
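  The pattern this enables can be sketched with the refcount(9) primitives; the sketch below is a kernel-flavored illustration only (the my_obj type and functions are hypothetical, and this is not the committed vm_object code). The point is that taking a reference needs only atomics plus some guarantee that the object stays alive, for example a shared lock or an existing reference.

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/refcount.h>

      /* Hypothetical object whose reference count is managed atomically. */
      struct my_obj {
              volatile u_int  ref_count;
              /* ... state protected by a separate rw object lock ... */
      };

      static void
      my_obj_ref(struct my_obj *obj)
      {
              /* No exclusive lock needed; the caller must already hold a
               * reference or a shared lock that keeps obj from being freed. */
              refcount_acquire(&obj->ref_count);
      }

      static void
      my_obj_unref(struct my_obj *obj)
      {
              if (refcount_release(&obj->ref_count)) {
                      /* Last reference dropped: safe to tear down. */
                      /* my_obj_destroy(obj); */
              }
      }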
* Add VV_VMSIZEVNLOCK flag. (Konstantin Belousov, 2019-10-22; 1 file, -0/+1)

  The flag specifies that the vm_fault() handler should check the vnode's vm_object size under the vnode lock. It is converted into the object's OBJ_SIZEVNLOCK flag in vnode_pager_alloc().

  Tested by: pho
  Reviewed by: markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Differential revision: https://reviews.freebsd.org/D21883
  Notes: svn path=/head/; revision=353890
* (5/6) Move the VPO_NOSYNC to PGA_NOSYNC to eliminate the dependency on the (Jeff Roberson, 2019-10-15; 1 file, -1/+1)

  object lock in vm_page_set_validclean().

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21595
  Notes: svn path=/head/; revision=353540
* (3/6) Add a shared object busy synchronization mechanism that blocks new page (Jeff Roberson, 2019-10-15; 1 file, -0/+14)

  busy acquires while held.

  This allows code that would need to acquire and release a very large number of page busy locks to use the old mechanism where busy is only checked and not held. This comes at the cost of false positives but never false negatives, which the single consumer, vm_fault_soft_fast(), handles.

  Reviewed by: kib
  Tested by: pho
  Sponsored by: Netflix, Intel
  Differential Revision: https://reviews.freebsd.org/D21592
  Notes: svn path=/head/; revision=353538
* vm pager: writemapping accounting for OBJT_SWAP (Kyle Evans, 2019-09-03; 1 file, -0/+1)

  Currently writemapping accounting is only done for vnode_pager, which does some accounting on the underlying vnode. Extend this to allow accounting to be possible for any of the pager types. New pageops are added to update/release writecount that need to be implemented for any pager wishing to do said accounting, and we implement these methods now for both vnode_pager (unchanged) and swap_pager.

  The primary motivation for this is to allow other systems with OBJT_SWAP objects to check if their objects have any write mappings and reject operations with EBUSY if so. posixshm will be the first to do so in order to reject adding write seals to the shmfd if any writable mappings exist.

  Reviewed by: kib, markj
  Differential Revision: https://reviews.freebsd.org/D21456
  Notes: svn path=/head/; revision=351795
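  A very rough sketch of what such a pager writecount method could look like; the method name, the locking, and the writemappings field shown here are assumptions for illustration, not necessarily the committed implementation.

      #include <sys/param.h>
      #include <vm/vm.h>
      #include <vm/vm_object.h>

      /*
       * Illustrative pager op: account the number of bytes of the object
       * currently mapped for write, so that a consumer (e.g. posixshm)
       * can later refuse an operation with EBUSY while the count is
       * nonzero.  Names and field placement are assumptions.
       */
      static void
      example_swap_pager_update_writecount(vm_object_t object,
          vm_ooffset_t start, vm_ooffset_t end)
      {
              VM_OBJECT_WLOCK(object);
              object->un_pager.swp.writemappings += (vm_ooffset_t)end - start;
              VM_OBJECT_WUNLOCK(object);
      }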
* Rework v_object lifecycle for vnodes. (Konstantin Belousov, 2019-08-29; 1 file, -1/+0)

  The current implementation of vnode_create_vobject() and vnode_destroy_vobject() is written so that it is prepared to handle vm object destruction for a live vnode. Practically, no filesystems use this, except for some remnants that were present in UFS until today. One of the consequences of that model is that each filesystem must call vnode_destroy_vobject() in VOP_RECLAIM() or earlier; as a result, all of them get rid of the v_object in reclaim.

  Move the call to vnode_destroy_vobject() to vgonel() before VOP_RECLAIM(). This makes v_object stable: either the object is NULL, or it is a valid vm object until the vnode is reclaimed. Remove code from vnode_create_vobject() that handled races with parallel destruction.

  Reviewed by: markj
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Differential revision: https://reviews.freebsd.org/D21412
  Notes: svn path=/head/; revision=351598
* Use an atomic reference count for paging in progress so that callers do not (Jeff Roberson, 2019-08-19; 1 file, -3/+2)

  require the object lock.

  Reviewed by: markj
  Tested by: pho (as part of a larger branch)
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21311
  Notes: svn path=/head/; revision=351241
* Permit vm_pager_has_page() to run with a shared lock. Introduce (Jeff Roberson, 2019-08-19; 1 file, -0/+4)

  VM_OBJECT_DROP/VM_OBJECT_PICKUP to handle functions that are called with uncertain lock state.

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Netflix
  Differential Revision: https://reviews.freebsd.org/D21310
  Notes: svn path=/head/; revision=351235
* Change the vm_ooffset_t type to unsigned. (Konstantin Belousov, 2018-12-02; 1 file, -9/+2)

  The type represents a byte offset in the vm_object_t data space, which does not span negative offsets in the FreeBSD VM. The change matches the byte offset's signedness with the unsignedness of vm_pindex_t, the type of the page indexes in the objects. This allows removing the UOFF_TO_IDX() macro, which was used where the type had to be forcibly interpreted as unsigned anyway. It also fixes a lot of implicit bugs in device drivers' d_mmap methods.

  Reviewed by: alc, markj (previous version)
  Tested by: pho
  MFC after: 2 weeks
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=341398
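  The flavor of implicit bug this removes can be shown with a tiny userland demonstration, assuming the usual OFF_TO_IDX() shape of a right shift by PAGE_SHIFT (12 here): with a signed offset type the shift sign-extends on typical compilers, so a bogus negative offset quietly becomes a different page index than the unsigned interpretation would give.

      #include <stdio.h>
      #include <stdint.h>

      #define PAGE_SHIFT      12
      /* Same shape as the kernel's OFF_TO_IDX(): byte offset -> page index. */
      #define OFF_TO_IDX(off) ((uint64_t)((off) >> PAGE_SHIFT))

      int
      main(void)
      {
              int64_t  soff = -4096;            /* bogus negative offset     */
              uint64_t uoff = (uint64_t)-4096;  /* same bits, unsigned view  */

              /* Arithmetic shift of the signed value keeps the sign bits,
               * so the two interpretations disagree wildly. */
              printf("signed:   %#llx\n", (unsigned long long)OFF_TO_IDX(soff));
              printf("unsigned: %#llx\n", (unsigned long long)OFF_TO_IDX(uoff));
              return (0);
      }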
* Include path for tmpfs objects in vm.objects sysctl (Eric van Gyzen, 2018-11-30; 1 file, -0/+3)

  This applies the fix in r283924 to the vm.objects sysctl added by r283624 so the output will include the vnode information (i.e. path) for tmpfs objects.

  Reviewed by: kib, dab
  MFC after: 2 weeks
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D2724
  Notes: svn path=/head/; revision=341282
* Use per-domain locks for vm page queue free. Move paging control from (Jeff Roberson, 2018-02-06; 1 file, -0/+11)

  global to per-domain state. Protect reservations with the free lock from the domain that they belong to. Refactor to make vm domains more of a first class object.

  Reviewed by: markj, kib, gallatin
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D14000
  Notes: svn path=/head/; revision=328954
* Implement 'domainset', a cpuset-based NUMA policy mechanism. This allows (Jeff Roberson, 2018-01-12; 1 file, -0/+2)

  userspace to control NUMA policy administratively and programmatically.

  Implement domainset-based iterators in the page layer. Remove the now-legacy numa_* syscalls. Clean up some header pollution created by having seq.h in proc.h.

  Reviewed by: markj, kib
  Discussed with: alc
  Tested by: pho
  Sponsored by: Netflix, Dell/EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D13403
  Notes: svn path=/head/; revision=327895
* SPDX: Consider code from Carnegie-Mellon University. (Pedro F. Giffuni, 2017-11-30; 1 file, -1/+1)

  Interesting cases, most likely from CMU Mach sources.

  Notes: svn path=/head/; revision=326403
* Eliminate kmem_arena and kmem_object in preparation for further NUMA commits. (Jeff Roberson, 2017-11-28; 1 file, -2/+2)

  The arena argument to kmem_*() is now only used in an assert. A follow-up commit will remove the argument altogether before we freeze the API for the next release.

  This replaces the hard limit on kmem size with a soft limit imposed by UMA. When the soft limit is exceeded we periodically wake up the UMA reclaim thread to attempt to shrink KVA. On 32-bit architectures this should behave much more gracefully as we exhaust KVA. On 64-bit architectures the limits are likely never hit.

  Reviewed by: markj, kib (some objections)
  Discussed with: alc
  Tested by: pho
  Sponsored by: Netflix / Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D13187
  Notes: svn path=/head/; revision=326347
* sys: further adoption of SPDX licensing ID tags. (Pedro F. Giffuni, 2017-11-20; 1 file, -0/+2)

  Mainly focus on files that use the BSD 3-Clause license.

  The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well-known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.

  Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over the FreeBSD tree was useful as a starting point.

  Notes: svn path=/head/; revision=326023
* Use the existing tag name for the vm_object's memq. (Konstantin Belousov, 2017-09-13; 1 file, -1/+6)

  Reviewed by: alc, markj
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=323558
* Replace global swhash in swap pager with per-object trie to track swap (Konstantin Belousov, 2017-08-25; 1 file, -4/+4)

  blocks assigned to the object pages.

  - The global swhash_mtx is removed; the trie is synchronized by the corresponding object lock.
  - The swp_pager_meta_free_all() function used during object termination is optimized by only looking at the trie instead of having to search the whole hash for the swap blocks owned by the object.
  - On swap_pager_swapoff(), instead of iterating over the swhash, the global object list has to be inspected. There, we have to ensure that we do see valid trie content if we see that the object type is swap.

  Sizing of the swblk zone is the same as for the swblock zone; each swblk maps SWAP_META_PAGES pages.

  Proposed by: alc
  Reviewed by: alc, markj (previous version)
  Tested by: alc, pho (previous version)
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 month
  Differential revision: https://reviews.freebsd.org/D11435
  Notes: svn path=/head/; revision=322913
* Add OBJ_PG_DTOR flag to VM object. (Ruslan Bukin, 2017-08-16; 1 file, -0/+1)

  Setting this flag allows us to skip page removal from the VM object queue during object termination and to leave that to the cdev_pg_dtor function.

  Move the page removal code to a separate function, vm_object_terminate_pages(), as the comments would not survive the extra indentation.

  This will be required for Intel SGX support, where we will have to remove pages from the VM object manually.

  Reviewed by: kib, alc
  Sponsored by: DARPA, AFRL
  Differential Revision: https://reviews.freebsd.org/D11688
  Notes: svn path=/head/; revision=322571
* Fix style: change spaces to tabs. (Ruslan Bukin, 2017-07-21; 1 file, -3/+3)

  Sponsored by: DARPA, AFRL
  Notes: svn path=/head/; revision=321330
* Renumber copyright clause 4 (Warner Losh, 2017-02-28; 1 file, -1/+1)

  Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point.

  Submitted by: Jan Schaumann <jschauma@stevens.edu>
  Pull Request: https://github.com/freebsd/freebsd/pull/96
  Notes: svn path=/head/; revision=314436
* Consistently handle negative or wrapping offsets in the mmap(2) syscalls. (Konstantin Belousov, 2017-02-12; 1 file, -0/+15)

  For regular files and POSIX shared memory, POSIX requires that the [offset, offset + size) range be legitimate.

  At mapping time, check that the offset is not negative. Allowing negative offsets might expose data that the filesystem put into the vm_object for internal use, especially due to the OFF_TO_IDX() signedness treatment. The fault handler verifies that the mapped range is valid, assuming that mmap(2) checked that the arithmetic gives no undefined results.

  For device mappings, leave the semantics of negative offsets to the driver. Correct the object page index calculation to not erroneously propagate the sign.

  In either case, disallow overflow of offset + size.

  Update the mmap(2) man page to explain the range validity requirement and the behaviour when the range becomes invalid after mapping.

  Reported and tested by: royger (previous version)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=313690
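  A minimal sketch of the kind of argument validation this describes (illustrative only, not the committed kern_mmap() code): reject negative offsets and any offset + size combination that would wrap.

      #include <sys/types.h>
      #include <errno.h>
      #include <stdint.h>

      /*
       * Illustrative check for an mmap-style request on a regular file or
       * POSIX shared memory object: the offset must be non-negative and
       * offset + size must not overflow the (64-bit, signed) offset type.
       */
      static int
      check_mmap_range(off_t offset, size_t size)
      {
              if (offset < 0)
                      return (EINVAL);
              if (size > (uint64_t)INT64_MAX - (uint64_t)offset)
                      return (EINVAL);        /* offset + size would wrap */
              return (0);
      }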
* Style, use tab after #define. (Konstantin Belousov, 2017-02-04; 1 file, -2/+2)

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 days
  Notes: svn path=/head/; revision=313249
* Eliminate every mention of PG_CACHED pages from the comments in the machine- (Alan Cox, 2016-12-12; 1 file, -11/+0)

  independent layer of the virtual memory system. Update some of the nearby comments to eliminate redundancy and improve clarity.

  In vm/vm_reserv.c, do not use hyphens after adverbs ending in -ly per The Chicago Manual of Style.

  Update the comment in vm/vm_page.h defining the four types of page queues to reflect the elimination of PG_CACHED pages and the introduction of the laundry queue.

  Reviewed by: kib, markj
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8752
  Notes: svn path=/head/; revision=309898
* Add a new populate() pager method and extend device pager ops vector (Konstantin Belousov, 2016-12-08; 1 file, -0/+1)

  with cdev_pg_populate() to provide device drivers access to it. It gives drivers fine control of page ownership and allows drivers to implement arbitrary prefault policies.

  The populate method is called on a page fault and is supposed to populate the vm object with the page at the fault location and some amount of pages around it, at the pager's discretion. The VM provides the pager with hints about the current range of the object mapping, to avoid instantiation of immediately unused pages, if the pager decides so. Also, the VM passes the fault type and map entry protection to the pager, allowing it to force the optimal required ownership of the mapped pages.

  Installed pages must contiguously fill the returned region, be fully valid and exclusively busied. Of course, the pages must be compatible with the object's type.

  After populate() successfully returns, the VM fault handler installs as many instantiated pages into the process page tables as it sees reasonable, while still obeying the correct semantics for COW and vm map locking.

  The method is opt-in; a pager sets the OBJ_POPULATE flag to indicate that the method can be called. If the pager's vm objects can be shadowed, the pager must implement the traditional getpages() method in addition to populate(). Populate() might fall back to getpages() on a per-call basis as well, by returning the VM_PAGER_BAD error code.

  For now, for device pagers, the populate() method is only allowed to be used by the managed device pagers, but the limitation is only made because there are no unmanaged fault handlers which could use it right now.

  KPI designed together with, and reviewed by: alc
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 3 weeks
  Notes: svn path=/head/; revision=309710
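  A rough sketch of the shape of a populate handler under this contract; the exact signature is an assumption inferred from the description above (fault index, fault type and maximum protection in, populated [*first, *last] run out), and my_dev_get_page() is a hypothetical helper. This version populates only the faulting page, the simplest legal behaviour.

      #include <sys/param.h>
      #include <vm/vm.h>
      #include <vm/vm_object.h>
      #include <vm/vm_page.h>
      #include <vm/vm_pager.h>

      /*
       * Hypothetical device pager populate handler.  Every page in
       * [*first, *last] must be left valid and exclusively busied;
       * returning VM_PAGER_BAD makes the fault fall back to getpages().
       */
      static int
      mydev_pager_populate(vm_object_t object, vm_pindex_t pidx,
          int fault_type, vm_prot_t max_prot, vm_pindex_t *first,
          vm_pindex_t *last)
      {
              vm_page_t m;

              m = my_dev_get_page(object, pidx);      /* hypothetical helper */
              if (m == NULL)
                      return (VM_PAGER_BAD);          /* fall back to getpages() */
              *first = *last = pidx;
              return (VM_PAGER_OK);
      }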
* Remove most of the code for implementing PG_CACHED pages. (This change does (Alan Cox, 2016-11-15; 1 file, -8/+0)

  not remove user-space visible fields from vm_cnt or all of the references to cached pages from comments. Those changes will come later.)

  Reviewed by: kib, markj
  Tested by: pho
  Sponsored by: Dell EMC Isilon
  Differential Revision: https://reviews.freebsd.org/D8497
  Notes: svn path=/head/; revision=308691
* The vmtotal sysctl handler marks active vm objects to calculate (Konstantin Belousov, 2016-06-21; 1 file, -1/+0)

  statistics. Marking is done by setting the OBJ_ACTIVE flag. The flag change is locked, but the problem is that many parts of the system assume that vm object initialization ensures that no other code could change the object, and thus perform their accesses locklessly. The end result is corrupted flags in vm objects, most visibly a spurious OBJ_DEAD flag, causing random hangs.

  Avoid the active object marking; instead provide an equally inexact but immutable is_object_alive() definition for the object mapped state.

  Avoid iterating over the processes' mappings altogether by using an arguably improved definition of the paging thread as one which sleeps on the v_free_count.

  PR: 204764
  Diagnosed by: pho
  Tested by: pho (previous version)
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Approved by: re (gjb)
  Notes: svn path=/head/; revision=302063
* Add implementation of robust mutexes, hopefully close enough to the (Konstantin Belousov, 2016-05-17; 1 file, -0/+1)

  intention of the POSIX IEEE Std 1003.1TM-2008/Cor 1-2013.

  A robust mutex is guaranteed to be cleared by the system upon either thread or process owner termination while the mutex is held. The next mutex locker is then notified about the inconsistent mutex state and can execute (or abandon) corrective actions.

  The patch mostly consists of small changes here and there, adding necessary checks for the inconsistent and abandoned conditions into existing paths. Additionally, the thread exit handler was extended to iterate over the userspace-maintained list of owned robust mutexes, unlocking and marking as terminated each of them.

  The list of owned robust mutexes cannot be maintained atomically synchronous with the mutex lock state (it is possible in kernel, but is too expensive). Instead, for the duration of the lock or unlock operation, the current mutex is remembered in a special slot that is also checked by the kernel at thread termination.

  The kernel must be aware of the per-thread location of the heads of the robust mutex lists and of the current active mutex slot. When a thread touches a robust mutex for the first time, a new umtx op syscall is issued which informs the kernel about the location of the list heads.

  The umtx sleep queues for PP and PI mutexes are split between non-robust and robust.

  Somewhat unrelated changes in the patch:
  1. Style.
  2. The fix for proper tdfind() call use in umtxq_sleep_pi() for shared PI mutexes.
  3. Removal of the userspace struct pthread_mutex m_owner field.
  4. The sysctl kern.ipc.umtx_vnode_persistent is added, which controls the lifetime of the shared mutex associated with a vnode's page.

  Reviewed by: jilles (previous version, supposedly the objection was fixed)
  Discussed with: brooks, Martin Simmons <martin@lispworks.com> (some aspects)
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=300043
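  From the application side this surfaces as the standard POSIX robust-mutex protocol; a brief userland illustration (not part of the commit) of how the next locker recovers after an owner dies while holding the lock:

      #include <pthread.h>
      #include <errno.h>

      /* Initialization: request robust behaviour via the mutex attribute. */
      static void
      init_robust(pthread_mutex_t *m)
      {
              pthread_mutexattr_t attr;

              pthread_mutexattr_init(&attr);
              pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
              pthread_mutex_init(m, &attr);
              pthread_mutexattr_destroy(&attr);
      }

      /* Lock, handling the case where the previous owner died. */
      static int
      lock_robust(pthread_mutex_t *m)
      {
              int error = pthread_mutex_lock(m);

              if (error == EOWNERDEAD) {
                      /* Repair the protected state, then mark the mutex
                       * consistent so later lockers see a normal mutex. */
                      pthread_mutex_consistent(m);
                      error = 0;
              }
              return (error);
      }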
* Implement process-shared locks support for libthr.so.3, without (Konstantin Belousov, 2016-02-28; 1 file, -0/+5)

  breaking the ABI. A special value is stored in the lock pointer to indicate a shared lock, and an offline page in the shared memory is allocated to store the actual lock.

  Reviewed by: vangyzen (previous version)
  Discussed with: deischen, emaste, jhb, rwatson, Martin Simmons <martin@lispworks.com>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  Notes: svn path=/head/; revision=296162
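  What this enables for applications is the standard PTHREAD_PROCESS_SHARED attribute; a small userland sketch (using an anonymous MAP_SHARED region that a forked child would inherit) of setting up such a lock:

      #include <pthread.h>
      #include <sys/mman.h>
      #include <stddef.h>

      /* Place a process-shared mutex in memory visible to multiple processes. */
      static pthread_mutex_t *
      make_shared_mutex(void)
      {
              pthread_mutexattr_t attr;
              pthread_mutex_t *m;

              m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                  MAP_SHARED | MAP_ANON, -1, 0);
              if (m == MAP_FAILED)
                      return (NULL);

              pthread_mutexattr_init(&attr);
              pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
              pthread_mutex_init(m, &attr);
              pthread_mutexattr_destroy(&attr);
              return (m);
      }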
* A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). (Gleb Smirnoff, 2015-12-16; 1 file, -0/+2)

  o With the new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With the new interface it is easier to implement code protected from race conditions.

    Such arrayed requests should for now be preceded by a call to vm_pager_haspage() to make sure that the request is possible. This could be improved later, making vm_pager_haspage() obsolete.

    Strengthening the promises on the busy state of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq().

  o The new KPI accepts two integer pointers that may optionally point at values for read-ahead and read-behind, that a pager may do, if it can. These pages are completely owned by the pager, and not controlled by the caller.

    This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault.

  Discussed with: kib, alc, jeff, scottl
  Sponsored by: Nginx, Inc.
  Sponsored by: Netflix
  Notes: svn path=/head/; revision=292373
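  A sketch of the resulting calling convention, assuming post-change signatures along the lines of vm_pager_has_page(object, pindex, &before, &after) and vm_pager_get_pages(object, ma, count, &rbehind, &rahead); page allocation, busying and locking around the call are elided.

      #include <sys/param.h>
      #include <vm/vm.h>
      #include <vm/vm_object.h>
      #include <vm/vm_page.h>
      #include <vm/vm_pager.h>

      /*
       * Illustrative fragment: probe the pager, then request a contiguous
       * run of 'count' pages starting at ma[0]; on VM_PAGER_OK every page
       * in ma[] comes back valid and busied, and the pager may also have
       * brought in rbehind/rahead pages that it owns.
       */
      static int
      read_run(vm_object_t object, vm_page_t *ma, vm_pindex_t pindex, int count)
      {
              int after, before, rahead, rbehind;

              if (!vm_pager_has_page(object, pindex, &before, &after))
                      return (VM_PAGER_FAIL);

              rbehind = before;       /* take whatever extra the pager offers */
              rahead = after;
              return (vm_pager_get_pages(object, ma, count, &rbehind, &rahead));
      }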
* As a step towards the elimination of PG_CACHED pages, rework the handling (Mark Johnston, 2015-09-30; 1 file, -2/+2)

  of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached.

  This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range.

  Reviewed by: alc, kib
  Sponsored by: EMC / Isilon Storage Division
  Differential Revision: https://reviews.freebsd.org/D3726
  Notes: svn path=/head/; revision=288431
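  For reference, the userland interface whose behaviour this tunes is posix_fadvise(2); a tiny illustration (hypothetical descriptor and range) of the two hints discussed above:

      #include <fcntl.h>

      /*
       * Tell the kernel we will not need this range again: with this change
       * the backing pages move to the head of the inactive queue instead of
       * being cached.
       */
      static void
      drop_range(int fd, off_t off, off_t len)
      {
              (void)posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
      }

      /* Declare up front that data read or written once will not be reused. */
      static void
      mark_noreuse(int fd)
      {
              (void)posix_fadvise(fd, 0, 0, POSIX_FADV_NOREUSE);
      }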
* Revert r173708's modifications to vm_object_page_remove(). (Konstantin Belousov, 2015-07-25; 1 file, -1/+0)

  Assume that a vnode is mapped shared and mlock()ed, and then the vnode is truncated, or truncated and then again extended past the mapping point EOF. Truncation removes the pages past the truncation point, and if pages are later created at this range, they are not properly mapped into the mlocked region, and their wiring count is wrong.

  The revert leaves the invalidated but wired pages on the object queue, which means that the pages are found by vm_object_unwire() when the mapped range is munlock()ed, and reused by the buffer cache when the vnode is extended again.

  The changes in r173708 were required because, at the time, vm_map_unwire() looked at the page tables to find the page to unwire. This is no longer needed with the introduction of vm_object_unwire(), which follows the object's shadow chain.

  Also eliminate the OBJPR_NOTWIRED flag for vm_object_page_remove(), which is now redundant: we do not remove wired pages.

  Reported by: trasz, Dmitry Sivachenko <trtrmitya@gmail.com>
  Suggested and reviewed by: alc
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=285878
* Provide vnode in memory map info for files on tmpfs (Eric van Gyzen, 2015-06-02; 1 file, -0/+1)

  When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior.

  This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).

  Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
  Obtained from: Dell Inc.
  PR: 198431
  MFC after: 2 weeks
  Reviewed by: jhb
  Approved by: kib (mentor)
  Notes: svn path=/head/; revision=283924
* Introduce vm_object_color() and use it in mmap(2) to set the color of (Alan Cox, 2015-03-21; 1 file, -0/+24)

  named objects to zero before the virtual address is selected. Previously, the color setting was delayed until after the virtual address was selected. In rtld, this delay effectively prevented the mapping of a shared library's code section using superpages. Now, for example, we see the first 1 MB of libc's code on armv6 mapped by a superpage after we've gotten through the initial cold misses that bring the first 1 MB of code into memory. (With the page clustering that we perform on read faults, this happens quickly.)

  Differential Revision: https://reviews.freebsd.org/D2013
  Reviewed by: jhb, kib
  Tested by: Svatopluk Kraus (armv6)
  MFC after: 6 weeks
  Notes: svn path=/head/; revision=280327
* Update mtime for tmpfs files modified through memory mapping. Similar (Konstantin Belousov, 2015-01-28; 1 file, -0/+1)

  to UFS, perform updates during syncer scans, which in particular means that tmpfs now performs a scan on sync. Also, this means that an mtime update may be delayed up to 30 seconds after the write.

  The vm_object's OBJ_TMPFS_DIRTY flag for the tmpfs swap object is similar to the OBJ_MIGHTBEDIRTY flag for the vnode object: it indicates that the object could have been dirtied. Adapt the fast page fault handler and vm_object_set_writeable_dirty() to handle OBJ_TMPFS_NODE the same as OBJT_VNODE.

  Reported by: Ronald Klop <ronald-lists@klop.ws>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=277828
* Update a stale comment. (Alan Cox, 2014-09-11; 1 file, -1/+1)

  Notes: svn path=/head/; revision=271417
* Add wrappers to assert that vm object is unlocked and for try upgrade. (Konstantin Belousov, 2014-08-06; 1 file, -0/+4)

  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
  Notes: svn path=/head/; revision=269642
* When unwiring a region of an address space, do not assume that the (Alan Cox, 2014-07-26; 1 file, -0/+2)

  underlying physical pages are mapped by the pmap. If, for example, the application has performed an mprotect(..., PROT_NONE) on any part of the wired region, then those pages will no longer be mapped by the pmap. So, using the pmap to look up the wired pages in order to unwire them doesn't always work, and when it doesn't work wired pages are leaked.

  To avoid the leak, introduce and use a new function vm_object_unwire() that locates the wired pages by traversing the object and its backing objects.

  At the same time, switch from using pmap_change_wiring() to the recently introduced function pmap_unwire() for unwiring the region's mappings. pmap_unwire() is faster, because it operates on a range of virtual addresses rather than a single virtual page at a time. Moreover, by operating on a range, it is superpage friendly. It doesn't waste time performing unnecessary demotions.

  Reported by: markj
  Reviewed by: kib
  Tested by: pho, jmg (arm)
  Sponsored by: EMC / Isilon Storage Division
  Notes: svn path=/head/; revision=269134
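  The failure mode being fixed can be pictured with a short userland sequence (illustrative only; error checking omitted, and mlock() may require raised resource limits): after the mprotect() call the wired pages are no longer reachable through the page tables, which is why unwiring has to walk the object chain rather than the pmap.

      #include <sys/mman.h>

      int
      main(void)
      {
              size_t len = 1 << 20;
              void *p;

              p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANON, -1, 0);
              if (p == MAP_FAILED)
                      return (1);

              mlock(p, len);                   /* wire the region           */
              mprotect(p, len, PROT_NONE);     /* drop the pmap mappings    */
              munlock(p, len);                 /* unwire: must not use pmap */

              munmap(p, len);
              return (0);
      }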
* The OBJ_TMPFS flag of vm_object means that there is unreclaimed tmpfs (Konstantin Belousov, 2014-07-14; 1 file, -1/+2)

  vnode for the tmpfs node owning this object. The flag is currently used for two purposes. First, it allows correctly handling VV_TEXT for a tmpfs vnode when the ref count on the object is decremented to 1, similar to vnode_pager_dealloc() for regular filesystems. Second, it prevents some operations which are done on OBJT_SWAP vm objects backing user anonymous memory, but are incorrect for an object owned by a tmpfs node.

  The second kind of use of the OBJ_TMPFS flag is incorrect, since the vnode might be reclaimed, which clears the flag, but vm object operations must still be disallowed.

  Introduce one more flag, OBJ_TMPFS_NODE, which is permanently set on the object for a VREG tmpfs node, and used instead of OBJ_TMPFS to test whether vm object collapse and similar actions should be disabled.

  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=268615
* On all the architectures, avoid preallocating the physical memory (Attilio Rao, 2013-08-09; 1 file, -1/+1)

  for nodes used in vm_radix. On architectures supporting direct mapping, also avoid preallocating the KVA for such nodes.

  In order to do so, make the operations derived from vm_radix_insert() able to fail, and handle all the resulting failures. On the vm_radix side, introduce a new function called vm_radix_replace(), which can replace an already present leaf node with a new one, and take into account the possibility, during vm_radix_insert() allocation, that the operations on the radix trie can recurse. This means that if the operations in vm_radix_insert() recursed, vm_radix_insert() will start from scratch again.

  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc (older version)
  Reviewed by: jeff
  Tested by: pho, scottl
  Notes: svn path=/head/; revision=254141
* Never remove user-wired pages from an object when doing (Konstantin Belousov, 2013-07-11; 1 file, -0/+1)

  msync(MS_INVALIDATE). The vm_fault_copy_entry() requires that the object range corresponding to a user-wired vm_map_entry is always fully populated.

  Add an OBJPR_NOTWIRED flag for vm_object_page_remove() to request the preserving behaviour; use it when calling vm_object_page_remove() from vm_object_sync().

  Reported and tested by: pho
  Reviewed by: alc
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
  Notes: svn path=/head/; revision=253189
* o Relax locking assertions for vm_page_find_least() (Attilio Rao, 2013-05-21; 1 file, -0/+2)

  o Relax locking assertions for pmap_enter_object() and add them also to architectures that currently don't have any
  o Introduce VM_OBJECT_LOCK_DOWNGRADE(), which is basically a downgrade operation on the per-object rwlock
  o Use all the mechanisms above to make vm_map_pmap_enter() work most of the time with only read locks.

  Sponsored by: EMC / Isilon storage division
  Reviewed by: alc
  Notes: svn path=/head/; revision=250884