aboutsummaryrefslogtreecommitdiff
path: root/sys/dev/md
Commit message (Collapse)AuthorAgeFilesLines
* MFC r362739:Mark Johnston2020-07-061-2/+2
| | | | | | | Remove some redundant assignments and computations. Notes: svn path=/stable/12/; revision=362960
* MFC r356315: Avoid duplicate I/O statistics accounting.Alexander Motin2020-01-171-2/+5
| | | | | | | | Alike to geom_disk free the provider statistics structure and point GEOM toward local statistics. It allows to save some CPU time. Notes: svn path=/stable/12/; revision=356823
* MFC of 345758Kirk McKusick2019-04-151-4/+20
| | | | | | | | | Properly flush outstanding I/Os when forcibly deleteing a memory disk device. Sponsored by: Netflix Notes: svn path=/stable/12/; revision=346221
* md: use prestaged mfs_rootBreno Leitao2018-06-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | On PowerNV systems, the rootfs is passed through kexec, which loads the rootfs into memory and set two fdt entries to describe where the file is located in the memory; I need to pass this memory region to the md device as a mfs_root, but, current md driver does not support two things: * Just getting a pointer from an external (bootloader) memory. If I need to workaround it, I would need to declare a static array and memcopy from this external memory to this static variable. * The size of the image. The usage of mfs_root_end, which is not a pointer, seems to be not possible for this prestaged scenario. This patch simply adds a new way to load mfs_root from memory. Differential Revision: https://reviews.freebsd.org/D15625 Approved by: kib, jhibbits (mentor) Notes: svn path=/head/; revision=334784
* Move most of the contents of opt_compat.h to opt_global.h.Brooks Davis2018-04-061-1/+0
| | | | | | | | | | | | | | | | | | | | | opt_compat.h is mentioned in nearly 180 files. In-progress network driver compabibility improvements may add over 100 more so this is closer to "just about everywhere" than "only some files" per the guidance in sys/conf/options. Keep COMPAT_LINUX32 in opt_compat.h as it is confined to a subset of sys/compat/linux/*.c. A fake _COMPAT_LINUX option ensure opt_compat.h is created on all architectures. Move COMPAT_LINUXKPI to opt_dontuse.h as it is only used to control the set of compiled files. Reviewed by: kib, cem, jhb, jtl Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14941 Notes: svn path=/head/; revision=332122
* Move 32-bit compat for md(4) ioctls into the md code.Brooks Davis2018-03-271-23/+107
| | | | | | | | | | | | | | | This is more correct in that ioctl commands have no meaning until they hit the handler associated with the file descriptor. Add support for MDIOCRESIZE_32 which was missed when it was added. Reviewed by: cem, kib, markj (various versions) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14714 Notes: svn path=/head/; revision=331623
* Move uio enums to sys/_uio.h.Brooks Davis2018-03-271-0/+1
| | | | | | | | | | | | | | | | | Include _uio.h instead of uio.h in several headers to reduce header polution. Fix a few places that relied on header polution to get the uio.h header. I have not moved struct uio as many more things that use it rely on header polution to get other definitions from uio.h. Reviewed by: cem, kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14811 Notes: svn path=/head/; revision=331621
* Add a request structure and make the implementation use it.Brooks Davis2018-03-151-115/+157
| | | | | | | | | | | | | | | | | | | This allows compatibility translation to take place on the stack (md_ioctl is too big) and is more suitable as a public interface within the kernel than the kern_ioctl interface. Except for the initialization of the md_req from the md_ioctl (including detection of kernel md_file pointers) and the updating of the md_ioctl prior to return, this is a mechanical replacment of md_ioctl and mdio with md_req and mdr. Reviewed by: markj, cem, kib (assorted versions) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14704 Notes: svn path=/head/; revision=331030
* Move implementation of ioctls into kern_*() functions.Brooks Davis2018-03-151-148/+254
| | | | | | | | | | | | | | | | Move locks from outside ioctl to the individual implementations. This is the first step of changing the implementations to act on a kernel-internal request struct rather than on struct md_ioctl and to removing the use of kern_ioctl in mountroot. Reviewed by: cem, kib, markj (prior version) Obtained from: CheriBSD Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14700 Notes: svn path=/head/; revision=331014
* Restore the behavior of returning the total number of units byBrooks Davis2018-03-151-1/+2
| | | | | | | | | | | | unconditionally incrementing i in the loop; Reported by: cem MFC with: r330880 Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14685 Notes: svn path=/head/; revision=331008
* Don't overflow the kernel struct mdio in the MDIOCLIST ioctl.Brooks Davis2018-03-131-3/+14
| | | | | | | | | | | | | | | | | | | | | | Always terminate the list with -1 and document the ioctl behavior. This preserves existing behavior as seen from userspace with the addition of the unconditional termination which will not be seen by working consumers of MDIOCLIST. Because this ioctl can only be performed by root (in default configurations) and is not used in the base system this bug is not deemed to warrant either a security advisory or an eratta notice. Reviewed by: kib Obtained from: CheriBSD Discussed with: security-officer (gordon) MFC after: 3 days Security: kernel heap buffer overflow Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14685 Notes: svn path=/head/; revision=330880
* Fix backwards MD_VERIFY logic for md devices.Jonathan T. Looney2018-01-101-1/+1
| | | | | | | | | | | | If the MD_VERIFY flag is set, we should use O_VERIFY. If the MD_VERIFY flag is not set, we should not. Reviewed by: stevek Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D13814 Notes: svn path=/head/; revision=327754
* Add a new kernel config option, MD_ROOT_READONLY, which forces on theIan Lepore2017-12-201-2/+8
| | | | | | | | | | | | | | | | | MD_READONLY flag for the md device automatically instantiated during kernel init for an mdroot filesystem. Note that there is specifically and by design no tunable or sysctl control over this feature. Without this option, you already have control over whether the mdroot fs is writeable using vfs.root.mountfrom.options from loader(8), the root_rw_mount rcvar, and by using "mount -u[rw] /" or equivelent on the fly. This option is being added to provide a way to make the mdroot fs truly immutable before userland code begins running. Differential Revision: https://reviews.freebsd.org/D13411 Notes: svn path=/head/; revision=327032
* SPDX: use the Beerware identifier.Pedro F. Giffuni2017-11-301-2/+2
| | | | Notes: svn path=/head/; revision=326408
* sys/dev: further adoption of SPDX licensing ID tags.Pedro F. Giffuni2017-11-201-0/+2
| | | | | | | | | | | | | | | | | Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point. Notes: svn path=/head/; revision=326022
* Make md(4) support GEOM::ident for vnode-backed disks. It's basedEdward Tomasz Napierala2017-10-041-0/+11
| | | | | | | | | | | | | on backing file device and inode numbers. This is useful for gmountver(8) regression tests. MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D12230 Notes: svn path=/head/; revision=324275
* When mdstart_swap() accesses a page that is already in the active queue,Alan Cox2017-10-021-1/+4
| | | | | | | | | | | | | mark the page as referenced rather than calling vm_page_activate(). This allows the page's act_count to grow beyond ACT_INIT and better reflect its usage. (See also r324146, which modified a function used by tmpfs, uiomove_object_page(), to behave in the same way.) Reviewed by: kib, markj MFC after: 2 weeks Notes: svn path=/head/; revision=324189
* Add ability to label md(4) devices.Maxim Sobolev2017-08-281-0/+16
| | | | | | | | | | | | | | | This feature comes from the fact that we rely memory-backed md(4) in our build process heavily. However, if the build goes haywire the allocated resources (i.e. swap and memory-backed md(4)'s) need to be purged. It is extremely useful to have ability to attach arbitrary labels to each of the virtual disks so that they can be identified and GC'ed if neecessary. MFC after: 4 weeks Differential Revision: https://reviews.freebsd.org/D10457 Notes: svn path=/head/; revision=322969
* Don't call vm_pager_page_unswapped() when writing or deleting a dirty page.Mark Johnston2017-06-141-6/+10
| | | | | | | | | | | | The swap space backing a clean page is released when it is first dirtied, so there's no need to attempt to release swap space when the page is already dirty. Reviewed by: alc MFC after: 1 week Notes: svn path=/head/; revision=319934
* Free the request page if an I/O error occurs while reading from swap.Mark Johnston2017-06-141-3/+3
| | | | | | | | | | | | | | After such a failure, the page is invalid, so there's point in keeping it around. Moreover, such pages were not being inserted into the active queue, making them unreclaimable until a subsequent write or delete made them valid. Reported by: alc Reviewed by: alc (previous revision) MFC after: 1 week Notes: svn path=/head/; revision=319933
* Fix handling of subpage BIO_WRITE and BIO_DELETE requests on swap MDs.Mark Johnston2017-06-141-22/+40
| | | | | | | | | | | | | | Such requests would previously mark the entire page as valid, which was incorrect since nothing guaranteed that the page's contents had been initialized. This change also modifies subpage BIO_DELETEs so that the entire page is marked dirty, rather than only a subrange. There is no benefit to creating partially dirty swap pages. Reviewed by: alc, kib (previous version) MFC after: 3 days Notes: svn path=/head/; revision=319932
* Add MD_VERIFY option to enable O_VERIFY in open for vnode type.Stephen J. Kiernan2017-05-311-2/+12
| | | | | | | | | | | | | | | | Add -o [no]verify option to mdconfig (and document in man page.) Implement GEOM attribute MNT::verified to ask md if the backing vnode is verified. Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if the underlying device has been verified. Reviewed by: rwatson Approved by: sjg (mentor) Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2902 Notes: svn path=/head/; revision=319358
* Renumber copyright clause 4Warner Losh2017-02-281-1/+1
| | | | | | | | | | | | Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96 Notes: svn path=/head/; revision=314436
* sys/dev: Replace zero with NULL for pointers.Pedro F. Giffuni2017-02-201-1/+1
| | | | | | | | | | | Makes things easier to read, plus architectures may set NULL to something different than zero. Found with: devel/coccinelle MFC after: 3 weeks Notes: svn path=/head/; revision=313982
* Fix typo where opening brace was needed.Stephen J. Kiernan2017-02-131-1/+1
| | | | | | | | | Reported by: Michael Butler Reviewed by: sjg Approved by: sjg (mentor) Notes: svn path=/head/; revision=313703
* For MD_PRELOAD type md(4) devices, if there is a file name in the preloadedStephen J. Kiernan2017-02-131-3/+8
| | | | | | | | | | | | | | | | | | | meta-data, copy it into the softc structure. When returning md(4) device details to the caller, include the file name in any MD_PRELOAD type devices if it is set (first character is not NUL.) In mdconfig, for "preload" type md(4) devices, if there is file config available, print it in the file column of the output. Reviewed by: brooks Approved by: sjg (mentor) MFC after: 1 month Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D9529 Notes: svn path=/head/; revision=313701
* For the MD_ROOT option don't inject /dev/md0 as root dev when ROOTDEVNAMEMaxim Sobolev2016-03-091-1/+2
| | | | | | | | | | | | | | | is defined explicitly. It's kinda pointless and results in extra step in boot sequence which is not really needed, i.e.: md0: Embedded image 1331200 bytes at 0x8038b7b4 Trying to mount root from ufs:/dev/md0 []... Mounting from ufs:/dev/md0 failed with error 22. Trying to mount root from ufs:md0.uzip []... warning: no time-of-day clock registered, system time will not be set accurately start_init: trying /sbin/init Notes: svn path=/head/; revision=296574
* Fix MFS builds when both MD_ROOT_SIZE and MFS_IMAGE are specifiedAdrian Chadd2016-02-021-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | MD_ROOT_SIZE and embed_mfs.sh were basically retired as part of https://reviews.freebsd.org/D2903 . However, when building a kernel with 'options MD_ROOT_SIZE' specified, this results in a non-working MFS, as within sys/dev/md/md.c we fall within the wrong # ifdef. This patch implements the following: * Allow kernels to be built without the MD_ROOT_SIZE option, which results in a kernel built as per D2903. * Allow kernels to be built with the MD_ROOT_SIZE option, which results in a kernel built similarly to the pre-D2903 way, with the following differences: * The MFS is now put in a separate section within the kernel (oldmfs, so it differs from the mfs section introduced by D2903). * embed_mfs.sh is changed, so it looks up the oldmfs section within the kernel, gets its size and offset, sees if the MFS will fit within the allocated oldmfs section and only if all is well does a dd of the MFS image into the kernel. Submitted by: Stanislav Galabov <sgalabov@gmail.com> Reviewed by: brooks, imp Differential Revision: https://reviews.freebsd.org/D5093 Notes: svn path=/head/; revision=295137
* A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().Gleb Smirnoff2015-12-161-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix Notes: svn path=/head/; revision=292373
* In md(4) over vnode, correct handling of the unaligned unmapped ioKonstantin Belousov2015-12-121-13/+27
| | | | | | | | | | | | | | | | | | | requests which page alignment + size is greater than MAXPHYS. Right now md(4) over vnode would use the physical buffer of the size MAXPHYS to map a data of size MAXPHYS + page offset of the user buffer. This typically corrupts next pbuf, or, if the pbuf used was the last pbuf in the map, the next page after the pbuf's map. Split request up to the size of io which fits into pbuf KVA with alignment, and retry if a part of the bio is left unprocessed. Reported by: Fabian Keil <fk@fabiankeil.de> Tested by: Fabian Keil, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Notes: svn path=/head/; revision=292128
* Add asynchronous command support to the pass(4) driver, and the newKenneth D. Merry2015-12-031-71/+236
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | camdd(8) utility. CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed. While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations. Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O. The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB. There are some things things would be good to add: 1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data. 2. Add an ioctl to list currently outstanding CCBs in the various queues. 3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that. 4. Test physical address support. Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists. 5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on. Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface. This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface. It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and ouptut devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices. It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls. The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O. camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally. For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier. No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize. For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes. Things that would be nice to do for camdd(8) eventually: 1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right Now you can always use /dev/zero, /dev/random, etc. 2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null. 3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6. 4. Add support for SATA device passthrough I/O. 5. Add support for random LBAs and/or lengths on the input and output sides. 6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination. sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively. Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful. sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support. Add two new ioctls, CAMIOQUEUE and CAMIOGET. CAMIOQUEUE adds a CCB to the incoming queue. The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer. When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue. If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done. The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc. The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical. The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases. The new passiocleanup() function restores data pointers in user CCBs and frees memory. Add new functions to support kqueue(2)/kevent(2): passreadfilt() tells kevent whether or not the done queue is empty. passkqfilter() adds a knote to our list. passreadfiltdetach() removes a knote from our list. Add a new function, passpoll(), for poll(2)/select(2) to use. Add devstat(9) support for the queued CCB path. sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type. sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.) sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags. sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags(). sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST. sys/dev/md/md.c: Add BIO_VLIST support to md(4). sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit. sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length. Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset. Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset. sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist(). sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL. This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t. sys/sys/bio.h: Add a new bio flag, BIO_VLIST. sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist(). share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls. usr.sbin/Makefile: Add camdd. usr.sbin/camdd/Makefile: Add a makefile for camdd(8). usr.sbin/camdd/camdd.8: Man page for camdd(8). usr.sbin/camdd/camdd.c: The new camdd(8) utility. Sponsored by: Spectra Logic MFC after: 1 week Notes: svn path=/head/; revision=291716
* s/as/at/ in previous commit.Marcel Moolenaar2015-08-131-1/+1
| | | | | | | Pointed out by: jmallett@ Notes: svn path=/head/; revision=286741
* Change md(4) to use weak symbols as start, end and size for the embeddedMarcel Moolenaar2015-08-131-5/+22
| | | | | | | | | | | | | | | | | root disk. The embedded image is linked into the kernel in the .mfs section. Add rules and variables to kern.pre.mk and kern.post.mk that handle the linking of the image. First objcopy is used to generate an object file. Then, the object file is linked into the kernel. Submitted by: Steve Kiernan <stevek@juniper.net> Reviewed by: brooks@ Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2903 Notes: svn path=/head/; revision=286727
* Use g_conf_printf_escaped() to escape illegal symbols in file name.Andrey V. Elsukov2015-08-131-3/+6
| | | | | | | | PR: 202289 MFC after: 1 week Notes: svn path=/head/; revision=286720
* For md(4), posix shm(3) and tmpfs(5), free swap space used by paged inKonstantin Belousov2014-07-281-1/+3
| | | | | | | | | | | | dirty page, which is written by the process. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=269190
* vm_page_grab() and vm_pager_get_pages() can drop the vm_object lock,Attilio Rao2014-03-191-1/+1
| | | | | | | | | | | | | | then threads can sleep on the pip condition. Avoid to deadlock such threads by correctly awakening the sleeping ones after the pip is finished. swapoff side of the bug can likely result in shutdown deadlocks. Sponsored by: EMC / Isilon Storage Division Reported by: pho, pluknet Tested by: pho Notes: svn path=/head/; revision=263328
* Only assert the length of the passed bio in the mdstart_vnode() whenKonstantin Belousov2013-12-101-2/+2
| | | | | | | | | | | | | | the bio is unmapped, so we must map the bio pages into pbuf. This works around the geom classes which do not follow the MAXPHYS limit on the i/o size, since such classes do not know about unmapped bios either. Reported by: Paolo Pinto <paolo.pinto@netasq.com> Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=259198
* Change comment to match code.Edward Tomasz Napierala2013-12-041-4/+4
| | | | | | | | Discussed with: thompsa Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=258916
* Add "null" backend to mdconfig(8). This does exactly what the nameEdward Tomasz Napierala2013-12-041-0/+39
| | | | | | | | | | | suggests, and is somewhat useful for benchmarking. MFC after: 1 month No objections from: kib Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=258909
* Merge GEOM direct dispatch changes from the projects/camlock branch.Alexander Motin2013-10-221-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O. The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%. To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before. Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc). To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work. This change more then twice increases peak block storage performance on systems with manu CPUs, together with earlier CAM locking changes reaching more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads). Sponsored by: iXsystems, Inc. MFC after: 2 months Notes: svn path=/head/; revision=256880
* Give the page allocations initiated by the swap-backed md(4) a higherKonstantin Belousov2013-08-301-1/+1
| | | | | | | | | | | priority. If the write is requested by a system daemon, sleeping there would starve resources and cause deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=255080
* Remove the deprecated VM_ALLOC_RETRY flag for the vm_page_grab(9).Konstantin Belousov2013-08-221-2/+1
| | | | | | | | | | | The flag was mandatory since r209792, where vm_page_grab(9) was changed to only support the alloc retry semantic. Suggested and reviewed by: alc Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=254649
* The soft and hard busy mechanism rely on the vm object lock to work.Attilio Rao2013-08-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl Notes: svn path=/head/; revision=254138
* Fix the data corruption on the swap-backed md.Konstantin Belousov2013-05-241-1/+7
| | | | | | | | | | | | | | Assign the rv variable a success code if the pager was not asked for the page. Using an error code from the previous processed page caused zeroing of the valid page, when e.g. the previous page was not available in the pager. Reported by: lstewart Sponsored by: The FreeBSD Foundation MFC after: 1 week Notes: svn path=/head/; revision=250966
* Do not declare that preloaded md(4) supports unmapped bio requests, itKonstantin Belousov2013-04-021-1/+9
| | | | | | | | | | does not. Reported by: <mh@kernel32.de> Sponsored by: The FreeBSD Foundation Notes: svn path=/head/; revision=249032
* Support unmapped i/o for the md(4).Konstantin Belousov2013-03-191-40/+204
| | | | | | | | | | | | | The vnode-backed md(4) has to map the unmapped bio because VOP_READ() and VOP_WRITE() interfaces do not allow to pass unmapped requests to the filesystem. Vnode-backed md(4) uses pbufs instead of relying on the bio_transient_map, to avoid usual md deadlock. Sponsored by: The FreeBSD Foundation Tested by: pho, scottl Notes: svn path=/head/; revision=248518
* Switch the vm_object mutex to be a rwlock. This will enable in theAttilio Rao2013-03-091-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho Notes: svn path=/head/; revision=248084
* Print correct unit number when attaching preloaded memory disks.Jaakko Heinonen2012-11-211-6/+7
| | | | | | | Retire now unused mdunits variable. Notes: svn path=/head/; revision=243373
* Disallow attaching preloaded memory disks via ioctl.Jaakko Heinonen2012-11-211-23/+6
| | | | | | | | | | | | | | | - The feature is dangerous because the kernel code didn't check validity of the memory address provided from user space. - It seems that mdconfig(8) never really supported attaching preloaded memory disks. - Preloaded memory disks are automatically attached during md(4) initialization. Thus there shouldn't be much use for the feature. PR: kern/169683 Discussed on: freebsd-hackers Notes: svn path=/head/; revision=243372
* Zero the newly allocated md(4) swap-backed page to prevent randomKonstantin Belousov2012-11-081-0/+9
| | | | | | | | | | | | | | kernel memory leakage to userspace. For the typical use, when a filesystem put on the md disk, the change only results in CPU and memory bandwidth spent to zero the page, since filsystems make sure that user never see unwritten content. But if md disk is used as raw device by userspace, the garbage is exposed. Reported by: Paul Schenkeveld <freebsd@psconsult.nl> MFC after: 2 weeks Notes: svn path=/head/; revision=242744