docs/BLOCK_DEVICE_LOCKING.md - systemd-stable - Rivoreo Source Code Repositories

 ---
 title: Locking Block Device Access
 category: Interfaces
 layout: default
 ---

 # Locking Block Device Access

 *TL;DR: Use BSD file locks
 [(`flock(2)`)](http://man7.org/linux/man-pages/man2/flock.2.html) on block
 device nodes to synchronize access for partitioning and file system formatting
 tools.*

 `systemd-udevd` probes all block devices showing up for file system superblock
 and partition table information (utilizing `libblkid`). If another program
 concurrently modifies a superblock or partition table this probing might be
 affected, which is bad in itself, but also might in turn result in undesired
 effects in programs subscribing to `udev` events.

 Applications manipulating a block device can temporarily stop `systemd-udevd`
 from processing rules on it — and thus bar it from probing the device — by
 taking a BSD file lock on the block device node. Specifically, whenever
 `systemd-udevd` starts processing a block device it takes a `LOCK_SH|LOCK_NB`
 lock using [`flock(2)`](http://man7.org/linux/man-pages/man2/flock.2.html) on
 the main block device (i.e. never on any partition block device, but on the
 device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
 returns `EBUSY`), it refrains from processing the device. If it manages to take
 the lock it is kept for the entire time the device is processed.

 Note that `systemd-udevd` also watches all block device nodes it manages for
 `inotify()` `IN_CLOSE_WRITE` events: whenever such an event is seen, this is
 used as trigger to re-run the rule-set for the device.

 These two concepts allow tools such as disk partitioners or file system
 formatting tools to safely and easily take exclusive ownership of a block
 device while operating: before starting work on the block device, they should
 take an `LOCK_EX` lock on it. This has two effects: first of all, in case
 `systemd-udevd` is still processing the device the tool will wait for it to
 finish. Second, after the lock is taken, it can be sure that `systemd-udevd`
 will refrain from processing the block device, and thus all other client
 applications subscribed to it won't get device notifications from potentially
 half-written data either. After the operation is complete the
 partitioner/formatter can simply close the device node. This has two effects:
 it implicitly releases the lock, so that `systemd-udevd` can process events on
 the device node again. Secondly, it results an `IN_CLOSE_WRITE` event, which
 causes `systemd-udevd` to immediately re-process the device — seeing all
 changes the tool made — and notify subscribed clients about it.

 Ideally, `systemd-udevd` would explicitly watch block devices for `LOCK_EX`
 locks being released. Such monitoring is not supported on Linux however, which
 is why it watches for `IN_CLOSE_WRITE` instead, i.e. for `close()` calls to
 writable file descriptors referring to the block device. In almost all cases,
 the difference between these two events does not matter much, as any locks
 taken are implicitly released by `close()`. However, it should be noted that if
 an application unlocks a device after completing its work without closing it,
 i.e. while keeping the file descriptor open for further, longer time, then
 `systemd-udevd` will not notice this and not retrigger and thus reprobe the
 device.

 Besides synchronizing block device access between `systemd-udevd` and such
 tools this scheme may also be used to synchronize access between those tools
 themselves. However, do note that `flock()` locks are advisory only. This means
 if one tool honours this scheme and another tool does not, they will of course
 not be synchronized properly, and might interfere with each other's work.

 Note that the file locks follow the usual access semantics of BSD locks: since
 `systemd-udevd` never writes to such block devices it only takes a `LOCK_SH`
 *shared* lock. A program intending to make changes to the block device should
 take a `LOCK_EX` *exclusive* lock instead. For further details, see the
 `flock(2)` man page.

 And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
 (`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
 orthogonal. The scheme discussed above uses the former and not the latter,
 because these types of locks more closely match the required semantics.

 Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
 manipulating block devices in all tools that change file system block devices
 (`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
 opening the node.
	---
	title: Locking Block Device Access
	category: Interfaces
	layout: default
	---

	# Locking Block Device Access

	*TL;DR: Use BSD file locks
	[(`flock(2)`)](http://man7.org/linux/man-pages/man2/flock.2.html) on block
	device nodes to synchronize access for partitioning and file system formatting
	tools.*

	`systemd-udevd` probes all block devices showing up for file system superblock
	and partition table information (utilizing `libblkid`). If another program
	concurrently modifies a superblock or partition table this probing might be
	affected, which is bad in itself, but also might in turn result in undesired
	effects in programs subscribing to `udev` events.

	Applications manipulating a block device can temporarily stop `systemd-udevd`
	from processing rules on it — and thus bar it from probing the device — by
	taking a BSD file lock on the block device node. Specifically, whenever
	`systemd-udevd` starts processing a block device it takes a `LOCK_SH\|LOCK_NB`
	lock using [`flock(2)`](http://man7.org/linux/man-pages/man2/flock.2.html) on
	the main block device (i.e. never on any partition block device, but on the
	device the partition belongs to). If this lock cannot be taken (i.e. `flock()`
	returns `EBUSY`), it refrains from processing the device. If it manages to take
	the lock it is kept for the entire time the device is processed.

	Note that `systemd-udevd` also watches all block device nodes it manages for
	`inotify()` `IN_CLOSE_WRITE` events: whenever such an event is seen, this is
	used as trigger to re-run the rule-set for the device.

	These two concepts allow tools such as disk partitioners or file system
	formatting tools to safely and easily take exclusive ownership of a block
	device while operating: before starting work on the block device, they should
	take an `LOCK_EX` lock on it. This has two effects: first of all, in case
	`systemd-udevd` is still processing the device the tool will wait for it to
	finish. Second, after the lock is taken, it can be sure that `systemd-udevd`
	will refrain from processing the block device, and thus all other client
	applications subscribed to it won't get device notifications from potentially
	half-written data either. After the operation is complete the
	partitioner/formatter can simply close the device node. This has two effects:
	it implicitly releases the lock, so that `systemd-udevd` can process events on
	the device node again. Secondly, it results an `IN_CLOSE_WRITE` event, which
	causes `systemd-udevd` to immediately re-process the device — seeing all
	changes the tool made — and notify subscribed clients about it.

	Ideally, `systemd-udevd` would explicitly watch block devices for `LOCK_EX`
	locks being released. Such monitoring is not supported on Linux however, which
	is why it watches for `IN_CLOSE_WRITE` instead, i.e. for `close()` calls to
	writable file descriptors referring to the block device. In almost all cases,
	the difference between these two events does not matter much, as any locks
	taken are implicitly released by `close()`. However, it should be noted that if
	an application unlocks a device after completing its work without closing it,
	i.e. while keeping the file descriptor open for further, longer time, then
	`systemd-udevd` will not notice this and not retrigger and thus reprobe the
	device.

	Besides synchronizing block device access between `systemd-udevd` and such
	tools this scheme may also be used to synchronize access between those tools
	themselves. However, do note that `flock()` locks are advisory only. This means
	if one tool honours this scheme and another tool does not, they will of course
	not be synchronized properly, and might interfere with each other's work.

	Note that the file locks follow the usual access semantics of BSD locks: since
	`systemd-udevd` never writes to such block devices it only takes a `LOCK_SH`
	shared lock. A program intending to make changes to the block device should
	take a `LOCK_EX` exclusive lock instead. For further details, see the
	`flock(2)` man page.

	And please keep in mind: BSD file locks (`flock()`) and POSIX file locks
	(`lockf()`, `F_SETLK`, …) are different concepts, and in their effect
	orthogonal. The scheme discussed above uses the former and not the latter,
	because these types of locks more closely match the required semantics.

	Summarizing: it is recommended to take `LOCK_EX` BSD file locks when
	manipulating block devices in all tools that change file system block devices
	(`mkfs`, `fsck`, …) or partition tables (`fdisk`, `parted`, …), right after
	opening the node.