| --- |
| title: Random Seeds |
| --- |
| |
| # Random Seeds |
| |
| systemd can help in a number of ways with providing reliable, high quality |
| random numbers from early boot on. |
| |
| ## Linux Kernel Entropy Pool |
| |
| Today's computer systems require random number generators for numerous |
| cryptographic and other purposes. On Linux systems, the kernel's entropy pool |
| is typically used as high-quality source of random numbers. The kernel's |
| entropy pool combines various entropy inputs together, mixes them and provides |
| an API to userspace as well as to internal kernel subsystems to retrieve |
| it. This entropy pool needs to be initialized with a minimal level of entropy |
| before it can provide high quality, cryptographic random numbers to |
| applications. Until the entropy pool is fully initialized application requests |
| for high-quality random numbers cannot be fulfilled. |
| |
| The Linux kernel provides three relevant userspace APIs to request random data |
| from the kernel's entropy pool: |
| |
| * The [`getrandom()`](http://man7.org/linux/man-pages/man2/getrandom.2.html) |
| system call with its `flags` parameter set to 0. If invoked the calling |
| program will synchronously block until the random pool is fully initialized |
| and the requested bytes can be provided. |
| |
| * The `getrandom()` system call with its `flags` parameter set to |
| `GRND_NONBLOCK`. If invoked the request for random bytes will fail if the |
| pool is not initialized yet. |
| |
| * Reading from the |
| [`/dev/urandom`](http://man7.org/linux/man-pages/man4/urandom.4.html) |
| pseudo-device will always return random bytes immediately, even if the pool |
| is not initialized. The provided random bytes will be of low quality in this |
| case however. Moreover the kernel will log about all programs using this |
| interface in this state, and which thus potentially rely on an uninitialized |
| entropy pool. |
| |
| (Strictly speaking there are more APIs, for example `/dev/random`, but these |
| should not be used by almost any application and hence aren't mentioned here.) |
| |
| Note that the time it takes to initialize the random pool may differ between |
| systems. If local hardware random number generators are available, |
| initialization is likely quick, but particularly in embedded and virtualized |
| environments available entropy is small and thus random pool initialization |
| might take a long time (up to tens of minutes!). |
| |
| Modern hardware tends to come with a number of hardware random number |
| generators (hwrng), that may be used to relatively quickly fill up the entropy |
| pool. Specifically: |
| |
| * All recent Intel and AMD CPUs provide the CPU opcode |
| [RDRAND](https://en.wikipedia.org/wiki/RdRand) to acquire random bytes. Linux |
| includes random bytes generated this way in its entropy pool, but didn't use |
| to credit entropy for it (i.e. data from this source wasn't considered good |
| enough to consider the entropy pool properly filled even though it was |
| used). This has changed recently however, and most big distributions have |
| turned on the `CONFIG_RANDOM_TRUST_CPU=y` kernel compile time option. This |
| means systems with CPUs supporting this opcode will be able to very quickly |
| reach the "pool filled" state. |
| |
| * The TPM security chip that is available on all modern desktop systems has a |
| hwrng. It is also fed into the entropy pool, but generally not credited |
| entropy. You may use `rng_core.default_quality=1000` on the kernel command |
| line to change that, but note that this is a global setting affect all |
| hwrngs. (Yeah, that's weird.) |
| |
| * Many Intel and AMD chipsets have hwrng chips. Their Linux drivers usually |
| don't credit entropy. (But there's `rng_core.default_quality=1000`, see |
| above.) |
| |
| * Various embedded boards have hwrng chips. Some drivers automatically credit |
| entropy, others do not. Some WiFi chips appear to have hwrng sources too, and |
| they usually do not credit entropy for them. |
| |
| * `virtio-rng` is used in virtualized environments and retrieves random data |
| from the VM host. It credits full entropy. |
| |
| * The EFI firmware typically provides a RNG API. When transitioning from UEFI |
| to kernel mode Linux will query some random data through it, and feed it into |
| the pool, but not credit entropy to it. What kind of random source is behind |
| the EFI RNG API is often not entirely clear, but it hopefully is some kind of |
| hardware source. |
| |
| If neither of these are available (in fact, even if they are), Linux generates |
| entropy from various non-hwrng sources in various subsystems, all of which |
| ultimately are rooted in IRQ noise, a very "slow" source of entropy, in |
| particular in virtualized environments. |
| |
| ## `systemd`'s Use of Random Numbers |
| |
| systemd is responsible for bringing up the OS. It generally runs as the first |
| userspace process the kernel invokes. Because of that it runs at a time where |
| the entropy pool is typically not yet initialized, and thus requests to acquire |
| random bytes will either be delayed, will fail or result in a noisy kernel log |
| message (see above). |
| |
| Various other components run during early boot that require random bytes. For |
| example, initial RAM disks nowadays communicate with encrypted networks or |
| access encrypted storage which might need random numbers. systemd itself |
| requires random numbers as well, including for the following uses: |
| |
| * systemd assigns 'invocation' UUIDs to all services it invokes that uniquely |
| identify each invocation. This is useful retain a global handle on a specific |
| service invocation and relate it to other data. For example, log data |
| collected by the journal usually includes the invocation UUID and thus the |
| runtime context the service manager maintains can be neatly matched up with |
| the log data a specific service invocation generated. systemd also |
| initializes `/etc/machine-id` with a randomized UUID. (systemd also makes use |
| of the randomized "boot id" the kernel exposes in |
| `/proc/sys/kernel/random/boot_id`). These UUIDs are exclusively Type 4 UUIDs, |
| i.e. randomly generated ones. |
| |
| * systemd maintains various hash tables internally. In order to harden them |
| against [collision |
| attacks](https://rt.perl.org/Public/Bug/Display.html?CSRF_Token=165691af9ddaa95f653402f1b68de728) |
| they are seeded with random numbers. |
| |
| * At various places systemd needs random bytes for temporary file name |
| generation, UID allocation randomization, and similar. |
| |
| * systemd-resolved and systemd-networkd use random number generators to harden |
| the protocols they implement against packet forgery. |
| |
| * systemd-udevd and systemd-nspawn can generate randomized MAC addresses for |
| network devices. |
| |
| Note that these cases generally do not require a cryptographic-grade random |
| number generator, as most of these utilize random numbers to minimize risk of |
| collision and not to generate secret key material. However, they usually do |
| require "medium-grade" random data. For example: systemd's hash-maps are |
| reseeded if they grow beyond certain thresholds (and thus collisions are more |
| likely). This means they are generally fine with low-quality (even constant) |
| random numbers initially as long as they get better with time, so that |
| collision attacks are eventually thwarted as better, non-guessable seeds are |
| acquired. |
| |
| ## Keeping `systemd'`s Demand on the Kernel Entropy Pool Minimal |
| |
| Since most of systemd's own use of random numbers do not require |
| cryptographic-grade RNGs, it tries to avoid reading entropy from the kernel |
| entropy pool if possible. If it succeeds this has the benefit that there's no |
| need to delay the early boot process until entropy is available, and noisy |
| kernel log messages about early reading from `/dev/urandom` are avoided |
| too. Specifically: |
| |
| 1. When generating [Type 4 |
| UUIDs](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_\(random\)), |
| systemd tries to use Intel's and AMD's RDRAND CPU opcode directly, if |
| available. While some doubt the quality and trustworthiness of the entropy |
| provided by these opcodes, they should be good enough for generating UUIDs, |
| if not key material (though, as mentioned, today's big distributions opted |
| to trust it for that too, now, see above — but we are not going to make that |
| decision for you, and for anything key material related will only use the |
| kernel's entropy pool). If RDRAND is not available or doesn't work, it will |
| use synchronous `getrandom()` as fallback, and `/dev/urandom` on old kernels |
| where that system call doesn't exist yet. This means on non-Intel/AMD |
| systems UUID generation will block on kernel entropy initialization. |
| |
| 2. For seeding hash tables, and all the other similar purposes systemd first |
| tries RDRAND, and if that's not available will try to use asynchronous |
| `getrandom()` (if the kernel doesn't support this system call, |
| `/dev/urandom` is used). This may fail too in case the pool is not |
| initialized yet, in which case it will fall back to glibc's internal rand() |
| calls, i.e. weak pseudo-random numbers. This should make sure we use good |
| random bytes if we can, but neither delay boot nor trigger noisy kernel log |
| messages during early boot for these use-cases. |
| |
| ## `systemd`'s Support for Filling the Kernel Entropy Pool |
| |
| systemd has various provisions to ensure the kernel entropy is filled during |
| boot, in order to ensure the entropy pool is filled up quickly. |
| |
| 1. When systemd's PID 1 detects it runs in a virtualized environment providing |
| the `virtio-rng` interface it will load the necessary kernel modules to make |
| use of it during earliest boot, if possible — much earlier than regular |
| kernel module loading done by `systemd-udevd.service`. This should ensure |
| that in VM environments the entropy pool is quickly filled, even before |
| systemd invokes the first service process — as long as the VM environment |
| provides virtualized RNG hardware (and VM environments really should!). |
| |
| 2. The |
| [`systemd-random-seed.service`](https://www.freedesktop.org/software/systemd/man/systemd-random-seed.service.html) |
| system service will load a random seed from `/var/lib/systemd/random-seed` |
| into the kernel entropy pool. By default it does not credit entropy for it |
| though, since the seed is — more often than not — not reset when 'golden' |
| master images of an OS are created, and thus replicated into every |
| installation. If OS image builders carefully reset the random seed file |
| before generating the image it should be safe to credit entropy, which can |
| be enabled by setting the `$SYSTEMD_RANDOM_SEED_CREDIT` environment variable |
| for the service to `1` (or even `force`, see man page). Note however, that |
| this service typically runs relatively late during early boot: long after |
| the initial RAM disk (`initrd`) completed, and after the `/var/` file system |
| became writable. This is usually too late for many applications, it is hence |
| not advised to rely exclusively on this functionality to seed the kernel's |
| entropy pool. Also note that this service synchronously waits until the |
| kernel's entropy pool is initialized before completing start-up. It may thus |
| be used by other services as synchronization point to order against, if they |
| require an initialized entropy pool to operate correctly. |
| |
| 3. The |
| [`systemd-boot`](https://www.freedesktop.org/software/systemd/man/systemd-boot.html) |
| EFI boot loader included in systemd is able to maintain and provide a random |
| seed stored in the EFI System Partition (ESP) to the booted OS, which allows |
| booting up with a fully initialized entropy pool from earliest boot |
| on. During installation of the boot loader (or when invoking [`bootctl |
| random-seed`](https://www.freedesktop.org/software/systemd/man/bootctl.html#random-seed)) |
| a seed file with an initial seed is placed in a file `/loader/random-seed` |
| in the ESP. In addition, an identically sized randomized EFI variable called |
| the the 'system token' is set, which is written to the machine's firmware |
| NVRAM. During boot, when `systemd-boot` finds both the random seed file and |
| the system token they are combined and hashed with SHA256 (in counter mode, |
| to generate sufficient data), to generate a new random seed file to store in |
| the ESP as well as a random seed to pass to the OS kernel. The new random |
| seed file for the ESP is then written to the ESP, ensuring this is completed |
| before the OS is invoked. Very early during initialization PID 1 will read |
| the random seed provided in the EFI variable and credit it fully to the |
| kernel's entropy pool. |
| |
| This mechanism is able to safely provide an initialized entropy pool already |
| in the `initrd` and guarantees that different seeds are passed from the boot |
| loader to the OS on every boot (in a way that does not allow regeneration of |
| an old seed file from a new seed file). Moreover, when an OS image is |
| replicated between multiple images and the random seed is not reset, this |
| will still result in different random seeds being passed to the OS, as the |
| per-machine 'system token' is specific to the physical host, and not |
| included in OS disk images. If the 'system token' is properly initialized |
| and kept sufficiently secret it should not be possible to regenerate the |
| entropy pool of different machines, even if this seed is the only source of |
| entropy. |
| |
| Note that the writes to the ESP needed to maintain the random seed should be |
| minimal. The size of the random seed file is directly derived from the Linux |
| kernel's entropy pool size, which defaults to 512 bytes. This means updating |
| the random seed in the ESP should be doable safely with a single sector |
| write (since hard-disk sectors typically happen to be 512 bytes long, too), |
| which should be safe even with FAT file system drivers built into |
| low-quality EFI firmwares. |
| |
| As a special restriction: in virtualized environments PID 1 will refrain |
| from using this mechanism, for safety reasons. This is because on VM |
| environments the EFI variable space and the disk space is generally not |
| maintained physically separate (for example, `qemu` in EFI mode stores the |
| variables in the ESP itself). The robustness towards sloppy OS image |
| generation is the main purpose of maintaining the 'system token' however, |
| and if the EFI variable storage is not kept physically separate from the OS |
| image there's no point in it. That said, OS builders that know that they are |
| not going to replicate the built image on multiple systems may opt to turn |
| off the 'system token' concept by setting `random-seed-mode always` in the |
| ESP's |
| [`/loader/loader.conf`](https://www.freedesktop.org/software/systemd/man/loader.conf.html) |
| file. If done, `systemd-boot` will use the random seed file even if no |
| system token is found in EFI variables. |
| |
| With the three mechanisms described above it should be possible to provide |
| early-boot entropy in most cases. Specifically: |
| |
| 1. On EFI systems, `systemd-boot`'s random seed logic should make sure good |
| entropy is available during earliest boot — as long as `systemd-boot` is |
| used as boot loader, and outside of virtualized environments. |
| |
| 2. On virtualized systems, the early `virtio-rng` hookup should ensure entropy |
| is available early on — as long as the VM environment provides virtualized |
| RNG devices, which they really should all do in 2019. Complain to your |
| hosting provider if they don't. |
| |
| 3. On Intel/AMD systems systemd's own reliance on the kernel entropy pool is |
| minimal (as RDRAND is used on those for UUID generation). This only works if |
| the CPU has RDRAND of course, which most physical CPUs do (but I hear many |
| virtualized CPUs do not. Pity.) |
| |
| 4. In all other cases, `systemd-random-seed.service` will help a bit, but — as |
| mentioned — is too late to help with early boot. |
| |
| This primarily leaves two kind of systems in the cold: |
| |
| 1. Some embedded systems. Many embedded chipsets have hwrng functionality these |
| days. Consider using them while crediting |
| entropy. (i.e. `rng_core.default_quality=1000` on the kernel command line is |
| your friend). Or accept that the system might take a bit longer to |
| boot. Alternatively, consider implementing a solution similar to |
| systemd-boot's random seed concept in your platform's boot loader. |
| |
| 2. Virtualized environments that lack both virtio-rng and RDRAND. Tough |
| luck. Talk to your hosting provider, and ask them to fix this. |
| |
| 3. Also note: if you deploy an image without any random seed and/or without |
| installing any 'system token' in an EFI variable, as described above, this |
| means that on the first boot no seed can be passed to the OS |
| either. However, as the boot completes (with entropy acquired elsewhere), |
| systemd will automatically install both a random seed in the GPT and a |
| 'system token' in the EFI variable space, so that any future boots will have |
| entropy from earliest boot on — all provided `systemd-boot` is used. |
| |
| ## Frequently Asked Questions |
| |
| 1. *Why don't you just use getrandom()? That's all you need!* |
| |
| Did you read any of the above? getrandom() is hooked to the kernel entropy |
| pool, and during early boot it's not going to be filled yet, very likely. We |
| do use it in many cases, but not in all. Please read the above again! |
| |
| 2. *Why don't you use |
| [getentropy()](http://man7.org/linux/man-pages/man3/getentropy.3.html)? That's |
| all you need!* |
| |
| Same story. That call is just a different name for `getrandom()` with |
| `flags` set to zero, and some additional limitations, and thus it also needs |
| the kernel's entropy pool to be initialized, which is the whole problem we |
| are trying to address here. |
| |
| 3. *Why don't you generate your UUIDs with |
| [`uuidd`](http://man7.org/linux/man-pages/man8/uuidd.8.html)? That's all you |
| need!* |
| |
| First of all, that's a system service, i.e. something that runs as "payload" |
| of systemd, long after systemd is already up and hence can't provide us |
| UUIDs during earliest boot yet. Don't forget: to assign the invocation UUID |
| for the `uuidd.service` start we already need a UUID that the service is |
| supposed to provide us. More importantly though, `uuidd` needs state/a random |
| seed/a MAC address/host ID to operate, all of which are not available during |
| early boot. |
| |
| 4. *Why don't you generate your UUIDs with `/proc/sys/kernel/random/uuid`? |
| That's all you need!* |
| |
| This is just a different, more limited interface to `/dev/urandom`. It gains |
| us nothing. |
| |
| 5. *Why don't you use [`rngd`](https://github.com/nhorman/rng-tools), |
| [`haveged`](http://www.issihosts.com/haveged/), |
| [`egd`](http://egd.sourceforge.net/)? That's all you need!* |
| |
| Like `uuidd` above these are system services, hence come too late for our |
| use-case. In addition much of what `rngd` provides appears to be equivalent |
| to `CONFIG_RANDOM_TRUST_CPU=y` or `rng_core.default_quality=1000`, except |
| being more complex and involving userspace. These services partly measure |
| system behavior (such as scheduling effects) which the kernel either |
| already feeds into its pool anyway (and thus shouldn't be fed into it a |
| second time, crediting entropy for it a second time) or is at least |
| something the kernel could much better do on its own. Hence, if what these |
| daemons do is still desirable today, this would be much better implemented |
| in kernel (which would be very welcome of course, but wouldn't really help |
| us here in our specific problem, see above). |
| |
| 6. *Why don't you use [`arc4random()`](https://man.openbsd.org/arc4random.3)? |
| That's all you need!* |
| |
| This doesn't solve the issue, since it requires a nonce to start from, and |
| it gets that from `getrandom()`, and thus we have to wait for random pool |
| initialization the same way as calling `getrandom()` |
| directly. `arc4random()` is nothing more than optimization, in fact it |
| implements similar algorithms that the kernel entropy pool implements |
| anyway, hence besides being able to provide random bytes with higher |
| throughput there's little it gets us over just using `getrandom()`. Also, |
| it's not supported by glibc. And as long as that's the case we are not keen |
| on using it, as we'd have to maintain that on our own, and we don't want to |
| maintain our own cryptographic primitives if we don't have to. Since |
| systemd's uses are not performance relevant (besides the pool initialization |
| delay, which this doesn't solve), there's hence little benefit for us to |
| call these functions. That said, if glibc learns these APIs one day, we'll |
| certainly make use of them where appropriate. |
| |
| 7. *This is boring: NetBSD had [boot loader entropy seed |
| support](https://netbsd.gw.com/cgi-bin/man-cgi?boot+8) since ages!* |
| |
| Yes, NetBSD has that, and the above is inspired by that (note though: this |
| article is about a lot more than that). NetBSD's support is not really safe, |
| since it neither updates the random seed before using it, nor has any |
| safeguards against replicating the same disk image with its random seed on |
| multiple machines (which the 'system token' mentioned above is supposed to |
| address). This means reuse of the same random seed by the boot loader is |
| much more likely. |
| |
| 8. *Why does PID 1 upload the boot loader provided random seed into kernel |
| instead of kernel doing that on its own?* |
| |
| That's a good question. Ideally the kernel would do that on its own, and we |
| wouldn't have to involve userspace in this. |
| |
| 9. *What about non-EFI?* |
| |
| The boot loader random seed logic described above uses EFI variables to pass |
| the seed from the boot loader to the OS. Other systems might have similar |
| functionality though, and it shouldn't be too hard to implement something |
| similar for them. Ideally, we'd have an official way to pass such a seed as |
| part of the `struct boot_params` from the boot loader to the kernel, but |
| this is currently not available. |
| |
| 10. *I use a different boot loader than `systemd-boot`, I'd like to use boot |
| loader random seeds too!* |
| |
| Well, consider just switching to `systemd-boot`, it's worth it. See |
| [systemd-boot(7)](https://www.freedesktop.org/software/systemd/man/systemd-boot.html) |
| for an introduction why. That said, any boot loader can re-implement the |
| logic described above, and can pass a random seed that systemd as PID 1 |
| will then upload into the kernel's entropy pool. For details see the [Boot |
| Loader Interface](https://systemd.io/BOOT_LOADER_INTERFACE) documentation. |
| |
| 11. *Why not pass the boot loader random seed via kernel command line instead |
| of as EFI variable?* |
| |
| The kernel command line is accessible to unprivileged processes via |
| `/proc/cmdline`. It's not desirable if unprivileged processes can use this |
| information to possibly gain too much information about the current state |
| of the kernel's entropy pool. |
| |
| 12. *Why doesn't `systemd-boot` rewrite the 'system token' too each time |
| when updating the random seed file stored in the ESP?* |
| |
| The system token is stored as persistent EFI variable, i.e. in some form of |
| NVRAM. These memory chips tend be of low quality in many machines, and |
| hence we shouldn't write them too often. Writing them once during |
| installation should generally be OK, but rewriting them on every single |
| boot would probably wear the chip out too much, and we shouldn't risk that. |