The cachedir-tag specification defines how to mark directories as
cache-directories. This allows tools like `tar` to ignore those
directories if desired (e.g., see `tar --exclude-caches`). This is very
useful to avoid huge cache-directories in backups and remote
synchronizations.
The spec simply defines a file called `CACHEDIR.TAG` whose first 43
bytes must be: "Signature: 8a477f597d28d172789f06886806bc55" (which
happens to be the MD5-checksum of ".IsCacheDirectory"). Further content
is to be ignored. Any such file marks the directory in question as a
cache-directory.
The cachedir-tag has been successfully deployed in tools like `cargo`
and `VLC`, and an implementation for Firefox is currently being
discussed. More information is available here: https://bford.info/cachedir/
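A minimal sketch of writing and detecting such a tag, following only the rules stated above. The helper names `write_cachedir_tag()` and `is_cache_directory()` are hypothetical and not taken from osbuild; the comment line written after the signature is optional content that readers ignore.

```python
import os

CACHEDIR_TAG = "CACHEDIR.TAG"
# The mandatory 43-byte header defined by the cachedir-tag specification.
SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def write_cachedir_tag(directory):
    """Mark `directory` as a cache-directory (hypothetical helper)."""
    with open(os.path.join(directory, CACHEDIR_TAG), "wb") as f:
        f.write(SIGNATURE)
        # Anything after the signature is ignored by readers.
        f.write(b"\n# This file is a cache directory tag.\n")


def is_cache_directory(directory):
    """Return True if `directory` carries a valid CACHEDIR.TAG."""
    try:
        with open(os.path.join(directory, CACHEDIR_TAG), "rb") as f:
            head = f.read(len(SIGNATURE))
    except FileNotFoundError:
        return False
    # Only the first 43 bytes are compared; the rest is ignored.
    return head == SIGNATURE
```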
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Add trace-hooks to the FsCache._atomic_open() helper, including a
primitive trace-infrastructure. They allow interrupting cache operations
and running arbitrary code.
The trace-hooks will be used by the test-suite to trigger the races we
want to protect against. During runtime, the traces should not be used
and thus will always be `None`.
This is a very primitive way to hook into the runtime execution and test
the atomicity of the operations. However, it is simple enough for our
tests and avoids pulling in huge tracing suites.
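The idea can be sketched as a single optional callable that is invoked at named points between the steps of an operation. The class, the `_tracer` attribute, and the trace-point names below are hypothetical; the dictionary updates merely stand in for the real open and lock calls.

```python
class Cache:
    """Toy cache with primitive trace-hooks (hypothetical names)."""

    def __init__(self):
        # Test-only hook; always None during normal runtime.
        self._tracer = None

    def _trace(self, point):
        # No-op unless a test installed a tracer.
        if self._tracer is not None:
            self._tracer(point)

    def atomic_open(self, name):
        self._trace("open:before")
        entry = {"name": name}        # stands in for open()
        self._trace("open:after-open")
        entry["locked"] = True        # stands in for lock acquisition
        self._trace("open:after-lock")
        return entry
```

A test would install a tracer that, for example, deletes or replaces the file at `"open:after-open"` to provoke exactly the race the lock logic must detect.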
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
On NFS, we need to be careful with cached metadata. To make sure our
_atomic_open() can correctly catch races during open+lock, we must be
careful to catch `ESTALE` and `ENOENT` from `stat()` calls. Beyond that,
the lock-acquisition guarantees that data is coherent, even on NFS.
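One common way to detect such a race after open+lock is to check that the path still refers to the inode that was locked. The sketch below assumes that pattern (the helper name is hypothetical) and treats `ENOENT` and `ESTALE` from `stat()` as "lost the race" rather than as fatal errors:

```python
import errno
import os


def fd_still_at_path(fd, path):
    """Return True if `path` still refers to the file open as `fd`.

    ENOENT (entry removed) and ESTALE (NFS handle no longer valid) both
    mean another party replaced or deleted the entry; the caller should
    close `fd` and retry the open+lock sequence.
    """
    st_fd = os.fstat(fd)
    try:
        st_path = os.stat(path)
    except OSError as e:
        if e.errno in (errno.ENOENT, errno.ESTALE):
            return False
        raise
    return (st_fd.st_dev, st_fd.st_ino) == (st_path.st_dev, st_path.st_ino)
```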
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
We used to commit cache-entries with a rename+RENAME_NOREPLACE. This,
however, is not available on NFS. Change the code to use `os.rename()`
and rely on the _documented_ kernel behavior that non-empty target
directories cannot be replaced.
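A minimal sketch of such a commit, assuming every committed entry is a non-empty directory (an empty target directory would be silently replaced). The helper name `commit_entry()` is hypothetical:

```python
import errno
import os


def commit_entry(tmp_dir, final_dir):
    """Rename the populated `tmp_dir` to `final_dir` (hypothetical helper).

    Relies on rename(2) refusing to replace a non-empty directory.
    Returns False if another writer already committed the entry.
    """
    try:
        os.rename(tmp_dir, final_dir)
    except OSError as e:
        # Linux reports ENOTEMPTY (POSIX also allows EEXIST) when the
        # target is a non-empty directory.
        if e.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False
        raise
    return True
```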
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
The `RENAME_NOREPLACE` option is not available on NFS. Avoid using it
in _atomic_file() to allow NFS backed storage.
If the caller allows replacing the destination entry, we simply use the
original `os.rename()` system call. This will unconditionally replace
the destination on all file-systems.
If the caller requests `no-replace`, we cannot use `os.rename()`.
Instead, we use `os.link()` to create a new hard-link on the
destination. This will always fail if the destination already exists.
We then rely on the cleanup-path to unlink the original temporary
entry.
This will require adjustments in future maintenance tasks on the cache,
since they need to be aware that entries can be hardlinked temporarily.
However, we already consider `uuid-*` entries in the object-store to be
temporary and unaccounted for similar reasons, so this doesn't even
break our cache-maintenance ideas.
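The two paths described above can be sketched as follows; the function name and its boolean return value are assumptions for illustration. Note the short window in which the entry is reachable under both names, which is exactly the temporary hard-link the maintenance tasks must tolerate.

```python
import os


def commit_file(tmp_path, dst_path, replace):
    """Move `tmp_path` to `dst_path` without RENAME_NOREPLACE (sketch).

    Returns True if the file was committed, False if `dst_path` already
    existed and `replace` was False.
    """
    if replace:
        # Plain rename(2) atomically replaces the destination.
        os.rename(tmp_path, dst_path)
        return True
    try:
        # link(2) never overwrites: it fails with EEXIST instead.
        os.link(tmp_path, dst_path)
    except FileExistsError:
        return False
    finally:
        # Cleanup path: drop the temporary name in either case.
        os.unlink(tmp_path)
    return True
```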
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Add a helper that copies an entire directory tree including all metadata
into the cache. Use it in the ObjectStore to commit entries.
Unlike FsCache.store(), this does not require entering the context from
the call-site. Instead, all data is passed directly to the cache and the
operation is under full control of the cache.
The ObjectStore is adjusted to make use of this. This requires making
the root-path (rather than the tree-path) accessible for individual
objects, hence a `path`-@property is added alongside the
`tree`-@property. Note that `__fspath__` still refers to the tree-path,
since it is the only path that code other than the object-manager
itself really needs.
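The split between the two properties can be sketched like this. The class name and the `tree` sub-directory name are assumptions for illustration, not the real ObjectStore layout.

```python
import os


class Object:
    """Sketch of an object-store entry (illustrative names)."""

    def __init__(self, root):
        self._root = root

    @property
    def path(self):
        # Root of the entry: what gets handed to the cache on commit.
        return self._root

    @property
    def tree(self):
        # The actual file-system tree inside the entry.
        return os.path.join(self._root, "tree")

    def __fspath__(self):
        # Outside users only ever need the tree.
        return self.tree
```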
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
The default value for `get()` is `None`, so there is no reason to
specify it explicitly. Simplify the respective calls in FsCache.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
New utility function to clamp all mtimes of a given path to a
certain timestamp. Clamping here means that any timestamp later
than the specified upper bound will be set to the upper bound.
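A minimal sketch of such a clamp, assuming nanosecond timestamps and that symlinks themselves (not their targets) should be touched. The function name and signature are hypothetical:

```python
import os


def clamp_mtime(path, upper_ns):
    """Clamp mtimes of `path` and everything below it to `upper_ns`.

    Access times are preserved; symlinks are updated themselves rather
    than followed.
    """
    def clamp(p):
        st = os.lstat(p)
        if st.st_mtime_ns > upper_ns:
            os.utime(p, ns=(st.st_atime_ns, upper_ns), follow_symlinks=False)

    for root, dirs, files in os.walk(path):
        for name in dirs + files:
            clamp(os.path.join(root, name))
    # Changing a child's timestamps does not touch the parent's mtime,
    # so the root can be clamped last.
    clamp(path)
```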
Add a new field to the cache-information called `version`, which is a
simple integer that is incremented on any backward-incompatible change.
If the version is not supported, the cache-implementation avoids any
access to the cache except for `<cache>/staging/`. This means that
changes to the staging area must be backwards compatible at all costs.
Furthermore, it means we can
always successfully run osbuild even on possibly incompatible caches,
because we can always just ignore the cache and fully rely on the
staging area being accessible.
The `load()` method will always return cache-misses. The `store()`
method simply discards the entry instead of storing it. Note that
`store()` needs to provide a context to the caller, hence this
implementation simply creates another staging-context, provides it to
the caller and then discards it. This is suboptimal, but keeps the API simple
and avoids raising an exception to the caller (but this can be changed
if it turns out to be problematic or unwanted).
Lastly, the `cache.info` field behaves as usual, since this is also the
field used to read the cache-version. However, this file is never
written to improve resiliency and allow blacklisting buggy versions from
the past.
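The behavior on an incompatible cache can be sketched as below. `CACHE_VERSION`, the `objects/` sub-directory and the class name are assumptions; only the `staging/` directory and the miss/discard semantics come from the description above.

```python
import contextlib
import os
import shutil
import tempfile

CACHE_VERSION = 1  # hypothetical current on-disk version


class MissError(Exception):
    """Signals a cache-miss."""


class VersionedCache:
    def __init__(self, root, info):
        # `info` stands in for the parsed `cache.info`; it is only read.
        self.root = root
        self.compatible = info.get("version") == CACHE_VERSION

    @contextlib.contextmanager
    def load(self, name):
        path = os.path.join(self.root, "objects", name)
        # Incompatible caches are never read: always report a miss.
        if not self.compatible or not os.path.isdir(path):
            raise MissError(name)
        yield path

    @contextlib.contextmanager
    def store(self, name):
        # The caller always gets a context inside the staging area.
        tmp = tempfile.mkdtemp(dir=os.path.join(self.root, "staging"))
        try:
            yield tmp
            if self.compatible:
                os.rename(tmp, os.path.join(self.root, "objects", name))
            # Otherwise the entry is discarded below.
        finally:
            shutil.rmtree(tmp, ignore_errors=True)
```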
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Code is based on `common.DataSizeToUint64` in Composer [1], with a
modification to allow `unlimited` so that the result is compatible
with `fscache.MaximumSizeType`.
[1] f4aed3e6e2/internal/common/helpers.go (L46)
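A sketch of such a parser. The unit table is an assumption based on common SI/IEC suffixes rather than a copy of Composer's list, and returning the string `"unlimited"` unchanged is assumed to be what `fscache.MaximumSizeType` accepts.

```python
import re

# Assumed unit table (SI and IEC suffixes); a bare number means bytes.
_UNITS = {
    "": 1,
    "kB": 1000, "KiB": 1024,
    "MB": 1000**2, "MiB": 1024**2,
    "GB": 1000**3, "GiB": 1024**3,
    "TB": 1000**4, "TiB": 1024**4,
}


def parse_size(s):
    """Parse '10 GiB', '512' or 'unlimited' into an int or 'unlimited'."""
    if s == "unlimited":
        return "unlimited"
    m = re.fullmatch(r"\s*(\d+)\s*([A-Za-z]*)\s*", s)
    if not m or m.group(2) not in _UNITS:
        raise ValueError(f"invalid size value: {s!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]
```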
This commit introduces a new utility module called `fscache`. It
implements a cache module that stores data on the file system. It
supports parallel access and protects data with file-system locks. It
provides three basic functions:
FsCache.load("<name>"):
Loads the cache entry with the specified name, acquires a
read-lock and yields control to the caller to use the entry.
Once control returns, the entry is unlocked again.
If the entry cannot be found, a cache miss is signalled via
FsCache.MissError.
FsCache.store("<name>"):
Creates a new anonymous cache entry and yields control to the
caller to fill in. Once control returns, the entry is renamed
to the specified name, thus committing it to the object store.
FsCache.stage():
Create a new anonymous staging entry and yield control to the
caller. Once control returns, the entry is completely
discarded.
This is primarily used to create a working directory for osbuild
pipeline operations. The entries are volatile and automatic
cleanup is provided.
To commit a staging entry, you would eventually use
FsCache.store() and rename the entire data directory into the
non-volatile entry. If the staging area and store are on
different file-systems, or if the data is to be retained for
further operations, then the data directory needs to be copied.
Additionally, the cache maintains a size limit and discards any entries
if the limit is exceeded. Future extensions will implement cache pruning
if a configured watermark is reached, based on least-recently-used
logic.
Many more cache extensions are possible. This module introduces a first
draft of the most basic cache and hopefully lays ground for a new cache
infrastructure.
Lastly, note that this only introduces the utility helper. Further work
is required to hook it up with osbuild/objectstore.py.
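The load/store/stage semantics can be illustrated with a lock-free toy. This is not the module's implementation: the class name and directory layout are hypothetical, and the read-locks, size limit and parallel-access protection are all omitted.

```python
import contextlib
import os
import shutil
import tempfile


class MissError(Exception):
    """Raised by load() on a cache miss."""


class ToyCache:
    """Lock-free toy version of the load/store/stage semantics."""

    def __init__(self, root):
        self.objects = os.path.join(root, "objects")
        self.staging = os.path.join(root, "stage")
        os.makedirs(self.objects, exist_ok=True)
        os.makedirs(self.staging, exist_ok=True)

    @contextlib.contextmanager
    def load(self, name):
        path = os.path.join(self.objects, name)
        if not os.path.isdir(path):
            raise MissError(name)
        yield path  # the real cache holds a read-lock here

    @contextlib.contextmanager
    def store(self, name):
        tmp = tempfile.mkdtemp(prefix="uuid-", dir=self.objects)
        try:
            yield tmp
            # Commit: give the anonymous entry its final name.
            os.rename(tmp, os.path.join(self.objects, name))
        finally:
            shutil.rmtree(tmp, ignore_errors=True)

    @contextlib.contextmanager
    def stage(self):
        tmp = tempfile.mkdtemp(dir=self.staging)
        try:
            yield tmp
        finally:
            shutil.rmtree(tmp)  # staging entries are always discarded
```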
Add a new utility that wraps ctypes.CDLL() for the self-embedded
libc.so. Initially, it only exposes renameat2(2), but more can be added
when needed in the future.
The Libc class is very similar to the existing LibCap class and uses
similar instantiation logic with singleton access.
In the future, the Libc class will allow access to other system calls
and libc.so functionality, when needed.
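A sketch of such a wrapper, since `os.rename()` cannot pass renameat2(2) flags. It assumes a glibc of at least 2.28 (which exports `renameat2`) and the Linux values of `AT_FDCWD` and `RENAME_NOREPLACE`; the `default()` accessor name is hypothetical.

```python
import ctypes
import os

AT_FDCWD = -100          # from <fcntl.h> on Linux
RENAME_NOREPLACE = 1     # from <linux/fs.h>


class Libc:
    """Thin ctypes wrapper around the process's libc (sketch)."""

    _instance = None

    def __init__(self):
        # CDLL(None) resolves symbols from the already-loaded libc.
        self._lib = ctypes.CDLL(None, use_errno=True)
        self._renameat2 = self._lib.renameat2
        self._renameat2.argtypes = [ctypes.c_int, ctypes.c_char_p,
                                    ctypes.c_int, ctypes.c_char_p,
                                    ctypes.c_uint]
        self._renameat2.restype = ctypes.c_int

    @classmethod
    def default(cls):
        # Lazily created singleton.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def renameat2(self, olddirfd, oldpath, newdirfd, newpath, flags=0):
        r = self._renameat2(olddirfd, os.fsencode(oldpath),
                            newdirfd, os.fsencode(newpath), flags)
        if r < 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), oldpath)
        return r
```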
A new helper for the util.linux module which exposes the Linux boot-id.
For security reasons, the boot-id is never exposed directly. Instead,
only an application-specific ID, derived from an application-id and the
boot-id via HMAC-SHA256, is exposed.
Note that a raw kernel boot-id is always considered confidential, since
we never want an outside entity to deduce any information when they see
a boot-id used in protocol A and one in protocol B. It should not be
possible to tell whether both are from the same user and boot or not.
Hence, both should use their own boot-id namespace.
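A sketch of such a derivation. Keying the HMAC with the boot-id, using the application-id as the message, and truncating to 128 bits are assumptions of this sketch (modeled loosely on systemd's app-specific IDs), not a statement of the helper's exact construction.

```python
import hashlib
import hmac
import uuid


def read_boot_id():
    with open("/proc/sys/kernel/random/boot_id", encoding="ascii") as f:
        return uuid.UUID(f.read().strip())


def app_boot_id(app_id, boot_id=None):
    """Derive an application-specific boot-id (sketch).

    The raw boot-id never leaves this function; different `app_id`
    values yield unlinkable results for the same boot.
    """
    if boot_id is None:
        boot_id = read_boot_id()
    mac = hmac.new(boot_id.bytes, app_id.bytes, hashlib.sha256)
    # Truncate to 128 bits so the result is UUID-sized again.
    return uuid.UUID(bytes=mac.digest()[:16])
```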
This adds a new accessor-function for the file-locking operations
through `fcntl(2)`. In particular, it adds the new function
`fcntl_flock()`, which wraps the `F_OFD_SETLK` command on `fcntl(2)`.
There were a few design considerations:
* The name `fcntl_flock` comes from the `struct flock` structure that
is the argument type of all file-locking syscalls. Furthermore, it
mirrors what the `fcntl` module already provides as a wrapper for
the classic file-locking syscall.
* The wrapper only exposes very limited access to the file-locking
commands. There already are `fcntl.fcntl()` and `fcntl.lockf()`
in the standard library, which expose the classic file-locks.
However, those are implemented in C, which gives much more freedom
and access to architecture dependent types and functions.
We do not have that freedom (see the in-code comments for the
things to consider when exposing more fcntl-locking features).
Hence, this only exposes a very limited set of functionality,
exactly the parts we need in the objectstore rework.
* We cannot use `fcntl.lockf()` from the standard library,
because we really want the `OFD` version. OFD stands for
`open-file-description`. These locks were introduced to the Linux
kernel in 2014 (Linux 3.15) and mirror what the non-OFD locks do, but
bind the locks to the file-description, rather than to a process. Therefore,
closing a file-description will release all held locks on that
file-description.
This is so much more convenient to work with, and much less
error-prone than the old-style locks. Hence, we really want these,
even if it means that we have to introduce this new helper.
* There is an open bug to add this to the python standard library:
https://bugs.python.org/issue22367
This is unresolved since 2014.
The implementation of the `fcntl_flock()` helper is straightforward and
should be easy to understand. However, the reasoning behind the design
decisions is not. Hence, the code contains a rather elaborate comment
explaining why it is done this way.
Lastly, this adds a small, but I think sufficient unit-test suite which
makes sure the API works as expected. It does not test for full
functionality of the underlying locking features, but that is not the
job of a wrapping layer, I think. But more tests can always be added.
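A sketch of a non-blocking OFD lock via `fcntl.fcntl()`. It assumes a 64-bit Linux target where `off_t` is 64 bits and `struct flock` uses native alignment, which is exactly the architecture dependence discussed above; `F_OFD_SETLK` falls back to the Linux value 37 on Pythons that do not export it. The function name is hypothetical.

```python
import ctypes
import fcntl
import os

F_OFD_SETLK = getattr(fcntl, "F_OFD_SETLK", 37)
F_WRLCK = getattr(fcntl, "F_WRLCK", 1)


class _Flock(ctypes.Structure):
    # `struct flock`, assuming a 64-bit off_t and native alignment.
    _fields_ = [
        ("l_type", ctypes.c_short),
        ("l_whence", ctypes.c_short),
        ("l_start", ctypes.c_int64),
        ("l_len", ctypes.c_int64),
        ("l_pid", ctypes.c_int),
    ]


def ofd_setlk(fd, lock_type, start=0, length=0):
    """Non-blocking OFD lock; length 0 means 'to end of file'."""
    lk = _Flock(l_type=lock_type, l_whence=os.SEEK_SET,
                l_start=start, l_len=length,
                l_pid=0)  # must be 0 for OFD locks
    fcntl.fcntl(fd, F_OFD_SETLK, bytes(lk))
```

Because the lock belongs to the open file-description, two `open()` calls in the same process conflict, and closing one descriptor releases only its own lock.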
Add a new module with utility functions to inspect PE32+ files,
mainly listing the sections and their addresses and sizes.
Include a simple test to check that we can successfully parse the
EFI stub contained in systemd (systemd-udev package).
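A sketch of listing the sections of a PE32+ image using only the standard header layout: `e_lfanew` at offset 0x3c, the `PE\0\0` signature, the 20-byte COFF header, the optional header (magic 0x20b for PE32+), then 40-byte section headers. The function and tuple names are hypothetical.

```python
import struct
from typing import List, NamedTuple


class Section(NamedTuple):
    name: str
    virtual_address: int
    virtual_size: int
    raw_offset: int
    raw_size: int


def pe32plus_sections(data: bytes) -> List[Section]:
    """List the section headers of a PE32+ image held in `data`."""
    if data[:2] != b"MZ":
        raise ValueError("missing DOS 'MZ' header")
    # e_lfanew at offset 0x3c points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF file header: 20 bytes.
    (_machine, nsections, _timestamp, _symtab, _nsyms,
     opt_size, _flags) = struct.unpack_from("<HHIIIHH", data, coff)
    (magic,) = struct.unpack_from("<H", data, coff + 20)
    if magic != 0x20B:
        raise ValueError(f"not PE32+ (optional header magic {magic:#x})")
    table = coff + 20 + opt_size
    sections = []
    for i in range(nsections):
        # Each section header is 40 bytes; only the first 24 are used.
        raw_name, vsize, vaddr, rsize, roffset = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        sections.append(Section(name, vaddr, vsize, roffset, rsize))
    return sections
```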
The consumer certs are used to uniquely identify a system against
candlepin. These consumer certs can be used to identify the system when
pulling from RH controlled ostree repositories.
Add a new class `SubIdsDB` as a database of subordinate IDs, like the
ones in `/etc/subuid` and `/etc/subgid`. Methods to read and write
data from these two files are provided.
Add corresponding unit tests.
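Both files use one `name:start:count` entry per line. The sketch below reads and writes that format; keeping a single range per name is a simplification of this sketch, and the attribute names are hypothetical.

```python
from collections import OrderedDict


class SubIdsDB:
    """Sketch of a subordinate-ID database (/etc/subuid, /etc/subgid)."""

    def __init__(self):
        # name -> (first subordinate id, number of ids)
        self.db = OrderedDict()

    def read(self, fp):
        for line in fp:
            line = line.strip()
            if not line:
                continue
            name, start, count = line.split(":")
            self.db[name] = (int(start), int(count))

    def write(self, fp):
        for name, (start, count) in self.db.items():
            fp.write(f"{name}:{start}:{count}\n")
```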
This is basically a re-implementation of `setfilecon(3)` minus the
translation of human readable context to raw context. Add test for
the new function.
The udev inhibitor rules check for `device-$major:$minor`
but we created the lock files with `f"device-{major}-{minor}"`. So
they never actually matched. Fix that.
ioctl constants are platform dependent. They should be the same on
x86, aarch64 and s390x, but they are indeed different on ppc64le.
This led to the call to `ioctl_blockdev_flushbuf` actually
raising an exception of `OSError: [Errno 22] Invalid argument`.
The constant was calculated with a little Python snippet that
in theory could also go directly into the code, but for now
the simpler condition in this patch is enough.
The snippet is a port of the defines from the Linux kernel,
specifically /usr/include/asm-generic/ioctl.h, with the ppc64le
overrides from the powerpc variant of that header.
    import platform

    class IOConstants:
        """IO Commands for Linux"""

        if platform.machine() == "ppc64le":
            NRBITS = 8
            TYPEBITS = 8
            SIZEBITS = 13
            DIR_NONE = 1
        else:
            NRBITS = 8
            TYPEBITS = 8
            SIZEBITS = 14
            DIR_NONE = 0

        NRSHIFT = 0
        TYPESHIFT = NRSHIFT + NRBITS
        SIZESHIFT = TYPESHIFT + TYPEBITS
        DIRSHIFT = SIZESHIFT + SIZEBITS

        @classmethod
        def make(cls, directory, iotype, nr, size):
            # Equivalent of the kernel's _IOC(dir, type, nr, size).
            return ((directory << cls.DIRSHIFT) |
                    (iotype << cls.TYPESHIFT) |
                    (nr << cls.NRSHIFT) |
                    (size << cls.SIZESHIFT))

        @classmethod
        def make_dir_none(cls, iotype, nr):
            # Equivalent of the kernel's _IO(type, nr).
            return cls.make(cls.DIR_NONE, iotype, nr, 0)
This is used to get the value for `BLKFLSBUF`, taken from
`/usr/include/linux/fs.h`:
    #define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */
The value is then obtained via (output as on ppc64le; on the other
architectures listed above the result is 0x1261):
    >>> print("0x%x" % IOConstants.make_dir_none(0x12, 97))
    0x20001261
Certain udev rules for block devices are problematic for osbuild.
One prominent example is the LVM2-related rules that trigger
a scan and auto-activation of logical volumes. These rules are
triggered for new block devices or when the backing file of a
loop device changes. The rules will lead to a `lvm pvscan
--cache --activate ay` via the `lvm2-pvscan@.service` systemd
service. This will auto-activate all LVM2 logical volumes and
thus interfere with our own device handling in
`devices/org.osbuild.lvm2.lv`, where we only want to activate a
single logical volume.
Also, if the lvm2 devices get activated after the manual metadata
change done in `org.osbuild.lvm2.metadata`, the volume group names
might conflict, which makes all lvm2-based tooling very, very sad
and also causes said stage to hang, since the loopback device
cannot be detached while the activated logical volumes keep it open.
To work around this, we implement a udev rule inhibition
mechanism: on the osbuild side, a lock file is created via the new
class called `UdevInhibitor` in `utils/udev.py`. A custom set of
udev rules in `10-osbuild-inhibitor.rules` then acts on the
existence of that lock file and, if present, opts out of certain
further processing. See the udev rules file for more details.
In fact, we want this custom inhibition mechanism for all block
devices that are under osbuild's control, since these rules are
there to provide automation and integration with the host,
something we never want.
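The osbuild side of the mechanism can be sketched as a context manager that creates and removes the lock file. The lock directory is passed in because its real location is not stated here; the file name uses the `device-<major>:<minor>` form the rules match on. Method names are hypothetical.

```python
import os


class UdevInhibitor:
    """Create a lock file that custom udev rules check for (sketch)."""

    def __init__(self, lockdir, major, minor):
        # Must match the `device-$major:$minor` name used by the rules.
        self.path = os.path.join(lockdir, f"device-{major}:{minor}")

    def inhibit(self):
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        # No O_EXCL: re-inhibiting the same device is harmless.
        fd = os.open(self.path, os.O_CREAT | os.O_WRONLY | os.O_CLOEXEC, 0o644)
        os.close(fd)

    def release(self):
        try:
            os.unlink(self.path)
        except FileNotFoundError:
            pass

    def __enter__(self):
        self.inhibit()
        return self

    def __exit__(self, *exc):
        self.release()
        return False
```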
NB: this should not affect the detection of devices, since lvm2
does do a scan of devices when we call `lvdisplay` in `lvm2.lv`.
The call chain as of lvm2 git rev f773040:
    _lvdisplay_single    [tools/lvdisplay.c]
    process_each_lv      [tools/toollib.c]
    lvmcache_label_scan  [lib/cache/lvmcache.c]
    label_scan           [ibidem, here is the device detection!]
    lvdisplay_full       [lib/display/display.c]
The current implementation of `rmtree` will try to fix permissions
when it encounters permission errors during its operation. This is
done by opening the target via `os.open` and then adjusting the
immutable flag and the permission bits. This is a problem when the
target is a broken symlink since open will fail with `ENOENT`. A
simple reproducer of this scenario is:
    $ mkdir subdir
    $ ln -s foo subdir/broken
    $ chmod a-w subdir/
    $ python3 -c 'import osbuild; osbuild.util.rmrf.rmtree("subdir")'
Since subdir is not writable, removing `subdir/broken` will fail
with `EPERM` and the `on_error` callback will try to fix it by
invoking `fixperms` on `subdir/broken`, which will fail in `open`
since the target does not exist (broken symlink).
This is fixed by passing `O_NOFOLLOW` to `open`, so we never open
the target. Instead, `open` will fail with `ELOOP`; we ignore that
error, and in fact we now ignore all errors from `open`, since they
do not matter: if fixing the permissions didn't work, `unlink`
will just fail (again) with `EPERM`, and for symlinks it actually
doesn't matter, since "on Linux, the permissions of an ordinary
symbolic link are not used in any operations", see symlink(7).
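A sketch of the fixed logic. It only restores the owner's permission bits; clearing the immutable flag, which the real helper also does, is left out here.

```python
import os
import stat


def fixperms(path):
    """Best-effort: make `path` accessible to its owner (sketch).

    O_NOFOLLOW makes open() fail with ELOOP on a symlink instead of
    following it to a possibly missing target. Every open() error is
    ignored: if fixing fails, the subsequent unlink reports the error.
    """
    try:
        fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW | os.O_CLOEXEC)
    except OSError:
        return
    try:
        st = os.fstat(fd)
        os.fchmod(fd, stat.S_IMODE(st.st_mode) | stat.S_IRWXU)
    finally:
        os.close(fd)
```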
When `get_fallback_rhsm_secrets` was used, `Subscriptions.repositories`
was None, and `get_secrets` never returned the fallback secrets.
Therefore, check whether `repositories` is None before iterating over
it and, if so, return the fallback secrets.
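The corrected control flow can be sketched as follows. The class layout, the prefix-keyed `repositories` mapping and the final error are illustrative assumptions; falling back when no URL matches mirrors the redhat.repo fallback described further below.

```python
class Subscriptions:
    """Sketch of the fallback logic (illustrative names)."""

    def __init__(self, repositories=None, fallback_secrets=None):
        # `repositories` stays None when only fallback secrets are known.
        self.repositories = repositories
        self.fallback = fallback_secrets

    def get_secrets(self, url):
        if self.repositories is None:
            return self.fallback
        for prefix, secrets in self.repositories.items():
            if url.startswith(prefix):
                return secrets
        if self.fallback is not None:
            return self.fallback
        raise RuntimeError(f"no matching subscription for {url}")
```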
This stage takes /usr/lib/passwd and /usr/etc/passwd from an OSTree
checkout, merges them into one file, and stores it as /etc/passwd in the
buildroot.
It does the same for /etc/group.
The reason for doing this is that there is an issue with unstable UIDs
and GIDs when creating OSTree commits from scratch. When there is a
package that creates a system user or a system group, it can change the
UID and GID of users and groups that are created later.
This is not a problem in traditional deployments because already created
users and groups never change their UIDs and GIDs, but with OSTree we
recreate the files from scratch and then replace the previous one so it
can actually change.
By copying the files to the build root before doing any other
operations, we can make sure that the UIDs and GIDs of already existing
users and groups won't change.
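The merge itself can be sketched as a union keyed by the name field, which works for both passwd and group lines. Which input takes precedence on a name clash is an assumption of this sketch; here the first list wins.

```python
def merge_passwd(primary, secondary):
    """Merge passwd- or group-style lines keyed by the name field.

    Entries from `primary` win; entries only in `secondary` are
    appended in their original order.
    """
    merged = {}
    for lines in (primary, secondary):
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            name = line.split(":", 1)[0]
            merged.setdefault(name, line)
    return [entry + "\n" for entry in merged.values()]
```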
Co-author: Christian Kellner <christian@kellner.me>
Add a helper method to call `ioctl(fd, BLK_IOC_FLUSH_BUFFER, 0)`
from python. NB: the ioctl number 0x1261 is wrong on at least
alpha and sparc. A later test will use this call so we should
catch the usage of it on those platforms.
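A minimal sketch of such a helper using `fcntl.ioctl()`. It hard-codes the generic `BLKFLSBUF` value and the ppc64le value derived in the ioctl-constants note above; alpha and sparc are deliberately not handled. The function name follows `ioctl_blockdev_flushbuf` mentioned in that note but does not reproduce its real implementation.

```python
import fcntl
import platform

# _IO(0x12, 97); ppc64le uses a different direction encoding.
BLKFLSBUF = 0x20001261 if platform.machine() == "ppc64le" else 0x1261


def ioctl_blockdev_flushbuf(fd):
    """Flush the buffer cache of the block device open as `fd`."""
    fcntl.ioctl(fd, BLKFLSBUF, 0)
```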
This module provides a `Disk` class that can be used
to read in LVM images and to explore and manipulate their
metadata directly, i.e. it reads and writes the data
and headers directly. This allows one to rename a
volume group without having to involve the kernel,
which does not like to have two active LVM volume
groups with the same name.
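As an illustration of reading the headers directly, the sketch below locates the LVM2 label, which lives in one of the first four 512-byte sectors with the layout `id[8]`, `sector_xl` (u64), `crc_xl` (u32), `offset_xl` (u32), `type[8]`. The function name is hypothetical and the CRC is not verified.

```python
import struct

SECTOR_SIZE = 512
LABEL_ID = b"LABELONE"
LVM2_TYPE = b"LVM2 001"


def find_lvm_label(data):
    """Return (sector, offset_xl) of the LVM2 label header, or None."""
    for sector in range(4):
        base = sector * SECTOR_SIZE
        hdr = data[base:base + 32]
        if len(hdr) < 32:
            break
        ident, sector_xl, _crc, offset_xl, label_type = struct.unpack(
            "<8sQII8s", hdr)
        # The header records its own sector number.
        if ident == LABEL_ID and label_type == LVM2_TYPE and sector_xl == sector:
            return sector, offset_xl
    return None
```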
The problem is that some deployments might not have the redhat.repo
file, yet they might have the key and certificate to access Red Hat CDN.
If that was the case, the new approach would cause a regression compared
to the previous behavior.
This patch uses the previous method if the redhat.repo file is not
found or does not contain any matching URL.
Add a simple helper method that returns the path for a deployment,
given the sysroot, the osname, the reference or commit and the
deployment serial. The path might not exist.
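A sketch assuming the standard OSTree layout `ostree/deploy/<osname>/deploy/<commit>.<serial>`; resolving a reference to its commit checksum is left out, so this version only accepts a commit.

```python
import os


def deployment_path(sysroot, osname, commit, serial):
    """Path of an OSTree deployment; it may not exist (sketch)."""
    return os.path.join(sysroot, "ostree", "deploy", osname,
                        "deploy", f"{commit}.{serial}")
```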
Checks if one path is a child of a second one. Useful for checking if
paths defined in a manifest exist inside the tree.
Optionally checks if the target path exists.
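A sketch of such a check using lexical normalization. The name `in_tree()` is hypothetical; symlinks are not resolved, and a path equal to the tree itself counts as inside.

```python
import os


def in_tree(path, tree, must_exist=False):
    """Return True if `path` lies inside `tree` (sketch).

    os.path.abspath normalizes `..` lexically, so `tree/../x` is
    correctly reported as outside.
    """
    path = os.path.abspath(path)
    tree = os.path.abspath(tree)
    if os.path.commonpath([path, tree]) != tree:
        return False
    return os.path.exists(path) if must_exist else True
```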
Often, a message is sent and then followed by a call to `recv`
to wait for a reply. Create a simple helper `send_and_recv` that
does both in one method.
Add a simple check for that helper to the tests.
Add a new constructor method that allows creating a `Socket` from
an existing file-descriptor of a socket. This might be needed when
the socket was passed to a child process.
Add a simple test for the new constructor method.
Add a new constructor method, `Socket.new_pair`, to create a pair
of connected sockets (via `socketpair`) and wrap both sides via
`jsoncomm.Socket`.
Add a simple test to check it.
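The three additions can be sketched on a toy JSON-over-`SOCK_SEQPACKET` class. This is not the real `jsoncomm.Socket`, which also handles file-descriptor passing and more; only the method names come from the messages above.

```python
import json
import socket


class Socket:
    """Toy JSON message socket (illustrative, not jsoncomm.Socket)."""

    def __init__(self, sock):
        self._socket = sock

    @classmethod
    def new_pair(cls):
        # Connected pair, one message per datagram-like packet.
        a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
        return cls(a), cls(b)

    @classmethod
    def new_from_fd(cls, fd):
        # Family and type are detected from `fd`; the socket takes
        # ownership of the descriptor.
        return cls(socket.socket(fileno=fd))

    def fileno(self):
        return self._socket.fileno()

    def send(self, msg):
        self._socket.send(json.dumps(msg).encode())

    def recv(self, bufsize=65536):
        return json.loads(self._socket.recv(bufsize))

    def send_and_recv(self, msg):
        self.send(msg)
        return self.recv()

    def close(self):
        self._socket.close()
```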
The previous version covered too few use cases; more specifically, only a
single subscription. That is of course not the case for many hosts, so
osbuild needs to understand multiple subscriptions.
When running the org.osbuild.curl source, read the
/etc/yum.repos.d/redhat.repo file and load the system subscriptions from
there. While processing each URL, guess which subscription is tied to
the URL and use the CA certificate, client certificate, and client key
associated with this subscription. It must be done this way because the
depsolving and fetching of RPMs may be performed on different hosts, and
the subscription credentials differ in such cases.
More detailed description of why this approach was chosen is available
in osbuild-composer git: https://github.com/osbuild/osbuild-composer/pull/1405
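One way to "guess" the subscription is to turn each `baseurl` into a pattern in which yum variables such as `$releasever` match a single path segment. The sketch below assumes that approach and the standard `sslcacert`/`sslclientcert`/`sslclientkey` repo options; the function names are hypothetical.

```python
import configparser
import re


def load_subscriptions(repo_text):
    """Return (baseurl regex, (cacert, cert, key)) pairs from redhat.repo."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    subs = []
    for section in parser.sections():
        repo = parser[section]
        if "baseurl" not in repo:
            continue
        # re.escape turns "$var" into "\$var"; map it to one path segment.
        pattern = re.sub(r"\\\$\w+", "[^/]+", re.escape(repo["baseurl"]))
        secrets = (repo.get("sslcacert"), repo.get("sslclientcert"),
                   repo.get("sslclientkey"))
        subs.append((re.compile(pattern), secrets))
    return subs


def guess_secrets(subs, url):
    """Secrets of the first subscription whose baseurl prefixes `url`."""
    for pattern, secrets in subs:
        if pattern.match(url):
            return secrets
    return None
```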
A new module that can parse and execute Lorax script templates,
which are mako template based files that support a limited set
of commands, like "install", "remove" and such.
The module provides helper functions to parse such templates
and execute them by providing a re-implementation of a subset
of the commands. All commands needed for running the post
installation templates were implemented.
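After mako rendering (not shown), executing a template boils down to splitting each line shell-style and dispatching on the first word. The sketch below assumes that structure; the `handlers` mapping and function name are hypothetical, and real Lorax commands carry more semantics than a plain call.

```python
import shlex


def run_template(rendered, handlers):
    """Execute a rendered Lorax-style script line by line (sketch).

    `handlers` maps command names such as "install" or "remove" to
    callables receiving the remaining arguments.
    """
    for lineno, line in enumerate(rendered.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cmd, *args = shlex.split(line)
        if cmd not in handlers:
            raise ValueError(f"line {lineno}: unsupported command {cmd!r}")
        handlers[cmd](*args)
```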