The udev inhibitor rules are checking for `device-$major:$minor`
but we created them with `f"device-{major}-{minor}"`. So they
did not actually work. Fix that.
Add support for emitting signals to host.Service, which can be used to
transmit data back to the client during an ongoing method call. This
makes it possible for services to send information to their client
counterpart while running. A signal can take file descriptors as extra
parameters to send data via separate files.
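Under the hood this boils down to passing the file descriptors as
SCM_RIGHTS ancillary data over the unix socket. A minimal sketch of
that mechanism, using the stdlib helpers available since Python 3.9
(this is not the actual host.Service API):
    import os
    import socket

    def emit_with_fds(sock, payload, fds):
        # send the payload plus the file descriptors as SCM_RIGHTS
        # ancillary data; the receiver gets its own duplicates
        socket.send_fds(sock, [payload], fds)

    # usage: pass a pipe's read end along a message so the receiver
    # can stream extra data from that separate file descriptor
    a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
    r, w = os.pipe()
    emit_with_fds(a, b"signal: progress", [r])
    msg, fds, _, _ = socket.recv_fds(b, 1024, maxfds=1)
    os.write(w, b"more data")
    print(msg, os.read(fds[0], 16))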
This adds a stage called org.osbuild.skopeo that installs Docker and
OCI archive files into the container storage of the tree being
constructed.
The source can either be a file from another pipeline, for example one
created with the existing org.osbuild.oci-archive stage, or it can
use the new org.osbuild.skopeo source together with the
org.osbuild.containers input, which will download an image from a
registry and install it.
The install stage has an optional setting to configure a custom storage
location, which allows the use of the additionalimagestores option in
the container storage.conf to use a read-only image store (instead of
/var/lib/containers).
Note: skopeo fails to start if /etc/containers/policy.json is
not available, so we bind mount it from the build tree to the
buildroot if available.
The client side does meta.get("source-epoch", default), but for
this to work the key needs to be left unset when not specified;
currently we set it to None.
Also, make sure the check for "not None" is explicit, because
we consider a value of `0` to be a valid source-epoch.
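A small illustration of both points:
    meta = {}
    default = 42
    # the default only applies when the key is left unset
    assert meta.get("source-epoch", default) == default
    meta["source-epoch"] = None
    # a key that is set to None shadows the default
    assert meta.get("source-epoch", default) is None
    # an explicit `is not None` check keeps 0 as a valid epoch
    epoch = 0
    assert epoch is not None
    assert not epoch  # 0 is falsy, a plain truthiness check would drop it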
ioctl constants are platform dependent. They should be the same on
x86, aarch64 and s390x but they are indeed different on ppc64le.
This led to the call to `ioctl_blockdev_flushbuf` actually
raising `OSError: [Errno 22] Invalid argument`.
The constant was calculated with a little Python snippet that
in theory could also go directly into the code, but for now
the simpler condition in this patch is enough.
The snippet is a port of the defines from the Linux kernel,
specifically /usr/include/asm-generic/ioctl.h:
    import platform

    class IOConstants:
        """IO Commands for Linux"""

        # ported from /usr/include/asm-generic/ioctl.h; powerpc uses
        # 13 size bits and encodes "no direction" as 1 instead of 0
        if platform.machine() == "ppc64le":
            NRBITS = 8
            TYPEBITS = 8
            SIZEBITS = 13
            DIR_NONE = 1
        else:
            NRBITS = 8
            TYPEBITS = 8
            SIZEBITS = 14
            DIR_NONE = 0

        NRSHIFT = 0
        TYPESHIFT = NRSHIFT + NRBITS
        SIZESHIFT = TYPESHIFT + TYPEBITS
        DIRSHIFT = SIZESHIFT + SIZEBITS

        @classmethod
        def make(cls, direction, iotype, nr, size):
            return ((direction << cls.DIRSHIFT) |
                    (iotype << cls.TYPESHIFT) |
                    (nr << cls.NRSHIFT) |
                    (size << cls.SIZESHIFT))

        @classmethod
        def make_dir_none(cls, iotype, nr):
            return cls.make(cls.DIR_NONE, iotype, nr, 0)
This is used to obtain the value for `BLKFLSBUF`, which is defined in
`/usr/include/linux/fs.h`:
    #define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */
The value on ppc64le is then obtained via:
    >>> print("0x%x" % IOConstants.make_dir_none(0x12, 97))
    0x20001261
The treesum of a filesystem tree is the content hash of all its
files, its directory structure and file metadata.
By storing trees by their treesum we avoid storing duplicates of
identical trees, at the cost of computing the hashes for every
commit to the store.
This has limited benefit as the likelihood of two trees being
identical is slim, in particular when we already have the ability
to cache based on pipeline/stage ID (i.e., we can avoid rebuilding
trees if the pipelines that built them were the same).
Drop the concept of a treesum entirely, even though I very much
liked the idea in theory...
Signed-off-by: Tom Gundersen <teg@jklm.no>
Set the container environment variable to indicate to programs
inside the build root that they are indeed running inside a
container (see also https://systemd.io/CONTAINER_INTERFACE/).
Create a well-defined environment and use that for the build
root. It is not desirable to have the host's environment leak
into the container. Add a test to ensure that this works.
NB: This was probably an oversight when we switched from systemd-
nspawn to bubblewrap.
Introduce two new command line arguments, which can be used to
specify which monitor class to use (`--monitor`) and what file
descriptor to use for monitoring (`--monitor-fd`). The latter
defaults to standard output. The monitor class, if not specified,
depends on the `--json` argument.
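Roughly, the argument handling could look like the following sketch
(the monitor names are placeholders, not necessarily the real classes):
    import argparse
    import sys

    parser = argparse.ArgumentParser()
    parser.add_argument("--json", action="store_true")
    parser.add_argument("--monitor", metavar="NAME", default=None,
                        help="monitor class to use")
    parser.add_argument("--monitor-fd", metavar="FD", type=int,
                        default=sys.stdout.fileno(),
                        help="file descriptor to write monitor output to")
    args = parser.parse_args()
    if args.monitor is None:
        # no explicit monitor given: fall back depending on --json
        args.monitor = "NullMonitor" if args.json else "LogMonitor"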
Add a new callback parameter to `LoopControl` that, if specified,
will be invoked after the loop device is opened but before any
other operation, like setting the backing file, is done. It can
be used to perform custom setup tasks.
Add a new signal-like callback to the `Loop` class which will be
invoked before the actual loop device is closed, i.e. while the
loop device still has an open file descriptor to the device node
that is about to be closed. It can be used to perform custom
cleanup tasks.
Certain udev rules for block devices are problematic for osbuild.
One prominent example is LVM2 related rules that would trigger
a scan and auto-activation of logical volumes. These rules are
triggered for new block devices or when the backing file of a
loop device changes. The rules will lead to a `lvm pvscan
--cache --activate ay` via the `lvm2-pvscan@.service` systemd
service. This will auto-activate all LVM2 logical volumes and
thus interfere with our own device handling in
`devices/org.osbuild.lvm2.lv`, where we only want to activate a
single logical volume.
Also, if the lvm2 devices get activated after the manual metadata
change done in `org.osbuild.lvm2.metadata`, the volume group names
might conflict, which makes all lvm2 based tooling very, very sad
and also makes said stage hang, since the loopback device cannot
be detached while the activated logical volumes keep it open.
To work around this we therefore implement a udev rule inhibition
mechanism: on the osbuild side a lock file is created via the new
`UdevInhibitor` class in `utils/udev.py`. A custom set of udev
rules in `10-osbuild-inhibitor.rules` then acts on the existence
of that lock file and, if it is present, opts out of certain
further processing. See the udev rules file for more details.
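A minimal sketch of the inhibitor side, with a lock directory that is
assumed here; the file name follows the `device-$major:$minor` scheme
the rules match on:
    import pathlib

    # assumed location; the real path is whatever the rules in
    # 10-osbuild-inhibitor.rules check for
    LOCKDIR = pathlib.Path("/run/osbuild/locks/udev")

    def inhibit_device(major, minor):
        # create the lock file the inhibitor rules look for
        LOCKDIR.mkdir(parents=True, exist_ok=True)
        lock = LOCKDIR / f"device-{major}:{minor}"
        lock.touch()
        return lock

    def release_device(lock):
        # remove the lock so normal udev processing resumes
        lock.unlink(missing_ok=True)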
In fact, we want this custom inhibition mechanism for all block
devices that are under osbuild's control, since these rules are
there to provide automatisms and integrations with the host,
something we never want.
NB: this should not affect the detection of devices, since lvm2
does do a scan of devices when we call `lvdisplay` in `lvm2.lv`.
The call chain as of lvm2 git rev f773040:
    _lvdisplay_single   [tools/lvdisplay.c]
    process_each_lv     [tools/toollib.c]
    lvmcache_label_scan [lib/cache/lvmcache.c]
    label_scan          [ibidem, here is the device detection!]
    lvdisplay_full      [lib/display/display.c]
The build root does not know anything about stages; the stage
concept is one level higher. Therefore rename `stage_timeout`
to `timeout` in the buildroot.
Remove the constraint that either checkpoints or the output directory
has to be supplied on the command line. Now that we have `--export`
and on-demand building it is perfectly fine to supply neither an output
directory nor checkpoints and implicitly no `--export`, which
corresponds to a download-only mode.
Use the new Manifest.depsolve function to only build the pipelines that
were explicitly requested and their dependencies, taking into account
what is already present in the store.
Since not all pipelines will be built anymore, there won't be a result
entry for every pipeline; thus the format version 2 result formatting
was changed to not require a pipeline to be present in the result set.
New function that takes a list of pipelines and returns the list of
pipelines that need to be built, i.e. the pipelines and all their
dependencies that are not already present in the store.
Add a corresponding test.
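A sketch of the underlying idea, with simplified pipeline and store
interfaces (illustrative, not the actual Manifest API):
    def depsolve(pipelines, store):
        # return the pipelines that still need to be built, walking
        # the requested pipelines and their build dependencies and
        # skipping everything already present in the store
        todo, seen = [], set()

        def check(pipeline):
            if pipeline.id in seen or store.contains(pipeline.id):
                return
            seen.add(pipeline.id)
            if pipeline.build:          # build dependency goes first
                check(pipeline.build)
            todo.append(pipeline)

        for pipeline in pipelines:
            check(pipeline)
        return todo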
When formatting the result, switch the default to success,
but then properly propagate the status of the build pipeline.
This should ensure that if there are no pipeline results but
a failed build pipeline, the overall status will be 'failed'.
On the other hand, if no pipelines were built at all, including
tree or build, the overall status will be 'success'.
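The gist of the status computation, with simplified result dicts
(field names illustrative):
    def overall_status(pipeline_results, build_result=None):
        # default to success: pipelines that did not need to be built
        # have no entry; a failed build pipeline overrides everything
        success = all(res.get("success", False)
                      for res in pipeline_results.values())
        if build_result is not None:
            success = success and build_result.get("success", False)
        return "success" if success else "failed"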
When building a version 1 manifest, the assembler would always be
exported, even when not requested via the `--export` command line
option. This was done for backwards compatibility so as not to break
tools relying on that behavior. The problem is that support for
this uses a completely different code path and might by now also be
confusing behavior. Thus remove the implicit export and really only
ever export what was explicitly requested by the caller.
Instead of building everything and then failing if an export is
invalid and could not be found, resolve the exports early and
also ensure that if `--export` is given, `--output-directory`
is present too.
The current implementation of `rmtree` will try to fix permissions
when it encounters permission errors during its operation. This is
done by opening the target via `os.open` and then adjusting the
immutable flag and the permission bits. This is a problem when the
target is a broken symlink since open will fail with `ENOENT`. A
simple reproducer of this scenario is:
$ mkdir subdir
$ ln -s foo subdir/broken
$ chmod a-w subdir/
$ python3 -c 'import osbuild; osbuild.util.rmrf.rmtree("subdir")'
Since subdir is not writable, removing `subdir/broken` will fail
with `EPERM` and the `on_error` callback will try to fix it by
invoking `fixperms` on `subdir/broken`, which will fail in `open`
since the target does not exist (broken symlink).
This is fixed by passing `O_NOFOLLOW` to open so we will never open
the target. Instead `open` will fail with `ELOOP`; we ignore that
error and in fact we now ignore all errors from `open`, since it
does not matter: if fixing the permissions didn't work, `unlink`
will just fail (again) with `EPERM`, and for symlinks it actually
doesn't matter since "on Linux the permissions of an ordinary
symbolic link are not used in any operations", see symlink(7).
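A sketch of the resulting permission fixup; the real implementation
also clears the immutable flag via ioctl, which is omitted here, and
the key points are `O_NOFOLLOW` and ignoring all errors from `open`:
    import os

    def fixperms(path):
        # best effort: make `path` writable so a retried unlink/rmdir
        # can succeed; O_NOFOLLOW makes open fail with ELOOP for
        # (broken) symlinks, and any open error is simply ignored
        try:
            fd = os.open(path, os.O_RDONLY | os.O_NOFOLLOW)
        except OSError:
            return
        try:
            os.fchmod(fd, 0o700)
        except OSError:
            pass
        finally:
            os.close(fd)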
Since we bind `/proc` inside the container, we leak certain information
that comes with it. One of these is the kernel command line. None of
the decisions made by software running inside the container should
depend on the kernel command line of the host, so overwrite the kernel
command line by creating a temporary directory and mapping it inside
the build root. For now we default to a simple `root=/dev/osbuild`
fake kernel command line.
Add a simple check for it as well.
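A sketch of the idea, assuming the fake command line ends up bind
mounted read-only over /proc/cmdline inside the build root (the exact
bubblewrap wiring is left out):
    import os
    import tempfile

    def make_fake_cmdline(tmpdir, cmdline="root=/dev/osbuild"):
        # write the fake kernel command line into `tmpdir`
        path = os.path.join(tmpdir, "cmdline")
        with open(path, "w", encoding="utf-8") as f:
            f.write(cmdline + "\n")
        return path

    with tempfile.TemporaryDirectory() as tmp:
        fake = make_fake_cmdline(tmp)
        # assumption: handed to bwrap along the lines of
        #   --ro-bind <fake> /proc/cmdline
        print(fake)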
Instead of having a temporary directory on the host for `/var` inside
the container, create a generic temporary directory that can be used
for other things and create the `var` inside that.
Commit 5b1cd2b made `source` and `target` for mounts optional, but
the corresponding code in `describe` still assumes that the device
will always be present. Fix this so that source will only be used
if it is set.
Allow mount services to return None, which means they have not
actually mounted anything within the mount root. This might be
because they have bind mounted directories within the tree.
These mounts do not need any path translation.
If a stage has not itself defined the `mounts` property, allow any
mounts. This is in preparation to support specialized mounts, such
as bind mounts or ostree deployment mounts, so that they
transparently work with any stage.
NB: devices are not allowed, so this will not be applicable to the
current filesystem mounts.
The previous commit gave the individual mounts more control over the
source and target properties. Do not require them in the global
schema but hand control over whether they are optional to the modules.
Define the mount schema in the actual mounts rather than at a higher
level. This is in preparation to give the modules more control over
the `source` and `target` properties.
Introduce a new specialized service manager class `MountManager` to
manage mounts. It uses the newly introduced `DeviceManager` to look
up devices and stores the reference to the mount point root path.
See the commit that introduced the `DeviceManager` for more info.
Add new helper functions that can translate from a managed device
to its path. One is relative and one is the absolute path on the
host, i.e. to the device node on the host.
Introduce a new class to manage devices, `DeviceManager`, and move the
code to open devices from `Device` there. The main insight for why
the logic should be placed there is that certain information is needed
to open the devices, independently of the specific type: the path to
the device node directory, `devpath`, the actual `tree` and the service
manager instance to start the actual service. Instead of passing all
this information again and again to the `Device` class, we now have
a specialized (service) manager class for devices that has all the
needed information at all times. Additionally, the special handling
of parent devices is moved from the pipeline to the service manager,
which is where it belongs.
This will make even more sense for mounts, where the `DeviceManager`
can then be passed around to access the individual devices.
Port the test to use the `DeviceManager`.
When setting up the build root, only bind mount the `/boot` dir
from the supplied build tree if the build tree is not the host
itself, since we never want to leak any host specific data and
the `/boot` directory should never be needed when building the
build root. The only reason `/boot` is mounted at all is for
the grub2 stage to copy efi binaries to the tree, since they are
directly installed to `/boot` by the respective bootloader
packages.
Currently, we take two paths from the root file system supplied
to the `BuildRoot` class: `/boot` and `/usr`. The reason for
mounting `/boot` is that grub2 and shim install efi binaries
there and for certain images we want to copy the binaries from
the build root and not install the respective packages.
However, if we build the build root itself, we probably don't
want to mount the host's `/boot` since we don't want to copy
anything from there. This change should give us the ability to
do exactly that.
If there are fds to send back to the client, do a check that none
of them are invalid, so that we do not raise an exception in send
later. This allows us to send a proper RemoteError instead of no
reply at all.
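One way to check that an fd is valid without touching its state is
`fcntl(fd, F_GETFD)`; a sketch of such a check:
    import errno
    import fcntl

    def fds_valid(fds):
        # an fd is invalid if fcntl(F_GETFD) fails with EBADF
        for fd in fds:
            try:
                fcntl.fcntl(fd, fcntl.F_GETFD)
            except OSError as e:
                if e.errno == errno.EBADF:
                    return False
                raise
        return True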
When decoding a message, first check that it is not empty and
raise a `ProtocolError` otherwise. This prevents a more obscure
error like "NoneType has no get method".
Since sources were converted to host services, they now use a unix
socket instead of stdin to pass the arguments, which include
the list of items to download. The latter can become quite big,
in fact too big to fit into a single packet (NB: SOCK_SEQPACKET
is used for the underlying transport).
Therefore write the actual items to a temporary file and pass
its fd along with the message.
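A sketch of the sending side; the actual message layout and how the
fd is attached are not shown, the point is just that only the fd has
to travel in the (small) control message:
    import json
    import os
    import tempfile

    def items_to_fd(items):
        # write the items to an (already unlinked) temporary file and
        # hand out a duplicate of its fd; the receiver reads the item
        # list back from that fd
        tmp = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
        json.dump(items, tmp)
        tmp.flush()
        tmp.seek(0)
        return os.dup(tmp.fileno())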
On the service server side, i.e. the actual host service binary,
when we receive a message that contains file descriptors, clean
them up eagerly instead of relying on the garbage collector.
More importantly, the fds that we get back as a reply, if any,
need to be closed, since in the current model the ownership is
transferred to the caller of `dispatch`.
Port sources to also use the host services infrastructure that is
used by inputs, devices and mounts. Sources are a bit different
from the other services in that they don't run for the duration of
the stage but are run before anything is built. By using the same
infrastructure we re-use the process management and inter-process
communication. Additionally, this will forward all messages from
sources to the existing monitoring framework.
Adapt all existing sources and tests.
When `get_fallback_rhsm_secrets` was used, `Subscriptions.repositories`
was None, and `get_secrets` never returned the fallback secrets.
So check if `repositories` is None before iterating over it and
return the fallback secrets in that case.
This stage takes /usr/lib/passwd and /usr/etc/passwd from an OSTree
checkout, merges them into one file, and stores it as /etc/passwd in
the buildroot.
It does the same for /etc/group.
The reason for doing this is that there is an issue with unstable UIDs
and GIDs when creating OSTree commits from scratch. When there is a
package that creates a system user or a system group, it can change the
UID and GID of users and groups that are created later.
This is not a problem in traditional deployments because already
created users and groups never change their UIDs and GIDs, but with
OSTree we recreate the files from scratch and then replace the
previous ones, so the IDs can actually change.
By copying the files to the build root before doing any other
operations, we can make sure that the UIDs and GIDs of already existing
users and groups won't change.
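The merge itself can be sketched as a simple name-keyed union of the
two files (the real stage may resolve conflicts differently; which
file wins below is an assumption):
    def merge_passwd(lib_path, etc_path, dest):
        # merge two passwd/group style files keyed on the name field;
        # entries from etc_path override lib_path in this sketch
        entries = {}
        for path in (lib_path, etc_path):
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line:
                        entries[line.split(":", 1)[0]] = line
        with open(dest, "w", encoding="utf-8") as f:
            f.write("\n".join(entries.values()) + "\n")

    # usage inside the stage, with hypothetical `tree` and `buildroot`
    # paths:
    #   merge_passwd(f"{tree}/usr/lib/passwd", f"{tree}/usr/etc/passwd",
    #                f"{buildroot}/etc/passwd")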
Co-author: Christian Kellner <christian@kellner.me>