Commit graph

174 commits

Author SHA1 Message Date
Christian Kellner
917c5bb2f5 objectstore: store object data within subfolder
Instead of storing the (tree) data directly at the root of the
object specific directory, move it into a `data/tree` subfolder.
This prepares for two things:
1) the `tree` folder will allow us to add another folder next to
   it to store metadata.
2) storing both `tree` and the future metadata folder in a
   common subfolder prepares for the future integration
   with the new caching layer (`FsCache`).
2022-12-09 12:03:40 +01:00
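The resulting layout can be sketched as follows. This is a minimal illustration, not the objectstore code: the `objects/<id>` directory, the helper names and the `meta` folder (standing in for the metadata folder the commit anticipates) are assumptions; only the `data/tree` part comes from the commit.

```python
import tempfile
from pathlib import Path


def object_paths(store: Path, object_id: str) -> dict:
    """Return the paths used by one store object (hypothetical helper)."""
    base = store / "objects" / object_id   # assumed per-object directory
    data = base / "data"                   # common subfolder for tree and metadata
    return {
        "tree": data / "tree",             # the file system tree itself
        "meta": data / "meta",             # hypothetical future metadata folder
    }


def create_object(store: Path, object_id: str) -> Path:
    """Create the directories for an object and return its tree path."""
    paths = object_paths(store, object_id)
    for path in paths.values():
        path.mkdir(parents=True, exist_ok=True)
    return paths["tree"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(create_object(Path(tmp), "example"))
```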
David Rheinsberg
8511add169 test/fscache: drop PathLike annotation
Drop the PathLike annotation, since it is not compatible with py-3.6.

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
2022-12-07 20:11:05 +01:00
David Rheinsberg
4df05b8509 util: add file system cache
This commit introduces a new utility module called `fscache`. It
implements a cache module that stores data on the file system. It
supports parallel access and protects data with file-system locks. It
provides three basic functions:

    FsCache.load("<name>"):
        Loads the cache entry with the specified name, acquires a
        read-lock and yields control to the caller to use the entry.
        Once control returns, the entry is unlocked again.

        If the entry cannot be found, a cache miss is signalled via
        FsCache.MissError.

    FsCache.store("<name>"):
        Creates a new anonymous cache entry and yields control to the
        caller to fill in. Once control returns, the entry is renamed
        to the specified name, thus committing it to the object store.

    FsCache.stage():
        Creates a new anonymous staging entry and yields control to the
        caller. Once control returns, the entry is completely
        discarded.

        This is primarily used to create a working directory for osbuild
        pipeline operations. The entries are volatile and automatic
        cleanup is provided.

        To commit a staging entry, you would eventually use
        FsCache.store() and rename the entire data directory into the
        non-volatile entry. If the staging area and store are on
        different file-systems, or if the data is to be retained for
        further operations, then the data directory needs to be copied.

Additionally, the cache maintains a size limit and discards any entries
if the limit is exceeded. Future extensions will implement cache pruning
if a configured watermark is reached, based on least-recently-used
logic.

Many more cache extensions are possible. This module introduces a first
draft of the most basic cache and hopefully lays ground for a new cache
infrastructure.

Lastly, note that this only introduces the utility helper. Further work
is required to hook it up with osbuild/objectstore.py.
2022-12-06 09:48:38 +01:00
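A toy model of the `load`/`store`/`stage` semantics described in 4df05b8509. It is a sketch under stated assumptions, not the `fscache` module itself: `ToyFsCache`, its `objects`/`stage` directory layout and the `.anon-` prefix are hypothetical, and locking, size limits and cross-file-system copies are left out.

```python
import contextlib
import os
import shutil
import tempfile


class MissError(Exception):
    """Signals a cache miss (modelled on FsCache.MissError)."""


class ToyFsCache:
    """Toy cache with load/store/stage semantics; no locking, no size limit."""

    def __init__(self, root: str):
        self._objects = os.path.join(root, "objects")  # assumed layout
        self._stage = os.path.join(root, "stage")
        os.makedirs(self._objects, exist_ok=True)
        os.makedirs(self._stage, exist_ok=True)

    @contextlib.contextmanager
    def load(self, name: str):
        path = os.path.join(self._objects, name)
        if not os.path.isdir(path):
            raise MissError(name)
        # The real module holds a read-lock while the caller uses the entry.
        yield path

    @contextlib.contextmanager
    def store(self, name: str):
        # Fill an anonymous entry, then commit it by renaming it into place.
        tmp = tempfile.mkdtemp(prefix=".anon-", dir=self._objects)
        try:
            yield tmp
            os.rename(tmp, os.path.join(self._objects, name))
        except BaseException:
            shutil.rmtree(tmp, ignore_errors=True)
            raise

    @contextlib.contextmanager
    def stage(self):
        # Volatile working directory, always discarded afterwards.
        tmp = tempfile.mkdtemp(dir=self._stage)
        try:
            yield tmp
        finally:
            shutil.rmtree(tmp, ignore_errors=True)
```

Committing via `os.rename()` means readers never see a half-filled entry; if a non-empty entry of the same name already exists, the rename fails and the anonymous entry is discarded.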
David Rheinsberg
efe4ad4b92 linux: add Libc accessor with renameat2(2)
Add a new utility that wraps ctypes.CDLL() for the self-embedded
libc.so. Initially, it only exposes renameat2(2), but more can be added
when needed in the future.

The Libc class is very similar to the existing LibCap class and uses
similar instantiation logic with singleton access.

In the future, the Libc class will allow access to other system calls
and libc.so functionality, when needed.
2022-12-06 09:48:38 +01:00
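A minimal sketch of a ctypes-based libc accessor with singleton access, in the spirit of efe4ad4b92. The class layout, the `default()` accessor and the keyword defaults are assumptions; the sketch also assumes glibc 2.28 or newer, the first release exporting a `renameat2()` wrapper.

```python
import ctypes
import os
import tempfile

AT_FDCWD = -100          # Linux value of AT_FDCWD
RENAME_NOREPLACE = 1     # from <linux/fs.h>


class Libc:
    """Sketch of a ctypes wrapper around the process' libc, with singleton access."""

    _default = None

    def __init__(self, lib: ctypes.CDLL):
        self._lib = lib

    @classmethod
    def default(cls) -> "Libc":
        if cls._default is None:
            # CDLL(None) resolves symbols already loaded into the process, incl. libc.
            cls._default = cls(ctypes.CDLL(None, use_errno=True))
        return cls._default

    def renameat2(self, oldpath: str, newpath: str, flags: int = 0,
                  olddirfd: int = AT_FDCWD, newdirfd: int = AT_FDCWD) -> None:
        fn = self._lib.renameat2  # requires glibc >= 2.28
        fn.argtypes = [ctypes.c_int, ctypes.c_char_p,
                       ctypes.c_int, ctypes.c_char_p, ctypes.c_uint]
        fn.restype = ctypes.c_int
        ret = fn(olddirfd, os.fsencode(oldpath), newdirfd, os.fsencode(newpath), flags)
        if ret != 0:
            err = ctypes.get_errno()
            # OSError maps errno values to subclasses, e.g. EEXIST -> FileExistsError.
            raise OSError(err, os.strerror(err), newpath)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "a"), os.path.join(d, "b")
        open(src, "wb").close()
        Libc.default().renameat2(src, dst, RENAME_NOREPLACE)
        print(os.listdir(d))
```

`RENAME_NOREPLACE` makes the rename fail with `EEXIST` instead of silently replacing the target, which is presumably what an atomic commit into a cache needs.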
David Rheinsberg
ebbedd1e89 linux: add proc_boot_id()
A new helper for the util.linux module which exposes the Linux boot-id.
For security reasons, the boot-id is never exposed directly. Instead,
only an application-specific id is exposed, derived from an
application-id and the boot-id via HMAC-SHA256.

Note that a raw kernel boot-id is always considered confidential, since
we never want an outside entity to deduce any information when they see
a boot-id used in protocol A and one in protocol B. It should not be
possible to tell whether both are from the same user and boot or not.
Hence, both should use their own boot-id namespace.
2022-12-06 09:48:38 +01:00
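One way the derivation from ebbedd1e89 could look, as a hedged sketch: the helper name `app_boot_id`, the use of the boot-id as HMAC key and the application-id as message, and the truncation to 128 bits are assumptions (similar in spirit to systemd's `sd_id128_get_boot_app_specific()`), not necessarily what osbuild implements.

```python
import hashlib
import hmac
import uuid


def read_boot_id() -> uuid.UUID:
    """Read the raw, confidential kernel boot-id."""
    with open("/proc/sys/kernel/random/boot_id", encoding="utf8") as f:
        return uuid.UUID(f.read().strip())


def app_boot_id(app_id: uuid.UUID, boot_id: uuid.UUID) -> uuid.UUID:
    """Derive an application-specific boot-id (hypothetical helper).

    The raw boot-id only serves as HMAC key and is never returned.
    """
    digest = hmac.new(boot_id.bytes, app_id.bytes, hashlib.sha256).digest()
    return uuid.UUID(bytes=digest[:16])   # truncate to 128 bits
```

Two protocols using different application-ids get unrelated values for the same boot, so seeing both does not reveal that they come from the same machine and boot.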
David Rheinsberg
aefaf21411 linux: add accessor for fcntl file locking ops
This adds a new accessor-function for the file-locking operations
through `fcntl(2)`. In particular, it adds the new function
`fcntl_flock()`, which wraps the `F_OFD_SETLK` command on `fcntl(2)`.

There were a few design considerations:

  * The name `fcntl_flock` comes from the `struct flock` structure that
    is the argument type of all file-locking syscalls. Furthermore, it
    mirrors what the `fcntl` module already provides as a wrapper for
    the classic file-locking syscall.

  * The wrapper only exposes very limited access to the file-locking
    commands. There already are `fcntl.fcntl()` and `fcntl.lockf()`
    in the standard library, which expose the classic file-locks.
    However, those are implemented in C, which gives much more freedom
    and access to architecture-dependent types and functions.
    We do not have that freedom (see the in-code comments for the
    things to consider when exposing more fcntl-locking features).
    Hence, this only exposes a very limited set of functionality,
    exactly the parts we need in the objectstore rework.

  * We cannot use `fcntl.lockf()` from the standard library,
    because we really want the `OFD` version. OFD stands for
    `open-file-description`. These locks were added to the Linux
    kernel in 2014 and mirror what the non-OFD locks do, but bind the
    locks to the open file description rather than to a process.
    Therefore, the locks are released once the last file descriptor
    referring to that open file description is closed.
    This is so much more convenient to work with, and much less
    error-prone than the old-style locks. Hence, we really want these,
    even if it means that we have to introduce this new helper.

  * There is an open bug to add this to the python standard library:

        https://bugs.python.org/issue22367

    It has been unresolved since 2014.

The implementation of the `fcntl_flock()` helper is straightforward and
should be easy to understand. However, the reasoning behind the design
decisions is not. Hence, the code contains a rather elaborate comment
explaining why it is done this way.

Lastly, this adds a small, but I think sufficient, unit-test suite which
makes sure the API works as expected. It does not test the full
functionality of the underlying locking features, but that is not the
job of a wrapping layer, I think. But more tests can always be added.
2022-12-06 09:48:38 +01:00
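To illustrate the OFD semantics described in aefaf21411, here is a sketch that packs `struct flock` by hand and passes it to `fcntl.fcntl()` with `F_OFD_SETLK`. It is exactly the kind of architecture-dependent code the commit warns about: the struct layout below is an assumption that holds for 64-bit Linux (x86_64, aarch64), and `ofd_setlk` is a hypothetical name, not osbuild's `fcntl_flock()` API.

```python
import errno
import fcntl
import os
import struct
import tempfile

# Assumed `struct flock` layout on 64-bit Linux (x86_64, aarch64):
# short l_type; short l_whence; off_t l_start; off_t l_len; pid_t l_pid; + padding
FLOCK_FORMAT = "hhqqi4x"


def ofd_setlk(fd: int, lock_type: int, start: int = 0, length: int = 0) -> bool:
    """Take or release an OFD lock without blocking; return False on conflict.

    Hypothetical helper. Needs Linux and Python >= 3.9 for fcntl.F_OFD_SETLK.
    A `length` of 0 means "up to the end of the file".
    """
    # l_pid must be 0 for OFD locks.
    arg = struct.pack(FLOCK_FORMAT, lock_type, os.SEEK_SET, start, length, 0)
    try:
        fcntl.fcntl(fd, fcntl.F_OFD_SETLK, arg)
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EACCES):
            return False
        raise
    return True


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDWR)
        print(ofd_setlk(fd, fcntl.F_WRLCK))
        os.close(fd)
```

In the check below, the two descriptors come from separate `open()` calls and thus refer to different open file descriptions, so they conflict even within a single process; with classic process-bound locks they would not.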
Christian Kellner
f8ca0cf4bc objectstore: direct path i/o for Object
The `Object.{read,write}` methods were introduced to implement
copy-on-write support. Calling `write` would trigger the copy
if the object had a `base`. Additionally, a level of indirection
was introduced via bind mounts, which allowed hiding the actual
path of the object in the store and made sure that `read` really
returned a read-only path.
Support for copy-on-write was recently removed[1], and with it the
need for the `read` and `write` methods. We lose the benefits
of the indirection, but they are not really needed: the path to
the object is not really hidden, since one can always use the
`resolve_ref` method to obtain the actual store object path.
The read-only property of build trees is ensured via read-only
bind mounts in the build root.
Instead of using `read` and `write`, `Object` now has a new
`tree` property that is the path to the object's tree. `Object`
also implements `__fspath__` and thus behaves like an `os.PathLike`
object, so it can be used transparently in many places, e.g.
`os.path.join` or `pathlib.Path`.

[1] 5346025031
2022-11-21 17:26:53 +01:00
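A reduced sketch of the path-like behavior introduced in f8ca0cf4bc. Only the `tree` property and `__fspath__` come from the commit; the constructor and the `<base>/tree` location are assumptions.

```python
import os
import pathlib


class Object:
    """Reduced sketch: only the path-related part of objectstore.Object."""

    def __init__(self, base: str):
        self._base = base

    @property
    def tree(self) -> str:
        """Path to the object's tree (assumed to be `<base>/tree`)."""
        return os.path.join(self._base, "tree")

    def __fspath__(self) -> str:
        # Makes Object an os.PathLike, usable wherever a path is accepted.
        return self.tree


if __name__ == "__main__":
    obj = Object("/var/cache/osbuild/objects/example")
    print(os.path.join(obj, "etc"), pathlib.Path(obj) / "usr")
```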
Christian Kellner
28b8252a04 objectstore: implicit clone based on object ids
If the object's id does not match the one supplied for the
commit, we create a clone. Otherwise we store the tree.
The code path is arranged so that we always go through
`Object.store_tree` and thus always call `Object.finalize`, as a
preparation for a future where we might actually do something
meaningful in the finalizer, like resetting the *times or counting
the tree size.
2022-11-16 11:09:44 +01:00
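The commit logic of 28b8252a04 can be modelled in memory as follows. Every name here is hypothetical, and a plain dict stands in for the object store; the point is that both branches end in `store_tree()`, so `finalize()` always runs.

```python
from typing import Dict


class Object:
    """In-memory stand-in for a store object (all names hypothetical)."""

    def __init__(self, object_id: str, tree: Dict[str, str]):
        self.id = object_id
        self.tree = tree
        self.finalized = False

    def clone(self, new_id: str) -> "Object":
        return Object(new_id, dict(self.tree))   # independent copy of the tree

    def finalize(self) -> None:
        # Placeholder for future work, e.g. resetting timestamps.
        self.finalized = True

    def store_tree(self, store: Dict[str, "Object"]) -> None:
        self.finalize()                          # every commit passes through here
        store[self.id] = self


def commit(store: Dict[str, Object], obj: Object, object_id: str) -> Object:
    # Ids differ: commit a copy under the requested id and leave `obj` untouched.
    target = obj if obj.id == object_id else obj.clone(object_id)
    target.store_tree(store)
    return target
```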
Christian Kellner
5346025031 objectstore: remove copy on write from object
Remove copy-on-write support from `objectstore.Object`. The main
reason for introducing copy-on-write was to save an additional
copy in the non-DAG pipeline model[1]. With the introduction of
the DAG pipeline model and the explicit `--export` option, we can
achieve the same result without the complexity of copy-on-write
semantics.

[1] See commit 39213b7, part of 3b7c87d5..42a365d1 changeset.
2022-11-16 11:09:44 +01:00
Christian Kellner
afc82ee465 test/objectstore: always setup a fresh store
There is little use in sharing the store between tests; quite the
opposite: all tests expect a clean store and some currently set
that up themselves. Create a fresh store for each test.
2022-11-16 11:09:44 +01:00
Christian Kellner
0a41742d27 test/objectstore: small check for clone on commit
Add a small test that checks we indeed copied the object by
verifying a file in the store has the same content after
committing but a different inode.
2022-11-16 11:09:44 +01:00
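The check described in 0a41742d27 boils down to comparing content and inode identity. A minimal sketch with a hypothetical helper name; `os.path.samefile()` compares device and inode numbers.

```python
import os
import shutil
import tempfile


def is_distinct_copy(a: str, b: str) -> bool:
    """True if `a` and `b` have identical content but are different inodes."""
    with open(a, "rb") as fa, open(b, "rb") as fb:
        same_content = fa.read() == fb.read()
    # os.path.samefile compares device and inode numbers.
    return same_content and not os.path.samefile(a, b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src, dst = os.path.join(d, "src"), os.path.join(d, "dst")
        with open(src, "w", encoding="utf8") as f:
            f.write("data")
        shutil.copy(src, dst)
        print(is_distinct_copy(src, dst))
```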
Christian Kellner
76d6bfa4e8 test/objectstore: use helper to assert contents 2022-11-16 11:09:44 +01:00
Christian Kellner
ecb24a8eb7 util: add module to parse PE32+ files
Add a new module with utility functions to inspect PE32+ files,
mainly listing the sections and their addresses and sizes.
Include a simple test to check that we can successfully parse the
EFI stub contained in systemd (systemd-udev package).
2022-11-14 20:10:59 +01:00
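A minimal sketch of listing PE32+ sections with `struct`, following the PE/COFF layout: the DOS header stores the PE header offset at 0x3C, the 20-byte COFF header carries the section count and the optional-header size, and the section table follows the optional header. The function name and return shape are assumptions, not the API of the module added in ecb24a8eb7.

```python
import struct

PE32_PLUS_MAGIC = 0x20B


def pe32plus_sections(image: bytes):
    """Return (name, virtual address, virtual size) for each section."""
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)    # e_lfanew in the DOS header
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes)
    _, nsections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", image, coff)
    optional = coff + 20
    (magic,) = struct.unpack_from("<H", image, optional)
    if magic != PE32_PLUS_MAGIC:
        raise ValueError("not a PE32+ image")
    table = optional + opt_size      # section table follows the optional header
    sections = []
    for i in range(nsections):
        # Each 40-byte entry starts with Name[8], VirtualSize, VirtualAddress.
        name, vsize, vaddr, *_ = struct.unpack_from("<8sIIIIIIHHI", image, table + 40 * i)
        sections.append((name.rstrip(b"\0").decode("utf8", "replace"), vaddr, vsize))
    return sections
```

Fed the bytes of an EFI stub, this would list sections such as `.text` together with their addresses and sizes.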
Christian Kellner
5bdc8d030c osbuild: auto-detect best available runner
Use the new `Index.detect_runner` method that will give us the best
available runner for a requested one. To do so a new `pipeline.Runner`
class is introduced that stores the `meta.RunnerInfo` class for the
specific runner and the original name that was requested.
In the manifest loading and describing functions of the formats, use
`Index.detect_runner` to get the `RunnerInfo` for a requested runner
and then wrap it in a `pipeline.Runner` object, which is then passed
to the `Manifest.add_pipeline` method.
See also commit "meta: ability to auto-detect runner".
Adjust all tests.
2022-10-11 12:49:16 +02:00
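The commits above do not spell out what "best available" means, so the following is only a hypothetical heuristic: it assumes runner names end in a version number and picks the newest available version that is not newer than the requested one. `Runner` pairs the result with the requested name, as `pipeline.Runner` is described to do; `info` is a plain string standing in for `meta.RunnerInfo`.

```python
import re
from typing import List, NamedTuple, Optional


class Runner(NamedTuple):
    """Pairs the detected runner with the name that was requested."""
    info: str    # stands in for meta.RunnerInfo
    name: str    # the runner name given in the manifest


def detect_runner(requested: str, available: List[str]) -> Optional[str]:
    """Hypothetical heuristic: pick the newest available version that is not
    newer than the requested one, e.g. fedora37 for a requested fedora38."""
    m = re.fullmatch(r"(.*?)(\d+)", requested)
    if not m:
        return requested if requested in available else None
    prefix, wanted = m.group(1), int(m.group(2))
    best = None
    for candidate in available:
        c = re.fullmatch(re.escape(prefix) + r"(\d+)", candidate)
        if c and int(c.group(1)) <= wanted:
            version = int(c.group(1))
            if best is None or version > best[0]:
                best = (version, candidate)
    return best[1] if best else None
```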
Christian Kellner
0554ac652b test/fmt/v1: use existing runner in manifests
Instead of using a non-existing runner `org.osbuild.test` use an
existing one `org.osbuild.linux`. This prepares the switch to
using runner auto-detection, which will rely on existing runners.
2022-10-11 12:49:16 +02:00
Christian Kellner
683a8cbfa7 meta: cache list of runners
Instead of enumerating all existing runners -- doing i/o -- we
cache the list at the `Index` level.
2022-10-11 12:49:16 +02:00
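A minimal sketch of the caching from 683a8cbfa7, assuming runners live in a `runners` directory below the index path; the class is reduced to the runner listing and the method name is hypothetical.

```python
import os
import tempfile
from typing import List, Optional


class Index:
    """Reduced sketch of a meta.Index-like class caching its runner list."""

    def __init__(self, path: str):
        self.path = path
        self._runners: Optional[List[str]] = None

    def list_runners(self) -> List[str]:
        if self._runners is None:
            # Only the first call touches the file system.
            runners_dir = os.path.join(self.path, "runners")   # assumed location
            self._runners = sorted(os.listdir(runners_dir))
        return self._runners


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        os.mkdir(os.path.join(d, "runners"))
        print(Index(d).list_runners())
```

The trade-off is that runners added after the first call are not seen, which is presumably fine since the set of runners does not change during a build.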
Christian Kellner
ec1c5bb37c test: checks for runner detection
Add a test suite for the runner detection logic.
2022-10-11 12:49:16 +02:00
Christian Kellner
49dc76c434 test: add new test suite for 'meta' module
Move the checks for `meta.Schema` from `test_osbuild.py` into the new
test suite, converting it to use pytest in the process.
2022-10-11 12:49:16 +02:00
David Rheinsberg
2d6d902428 tree: pep8 + linter fixes
For some reason I forgot to fix those in the previous runs. Fix a
linter and a pep8 warning.

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
2022-09-23 12:08:10 +02:00
Simon de Vlieger
ea6085fae6 osbuild: run isort on all files 2022-09-12 13:32:51 +02:00
Simon de Vlieger
a5be1cc4d2 linting: use-implicit-booleaness-not-comparison
Newer warning from pylint, also consistent with how we do things
elsewhere. Note that this only applies to one file in the tests but
disabling it would be very weird for such a small fix.
2022-09-12 10:52:09 +02:00
Simon de Vlieger
38d2ab685c test: explicit encodings for open() 2022-09-09 15:33:29 +02:00
Christian Kellner
f05078f66e global: fix PEP-8 formatting
This patch was generated by running `autopep8 --diff` on the
source tree and then applying the diff.
2022-08-05 09:41:05 +02:00
Christian Kellner
e0db89284d tests/objectstore: remove "duplicate" test case
The idea of this test case was to check that two identical trees are
only stored once, via their treesum in the object store; but this
functionality was removed in commit e97f6ef34 and instead of treesums
random uuids are now used. As a result there is no de-duplication
anymore -- the subject of the test. So remove the test.
2022-06-21 15:08:32 +02:00
Thomas Lavocat
441e67a6f6 ostree: show commit metadata
This new API call allows one to check (among other things) if a commit
exists in a repo. It'll throw a RuntimeException if the commit is
missing.
2022-05-11 04:32:42 -05:00
Christian Kellner
f2aa688d3e test/monitor: properly initialize output
It was not initialized in `__init__`; do so.
2022-05-06 17:33:23 +02:00
Christian Kellner
1e4507c3d6 util/ostree: new class to store subordinate ids
Add a new class `SubIdsDB` as a database of subordinate Ids, like the
ones in `/etc/subuid` and `/etc/subgid`. Methods to read and write
data from these two files are provided.
Add corresponding unit tests.
2022-04-28 14:38:24 +01:00
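A sketch of a subordinate-id database for the `/etc/subuid` and `/etc/subgid` format, where each line reads `<name>:<first id>:<count>`. The method names `read`, `write` and `get` are assumptions, not necessarily those of the `SubIdsDB` class from 1e4507c3d6.

```python
import io
from typing import Dict, List, TextIO, Tuple


class SubIdsDB:
    """Sketch of a subordinate-id database in /etc/subuid, /etc/subgid format.

    Each line reads `<name>:<first id>:<count>`; a name may own several ranges.
    """

    def __init__(self):
        self.db: Dict[str, List[Tuple[int, int]]] = {}

    def read(self, fp: TextIO) -> None:
        for line in fp:
            line = line.strip()
            if not line:
                continue
            name, start, count = line.split(":")
            self.db.setdefault(name, []).append((int(start), int(count)))

    def write(self, fp: TextIO) -> None:
        for name, ranges in self.db.items():
            for start, count in ranges:
                fp.write(f"{name}:{start}:{count}\n")

    def get(self, name: str) -> List[Tuple[int, int]]:
        return self.db.get(name, [])


if __name__ == "__main__":
    db = SubIdsDB()
    db.read(io.StringIO("root:100000:65536\n"))
    print(db.get("root"))
```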
Christian Kellner
4ac62abbc3 buildroot: ability to drop capabilities
Add a new member variable `caps` that, if not `None`, indicates the
capabilities to retain, i.e. all other capabilities not specified
will be dropped via `bubblewrap` (`--cap-drop`).
Add corresponding tests.
2022-04-27 23:05:11 +01:00
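How a retained-capability set can be turned into `--cap-drop` arguments for bubblewrap, as a minimal sketch. `ALL_CAPS` is a hypothetical, abbreviated list; real code would cover every capability the kernel knows about.

```python
from typing import Iterable, List, Optional

# Hypothetical, abbreviated list; real code would cover all known capabilities.
ALL_CAPS = ["CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_MKNOD", "CAP_SYS_ADMIN", "CAP_SYS_CHROOT"]


def cap_drop_args(caps: Optional[Iterable[str]]) -> List[str]:
    """bubblewrap arguments dropping every capability not listed in `caps`."""
    if caps is None:
        return []                     # None: keep the default capability set
    keep = set(caps)
    args: List[str] = []
    for cap in ALL_CAPS:
        if cap not in keep:
            args += ["--cap-drop", cap]
    return args
```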
Christian Kellner
1874c71920 util/linux: add capability utilities 2022-04-27 23:05:11 +01:00
Christian Kellner
46fd8958bb test/util: convert util_linux to pytest
Convert the test from `unittest` to `pytest`. No semantic change.
2022-04-27 23:05:11 +01:00
Christian Kellner
99abc1373d inputs: support array of object references
This extends the possible ways of passing references to inputs. The
currently possible ways are:
 1) "plain references", an array of strings:
    ["ref1", "ref2", ...]
 2) "object references", a mapping of keys to objects:
    {"ref1": { <options> }, "ref2": { <options> }, ...}

This patch adds a new way:
  3) "array of object references":
    [{"id": "ref1", "options": { ... }}, {"id": ... }, ]

While osbuild promises to preserve the order for "object references",
not all JSON serialization libraries preserve the order, since the
JSON specification leaves this up to the implementation.

The new "array of object references" thus allows for specifying the
references together with reference-specific options, and in a
specific order.

Additionally this paves the way for specifying the same input twice,
e.g. in the case of the `org.osbuild.files` input where a pipeline
could then be specified twice with different files. This needs core
rework though, since internally we use dictionaries right now.
2022-04-21 16:39:58 +02:00
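A sketch of normalizing the three reference forms into one ordered list of `(id, options)` pairs. The function name is hypothetical, and it is more permissive than a schema would likely be (it accepts strings and objects mixed in one list); note that a list keeps duplicate ids, which the dict-based internals mentioned above cannot.

```python
from typing import Any, Dict, List, Tuple


def normalize_refs(refs: Any) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn any of the three reference forms into an ordered (id, options) list."""
    if isinstance(refs, dict):                      # 2) object references
        return [(ref, opts or {}) for ref, opts in refs.items()]
    result = []
    for item in refs:
        if isinstance(item, str):                   # 1) plain references
            result.append((item, {}))
        else:                                       # 3) array of object references
            result.append((item["id"], item.get("options", {})))
    return result
```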
Christian Kellner
c25857020d test/fmt_v2: add simple check for input references
Specifically, this test checks that the order given in the manifest is
preserved when loaded, i.e. the internal dict has the keys ordered in
the same way, independent of how they were specified (list or
object).
2022-04-21 16:39:58 +02:00
Christian Kellner
75df59bace util/selinux: add setfilecon method
This is basically a re-implementation of `setfilecon(3)` minus the
translation of human-readable context to raw context. Add a test for
the new function.
2022-03-18 20:36:10 +01:00
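A minimal sketch of the raw part of `setfilecon(3)`: libselinux stores the context in the `security.selinux` extended attribute, including the terminating NUL byte. The function signature is an assumption; no context translation is attempted, matching the commit message.

```python
import os


def setfilecon(path: str, context: str, *, follow_symlinks: bool = True) -> None:
    """Set the raw SELinux context of `path` (no context translation)."""
    # Like libselinux, store the context NUL-terminated in the xattr.
    os.setxattr(path, "security.selinux", context.encode() + b"\0",
                follow_symlinks=follow_symlinks)
```

Writing `security.selinux` normally needs SELinux-aware privileges, so a unit test would typically stub `os.setxattr` and check the arguments.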
Christian Kellner
5735357b74 test: convert util.selinux test to pytest
No semantic change in the test itself.
2022-03-18 20:36:10 +01:00
Christian Kellner
3b40125d4a test/lvm2: separate stdout and stderr
In all the invocations of `subprocess.run`, stderr and stdout were
combined in a shared pipe, but lvm sometimes emits notices and
informational messages on stderr, potentially interfering with the
data we are interested in on stdout. Separate the two.
2022-03-04 08:42:35 +01:00
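The separation from 3b40125d4a amounts to giving `subprocess.run()` two pipes instead of redirecting stderr into stdout (`stderr=subprocess.STDOUT`). A minimal sketch with a hypothetical helper name:

```python
import subprocess
import sys
from typing import List, Tuple


def run_separated(argv: List[str]) -> Tuple[str, str]:
    """Run `argv` and return its stdout and stderr as two separate strings."""
    proc = subprocess.run(
        argv,
        stdout=subprocess.PIPE,   # the data we want to parse
        stderr=subprocess.PIPE,   # notices and warnings, kept apart
        encoding="utf8",
        check=True,
    )
    return proc.stdout, proc.stderr


if __name__ == "__main__":
    out, err = run_separated([sys.executable, "-c", "import sys; print('data'); print('note', file=sys.stderr)"])
    print(repr(out), repr(err))
```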
Thomas Lavocat
1ceb096594 host: add support for emitting signals
Add support for emitting signals to host.Service which can be used to
transmit data back to the client during an ongoing method call. This
provides the possibility for the services to send information to their
client counterpart while running. The signal can take file descriptors
as extra parameters to send data on separate files.
2022-02-22 10:38:43 +01:00
Alexander Larsson
b31c91d671 v2: Add source-epoch key in pipeline declaration and pass to buildroot
If this is set it is passed down to all stages and set as
SOURCE_DATE_EPOCH in the buildroot environment. This implements
the spec at:
  https://reproducible-builds.org/docs/source-date-epoch/
2022-02-09 09:58:49 +01:00
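A sketch of how a pipeline's `source-epoch` value could be turned into the stage environment. The helper name and signature are hypothetical; per the reproducible-builds specification, the value must be a decimal count of seconds since the Unix epoch.

```python
from typing import Dict, Optional


def stage_environment(base: Dict[str, str], source_epoch: Optional[int]) -> Dict[str, str]:
    """Return a copy of `base`, adding SOURCE_DATE_EPOCH if an epoch is set."""
    env = dict(base)
    if source_epoch is not None:
        # Decimal integer: seconds since 1970-01-01 00:00:00 UTC.
        env["SOURCE_DATE_EPOCH"] = str(int(source_epoch))
    return env
```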
Christian Kellner
94e9f62f63 test/osbuild: check devices, mounts schema
Also check that the schema is valid for devices and mounts.
2022-01-06 15:09:33 +00:00
Tom Gundersen
e97f6ef34e objectstore: don't store objects by their treesum
The treesum of a filesystem tree is the content hash of all its
files, its directory structure and file metadata.

By storing trees by their treesum we avoid storing duplicates of
identical trees, at the cost of computing the hashes for every
commit to the store.

This has limited benefit as the likelihood of two trees being
identical is slim, in particular when we already have the ability
to cache based on pipeline/stage ID (i.e., we can avoid rebuilding
trees if the pipelines that built them were the same).

Drop the concept of a treesum entirely, even though I very much
liked the idea in theory...

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-12-16 16:44:07 +00:00
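To make the trade-off in e97f6ef34e concrete, here is an illustrative treesum: a SHA-256 over relative paths, modes, ownership, file contents and symlink targets, walked in sorted order. It is not osbuild's former implementation; excluding timestamps is a choice made for this sketch.

```python
import hashlib
import os
import stat
import tempfile


def treesum(root: str) -> str:
    """Illustrative content hash over a tree's structure, metadata and contents."""
    h = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                      # deterministic traversal order
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            rel = os.path.relpath(path, root)
            h.update(f"{rel}\0{st.st_mode:o}\0{st.st_uid}\0{st.st_gid}\0".encode())
            if stat.S_ISREG(st.st_mode):
                with open(path, "rb") as f:
                    h.update(hashlib.sha256(f.read()).digest())
            elif stat.S_ISLNK(st.st_mode):
                h.update(os.readlink(path).encode())
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(treesum(d))
```

Every commit would have to read every file once, which is the cost the commit message weighs against the slim chance of two identical trees.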
Christian Kellner
c825c7e4fa buildroot: set container env variable
Set the container environment variable to indicate to programs
inside the build root that they are indeed running inside a
container (see also https://systemd.io/CONTAINER_INTERFACE/).
2021-12-09 13:14:27 +01:00
Christian Kellner
0c71289067 buildroot: isolate environment from the host
Create a well-defined environment and use that for the build
root. It is not desirable to have the host's environment leak
into the container. Add a test to ensure that this works.
NB: This was probably an oversight when we switched from systemd-
nspawn to bubblewrap.
2021-12-09 13:14:27 +01:00
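A minimal sketch of building the build-root environment from scratch instead of inheriting `os.environ`. The concrete variables and the `container` value are assumptions; the `container` variable itself follows the systemd container interface referenced in c825c7e4fa.

```python
from typing import Dict, Optional

# Hypothetical fixed environment for the build root.
BUILDROOT_ENV = {
    "PATH": "/usr/sbin:/usr/bin",
    "PYTHONUNBUFFERED": "1",
    "container": "bwrap-osbuild",   # tells programs they run in a container
}


def buildroot_env(extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the environment from scratch; nothing is copied from os.environ."""
    env = dict(BUILDROOT_ENV)
    env.update(extra or {})
    return env
```

The result would be passed as `env=` to the process that starts the build root, so nothing from the host environment leaks in unless listed explicitly.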
Christian Kellner
568a4ad97a loop: add new on_close callback to Loop
Add a new signal-like callback to the `Loop` class, which is invoked
just before the loop device is closed, i.e. while the `Loop` still
holds an open file descriptor to the device node. It can be used to
perform custom cleanup tasks.
2021-12-09 00:44:21 +00:00
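A reduced sketch of the `on_close` hook. A regular file stands in for the loop device node, and everything except the `Loop` and `on_close` names is hypothetical; the point is that the callback runs while the descriptor is still open.

```python
import os
import tempfile
from typing import Callable, Optional


class Loop:
    """Reduced sketch of a loop device wrapper with an `on_close` hook."""

    def __init__(self, path: str):
        self.fd: Optional[int] = os.open(path, os.O_RDWR)
        self.on_close: Optional[Callable[["Loop"], None]] = None

    def close(self) -> None:
        if self.fd is None:
            return                      # already closed
        try:
            if self.on_close is not None:
                self.on_close(self)     # descriptor is still open here
        finally:
            os.close(self.fd)
            self.fd = None


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        loop = Loop(tmp.name)
        loop.on_close = lambda lp: print("closing fd", lp.fd)
        loop.close()
```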
Christian Kellner
7e2bb524a4 devices: add custom udev rule inhibitor mechanism
Certain udev rules for block devices are problematic for osbuild.
One prominent example is the LVM2-related rules that trigger
a scan and auto-activation of logical volumes. These rules are
triggered for new block devices or when the backing file of a
loop device changes. They lead to a `lvm pvscan --cache --activate ay`
via the `lvm2-pvscan@.service` systemd service, which
auto-activates all LVM2 logical volumes and thus interferes
with our own device handling in `devices/org.osbuild.lvm2.lv`,
where we only want to activate a single logical volume.
Also, if the lvm2 devices get activated after the manual metadata
change done in `org.osbuild.lvm2.metadata`, the volume group names
might conflict, which makes all lvm2-based tooling very, very sad
and also causes said stage to hang: the loopback device cannot be
detached because the activated logical volumes keep it open.

To work-around this we therefore implement a udev rule inhibition
mechanism: on the osbuild side a lock file is created via the new
class called `UdevInhibitor` in `utils/udev.py`. A custom set of
udev rules in `10-osbuild-inhibitor.rules` is then acting on the
existence of that lock file and if present will opt-out of certain
further processing. See the udev rules file for more details.

In fact, we want this custom inhibition mechanism for all block
devices that are under osbuild's control, since these rules are
there to provide automatisms and integrations with the host,
something we never want.

NB: this should not affect the detection of devices, since lvm2
does scan for devices when we call `lvdisplay` in `lvm2.lv`.
The call chain as of lvm2 git rev f773040:

  _lvdisplay_single           [tools/lvdisplay.c
    process_each_lv           [tools/toollib.c
      lvmcache_label_scan     [lib/cache/lvmcache.c
        label_scan            [ibidem, here is the device detection!
      lvdisplay_full          [lib/display/display.c
2021-12-09 00:44:21 +00:00
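A sketch of the osbuild side of the inhibitor from 7e2bb524a4: the lock file only has to exist while osbuild controls the device. The lock directory and file naming are hypothetical; on the udev side, a rule can check for such a file with the `TEST` key.

```python
import contextlib
import os
import tempfile


class UdevInhibitor:
    """Sketch: a lock file whose existence custom udev rules check before acting."""

    def __init__(self, lockdir: str, device: str):
        # Hypothetical naming scheme: one lock file per device, e.g. "loop3.lock".
        self.path = os.path.join(lockdir, f"{device}.lock")

    def inhibit(self) -> None:
        os.makedirs(os.path.dirname(self.path), exist_ok=True)
        # Mode "x" fails if the lock already exists, i.e. someone else inhibits.
        with open(self.path, "x", encoding="utf8"):
            pass

    def release(self) -> None:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(self.path)

    def __enter__(self) -> "UdevInhibitor":
        self.inhibit()
        return self

    def __exit__(self, *exc) -> None:
        self.release()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with UdevInhibitor(os.path.join(d, "udev"), "loop0") as inhibitor:
            print(os.path.exists(inhibitor.path))
```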
Christian Kellner
3958a6140c test/buildroot: test timeout at the run level
Check the timeout functionality at the `Buildroot.run` level not at
the `read_with_timeout` level, which is an implementation detail.
2021-12-07 09:47:01 +00:00
Christian Kellner
2129f3d68b test/osbuild: add order check for on_demand
Add a check that ensures the order of inputs to `depsolve` is
preserved in the result.
2021-12-03 17:09:33 +00:00
AaronH88
99c739fd60 test: test buildroot read_with_timeout function
- Added a new stage that is stuck in an infinite loop
- Added two tests that use this stage and force a timeout
2021-12-03 14:29:36 +00:00
Christian Kellner
29f2a68eeb osbuild: on-demand building of pipelines
Use the new Manifest.depsolve function to only build the pipelines that
were explicitly requested and their dependencies, taking into account
what is already present in the store.
Since not all pipelines will be built now, there won't be a result
entry for every pipeline; thus the format version 2 result formatting
was changed to not require the pipeline to be present in the result set.
2021-12-02 12:51:30 +00:00
Christian Kellner
749912c75a manifest: implement pipeline depsolving
New function that takes a list of pipelines and returns the list of
pipelines that need to be built, i.e. the pipelines and all their
dependencies that are not already present in the store.
Add corresponding test.
2021-12-02 12:51:30 +00:00
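The depsolving from 749912c75a, together with the order guarantee checked in 2129f3d68b, can be sketched as a depth-first traversal. The signature is hypothetical (the real `Manifest.depsolve` works on pipelines and a store), and the sketch assumes the dependency graph is acyclic.

```python
from typing import Dict, List, Set


def depsolve(requested: List[str], deps: Dict[str, List[str]], in_store: Set[str]) -> List[str]:
    """Return the pipelines to build, dependencies first, skipping stored ones."""
    result: List[str] = []
    seen: Set[str] = set()

    def visit(name: str) -> None:
        if name in seen or name in in_store:
            return                      # already scheduled or already built
        seen.add(name)
        for dep in deps.get(name, []):
            visit(dep)
        result.append(name)

    for name in requested:              # requested order is preserved
        visit(name)
    return result
```

A pipeline already in the store is not traversed further, so its dependencies are skipped too.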
Christian Kellner
8770bdf10a formats/v1: remove implicit assembler export
When building a version 1 manifest, the assembler would always be
exported, even when not requested via the `--export` command line
option. This was done for backwards compatibility, so as not to break
tools relying on that behavior. The problem is that supporting this
uses a completely different code path, and the behavior might now
also be confusing. Thus, remove the implicit export and only ever
export what was explicitly requested by the caller.
2021-12-02 12:51:30 +00:00
Christian Kellner
36356342b0 buildroot: mask /proc/cmdline
Since we bind `/proc` inside the container, we leak certain information
that comes with it. One of these is the kernel command line. None of the
decisions made by software running inside the container should depend
on the host's kernel command line, so overwrite the kernel command
line by creating a temporary directory and mapping it inside the build
root. For now we default to a simple `root=/dev/osbuild` fake kernel
command line.
Add a simple check for it as well.
2021-11-30 12:01:13 +01:00
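A minimal sketch of the masking from 36356342b0, assuming the fake command line is written to a file in a temporary directory and bound read-only over `/proc/cmdline` with bubblewrap's `--ro-bind`. The helper name is hypothetical.

```python
import os
import tempfile
from typing import List


def mask_cmdline_args(tmpdir: str, cmdline: str = "root=/dev/osbuild") -> List[str]:
    """Write a fake kernel command line and return bubblewrap arguments
    that bind it read-only over /proc/cmdline inside the build root."""
    path = os.path.join(tmpdir, "cmdline")
    with open(path, "w", encoding="utf8") as f:
        f.write(cmdline + "\n")
    return ["--ro-bind", path, "/proc/cmdline"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(mask_cmdline_args(d))
```

bubblewrap applies its mount options in order, so these arguments have to come after the option that sets up `/proc`.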