Add a new optional stage option to not fail if the specified directory
already exists. This will make it easier to support creation of custom
repositories via customizations in osbuild-composer. The reason is
that if a specified directory already exists in an image, e.g. because
it was created by an RPM, creating it would fail. However, the user
may have specified a different mode for the directory than the one it
already has. Since there is no way to know for sure whether the
directory already exists in the image without building the image
itself, it is desirable to handle this case gracefully as valid in
specific use cases.
The default behavior stays the same - specifying an existing directory
path will lead to an error.
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Documentation for os.mkdir() says that the mode is ignored on some
systems. Also, the umask value may affect the final mode. So we set
the mode explicitly.
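For illustration, the pattern looks roughly like this (helper name
hypothetical):

    import os

    def make_directory(path, mode=0o755, exist_ok=False):
        # os.mkdir() may ignore `mode` on some systems, and the umask
        # can further alter it, so apply the mode explicitly afterwards.
        os.makedirs(path, exist_ok=exist_ok)
        os.chmod(path, mode)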
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
Instead of creating the file in /usr/share and symlinking to /etc,
create it directly in /etc. This fixes an issue with SELinux labeling.
The file in /usr/share does not get labelled correctly because it
doesn't match the policy and causes issues with some tools (rhc).
See rhbz#2147450.
Instead of using `Path.stat`, use `os.stat`, since the former only
gained the `follow_symlinks` argument in Python 3.10 but we still
need to support Python 3.6 for RHEL 7 and 8.
Additionally, reduce the precision by converting timestamps to an
integer to avoid false negatives due to floating point arithmetic.
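A minimal sketch of the comparison (helper name hypothetical):

    import os

    def same_mtime(a, b):
        # os.stat() has supported follow_symlinks far longer than
        # Path.stat(), which only gained it in Python 3.10.
        st_a = os.stat(a, follow_symlinks=False)
        st_b = os.stat(b, follow_symlinks=False)
        # Compare whole seconds to avoid false negatives caused by
        # floating point arithmetic on st_mtime.
        return int(st_a.st_mtime) == int(st_b.st_mtime)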
The cachedir-tag specification defines how to mark directories as
cache-directories. This allows tools like `tar` to ignore those
directories if desired (e.g., see `tar --ignore-caches`). This is very
useful to avoid huge cache-directories in backups and remote
synchronizations.
The spec simply defines a file called `CACHEDIR.TAG` whose first 43
bytes must be: "Signature: 8a477f597d28d172789f06886806bc55" (which
happens to be the MD5 checksum of ".IsCacheDirectory"). Any further
content is to be ignored. Any such file marks the directory in
question as a cache-directory.
The cachedir-tag has been successfully deployed in tools like `cargo`
and `VLC`, and is currently discussed to be implemented in Firefox. More
information is available here: https://bford.info/cachedir/
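For illustration, writing such a tag could look like this (helper name
hypothetical):

    import os

    def write_cachedir_tag(directory):
        # The first 43 bytes are the significant signature; any further
        # content of the file is ignored by conforming tools.
        tag = b"Signature: 8a477f597d28d172789f06886806bc55\n"
        with open(os.path.join(directory, "CACHEDIR.TAG"), "xb") as f:
            f.write(tag)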
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Add an extension to the FsCache tests which verifies cache coherency and
atomicity of the FsCache implementation. Additionally, if available, it
utilizes a cache on NFS storage to test network-support.
Unfortunately, the stress-tests keep triggering kernel-oopses in the NFS
client driver, so they are disabled for now. However, once investigated,
we can re-enable them.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Add trace-hooks to the FsCache._atomic_open() helper, including a
primitive trace-infrastructure. They allow interrupting cache operation
and running arbitrary code.
The trace-hooks will be used by the test-suite to trigger the races we
want to protect against. During runtime, the traces should not be used
and thus will always be `None`.
This is a very primitive way to hook into the runtime execution and test
the atomicity of the operations. However, it is simple enough for our
tests and avoids pulling in huge tracing suites.
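A hypothetical sketch of the shape such a hook can take:

    class Cache:
        # Test-only trace hook; always `None` during normal runtime.
        _trace_hook = None

        def _trace(self, event):
            # Tests install a callable here to interrupt cache
            # operations at well-defined points and run arbitrary
            # code to provoke the races we want to protect against.
            if self._trace_hook is not None:
                self._trace_hook(event)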
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
On NFS, we need to be careful with cached metadata. To make sure our
_atomic_open() can correctly catch races during open+lock, we must be
careful to catch `ESTALE` as well as `ENOENT` from `stat()` calls.
Beyond that, the lock-acquisition guarantees that data is coherent,
even on NFS.
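For illustration, a stat() wrapper along these lines (name
hypothetical):

    import errno
    import os

    def stat_or_none(path):
        # On NFS, cached metadata can go stale between operations, so
        # treat ESTALE like ENOENT and let the caller retry open+lock.
        try:
            return os.stat(path)
        except OSError as e:
            if e.errno in (errno.ENOENT, errno.ESTALE):
                return None
            raise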
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
We used to commit cache-entries with a rename+RENAME_NOREPLACE. This,
however, is not available on NFS. Change the code to use `os.rename()`
and rely on the _documented_ kernel behavior that non-empty target
directories cannot be replaced.
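Roughly, the commit path becomes (a sketch; names hypothetical):

    import errno
    import os

    def commit_dir(tmp, dst):
        # rename(2) is documented to fail if the target is a non-empty
        # directory, so a plain rename cannot replace a populated cache
        # entry, even without RENAME_NOREPLACE.
        try:
            os.rename(tmp, dst)
        except OSError as e:
            if e.errno in (errno.ENOTEMPTY, errno.EEXIST):
                return False  # lost the race; entry already committed
            raise
        return True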
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
The `RENAME_NOREPLACE` option is not available on NFS. Avoid using it
in _atomic_file() to allow NFS backed storage.
If the caller allows replacing the destination entry, we simply use the
original `os.rename()` system call. This will unconditionally replace
the destination on all file-systems.
If the caller requests `no-replace`, we cannot use `os.rename()`.
Instead, we use `os.link()` to create a new hard-link on the
destination. This will always fail if the destination already exists.
We then rely on the cleanup-path to unlink the original temporary
entry.
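A sketch of the two paths (function name hypothetical):

    import os

    def place_file(tmp, dst, replace):
        if replace:
            # Unconditional replace; works on all file systems.
            os.rename(tmp, dst)
        else:
            # link(2) always fails with EEXIST if `dst` exists, giving
            # no-replace semantics without RENAME_NOREPLACE. The
            # temporary entry `tmp` is left for the cleanup path.
            os.link(tmp, dst)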
This will require adjustments in future maintenance tasks on the cache,
since they need to be aware that entries can be hardlinked temporarily.
However, for similar reasons we already consider `uuid-*` entries in
the object-store to be temporary and unaccounted for, so this doesn't
even break our cache-maintenance ideas.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Add a helper that copies an entire directory tree including all metadata
into the cache. Use it in the ObjectStore to commit entries.
Unlike FsCache.store() this does not require entering the context from
the call-site. Instead, all data is directly passed to the cache and the
operation is under full control of the cache.
The ObjectStore is adjusted to make use of this. This requires making
the root-path (rather than just the tree-path) accessible for
individual objects, hence a `path`-@property is added alongside the
`tree`-@property. Note that `__fspath__` still refers to the tree-path,
since this is the only path really required for outside access other
than from the object-manager itself.
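A rough approximation of such a copy helper (the real one preserves
more metadata than this sketch):

    import shutil

    def copy_tree(source, destination):
        # shutil.copytree() with copy2 preserves timestamps and
        # permission bits; ownership and extended attributes would
        # need extra handling in a real implementation.
        shutil.copytree(source, destination, symlinks=True,
                        copy_function=shutil.copy2)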
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
The default value for `get()` is `None`, so no reason to specify it
explicitly. Simplify the respective calls in FsCache.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
This will lead to all mtimes that are newer than the creation time
of `tree` being clamped to `source_epoch`, if that was specified
for the pipeline. Specifically, it means that the mtimes of all files
created during the build will be clamped to it. This should make
builds more reproducible.
When we commit objects to the store and there is a `source_epoch`
set on the `Object`, clamp the mtimes. This is needed because it
is possible that the object corresponds to the last stage of a
pipeline[1] and could later be exported directly without going
through `finalize` again. Also, we do this on the object itself
and not on the cloned path, so that resuming and checkpointing
behave identically.
[1] not even necessarily the pipeline we are currently building.
Add a new `source_epoch` attribute that, if set, will lead to all
mtimes that are newer than or equal to the creation date being
clamped to the specified `source_epoch` time when the object is
finalized.
When a new Object is created, save the creation time in a new
metadata entry called `info`. A new property called `created`
is added to inspect the creation date.
New utility function to clamp all mtimes of a given path to a
certain timestamp. Clamp here means that any timestamp later
than the specified upper bound will be set to the upper bound.
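A possible sketch of such a utility:

    import os

    def clamp_mtime(path, upper):
        # Set any mtime below `path` that is later than `upper` to
        # `upper`; earlier timestamps are left untouched.
        for dirpath, dirnames, filenames in os.walk(path):
            for name in dirnames + filenames:
                entry = os.path.join(dirpath, name)
                st = os.stat(entry, follow_symlinks=False)
                if st.st_mtime > upper:
                    os.utime(entry, (upper, upper),
                             follow_symlinks=False)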
We want to add aboot to the list of possible bootloaders so we can
distinguish if we are using aboot or one of the other bootloaders.
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
Add a new field to the cache-information called `version`, which is a
simple integer that is incremented on any backward-incompatible change.
The cache-implementation is modified to avoid any access to the cache
except for `<cache>/staging/`. This means, changes to the staging area
must be backwards compatible at all cost. Furthermore, it means we can
always successfully run osbuild even on possibly incompatible caches,
because we can always just ignore the cache and fully rely on the
staging area being accessible.
The `load()` method will always return cache-misses. The `store()`
method simply discards the entry instead of storing it. Note that
`store()` needs to provide a context to the caller, hence this
implementation simply creates another staging-context to provide to the
caller and then discards it. This is suboptimal, but keeps the API simple
and avoids raising an exception to the caller (but this can be changed
if it turns out to be problematic or unwanted).
Lastly, the `cache.info` field behaves as usual, since this is also
the field used to read the cache-version. However, this file is never
written, to improve resiliency and to allow blacklisting buggy
versions from the past.
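Conceptually, the version check reduces to (sketch; names
hypothetical):

    VERSION = 1  # incremented on any backward-incompatible change

    def cache_usable(info):
        # Any unknown or mismatching version means: ignore the cache
        # and rely solely on the always-compatible staging area.
        return info.get("version") == VERSION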
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Port the existing object store tests from `unittest` to `pytest`.
Allow all tests that can run without root privileges to do so. No
functional change of the test itself.
The default cache location for `osbuild-image-test` is actually
`/var/lib/osbuild/store`. Pass that to `osbuild` when setting the
maximum cache size, so the size is set for the correct location.
Integrate the recently added file system cache `FsCache` into our
object store `ObjectStore`. NB: This changes the semantics of it:
previously a call to `ObjectStore.commit` resulted in the object
being in the cache (i/o errors aside). But `FsCache.store`, which
is now the backing store for objects, will only commit objects if
there is enough space left. Thus we cannot rely on objects being
present for reading after a call to `FsCache.store`. To cope with
this we now always copy the object into the cache, even in cases
where we previously moved it: namely when `commit` is called with
an `object_id` matching `Object.id`, which happens when `commit`
is called for the last stage in the pipeline. We could keep this
optimization, but then we would have to special-case it and not
call `commit` in these cases until after we have exported all
objects; in other words, after we are sure we will never read
from any committed object again. The extra complexity does not
seem worth the little gain of the optimization.
Convert all the tests to the new semantics and also remove a lot
of them that make no sense under this new paradigm.
Add a new command line option `--cache-max-size` which will set
the maximum size of the cache, if specified.
Code is based on `common.DataSizeToUint64` in Composer[1], with a
modification to allow `unlimited` so that the result is compatible
with `fscache.MaximumSizeType`.
[1] f4aed3e6e2/internal/common/helpers.go (L46)
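One plausible Python shape for the parser (units and names assumed,
not taken from the actual code):

    def parse_data_size(s):
        # Modelled loosely on composer's common.DataSizeToUint64,
        # extended to pass "unlimited" through unchanged.
        if s == "unlimited":
            return "unlimited"
        units = {"kB": 1000, "MB": 1000**2, "GB": 1000**3,
                 "TB": 1000**4, "kiB": 1024, "MiB": 1024**2,
                 "GiB": 1024**3, "TiB": 1024**4}
        for suffix, factor in units.items():
            if s.endswith(suffix):
                return int(s[:-len(suffix)]) * factor
        return int(s)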
In the `store_server` test, pass the store to `enter_context`,
instead of the `stack`; the latter is an interesting form of
recursion, and totally not what we want.
Instead of transmitting stage metadata over a socket and then
writing it on the host via `Object.meta.write`, use the latter to
create the file and bind mount it into the stage so it can be
written to directly from within the stage. Change `api.metadata`
to do so, which means that this change is transparent for the
stages.
Now that metadata is stored and can be accessed via `Object.meta`,
read it from the built or stored objects when serializing the
result in the `format.output` functions.
Integrate the new `Metadata` object as `meta` property on `Object`.
Use it to actually store metadata after a successful stage run.
A new class `PathAdapter` is introduced, which is in turn used to
expose the base path of `Object` as `os.PathLike` so it can be
passed as a path to `Metadata`. The advantage is that any changes
to the base path in `Object` will automatically be picked up by
`Metadata`; the prominent, and currently only, case where this
happens in `Object` is `store_tree`.
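A hypothetical sketch of the adapter:

    import os

    class PathAdapter(os.PathLike):
        # Wraps an object attribute as os.PathLike, so path changes
        # in the underlying object (e.g. after `store_tree`) are
        # always picked up when the adapter is used.
        def __init__(self, obj, attribute):
            self._obj = obj
            self._attribute = attribute

        def __fspath__(self):
            return os.fspath(getattr(self._obj, self._attribute))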
Implement a new class, nested inside `Object`, to read and write
metadata. It is indexed by a key and individual pieces of metadata
are stored in separate files. Empty files are not created.
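A rough sketch of the interface (file naming and JSON encoding are
assumptions):

    import json
    import os

    class Metadata:
        def __init__(self, path):
            self._path = path

        def write(self, key, data):
            # Skip empty data so that no empty files are created.
            if not data:
                return
            with open(os.path.join(self._path, key + ".json"), "w") as f:
                json.dump(data, f)

        def read(self, key):
            try:
                with open(os.path.join(self._path, key + ".json")) as f:
                    return json.load(f)
            except FileNotFoundError:
                return None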
Create a proper `ObjectStore.Object` to use as tree for the `run`
method of `Stage`, since that is what is also normally passed to
it from `Pipeline.run`. It prepares for a future where `Object`
is not just used as `os.PathLike`.
Extract the piece of code that gathers the metadata from the
result struct into its own (nested) method. It is easier to
read but also prepares for a future change where we read the
metadata from the store instead of the result dict.
Move the reporting of results into the try-catch and ObjectStore
context. This prepares for using the store during the `fmt.output`
call and possible reporting of store cache usage.
Instead of storing the (tree) data directly at the root of the
object-specific directory, move it into a `data/tree` subfolder.
This prepares for two things:
1) the `tree` folder will allow us to add another folder next to
it to store metadata.
2) storing both `tree` and the future metadata folder in a
common subfolder prepares for the future integration
with the new caching layer (`FsCache`).
If a home directory is specified for an existing user that does
not have one, `usermod` does not create one. This case is now
detected and `mkhomedir_helper(8)` is run inside the chroot to
create the home dir. In Fedora this utility is provided by the
`pam` package so this is now installed in the corresponding
tests together with a new user that simulates the aforementioned
scenario.
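Conceptually (a sketch; the actual stage code differs):

    import os
    import subprocess

    def ensure_home(root, user, home):
        # usermod does not create a missing home directory for an
        # existing user, so fall back to mkhomedir_helper(8) inside
        # the chroot (provided by the `pam` package on Fedora).
        if not os.path.isdir(os.path.join(root, home.lstrip("/"))):
            subprocess.run(["chroot", root, "mkhomedir_helper", user],
                           check=True)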
Enhance the stage description: drop a superfluous line and add
a description for the home-dir scenario.