Commit graph

82 commits

Author SHA1 Message Date
Michael Vogt
e35d841509 objectstore: add new skip_preserve_owner to Object.export()
This commit allows to exclude preserving ownership from an object
export. This is required to fix the issue that on macOS the an
podman based workflow cannot export objects with preserving
ownerships.

Originally this was a `no_preserve: Optional[List[str]] = None)`
to be super flexible in what we pass to `cp` but then I felt like
YAGNI - if we need more we can trivially change this (internal)
API again :)
2023-12-20 09:28:39 +01:00
Dusty Mabe
d4b3e3655d objectstore: also mount /etc/containers for "host" buildroot
In the case we are not using a buildroot (i.e. we are using
the host as the buildroot) let's also mount in /etc/containers
into the environment. There are sometimes where software running
from /usr can't operate without configuration in /etc and this
will allow it to work.

An example of software hitting this problem is skopeo. With a
simple config like:

```
version: '2'
mpp-vars:
  release: 38
pipelines:
  - name: skopeo-tree
    # build: name:build
    source-epoch: 1659397331
    stages:
      - type: org.osbuild.skopeo
        inputs:
          images:
            type: org.osbuild.containers
            origin: org.osbuild.source
            mpp-resolve-images:
              images:
                - source: quay.io/fedora/fedora-coreos
                  tag: stable
                  name: localhost/fcos
        options:
          destination:
            type: containers-storage
            storage-path: /usr/share/containers/storage
```

We end up hitting an error like this:

```
time="2023-10-24T18:27:14Z" level=fatal msg="Error loading trust policy: open /etc/containers/policy.json: no such file or directory"
Traceback (most recent call last):
  File "/run/osbuild/bin/org.osbuild.skopeo", line 90, in <module>
    r = main(args["inputs"], args["tree"], args["options"])
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/run/osbuild/bin/org.osbuild.skopeo", line 73, in main
    subprocess.run(["skopeo", "copy", image_source, dest], check=True)
  File "/usr/lib64/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['skopeo', 'copy', 'dir:/tmp/tmp5_qcng99/image', 'containers-storage:[overlay@/run/osbuild/tree/usr/share/containers/storage+/run/containers/storage]localhost/fcos']' returned non-zero exit status 1.
```

This PR adds in a mount for /etc/containers from the host so that
/etc/containers/policy.json can be accessed.
2023-10-25 22:05:54 +02:00
Simon de Vlieger
d60690ce46 tox: add tox
`tox` is a standard testing tool for Python projects, this allows you to
test locally with all your installed Python version with the following
command:

`tox -m test -p all`

To run the tests in parallel for all supported Python versions.

To run linters or type analysis:

```
tox -m lint -p all
tox -m type -p all
```

This commit *also* disables the `import-error` warning from `pylint`,
not all Python versions have the system-installed Python libraries
available and they can't be fetched from PyPI.

Some linters have been added and the general order linters run in has
been changed. This allows for quicker test failure when running
`tox -m lint`. As a consequence the `test_pylint` test has been removed
as it's role can now be fulfilled by `tox`.

Other assorted linter fixes due to newer versions:
- use a str.join method (`consider-using-join`)
- fix various (newer) mypy and pylint issues
- comments starting with `#` and no space due to `autopep8`

This also changes our CI to use the new `tox` setup and on top of that
pins the versions of linters used. This might move into separate
requirements.txt files later on to allow for easier updating of those
dependencies.
2023-08-01 15:01:13 +02:00
David Rheinsberg
8a9efa89fc util/fscache: provide store_tree() helper
Add a helper that copies an entire directory tree including all metadata
into the cache. Use it in the ObjectStore to commit entries.

Unlike FsCache.store() this does not require entering the context from
the call-site. Instead, all data is directly passed to the cache and the
operation is under full control of the cache.

The ObjectStore is adjusted to make use of this. This requires exposing
the root-path (rather than the tree-path) to be accessible for
individual objects, hence a `path`-@property is added alongside the
`tree`-@property. Note that `__fspath__` still refers to the tree-path,
since this is the only path really required for outside access other
than from the object-manager itself.

Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
2022-12-20 16:56:32 +01:00
Christian Kellner
15dc8b7a00 objectstore: clamp mtime on commit
When we commit objects to the store and there is a `source_epoch`
set on the `Object`, clamp the mtime. This is needed because it
is possible that the object corresponds to the last stage of a
pipeline[1] and it could later directly be exported without going
through `finalize` again. Also we are doing in on object itself
and not the cloned path so that resuming and checkpointing will
behave identical.

[1] not even necessarily the pipeline we are currently building.
2022-12-15 13:10:35 +00:00
Christian Kellner
76197c70c4 objectstore: support source_epoch for Object
Add a new `source_epoch` attribute that if set, will lead to all
mtimes that are newer or equal to the creation date being clamped
to the specified `source_epoch` time when the object is finalized.
2022-12-15 13:10:35 +00:00
Christian Kellner
b3c53e7275 objectstore: record creation time in Object
When an new Object is created, save the creation time in a new
metadata entry called `info`. A new property called `created`
is added to inspect the creation date.
2022-12-15 13:10:35 +00:00
Christian Kellner
ae0680da11 osbuid: integrate FsCache into ObjectStore
Integrate the recently added file system cache `FsCache` into our
object store `ObjectStore`. NB: This changes the semantics of it:
previously a call to `ObjectStore.commit` resulted in the object
being in the cache (i/o errors aside). But `FsCache.store`, which
is now the backing store for objects, will only commit objects if
there is enough space left. Thus we cannot rely that objects are
present for reading after a call to `FsCache.store`. To cope with
this we now always copy the object into the cache, even for cases
where we previously moved it: for the case where commit is called
with `object_id` matching `Object.id`, which is the case for when
`commit` is called for last stage in the pipeline. We could keep
this optimization but then we would have to special case it and
not call `commit` for these cases but only after we exported all
objects; or in other words, after we are sure we will never read
from any committed object again. The extra complexity seems not
worth it for the little gain of the optimization.
Convert all the tests for the new semantic and also remove a lot
of them that make no sense under this new paradigm.

Add a new command line option `--cache-max-size` which will set
the maximum size of the cache, if specified.
2022-12-09 12:03:40 +01:00
Christian Kellner
1205de0abb objectstore: integrate metadata object
Integrate the new `Metadata` object as `meta` property on `Object`.
Use it to actually store metadata after a successful stage run.
A new class `PathAdapter` is introduce which is in turned used to
expose the base path of `Object` as `os.PathLike` so it can be
passed as path to `Metadata`. The advantage is that any changes
to the base path in `Object` will automatically be picked up by
`Metadata`; the prominent, and currently only, case where this is
happening in `Object` is `store_tree`.
2022-12-09 12:03:40 +01:00
Christian Kellner
fec9dcea97 objectstore: implement a new metadata class
Implement a new class, nested inside `Object`, to read and write
metadata. It is indexed by a key and individual pieces of meta-
data are stored in separate files. Empty files are not created.
2022-12-09 12:03:40 +01:00
Christian Kellner
917c5bb2f5 objectstore: store object data within subfolder
Instead of storing the (tree) data directly at the root of the
object specific directory, move it into a `data/tree` subfolder.
This prepares for two things:
1) the `tree` folder will allow us to add another folder next to
   it to store metadata.
2) storing both, `tree` and the future metadata folder in a
   common subfolder, prepares for the future integration
   with the new caching layer (`FsCache`).
2022-12-09 12:03:40 +01:00
Christian Kellner
f8ca0cf4bc objectstore: direct path i/o for Object
The `Object.{read,write}` methods were introduced to implement
copy on write support. Calling `write` would trigger the copy,
if the object had a `base`. Additionally, a level of indirection
was introduced via bind mounts, which allowed to hide the actual
path of the object in the store and make sure that `read` really
returned a read-only path.
Support for copy-on-write was recently removed[1], and thus the
need for the `read` and `write` methods. We lose the benefits
of the indirection, but they are not really needed: the path to
the object is not really hidden since one can always use the
`resolve_ref` method to obtain the actual store object path.
The read only property of build trees is ensured via read only
bind mounts in the build root.
Instead of using `read` and `write`, `Object` now gained a new
`tree` property that is the path to the objects tree and also
is implementing `__fspath__` and so behaves like an `os.PathLike`
object and can thus transparently be used in many places, like
e.g. `os.path.join` or `pathlib.Path`.

[1] 5346025031
2022-11-21 17:26:53 +01:00
Christian Kellner
74e1dea1f7 objectstore: remove context manager from Object
As `ObjectStore.object` is currently not used via a context
manager anywhere in the source, remove the code.
2022-11-16 11:09:44 +01:00
Christian Kellner
28b8252a04 objectstore: implicit clone based on object ids
If the object's id does not match with the one supplied for the
commit, we create a clone. Otherwise we store the tree.
The code path is arranged in a way that we always go through
`Object.store_tree` so we always call `Object.finalize` as a
prepration for the future, where we might actually do something
meaningful in the finalizer, like reset the *times or count the
tree size.
2022-11-16 11:09:44 +01:00
Christian Kellner
5346025031 objectstore: remove copy on write from object
Remove copy-on-write support from `objectstore.Object`. The main
reason for introducing copy-on-write was to save an additional
copy in the non DAG-pipeline model[1]. With the introduction of
the latter and the explicit `--export` option, we can achieve the
same result without the complexity of copy-on-write semantics.

[1] See commit 39213b7, part of 3b7c87d5..42a365d1 changeset.
2022-11-16 11:09:44 +01:00
Christian Kellner
daa2e1c3bb objectstore: option to clone object on commit
Add a new `clone` parameter to the `commit` method on `ObjectStore`
that when used will clone the object to the store instead of using
the `store_tree` method which moves the object and resets it. This
is the first step of removing copy-on-write support from `Object`.
2022-11-16 11:09:44 +01:00
Christian Kellner
1762048c1f objectstore: add clone method for object
Right now this is basically a clone(!) of `export` but this will
change in the future when we change the layout of how objects
are stored.
2022-11-16 11:09:44 +01:00
Christian Kellner
c3c06a1ebd objectstore: small comment fix
Just fix a typo, and start the comment with a capital letter.
2022-11-16 11:09:44 +01:00
Simon de Vlieger
ea6085fae6 osbuild: run isort on all files 2022-09-12 13:32:51 +02:00
Christian Kellner
2e09e7937c objectstore, move {u,}mount methods to util.mnt
Move the mount and umount helpers to the new mount utility module.
No semantic change in the function.
2022-08-13 19:21:52 +01:00
Simon de Vlieger
3fd864e5a9 osbuild: fix optional-types
Optional types were provided in places but were not always correct. Add
mypy checking and fix those that fail(ed).
2022-07-13 17:31:37 +02:00
Christian Kellner
383e9320ae objectstore: remove unused method from Object
This function was used for the treesum calculations which is not done
anymore. Remove it.
2022-06-21 15:08:32 +02:00
Tom Gundersen
e97f6ef34e objectstore: don't store objects by their treesum
The treesum of a filesystem tree is the content hash of all its
files, its directory structure and file metadata.

By storing trees by their treesum we avoid storing duplicates of
identical trees, at the cost of computing the hashes for every
commit to the store.

This has limited benefit as the likelihood of two trees being
identical is slim, in particular when we already have the ability
to cache based on pipeline/stage ID (i.e., we can avoid rebuilding
trees if the pipelines that built them were the same).

Drop the concept of a treesum entirely, even though I very much
liked the idea in theory...

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-12-16 16:44:07 +00:00
Christian Kellner
0c871c26c0 objectstore: use recursive bind mounts
When bind-mounting the tree for i/o, use recursive bind mounts.
This could be needed in the case that `/usr` is not one single
mount but assembled from different ones. Normally this should
not be the case but we want to support in, just in case.
Conversely, when unmounting, do so recursively too.
NB: This should not make any differences for trees that we have
built ourselves since they don't contain any mounts.
2021-07-09 18:09:37 +01:00
Christian Kellner
2b4e913e1e objectstore: only bind-mount /usr for host trees
The only thing we should ever need from the host is `/usr`. Therefore
instead of bind-mounting the entirety that is `/`, just bind-mount
`/usr`.
2021-07-09 18:09:37 +01:00
Christian Kellner
23628b3f62 objectstore: sync before unmounting
This should, in theory, not be necessary because the bubblewrap
process and its children should be stopped already and umount
should just block until it is finished. But, if the store is on
a filesystem, like the one used by docker machine, unmounting
frequently produces errors like:
  `umount: .../tmp9nlyzwdu-writer: target is busy.`
Syncing the filesystem before that seems to help in some cases
and it surely does not hurt.
2021-07-07 17:24:58 +01:00
Christian Kellner
18f2d8ced5 objectstore: eagerly unmount bind-mounts
In the object store, temporary bind mounts are used when accessing the
content, i.e. the individual trees. Their unmount is currently done
with the `--lazy` flag. The use of this flag goes way back to commit
da121beda1, which sadly does not mention
why the flag was introduced. Since the tree and files in the tree will
be used by consequent stages it seems reasonable to do the un-mounting
eagerly and thus this reverts back to that behavior.
2021-06-23 21:01:05 +01:00
Christian Kellner
496d21de54 objectstore: sub-tree support for read_at
Add the ability to only read a sub-tree of a tree via `Object.read_at`.
Expose the functionality via the `Store{Server,Client}.read_tree_at`.
Extend the tests to check this new functionality.
2021-06-09 18:37:47 +01:00
Christian Kellner
1743eceb41 objectstore: runtime exceptions for mount errors
Instead if using `check=True` for `subprocess.run`, which turns
a process failure (i.e. non-zero return codes) into generic a
`CalledProcessError` exception, use `check=False` and explicitly
handle mount errors, translating them into a `RuntimeError` with
a better error message.
2021-06-09 18:37:47 +01:00
Christian Kellner
f8428e56e2 objectstore: add Object.read_at method
Implement a new `read_at` method that will bind mount the tree of the
object to a specified location, instead of a temporary directory as
it done in the `read` method. Implement the latter via `read_at`.
Implement the corresponding methods for `Store{Client,Server}`. Since
the `ObjectStore.read_at` method will fail if the target directory
does not exist (or is of the wrong type), catch any exceptions in
the `StoreServer` and send those to the `StoreClient` via an `error`
entry.
This one is for David: also fix a missing blank line.
2021-06-09 18:37:47 +01:00
Christian Kellner
aa9fee7b51 objectstore: add source method to api
Add a new jsoncomm rpc method call, `source`, that will return
the directory within the store where resources for that specific
type of resource, like e.g. tree, files, or ostree can be found
or stored.
2021-02-06 12:04:30 +01:00
Christian Kellner
79d9066861 objectstore: add api server and client
The StoreServer and corresponding Client provide access to small
subset of the store methods to other process than the main osbuild one.
Currently it can be used to read trees of objects given their id and
create  temporary directories within the store's tmp path.
The lifetime of the result of both operations are bound to the Server.
2021-01-18 17:44:46 +01:00
Christian Kellner
76e72b1c3f objectstore: keep strong reference of objects
The objectstore always tracked all objects that were returned from
it, but it did so via weak references, which means it did not keep
the objects alive itself. With the introduction of identifiers for
temporary objects (floating objects), it makes sense to keep all
created objects alive so that they can in fact be used.
2021-01-15 13:20:31 +01:00
Christian Kellner
e5b12e55f4 objectstore: transparant access for floating objs
A "floating" object is a temporary object that is identified, i.e.
has an `id` and is thus also locked, but is not committed to the
store.
The `contains` and `get` methods of ObjectStore will now return such
floating objects as if they were committed ones, provind transparent
access to object that have been built during the exectuin of osbuild.
2021-01-15 13:20:31 +01:00
Christian Kellner
f7bcec60f4 objectstore: make objects identifyable
This adds a new `id` property to the ObjectStore.Object, that is
meant to reflect the identifer of the Stage to build the contents
of it. This will help to transparently access objects that have
been built but not committed to the store.
Setting the `base_id` of an object will also set its `id`. When
the object is then modified via write() the `id` will be set to
None, since no the content and the id are out of sync. In the
same way, restting an object will reset its `id` to None.
2021-01-15 13:20:31 +01:00
Christian Kellner
b7ae7a01c6 objectstore: fix typo in comment
It is "already" not "alreday".
2020-12-04 12:28:30 +01:00
chloenayon
35fa429965 objectstore: get returns object not path
Change objectstore.get to return an object or None instead of a path.
2020-08-26 15:10:12 +02:00
Christian Kellner
0aa44c23bb objectstore: use types.PathLike
Use the new `types.PathLike`, which is exactly the type that this
module defined too.
2020-07-27 12:50:38 +01:00
Christian Kellner
6813fa4acc objectstore: proper path handling for ObjectStore
Instead of using string interpolation, use `os.path.join` in all
places. This should allow the use of `os.PathLike` objects as well
as bytes (i.e. `objectstore.PathLike` types) to be used and is
generally cleaner.
2020-07-22 09:37:30 +01:00
Christian Kellner
833a79ee6f objectstore: support os.PathLike in Object.export
Support `os.PathLike` arguments in `Object.export` by explicitly
converting the supplied argument via `os.fspath`. Additionally,
declare the support for those via the Python typing system with
a new Union type for general `PathLike` type, i.e. all valid
types for `os.fspath`, which are `str`, `bytes`, `os.PathLike`.
2020-07-22 09:37:30 +01:00
Christian Kellner
8250bd0b94 objectstore: re-use Object.export in Object.init
Instead of having a duplication of the invocation of `cp`, once in
`init`, once in `export`, re-use the latter in the former: the to
be copied object is accessed in the normal way via the store, and
then "exported" to the new location. This gets rid of the call to
resolve_ref as a nice side effect, which means less poking into
the internals of the store.
2020-07-22 09:37:30 +01:00
Christian Kellner
291fadd0b2 pylint: increase max attributes to 10
In three places we have more than 7 instances attributes, but less
then 10; instead of disabling the warning for all these cases,
increase the limit to a reasonable size of 10 and re-enable the
warnings in all the places.
2020-07-21 13:25:04 +02:00
David Rheinsberg
43ddcf895d pipeline: drop output_id and pull in output-directory
Now that no caller requires the "output_id" anymore, drop it from our
results-dictionary. Instead, pass the output-directory through and copy
outputs where we produce / fetch them.

This still uses `objectstore.resolve_ref()`, since we do not have the
outputs pinned at the places where we want to copy. This needs a little
bit more rework, but we might just delay that until we have the cache
rework landed.

This already simplifies the output-directory path and drops the slight
hack which checked very late for produced outputs.

Note that we must be careful not to copy things too early, because we
do not want remnants in the output-directory if we return failure.
Hence, keep the copy-operation close to the commit-operation on the
store.
2020-05-28 11:16:15 +02:00
David Rheinsberg
8a195d7502 util/ctx: extract suppress_oserror()
Extract the `suppress_oserror()` function from the ObjectManager and
make it available as utility for other code as well.

This also adds a bunch of tests that verify it works as expected.
2020-05-11 18:05:12 +02:00
David Rheinsberg
2624be92dc osbuild: cleanup contextlib usage
Two cleanups for the context-managers we use:

  * Use `contextlib.AbstractContextManager` if possible. This class
    simply provides a default `__enter__` implementation which just
    returns `self`. So use it where applicable.
    Additionally, it provides an abstract `__exit__` method and thus
    allows static checks for an existance of `__exit__` in the dependent
    class. We might use that everywhere, but this is a separate
    decision, so not included here.

  * Explicitly return `None` from `__exit__`. The python docs state:

        If an exception is supplied, and the method wishes to suppress
        the exception (i.e., prevent it from being propagated), it
        should return a true value. Otherwise, the exception will be
        processed normally upon exit from this method.

    That is, unless we want exceptions to be suppressed, we should
    never return a `truthy` value. The python contextlib suggest using
    `None` as a default return value, so lets just do that.

    In particular, the explicit `return exc_type is None` that we use
    has no effect at all, since it only returns `True` if no exception
    was raised.

    This commit cleans this up and just follows what the `contextlib`
    module does and returns None everywhere (well, it returns nothing
    which apparently is the same as returning `None` in python). It is
    unlikely that we ever want to suppress any exceptions, anyway.
2020-04-21 16:02:20 +02:00
David Rheinsberg
2cc9160099 objectstore: extract remove_tree()
Move remove_tree() into its own module in `osbuild.util.rmrf`. This way
we can use it in other modules as well, without cross-referencing
internal helpers.
2020-04-21 14:46:02 +02:00
Christian Kellner
64b8c0643a objectstore: use ioctl to clear immutable flag
Instead of using the chattr binary, which adds another dependency
use what amounts to ioctl(fd, ,FS_IOC_SETFLAGS, ~FS_IMMUTABLE_FL),
to clear the immutable flag. Constants are taken from linux/fs.h.
2020-03-30 23:58:33 +02:00
Christian Kellner
04aa5e0aeb objectstore: manually cleanup tree dir for Object
The tree, which is created by stages and assemblers, might contain
immutable files, which for Python 3 currently (version 3.8) leads
to errors when the tempfile.TemporaryDirectory is being cleaned up.
Therefore, manually cleanup the tree directory, if it exists, via
shutil.rmtree with a custom onerror handler that also removes the
immutable bit on permission errors.
2020-03-30 23:58:33 +02:00
Christian Kellner
457f21a336 objectstore: add HostTree class to access host fs
Simple new object that should expose the root file system with the
same API as `objectstore.Object` but as read-only. This means that
the `read` call works exactly as for `Object` but `write` raises
an exception.
Add tests to specifically check the read-only properties.
2020-03-07 17:13:21 +01:00
Christian Kellner
856698ee9c objectstore: keep track of created objects
Keep track of all created objects via weak references. Add support
to use ObjectStore as context manager and ensure that all objects
are cleaned up when the context is exited.
2020-03-07 17:13:21 +01:00