Commit graph

60 commits

Author SHA1 Message Date
Tom Gundersen
e97f6ef34e objectstore: don't store objects by their treesum
The treesum of a filesystem tree is the content hash of all its
files, its directory structure and file metadata.

By storing trees by their treesum we avoid storing duplicates of
identical trees, at the cost of computing the hashes for every
commit to the store.

This has limited benefit as the likelihood of two trees being
identical is slim, in particular when we already have the ability
to cache based on pipeline/stage ID (i.e., we can avoid rebuilding
trees if the pipelines that built them were the same).

Drop the concept of a treesum entirely, even though I very much
liked the idea in theory...

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-12-16 16:44:07 +00:00
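To make the cost described above concrete, here is a minimal sketch of a treesum-like hash. The function name `treesum`, the use of SHA-256, and the choice of hashing only the relative path, the mode bits and the file contents are assumptions for illustration; this is not the implementation that was removed.

```python
import hashlib
import os
import stat


def treesum(root: str) -> str:
    """Hash the directory structure, mode bits and file contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix the traversal order so the hash is deterministic
        for name in sorted(dirnames + filenames):
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            rel = os.path.relpath(path, root)
            # Directory structure and metadata (file type and permissions).
            digest.update(f"{rel}\0{info.st_mode:o}\0".encode())
            # File contents: every regular file has to be read in full.
            if stat.S_ISREG(info.st_mode):
                with open(path, "rb") as f:
                    for chunk in iter(lambda: f.read(65536), b""):
                        digest.update(chunk)
    return digest.hexdigest()
```

Every byte of every file is read on each call, which is the per-commit cost the message refers to.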
Christian Kellner
0c871c26c0 objectstore: use recursive bind mounts
When bind-mounting the tree for i/o, use recursive bind mounts.
This could be needed in the case that `/usr` is not one single
mount but assembled from different ones. Normally this should
not be the case, but we want to support it, just in case.
Conversely, when unmounting, do so recursively too.
NB: This should not make any difference for trees that we have
built ourselves since they don't contain any mounts.
2021-07-09 18:09:37 +01:00
Christian Kellner
2b4e913e1e objectstore: only bind-mount /usr for host trees
The only thing we should ever need from the host is `/usr`. Therefore
instead of bind-mounting the entirety that is `/`, just bind-mount
`/usr`.
2021-07-09 18:09:37 +01:00
Christian Kellner
23628b3f62 objectstore: sync before unmounting
This should, in theory, not be necessary because the bubblewrap
process and its children should be stopped already and umount
should just block until it is finished. But, if the store is on
a filesystem, like the one used by docker machine, unmounting
frequently produces errors like:
  `umount: .../tmp9nlyzwdu-writer: target is busy.`
Syncing the filesystem before that seems to help in some cases
and it surely does not hurt.
2021-07-07 17:24:58 +01:00
Christian Kellner
18f2d8ced5 objectstore: eagerly unmount bind-mounts
In the object store, temporary bind mounts are used when accessing the
content, i.e. the individual trees. Their unmount is currently done
with the `--lazy` flag. The use of this flag goes way back to commit
da121beda1, which sadly does not mention
why the flag was introduced. Since the tree and files in the tree will
be used by subsequent stages it seems reasonable to do the un-mounting
eagerly and thus this reverts back to that behavior.
2021-06-23 21:01:05 +01:00
Christian Kellner
496d21de54 objectstore: sub-tree support for read_at
Add the ability to only read a sub-tree of a tree via `Object.read_at`.
Expose the functionality via the `Store{Server,Client}.read_tree_at`.
Extend the tests to check this new functionality.
2021-06-09 18:37:47 +01:00
Christian Kellner
1743eceb41 objectstore: runtime exceptions for mount errors
Instead of using `check=True` for `subprocess.run`, which turns
a process failure (i.e. a non-zero return code) into a generic
`CalledProcessError` exception, use `check=False` and explicitly
handle mount errors, translating them into a `RuntimeError` with
a better error message.
2021-06-09 18:37:47 +01:00
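The pattern described above can be sketched as follows. The helper name `run_or_raise` and the exact wording of the error message are hypothetical; only `subprocess.run` with `check=False` and the translation into `RuntimeError` come from the commit message.

```python
import subprocess


def run_or_raise(cmd, what):
    """Run cmd; on a non-zero exit raise RuntimeError with the command output."""
    result = subprocess.run(
        cmd,
        check=False,                # do not raise CalledProcessError
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold stderr into the captured output
        encoding="utf-8",
    )
    if result.returncode != 0:
        msg = result.stdout.strip()
        raise RuntimeError(f"{what} failed [{result.returncode}]: {msg}")
    return result
```

A caller would pass the mount command line, for example `run_or_raise(["mount", "--bind", source, target], "mount")`, and get the output of `mount` in the exception text.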
Christian Kellner
f8428e56e2 objectstore: add Object.read_at method
Implement a new `read_at` method that will bind mount the tree of the
object to a specified location, instead of a temporary directory as
is done in the `read` method. Implement the latter via `read_at`.
Implement the corresponding methods for `Store{Client,Server}`. Since
the `ObjectStore.read_at` method will fail if the target directory
does not exist (or is of the wrong type), catch any exceptions in
the `StoreServer` and send those to the `StoreClient` via an `error`
entry.
This one is for David: also fix a missing blank line.
2021-06-09 18:37:47 +01:00
Christian Kellner
aa9fee7b51 objectstore: add source method to api
Add a new jsoncomm rpc method call, `source`, that returns the
directory within the store where resources of a specific type
(e.g. tree, files, or ostree) can be found or stored.
2021-02-06 12:04:30 +01:00
Christian Kellner
79d9066861 objectstore: add api server and client
The StoreServer and corresponding Client provide access to a small
subset of the store methods to processes other than the main osbuild one.
Currently it can be used to read trees of objects given their id and
create temporary directories within the store's tmp path.
The lifetime of the result of both operations is bound to the Server.
2021-01-18 17:44:46 +01:00
Christian Kellner
76e72b1c3f objectstore: keep strong reference of objects
The objectstore always tracked all objects that were returned from
it, but it did so via weak references, which means it did not keep
the objects alive itself. With the introduction of identifiers for
temporary objects (floating objects), it makes sense to keep all
created objects alive so that they can in fact be used.
2021-01-15 13:20:31 +01:00
Christian Kellner
e5b12e55f4 objectstore: transparant access for floating objs
A "floating" object is a temporary object that is identified, i.e.
has an `id` and is thus also locked, but is not committed to the
store.
The `contains` and `get` methods of ObjectStore will now return such
floating objects as if they were committed ones, providing transparent
access to objects that have been built during the execution of osbuild.
2021-01-15 13:20:31 +01:00
Christian Kellner
f7bcec60f4 objectstore: make objects identifyable
This adds a new `id` property to the ObjectStore.Object, that is
meant to reflect the identifier of the Stage used to build the contents
of it. This will help to transparently access objects that have
been built but not committed to the store.
Setting the `base_id` of an object will also set its `id`. When
the object is then modified via write() the `id` will be set to
None, since now the content and the id are out of sync. In the
same way, resetting an object will reset its `id` to None.
2021-01-15 13:20:31 +01:00
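The relationship between `base_id`, `id`, `write()` and a reset can be sketched like this. The class below is a stand-alone illustration with hypothetical names (`Obj`, `reset`), not the `Object` class itself; in particular, leaving the base untouched on reset is an assumption.

```python
class Obj:
    def __init__(self):
        self._base_id = None
        self.id = None

    @property
    def base_id(self):
        return self._base_id

    @base_id.setter
    def base_id(self, value):
        self._base_id = value
        self.id = value     # content equals the base, so the ids match

    def write(self):
        self.id = None      # content may now diverge from what the id names

    def reset(self):
        self.id = None      # a fresh object has no identified content
```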
Christian Kellner
b7ae7a01c6 objectstore: fix typo in comment
It is "already" not "alreday".
2020-12-04 12:28:30 +01:00
chloenayon
35fa429965 objectstore: get returns object not path
Change objectstore.get to return an object or None instead of a path.
2020-08-26 15:10:12 +02:00
Christian Kellner
0aa44c23bb objectstore: use types.PathLike
Use the new `types.PathLike`, which is exactly the type that this
module defined too.
2020-07-27 12:50:38 +01:00
Christian Kellner
6813fa4acc objectstore: proper path handling for ObjectStore
Instead of using string interpolation, use `os.path.join` in all
places. This should allow the use of `os.PathLike` objects as well
as bytes (i.e. `objectstore.PathLike` types) to be used and is
generally cleaner.
2020-07-22 09:37:30 +01:00
Christian Kellner
833a79ee6f objectstore: support os.PathLike in Object.export
Support `os.PathLike` arguments in `Object.export` by explicitly
converting the supplied argument via `os.fspath`. Additionally,
declare the support for those via the Python typing system with
a new Union type for general `PathLike` type, i.e. all valid
types for `os.fspath`, which are `str`, `bytes`, `os.PathLike`.
2020-07-22 09:37:30 +01:00
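A minimal sketch of the typing and conversion described above. The function name `export_target` is hypothetical; the `Union` alias mirrors the description of the new `PathLike` type.

```python
import os
from typing import Union

# All argument types that os.fspath accepts.
PathLike = Union[str, bytes, os.PathLike]


def export_target(to_directory: PathLike):
    """Normalize any path-like argument to str or bytes via os.fspath."""
    return os.fspath(to_directory)
```

`str` and `bytes` arguments pass through unchanged, while objects implementing `__fspath__`, such as `pathlib.Path`, are converted.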
Christian Kellner
8250bd0b94 objectstore: re-use Object.export in Object.init
Instead of having a duplication of the invocation of `cp`, once in
`init`, once in `export`, re-use the latter in the former: the object
to be copied is accessed in the normal way via the store, and
then "exported" to the new location. This gets rid of the call to
resolve_ref as a nice side effect, which means less poking into
the internals of the store.
2020-07-22 09:37:30 +01:00
Christian Kellner
291fadd0b2 pylint: increase max attributes to 10
In three places we have more than 7 instance attributes, but fewer
than 10; instead of disabling the warning for all these cases,
increase the limit to a reasonable size of 10 and re-enable the
warnings in all the places.
2020-07-21 13:25:04 +02:00
David Rheinsberg
43ddcf895d pipeline: drop output_id and pull in output-directory
Now that no caller requires the "output_id" anymore, drop it from our
results-dictionary. Instead, pass the output-directory through and copy
outputs where we produce / fetch them.

This still uses `objectstore.resolve_ref()`, since we do not have the
outputs pinned at the places where we want to copy. This needs a little
bit more rework, but we might just delay that until we have the cache
rework landed.

This already simplifies the output-directory path and drops the slight
hack which checked very late for produced outputs.

Note that we must be careful not to copy things too early, because we
do not want remnants in the output-directory if we return failure.
Hence, keep the copy-operation close to the commit-operation on the
store.
2020-05-28 11:16:15 +02:00
David Rheinsberg
8a195d7502 util/ctx: extract suppress_oserror()
Extract the `suppress_oserror()` function from the ObjectManager and
make it available as utility for other code as well.

This also adds a bunch of tests that verify it works as expected.
2020-05-11 18:05:12 +02:00
David Rheinsberg
2624be92dc osbuild: cleanup contextlib usage
Two cleanups for the context-managers we use:

  * Use `contextlib.AbstractContextManager` if possible. This class
    simply provides a default `__enter__` implementation which just
    returns `self`. So use it where applicable.
    Additionally, it provides an abstract `__exit__` method and thus
    allows static checks for the existence of `__exit__` in the dependent
    class. We might use that everywhere, but this is a separate
    decision, so not included here.

  * Explicitly return `None` from `__exit__`. The python docs state:

        If an exception is supplied, and the method wishes to suppress
        the exception (i.e., prevent it from being propagated), it
        should return a true value. Otherwise, the exception will be
        processed normally upon exit from this method.

    That is, unless we want exceptions to be suppressed, we should
    never return a `truthy` value. The python contextlib suggests using
    `None` as a default return value, so let's just do that.

    In particular, the explicit `return exc_type is None` that we use
    has no effect at all, since it only returns `True` if no exception
    was raised.

    This commit cleans this up and just follows what the `contextlib`
    module does and returns None everywhere (well, it returns nothing
    which apparently is the same as returning `None` in python). It is
    unlikely that we ever want to suppress any exceptions, anyway.
2020-04-21 16:02:20 +02:00
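The two points above can be checked with a small sketch. The class names and the `propagates` helper are hypothetical and exist only for this illustration.

```python
import contextlib


class ReturnsNone(contextlib.AbstractContextManager):
    """Inherits __enter__, which returns self."""

    def __exit__(self, exc_type, exc_value, exc_tb):
        return None  # never suppresses an exception


class ReturnsFlag(contextlib.AbstractContextManager):
    def __exit__(self, exc_type, exc_value, exc_tb):
        # False whenever an exception is in flight, so nothing is suppressed;
        # True only when there was nothing to suppress in the first place.
        return exc_type is None


def propagates(manager) -> bool:
    """Report whether an exception raised in the body escapes the manager."""
    try:
        with manager:
            raise ValueError("boom")
    except ValueError:
        return True
    return False
```

Both managers let the exception through, which is why replacing `return exc_type is None` with an implicit `None` does not change behavior.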
David Rheinsberg
2cc9160099 objectstore: extract remove_tree()
Move remove_tree() into its own module in `osbuild.util.rmrf`. This way
we can use it in other modules as well, without cross-referencing
internal helpers.
2020-04-21 14:46:02 +02:00
Christian Kellner
64b8c0643a objectstore: use ioctl to clear immutable flag
Instead of using the chattr binary, which adds another dependency
use what amounts to ioctl(fd, FS_IOC_SETFLAGS, ~FS_IMMUTABLE_FL),
to clear the immutable flag. Constants are taken from linux/fs.h.
2020-03-30 23:58:33 +02:00
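A sketch of how such an ioctl call could look in Python. The numeric constants are the values from linux/fs.h as they appear on 64-bit Linux, which is an assumption of this sketch; reading the current flags first with FS_IOC_GETFLAGS and the helper name `without_immutable` are also assumptions, not taken from the commit.

```python
import array
import fcntl
import os

# From linux/fs.h; the ioctl numbers assume a 64-bit `long`.
FS_IOC_GETFLAGS = 0x80086601
FS_IOC_SETFLAGS = 0x40086602
FS_IMMUTABLE_FL = 0x00000010


def without_immutable(flags: int) -> int:
    """Return flags with the immutable bit cleared, other bits untouched."""
    return flags & ~FS_IMMUTABLE_FL


def clear_immutable_flag(path: str) -> None:
    fd = os.open(path, os.O_RDONLY)
    try:
        flags = array.array("L", [0])
        # Read the current inode flags into the buffer (mutated in place).
        fcntl.ioctl(fd, FS_IOC_GETFLAGS, flags, True)
        flags[0] = without_immutable(flags[0])
        # Write the flags back without the immutable bit.
        fcntl.ioctl(fd, FS_IOC_SETFLAGS, flags, False)
    finally:
        os.close(fd)
```

Clearing the flag on a file that actually has it set requires the appropriate privileges, and not every filesystem supports these ioctls.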
Christian Kellner
04aa5e0aeb objectstore: manually cleanup tree dir for Object
The tree, which is created by stages and assemblers, might contain
immutable files, which for Python 3 currently (version 3.8) leads
to errors when the tempfile.TemporaryDirectory is being cleaned up.
Therefore, manually cleanup the tree directory, if it exists, via
shutil.rmtree with a custom onerror handler that also removes the
immutable bit on permission errors.
2020-03-30 23:58:33 +02:00
Christian Kellner
457f21a336 objectstore: add HostTree class to access host fs
Simple new object that should expose the root file system with the
same API as `objectstore.Object` but as read-only. This means that
the `read` call works exactly as for `Object` but `write` raises
an exception.
Add tests to specifically check the read-only properties.
2020-03-07 17:13:21 +01:00
Christian Kellner
856698ee9c objectstore: keep track of created objects
Keep track of all created objects via weak references. Add support
to use ObjectStore as context manager and ensure that all objects
are cleaned up when the context is exited.
2020-03-07 17:13:21 +01:00
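The tracking scheme can be sketched with `weakref.WeakSet`. The classes `Obj` and `Store` are simplified stand-ins with hypothetical names and are not the real `Object` and `ObjectStore`.

```python
import weakref


class Obj:
    def __init__(self):
        self.active = True

    def cleanup(self):
        self.active = False


class Store:
    def __init__(self):
        # Weak references: tracking an object does not keep it alive.
        self._objs = weakref.WeakSet()

    def new(self):
        obj = Obj()
        self._objs.add(obj)
        return obj

    def cleanup(self):
        # Only objects that are still referenced elsewhere are visited.
        for obj in list(self._objs):
            obj.cleanup()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        self.cleanup()
```

With `with Store() as store:` every object created through `store.new()` that is still alive gets cleaned up when the block exits.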
Christian Kellner
5fbf5bb431 objectstore: use 'tmp' subdir for temp directories
Instead of creating temporary directories at the root of the store
create them in a sub-directory called 'tmp'. This should make it
easy to cleanup left-over (temporary) dirs in case of crashes.
Additionally, it has the nice side effect that it is possible to
check that there are no objects that are still in-flight, i.e. not
cleaned-up.
2020-03-07 17:13:21 +01:00
Christian Kellner
01d989e718 objectstore: remove context manager for new method
Turn `ObjectStore.new` into a plain method, since `Object` itself
can be used as a context manager, which is now directly returned,
instead of internally wrapped in a `with` statement and then
yielded. Thus for callers of the method nothing changes and the
behavior of `with objectstore.new() as x` is exactly the same.
2020-03-07 17:13:21 +01:00
Christian Kellner
4b790ac284 objectstore: use a context also for Object.write
Reading from an `Object` via `read` already uses a context manager
to manage the read-only bind mount and also maintain a count of
currently active readers. With this an attempt to start a new
`write` operation while readers were active can be detected and
an exception is thrown. Since `write` was not introducing a context
the inverted situation, i.e. reads while a write is ongoing, was
not possible to detect.
This commit therefore introduces a context also for `.write` so
that we can enforce the policy to have either many readers but no
writers, or just one writer and no readers.
A bind mount is also used for write (in read-write mode) to hide
the internal path of the tree.
2020-02-29 01:14:24 +01:00
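The "many readers or one writer" policy can be sketched without any bind mounts. The class name `Tree`, the exception type and the messages are assumptions of this illustration.

```python
import contextlib


class Tree:
    def __init__(self):
        self._readers = 0
        self._writer = False

    @contextlib.contextmanager
    def read(self):
        if self._writer:
            raise ValueError("Write operation is ongoing")
        self._readers += 1
        try:
            yield self
        finally:
            self._readers -= 1

    @contextlib.contextmanager
    def write(self):
        if self._readers or self._writer:
            raise ValueError("Read or write operation is ongoing")
        self._writer = True
        try:
            yield self
        finally:
            self._writer = False
```

Because both operations are contexts, the object knows when each one ends and can reject a conflicting operation in either direction.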
Christian Kellner
42a365d12f osbuild: no auto commit of the last stage
Do not automatically commit the last stage of the pipeline to the
store. The last stage is most likely not what should be cached,
because it will contain all the individual customization and thus
be very likely different for different users. Instead, the dnf or
rpm stages have a higher chance of being the same and thus are
better candidates for caching.
Technically this change is done via two big changes that build
upon new features introduced in the previous commits, most notably
the copy on write semantics of Object and that input/output is
being done via `objectstore.Object` instead of plain paths. The
first of the two big changes is to create one new `Object` at
the beginning of `pipeline.run` and use that, in write mode via
`Object.write` across invocations of `stage.run` calls, with
checkpoints being created after each stage on demand.
The very same `Object` is then used in read mode via `Object.read`
as the input tree for the Assembler. After the assembler is done
the resulting image/tree is manually committed to the store.
The other big change is to remove the `ObjectStore.commit` call
from the `ObjectStore.new` method and thus the automatic commit
after the last stage is gone.
NB: since the build tree is being retrieved in `get_buildtree`
from the store, a checkpoint for the last stage of the build
pipeline is forced for now. Future commits will do away with
that forced commit as well.
Change osbuildtest.TestCase to always create a checkpoint at
the final tree (the last stage of the pipeline), since tests
need it to check the tree contents.
2020-02-28 16:11:49 +01:00
Christian Kellner
7720b5508d objectstore: refactor .get() to use Object
Instead of using custom bind-mount based logic in ObjectStore.get,
use a combination of Object + `Object.read` with the supplied base
(that can be None), which will lead to exactly the same outcome.
2020-02-28 16:11:49 +01:00
Christian Kellner
be8aafbb90 objectstore: Object.read() for read only access
Provide a way to read the current contents of the object, in a way
that follows the copy-on-write semantics: If `base` is set but the
object has not yet been written to, the `base` content will be
exposed. If no base is set or the object has been written to, the
current (temporary) tree will be exposed. In either way it is done
via a bind mount so it is assured that the contents indeed can only
be read from, but not written to.
The code also currently makes sure that there is no write operation
started as long as there is at least one reader.
Additionally, also introduce checks that the object is intact, i.e.
not cleaned up, for all operations that require such a state.
2020-02-28 16:11:49 +01:00
Christian Kellner
c73a28613b objectstore: fix Object._open exception handling
Move the call to os.open outside of the try block so that if an
exception occurs it will be properly propagated to the callers.
2020-02-28 16:11:49 +01:00
Christian Kellner
007488488e objectstore: extract mount code to small helpers
Extract the mount code into small little helpers, intended to be
reused from different places. Adapt ObjectStore to use those.
2020-02-28 16:11:49 +01:00
Christian Kellner
e610fa9659 objectstore: use private bind mounts for get()
Use `--make-private` for the bind mount in `ObjectStore.get`.
2020-02-28 16:11:49 +01:00
Christian Kellner
9b61f50792 objectstore: make Object.open a private method
Analogous to `_path`, it is not possible to identify the intended
mode of the i/o operation from using `open` (whether it is a read
or a write operation) and thus make it an internal method and only
use it for read operations.
2020-02-28 16:11:49 +01:00
Christian Kellner
3258bb62d4 objectstore: make Object.path a private property
Since it is hard to infer the intended mode of the i/o operation,
i.e. whether it is going to be a read or a write, from accessing the
`path` property, make it an internal property. Do not initialize the
object on property access but return the writable tree if the Object
is initialized, and the path to its base tree otherwise.
Adapt all the usage internally: Use `path` for read operations and
initialize the object and then directly use `_tree` for write ops.
2020-02-28 16:11:49 +01:00
Christian Kellner
0ef5de3c94 objectstore: Object stores it base id not path
Instead of storing the base path, store the object id of its base
and resolve it to the path via the ObjectStore whenever needed.
2020-02-28 16:11:49 +01:00
Christian Kellner
6a2a7d99f7 objectstore: unify commit and snapshot code paths
As a result of the previous commits that implement copy on write
semantics, `commit` can now be used to create snapshots. Whenever
an Object is committed, its tree is moved to the store and it is
being reset, i.e. a new clean workdir is created and the old one
discarded. The moved tree is then set as the base of the reset
Object. On the next call to `write` the moved tree will be copied
over and forms the basis of the Object again. Should nobody want
to write to Object after the snapshot, i.e. the `commit`, no copy
will be made.
NB: snapshots/commits will now act as synchronization points:
if an object with the same treesum, i.e. the very same content
already exists, the move (i.e. `store_tree`) will gracefully fail
and the existing content will be set as the base for Object.
2020-02-28 16:11:49 +01:00
Christian Kellner
39213b7f44 objectstore: copy on write semantics for Object
Since Object knows its base now, the initialization of the tree
with the content of its base can be delayed until the moment
someone wants to actually modify the tree, thus implementing
copy on write semantics. For this a new `write` method is added
that will initialize the base and return the writable tree. It
should be used instead of `path` whenever a client wants to
write to the tree of the Object.
Adapt the pipeline and the tests to use the new `write` method
in all the appropriate places.
NB: since the intention can not be inferred when using `path`
directly, the Object is still being initialized there.
2020-02-28 16:11:49 +01:00
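The delayed initialization can be sketched as follows. The class `CowObject`, its attribute names and the use of `shutil.copytree` are assumptions for illustration; the base is represented here by a plain directory path.

```python
import os
import shutil
import tempfile


class CowObject:
    def __init__(self, base=None):
        self.base = base                 # path of an existing tree, or None
        self._workdir = tempfile.TemporaryDirectory()
        self._tree = os.path.join(self._workdir.name, "tree")
        os.makedirs(self._tree)
        self._init = base is None        # nothing to copy without a base

    def read_path(self):
        # Until the first write, reads are served directly from the base.
        return self._tree if self._init else self.base

    def write(self):
        # Copy the base content only when someone wants to modify the tree.
        if not self._init:
            shutil.copytree(self.base, self._tree, dirs_exist_ok=True)
            self._init = True
        return self._tree

    def cleanup(self):
        self._workdir.cleanup()
```

An object that is only ever read never pays for the copy, and writes never touch the base.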
Christian Kellner
0874b80734 objectstore: Object knows its base
When a new Object is created it can have a `base`, i.e. another
object that is already committed to the store, which is then used
to initialize the tree of the new object. That is, the contents
of the new Object will be based on the contents of the existing.
The initialization of an Object with its base (if any) was done
by the ObjectStore. Move all of that logic inside `Object`:
The Object will store its base, which `Object.init` will use to
initialize itself. Additionally, if `Object.path` is accessed
`init` is being called as well to make sure it is properly
initialized, i.e. the tree initialized with the base content.
2020-02-28 16:11:49 +01:00
Christian Kellner
25b3807a5b objectstore: snapshot takes Object not path
Refactor the `ObjectStore.snapshot` method to take an `Object` not
a plain filesystem tree, so the latter is more encapsulated from
the ObjectStore user (e.g. the pipeline) and prepares a unified
code-path for `snapshot` and `commit` in the future.
2020-02-28 16:11:49 +01:00
Christian Kellner
5deb1be514 objectstore: change Object.move to .store_tree
Now that Object manages its work directory itself, re-create the
latter when its tree is moved, i.e. when the object is being
committed to the store. This means that after the object has been
written to the store it is in the same state as if it were new and
can be used in the very same way.
If the move itself fails (the rename(2) fails), the tree and its
contents are cleaned up with the reset of the work directory.
Rename the `move` method to `store_tree` to better reflect how the
method should be used, i.e. to store the tree corresponding to the
Object instance.
2020-02-28 16:11:49 +01:00
Christian Kellner
6d14dee9a2 objectstore: object manages its work dir
When a new object is being created, a corresponding temporary
directory is created within the file system tree of the object
store, which shall be called the "work dir". Within that dir a
well-known directory ('tree') is created that then is the root
of the filesystem tree[1] that clients use to store the tree
or the resulting image in.
Previously, the work dir was managed, i.e. created and cleaned
up (via a context manager) by the ObjectStore. Now the Object
itself manages the tree and thus the lifetime of the work dir
is more directly integrated and controlled by it. As a result
the Object itself is now a context manager. On exit of the
context the work dir is cleaned up.

[1] For the assembler this is the output directory that will
    contain the final image.
2020-02-28 16:11:49 +01:00
Christian Kellner
399606528c objectstore: helper to create temp dirs inside the store
Create a small helper method that creates a new temporary directory
of type tempfile.TemporaryDirectory within the store and returns it.
2020-02-28 16:11:49 +01:00
Christian Kellner
d10537da42 objectstore: yield Object not path from .new()
Instead of just returning the path of the temporary object that is
created in .new() the actual instance of the new `Object` is being
returned, which can then provide a richer interface for clients
than a plain directory path.
2020-02-28 16:11:49 +01:00
Christian Kellner
52736169f1 objectstore: Object keeps reference to store
Keep a reference to the parent store, which this object is tied
to. It is currently not yet used directly but is a preparation for
a closer Object and ObjectStore integration that will happen in
commits to follow.
2020-02-28 16:11:49 +01:00
Christian Kellner
19f49e5dc3 objectstore: rename TreeObject to Object
As the name implies, the ObjectStore stores objects, which can be
trees but also everything an Assembler can make of the input tree,
like qcow2 images, tarballs and other non tree-like outputs.
Therefore rename the TreeObject to Object to better reflect that it
is representing any object, not only trees, in the store.
2020-02-28 16:11:49 +01:00