Rename the function to `describe_os()`. We do no actual detection, nor
verification here. That is, the return value of this function is in no
way guaranteed to be a valid runner. That is, error-handling needs to
be done in the caller. Make this clear by renaming the function.
Note: Currently, in case no runner exists for the OS, we end up with:
execv(...) failed: No such file or directory
This needs to be fixed in the future.
Reading from an `Object` via `read` already uses a context manager
to manage the read-only bind mount and also maintain a count of
currently active readers. With this an attempt to start a new
`write` operation while readers were active can be detected and
an exception is throw. Since `write` was not introducing a context
the inverted situation, i.e. reads while a write is ongoing, was
not possible to detect.
This commit therefore introduces a context also for `.write` so
that we can enforce the policy to have either many readers but no
writers, or just one writer and no readers.
A bind mount is also used for write (in read-write mode) to hide
the internal path of the tree.
The exception that was thrown by {stage.run, assembler.run} was a
necessary ingredient that in combination with the context manager
around `Objectstore.new` made sure that tree the object was only
auto-committed to the store when there was no error during the
executing of any of the `.run` methods.
Now that the auto-commit feature got removed and committing of any
object to the store is explicitly done via `objectstore.commit`,
the whole exception throwing and handling can be removed. Status
reporting was already done in `BuildResult.success` and the new
code will use that to exit the function early on stage/asm errors.
Refactor `get_buildtree` to do input/output via `Object`, i.e. by
creating a new `Object`, setting its base accordingly and then
use its `read` and `write` methods. This is what `ObjectStore.get`
does as well.
In the case that there is no build pipeline, use the mount helpers
of `objectstore` instead of the custom mount calls.
Do not automatically commit the last stage of the pipeline to the
store. The last stage is most likely not what should be cached,
because it will contain all the individual customization and thus
be very likely different for different users. Instead, the dnf or
rpm stages have a higher chance of being the same and thus are
better candidates for caching.
Technically this change is done via two big changes that build
upon new features introduces in the previous commits, most notably
the copy on write semantics of Object and that input/output is
being done via `objectstore.Object` instead of plain paths. The
first of the two big changes is to create one new `Object` at
the beginning of `pipeline.run` and use that, in write mode via
`Object.write` across invocations of `stage.run` calls, with
checkpoints being created after each stage on demand.
The very same `Object` is then used in read mode via `Object.read`
as the input tree for the Assembler. After the assembler is done
the resulting image/tree is manually committed to the store.
The other big change is to remove the `ObjectStore.commit` call
from the `ObjectStore.new` method and thus the automatic commit
after the last stage is gone.
NB: since the build tree is being retrieved in `get_buildtree`
from the store, a checkpoint for the last stage of the build
pipeline is forced for now. Future commits will refactor will
do away with that forced commit as well.
Change osbuildtest.TestCase to always create a checkpoint at
the final tree (the last stage of the pipeline), since tests
need it to check the tree contents.
Instead of using custom bind-mount based logic in ObjectStore.get,
use a combination of Object + `Object.read` with the supplied base
(that can be None), which will lead to exactly the same outcome.
Provide a way to read the current contents of the object, in a way
the follows the copy-on-write semantics: If `base` is set but the
object has not yet been written to, the `base` content will be
exposed. If no base is set or the object has been written to, the
current (temporary) tree will be exposed. In either way it is done
via a bind mount so it is assured that the contents indeed can only
be read from, but not written to.
The code also currently make sure that there is no write operation
started as long as there is at least one reader.
Additionally, also introduce checks that the object is intact, i.e.
not cleaned up, for all operations that require such a state.
Analogous to `_path`, it is not possible to identify the intended
mode of the i/o operation from using `open` (whether it is a read
or a write operation) and thus make it an internal method and only
use it for read operations.
Since it is hard to infer the intended modus of the i/o operation,
i.e. whether it is going to be a read or a write from accessing the
`path` property make it an internal method. Do not initialize the
method on property access but return the writable tree, if Object
is initialized, the path to its base tree otherwise.
Adapt all the usage internally: Use `path` for read operations and
initialize the object and then directly use `_tree` for write ops.
As a result of the previous commits that implement copy on write
semantics, `commit` can now be used to create snapshots. Whenever
an Object is committed, its tree is moved to the store and it is
being reset, i.e. a new clean workdir is created and the old one
discarded. The moved tree is then set as the base of the reset
Object. On the next call to `write` the moved tree will be copied
over and forms the basis of the Object again. Should nobody want
to write to Object after the snapshot, i.e. the `commit`, no copy
will be made.
NB: snapshots/commits will act now act as synchronization points:
if a object with the same treesum, i.e. the very same content
already exists, the move (i.e. `store_tree`) will gracefully fail
and the existing content will be set as the base for Object.
Since Object knows its base now, the initialization of the tree
with the content of its base can be delayed until the moment
someone wants to actually modify the tree, thus implementing
copy on write semantics. For this a new `write` method is added
that will initialize the base and return the writable tree. It
should be used instead of `path` whenever the a client wants to
write to the tree of the Object.
Adapt the pipeline and the tests to use the new `write` method
in all the appropriate places.
NB: since the intention can not be inferred when using `path`
directly, the Object is still being initialized there.
When a new Object is created it can have a `base`, i.e. another
object that is already committed to the store, which is then used
to initialize the tree of the new object. That is, the contents
of the new Object will be based on the contents of the existing.
The initialization of an Object with its base (if any) was done
by the ObjectStore. Move all of that logic inside `Object`:
The Object will store its base, which `Object.init` will use to
initialize itself. Additionally, if `Object.path` is accessed
`init` is being called as well to make sure it is properly
initialized, i.e. the tree initialized with the base content.
Refactor the `ObjectStore.snapshot` method to take an `Object` not
a plain filesystem tree, so the latter is more encapsulated from
the ObjectStore user (e.g. the pipeline) and prepares a unified
code-path for `snapshot` and `commit` in the future.
Now that Object manages its work directory itself, re-create the
latter when the its tree is moved, i.e. when the object is being
committed to the store. This means that after the object has been
written to the store it is in the same state is if it was new and
can be used in the very same way.
If the move itself fails (the rename(2) fails), the tree and its
contents is cleaned up with the reset of the work directory.
Rename the `move` method to `store_tree` to better reflect how the
method should be used, i.e. to store the tree corresponding to the
Object instance.
When a new object is being created, a corresponding temporary
directory is created within the file system tree of the object
store, which shall be called the "work dir". Within that dir a
well-known directory ('tree') is created that then is the root
of the filesystem tree[1] that clients use to store the tree
or the resulting image in.
Previously, the work dir was managed, i.e. created and cleaned
up (via a context manager) by the ObjectStore. Now the Object
itself manages the tree and thus the lifetime of the work dir
is more directly integrated and controlled by it. As a result
the Object itself is now a context manager. On exit of the
context the work dir is cleaned up.
[1] For the assembler this is the output directory that will
contain the final image.
Instead of just returning the path of the temporary object that is
created in .new() the actual instance of the new `Object` is being
returned, which can then provide a richer interface for clients
than a plain directory path.
Keep are reference to the parent store, which this object is tied
to. It is currently not yet used directly but is a preparation for
a closer Object and ObjectStore integration that will happen in
commits to follow.
As the name implies, the ObjectStore stores objects, which can be
trees but also everything an Assembler can make of the input tree,
like qcow2 images, tarballs and other non tree-like outputs.
Therefore rename the TreeObject to Object to better reflect that it
is representing any object, not only trees, in the store.
Detect the host dynamically from os-release(5) instead of relying on the
`org.osbuild.host` symlink.
It is awkward to install a symlink that tells osbuild which distro is is
running on, when there is a standard way to detect this.
This makes it easier to run osbuild from sources and removes the need to
include every host in the spec file. The latter became hard to do,
because there's no obvious way to distinguish RHEL minor releases.
Currently stdin is taken to be the pipeline to be built, this allows
it to be instead a map containing the suorces and the pipeline.
We would imagine passing around the sources and pipeline together, so
this just makes the behavior of osbuild more closely match the intended
use and semantics of the sources configuration.
This keeps backwards compatibility for now, but that may be dropped as
soon as osbuild-composer no longer relies on the old behavior.
Disable too-many-{branches,statements} pylint warnings in __main__.py.
These do not seem helpful, but could be reenabled if we drop some
options in the future.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Make the sources options a static property of the pipeline, in
particular of each stage, rather than being passed in on `run()`.
This more closely matches the intended semantics of sources and
pipeline having similar lifetimes and being fairly coupled together.
The difference between the pipeline and the sources is that the
sources do not contribute to identifying the pipeline (they are not
part of the hash for the pipeline id), and they could be swapped
out without changing the output image (as long as they are valid).
However, a pipeline without A sources object would not be useful,
and typically the pipeline and the sources are generated, passed
around and used together.
This is different from the build environment and the secrets object,
which both are specific to either the host or the caller, unlike
the pipeline which should be universal.
This changes the `load()` function to take a `manifest`, which is
a map containing both the pipeline and the sources.
Note that the semantics of the build-env parameter remains unchanged:
It shares the sources with the rest of the pipeline. We may want to
reconsider this in future commits, as the build-env is specific to
the host, whereas the regular pipeline is not.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Appart from giving us a hard time on s390x, this feature did not seem
to have a measurable effect. Moreover, O_DIRECT is not supported by
tmpfs so without this patch we could not use tmpfs as backing store,
which does speed up image generation considerably.
Drop the flag and and rather put the store on tmpfs in order to speed
things up.
This source adds support for downloaded files. The files are
indexed by their content hash, and the only option is their URL.
The main usecase for this will be downloading rpms. Allowing depsolving
to be done outside of osbuild, network access to be restricted and
downloaded rpms to be reused between runs.
Each source is now passed two additional arguments, a cache directory
and an output directory. Both are in the source's namespace, and
the source is responsible for managing them. Each directory may
contain contents from previous runs, but neither is ever guaranteed
to do so.
Downloaded contents may be saved to the cache and resued between
runs, and the requested content should be written to the output dir.
If secrets are used, the source must only ever write contents to
the output that corresponds to the available secrets (rather than
contents from the cache from previous runs).
Each stage is passed an additional argument, a sources directory.
The directory is read-only, and contains a subdirectory named after
each used source, which will contain the requseted contents when
the `Get()` call returns (if the source uses this functionality).
Based on a patch by Lars Karlitski.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Add a new `--checkpoint` option, which can be provided multiple
times, that indicate after which stages a the current stage of
the tree should be committed to the object store; the tree id
will be the treesum of the tree at that point and a reference
is created with the id of the stage at the point.
The argument to `--checkpoint` is the id of the stage. If not
all the given checkpoints can be found the execution will be
aborted.
When the tree is committed to the objects directory of the object-
store, it is done via rename(3). The two possible errors that can
be raised in case that a non-empty tree with the same name already
exist is [EEXIST] or [ENOTEMPTY]. The latter was already ignored
but the former was not. At least on btrfs former will be raised
File "/home/gicmo/Code/src/osbuild/osbuild/objectstore.py",
os.rename(tree.root, output_tree)
FileExistsError: [Errno 17] File exists: 'store/tmpyyi3yvie/tree' -> 'store/objects/…'
Add a new method to the ObjectStore that takes a path to a file
system tree, which is currently being built, and commits it to
the store and references it via a given object_id.
The tree is copied to a temporary location (co-located in the
store to enable fast copying via reflinks) and then atomically
moved into the ObjectStore's objects path via rename(3).
Extract the code from ObjectStore.new that will commit the filled
tree to the store into its own method so it can be used from a
future method to snapshot trees at random points in time.
Introduce a small `TreeObject` class that is the representation of
a tree during its construction. It supports calculating its treesum
as well initialize the new tree with an existing one.
This makes sure all disk access is backed by the same disk. We may
want this for performance reasons (avoiding moving across disks), but
also to experiment with different backing stores for all disk access.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Currently /var was always backed by /var/tmp, but we may want to
control exactly what it is backed by. The default is the same, so
this is not a behavioral change.
The dnf stage wants to import `osbuild.sources` but currently the
osbuild module is not available in the stages. Apply the same hack
done in the Assembler also in for the stages, i.e. bind mount the
osbuild module to the stages/osbuild.
This happens rarely when the same loop device is used in rapid
succession. The kernel flushes the page cache asynchronously, which
means that it might not be cleared yet when a new file is bound.
`set_status` checks if the cache is clear (`set_fd` doesn't).
Handle this by trying a different device when `set_status` returns
`EBUSY`.
Fixes#177
Don't wait until python's garbage collector closes the file descriptors
to loop devices. Close them when the `LoopServer` context manager exits,
after an assembler has finished running.
The z Initial Program Loader (zipl) when creating the bootmap in
bootmap_creat (src/zipl/bootmap.c) wants to create a device node
via misc_temp_dev (bootmap_create:1141) for the device that it
is installing the bootloader to[1]. Currently access to loopback
devices is allowed from within the container (it is used to mount
the image), but only read/write access. On s390x also allow the
creation of device nodes, so zipl can do its work and install
the bootloader stages on the "disk".
[1] zipl source at commit dcce14923c3e9615df53773d1d8a3a22cbb23b96
Add a new command line option `--secrets`, which accepts a JSON file
that is structured similarly to a source file. It is should contain data
that is necessary to fetch content, but shouldn't appear in any logs.