Create a `InputService` class with an abstract method called `map`,
meant to be implemented by all inputs. An `unmap` method may be
optionally overridden by inputs to cleanup resources.
Instantiate a `host.ServiceManager` in the `Stage.run` section and
pass the to the host side input code so it can be used to spawn the
input services.
Convert all existing inputs to the new service framework.
Instead of bind-mounting each individual input into the container,
create a temporary directory that is used by all inputs and bind-
mount this to the well known location ("/run/osbuild/inputs"). The
temporary directory is then passed to the input so that it can
make the requested resources available relative to that directory.
This is enforced by the common input handling code.
Additionally, pass the well known input path via a new "paths" key
to the arguments dictionary passed to the stage.
All sources are now pre-fetched before any pipeline and thus any
stage is being built. Additionally, in the version 1 foramt, all
stages that were using source are converted to use inputs when
the manifest is loaded. Thus, nothing should use `source.get`
and thus the sources API (`SourcesServer`) anymore.
Since pipelines can now be uniquely addressed via their names,
add the ability to checkpoint via the pipeline name. This will
effectively checkpoint the last stage of a pipeline.
For format v1 manifests, the build pipeline is called "build",
the main pipeline is called "tree" and the pipeline for the
assembler is called "assembler".
All sources fetch various types of `items`, the specific nature
of which is dependent on the source type, but they are all
identifyable by a opaque identifier. In order for osbuild to
check that all the inputs that a stage needs are are indeed
contained in the manifest description, osbuild must learn what
ids are fetched by what source. This is done by standarzing
the common "items" part, i.e. the "id" -> "options for that id"
mapping that is common to all sources.
For the version 1 of the format, extract the files and ostree
the item information from the respective options.
Adapt the sources (files, ostree) so that they use the new items
information, but also fall back to the old style; the latter is
needed since the sources tests still uses the SourceServer.
In much the same way has `Pipeline` has `add_stage` and `Manifest`
has `add_pipeline`, introduce an `add_input` method to `Stage` to
be able add `Inputs` to stages.
Add a new `add_source` method that will add an individual `Source`
to a `Manifest` give its `ModuleInfo` and options. The dictionary
of source options in the manifest is replaced with a list of such
`Sources` and `add_source` will append to it. Adap the version 1
format code to use `add_source` and reconstruct the source options
from the list of source on `describe`.
Remove the `sources_options` constructor parameter for `Manifest`
and adapt all the source base for this.
Include all inputs of a stage during the calculation of its id,
since they determine, very much like options, the content the
stage produces; thus different inputs should lead to different
ids.
Now that `Pipelines` have no assemblers anymore and thus only one
identifier, i.e. the one corresponding to the tree (`tree_id`),
the `id` and `tree_id` are now the same. Therefore replace the
usage of `tree_id` with `id` and drop the former. Add some extra
documentation including some caveats about the uniquness of `id`.
Convert the assembler phase of the main pipeline in the old format
into a new Pipeline that as the assembler as a stage, where the
input of that stage is the main pipeline. This removes the need of
having "assemblers" as special concepts and thus the corresponding
code in `Pipeline` is removed. The new assembler pipeline is marked
as exported, but the pipeline that builds the tree is not anymore.
Adapt the `describe` and `output` functions of the `v1` format to
handle the assembler pipeline. Also change the tests accordingly.
NB: The id reported for the assembler via `--inspect` and the result
will change as a result of this, since the assembler stage is now
the first and only stage of a new pipeline and thus has no base
anymore.
Since `__iter__` is return an iterator over the `Pipeline` objects,
the `"name" in manifest` check would not work for name or ids. Thus
provide an implemention of `__contains__` that does exactly that.
Add a new helper helper, `Manfifest.get` that will return a
pipeline give a name or an id or `None` if no pipeline could
be found with either. The implementation is taken from the
existing `__getitem__` method and the latter as now based on
the new `get` method.
Every pipeline that gets added to the `Manifest` now need to have
a unique name by which it can be identified. The version 1 format
loader is changed so that the main pipeline that builds the tree
is always called `tree`. The build pipeline for it will be called
`build` and further recursive build pipelines `build-build`, where
the number of repetitions of `build` corresponds to their level of
nesting. An assembler, if it exists, will be added as `assembler`.
The `Manifest.__getitem__` helper is changed so it will first try
to access pipeline via its name and then fall back to an id based
search. NB: in the degenrate case of multiple pipelines that have
exactly the same `id`, i.e. same stages, with the same options and
same build pipeline, only the first one will be return; but only
the first one here will be built as well, so this is in practice
not a problem.
The formatter uses this helper to get the tree pipeline via its
name wherever it is needed.
This also adds an `__iter__` method `Manifest` to ease iterating
over just the pipeline values, a la `for pipeline in manifet`.
Add a new `export` property to the `Pipeline` object that indicates
whether a the result, i.e. the tree after the pipelines has been
built, should be exported, i.e. copied to the output directory.
In the current format (v1), the main pipeline, gets marked as such
by the corresponding loader.
Instead of passing all pre-created pipelines to the Manifest
constructor, add a `add_pipeline` method, analogous to the
existing `Pipeline.add_{stage, assembler}` methods. Convert
the format loading code to use that and remove the constructor
parameter.
Now that assemblers are represented via the `Stage` class, the
Assembler class is not needed anymore. Adjust the monitor method
to take an `pipeline.Stage` for the `assembler` method as well.
Instead of using the `Assemblers` class to represent assemblers,
use the `Stage` class: The `Pipeline.add_assembler` method will
now instantiate and `Stage` instead of an `Assembler`. The tree
that the pipeline built is converted to an Input (while loading
the manifest description in `format/v1.py`) and all existing
assemblers are converted to use that input as the tree input.
The assembler run test is removed as the Assembler class itself
is not used (i.e. run) anymore.
Instead of creating the temporary directory for the BuildRoot and
the sources output directly at the store root, create them inside
the store's temporary directory.
Support for inputs. Before the stage is executed all inputs of the
stage are run. The returned path is mapped inside the sandbox and
pass, along with the returned data, as part of the arguments to the
stage via new "inputs" dictionary. They keys represent the input
keys as given in the manifest.
Now that the `Stage` contains the `ModuleInfo`, which contains the
path the to executable, this can directly be used to execute the
stage. To do so, the path to the executable is bind-mounted to a
well known path inside the sandbox (`/run/osbuild/bin/$id`) and
this is then supplied to the build root as executable to run.
Add a new `info` property that holds the `meta.ModuleInfo` info
for the stage. This gives each instance of a stage access to
meta (or class) information about it, i.e. its schema, docs but,
more importantly, also its name and path to the executable.
Thefore the `name` property is coverted into a transient property
which access the `name` member of `info`.
Change the `formats/v1` load mechanism to carry a new `index`
argument which is used to load the `ModuleInfo` for each stage.
Adapt all tests to load the info as well when creating stages.
All tests and invocations of `add_stage` actually pass a valid
options dictionary. Thefore move the `options` args before
the `sources` arg and remove the default value (`None`).
Instead of having build pipelines nested within the pipeline it is
the build pipeline for, the nested structure is transferred into a
flat list of pipelines. As a result the recursion is gone and all
the pipelines and trees are build one after the other. This is now
possible since floating objects are kept alive by the store itself
and all trees that are being built are transparently via them.
The immediate result dictionary changed accordingly. To keep the
JSON output of osbuild the same, the result is now routed through
a format specific converter.
Additionally, the v1 format module gained a function to retrieve
the global tree_id and output_id. With the new models those global
ids will go away eventually and thus need to go through the format
specific code.
This is a step towards generic pipelines, i.e. replacing assemblers
with pipelines, thus creating an acyclic graph of pipelines. There
the pipeline id will be what is now the tree_id. For now though the
generic id is either the output_id or the tree_id.
The current pipeline code used to set a base for a tree object
that might or might not exist. Depending on it it would either
use that object or reset its base. Avoid doing that because it
prohibits us from properly interpreting the `id` of an object
if the latter is also set when `base_id` is assigned, since
that base might not exist and thus the `id` would not actually
mean that the the contents of tree associated with the object.
Therefore we use `ObjectStore.get` and return the result if it
is not None or a fresh Object otherwise.
Every time a stage has been successfully built, the contents of
the tree now corresponds to the stage and can thus be identified
via the id of the stage.
When the tree is being written to, i.e. on consecutive attempts
of stage builds, the `id` of the tree object will automatically
be reset.
The object in question will be cleaned when the store goes out of
context, which happens soon after the manual cleanup anyway and
the eager cleanup does not gain us much.
More importantly, it removes the special case for the assembler
output object, since trees build by the stages are not cleaned
up manually already.
Instead of passing the store directory to Pipeline.run, pass an
already initialized ObjectStore object. This binds the lifetime
of the store and its (temporary) objects to the run of osbuild
not the run of the pipeline.
This prepares re-using the stores with multiple (non-nested)
pipelines.
The 'Manifest' class represents what to build and the necessary
sources to do so. For now thus it is just a combination of the
pipeline the source options.
The description of a pipeline is format dependent and thus needs
to be located at the specific format module.
Temporarily remove two tests; they should be added back to a format
specific test suit.
Extract the code that loads a pipeline from a pipeline description,
i.e. a manifest, into a new module inside a new 'formats' package.
The idea is to have different descriptions, i.e. different formats,
for the same internal representation. This allows changing the
internal representation, i.e. data structures, but still having the
same external description.
Later a new description might be added that better matches the new
internal representation.
The "/run/osbuild" path is used as the default runpath by the
BuildRoot, which creates it on demand. The only other place
is the API (`BaseAPI`) to create the socket directories in,
but that is now also created on-demand. Additionally, the
API are only run after the build root has been set up so that
directory would already exist.
Extract the existing code that creates the runner for the host
build container into a small helper method, so it can be re-used
in other places, like the tests.
Rename the `API.exception` member to `API.error`, to make it more
generic, so it can also be used for other sort of errors in the
future. Also add a layer of additional structure with `type` and
`data` members so different types of errors apart. Currently only
`exception` is used.
Adapt the tests in test/mod/test_api.py to check for the new
structure and its content.
Create a new api endpoint called exception, that communicates
exception backtraces separately back to osbuild, as opposed to
dumping them into the normal log. Additionally, add a corresponding
test to check that a call to api.exception correctly sets
API.exception.
Now that the BuildRoot is capable of capturing the output of the
runner and modules (stages, assemblers), there is no need for
using `api.setup_stdio`. Therefore, drop it from all runners and
replace `api.output` with `BuildRoot.output`, which will contain
the output if `api.setup_stdio` is not called from the runners.
In case that bubblewrap fails to, e.g. because it fails to execute
the runner, it will print an error message to stderr. Currently,
this output is not capture and thus not logged. To fix that, the
`BuildRoot.run` method now takes a monitor object and will stream
stdout/stderr to the log via the monitor.
Change the API endpoint to prevent retrieving monitor-output from a
running instance. Instead, we require the caller to exit the API context
before querying the monitor-output. This guarantees that the api-thread
was synchronously taken down and scheduled any outstanding events.
This fixes an issue where a side-channel notifies us of a buildroot
exit, but the api-thread has not yet returned from epoll, and thus might
not have dispatched pending I/O events, yet. If we instead wait for the
thread to exit, we have a synchronous shutdown and know that all
*ordered* kernel events must have been handled.
In particular, imagine a build-root program running (like `echo` in the
test_monitor unittest) which writes data to the stdout-pipe and then
immediately exits. The syscall-order guarantees that the data is written
to the pipe before the SIGCHLD is sent (or wait(2) returns). However, we
retrieve the SIGCHLD from our main-thread usually (p.join() in our test,
and BuildRoot() in our main code), while the pipe-reading is done from
an API thread. Therefore, we might end up handling the SIGCHLD first
(just imagine a single-threaded CPU that schedules the main task before
the thread). To avoid this race, we can simply synchronize with the
api-thread. Since we already have this synchronization as part of the
api-thread takedown, it is as simple as stopping the api-thread before
continuing with operations.
Lastly, if a write operation to a pipe was issued, we are guaranteed
that a SIGCHLD synchronization across processes is ordered correctly.
Furthermore, the python event-loop also guarantees that stopping an
event-loop will necessarily dispatch all outstanding events. A read is
guaranteed to be outstanding in our race-scenario, so the read will be
dispatched. The only possible problem is `_output_ready()` only
dispatching a maximum of 4096 bytes. This might need to be fixed
separately. A comment is left in place.
Make the output_directory argument in Pipeline.assemble
and Assembler.run required. The qemu assembler assumes
it is passed in args and will crash without it. Making
it mandatory prevents this.