Extract the existing code that creates the runner for the host
build container into a small helper method, so it can be re-used
in other places, like the tests.
Rename the `API.exception` member to `API.error`, to make it more
generic, so it can also be used for other sort of errors in the
future. Also add a layer of additional structure with `type` and
`data` members so different types of errors apart. Currently only
`exception` is used.
Adapt the tests in test/mod/test_api.py to check for the new
structure and its content.
Create a new api endpoint called exception, that communicates
exception backtraces separately back to osbuild, as opposed to
dumping them into the normal log. Additionally, add a corresponding
test to check that a call to api.exception correctly sets
API.exception.
Now that the BuildRoot is capable of capturing the output of the
runner and modules (stages, assemblers), there is no need for
using `api.setup_stdio`. Therefore, drop it from all runners and
replace `api.output` with `BuildRoot.output`, which will contain
the output if `api.setup_stdio` is not called from the runners.
In case that bubblewrap fails to, e.g. because it fails to execute
the runner, it will print an error message to stderr. Currently,
this output is not capture and thus not logged. To fix that, the
`BuildRoot.run` method now takes a monitor object and will stream
stdout/stderr to the log via the monitor.
Change the API endpoint to prevent retrieving monitor-output from a
running instance. Instead, we require the caller to exit the API context
before querying the monitor-output. This guarantees that the api-thread
was synchronously taken down and scheduled any outstanding events.
This fixes an issue where a side-channel notifies us of a buildroot
exit, but the api-thread has not yet returned from epoll, and thus might
not have dispatched pending I/O events, yet. If we instead wait for the
thread to exit, we have a synchronous shutdown and know that all
*ordered* kernel events must have been handled.
In particular, imagine a build-root program running (like `echo` in the
test_monitor unittest) which writes data to the stdout-pipe and then
immediately exits. The syscall-order guarantees that the data is written
to the pipe before the SIGCHLD is sent (or wait(2) returns). However, we
retrieve the SIGCHLD from our main-thread usually (p.join() in our test,
and BuildRoot() in our main code), while the pipe-reading is done from
an API thread. Therefore, we might end up handling the SIGCHLD first
(just imagine a single-threaded CPU that schedules the main task before
the thread). To avoid this race, we can simply synchronize with the
api-thread. Since we already have this synchronization as part of the
api-thread takedown, it is as simple as stopping the api-thread before
continuing with operations.
Lastly, if a write operation to a pipe was issued, we are guaranteed
that a SIGCHLD synchronization across processes is ordered correctly.
Furthermore, the python event-loop also guarantees that stopping an
event-loop will necessarily dispatch all outstanding events. A read is
guaranteed to be outstanding in our race-scenario, so the read will be
dispatched. The only possible problem is `_output_ready()` only
dispatching a maximum of 4096 bytes. This might need to be fixed
separately. A comment is left in place.
Make the output_directory argument in Pipeline.assemble
and Assembler.run required. The qemu assembler assumes
it is passed in args and will crash without it. Making
it mandatory prevents this.
Change the default of libdir to /usr/lib/osbuild and
remove redundant logic. Additionally, change how the
python package is detected.
Instead of checking if libdir is None, check if
/usr/lib/osbuild is empty - i.e. if the user has specified
a different directory than the default.
Rely on the ability of `BaseAPI` to auto-generate socket addresses
when no one was provided. The `BuildRoot` does not rely on the
sockets being created in the `BuildRoot.api` directory anymore and
will instead bind-mount each individual socket address to the well
known location via the `BaseAPI.endpoint` identifier.
Convert all API providers to take the `socket_address` as an
optional keyword argument.
Register all API end point providers with the `BuildRoot` via the
new `BuildRoot.register_api` call. The context management is now
done via the `BuildRoot` itself.
Use `os.path.join` to build the path for the source cache, instead
of string interpolation. This makes it possible to use other Path
representations, like `pathlib.Path`, transparently.
Currently `objectstore.Object.{read, write}` directly return
strings but in the future they might return an Object that is
an `os.PathLike`, i.e. has a `__fspath__` method, instead.
Prepare for that by ensuring all `tree`s are converted to their
file system representation via `os.fspath` when needed, e.g.
when creating the bind-mount arguments for the `BuildRoot`.
Introduce the concept of pipeline monitoring: A new monitor class is
passed to the pipeline.run() function. The main idea is to separate
the monitoring from the code that builds pipeline. Through the build
process various methods will be called on that object, representing
the different steps and their targets during the build process. This
can be used to fully stream the output of the various stages or just
indicate the start and finish of the individual stages.
This replaces the 'interactive' argument throughout the pipeline
code. The old interactive behavior is replicated via the new
`LogMonitor` class that logs the beginning of stages/assembler,
but also streams all the output of them to stdout.
The non-interactive behavior of not reporting anything is done by
using the `NullMonitor` class, which in turn outputs nothing.
The way secrets work has been changed via commit 372b117: instead
of passing them in via the command line, the information how to
obtain secrets are encoded along the sources themselves.
The only stage that still has support for the old style way is the
deprecated org.osbuild.dnf stage, which might be removed in the
near future.
Change the assembler-commit to be conditional on checkpoints, just like
we already do for stages. This means, assembler output is not
automatically committed, but only if you requested so via a checkpoint.
With this in place we can start sharing caches in osbuild-composer. The
only thing in the cache will be sources as well as checkpointed stages.
We can start checkpointing known pipelines and thus make use of the
cache. Furthermore, we can cache sources, as long as we do not fetch an
unbound set of RPMs. However, our RPM set is currently static, so this
should not be an issue. Nevertheless, it is up to Composer to decide
when to enable the cache.
We no longer need the `tree_id` in the osbuild output. All callers have
been converted to use other means. Drop the ID from the output and
avoid exposing our internals.
Now that no caller requires the "output_id" anymore, drop it from our
results-dictionary. Instead, pass the output-directory through and copy
outputs where we produce / fetch them.
This still uses `objectstore.resolve_ref()`, since we do not have the
outputs pinned at the places where we want to copy. This needs a little
bit more rework, but we might just delay that until we have the cache
rework landed.
This already simplifies the output-directory path and drops the slight
hack which checked very late for produced outputs.
Note that we must be careful not to copy things too early, because we
do not want remnants in the output-directory if we return failure.
Hence, keep the copy-operation close to the commit-operation on the
store.
All callsites of `Pipeline.assemble()` already check early whether the
output-object exists in the store and then return it. Checking again in
`assemble()` will never catch anything (unless another stage would
happen to produce the same ID as the assembler as a side-effect).
It does seem useful to keep the shortcuts in `assemble()`, so other
callers would get the shortcut as well. However, this does not really
work well right now, since you want to skip the stage-compilation as
well, and `assemble()` is really just the last step of the job. Hence,
it really is the job of the pipeline-executor to check early.
With that in mind, lets drop this fast-path which has no effect in the
current setup.
We want to get rid of `tree_id` and `output_id` because the they
are now considered internals of the store and clients should not
use them directly. NB: they are still there indirectly as the id
of the last stage and the assembler.
Also, the `output_id` was never correct here, because it was the
`tree_id` as well. Ups.
Make sure to verify that the pipeline actually produced any output
before attempting to copy it out. This fixes osbuild running with
`--output-directory` but without assembler.
The idea is that source can themselves spawn other modules, esp.
new secrets modules. For this they need to know the library dir,
aka 'libdir' throughout the osbuild source. Therefore change the
SourceServer to directly get the library directory instead of
just the sub-directory to the sources. Then pass the library
directory to via the JSON API to the source.
Adjust all usage of the SourceServer, including the tests.
Drop the `kwargs` forwarding from buildroot.run() to subprocess.run().
We do not use it other than for `stdin=subprocess.DEVNULL`. Set that
option directly instead.
Doing the kwargs forwarding mixes the argument namespaces and is very
hard to read. It is not clear from the call-site which argument goes to
buildroot.run() and which to subprocess.run().
Lastly, it requires us to manually fetch `check` just to make pylint
happy. Lets just drop this dance and make the API explicit.
Drop the --build-env command-line argument. It is not used by anything.
Furthermore, our manifests now allow embedding build-environments, so
there is little reason to continue supporting this.
Add a option to all description methods to include the respective
ids in the description. Defaults to False to preserve the original
output which is used in the tests.
We want to run stages and other scripts inside of the nspawn containers
we use to build pipelines. Since our pipelines are meant to be
self-contained, this should imply that the build-root must have osbuild
installed. However, this has not been the case so far for several
reasons including:
1. OSBuild is not packaged for all the build-roots we want to support
and thus we have the chicken-and-egg problem.
2. During testing and development, we want to support using a local
`libdir`.
3. We already provide an API to the container. Importing scripts from
the outside just makes this API bigger, but does not change the
fact that build-roots are not self-contained. Same is true for the
running kernel, and probably much more..
With all this in mind, our strategy probably still is to eventually
package osbuild for the build-root. This would significantly reduce our
API exposure, points-of-failure, and host-reliance. However, this switch
might still be some weeks out.
With this in mind, though, we can expect the ideal setup to have a full
osbuild available in the build-root. Hence, any script we import so far
should be able to access the entire `libdir`. This commit unifies the
libdir handling by installing the symlinks into `libdir` and providing
a single bind-mount of the module-path into `libdir`.
We can always decide to scratch that in the future when we scratch the
libdir-import from the host-root. Until then, I believe this commit
nicely unifies the way we import the module both in a local checkout as
well as in the container.
Add a new output-directory argument which specifies where to store
result objects. For now, this is purely optional and simply copies from
the old `output_id` into the specified directory. This allows a
backwards compatible transition towards removing any external access to
the osbuild cache.
Note that this has still lots of room for improvements:
* We only support assembler-output for now, but we could also easily
support entire trees as output, in case no assembler was selected.
Alternatively, we could introduce a "copy" assembler, that just
outputs the input tree.
* This parameter is optional, but should really be mandatory. There
is little reason to have the default behavior just dropping any
generated content. This would be a breaking change, though.
* We could move data out of a temporary object-store entry, rather
than copy it. But again, for backwards-compatibility, we leave the
latest store-object intact and do not move things out of it.
* We could now transition towards never committing anything to the
store, not even output IDs, unless explicitly checkpointed.
Causes a problem with ostree-osbuild on CI (travis) otherwise:
Traceback (most recent call last):
File "osbuild-ostree", line 345, in <module>
sys.exit(main())
File "osbuild-ostree", line 337, in main
return build(args)
File "osbuild-ostree", line 257, in build
output_id, commit_id = build_commit(builddir, args)
File "osbuild-ostree", line 162, in build_commit
r = pipeline.run(store.store,
File "/home/travis/build/gicmo/ostree-osbuild-demo/osbuild/osbuild/pipeline.py", line 358, in run
r = self.assemble(object_store,
File "/home/travis/build/gicmo/ostree-osbuild-demo/osbuild/osbuild/pipeline.py", line 314, in assemble
r = self.assembler.run(input_dir,
File "/home/travis/build/gicmo/ostree-osbuild-demo/osbuild/osbuild/pipeline.py", line 148, in run
osbuild_module_path = os.path.dirname(importlib.util.find_
Move the whole result handling of the assembler outside the context
manager; this includes the cleanup of the object in the error case
which would conflict with the ongoing write operation inside the
context manager and thus lead to a crash:
Traceback (most recent call last):
File "/usr/bin/osbuild", line 11, in <module>;
load_entry_point('osbuild==10', 'console_scripts', 'osbuild')()
File "/usr/lib/python3.7/site-packages/osbuild/__main__.py", line 99, in main
secrets=secrets
File "/usr/lib/python3.7/site-packages/osbuild/pipeline.py", line 362, in run
libdir)
File "/usr/lib/python3.7/site-packages/osbuild/pipeline.py", line 324, in assemble
output.cleanup()
File "/usr/lib/python3.7/site-packages/osbuild/objectstore.py", line 160, in cleanup
self._check_writer()
File "/usr/lib/python3.7/site-packages/osbuild/objectstore.py", line 178, in _check_writer
raise ValueError("Write operation is ongoing")
ValueError: Write operation is ongoing
If there is a build pipeline specified, always build it, even if
there are no accompanying stages. If we short-circuit earlier and
ignore the build pipeline section, errors in the build pipeline
would not be caught at all.
The `build_stages` method short-circuits and returns early in case
any of the stages fail to build and returns None for the tree, and
build tree, therefore both of those can immediately cleaned up at
that point.
For this add a small helper `cleanup` that will call the cleanup
method for all supplied arguments, after filtering out None values.
Delay the cleanup of the build tree of the build pipeline, and
first check the result and only cleanup the tree when the build
did not fail, because in that case both returned trees will be
None and trying to cleanup them up will result in an exception.
Therefore, also don't clean up `tree` in the error case.
If the final object, image, artifact, already exists in the store,
short-circuit and return directly from `Pipeline.run`. Otherwise
the situation might arise that the final result is in the store,
but the tree (and build trees) are not and thus the tree would be
built, just to be thrown away when the assembler phase detects
that the final output already exists.
Extract the code that assembles the tree into its own method as
it was previously done for the stages. This should make the new
method as well as `Pipeline.run` method easier to read.
Refactor the building of stages and the build tree so that no auto
commit is done at the end of the build pipeline anymore, i.e. the
respective build tree(s) are not commit to the store unless that
was explicitly enabled via a checkpoint.
NB: `objectstore.Object`s are used not via a context manager
anymore, because they are returned from the `build_stages` method
to make the code easier to use and read. Cleanup of Objects during
a KeyboardInterrupt exception (Ctrl-C) are handled by using the
ObjectStore with a context manager, which on exit of the context
will cleanup all objects. Due to a big in python[1] this is indeed
more robust than using `with object_store.new() as tree` because
that is translated[2] to something like:
1: mgr = (EXPR)
2: exit = type(mgr).__exit__
3: value = type(mgr).__enter__(mgr)
-> 4: # NOTE: KeyboardInterrupt here will "leak" value
5: try:
6: [...]
7: finally:
8: if exc:
9: exit(mgr, None, None, None)
Which can leave the tree initialized but not cleaned up if the
KeyboardInterrupt happens exactly line 4.
[1] https://bugs.python.org/issue29988
[2] https://www.python.org/dev/peps/pep-0343/
Only commit checkpoints to the object store if the run of the
stage or assembler was successful. Otherwise we commit a empty,
corrupted or old tree to the store. Any subsequent run might
then pick up that bogus tree as a starting point.
Rename the function to `describe_os()`. We do no actual detection, nor
verification here. That is, the return value of this function is in no
way guaranteed to be a valid runner. That is, error-handling needs to
be done in the caller. Make this clear by renaming the function.
Note: Currently, in case no runner exists for the OS, we end up with:
execv(...) failed: No such file or directory
This needs to be fixed in the future.
Reading from an `Object` via `read` already uses a context manager
to manage the read-only bind mount and also maintain a count of
currently active readers. With this an attempt to start a new
`write` operation while readers were active can be detected and
an exception is throw. Since `write` was not introducing a context
the inverted situation, i.e. reads while a write is ongoing, was
not possible to detect.
This commit therefore introduces a context also for `.write` so
that we can enforce the policy to have either many readers but no
writers, or just one writer and no readers.
A bind mount is also used for write (in read-write mode) to hide
the internal path of the tree.
The exception that was thrown by {stage.run, assembler.run} was a
necessary ingredient that in combination with the context manager
around `Objectstore.new` made sure that tree the object was only
auto-committed to the store when there was no error during the
executing of any of the `.run` methods.
Now that the auto-commit feature got removed and committing of any
object to the store is explicitly done via `objectstore.commit`,
the whole exception throwing and handling can be removed. Status
reporting was already done in `BuildResult.success` and the new
code will use that to exit the function early on stage/asm errors.
Refactor `get_buildtree` to do input/output via `Object`, i.e. by
creating a new `Object`, setting its base accordingly and then
use its `read` and `write` methods. This is what `ObjectStore.get`
does as well.
In the case that there is no build pipeline, use the mount helpers
of `objectstore` instead of the custom mount calls.
Do not automatically commit the last stage of the pipeline to the
store. The last stage is most likely not what should be cached,
because it will contain all the individual customization and thus
be very likely different for different users. Instead, the dnf or
rpm stages have a higher chance of being the same and thus are
better candidates for caching.
Technically this change is done via two big changes that build
upon new features introduces in the previous commits, most notably
the copy on write semantics of Object and that input/output is
being done via `objectstore.Object` instead of plain paths. The
first of the two big changes is to create one new `Object` at
the beginning of `pipeline.run` and use that, in write mode via
`Object.write` across invocations of `stage.run` calls, with
checkpoints being created after each stage on demand.
The very same `Object` is then used in read mode via `Object.read`
as the input tree for the Assembler. After the assembler is done
the resulting image/tree is manually committed to the store.
The other big change is to remove the `ObjectStore.commit` call
from the `ObjectStore.new` method and thus the automatic commit
after the last stage is gone.
NB: since the build tree is being retrieved in `get_buildtree`
from the store, a checkpoint for the last stage of the build
pipeline is forced for now. Future commits will refactor will
do away with that forced commit as well.
Change osbuildtest.TestCase to always create a checkpoint at
the final tree (the last stage of the pipeline), since tests
need it to check the tree contents.