340 commits

Author SHA1 Message Date
Christian Kellner
a0e862f083 formats/v1: determine build_id outside param
Instead of determining the build id while constructing the
Pipeline object, do so beforehand. This makes it clearer
what is going on.
2021-01-18 17:44:46 +01:00
Christian Kellner
da16fe30bf formats/v1: set stage sources after loading
Instead of carrying around the `sources_options` parameter
through the recursive `load` and `load_build` calls, set
the sources options after loading has completed by iterating
through all stages of all pipelines.
2021-01-18 17:44:46 +01:00
Christian Kellner
698635171c pipeline: refactor args for add_stage
All tests and invocations of `add_stage` actually pass a valid
options dictionary. Therefore move the `options` arg before
the `sources` arg and remove the default value (`None`).
2021-01-18 17:44:46 +01:00
Christian Kellner
262877091f osbuild: flatten the pipeline
Instead of having build pipelines nested within the pipeline it is
the build pipeline for, the nested structure is transferred into a
flat list of pipelines. As a result the recursion is gone and all
the pipelines and trees are built one after the other. This is now
possible since floating objects are kept alive by the store itself
and all trees that are being built are transparently accessible via them.
The immediate result dictionary changed accordingly. To keep the
JSON output of osbuild the same, the result is now routed through
a format specific converter.
Additionally, the v1 format module gained a function to retrieve
the global tree_id and output_id. With the new models those global
ids will go away eventually and thus need to go through the format
specific code.
2021-01-15 13:20:31 +01:00
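As a rough illustration of the flattening described in this commit, the sketch below turns a nested description into a flat, build-ordered list without recursion. It assumes each pipeline is a dictionary that may carry its build pipeline under `build` → `pipeline`; the `flatten` helper and the `name` keys are hypothetical and not taken from the osbuild code.

```python
def flatten(pipeline: dict) -> list:
    """Return the pipelines as a flat list, innermost build pipeline first."""
    chain = []
    while pipeline is not None:
        build = pipeline.get("build")
        # Keep everything except the nested build pipeline itself
        chain.append({k: v for k, v in pipeline.items() if k != "build"})
        pipeline = build["pipeline"] if build else None
    # The outermost pipeline was collected first but must be built last
    chain.reverse()
    return chain


nested = {
    "name": "os",
    "build": {
        "pipeline": {
            "name": "build",
            "build": {"pipeline": {"name": "bootstrap"}},
        }
    },
}

flat = flatten(nested)
names = [p["name"] for p in flat]
print(names)
```

Each entry in the flat list can then be built in order, with later pipelines looking up the trees of earlier ones in the store.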
Christian Kellner
54761e8a13 pipeline: introduce generic pipeline id
This is a step towards generic pipelines, i.e. replacing assemblers
with pipelines, thus creating an acyclic graph of pipelines. There
the pipeline id will be what is now the tree_id. For now though the
generic id is either the output_id or the tree_id.
2021-01-15 13:20:31 +01:00
Christian Kellner
76e72b1c3f objectstore: keep strong reference of objects
The objectstore always tracked all objects that were returned from
it, but it did so via weak references, which means it did not keep
the objects alive itself. With the introduction of identifiers for
temporary objects (floating objects), it makes sense to keep all
created objects alive so that they can in fact be used.
2021-01-15 13:20:31 +01:00
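The difference between weak and strong tracking can be shown with a small, self-contained sketch. The `Object` class here is only a stand-in; how the real ObjectStore tracks its objects is not shown in this log.

```python
import gc
import weakref


class Object:
    """Stand-in for a temporary store object (hypothetical)."""


weak_tracked = weakref.WeakSet()  # does not keep objects alive
strong_tracked = set()            # keeps objects alive

a, b = Object(), Object()
weak_tracked.add(a)
strong_tracked.add(b)

# Drop the caller's references, as happens when a build step returns
del a, b
gc.collect()

weak_count = len(weak_tracked)
strong_count = len(strong_tracked)
print(weak_count, strong_count)
```

Only the strongly tracked object survives once the caller drops its reference, which is what allows a later lookup by `id` to find it.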
Christian Kellner
e5b12e55f4 objectstore: transparent access for floating objs
A "floating" object is a temporary object that is identified, i.e.
has an `id` and is thus also locked, but is not committed to the
store.
The `contains` and `get` methods of ObjectStore will now return such
floating objects as if they were committed ones, providing transparent
access to objects that have been built during the execution of osbuild.
2021-01-15 13:20:31 +01:00
Christian Kellner
e24dfbd23f pipeline: don't use a non-existing base for trees
The current pipeline code used to set a base for a tree object
that might or might not exist. Depending on that, it would either
use that object or reset its base. Avoid doing that because it
prohibits us from properly interpreting the `id` of an object
if the latter is also set when `base_id` is assigned, since
that base might not exist and thus the `id` would not actually
reflect the contents of the tree associated with the object.
Therefore we use `ObjectStore.get` and return the result if it
is not None or a fresh Object otherwise.
2021-01-15 13:20:31 +01:00
Christian Kellner
b039761544 pipeline: identify tree objects during build
Every time a stage has been successfully built, the contents of
the tree now correspond to the stage and can thus be identified
via the id of the stage.
When the tree is being written to, i.e. on consecutive attempts
of stage builds, the `id` of the tree object will automatically
be reset.
2021-01-15 13:20:31 +01:00
Christian Kellner
f7bcec60f4 objectstore: make objects identifiable
This adds a new `id` property to the ObjectStore.Object, that is
meant to reflect the identifier of the Stage that built its
contents. This will help to transparently access objects that have
been built but not committed to the store.
Setting the `base_id` of an object will also set its `id`. When
the object is then modified via write() the `id` will be set to
None, since now the content and the id are out of sync. In the
same way, resetting an object will reset its `id` to None.
2021-01-15 13:20:31 +01:00
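The `id` rules from this commit message can be sketched as a small class. This is a simplified model under stated assumptions: the class below is hypothetical, `write()` and `reset()` are reduced to plain methods, and no actual tree contents are involved.

```python
class Object:
    """Simplified model of an identifiable store object (hypothetical)."""

    def __init__(self):
        self._base_id = None
        self.id = None

    @property
    def base_id(self):
        return self._base_id

    @base_id.setter
    def base_id(self, value):
        # A fresh object has exactly the contents of its base,
        # so it can be identified by the base's id
        self._base_id = value
        self.id = value

    def write(self):
        # Contents are about to change: the id no longer describes them
        self.id = None

    def reset(self):
        self._base_id = None
        self.id = None


obj = Object()
obj.base_id = "stage-1"
id_after_base = obj.id

obj.write()
id_after_write = obj.id

obj.id = "stage-2"  # set again once the next stage was built
obj.reset()
id_after_reset = obj.id
```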
Christian Kellner
a8783761a1 pipeline: don't eagerly clean up the final object
The object in question will be cleaned when the store goes out of
context, which happens soon after the manual cleanup anyway and
the eager cleanup does not gain us much.
More importantly, it removes the special case for the assembler
output object, since trees built by the stages are already not
cleaned up manually.
2021-01-15 13:20:31 +01:00
Christian Kellner
f38c48086e pipeline: run method takes store object not dir
Instead of passing the store directory to Pipeline.run, pass an
already initialized ObjectStore object. This binds the lifetime
of the store and its (temporary) objects to the run of osbuild
not the run of the pipeline.
This prepares for re-using the store with multiple (non-nested)
pipelines.
2021-01-15 13:20:31 +01:00
Christian Kellner
8d2c7f8160 osbuild: move mark_checkpoints to manifest
Make the checkpoint marking logic a method of the Manifest class.
2021-01-09 18:09:47 +01:00
Christian Kellner
d25936a028 formats: describe now takes a manifest
Instead of a pipeline, describe now takes a Manifest instance.
The reason is that a manifest fully describes the build, which
includes the sources. Now that the describe function takes the
manifest, the sources can be included as well.
Adapt the tests to reflect that change.
2021-01-09 18:09:47 +01:00
Christian Kellner
945914b195 osbuild: introduce Manifest class
The 'Manifest' class represents what to build and the necessary
sources to do so. For now, it is just a combination of the
pipeline and the source options.
2021-01-09 18:09:47 +01:00
Christian Kellner
acef7aa4a9 main_cli: rename manifest → desc
Prepare for having a dedicated manifest class by renaming the
description of the manifest in a given format to "desc".
2021-01-09 18:09:47 +01:00
Christian Kellner
4ab52c3764 formats: move pipeline description here
The description of a pipeline is format dependent and thus needs
to be located at the specific format module.
Temporarily remove two tests; they should be added back to a format
specific test suite.
2021-01-09 18:09:47 +01:00
Christian Kellner
a13783f67b formats: load function takes combined manifest
Instead of having the pipeline and the source option as separate
arguments, the load function now takes the full manifest, which
has those two items combined.
2021-01-09 18:09:47 +01:00
Christian Kellner
0b6f36158d osbuild: use load function via the format module
Instead of importing the load, load_build functions into the osbuild
namespace and using it via that, use the load function via the module
that provides them, i.e. the formats.v1 module.
2021-01-09 18:09:47 +01:00
Christian Kellner
b65211a94d formats/v1: move validation logic here
The validation of the manifest description is eo ipso format
specific and thus belongs in the format specific module.
Adapt all usages throughout the codebase to directly use the
version 1 specific function.
2021-01-09 18:09:47 +01:00
Christian Kellner
b49ee53d0a formats/v1: add type hints
Add type hints to most arguments and return types to aid editors
and type checking.
2021-01-09 18:09:47 +01:00
Christian Kellner
aaf61ce9fc formats: extract manifest loading into module
Extract the code that loads a pipeline from a pipeline description,
i.e. a manifest, into a new module inside a new 'formats' package.
The idea is to have different descriptions, i.e. different formats,
for the same internal representation. This allows changing the
internal representation, i.e. data structures, but still having the
same external description.
Later a new description might be added that better matches the new
internal representation.
2021-01-09 18:09:47 +01:00
Christian Kellner
c466b40e14 cli: remove --source command line option
This was deprecated in favor of always having the source in the
manifest. Remove the command line option and the corresponding
code that would override the sources definitions.
Update the docs accordingly.
2020-12-15 13:12:01 +01:00
Christian Kellner
27d4450352 pipeline: don't create "/run/osbuild" eagerly
The "/run/osbuild" path is used as the default runpath by the
BuildRoot, which creates it on demand. The only other user is
the API (`BaseAPI`), which creates its socket directories there,
but that is now also done on demand. Additionally, the
APIs are only run after the build root has been set up, so that
directory would already exist.
2020-12-04 12:28:30 +01:00
Christian Kellner
35149c6aec api: ensure parent of socket dir exists
When creating the socket directory, i.e. in the case that it was
not specified directly, ensure the parent directories exist.
Make it possible to override that parent directory.
2020-12-04 12:28:30 +01:00
Christian Kellner
b7ae7a01c6 objectstore: fix typo in comment
It is "already" not "alreday".
2020-12-04 12:28:30 +01:00
Christian Kellner
cd1f248dca util/jsoncomm: chain the BufferError in recv
Explicitly re-raise the BufferError exception in recv from the original
JSONDecodeError, so the latter gets recorded as the underlying cause.

Uncovered by pylint 2.6.0: W0707: "Not using raise from makes the
traceback inaccurate, because the message implies there is a bug in
the exception-handling code itself, which is a separate situation
than wrapping an exception."
2020-10-30 17:28:31 +01:00
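A minimal sketch of the exception chaining this commit refers to. The `decode_message` helper is hypothetical and only mirrors the pattern (a `BufferError` raised from a `JSONDecodeError`); it is not the jsoncomm code.

```python
import json


def decode_message(data: bytes):
    """Decode a JSON message, chaining the original decoding error."""
    try:
        return json.loads(data)
    except json.JSONDecodeError as e:
        # `from e` records the decoder error as __cause__ of the new one
        raise BufferError("message is not valid JSON") from e


try:
    decode_message(b"{not json")
except BufferError as err:
    cause = err.__cause__
    print(type(cause).__name__)
```

With `raise ... from e`, the traceback reads "The above exception was the direct cause of the following exception" instead of suggesting a second failure inside the handler.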
Christian Kellner
373f474769 loop: use python 3 style base class initialization
Use the canonical Python3 usage of "super" without any arguments.
pylint 2.6.0 started to actually warn about this.
2020-10-30 17:28:31 +01:00
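For reference, the argument-free form of `super` looks like this; the class names are made up for the example and are not taken from the loop module.

```python
class Device:
    def __init__(self, path: str):
        self.path = path


class LoopDevice(Device):
    def __init__(self, minor: int):
        # Python 3 style; equivalent to super(LoopDevice, self).__init__(...)
        super().__init__(f"/dev/loop{minor}")
        self.minor = minor


lo = LoopDevice(3)
print(lo.path)
```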
Christian Kellner
d9168ee625 buildroot: continuously stream log data to monitor
All runners stopped calling `api.setup_stdio` (commit c40b414), and
thus all output of runners and also modules is now redirected to a
pipe (created via Popen and subprocess.PIPE for stdout).
Text was read from that pipe via `stdout.read(4096)`, which means
that it is now buffered in chunks of 4096, where it previously was
line buffered in the case that osbuild was run in the terminal and
--json was not specified. This is very annoying for anyone wanting
to follow osbuild's output in real-time.
Restore the previous behavior by using `os.read`, which should be
a small wrapper around read(3), which does not block until all the
requested data is available but returns early (short reads). This
means, new text will be forwarded as soon as it is available in the
pipe. Increase the read buffer to 32768 while at it, which is what
Popen is using in Python 3.9.
2020-10-28 14:28:07 +01:00
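The reading strategy described above can be sketched as follows. The `stream_output` helper and its `on_data` callback are hypothetical stand-ins for the BuildRoot and monitor; only `os.read`, `subprocess.Popen` and the 32768 buffer size come from the message.

```python
import os
import subprocess
import sys


def stream_output(argv, on_data, bufsize=32768):
    """Run argv and forward its output to on_data as soon as it arrives."""
    proc = subprocess.Popen(argv,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    fd = proc.stdout.fileno()
    while True:
        # os.read returns whatever is available (a short read) instead
        # of blocking until `bufsize` bytes have been collected
        chunk = os.read(fd, bufsize)
        if not chunk:  # empty read: the write end was closed
            break
        on_data(chunk)
    proc.stdout.close()
    return proc.wait()


chunks = []
rc = stream_output([sys.executable, "-c", "print('hello')"], chunks.append)
```

A caller that wants line-oriented output would still need to split the chunks itself, since short reads can end anywhere.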
Lars Karlitski
a5d4a8a926 osbuild: always return exit code
osbuild_cli() sometimes returned an exit code, but at the end called
sys.exit() directly. The idea was probably to always return the code
with which the executable should exit.

Make this consistent and call sys.exit() in __main__.py, with the value
returned by osbuild_cli().
2020-10-27 22:04:09 +01:00
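The resulting split of responsibilities might look like the sketch below. The body and signature of `osbuild_cli` are placeholders, and `main` stands in for what `__main__.py` does; neither is copied from the project.

```python
import sys


def osbuild_cli(argv) -> int:
    """Placeholder CLI: always *return* the exit code, never exit."""
    if len(argv) < 2:
        print("usage: osbuild MANIFEST", file=sys.stderr)
        return 2
    return 0


def main(argv=None):
    # The only place that calls sys.exit(), as __main__.py would
    sys.exit(osbuild_cli(sys.argv if argv is None else argv))
```

Returning the code keeps `osbuild_cli` easy to call from tests, since no `SystemExit` has to be caught there.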
Christian Kellner
d9ae219e19 api: transfer metadata context via fd
Metadata information can easily become very big, like in the case
of the package metadata of the org.osbuild.rpm stage, quite likely
exceeding the configured maximum packet length of the underlying
socket. To avoid potential issues here, transfer the actual data
by writing it to a temporary file and sending an open fd over.
2020-10-22 22:47:22 +01:00
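One way to picture the fd-based transfer is the sketch below. It assumes Python 3.9+ on a Unix system (for `socket.send_fds` and `socket.recv_fds`); the helper names and the `b"metadata"` marker are invented for the example and do not reflect the osbuild wire protocol.

```python
import json
import os
import socket
import tempfile


def send_metadata(sock, metadata):
    """Write metadata to a temporary file and send only its fd."""
    with tempfile.TemporaryFile() as f:
        f.write(json.dumps(metadata).encode("utf-8"))
        f.flush()
        f.seek(0)  # the receiver shares the file offset
        socket.send_fds(sock, [b"metadata"], [f.fileno()])


def recv_metadata(sock):
    """Receive the fd and read the metadata back from the file."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    with os.fdopen(fds[0], "rb") as f:
        return json.loads(f.read().decode("utf-8"))


a, b = socket.socketpair()
payload = {"packages": [f"pkg-{i}" for i in range(10000)]}
send_metadata(a, payload)
received = recv_metadata(b)
a.close()
b.close()
```

The message on the socket stays tiny regardless of how large the payload is, because only the descriptor travels over it.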
Christian Kellner
e919f66609 pipeline: use osrelease.DEFAULT_PATHS
Use the newly defined constant that contains the well known paths
for where to look for `os-release` file.
2020-10-21 11:13:28 +02:00
Christian Kellner
807090f4c8 pipeline: introduce detect_host_runner helper
Extract the existing code that creates the runner for the host
build container into a small helper method, so it can be re-used
in other places, like the tests.
2020-10-21 11:13:28 +02:00
Christian Kellner
3010d247ea util/osrelease: add default os-release paths
Add a new `DEFAULT_PATHS` constant, a list of all well known paths
where `os-release` can be found, as per os-release(5).
2020-10-21 11:13:28 +02:00
Christian Kellner
aaa51e22a6 api: properly serialize the exception's traceback
Use `traceback.print_tb()` to serialize the exception's backtrace.
The previously used expression `str(e.__traceback__)` will just
give `<traceback object at 0x…>`, which is not very helpful.
Add a test to check that the method name that raises the exception,
also called `exception`, is in the traceback.
2020-10-09 10:47:44 +02:00
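A short sketch of the difference, using an in-memory buffer as the target for `traceback.print_tb()`. The `serialize_traceback` helper is hypothetical; the function named `exception` mirrors the test described in the message.

```python
import io
import traceback


def serialize_traceback(exc: BaseException) -> str:
    """Render the traceback of exc as text."""
    buf = io.StringIO()
    traceback.print_tb(exc.__traceback__, file=buf)
    return buf.getvalue()


def exception():
    raise ValueError("boom")


try:
    exception()
except ValueError as e:
    naive = str(e.__traceback__)         # "<traceback object at 0x...>"
    backtrace = serialize_traceback(e)   # file, line and function names
```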
Christian Kellner
f8de164413 api: properly encode exception type
Using `str(type(exception))` yields something like
`<class 'ValueError'>` for a `ValueError` exception. Get the vanilla
name of the exception type via `type(exception).__name__`.
Add a test to ensure that we encode this properly.
2020-10-09 10:47:44 +02:00
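The two spellings compare as follows. The `encode_exception` helper and the shape of its result are assumptions made for illustration only.

```python
def encode_exception(exc: BaseException) -> dict:
    """Encode an exception using the plain name of its type."""
    return {"type": type(exc).__name__, "value": str(exc)}


err = ValueError("bad input")
verbose = str(type(err))
encoded = encode_exception(err)
print(verbose, encoded["type"])
```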
Christian Kellner
f5d00dd043 api: use more generic error member for exceptions
Rename the `API.exception` member to `API.error`, to make it more
generic, so it can also be used for other sort of errors in the
future. Also add a layer of additional structure with `type` and
`data` members so different types of errors can be told apart. Currently only
`exception` is used.
Adapt the tests in test/mod/test_api.py to check for the new
structure and its content.
2020-10-09 10:47:44 +02:00
Christian Kellner
78d72eded9 api: whitespaces fixes
No semantic change, just more spaces between functions to make it
more PEP-8 compliant.
2020-10-09 10:47:44 +02:00
Christian Kellner
cbcb335b3e osbuild: fix spelling mistakes found by codespell
Run codespell on the source ('codespell -f -L msdos -S coverity
-S rpmbuild -S samples') and fix all uncovered mistakes.
2020-10-06 14:41:00 +02:00
Chloe Kaubisch
5dc5ddcf29 api: add exception endpoint
Create a new api endpoint called exception, that communicates
exception backtraces separately back to osbuild, as opposed to
dumping them into the normal log. Additionally, add a corresponding
test to check that a call to api.exception correctly sets
API.exception.
2020-10-02 17:49:45 +02:00
chloenayon
01aae91949 api: remove setup_stdio
API.setup_stdio was replaced in PRs 506 and 507,
remove setup_stdio functions and call sites.
2020-09-09 12:52:50 +02:00
chloenayon
b1229de56e pipeline: unify object exporting
Remove output.export and associated logic in pipeline.assemble.
Instead, return output or None, and export only once in pipeline.run.
2020-09-02 17:54:11 +02:00
Christian Kellner
499ae1654e osbuild: replace api.setup_stdio with BuildRoot
Now that the BuildRoot is capable of capturing the output of the
runner and modules (stages, assemblers), there is no need for
using `api.setup_stdio`. Therefore, drop it from all runners and
replace `api.output` with `BuildRoot.output`, which will contain
the output if `api.setup_stdio` is not called from the runners.
2020-08-31 15:06:36 +02:00
Christian Kellner
10579ee6f5 buildroot: return a new CompletedBuild with output
Create a new CompletedBuild object that wraps and is very similar
to the subprocess.CompletedProcess, i.e. it has a process member
but also has shortcuts for returncode. Additionally, the output
of the process is not only forwarded to the monitor, but also
captured and then handed to CompletedBuild, so its output member
will actually contain the full build output. To be compatible
with the previously returned CompletedProcess, `stderr`, `stdout`
members exist on CompletedBuild that also return `output`.
2020-08-31 15:06:36 +02:00
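A minimal sketch of such a wrapper, based only on the members named in this message (`process`, `returncode`, `output`, `stdout`, `stderr`). The constructor signature and the way the output is captured here are assumptions.

```python
import subprocess
import sys


class CompletedBuild:
    """Wrap a finished process together with its captured output."""

    def __init__(self, proc: subprocess.CompletedProcess, output: str):
        self.process = proc
        self.output = output

    @property
    def returncode(self):
        return self.process.returncode

    @property
    def stdout(self):
        return self.output

    @property
    def stderr(self):
        # stdout and stderr were merged, so both return the same text
        return self.output


proc = subprocess.run([sys.executable, "-c", "print('built')"],
                      stdout=subprocess.PIPE,
                      stderr=subprocess.STDOUT,
                      text=True,
                      check=False)
build = CompletedBuild(proc, proc.stdout)
```

Code written against `subprocess.CompletedProcess` can keep reading `returncode`, `stdout` and `stderr` from the wrapper unchanged.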
Christian Kellner
96a5499ed9 buildroot: log bubblewrap's output
In case bubblewrap fails, e.g. because it fails to execute
the runner, it will print an error message to stderr. Currently,
this output is not captured and thus not logged. To fix that, the
`BuildRoot.run` method now takes a monitor object and will stream
stdout/stderr to the log via the monitor.
2020-08-27 08:07:14 +02:00
chloenayon
3bf5d26c7a pipeline: replace objectstore logic with get call
In pipeline.run, replace calls to objectstore.contains
and objectstore.new with a call to objectstore.get, which
has the same functionality.
2020-08-26 15:10:12 +02:00
chloenayon
35fa429965 objectstore: get returns object not path
Change objectstore.get to return an object or None instead of a path.
2020-08-26 15:10:12 +02:00
Christian Kellner
e273dd0084 api: add 'get-arguments' call and client method
Add a new `get-arguments` API call to fetch the input/arguments.
To avoid running into any limits on the maximum packet size on
the socket, the actual data is written to a temp file and a fd
to that passed to the client - very much as in `setup_stdio`.

Additionally, a new `arguments` method is provided as a client
counterpart for the new API call.
2020-08-25 18:51:55 +02:00
David Rheinsberg
803433fb62 api: prevent early output retrieval
Change the API endpoint to prevent retrieving monitor-output from a
running instance. Instead, we require the caller to exit the API context
before querying the monitor-output. This guarantees that the api-thread
was synchronously taken down and scheduled any outstanding events.

This fixes an issue where a side-channel notifies us of a buildroot
exit, but the api-thread has not yet returned from epoll, and thus might
not have dispatched pending I/O events, yet. If we instead wait for the
thread to exit, we have a synchronous shutdown and know that all
*ordered* kernel events must have been handled.

In particular, imagine a build-root program running (like `echo` in the
test_monitor unittest) which writes data to the stdout-pipe and then
immediately exits. The syscall-order guarantees that the data is written
to the pipe before the SIGCHLD is sent (or wait(2) returns). However, we
retrieve the SIGCHLD from our main-thread usually (p.join() in our test,
and BuildRoot() in our main code), while the pipe-reading is done from
an API thread. Therefore, we might end up handling the SIGCHLD first
(just imagine a single-threaded CPU that schedules the main task before
the thread). To avoid this race, we can simply synchronize with the
api-thread. Since we already have this synchronization as part of the
api-thread takedown, it is as simple as stopping the api-thread before
continuing with operations.

Lastly, if a write operation to a pipe was issued, we are guaranteed
that a SIGCHLD synchronization across processes is ordered correctly.
Furthermore, the python event-loop also guarantees that stopping an
event-loop will necessarily dispatch all outstanding events. A read is
guaranteed to be outstanding in our race-scenario, so the read will be
dispatched. The only possible problem is `_output_ready()` only
dispatching a maximum of 4096 bytes. This might need to be fixed
separately. A comment is left in place.
2020-08-13 14:02:27 +02:00
Christian Kellner
42b20638c0 pipeline: add metadata to the build result
Include metadata, optionally set by modules, in the build result.
2020-08-13 10:50:34 +02:00