Create a new api endpoint called exception, that communicates
exception backtraces separately back to osbuild, as opposed to
dumping them into the normal log. Additionally, add a corresponding
test to check that a call to api.exception correctly sets
API.exception.
Access metadata.api only after `api` has exited the context and
thus the event loop has stopped and all incoming messages, like
the one setting the metadata, have been processed.
See commit 803433fb62 for a lecture
about the internals and all the details involved.
Now that the `org.osbuild.linux` runner does not use `api.setup_stdio`
anymore, the output of the binary run from the BuildRoot must end up
in `BuildRoot.output`. Check for that.
Create a new CompletedBuild object that wraps and is very similar
to the subprocess.CompletedProcess, i.e. it has a process member
but also has shortcuts for returncode. Additionally, the output
of the process is not only forwarded to the monitor, but also
captured and then handed to CompletedBuild, so its output member
will actually contain the full build output. To be compatible
with the previously returned CompletedProcess, `stderr`, `stdout`
members exist on CompletedBuild that also return `output`.
In case that bubblewrap fails to, e.g. because it fails to execute
the runner, it will print an error message to stderr. Currently,
this output is not capture and thus not logged. To fix that, the
`BuildRoot.run` method now takes a monitor object and will stream
stdout/stderr to the log via the monitor.
Simple check for the new server side method, `get-arguments`, and
client side counterpart, `api.arguments`, that compares that using
the later we get the supplied input (arguments) to API.
Change the API endpoint to prevent retrieving monitor-output from a
running instance. Instead, we require the caller to exit the API context
before querying the monitor-output. This guarantees that the api-thread
was synchronously taken down and scheduled any outstanding events.
This fixes an issue where a side-channel notifies us of a buildroot
exit, but the api-thread has not yet returned from epoll, and thus might
not have dispatched pending I/O events, yet. If we instead wait for the
thread to exit, we have a synchronous shutdown and know that all
*ordered* kernel events must have been handled.
In particular, imagine a build-root program running (like `echo` in the
test_monitor unittest) which writes data to the stdout-pipe and then
immediately exits. The syscall-order guarantees that the data is written
to the pipe before the SIGCHLD is sent (or wait(2) returns). However, we
retrieve the SIGCHLD from our main-thread usually (p.join() in our test,
and BuildRoot() in our main code), while the pipe-reading is done from
an API thread. Therefore, we might end up handling the SIGCHLD first
(just imagine a single-threaded CPU that schedules the main task before
the thread). To avoid this race, we can simply synchronize with the
api-thread. Since we already have this synchronization as part of the
api-thread takedown, it is as simple as stopping the api-thread before
continuing with operations.
Lastly, if a write operation to a pipe was issued, we are guaranteed
that a SIGCHLD synchronization across processes is ordered correctly.
Furthermore, the python event-loop also guarantees that stopping an
event-loop will necessarily dispatch all outstanding events. A read is
guaranteed to be outstanding in our race-scenario, so the read will be
dispatched. The only possible problem is `_output_ready()` only
dispatching a maximum of 4096 bytes. This might need to be fixed
separately. A comment is left in place.
If the stage test folder contains a `metadata.json` file, it will
contain a dictionary where the keys are stage ids and the values
are dictionaries containing the metadata to verify. For each of
those the stage will be looked up in the pipeline result of 'b'
and verified that the metadata matches.
This is a crucial pre-condition for the org.osbuild.selinux stage
to work properly, especially that it can set labels that are not
present in the policy on the host. If /sys/fs/selinux is writable,
setfiles will try to verify the labels via /sys/fs/selinux/context
and fail for unknown labels.
Add a new helper script to check if a mount / file-system was
mounted with specific flags. Currently only "ro", "nosuid",
"nodev" and "noexec" flags are supported. This script is in
test/data since it will be used from other tests and is itself
not a test per se.
Change the default of libdir to /usr/lib/osbuild and
remove redundant logic. Additionally, change how the
python package is detected.
Instead of checking if libdir is None, check if
/usr/lib/osbuild is empty - i.e. if the user has specified
a different directory than the default.
640k ought to be enough for anybody!
Err... I mean...
The assembler tests now install only the filesystem and selinux packages and
their dependencies. For this, we don't need the luxury of 2 GiB.
This commit changes the image size to 512 MiB. This has some advantages:
- the tests are faster - I measured the qemu assembler test and the running
time went down from 290s to 260s.
- the tests can be run in environments with smaller disk space
Previously, the osbuild executor had its internal temporary directory that
served as the output directory. However, this approach gives no power to
the caller to control the lifetime of the produced artifacts. When more
images are built using one executor, the results will accumulate in one
place possibly leading to exhaustion of disk space.
This commit removes the executor's internal output directory. The output
directory can now be passed to osbuild.compile, so the caller can control
its lifetime. If no directory is passed in, the compile method will use
its own temporary directory - this is useful in cases when the caller
doesn't care about the built artifacts or the manifest doesn't have any
outputs.
Now that jsoncomm.Socket is using a connection-oriented socket,
the destination in `socket.sendmsg` is ignored and thus can and
should be dropped from the `jsoncomm.Socket.send` method.
Adjust the tests accordingly.
Now that jsoncomm is using a connection oriented protocol, the
`addr` parameter is not needed[*] and can thus be removed from
the `BaseAPI._message` message dispatcher. Adapt all usages
of it, including the tests.
[*] sendmsg ignores the destination parameter for connection
oriented sockets.
Switch to use a connection oriented datagram based protocol, i.e.
`SOCK_SEQPACKET`, instead of `SOCK_DGRAM`. It sill preserves
message boundaries, but since it is connection oriented the client
nor the server do not need to specify the destination addresses
of the peer in sendmsg/recvmesg. Moreover, the host will be able
to send messages to the client, even if the latter is sandboxed
with a separate network namespace. In the `SOCK_DRAM` case the
auto-bound address of the client would not be visible to the host
and thus sending messages would to it would fail.
Adapt the jsoncomm tests as well as `BaseAPI`.
This way the test can benefit from osbuild's internal cache:
The first subtest builds all the stages and runs the assembler
The next subtests can reuse the built stages and just run the assembler
Some data from my machine running the qemu test:
Building the manifest takes about 120 seconds
Running just the assembler on the cache's content takes 30 seconds.
Before this change, the whole manifest was built 3 times:
3 * 120 = 360 seconds
After this change, the whole manifest is built once and the cache
is reused 2 times:
1 * 120 + 2 * 30 = 180 seconds
Let the caller decide which executor instance should be used to build
the manifest. This change allows us to use osbuild's built-in cache
in the following commit.
Rely on the ability of `BaseAPI` to auto-generate socket addresses
when no one was provided. The `BuildRoot` does not rely on the
sockets being created in the `BuildRoot.api` directory anymore and
will instead bind-mount each individual socket address to the well
known location via the `BaseAPI.endpoint` identifier.
Convert all API providers to take the `socket_address` as an
optional keyword argument.
Add support for `util.types.PathLike` paths for socket addresses,
instead of just plain strings. Test it by using pathlib.Path to
create the address in the corresponding test.
Add a new abstract class property to `BaseAPI` called `endpoint`,
meant to be implemented by deriving classes in order to identify
the end point name for the API provider.
Implement the new property in all existing API providers.
Split out the part of `api.API` that is responsible for providing
the server infrastructure for the API; i.e. setting up the server
and the corresponding context manager and asynchronous event
handling. This leaves `API` itself which just the implementation
of the high level protocol and makes the API-server part re-usable.
NB: pylint, for some reason, confuses `API` and `BaseAPI`, like in
`test_monitor`. Annotate that accordingly.
Create small test cases that check the execution of Stages and
Assembler. This ensure that path handling, the sandbox, as well
as basic result reporting works as expected.
Instead of using string interpolation and concatenation to build
file system paths, use `os.path.join` or directly the constructor
for `pathlib.Path`, which can take path segments.
Create a new monitor that records all the invocations of the
monitoring (virtual) functions and use that to check that when
running (i.e. building) a pipeline all of them are executed
the excepted number of times (and with the correct arguments).
Add a basic test that will set up an 'API' endpoint, then spawn a
child process that uses that 'API' endpoint to setup its stdio in
very much the same way as runners do. This is used to verify that
the API itself works properly as well as the new LogMonitor class
by comparing the inputs and outputs.
#471 extends the assembler test suite to also test xfs and btrfs filesystems
in raw and qemu assemblers. However, this change leads to long running times
of this suite.
The running time of these test consist of 3 main steps:
1) Building the build pipeline
2) Building the stages
3) Running the assembler
There are two optimization approaches:
1) Caching
OSBuild supports caching, therefore it's possible to cache results of first
two steps.
2) Minimizing the operating system tree
Assemblers don't care about the image contents. Therefore, it's possible
to create just a small tree which would be used to test the assemblers.
This should lead to speed up in the step 2 (smaller tree should be built
quicker) and in step 3 (big part of assembling is just copying files over
to the image).
This commit implements the second approach. A new test manifest is now added,
which just installs the filesystem package and its dependencies and this tree
is then labeled. This solution was chosen, so that the assemblers get
something that looks as a proper filesystem tree but also can be built pretty
quickly.
Before this change, the test_rawfs method with #471 merged ran for 842 seconds.
After this change, it ran for 391 seconds.