Availability of new incoming data is indicated to clients, i.e.
deriving classes, by invoking the `_dispatch` method, with the
`jsoncomm.Socket` as argument. All clients then need to call
`Socket.recv` to actually receive the data.
Provide a new high-level message dispatcher class by providing
a standard implementation of `_dispatch` in `BaseAPI` that calls
`socket.revc` and then invokes the new high level `_message`
method, with the data (`msg`), file descriptors (`fds`, if passed)
the socket (`sock`) and the peer address `addr`.
Rely on the ability of `BaseAPI` to auto-generate socket addresses
when no one was provided. The `BuildRoot` does not rely on the
sockets being created in the `BuildRoot.api` directory anymore and
will instead bind-mount each individual socket address to the well
known location via the `BaseAPI.endpoint` identifier.
Convert all API providers to take the `socket_address` as an
optional keyword argument.
Make the `socket_address` argument to `BaseAPI` optional, i.e.
allow it to be `None`. In that case, create a temporary directory
and place the socket, named with the value of `endpoint`, in that
directory. On context exit, the directory is cleaned up. As long
as the jsoncomm.Socket server is running, `socket_address` will
always be valid and indicating the address of the server.
Add support for `util.types.PathLike` paths for socket addresses,
instead of just plain strings. Test it by using pathlib.Path to
create the address in the corresponding test.
Add a simple new module meant to define types that are commonly
used throughout the code-base. For starters, define `PathLike`
meant to represent file system paths, i.e. strings, bytes, or
anything that provides the `os.PathLike` protocol, i.e. that
can be used with `os.fspath`.
The current way API end points, i.e. sockets for API providers,
are provided to the sandbox is via a temporary directory that
is created by `BuildRoot` which later gets bind-mounted to a well
known path, i.e. /run/osbuild/api inside the sandbox. API providers
are expected to create their socket in that temporary directory.
Now that `BuildRoot` has a `regsiter_api` method and each API has
an `endpoint` property, the socket of each API provider, no matter
where it is located, will get bind-mounted individually inside
the sandbox at /run/osbuild/api using the `endpoint` identifier.
For backwards compatibility reasons the temporary api directory
will still be created by `BuildRoot`, but it is no longer bind
mounted inside the container. This paves the way to remove that
directory completely once all API providers are converted to not
use that directory anymore.
Add a new abstract class property to `BaseAPI` called `endpoint`,
meant to be implemented by deriving classes in order to identify
the end point name for the API provider.
Implement the new property in all existing API providers.
Register all API end point providers with the `BuildRoot` via the
new `BuildRoot.register_api` call. The context management is now
done via the `BuildRoot` itself.
Add a new `register_api` method that is meant to be used by clients
to register API end point providers, i.e. instances of `api.BaseAPI`.
When the context of the `BuildRoot` is enter, all providers are
activated, i.e. their context is entered. In case `regsiter_api` is
called with an already active context, the provider will immediately
be activated. In both cases their lifetime is thus bound to the
context of the `BuildRoot`. This also means that they are cleaned-up
with the `BuildRoot`, i.e. when its context is exited.
The `api.API` provides a `setup-stdio` method, that is meant to
be used by clients to replace their stdio with the supplied fds
from the server. Provide a canonical `api.setup_stdio` method
that will do exactly that.
Split out the part of `api.API` that is responsible for providing
the server infrastructure for the API; i.e. setting up the server
and the corresponding context manager and asynchronous event
handling. This leaves `API` itself which just the implementation
of the high level protocol and makes the API-server part re-usable.
NB: pylint, for some reason, confuses `API` and `BaseAPI`, like in
`test_monitor`. Annotate that accordingly.
Use `os.path.join` to build the path for the source cache, instead
of string interpolation. This makes it possible to use other Path
representations, like `pathlib.Path`, transparently.
Currently `objectstore.Object.{read, write}` directly return
strings but in the future they might return an Object that is
an `os.PathLike`, i.e. has a `__fspath__` method, instead.
Prepare for that by ensuring all `tree`s are converted to their
file system representation via `os.fspath` when needed, e.g.
when creating the bind-mount arguments for the `BuildRoot`.
Instead of using string interpolation, use `os.path.join` in all
places. This should allow the use of `os.PathLike` objects as well
as bytes (i.e. `objectstore.PathLike` types) to be used and is
generally cleaner.
Support `os.PathLike` arguments in `Object.export` by explicitly
converting the supplied argument via `os.fspath`. Additionally,
declare the support for those via the Python typing system with
a new Union type for general `PathLike` type, i.e. all valid
types for `os.fspath`, which are `str`, `bytes`, `os.PathLike`.
Instead of having a duplication of the invocation of `cp`, once in
`init`, once in `export`, re-use the latter in the former: the to
be copied object is accessed in the normal way via the store, and
then "exported" to the new location. This gets rid of the call to
resolve_ref as a nice side effect, which means less poking into
the internals of the store.
This swaps the `systemd-nspawn` implementation for `bubblewrap` to
contain sub-processes. It also adjusts the `BuildRoot` implementation
to reduce the number of mounts required to keep locally.
This has the following advantages:
* We know exactly how the build-root looks like. Only the bits and
pieces we select will end up in the build-root. We can let RPM
authors know what environment their post-install scripts need to
run in, and we can reliably test this.
* We no longer need any D-Bus access or access to other PID1
facilities. Bubblewrap allows us to execute from any environment,
including containers and sandboxes.
* Bubblewrap setup is significantly faster than nspawn. This is a
minor point though, since nspawn is still fast enough compared to
the operations we perform in the container.
* Bubblewrap does not require root.
At the same time, we have a bunch of downsides which might increase the
workload in the future:
* We now control the build-root, which also means we have to make sure
it works on all our supported architectures, all quirks are
included, and all required resources are accessible from within the
build-root.
The good thing here is that we have lots of previous-art we can
follow, and all the other ones just play whack-a-mole, so we can
join that fun.
The `bubblewrap` project is used by podman and flatpak, it is packaged
for all major distributions, and looks like a stable dependency.
Introduce the concept of pipeline monitoring: A new monitor class is
passed to the pipeline.run() function. The main idea is to separate
the monitoring from the code that builds pipeline. Through the build
process various methods will be called on that object, representing
the different steps and their targets during the build process. This
can be used to fully stream the output of the various stages or just
indicate the start and finish of the individual stages.
This replaces the 'interactive' argument throughout the pipeline
code. The old interactive behavior is replicated via the new
`LogMonitor` class that logs the beginning of stages/assembler,
but also streams all the output of them to stdout.
The non-interactive behavior of not reporting anything is done by
using the `NullMonitor` class, which in turn outputs nothing.
Instead of using plain python strings and appending to them, use
'io.StringIO' which is a data structure meant to be used for i/o.
This should increase performance compared to plain strings.
Instead of either using a text file, in non-interactive mode, or
directly stdout otherwise, create a pipe and always use that as
for stdout/stderr when preparing the output for 'setup_stdio'.
This streamlines the two cases (interactive, non-interactive) and
as a result 'API.output' will always contain the full output data.
Close the event loop when the context is exited, which will clear
the internal queues and shut down the executor of the event loop.
Not doing this will create a warning when the object is garbage
collected.
Close the event loop when the context is exited, which will clear
the internal queues and shut down the executor of the event loop.
Not doing this will create a warning when the object is garbage
collected.
In three places we have more than 7 instances attributes, but less
then 10; instead of disabling the warning for all these cases,
increase the limit to a reasonable size of 10 and re-enable the
warnings in all the places.
If the user does not specify an output directory or checkpoints
to osbuild, exit successfully without building.
Previously, if a user did not include an output directory or
checkpoints, it would build the manifest and throw out the result.
Returning early will be clearer to the user and avoid wasting work.
Instead of having a another indirection via `main_cli`, directly
use `osbuild_cli` in as main function in `__main__.py`. Also use
that in as the entry point for the generated `osbuild` executable.
Change `osbuild_cli` to be self-contained, i.e. it directly uses
`sys.argv` and `sys.exit`.
The way secrets work has been changed via commit 372b117: instead
of passing them in via the command line, the information how to
obtain secrets are encoded along the sources themselves.
The only stage that still has support for the old style way is the
deprecated org.osbuild.dnf stage, which might be removed in the
near future.
When applying labels inside the container that are unknown to the
host, the process needs to have the CAP_MAC_ADMIN capability in order
to do so, otherwise the kernel will prevent setting those unknown
labels. See the previous commit for more details.
In python 3.6 the value of `__origin__` for typing.List[str] is
typing.List. This then changed to the actual `list` type in later
versions. Accept both versions.
Add the initramfs-args Treefile option that can be used to pass
arguments to drauct via rpm-ostree. NB: the ostree module will
always be automatically be included by rpm-ostree.
Add support for querying information about sources: add the mapping
from name to directory and accept "Source" as a module name. Adapt
the ModuleInfo schema property to handle the different styles for
stage-like schemata as well as sources now.
For all currently supported modules, i.e. stages and assemblers,
convert the STAGE_DESC and STAGE_INFO into a proper doc-string.
Rename the STAGE_OPTS into SCHEMA.
Refactor meta.ModuleInfo loading accordingly.
The script to be used for the conversion is:
--- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< --- 8< ---
import os
import sys
import osbuild
import osbuild.meta
from osbuild.meta import ModuleInfo
def find_line(lines, start):
for i, l in enumerate(lines):
if l.startswith(start):
return i
return None
def del_block(lines, prefix):
start = find_line(lines, prefix)
end = find_line(lines[start:], '"""')
print(start, end)
del lines[start:start+end+1]
def main():
index = osbuild.meta.Index(os.curdir)
modules = []
for klass in ("Stage", "Assembler"):
mods = index.list_modules_for_class(klass)
modules += [(klass, module) for module in mods]
for m in modules:
print(m)
klass, name = m
info = ModuleInfo.load(os.curdir, klass, name)
module_path = ModuleInfo.module_class_to_directory(klass)
path = os.path.join(os.curdir, module_path, name)
with open(path, "r") as f:
data = list(f.readlines())
i = find_line(data, "STAGE_DESC")
print(i)
del data[i]
del_block(data, "STAGE_INFO")
i = find_line(data, "STAGE_OPTS")
data[i] = 'SCHEMA = """\n'
docstr = '"""\n' + info.desc + "\n" + info.info + '"""\n'
doclst = docstr.split("\n")
doclst = [l + "\n" for l in doclst]
data = [data[0]] + doclst + data[1:]
with open(path, "w") as f:
f.writelines(data)
if __name__ == "__main__":
main()
Make the mapping of module class to the corresponding directory
a method of the ModuleInfo class. This is so it can be re-used
by others in the future.
The are converging on a nomenclature where the sum of Stages,
Assemblers, Sources (and future entities like those) together
are called 'Modules'.
Thus rename StageInfo to ModuleInfo and the corresponding
variables and methods.
Change the assembler-commit to be conditional on checkpoints, just like
we already do for stages. This means, assembler output is not
automatically committed, but only if you requested so via a checkpoint.
With this in place we can start sharing caches in osbuild-composer. The
only thing in the cache will be sources as well as checkpointed stages.
We can start checkpointing known pipelines and thus make use of the
cache. Furthermore, we can cache sources, as long as we do not fetch an
unbound set of RPMs. However, our RPM set is currently static, so this
should not be an issue. Nevertheless, it is up to Composer to decide
when to enable the cache.
We no longer need the `tree_id` in the osbuild output. All callers have
been converted to use other means. Drop the ID from the output and
avoid exposing our internals.
Now that no caller requires the "output_id" anymore, drop it from our
results-dictionary. Instead, pass the output-directory through and copy
outputs where we produce / fetch them.
This still uses `objectstore.resolve_ref()`, since we do not have the
outputs pinned at the places where we want to copy. This needs a little
bit more rework, but we might just delay that until we have the cache
rework landed.
This already simplifies the output-directory path and drops the slight
hack which checked very late for produced outputs.
Note that we must be careful not to copy things too early, because we
do not want remnants in the output-directory if we return failure.
Hence, keep the copy-operation close to the commit-operation on the
store.
All callsites of `Pipeline.assemble()` already check early whether the
output-object exists in the store and then return it. Checking again in
`assemble()` will never catch anything (unless another stage would
happen to produce the same ID as the assembler as a side-effect).
It does seem useful to keep the shortcuts in `assemble()`, so other
callers would get the shortcut as well. However, this does not really
work well right now, since you want to skip the stage-compilation as
well, and `assemble()` is really just the last step of the job. Hence,
it really is the job of the pipeline-executor to check early.
With that in mind, lets drop this fast-path which has no effect in the
current setup.
Using `[]` as default value for arguments makes `pylint` complain. The
reason is that it creates an array statically at the time the function
is parsed, rather than dynamically on invocation of the function. This
means, when you append to this array, you change the global instance and
every further invocation of that function works on this modified array.
While our use-cases are safe, this is indeed a common pitfall. Lets
avoid using this and resort to `None` instead.
This silences a lot of warnings from pylint about "dangerous use of []".