We need to initialize `schema` to `None`, otherwise it will be an access
to an uninitialized variable when looking up invalid schemata:
[...]
File "[...]/osbuild/meta.py", line 583, in get_schema
schema = Schema(schema, name or klass)
UnboundLocalError: local variable 'schema' referenced before assignment
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
The schema input of Schema.__init__ is a python-native representation
of a JSON object, so it can be any kind of dictionary. Furthermore, it
is optional.
Fix the type to be Optional[Dict].
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
Make sure all --help output is consistent. In this particular case,
each line should consistently start with a lower-case character and
avoid a leading `the`.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
This adds a `osbuild --version` command that prints the current osbuild
version in use. Allows users to confirm their osbuild is up to date
enough to use newer features.
Introduce a new class to manage inputs, `InputManger` and move the
code to map inputs from the `Input` here. The main insight of why
the logic should be place here is that certain information is needed
to map inputs, independently of specific type: the path to the input
directory, `root`, the store API, `storeapi` and the service manager
instance to start the actual service. Instead of passing all this
information again and again to the `Input` class, we now have a
specialized (service) manager class for inputs that has all the
needed information all the time.
Check for existing checkpoint in `Pipeline.build_stages` by trying to
get the object, instead of just checking for its existence. Later, if
no checkpoints were found, i.e. `tree` is `None`, create a new object.
This avoids mixing of new object creation and object access.
Instead of iterating over the stages via indices, iterate over the
stages directly. To be able to do so, collect the stages that need
to be built in a deque and then drain it from the other end.
Also invoke `monitor.finish` when the pipeline failed to built.
There is no need to not invoke it in that case. This also will
allow us to print some information in the monitor in tha case.
Since neither a build tree, nor the actual tree is returned from
`build_stages` the short circuit code that checks if the tree is
already present in the store, can be moved before the build tree
retrival. As a result, the short-circuit check in `Pipeline.run`
is now redundant. It was there to make sure that if we have the
tree associated with a pipeline, its build pipeline would also
not be needed. With the short-circuit now happening before the
access of the build pipeline in `build_stages` this is ensured.
In the previous data model the build pipelines were nested inside
the pipeline and thus we would recurse in `build_stages`. The
tree that was built was returned and potentially became the build
tree for the pipeline that invoked `build_stages`. In the new
model of a direct acyclic graph of pipelines the build tree can
be any previously built pipeline and we just get it via the store,
which now keeps track of all previously built pipelines even if
there are not committed to it. Thus there is no need to return
the trees from `build_stages` anymore.
Adjust the short code that does the short circuit check to use
`ObjectStore.contains` instead of `ObjectStore.get` since we
do not need to object anymore.
The pipeline data model used to have an assembler optionally
associated with the pipeline; therefore we had to return the
build tree used to to build the stages since the same build
tree also needed to be used from the assembler. In the "new"
model (first introduced in version 27), the assembler got
replaced by another "normal" pipeline. Since then, there is
no need to return the build tree anymore. Remove it.
Instead of serializing the `BuildResult` to a dict in `build_stages`,
we keep the object and then only serialize it in the corresponding
formatting code. This doubles down on the separation between the
internal data structures and the external representation of them. It
was partially already done in the v2 format which hand-picked which
elements of the BuildResult it would return for each stage.
Remove the stage options from the `BuildResult` object. They were
only serialized in the case of version 1 and not actually used by
Composer for anything. Use of v1 manifests should very limted now
anyway.
Show the stage name (if one is set) when failing the stage in the
validator. This closes#1007, example output:
```
€ python3 -m osbuild supakeen-os.json
supakeen-os.json has errors:
pipelines[0].stages[0]
could not find schema information for 'org.osbuild.rpmb'
.pipelines[0].stages[0].inputs.packages:
could not find schema information for 'org.osbuild.filesz'
```
Before, the download method was defined in the inherited class of each
program. With the same kind of workflow redefined every time. This
contribution aims at making the workflow more clear and to generalize
what can be in the SourceService class.
The download worklow is as follow:
Setup -> Filter -> Prepare -> Download
The setup mainly step sets up caches. Where the download data will be
stored in the end.
The filter step is used to discard some of the items to download based
on some criterion. By default, it is used to verify if an item is
already in the cache using the item's checksum.
The Prepare step goes from each element and let the overloading step the
ability to alter each item before downloading it. This is used mainly
for the curl command which for rhel must generate the subscriptions.
Then the download step will call fetch_one for each item. Here the
download can be performed sequentially or in parallel depending on the
number of workers selected.
Introduce a new class member `content_type` that specifies what type of
items the source will store in the cache. Use that to generalize the
setup step, which is shared across all sources.
Silence pylint warning W0201 (attribute-defined-outside-init) in
`set_status`; it sets dynamic attributes on the LoopInfo class
which pylint does not recognize.
Add a new class `SubIdsDB` as a database of subordinate Ids, like the
ones in `/etc/subuid` and `/etc/subgid`. Methods to read and write
data from these two files are provided.
Add corresponding unit tests.
Drop `CAP_MAC_ADMIN` from the default capabilities which is needed
to write and read(!) unknown SELinux labels. Adjust the stages
that need to read or write SELinux labels accordingly.
Drop CAP_{NET_ADMIN,SYS_PTRACE} from the default capabilities which
are only needed to run bwrap from inside a stage which is done by
the `ostree.commit` and `ostree.preptree` stages, so retain them
directly there.
Add new stage metadata `CAPABILITIES` where stages can request
additional capabilities that are not in the default set.
Currently this is not used by any stage since the default set
contains the sum of all needed capabilities.
Drop all capabilities that are not required by any of the stages.
N.B. at least one stage (`ostree.preptree`) itself executes bwrap
itself, which in turn needs `CAP_SYS_PTRACE` and `CAP_NET_ADMIN`.
Add a new member variable `caps` that if not `None` indicates the
capabilities to retain, i.e. all other capabilities not specified
will be dropped via `bubblewrap` (`--cap-drop`).
Add corresponding tests.
This extends the possible ways of passing references to inputs. The
current ways possible are:
1) "plain references", an array of strings:
["ref1", "ref2", ...]
2) "object references", a mapping of keys to objects:
{"ref1": { <options> }, "ref2": { <options> }, ...}
This patch adds a new way:
3) "array of object references":
[{"id": "ref1", "options": { ... }}, {"id": ... }, ]
While osbuild promises to preserves the order for "object references"
not all JSON serialization libraries preserve the order since the
JSON specification does leave this up to the implementation.
The new "array of object references" thus allows for specifying the
references together with reference specific options and this in a
specific order.
Additionally this paves the way for specifying the same input twice,
e.g. in the case of the `org.osbuild.files` input where a pipeline
could then be specified twice with different files. This needs core
rework though, since internally we use dictionaries right now.
This is a left-over from the time when `systemd-nspawn` was used,
which only retained a limited set of capabilities which did not
include `CAP_MAC_ADMIN`[1]. Bubblewrap, on the other hand, retains
all currently capabilities if the process is run as root[2].
[1] see e.g. src/nspawn/nspawn.c#L147 of commit c52950c
[2] see commit abc56644566a6095bb72a5bf70fcee7dd90e9447
This is basically a re-implementation of `setfilecon(3)` minus the
translation of human readable context to raw context. Add test for
the new function.
The udev inhibitor rules are checking for `device-$major:$minor`
but we created them with `f"device-{major}-{minor}"`. So they
did indeed not actually work. Fix that.
Add support for emitting signals to host.Service which can be used to
transmit data back to the client during an ongoing method call. This
provides the possibility for the services to send information to their
client counterpart while running. The signal can take file descriptors
as extra parameters to send data on separate files.
This adds a stage called org.osbuild.skopeo that installs docker and
oci archive files into the container storage of the tree being
constructed.
The source can either be a file from another pipeline, for example one
created with the existing org.osbuild.oci-archive stage, or it can
be using the new org.osbuild.skopeo source and org.osbuild.containers
input, which will download an image from a registry and install that.
There is an optional option in the install stage that lets you
configure a custom storage location, which allows the use of the
additionalimagestores option in the container storage.conf
to use a read-only image stores (instead of /var/lib/container).
Note: skopeo fails to start if /etc/containers/policy.json is
not available, so we bind mount it from the build tree to the
buildroot if available.
The client side does meta.get("source-epoch", default), but for
this to work we need to have the key unset if not specified,
but currently we set it to None.
Also, make sure the check for "not None" is explicit, because
we do consider a value of `0` to be a valid source-epoch.
ioctl contants are platform dependent. It should be the same on
x86, aarch64 and s390x but it is indeed different on ppc64le.
This lead to the call to `ioctl_blockdev_flushbuf` actually
raising an exception of `OSError: [Errno 22] Invalid argument`.
The constant was calculated with a little python snippet that
in theory could also go directly into the code, but for now
the simpler condition in this patch is enough.
The snippet is a port of the defines from the Linux kernel,
specifically /usr/include/asm-generic/ioctl.h.
class IOConstants:
"""IO Commands for Linux"""
if platform.machine() == "ppc64le":
NRBITS = 8
TYPEBITS = 8
SIZEBITS = 13
DIR_NONE = 1
else:
NRBITS = 8
TYPEBITS = 8
SIZEBITS = 14
DIR_NONE = 0
NRSHIFT = 0
TYPESHIFT = NRSHIFT+NRBITS
SIZESHIFT = TYPESHIFT+TYPEBITS
DIRSHIFT = SIZESHIFT+SIZEBITS
@classmethod
def make(cls, directory, iotype, nr, size):
return ((directory << cls.DIRSHIFT) |
(iotype << cls.TYPESHIFT) |
(nr << cls.NRSHIFT) |
(size << cls.SIZESHIFT))
@classmethod
def make_dir_none(cls, iotype, nr):
return cls.make(cls.DIR_NONE, iotype, nr, 0)
This is used to get the value for `BLKFLSBUF` taken from the
include `/usr/include/linux/fs.h`:
#define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */
The value is then obtained via:
print("0x%x" % IOConstants.make_dir_none(0x12,97))
0x20001261
The treesum of a filesystem tree is the content hash of all its
files, its directory structure and file metadata.
By storing trees by their treesum we avoid storing duplicates of
identical trees, at the cost of computing the hashes for every
commit to the store.
This has limited benefit as the likelihood of two trees being
identical is slim, in particular when we already have the ability
to cache based on pipeline/stage ID (i.e., we can avoid rebuilding
trees if the pipelines that built them were the same).
Drop the concept of a treesum entirely, even though I very much
liked the idea in theory...
Signed-off-by: Tom Gundersen <teg@jklm.no>
Set the container environment variable to indicate to programs
inside the build root that they are indeed running inside a
container (see also https://systemd.io/CONTAINER_INTERFACE/).
Create a well-defined environment with and use that for the build
root. It is not desirable to have the host's environment leak
into the container. Add a test to ensure that this works.
NB: This was probably an oversight when we switched from systemd-
nspawn to bubblewrap.
Introduce two new command line arguments, which can be used to
specify which monitor class to use (`--monitor`) and what file
descriptor to use for monitoring (`--monitor-fd`). The latter
defaults to standard out. The monitor class, if not specified,
is depended on the `--json` argument.