osbuild_cli() sometimes returned an exit code, but at the end called
sys.exit() directly. The idea was probably to always return the code
with which the executable should exit.
Make this consistent and call sys.exit() in __main__.py, with the value
returned by osbuild_cli().
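A minimal sketch of this convention, assuming a simplified `osbuild_cli()` that receives the argument list as a parameter (the real signature and option handling may differ; the usage text is illustrative):

```python
import sys


def osbuild_cli(argv):
    """Hypothetical stand-in: report the exit code by returning it."""
    if len(argv) < 2:
        print("usage: osbuild FILE", file=sys.stderr)
        return 2
    # ... load the input and run the build here ...
    return 0


def main():
    # What __main__.py would do: the single place that calls sys.exit().
    sys.exit(osbuild_cli(sys.argv))
```

Keeping `sys.exit()` out of `osbuild_cli()` means callers and tests can inspect the return value without catching `SystemExit`.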
Instead of having another indirection via `main_cli`, directly
use `osbuild_cli` as the main function in `__main__.py`. Also use
it as the entry point for the generated `osbuild` executable.
Change `osbuild_cli` to be self-contained, i.e. it directly uses
`sys.argv` and `sys.exit`.
This extracts the CLI entrypoint into `main_cli.py` and prepares the
codebase for the introduction of additional entrypoints. This should
not contain any functional changes.
The idea behind this is to add `main_api.py` (and maybe more in the
future), which will be similar to `main_cli.py` but contain the
`osbuild-api` entrypoint. This will make all entrypoints nicely symmetric
and the only difference will be `setup.py` selecting the right
entrypoint for each executable, as well as `__main__.py` selecting the
entrypoint for the module itself (which we will keep to the CLI for
compatibility).
Add a new output-directory argument which specifies where to store
result objects. For now, this is purely optional and simply copies from
the old `output_id` into the specified directory. This allows a
backwards compatible transition towards removing any external access to
the osbuild cache.
Note that this still has lots of room for improvement:
* We only support assembler-output for now, but we could also easily
support entire trees as output, in case no assembler was selected.
Alternatively, we could introduce a "copy" assembler, that just
outputs the input tree.
* This parameter is optional, but should really be mandatory. There
is little reason to have the default behavior just dropping any
generated content. This would be a breaking change, though.
* We could move data out of a temporary object-store entry, rather
than copy it. But again, for backwards-compatibility, we leave the
latest store-object intact and do not move things out of it.
* We could now transition towards never committing anything to the
store, not even output IDs, unless explicitly checkpointed.
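The copy step described above could look roughly like the following sketch. The function name and both paths are hypothetical, and `dirs_exist_ok` requires Python 3.8 or newer:

```python
import os
import shutil


def export_output(store_object_dir, output_directory):
    """Copy a result object into output_directory (illustrative helper).

    The data is copied rather than moved, so the object in the store
    stays intact for callers that still read it directly.
    """
    os.makedirs(output_directory, exist_ok=True)
    shutil.copytree(store_object_dir, output_directory,
                    symlinks=True, dirs_exist_ok=True)
```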
osbuild can now take only manifests as its input (the legacy input format
was dropped in e48c2f1). This commit changes all remaining occurrences of
"pipeline" to "manifest" when describing the osbuild input.
This drops support for passing in non-manifest style pipelines
directly. It used to be that we directly passed in the pipeline
description, but it got changed to a proper manifest format in:
commit e48c2f178c
Author: Tom Gundersen <teg@jklm.no>
Date: Thu Feb 13 17:44:54 2020 +0100
osbuild: allow the sources to be passed in on stdin
With 2 releases in between, we are now far enough to drop the old
format. All code has been converted and our API guarantee is not in
place yet, so let's just drop the legacy code and fully commit to the
manifest.
Fixes #265.
When marking stages for checkpointing, let us make use of the local set
data structure we already allocate, rather than iterating over it
linearly.
Apart from the negligible performance improvement, it makes the code
quite a lot simpler.
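As an illustration of the set-based approach, assuming stages are represented only by their ids (the function and variable names are hypothetical):

```python
def mark_checkpoints(stage_ids, checkpoints):
    """Return (marked, missing) for the requested checkpoint ids.

    stage_ids:   ordered ids of the stages in the pipeline
    checkpoints: ids requested by the caller
    """
    points = set(checkpoints)
    # A set membership test replaces a linear scan per stage.
    marked = [sid for sid in stage_ids if sid in points]
    # Anything left over was requested but does not exist.
    missing = points - set(marked)
    return marked, missing
```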
We generally surround function definitions with newlines. Make sure
this is also true for local function definitions.
Signed-off-by: David Rheinsberg <david.rheinsberg@gmail.com>
This modifies the help-strings for parameters in `osbuild --help`.
Rather than using the identifier to describe its purpose, make it
describe its type. That is, this changes:
--sources=SOURCES => --sources=FILE
The option-name should already describe the purpose, so let's use the
argument-name for the type. This also improves on the stuttering when
reading the output.
We already do that for options that take directories as arguments. For
some reason, we did not do that for options that take file-paths.
It is arguable whether this should be `PATH` or `FILE`. The latter has
the advantage that it makes clear that it is not a directory. It should
be obvious that `FILE` allows all kinds of paths.
Lastly, this does not update the positional arguments (in our case just
`PIPELINE`), since I did not conclude on the best way to make it
self-documenting. `PIPELINE-FILE` sounds convoluted.
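With `argparse`, the displayed argument-name is controlled by `metavar`. A small sketch (the help text here is illustrative, not the exact string used by osbuild):

```python
import argparse

parser = argparse.ArgumentParser(prog="osbuild")
# Without metavar, argparse would derive the placeholder from the option
# name and print it in upper case; metavar="FILE" states the type instead.
parser.add_argument("--sources", metavar="FILE", type=str,
                    help="json file with the source configuration")

help_text = parser.format_help()
```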
Currently stdin is taken to be the pipeline to be built. This allows
it to instead be a map containing the sources and the pipeline.
We would imagine passing around the sources and pipeline together, so
this just makes the behavior of osbuild more closely match the intended
use and semantics of the sources configuration.
This keeps backwards compatibility for now, but that may be dropped as
soon as osbuild-composer no longer relies on the old behavior.
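One way to accept both input shapes is sketched below. It assumes the new map uses the keys `pipeline` and `sources`, and that anything else is a bare legacy pipeline; the function name is hypothetical:

```python
import json


def split_manifest(data):
    """Return (pipeline, sources) from the parsed JSON input."""
    if "pipeline" in data or "sources" in data:
        # New style: a map carrying both parts.
        return data.get("pipeline", {}), data.get("sources", {})
    # Old style: the input is the pipeline itself, without sources.
    return data, {}


new_style = json.loads('{"pipeline": {"stages": []}, "sources": {"org.osbuild.dnf": {}}}')
old_style = json.loads('{"stages": []}')
```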
Disable too-many-{branches,statements} pylint warnings in __main__.py.
These do not seem helpful, but could be reenabled if we drop some
options in the future.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Make the sources options a static property of the pipeline, in
particular of each stage, rather than being passed in on `run()`.
This more closely matches the intended semantics of sources and
pipeline having similar lifetimes and being fairly coupled together.
The difference between the pipeline and the sources is that the
sources do not contribute to identifying the pipeline (they are not
part of the hash for the pipeline id), and they could be swapped
out without changing the output image (as long as they are valid).
However, a pipeline without a sources object would not be useful,
and typically the pipeline and the sources are generated, passed
around and used together.
This is different from the build environment and the secrets object,
which both are specific to either the host or the caller, unlike
the pipeline which should be universal.
This changes the `load()` function to take a `manifest`, which is
a map containing both the pipeline and the sources.
Note that the semantics of the build-env parameter remains unchanged:
It shares the sources with the rest of the pipeline. We may want to
reconsider this in future commits, as the build-env is specific to
the host, whereas the regular pipeline is not.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Add a new `--checkpoint` option, which can be provided multiple
times, that indicates after which stages the current state of
the tree should be committed to the object store; the tree id
will be the treesum of the tree at that point and a reference
is created with the id of the stage at that point.
The argument to `--checkpoint` is the id of the stage. If not
all the given checkpoints can be found the execution will be
aborted.
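An option that may be given several times maps onto `argparse`'s `append` action. A sketch, with an illustrative metavar and help text:

```python
import argparse

parser = argparse.ArgumentParser(prog="osbuild")
# action="append" collects every occurrence into a list; the attribute
# stays None when the option is not given at all.
parser.add_argument("--checkpoint", metavar="ID", action="append",
                    type=str, default=None,
                    help="stage to commit to the object store")

args = parser.parse_args(["--checkpoint", "abc", "--checkpoint", "def"])
```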
Add a new command line option `--secrets`, which accepts a JSON file
that is structured similarly to a source file. It should contain data
that is necessary to fetch content, but shouldn't appear in any logs.
Pipelines encode which source content they need in the form of
repository metadata checksums (or rpm checksums). In addition, they
encode where they fetch that source content from in the form of URLs.
This is overly specific and doesn't have to be in the pipeline's hash:
the checksum is enough to specify an image.
In practice, this precluded using alternative ways of getting at source
packages, such as local mirrors, which could speed up development.
Introduce a new osbuild API: sources. With it, a stage can query for a
way to fetch source content based on checksums.
The first such source is `org.osbuild.dnf`, which returns repository
configuration for a metadata checksum. Note that the dnf stage continues
to verify that the content it received matches the checksum it expects.
Sources are implemented as programs, living in a `sources` directory.
They are run on the host (i.e., uncontained) right now. Each source gets
passed options, which are taken from a new command line argument to
osbuild, and an array of checksums for which to return content.
This API is only available to stages right now.
A pipeline run only returned logs in the `StageFailed` and
`AssemblerFailed` exceptions. Remove those and always return structured
data instead.
The result only contains data for stages that actually ran (i.e., didn't come
from the cache). This is similar to the output in interactive mode.
Also change osbuildtest to be able to deal with output that is larger
than the pipe buffer by using subprocess.communicate().
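The reason `communicate()` helps: it drains stdout and stderr while waiting, so a child that writes more than the pipe can hold does not block forever. A self-contained sketch, using a throwaway Python child in place of osbuild:

```python
import subprocess
import sys

# A child that writes more than a typical pipe buffer (64 KiB by default
# on Linux).
child = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 200000)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() reads both pipes until EOF and then waits for the child.
# Calling wait() first and reading afterwards could deadlock here.
stdout, stderr = proc.communicate()
```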
We've been using a generic `osbuild-run`, which sets up the build
environment (and works around bugs) for all build roots. It is already
getting unwieldy, because it tries to detect the OS for some things it
configures. It's also about to cause problems for RHEL, which doesn't
currently support a python3 shebang without having /etc around.
This patch changes the `build` key in a pipeline to not be a pipeline
itself, but an object with `runner` and `pipeline` keys. `pipeline` is
the build pipeline, as before. `runner` is the name of the runner to
use. Runners are programs in the `runners` subdirectory.
Three runners are included in this patch. They're copies of osbuild-run
for now (except some additions for rhel82). The idea is that each of
them only contains the minimal setup code necessary for an OS, and that
we can review what's needed when updating a build root.
Also modify the `--build-pipeline` command line switch to accept such a
build object (instead of a pipeline) and rename it accordingly, to
`--build-env`.
Correspondingly, `OSBUILD_TEST_BUILD_PIPELINE` → `OSBUILD_TEST_BUILD_ENV`.
Treat outputs like we treat trees: store them in the object store. This
simplifies using osbuild and allows returning a cached version if one is
available.
This makes the `--output` parameter redundant. Remove it.
`osbuild --json [ARGS]` will suppress the normal output and print its
result as JSON. For now, it only does this when it returns 0. Otherwise,
it prints the error from the latest stage.
This is useful for other tools to call it and get machine-readable
output.
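A rough sketch of such a switch, assuming the result is a plain dictionary (the keys and the `report()` helper are hypothetical):

```python
import argparse
import json


def report(result, as_json):
    """Return the text to print for a build result."""
    if as_json:
        # Machine-readable: one JSON document, nothing else.
        return json.dumps(result)
    return "\n".join(f"{key}: {value}" for key, value in result.items())


parser = argparse.ArgumentParser(prog="osbuild")
parser.add_argument("--json", action="store_true",
                    help="output results in JSON format")

args = parser.parse_args(["--json"])
text = report({"tree_id": "abc", "output_id": "def"}, args.json)
```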
The best practice for creating a pipeline should be to include at least
one level of build-pipelines. This makes sure that the tools used to
generate the target image are well-defined.
In principle one could add several layers, though in practice, one would
hope that the environment used to build the buildroot does not affect the
final image (and as we cannot recurse indefinitely anyway, we fall back
to simply using the host system in this case).
This only makes sense if the contents of the host system truly do not
affect the generated image, and as such we do not include any information
about the host when computing the hash that identifies a pipeline.
In fact, any image could be used in its place, as long as the required
tools are present. This commit takes advantage of that fact. Rather than
run a pipeline with the host as the build root, take a second pipeline
to generate the buildroot, but do not include this when computing the
pipeline id (so it is different from simply editing the original JSON).
This is necessary so we can use the same pipelines on significantly
different host systems (run with different --build-pipeline arguments).
In particular, it allows our test pipelines that generate f30 images
to be run unmodified on Travis (which runs Ubuntu).
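To illustrate why this differs from editing the JSON, here is a sketch that assumes the pipeline id is a hash over the pipeline description. The hashing scheme and stage names are illustrative, not necessarily what osbuild does:

```python
import hashlib
import json


def pipeline_id(pipeline):
    """Illustrative id: hash of the canonical JSON description."""
    canonical = json.dumps(pipeline, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


target = {"stages": [{"name": "org.osbuild.noop"}]}
build = {"stages": [{"name": "org.osbuild.dnf"}]}

# Passing the build pipeline separately leaves the target description,
# and therefore its id, untouched.
id_separate = pipeline_id(target)

# Embedding the build pipeline in the JSON changes what gets hashed.
id_embedded = pipeline_id(dict(target, build=build))
```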
Signed-off-by: Tom Gundersen <teg@jklm.no>
This also changes the structure of the object store, though the
basic idea is the same.
The object store contains a directory of objects, which are content
addressable filesystem trees. Currently we only ever use their
content-hash internally, but the idea for this is basically Lars
Karlitski and Kay Sievers' `treesum()`. We may expose this in the
future.
Moreover, it contains a directory of refs, which are symlinks named
by the stage id they correspond to (as before), pointing to an object
generated from that stage-id.
The ObjectStore exposes three methods:
`has_tree()`: This checks if the content store contains the given tree.
If so, we can rely on the tree remaining there.
`get_tree()`: This is meant to be used with a `with` block and yields
the path to a read-only instance of the tree with the given id. If the
tree_id is passed in as None, an empty directory is given instead.
`new_tree()`: This is meant to be used with a `with` block and yields
the path to a directory in which the tree by the given id should be
created. If a base_id is passed in, the tree is initialized with the
tree with the given id. Only when the block is exited successfully
is the tree written to the content store, referenced by the id in
question.
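The three methods can be sketched as follows. This is a simplified stand-in, not the actual implementation: trees are stored directly under their id (no objects/refs split), and a temporary copy stands in for the read-only instance. It needs Python 3.8 or newer for `dirs_exist_ok`:

```python
import contextlib
import os
import shutil
import tempfile


class ObjectStore:
    def __init__(self, store):
        self.store = store
        os.makedirs(store, exist_ok=True)

    def has_tree(self, tree_id):
        return tree_id is not None and \
            os.path.isdir(os.path.join(self.store, tree_id))

    @contextlib.contextmanager
    def get_tree(self, tree_id):
        # Hand out a throwaway copy so the stored tree cannot be modified.
        with tempfile.TemporaryDirectory(dir=self.store) as tmp:
            if tree_id is not None:
                shutil.copytree(os.path.join(self.store, tree_id), tmp,
                                symlinks=True, dirs_exist_ok=True)
            yield tmp

    @contextlib.contextmanager
    def new_tree(self, tree_id, base_id=None):
        with tempfile.TemporaryDirectory(dir=self.store) as tmp:
            tree = os.path.join(tmp, "tree")
            if base_id is not None:
                shutil.copytree(os.path.join(self.store, base_id), tree,
                                symlinks=True)
            else:
                os.mkdir(tree)
            yield tree
            # Only reached when the with-block body did not raise, so a
            # failed build never ends up in the store.
            os.rename(tree, os.path.join(self.store, tree_id))
```

A caller would check `has_tree()` first and only enter `new_tree()` when the tree is missing; this sketch does not handle committing an id that already exists.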
Use this in Pipeline.run() to avoid regenerating trees unnecessarily.
In order to trigger a regeneration, the content store must currently
be manually flushed.
Update the travis test to run the noop pipeline twice, verifying that
the stage is only run the first time.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Stop guessing if we're in the source directory by looking if a `stages`
subdirectory exists. Instead, assume that osbuild is installed on the
host.
If `--libdir` is given, mount the libdir into `/run/osbuild/lib` (alas,
we can't overwrite `/usr/libexec/osbuild`) and run osbuild from there.
Thus, running from source must now be done like this:
# python3 -m osbuild --libdir . [other args]