Added CleanCache() method to the solver that deletes all the caches if
the total size grows above a certain (configurable) limit
(default: 500 MiB).
The function is called externally to handle errors (usually log or
ignore completely) and to avoid calling multiple times for multiple
depsolves of a single request.
The cleanup is extremely simple and is meant as a placeholder for more
sophisticated cache management. The goal is to simply avoid ballooning
cache sizes that might cause issues for users or our own services.
The repository checksums in the response from dnf-json aren't used
anywhere. Since we're making changes to dnf-json and depsolving, now is
a good opportunity to drop them completely.
Move package set chain collation to the distro package and add
repositories to the package sets while returning the package sets from
their source, i.e., the ImageType.PackageSets() method.
This also removes the concept of "base repositories". There are no
longer repositories that are added implicitly to all package sets but
instead each package set needs to specify *all* the repositories it will
be depsolved against.
This paves the way for the requirement we have for building RHEL 7
images with a RHEL 8 build root. The build root package set has to be
depsolved against RHEL 8 repositories without any "base repos" included.
This is now possible since package sets and repositories are explicitly
associated from the start and there is no implicit global repository
set.
The change requires adding a list of PackageSet names to the core
rpmmd.RepoConfig. In the cloud API, repositories that are limited to
specific package sets already contain the correct package set names and
these are now copied to the internal RepoConfig when converting types in
genRepoConfig().
The user-specified repositories are only associated with the payload
package sets like before.
Attach the repository configurations that are specific to a package set
directly on the PackageSet object. This simplifies the Depsolve()
signature and avoids requiring a `nil` when no additional repositories
are required. More importantly, it makes associating repositories to
package sets explicit, no longer relying on matching array indices or
map keys.
Remove the single Depsolve function from the dnfjson package and the
depsolve command from the dnf-json tool. The new ChainDepsolve
functions and chain-depsolve command can handle single depsolves in the
same way so there's no need to keep (and have to maintain) two versions
of very similar code.
The ChainDepsolve function (in Go) and chain-depsolve command (in
Python) have been renamed to plain Depsolve and depsolve respectively,
since they are now general purpose depsolve functions.
All calls to rpmmd.Depsolve() are now replaced with the equivalent call
to solver.Depsolve() (or dnfjson.Depsolve() for one-off calls).
Attached an unconfigured dnfjson.BaseSolver to all APIs and server
configurations where rpmmd.RPMMD used to be. This BaseSolver instance
loads the repository credentials from the system and carries the cache
directory, much like the RPMMD field used to do. The BaseSolver is used
to create an initialised (configured) solver with the platform variables
(module platform ID, release ver, and arch) before running a Depsolve()
or FetchMetadata() using the NewWithConfig() method.
The FillDependencies() call in the modulesInfoHandler() of the weldr API
has been replaced by a direct call to the Depsolve() function. This
rpmmd function was only used here. Replacing the rpmmd.Depsolve() call
in rpmmd.FillDependencies() with dnfjson.Depsolve() would have created
an import cycle. The FillDependencies() function could have been moved
to dnfjson, but since it's only used in one place, moving the one-line
function body into the caller is ok.
For testing:
The mock-dnf-json is compiled to a temporary directory during test
initialisation and used for each Depsolve() or FetchMetadata() call.
The weldr API tests now use the mock dnfjson. Each rpmmd_mock.Fixture
now also has a dnfjson_mock.ResponseGenerator.
All API calls in the tests use the proper functions from dnfjson and
only the dnf-json script is mocked. Because of this, some of the
expected results in responses_test had to be changed to match correct
behaviour:
- The "builds" array of each package in the result of a module or
project list is now sorted by version number (ascending) because we
sort the package list in the result of dnfjson by NVR.
- 'check_gpg: true' is added to the expected response of the depsolve
test. The repository configs in the test weldr API specify 'CheckGPG:
True', but the mock responses returned it as false, so the expected
result didn't need to include it. Since now we're using the actual
dnfjson code to convert the mock response to the internal structure,
the repository settings are correctly used to set flag to true for
each package associated with that repository.
- The word "occurred" was mistyped as "occured" in rpmmd and is now
fixed in dnfjson.
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.
Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.
Thus, this is a non-functional change.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.
Additionally, catch a few more error cases:
- if OSBuildResult is nil, then we failed to invoke osbuild
- make sure the same JobResult handling is done for osbuild-koji, as for osbuild
When we compute the overall status of a koji compose, the individual
build jobs are checked. Currently, a job is considered a failure, if
a build job has output (`OSBuildOutput`) and the output's `Success`
field is `false`. But `OSBuildOutput` will be `nil` when osbuild
crashed or refused the manifest input. Therefore the job status is
a failure if `OSBuildOutput` is `nil`, since if osbuild was run,
and the run was successful we must have a non-`OSBuildOutput` field.
Implement the structured errors as defined by the worker client.
Every error for each of the job types now returns a structured
error with a reason and a specific error code. This will make
it possible to differentiate between 4xx errors and 5xx errors.
This commit refactors the way errors are implemented in the workers,
but maintains backwards compatability in composer by checking for
both kinds of errors.
We have been actually unmarshalling into a wrong datatype for a year, by
fixing this, we should get much more logging in Brew.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Pipeline names are added to each job before adding to the queue. When a
job is finished, the names are copied to the Result object as well. This
is done for both OSBuild and Koji jobs.
The pipeline names in the result are primarily used to separate package
lists into build and payload/image packages in two cases:
1. Koji builds: for reporting the build root and image package lists to
Koji (in Koji finalize).
2. Cloud API (v1 and v2): for reporting the payload packages in the
metadata request.
The pipeline names are also used to print the system log output in the
order in which pipelines are executed. This still isn't used when
printing the OSBuild Result (osbuild2.Result.Write()) and we still rely
on sorting by pipeline name
(see https://github.com/osbuild/osbuild-composer/pull/1330).
The problem: osbuild-composer used to have a rather uncomplete logic for
selecting client certificates and keys while fetching data from
repositories that use the "subscription model". In this scenario, every
repo requires the user to use a client-side TLS certificate. The problem
is that every repo can use its own CA and require a different pair of
a certificate and a key. This case wasn't handled at all in composer.
Furthermore, osbuild-composer can use remote workers which complicates
things even more.
Assumptions: The problem outlined above is hard to solve in the general
case, but Red Hat Subscription Manager places certain limitations on how
subscriptions might be used. For example, a subscription must be tight to
a host system, so there is no way to use such a repository in osbuild-composer
without it being available on the host system as well.
Also, if a user wishes to use a certain repository in osbuild-composer it
must be available on both hosts: the composer and the worker. It will come
with different pair of a client certificate and a key but otherwise, its
configuration remains the same.
The solution: Expect all the subscriptions to be registered in the
/etc/yum.repos.d/redhat.repo file. Read the mapping of URLs to certificates
and keys from there and use it. Don't change the manifest format and let
osbuild guess the appropriate subscription to use.
Koji image request handling now reads the exports defined by each image
type. All APIs now support reading the exports defined by each image
type. The worker still falls back to "assembler" in case the call comes
from an older version of composer.
My goal is to add a method to distroregistry to return Registry with
all supported distributions. This way, all supported distributions
would be defined only on one place.
To achieve this, the Registry must live outside the distro package
because the distro implementation depends on it and this would create
a circular dependency unsupported by Go.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
This replaces Packages() and BuildPackages() by returning a map of
package sets, the semantics of which is up to the distro to define.
They are meant to be depsolved and the result returned back as a
map to Manifest(), with the same keys.
No functional change.
Signed-off-by: Tom Gundersen <teg@jklm.no>
The type verification introduced in the previous commit is now also used
when retrieving the job status (GET /compose/{id}) and the logs (GET
/compose/{id}/logs).
In these cases, job retrieval needs to be performed twice:
1. First the job parameters are retrieved (Job()) to check the type.
2. Then the job result is retrieved (JobStatus()) for the status or
logs.
This makes it unlikely (essentially impossible) that the retrieval will
fail with "not found" on the second retrieval (JobStatus()), but it's
still a good sanity check for the system.
Verifies the Koji job types when retrieving Init and Build jobs as well.
When a job's arguments are retrieved (for the /manifests API endpoint),
the incoming ID should correspond to a Finalize Job. The new
worker.Job() method helps us verify the type and produce an error if the
wrong type is found.
Similarly, the dependencies of a Finalize Job should be in order (Init
Job first followed by Build Jobs). The types are validated while
iterating the dependency list.
Added convenience functions that check the retrieved job type and return
the initialised struct or an error if the ID is not found or does not
match the type.
Currently the getInitJob() function isn't used but it will be useful
later.
JobArgs() function replaced with more general Job() function that
returns all the parameters used to originally define a job during
Enqueue(). This new function enables access to the type of a job in the
queue, which wasn't available until now (except when Dequeueing).
Add and implement an API endpoint that returns the manifests for a given
osbuild-koji job. The route returns an array of manifests, one for each
image in the request.
Retrieves the arguments of each Build Job to extract the Manifest.
When rpmmd's Depsolve function is called we need to pass in the image
type's excluded packages. These excluded packages are retrieved when we
get the packages we include from each image type.
I thought rand in Go is auto-seeded but I was wrong, see [1].
This commit adds seed initialization.
[1]: https://golang.org/pkg/math/rand/#Seed
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Imagine this situation: You have a RHEL system booted from an image produced
by osbuild-composer. On this system, you want to use osbuild-composer to
create another image of RHEL.
However, there's currently something funny with partitions:
All RHEL images built by osbuild-composer contain a root xfs partition. The
interesting bit is that they all share the same xfs partition UUID. This might
sound like a good thing for reproducibility but it has a quirk.
The issue appears when osbuild runs the qemu assembler: it needs to mount all
partitions of the future image to copy the OS tree into it.
Imagine that osbuild-composer is running on a system booted from an imaged
produced by osbuild-composer. This means that its root xfs partition has this
uuid:
efe8afea-c0a8-45dc-8e6e-499279f6fa5d
When osbuild-composer builds an image on this system, it runs osbuild that
runs the qemu assembler at some point. As I said previously, it will mount
all partitions of the future image. That means that it will also try to
mount the root xfs partition with this uuid:
efe8afea-c0a8-45dc-8e6e-499279f6fa5d
Do you remember this one? Yeah, it's the same one as before. However, the xfs
kernel driver doesn't like that. It contains a global table[1] of all xfs
partitions that forbids to mount 2 xfs partitions with the same uuid.
I mean... uuids are meant to be unique, right?
This commit changes the way we build RHEL 8.4 images: Each one now has a
unique uuid. It's now literally a unique universally unique identifier. haha
[1]: a349e4c659/fs/xfs/xfs_mount.c (L51)
Previously, the compose status returned failure as soon as possible.
koji-osbuild considers the job as done when its status == failure and proceeds
with uploading the logs to koji and marking the job as failed. However, not
all osbuild-composer jobs might be done at this point so the logs might be
incomplete making the debugging hard.
This commit changes the behaviour: Now, the compose status is pending until
ALL jobs belonging to it are finished.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Previously, we had no clue what errors were catched by the default echo's
error handler. Thus, in the case of an error, we were basically blind. Let's
log all errors so we can investigate them later.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Serializing an interface does not work, let us simply use the string
representation and treat the empty string as no error. This is
compatible with the current API in the success case, and fixes the
error case, which is currently broken.
Also extend the test matrix for the kojiapi to ensure that all the
different kinds of errors can be serialized correctly and leads to
the correct status being returned.
Fixes#1079 and #1080.
Now that all interaciton with the koji API happens in the workers
we can drop koji configuration from composer itself. This means
that composer no longer needs to be provisioned with kerberos
credentials, and does not need to know about which koji servers
the workers support.
This is no longer returned when creating a compose, as it is obtained
asynchronously.
The TaskID is still returned, and is always set to 0. This is not right,
and should either be fixed or dropped. The caller should know the TaskID
(if they have one), so this is redundant and currently unused.
This removes the restriction of only having a single build per compose
and uses the new job types to schedule the broken-appart build.
A small change in behavior is introduced: the koji build ID is not
known when the call to `compose` returns, so it is always set to
`0`. In the future we should remove this from the API, and instead
rely on the status call to return this information, when it is
known.
The status route will be updated in follow-up commits to reflect the
changes introduced here.
Most of the worker API is now untyped, but keep Enqueu() typed to
ensure the job objects match the names in the queue. This means we
must add a version of Enqueue() for each job type we support.
Add an API route that returns logs for a specific compose.
For now, this contains the result of the job, in JSON. The idea is to
put more and more of this information into structured APIs. This is a
first step to make logs available at all.
Amend koji-compose.py to check that the route exist and contains as many
"image_logs" as images that were requested (currently always 1).
Based on a patch by Chloe Kaubisch <chloe.kaubisch@gmail.com>.
The worker server was heavily tied to OSBuildJob(Result). Untie it so
that it can deal with different job types in the future.
This necessitates a change in the jobqueue: Dequeue() now returns the
job type, as well as job arguments as json.RawMessage. This is so that
the server can wait on multiple job types with different argument
types.
The weldr, composer, and koji APIs continue to use only "osbuild" jobs.
Workers reported status via an `osbuild.Result`, which only includes
osbuild output. Make it report OSBuildJobResult instead, which was meant
to be used for this purpose and is already used as the result type in
the jobqueue.
While at it, add any errors produced by targets into this struct, as
well as an overall success flag.
Note that this breaks older workers returning the result of an osbuild
job to a new composer. I think this is fine in this case, for two
reasons:
1. We don't support running different versions of the worker and
composer in the weldr API, and remote workers aren't widely used yet.
2. Both osbuild.Result and worker.OSBuildJobResult have a top-level
`Success` boolean. Thus, logs are lost in such cases, but the overall
status of the compose is not.
Add "image_name" and "stream_optimized" fields to the osbuild job as
replacement for the local target options. The former signifies the name
of the uploaded artifact and whether an artifact should be uploaded at
all (only weldr API). The latter will be deprecated at some point, when
osbuild itself can make streamoptimized vmdk images.
This change separates what have always been two distinct concepts:
artifacts that are reported back to the composer node (in practice
always running on the same machine), and upload targets to clouds and
such. Separating them makes it easier to add job types that only allow
one upload target while keeping artifacts.
Keep the local target around, so that jobs that are scheduled can still
be run after an upgrade.
This is similar to the previous commit, which did this change in
package cloudapi.
Use constants instead of string literals for compose status, and derive
the status from worker.JobStatus directly, instead of via common.State.
We have the same thing for AWS. The AWS target also specifies under what name
should be the image available in EC2.
As requested by Brew maintainers Tomáš Kopeček and Lubomír Sedlář.
This removes the osbuild-composer-cloud package, binary, systemd units,
the (unused) test binary, and the (only-run-on-RHEL) test in aws.sh.
Instead, move the cloud API into the main package, using the same
socket as the koji API, osbuild-composer-api.socket. Expose it next to
the koji API on route `/api/composer/v1`.
This is a backwards incompatible change, but only of the -cloud parts,
which have been marked as subject to change.
Add a simple unit test for the koji API.
This adds a Handler() method to the koji.Server struct, which made
writing the test easier. This is a direction we want to go in anyway in
the future.
The cloud API will be moved to `/api/composer/v1` in the future.
Mention this in the `servers` section of the openapi.yml (relative URLs
are allowed) too, even though our generator does not consider it.