Commit graph

3690 commits

Author SHA1 Message Date
Tom Gundersen
367444635a containers/composer: terminate composer first
Composer may depend on dnf-json and the worker to shut down cleanly.
2022-03-22 14:17:37 +01:00
Tom Gundersen
c3d66b5a33 cmd/composer: gracefully shut down on SIG{INT,TERM}
Call `Shutdown()` on all http servers. This means we will finish processing
any pending requests (including depsolving), but we will not listen to new
ones.

In particular, we will not answer to the readiness probe, so no new traffic
will be routed to this container.

Once all pending requests have been handled composer will shut down
gracefully and the liveness probe will return failure.

Note that in order for this to work correctly no requests should ever take longer
than the shutdown timeout (by default 30s).
2022-03-22 14:17:37 +01:00
Tom Gundersen
d3cd3197c0 container: make liveness probe independent of webserver
Currently liveness and readiness was treated the same. However, their
behaviour at shutdown is meant to be different. When a service is not read
no new connections are made to it, and when a service is not live it can be
cleaned up.

By considering our service live if and only if it listens to HTTP requests we
don't have the opportunity to clean up after we stop listening to new requests.

Leave readiness probes as they are, and instead use a file in the filesystem to
indicate when the service is live. It is created before composer is spawned and
deleted once composer exits.
2022-03-22 14:17:37 +01:00
Jakub Rusz
15c2044b3c tests/upgrade: update gpg key
We need to use a new gpg key after the SHA-1 deprecation. Also don't
fail immediately on compose failure to be able to retrieve logs from the
test VM.
2022-03-22 10:54:30 +01:00
Ondřej Budai
67e55eaea8 gitlab: run containerbuild on RHEL
Otherwise, we're running into
https://bugzilla.redhat.com/show_bug.cgi?id=2065292
and when I tried implementing a workaround, I ran into
https://bugzilla.redhat.com/show_bug.cgi?id=1897579

Gah.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-21 16:45:49 +01:00
Ondřej Budai
99aad294dd deploy: work around a podman bug in CS8
See the comment.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-21 16:45:49 +01:00
Sanne Raymaekers
f0a17d19f0 templates/composer: Add stage service accounts owner 2022-03-21 12:57:32 +01:00
Jakub Rusz
46a79a48da workflows: Fix Gitlab CI trigger + revert debug
Previous implementation added single quotes to the git command which
made it not trigger the Gitlab CI at all. Changing it to clasic bash if
condition.
2022-03-21 10:42:28 +01:00
Sanne Raymaekers
2023f7731d worker: Support client_credentials grant type in client
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.

They also have the added benefit of not expiring automatically when
they're not used like offline tokens, and immediate expiration when
desired.
2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8900bcec40 worker: Client lazy token refresh 2022-03-21 09:43:43 +01:00
Sanne Raymaekers
8a6d6ed6cf worker: Clean up worker client config 2022-03-21 09:43:43 +01:00
Jakub Rusz
eb4c9be168 workflows: debug Gitlab CI trigger 2022-03-18 12:59:40 +01:00
Sanne Raymaekers
815d0ad65b osbuild-worker: Log unexpected dnf-json errors
These errors result in a 5xx status for the depsolve job, marked as
internal failure, it's useful to log them.
2022-03-18 10:14:06 +01:00
Ondřej Budai
9ca74694a7 packer: use unique name tag for Fedora workers
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-16 12:58:05 +01:00
Tomas Hozza
e5595667bc test/api.sh: move the DB dump to the cleanup() function
Previously, the DB was not dumped in case the compose failed. Ensure
that the DB is dumped before the script exits in any case.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-16 09:03:47 +00:00
Tomas Hozza
e8a347d1e8 test/api.sh: do not use /tmp, but $WORKDIR
Do not create files directly in `/tmp`, but use `$WORKDIR`, which is a
temporary directory for transient files, which gets cleaned up when the
test case finishes. Without this change, running `api.sh` twice fails
the second time.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-16 09:03:47 +00:00
Antonio Murdaca
b2d18166de test/data/manifests: regenerate
Signed-off-by: Antonio Murdaca <runcom@linux.com>
2022-03-14 17:31:40 +01:00
Antonio Murdaca
5f2ad326a6 internal/distro/rhel{86,90}: drop console kargs from raw image deployment
Using the simplified installer we were experiencing slow system boots.
Turns out we're incurring into https://bugzilla.redhat.com/show_bug.cgi?id=1839923
This patch just drops the console kargs - to be aligned with the
anaconda installer that doesn't experience this slow down.
The slow down doesn't happen on virtual machines as there's always a
ttyS0 there

Signed-off-by: Antonio Murdaca <runcom@linux.com>
2022-03-14 17:31:40 +01:00
Gianluca Zuccarelli
19e2fb7fb5 template: composer dashboard queries
Tidy up the queries for the composer dashboard
and making them more readable in grafana. Additionally
add some fallback values for when empty query results
are returned from prometheus.
2022-03-14 16:11:05 +01:00
Gianluca Zuccarelli
1f2fd8cb76 templates: worker depsolve error display
Fix the display of the depsolve error rate
panel. The panel had an incorrect min value of
3 (or 300%).
2022-03-14 16:11:05 +01:00
Jakub Rusz
c91131ee0c github workflows: modify Gitlab CI trigger
In 5e639cba6f the context of the Trigger
Gitalb CI workflow changed and the context
"github.event.pull_request.draft" is no longer available so the
condition for SKIP_CI didn't work. This can be fixed by getting the
variable in the previous workflow and passin it as artifact. Docs:
https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#using-data-from-the-triggering-workflow
2022-03-14 14:40:23 +02:00
Jakub Rusz
d8ea259f8b ci: run ci_details.sh in before_script
This is a nice script showing potentially useful details about the
runner so let's execute it at the begining of each job.
2022-03-14 14:24:59 +02:00
Ondřej Budai
418ae32cf8 packer: fix the secret ID variable in get_koji_creds.sh
Oops, we should probably start testing this.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-14 10:27:28 +01:00
Ondřej Budai
424a741de6 packer: make subscribing optional
We don't want to subscribe Fedora.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 22:31:40 +01:00
Ondřej Budai
c46376aea2 packer: add support for koji credentials
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2dd5ae7bca packer: skip retrieving of creds if their ARN is not specified
So we can have workers without public cloud creds.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
4c0ba50ea1 packer: remove config tinkering from worker_service.sh
Let's set each cloud section of the config in the respective cloud script.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2813507ac9 packer: split worker_external_creds.sh into one script per cloud
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2e7815bf53 packer: move worker-config creation to ansible
I think it untangles the initialization a bit and allows me to do some more
refactorings.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Tom Gundersen
2a4d4c4d49 dnf-json: use the default connection timeout
By default `timeout` is 30 seconds, but we had it set to 5. Drop
the override and use the default.

This has two effects: it increases the time before we give up on
connecting (as it says on the tin), and it also increases the time
download has to be slow for before we give up.

Internally, we were seing failures in downlaoding metadata from ODCS
and similar issues have occurred in CI too.

The potential downside to this is in case of having several mirrors
this means it takes longer before giving up on a bad one and trying
a better one. But slow is better than broken, so for now rever to
the default behavior.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2022-03-12 09:09:13 +01:00
Tomas Hozza
562225af4c osbuild-pipeline: use repo name from the request if provided
Almost all repo configurations used for generating image test cases
using `osbuild-pipeline` have `name` defined. Make sure that the repo
name provided in the compose request is used when depsolving.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-12 08:36:40 +01:00
Tomas Hozza
13a9022fd8 rpmmd: rename toDNFRepoConfig() argument i -> repoID
Rename the method argument name to make its purpose obvious.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-12 08:36:40 +01:00
Tomas Hozza
180290d016 dnf-json: use repository name from the request if provided
Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-12 08:36:40 +01:00
Tomas Hozza
43dafe87fb rpmmd: pass repo name to dnf-json
The repo name is already part of the `rpmmd.RepoConfig` structure. Do
not ignore when calling `dnf-json` and and pass it the value.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-12 08:36:40 +01:00
Tomas Hozza
f9d0412316 dnf-json: do not use reponame as repoid.
Repo name defaults to the repo ID if the name is not set. `dnf-json`
should not rely on the `reponame` being set to the ID and intsead
return the actual `repoid`.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-03-12 08:36:40 +01:00
Ondřej Budai
c4c7f44fcb dbjobqueue: reimplement the jobqueue to use only one listening connection
Previously, all dequeuers (goroutines waiting for a job to be dequeued) were
listening for new messages on postgres channel jobs (LISTEN jobs). This didn't
scale well as each dequeuer required to have its own DB connection and the
number of DB connections is hard-limited in the pool's config.

I changed the logic to work somewhat differently: dbjobqueue.New() now spawns
a goroutine that listens on the postgres channel. If there's a new message,
the goroutine just wakes up all dequeuers using a standard go channel.
Go channels are cheap so this should scale much better.

A test was added that confirms that 100 dequeuers are not a big deal now. This
test failed when I tried to run on it on the previous commit. I tried even 1000
locally and it was still fine.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 16:04:52 +01:00
Ondřej Budai
c8dbe0de74 dbjobqueue: remove unused variables from Dequeue
Removing queued_at and started_at is pretty straightforward, it wasn't needed.
Removing token might seem concerning but basically we were just pulling
the same value from DB as we were pushing there. I think there's no value in
doing that.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 16:04:52 +01:00
Sanne Raymaekers
318a4525c6 cmd/osbuild-worker: dnf-json returns MarkingErrors (plural) 2022-03-11 10:13:27 +01:00
Feng Huang
c64eb98011 use app-sre packer image
Signed-off-by: Feng Huang <fehuang@redhat.com>
2022-03-11 09:24:26 +01:00
Ondřej Budai
72de1b3bbe packer: don't save the AMIs on PRs
This should save us a ton of resources as we don't use AMIs from PRs.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
ad15179faf packer: build Fedora images
The decision logic which jobs to run is quite confusing but that's how we
roll for now:

Jenkins builds RHEL images only on main
Schutzbot builds RHEL images only in PRs
Schutzbot builds Fedora images on both PRs and on main

To achieve this, the commit re-enables running Packer on main on Schutzbot.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
ec070612ff packer: remove RHEL and x86_64-specific bits
Arch was easy.

For passing the repository distribution and osbuild_commit (it can be
different for each distro), I decided to go in the way of ansible
inventory directories. It adds a bit of structure but I think it's
the most clean solution.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
cd394bf67d packer: add default to aws auth variables
So you don't have to pass these if packer is supposed to find them
on its own (instance profile, local profile).

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
4ae71d3f3d packer: move all RHEL-specific options to a source block
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
22ec89f956 packer: add more tags identifying the image
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
7301ea6b9d packer: use newer (=faster) instances
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
8664c1449a packer: reuse the build user for the ansible provisioner
We want to build multiple images at once and some of them use a different user.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
e45578d3b0 packer: remove the ami_id variable
We want to build multiple images at once so they have to be defined elsewhere.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
5ecbfbad9e packer: rename composer.pkr.hcl to worker.pkr.hcl
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Achilleas Koutsou
e5675efc4a github: fix job names and IDs for the tests workflow
Flip the incorrect flip that happened in
e4baddfad1
2022-03-10 10:54:20 +01:00