Commit graph

6831 commits

Author SHA1 Message Date
Florian Schüller
f478f802f2 github/workflows/tests: add dependency for tests
libbtrfs-dev seems to be required, otherwise the tests fail
2024-11-19 13:55:38 +01:00
Florian Schüller
2f4d7d3140 internal/cloudapi/v2/server: remove osbuild job explicitly set "failed"
The osbuild job is a dependency of the resolve and manifest jobs, so
leaving its state untouched and letting it fail as a dependency is also fine
2024-11-19 13:55:38 +01:00
Florian Schüller
a3ef2b4a3c Makefile: add tools/prepare-source.sh to make lint
The idea is that at least `go fmt` is called on
`make lint`. It's recommended to run `make push-check`
before any push, still.
2024-11-19 13:55:38 +01:00
Florian Schüller
d3e3474fb7 internal/worker/server: return an error on depsolve timeout HMS-2989
Fixes the special case that if no worker is available and we
generate an internal timeout and cancel the depsolve including all
followup jobs, no error was propagated.
2024-11-19 13:55:38 +01:00
Lukas Zapletal
03e74e77b2 worker: log proxy setting 2024-11-18 19:33:19 +01:00
Lukas Zapletal
86f903339a worker: parse ostree MTLS proxy early 2024-11-15 10:16:26 +01:00
schutzbot
d1bf0a77f0 Post release version bump
[skip ci]
2024-11-13 08:16:45 +00:00
Lukas Zapletal
2a5d25d9c0 worker: check MTLS config for ostree 2024-11-12 12:12:52 +01:00
Sanne Raymaekers
2eb3c9f44c worker/server: add tests for job heartbeats 2024-11-07 17:18:48 +01:00
Sanne Raymaekers
14bd8d38ca worker/server: add basic tests for Pending / Running job metrics 2024-11-07 17:18:48 +01:00
Sanne Raymaekers
a971f9340b worker/server: update metrics on requeue
When requeuing a job the next worker requesting the job would decrement
pending counter, but the pending counter only ever got incremented once,
when the job was first enqueued. Thus make sure to increment the pending
counter when a job is requeued.
2024-11-07 17:18:48 +01:00
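The counter imbalance described in a971f9340b can be illustrated with a minimal Go sketch. The `queueMetrics` type and its methods are hypothetical stand-ins, not the project's real metrics code. The `requeued` flag mirrors the idea in 056b3c5ea6 that the job queue reports whether a job was actually requeued.

```go
package main

import "fmt"

// queueMetrics is a hypothetical stand-in for the pending-jobs gauge.
type queueMetrics struct {
	pending int
}

// enqueue records a job entering the queue for the first time.
func (m *queueMetrics) enqueue() { m.pending++ }

// dequeue records a worker picking up a job.
func (m *queueMetrics) dequeue() { m.pending-- }

// requeue records a job going back into the queue. The job is pending
// again, so the gauge is incremented; without this, the next dequeue
// would push the gauge below the true number of pending jobs.
func (m *queueMetrics) requeue(requeued bool) {
	if requeued {
		m.pending++
	}
}

func main() {
	m := &queueMetrics{}
	m.enqueue()     // pending = 1
	m.dequeue()     // pending = 0, job is running
	m.requeue(true) // pending = 1, job is waiting again
	m.dequeue()     // pending = 0, job is running again
	fmt.Println(m.pending) // prints 0
}
```

Without the increment in `requeue`, the same sequence would end at -1.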
Sanne Raymaekers
056b3c5ea6 jobqueue: return if a job was requeued or not 2024-11-07 17:18:48 +01:00
Lukas Zapletal
64f479092d osbuild-worker: use the new ostree resolver API 2024-11-07 16:17:56 +01:00
Florian Schüller
f291f41dbc tools/dbtest-*: fix shellcheck complaints 2024-11-06 15:16:42 +01:00
Florian Schüller
007f2989d3 Makefile: implement shellcheck as part of make lint 2024-11-06 15:16:42 +01:00
Florian Schüller
bed22e1d9a Makefile: document make db-tests 2024-11-06 15:16:42 +01:00
Florian Schüller
ece16307c6 jobqueuetest: avoid warning and provide a valid JSON
Not needed for the test but just generates a useless warning
2024-11-06 15:16:42 +01:00
Florian Schüller
c742e6457e Makefile: add hint that this just compiles the binaries
It does not run the tests because of the "-c" flag
2024-11-06 15:16:42 +01:00
Florian Schüller
00d3f07d08 Makefile: implement make db-tests
enables the option to run the DB tests locally
that are executed in the github actions
2024-11-06 15:16:42 +01:00
Florian Schüller
1f8da3bd83 templates/dashboards/grafana: enable shared crosshair
shared crosshair makes it easier to see the affected time
in other charts/panels
2024-11-06 11:55:22 +01:00
Florian Schüller
3ff8308389 templates/dashboards/grafana: introduce number of pending jobs 2024-11-06 11:55:22 +01:00
Achilleas Koutsou
c9e412f320 test: enable ignore_missing_repos in service configs 2024-11-05 08:21:42 +01:00
Achilleas Koutsou
5d6a4b762b templates: enable ignore_missing_repos in openshift 2024-11-05 08:21:42 +01:00
Achilleas Koutsou
af48971981 osbuild-composer: fail weldr init when repos are nil
If weldr tries to initialise when there are no repositories set and
ignore_missing_repos is enabled, return with an error.
2024-11-05 08:21:42 +01:00
Achilleas Koutsou
51287ea57e osbuild-composer/config: new option: ignore_missing_repos
osbuild/images added an error type that's returned when the reporegistry
loader doesn't find any repository configurations to load [1].  This
lets callers decide whether to stop or continue execution based on
whether repository configurations are required.

A new top-level configuration option is added for osbuild-composer that
makes it possible to start the service without having static rpm
repositories configured.  This is useful in certain (SaaS) modes where
build requests specify their own repository configurations.
2024-11-05 08:21:42 +01:00
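The behaviour described in 51287ea57e can be sketched as follows. This is only an illustration under stated assumptions: `noReposError`, `loadRepos`, `initRepos` and the `IgnoreMissingRepos` field are hypothetical names chosen for the example, not the identifiers used by osbuild/images or osbuild-composer.

```go
package main

import (
	"errors"
	"fmt"
)

// noReposError is a hypothetical error type standing in for the one the
// repository loader returns when it finds no configurations.
type noReposError struct{}

func (noReposError) Error() string { return "no repository configurations found" }

// config holds the hypothetical top-level option.
type config struct {
	IgnoreMissingRepos bool
}

// loadRepos is a hypothetical loader: it fails with noReposError when
// there is nothing to load.
func loadRepos(paths []string) ([]string, error) {
	if len(paths) == 0 {
		return nil, noReposError{}
	}
	return paths, nil
}

// initRepos lets the caller decide whether missing repositories are fatal.
func initRepos(cfg config, paths []string) ([]string, error) {
	repos, err := loadRepos(paths)
	var missing noReposError
	if errors.As(err, &missing) && cfg.IgnoreMissingRepos {
		// Start without static repositories; build requests are
		// expected to bring their own.
		return nil, nil
	}
	return repos, err
}

func main() {
	_, err := initRepos(config{IgnoreMissingRepos: false}, nil)
	fmt.Println(err != nil) // prints true
	_, err = initRepos(config{IgnoreMissingRepos: true}, nil)
	fmt.Println(err != nil) // prints false
}
```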
Achilleas Koutsou
161a263b45 go.mod: update osbuild/images to v0.95.0 2024-11-05 08:21:42 +01:00
Florian Schüller
69525b7ce6 Makefile: clean golangci-lint cache on make clean
To speed up `make lint` we use a local cache.
It seems that there is no sane way to check and just
update the cache, so we'll just provide an option to wipe it.

Using the cache does reduce the time of `make lint` from
1 minute 15sec to 1 sec on my PC. So for consecutive runs,
the cache still makes sense.
2024-11-04 18:53:21 +01:00
Tomáš Hozza
a7cd521325 Test/data: add test repo configs for RHEL-9.6
Signed-off-by: Tomáš Hozza <thozza@redhat.com>
2024-11-01 14:45:02 +01:00
Sanne Raymaekers
aeba9d5a68 cloud/awscloud: don't specify max spot price
The current spot price could be limiting the available instance pool
significantly. ARM instances specifically are experiencing a lot of
capacity errors.
2024-10-30 15:41:09 +01:00
schutzbot
5ff0e9ac4e Post release version bump
[skip ci]
2024-10-30 08:16:54 +00:00
Sanne Raymaekers
f6e82ba403 templates/packer: fix fedora 40 aarch64 base image
The old one disappeared.
2024-10-29 17:42:10 +01:00
schutzbot
d29f4665ae schutzfile: Update snapshots to 20241015 2024-10-29 09:29:27 +01:00
Sanne Raymaekers
c1b67440c4 cmd/osbuild-service-maintenance: respect dry run
Respect dry run when terminating leftover SIs.
2024-10-28 10:59:25 +01:00
Sanne Raymaekers
0ef0ae7c97 templates/packer: allow setting executor type in worker config
Currently the worker images always have to use aws.ec2; with this change
we can use the host executor for fedora.
2024-10-28 10:51:34 +01:00
Sanne Raymaekers
4afcd8c3fd cloud/awscloud: fix another nilpointer in maintenance functions 2024-10-25 17:46:49 +02:00
Sanne Raymaekers
73536b7743 repositories: add fedora 41
This way users could at least build fedora 41, there is currently an
issue in rpmrepo where the fedora 41 branched repositories are very
slow, so enabling CI is currently not possible.

https://github.com/osbuild/rpmrepo/issues/111
2024-10-25 11:34:04 +02:00
Simon de Vlieger
bccd1639af deps: update images to 0.94
Signed-off-by: Simon de Vlieger <supakeen@redhat.com>
2024-10-25 11:23:16 +02:00
Sanne Raymaekers
4f90a757dc cloud/awscloud: fix retrying to create secure instances
Set the correct target capacity specification type, just setting the
spot options to nil doesn't result in an on demand instance.
2024-10-24 20:25:48 +02:00
Sanne Raymaekers
6ccfc7f818 cloud/awscloud: fix nil pointer dereference in maintenance fns
The maintenance pod is crashing when describing the images by tag, most
likely something else is failing.
2024-10-24 12:05:42 +02:00
Lukas Zapletal
350ad58c31 worker: use the new resolver API 2024-10-24 11:53:04 +02:00
Ondřej Budai
eaf90f5aea schutzbot: shorten the slack notification
We are on a quest to reduce clutter on our Slack channels. Thus,
I decided to simplify the daily CI notifications:
  - the link to the edge pipelines got removed, it's now in bookmarks
  - several words were removed to make the message shorter
  - the link to the pipeline is now a hyperlink
  - the whole message should be a one liner
  - less text is now bold

I've also simplified the format in which we send the message. I think
that the block format used before makes redundant line-breaks.
Unfortunately, the mentions need to be done using user IDs instead
of user names. If you ever need to find them, go to the user's profile,
click on the three dots and select "Copy member ID".
2024-10-24 10:45:24 +02:00
Sanne Raymaekers
d5912259a0 cloud/awscloud: rework create fleet retry logic
The current path sometimes launches two instances, which is problematic
because the rest of the secure instance code expects exactly one
instance. A security group could be attached to both instances, and
would block the worker from launching any more SIs, as it tries to
delete the old security group first, which is still held by one of the
surplus SIs which didn't get terminated.

Only retry if:
- on "UnfulfillableCapacity" or "InsufficientInstanceCapacity" error codes;
- there wasn't an instance launched anyway.

If either of these checks fails, do not try to launch another one, and
just fail the job.
2024-10-24 10:29:26 +02:00
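The retry rule in d5912259a0 can be sketched as a small decision function. `shouldRetry` and its parameters are hypothetical, not the actual awscloud code. The sketch also assumes that a single capacity error code among the reported errors is enough to qualify for a retry.

```go
package main

import "fmt"

// retryableCodes holds the two error codes named in the commit message.
var retryableCodes = map[string]bool{
	"UnfulfillableCapacity":        true,
	"InsufficientInstanceCapacity": true,
}

// shouldRetry is a hypothetical helper. errorCodes are the codes reported
// by the failed fleet request; launched is the number of instances that
// were started anyway.
func shouldRetry(errorCodes []string, launched int) bool {
	// Never retry when an instance already exists: a second launch would
	// break the "exactly one instance" expectation.
	if launched > 0 {
		return false
	}
	// Assumption: one capacity error among the reported codes is enough.
	for _, code := range errorCodes {
		if retryableCodes[code] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldRetry([]string{"InsufficientInstanceCapacity"}, 0)) // prints true
	fmt.Println(shouldRetry([]string{"InsufficientInstanceCapacity"}, 1)) // prints false
	fmt.Println(shouldRetry([]string{"SomeOtherError"}, 0))               // prints false
}
```

When `shouldRetry` returns false, the caller fails the job instead of launching another instance.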
Sanne Raymaekers
661f39cbb9 cmd/osbuild-service-maintenance: add test for filtering SIs 2024-10-23 10:32:57 +02:00
Sanne Raymaekers
04a5ca6965 cmd/osbuild-service-maintenance: clean up secure instances
Now and then there are leftover secure instances, probably because worker
instances get terminated during builds, which is possible in ASGs. 2
hours as a cutoff should be enough: the build times out after 60
minutes and fetching the output archive after 30 minutes, which
leaves 30 minutes for booting and connection.
2024-10-23 10:32:57 +02:00
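The cutoff reasoning in 04a5ca6965 (60 minutes build timeout + 30 minutes archive fetch timeout + 30 minutes for booting and connection = 120 minutes) can be sketched as a simple age filter. The `instance` type and `leftover` function are hypothetical and only illustrate the idea; they are not the maintenance command's real code.

```go
package main

import (
	"fmt"
	"time"
)

// instance is a hypothetical, minimal view of a secure instance.
type instance struct {
	ID         string
	LaunchTime time.Time
}

// cutoff is the sum of the budgets from the commit message:
// build timeout + archive fetch timeout + boot and connection.
const cutoff = 60*time.Minute + 30*time.Minute + 30*time.Minute

// leftover returns the instances that have been running longer than the
// cutoff and are therefore candidates for termination.
func leftover(instances []instance, now time.Time) []instance {
	var old []instance
	for _, inst := range instances {
		if now.Sub(inst.LaunchTime) > cutoff {
			old = append(old, inst)
		}
	}
	return old
}

func main() {
	now := time.Date(2024, 10, 23, 12, 0, 0, 0, time.UTC)
	instances := []instance{
		{ID: "i-old", LaunchTime: now.Add(-3 * time.Hour)},
		{ID: "i-new", LaunchTime: now.Add(-45 * time.Minute)},
	}
	for _, inst := range leftover(instances, now) {
		fmt.Println(inst.ID) // prints "i-old"
	}
}
```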
Sanne Raymaekers
1c7a276d6f cloud/aws: add maintenance functions for secure instance cleanup 2024-10-23 10:32:57 +02:00
Sanne Raymaekers
8fc91d1c6d cloud/aws: move maintenance calls to separate file 2024-10-23 10:32:57 +02:00
Achilleas Koutsou
66c2c31a1c blueprint: add kickstart contents to conversion test
The option was added in f5c6cdd9cf but a
value was never added to the conversion test.
2024-10-22 22:08:39 +02:00
Achilleas Koutsou
654a6ad8f5 blueprint: enable the anaconda modules customization
This has been available since v0.74.0 of osbuild/images but was never
connected to the frontend blueprint.

See https://github.com/osbuild/images/pull/799
2024-10-22 22:08:39 +02:00
Tom Koscielniak
fb7a2aab96 Disable Packer job in scheduled GA pipelines 2024-10-21 14:43:18 +02:00
Ondřej Budai
1b169a150c packer: don't deregister old AMIs
Imagine this scenario: the packer job is run, an AMI gets created.
We configure our deployment to use this AMI. Then, someone retries the
packer job. Since we have force_deregister=true, this will not only
create a new AMI, but also remove the old one (because it has the same
name). Thus, our deployment will get broken, because the source AMI
no longer exists. This means that the ASG cannot replace any broken
instances, and the secure instance feature gets absolutely broken
because it cannot spawn new secure instances (they "inherit" the AMI
ID from their parents).

Let's remove force_deregister=true, so the AMI never gets replaced.
This might cause some pipelines to start failing because they are
rerunning the packer job for the same commit (the GA pipeline currently).
Let's fix those then, rerunning the packer job is just confusing.

If this causes some unexpected issues, we can always resort to using
unique AMI names (by appending a timestamp to their name), but having
multiple AMIs with different names but the same tags will cause our
terraform configuration to be reapplied every time there's a rerun,
which is also not great.
2024-10-21 11:48:02 +02:00