6865 commits

Author SHA1 Message Date
dependabot[bot]
73f3aa22a2 build(deps): bump codecov/codecov-action from 4 to 5
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 4 to 5.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](https://github.com/codecov/codecov-action/compare/v4...v5)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2024-12-09 09:46:42 +01:00
Tom Koscielniak
1193187e0a tools/tests: Update rhel10 compose url
Update a rhel 10 compose url to point to nightly instead of public beta.
Fix for a failing rhel 10 nightly pipeline.
2024-12-06 12:06:40 +01:00
Ondřej Budai
3561202acc github: prevent script injections via PR branch names
Prior to this commit, ${{ github.event.workflow_run.head_branch }} got
expanded in the bash script. A malicious actor could inject
an arbitrary shell script. Since this action has access to a token
with write rights, the malicious actor could easily steal this token.

This commit moves the expansion into an env block where such an
injection cannot happen. This is the preferred way according to the
github docs:
https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#using-an-intermediate-environment-variable
2024-12-05 18:13:17 +01:00
Tom Koscielniak
f5a5705b7e schutzbot/tests: bump rhel10 to nightly, update tf sha and osbuild deps
Bump RHEL 10 from beta to nightly by updating terraform SHA and osbuild dependencies to start testing RHEL 10 nightly and to meet the CTC schedule.
2024-12-05 14:33:12 +01:00
schutzbot
ee41e3dce6 schutzfile: Update snapshots to 20241203 2024-12-04 10:33:09 +01:00
Sanne Raymaekers
f6feb7675b cloud/awscloud: use any instance create fleet returns
Even in case of errors, as long as create fleet returns an instance,
attempt to use it.

In some cases AWS returns `InsufficientInstanceCapacity` but still
creates an instance:
```
msg="Won't retry CreateFleet with OnDemand instance, retry: false, errors: InsufficientInstanceCapacity: There is no Spot capacity available that matches your request.; Already launched instance ([i-...]), aborting create fleet"
msg="doCreateFleetRetry: returning retry: false, msg: [InsufficientInstanceCapacity: There is no Spot capacity available that matches your request. Already launched instance ([i-...]), aborting create fleet]"
msg="doCreateFleetRetry: cancelling retry, instance already exists: [i-...]"
msg="doCreateFleetRetry: setting retry to true"
msg="Checking to retry fleet create on error InsufficientInstanceCapacity (msg: There is no Spot capacity available that matches your request.)"
```
2024-12-03 14:00:12 +01:00
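
The behaviour described in f6feb7675b can be sketched roughly as follows. This is a minimal illustration and not the project's code: `fleetOutput`, `pickInstance` and `errCapacity` are hypothetical names, and no AWS SDK types are used. It only shows the idea of preferring a launched instance over the returned error.

```go
package main

import (
	"errors"
	"fmt"
)

// fleetOutput is a hypothetical stand-in for the result of a create fleet call.
type fleetOutput struct {
	InstanceIDs []string
}

// errCapacity is a hypothetical error mimicking InsufficientInstanceCapacity.
var errCapacity = errors.New("InsufficientInstanceCapacity")

// pickInstance returns the first launched instance even when err is non-nil.
// An error is only returned if no instance was launched at all.
func pickInstance(out *fleetOutput, err error) (string, error) {
	if out != nil && len(out.InstanceIDs) > 0 {
		return out.InstanceIDs[0], nil
	}
	if err != nil {
		return "", fmt.Errorf("create fleet launched no instance: %w", err)
	}
	return "", errors.New("create fleet returned no instances")
}

func main() {
	// The call reported an error, but an instance exists, so it is used.
	id, err := pickInstance(&fleetOutput{InstanceIDs: []string{"i-example"}}, errCapacity)
	fmt.Println(id, err)
}
```

Checking the output before the error means an already launched instance is never silently orphaned.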
Lukas Zapletal
4b55bc2825 cloudapi: carry ostree MTLS secret over 2024-12-03 13:59:45 +01:00
Sanne Raymaekers
779053d910 cloud/awscloud: give secure instances a name
That way you can just enter the parent instance id into the search bar
and get both the worker and its executor.
2024-12-03 11:56:52 +01:00
Jakub Rusz
f68fcff400 CI: remove jrusz from notifications
Removing myself from the slack notifications as I'm no longer a QE
member.
2024-12-02 14:34:51 +01:00
Sanne Raymaekers
38b799f162 cloud/awscloud: exclude really old instance types
RHEL 10 (nightly) builds fail on stage with "Fatal glibc error: CPU does
not support x86-64-v3". This is most likely due to very old instance
types not supporting a specific instruction set.
2024-11-29 15:42:27 +01:00
Lukas Zapletal
e7a7cda3bc cmd: extra env logging for osbuild worker 2024-11-29 11:57:11 +01:00
Sanne Raymaekers
1372be3f6e test/cases/api/azure: actually verify the HyperV Generation 2024-11-29 10:37:24 +01:00
Lukas Zapletal
6310115629 deps: update images to 0.102 2024-11-27 13:03:29 +01:00
schutzbot
2e1afdc829 Post release version bump
[skip ci]
2024-11-27 08:17:34 +00:00
Ondřej Budai
64ff0e3dad awscloud: add very verbose logging to createFleet creation
We still see this error sometimes:

Unable to start secure instance: Unable to create fleet: InsufficientInstanceCapacity: There is no Spot capacity available that matches your request

This is awkward because the message mentions that there is no spot
capacity, even though the current code should retry on
InsufficientInstanceCapacity. I also confirmed this by searching for
the retries log messages: there are none in the logs.

We need a bigger hammer. Let's log everything that happens in the
createFleet method in order to have better understanding why the
retry logic isn't triggered. We should probably move most of the newly
added logs to the debug level, but let's delay that until we have
more insight into what's happening.
2024-11-26 16:12:09 +01:00
Sanne Raymaekers
54ffc08814 awscloud/secure-instance: pass on fleet information on error
By surfacing the output even in case of an error, the fleet ID and
instance ID can be extracted if present. Thus the instance can be
terminated before its dependencies are deleted.
2024-11-26 12:52:12 +01:00
Sanne Raymaekers
7a166cd356 awscloud/secure-instance: log error code comparisons
We're seeing some behaviour where create fleet is not retried and
subsequently the SI cleanup fails due to the security group already
being tied to an existing instance. There is no error that an instance
was launched anyway.
2024-11-26 12:52:12 +01:00
Sanne Raymaekers
7f2766793d templates/composer: bump rhel-9 distro alias
RHEL 9.5 is now GA.
2024-11-26 11:25:33 +01:00
Michael Vogt
bc7b8355bf worker: report crashes directly to logrus
This is a bit of an RFC commit. I noticed that when we discussed
a crash from the worker, we looked at individual messages from
syslog/journald for the stacktrace details. I was wondering if
having a more direct crash report would be more useful? We can
of course also add more logrus features to flag those with tags
like "crash" or something (I did not do that in this PR, I don't
know much about the operational side, sorry).
2024-11-25 12:02:05 +01:00
Sanne Raymaekers
27e9e22639 Schutzfile: update osbuild commit
Newer osbuild uses dnf 4 on Fedora 41, as we've seen Fedora 41 in
combination with dnf 5 (and SBOMs) failing.
2024-11-22 16:39:52 +01:00
Florian Schüller
446e8448e3 awscloud/secure-instance: retry for 10 minutes
Retry for 10 x 60 seconds and don't log retries twice.
2024-11-22 12:19:32 +01:00
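
A fixed-interval retry loop like the one 446e8448e3 describes could look roughly like this. It is a sketch under assumptions: `retry` and `errNotReady` are hypothetical names, and the demo uses a short delay instead of the 60 seconds mentioned in the commit. Each failed attempt is logged in exactly one place.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotReady is a hypothetical transient error used for the demo.
var errNotReady = errors.New("not ready")

// retry calls fn up to attempts times (attempts must be positive),
// sleeping delay between failed attempts.
func retry(attempts int, delay time.Duration, fn func() error) error {
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		// Log each failed attempt once, here and nowhere else.
		fmt.Printf("attempt %d/%d failed: %v\n", i, attempts, err)
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// The commit mentions 10 attempts 60 seconds apart; the delay is
	// shortened here so the demo finishes quickly.
	err := retry(10, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	})
	fmt.Println(calls, err)
}
```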
Florian Schüller
4ec8894244 awscloud/secure-instance: retry on error in terminated waiter
The terminated waiter sometimes responded
with "waiter state transitioned to Failure".
In that case we want to retry waiting for the termination.
2024-11-22 12:19:32 +01:00
Sanne Raymaekers
3a6a8813a5 test/cases/api/azure: use v2 HyperV generation 2024-11-21 11:22:20 +01:00
Sanne Raymaekers
8fd36225be cloudapi/v2: support HyperV generation in Azure upload options 2024-11-21 11:22:20 +01:00
Sanne Raymaekers
f672610509 cmd/osbuild-worker: specify hyper v gen for azure images 2024-11-21 11:22:20 +01:00
Sanne Raymaekers
fb3e1b0701 internal/upload/azure: support different hyper v generations
When registering an image, users should be able to choose their hyper V
gen, as gen1 is quite outdated by now.
2024-11-21 11:22:20 +01:00
Sanne Raymaekers
d2f50a4224 internal/target: add Azure image HyperV generation 2024-11-21 11:22:20 +01:00
Yi He
867ff9c06e ci: change to rhel 9.6 2024-11-21 08:14:22 +01:00
Tom Koscielniak
d8295ea2ea Test with rhel-9.6 nightly 2024-11-21 08:14:22 +01:00
Lukas Zapletal
ff2660ef00 deps: update images to 0.99 2024-11-20 12:24:49 +01:00
Florian Schüller
b5c71cd7e2 awscloud/secure-instance: enrich logging with secure instance id
We'll log it as a direct URL to the console for easier tracing.
2024-11-19 17:26:23 +01:00
Florian Schüller
992f876da0 cloudapi/v2/server: rephrase error message 2024-11-19 13:55:38 +01:00
Florian Schüller
02778b5361 cloudapi/v2/server: assure order of fail-calls
By using a slice rather than a map, the
order of the SetFailed calls is maintained.
2024-11-19 13:55:38 +01:00
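
The reasoning in 02778b5361 relies on a Go language property: iteration order over a map is unspecified, while a slice is always iterated in index order. A minimal sketch, assuming hypothetical names (`namedJob`, `failAll`) that do not come from the project:

```go
package main

import "fmt"

// namedJob is a hypothetical pairing of a job name with its ID.
type namedJob struct {
	Name string
	ID   int
}

// failAll calls setFailed for every job in slice order and returns
// the names in the order they were processed.
func failAll(jobs []namedJob, setFailed func(namedJob)) []string {
	order := make([]string, 0, len(jobs))
	for _, j := range jobs {
		setFailed(j)
		order = append(order, j.Name)
	}
	return order
}

func main() {
	jobs := []namedJob{{"first", 1}, {"second", 2}, {"third", 3}}
	order := failAll(jobs, func(j namedJob) {
		fmt.Printf("marking job %d (%s) as failed\n", j.ID, j.Name)
	})
	// A slice keeps insertion order; ranging over a map would not.
	fmt.Println(order)
}
```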
Florian Schüller
ca3f0a190f internal/jobqueue/jobqueuetest/jobqueuetest: fix DB tests
I got confused as the jobqueue interface is asymmetric.
It expects an object and returns a json.RawMessage,
and when handing over to postgres this is abstracted
away by postgres.
2024-11-19 13:55:38 +01:00
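
The asymmetry mentioned in ca3f0a190f can be illustrated with a small in-memory queue. This is only a sketch: `memQueue`, `buildArgs` and the method signatures are assumptions made for the example and are not the project's jobqueue interface. The point is that the caller hands in a Go value but gets raw JSON back and has to unmarshal it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// memQueue is a hypothetical in-memory queue storing serialized arguments.
type memQueue struct {
	items []json.RawMessage
}

// Enqueue accepts any Go value and serializes it to JSON.
func (q *memQueue) Enqueue(args interface{}) error {
	raw, err := json.Marshal(args)
	if err != nil {
		return err
	}
	q.items = append(q.items, raw)
	return nil
}

// Dequeue returns the stored arguments as raw JSON, not as the original type.
func (q *memQueue) Dequeue() (json.RawMessage, error) {
	if len(q.items) == 0 {
		return nil, errors.New("queue is empty")
	}
	raw := q.items[0]
	q.items = q.items[1:]
	return raw, nil
}

// buildArgs is a hypothetical job argument type.
type buildArgs struct {
	Distro string `json:"distro"`
}

func main() {
	q := &memQueue{}
	if err := q.Enqueue(buildArgs{Distro: "rhel-9"}); err != nil {
		panic(err)
	}
	raw, err := q.Dequeue()
	if err != nil {
		panic(err)
	}
	// The caller must unmarshal the raw JSON to get a typed value back.
	var got buildArgs
	if err := json.Unmarshal(raw, &got); err != nil {
		panic(err)
	}
	fmt.Println(string(raw), got.Distro)
}
```

A test that compares what went in with what came out therefore has to compare the JSON, or unmarshal first.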
Florian Schüller
f478f802f2 github/workflows/tests: add dependency for tests
libbtrfs-dev seems to be required, otherwise the tests fail
2024-11-19 13:55:38 +01:00
Florian Schüller
2f4d7d3140 internal/cloudapi/v2/server: remove osbuild job explicitly set "failed"
The osbuild job is a dependency of the resolve and manifest jobs, so
leaving its state untouched and letting it fail as a dependency is also fine.
2024-11-19 13:55:38 +01:00
Florian Schüller
a3ef2b4a3c Makefile: add tools/prepare-source.sh to make lint
The idea is that at least `go fmt` is called on
`make lint`. It's recommended to run `make push-check`
before any push, still.
2024-11-19 13:55:38 +01:00
Florian Schüller
d3e3474fb7 internal/worker/server: return an error on depsolve timeout HMS-2989
Fixes the special case that if no worker is available and we
generate an internal timeout and cancel the depsolve including all
followup jobs, no error was propagated.
2024-11-19 13:55:38 +01:00
Lukas Zapletal
03e74e77b2 worker: log proxy setting 2024-11-18 19:33:19 +01:00
Lukas Zapletal
86f903339a worker: parse ostree MTLS proxy early 2024-11-15 10:16:26 +01:00
schutzbot
d1bf0a77f0 Post release version bump
[skip ci]
2024-11-13 08:16:45 +00:00
Lukas Zapletal
2a5d25d9c0 worker: check MTLS config for ostree 2024-11-12 12:12:52 +01:00
Sanne Raymaekers
2eb3c9f44c worker/server: add tests for job heartbeats 2024-11-07 17:18:48 +01:00
Sanne Raymaekers
14bd8d38ca worker/server: add basic tests for Pending / Running job metrics 2024-11-07 17:18:48 +01:00
Sanne Raymaekers
a971f9340b worker/server: update metrics on requeue
When requeuing a job, the next worker requesting the job would decrement
the pending counter, but the pending counter only ever got incremented once,
when the job was first enqueued. Thus make sure to increment the pending
counter when a job is requeued.
2024-11-07 17:18:48 +01:00
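
The counter imbalance fixed in a971f9340b can be walked through with a toy gauge. This is a sketch with hypothetical names (`pendingGauge` and its methods); the real code uses the project's own metrics, which are not shown here.

```go
package main

import "fmt"

// pendingGauge is a hypothetical stand-in for the pending-jobs metric.
type pendingGauge struct {
	value int
}

func (g *pendingGauge) onEnqueue() { g.value++ } // job first enters the queue
func (g *pendingGauge) onDequeue() { g.value-- } // a worker picks the job up
func (g *pendingGauge) onRequeue() { g.value++ } // job goes back to the queue

func main() {
	g := &pendingGauge{}
	g.onEnqueue() // pending = 1
	g.onDequeue() // pending = 0
	g.onRequeue() // pending = 1
	g.onDequeue() // pending = 0
	// Without the increment on requeue, the second dequeue would leave -1.
	fmt.Println(g.value)
}
```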
Sanne Raymaekers
056b3c5ea6 jobqueue: return if a job was requeued or not 2024-11-07 17:18:48 +01:00
Lukas Zapletal
64f479092d osbuild-worker: use the new ostree resolver API 2024-11-07 16:17:56 +01:00
Florian Schüller
f291f41dbc tools/dbtest-*: fix shellcheck complaints 2024-11-06 15:16:42 +01:00
Florian Schüller
007f2989d3 Makefile: implement shellcheck as part of make lint 2024-11-06 15:16:42 +01:00
Florian Schüller
bed22e1d9a Makefile: document make db-tests 2024-11-06 15:16:42 +01:00