Commit graph

4032 commits

Author SHA1 Message Date
Sanne Raymaekers
be6f6f04b8 templates/dashboards: Rename composer latency titles
These measure latency across all requests, not just compose requests.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c4d529be5c templates/dashboards: Add thresholds to duration/latency graphs
Show the threshold where we have an SLO target.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
2da910d3e4 templates/dashboards: Bump duration/latency gauges to 95p
This reflects the SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
4eb4894c3a templates/dashboards: Reverse order in duration/latency graphs
In these graphs p99 isn't very important. If 1% of jobs are slow that's
fine. The p50 and p95 slices are the important ones, so reorder and
recolor the duration graphs to reflect this.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
060d3ae85d templates/dashboards: Bump worker latency slo variable to 0.95
This reflects the actual SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
16491149fc templates/dashboards: Reduce the interval
The interval dictates the granularity of the graphs. As the interval
decreases, spikes and dips become more pronounced. 28 days as an
interval doesn't actually show much, so reduce this to 6h by default,
which is a happy medium.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
8a51b5db39 templates/dashboards: Remove max from compose req success budget
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
eded793788 templates/dashboards: Remove max from build error rate budget
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c1a44b6813 templates/dashboards: Bump grafana schema version
This makes the following diffs smaller.
2022-05-17 19:06:25 +02:00
Juan Abia
031b67566b scheduled-cloud-cleaner: remove storage account skip
The scheduled cloud cleaner skips the default storage account of a
resource group, because these images are expected to be removed. There
can be situations where these images are not removed and are forgotten
there. Remove this skip condition so that scc also checks this storage
account.
2022-05-17 16:37:18 +02:00
Xiaofeng Wang
a6e2755fad test: Add running podman with non-root test
Bug BZ#2078937 has been fixed by osbuild PR#1013. The test should be
updated to test the fix and avoid regressions.
2022-05-17 21:25:49 +08:00
Tomas Hozza
1017aee438 cloud-cleaner: clean up GCE instances in all regions and zones
The `api.sh` test case uses a random GCE zone from a random GCE region
whose name starts with the value of the `GCP_REGION` CI environment
variable. Since the region name used is not known to the `cloud-cleaner`,
it has to iterate over all potential GCE regions and their zones. We
cannot simply filter a list of instances by VM instance name, because any
`instances` API call requires a zone name to be provided.

Add a new internal `cloud/gcp` package method to list existing GCE
regions based on a provided filter.
2022-05-17 12:18:12 +02:00
Tomas Hozza
18dfa9d9c9 Improve GCP test cases to pick regions with available quota
We currently use a single GCP Compute region when spinning up VMs using
the imported GCE image. As a result, we are often hitting the
'IN_USE_ADDRESSES' quota limit when there are multiple CI jobs running.
Google does not allow us to increase the quota limit any more.

Change the GCP test cases to use the CI `GCP_REGION` variable to list
all GCE regions with available quota and pick a random one from the
list. The `GCP_REGION` value is used as the region name prefix when
filtering available regions. This means that if you specify an exact GCE
region, such as `us-west1`, you'll always get the same region, but if a
GCP multi-region is used, such as `us`, then a random region prefixed
with 'us' will be used.
2022-05-17 12:18:12 +02:00
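The prefix-based region selection described in 18dfa9d9c9 can be sketched roughly as follows. This is a minimal illustration, not the project's test code: the function name `pick_region` and the hard-coded region list are assumptions, and the real test cases obtain the list of regions with available quota from GCP.

```python
import random


def pick_region(prefix, regions_with_quota, rng=random):
    """Return a random region whose name starts with `prefix`.

    `regions_with_quota` is assumed to be already filtered down to
    regions that still have quota available.
    """
    candidates = [r for r in regions_with_quota if r.startswith(prefix)]
    if not candidates:
        raise ValueError(f"no region with available quota matches {prefix!r}")
    return rng.choice(candidates)


# Illustrative input only; not queried from GCP.
regions = ["us-west1", "us-east1", "us-central1", "europe-west1"]

exact = pick_region("us-west1", regions)  # only one candidate matches
multi = pick_region("us", regions)        # any of the three 'us' regions
```

With an exact region name only one candidate remains, so the result is deterministic. With a multi-region prefix such as `us`, the result varies between runs.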
Jakub Rusz
f0f0873d6e ci: run all scripts in after_script regardless of failure
We want to run all of the scripts in after_script even if some of them
fail. In AWS we have RHUI repos in the images and we don't use them on
GA RHEL, so ci_details.sh fails there and cloud_cleaner does not run.
2022-05-17 11:20:57 +02:00
Christian Kellner
5983c295b3 distro/rhel86: ignore SRIOV interface via new udev rule on azure-rhui
Add a new udev rule that ignores the SRIOV network interface. See the
supplied comment for details why.
2022-05-16 15:46:46 +02:00
Christian Kellner
9d5787a475 distro: add support udev rules to image config
Add support for defining udev rules in the image configs via the
recently added udev.rules stage. All pipelines support it.
2022-05-16 15:46:46 +02:00
Christian Kellner
e08fd989ed osbuild2: add udev.rules stage
The `org.osbuild.udev.rules` stage creates custom udev rules files.
This is a full implementation of the stage and includes information
about valid operators and keys.
A small test suite to test the basic functionality and validation is
included.
2022-05-16 15:46:46 +02:00
Chloe Kaubisch
13c79294b6 cloudapi: validate input
Validate incoming requests with openapi3. Remove unsupported
uuid format from the openapi spec. Similarly, change url to uri as
uri is a supported format and url is not.

Co-authored-by: Ondřej Budai <obudai@redhat.com>
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-16 13:20:46 +02:00
Ondřej Budai
f616becf39 cloudapi/test: add task_id to the compose request
It's actually required by the schema.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-16 13:20:46 +02:00
Ondřej Budai
00d602efc3 cloudapi: make UploadOptions anyOf
oneOf means that the body is valid against exactly ONE schema. There's an
issue with AWS EC2 upload options though: they require the region and
share_with_accounts fields. Such a request is also a valid AWS S3 upload
(which only requires region). This means that AWS EC2 upload options will
always be valid against two schemas, which violates the oneOf rule.

Let's switch to anyOf and explain this in the openAPI spec.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-16 13:20:46 +02:00
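The oneOf problem described in 00d602efc3 can be reproduced with a small sketch. The two "schemas" below are simplified assumptions that model only the required fields named in the commit message. They are not the real openAPI definitions, and no openapi3 validator is involved.

```python
# Hypothetical, simplified schemas: each maps to its set of required fields.
SCHEMAS = {
    "aws_ec2": {"region", "share_with_accounts"},
    "aws_s3": {"region"},
}


def matching_schemas(body):
    """Return the names of all schemas whose required fields are present."""
    return sorted(
        name for name, required in SCHEMAS.items() if required <= set(body)
    )


def valid_one_of(body):
    # oneOf: valid against exactly one schema.
    return len(matching_schemas(body)) == 1


def valid_any_of(body):
    # anyOf: valid against at least one schema.
    return len(matching_schemas(body)) >= 1


# Illustrative EC2 upload options.
ec2_options = {"region": "us-east-1", "share_with_accounts": ["123456789012"]}
```

Here `matching_schemas(ec2_options)` returns both names, so `valid_one_of` rejects the request while `valid_any_of` accepts it.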
Ondřej Budai
a8a1bb4270 cloudapi: remove ObjectReference from User
It was never required, never used. I honestly think that this was a copy-paste
error, I don't see any reason why a user would have an object reference.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-16 13:20:46 +02:00
Tom Gundersen
4eeaebd40b prometheus/job: measure time spent pending rather than queued
We are interested in the time from when a job could be dequeued until it
actually is, but if a job has dependencies that are not yet finished, it
cannot be dequeued.

Change the logic to measure the time since the last dependency was
dequeued rather than when the job was queued.

The purpose of this metric is to have an alert fire in case we have too
few workers processing jobs.
2022-05-14 17:47:38 +01:00
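The change of reference point described in 4eeaebd40b can be illustrated with a short sketch. The function names and timestamps are hypothetical, and the choice of which dependency timestamp to use simply follows the wording of the commit message. The actual metric lives in composer's Go code.

```python
from datetime import datetime, timedelta


def pending_since(queued_at, dependency_times):
    """Moment from which the job counts as pending.

    Without dependencies this is the queue time; otherwise it is the
    latest dependency timestamp (or the queue time, if that is later).
    """
    return max([queued_at, *dependency_times])


def pending_duration(dequeued_at, queued_at, dependency_times):
    """Time the job spent pending before a worker picked it up."""
    return dequeued_at - pending_since(queued_at, dependency_times)


# Illustrative timeline: the job waits 10 minutes for its dependency,
# then 2 more minutes for a worker.
queued = datetime(2022, 5, 14, 12, 0)
dependency = queued + timedelta(minutes=10)
dequeued = queued + timedelta(minutes=12)

pending = pending_duration(dequeued, queued, [dependency])
since_queued = dequeued - queued
```

In this timeline the job spent 2 minutes pending, although 12 minutes passed since it was queued. Only the former says anything about worker capacity.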
Tom Gundersen
4621768c14 server/requestJob: record metrics last
This ensures that only if the dequeuing is successful are metrics recorded.
2022-05-14 17:47:38 +01:00
Tom Gundersen
ac642c3d70 server/requestJob: failing to read job status is fatal
Error out early in case reading a job status fails. The state would otherwise
be inconsistent if only some of the job statuses have been read out.
2022-05-14 17:47:38 +01:00
Sanne Raymaekers
a8adb59995 templates/composer: Enable specific maintenance parts
Similar to DRY_RUN, these values should be overwritten in app-interface
per namespace. At some point the maintenance specific to the CRC tenant
(aws and gcp maintenance) should run in the workers namespace rather
than the composer namespace. Granularity is needed for this.
2022-05-14 16:21:21 +02:00
Sanne Raymaekers
d1911f6484 osbuild-service-maintenance: Move type conversion to config 2022-05-14 16:21:21 +02:00
Sanne Raymaekers
8219dcdee8 osbuild-service-maintenance: Explicitly enable maintenance parts
Stage and production share the GCP account. To avoid trying to delete
each GCP image twice, the maintenance script needs the ability to
selectively disable certain parts based on the config.
2022-05-14 16:21:21 +02:00
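A rough sketch of the "explicitly enable" approach from 8219dcdee8: each maintenance part runs only if its flag is set, and a missing flag means disabled. The variable names below are assumptions made for illustration and are not the names used by osbuild-service-maintenance.

```python
def parse_bool(value):
    """Interpret a config string as a boolean."""
    return value.strip().lower() in {"1", "true", "yes"}


def enabled_parts(env):
    """Return the maintenance parts that are explicitly enabled in `env`."""
    # Hypothetical flag names; the real configuration keys may differ.
    flags = {
        "aws": "ENABLE_AWS_MAINTENANCE",
        "gcp": "ENABLE_GCP_MAINTENANCE",
    }
    # A part that is not mentioned in the config stays disabled.
    return [part for part, var in flags.items() if parse_bool(env.get(var, "false"))]


# Illustrative config for a namespace that should leave GCP images alone.
stage_env = {"ENABLE_AWS_MAINTENANCE": "true", "ENABLE_GCP_MAINTENANCE": "false"}
stage_parts = enabled_parts(stage_env)
```

With this shape, two deployments that share a GCP account can enable the GCP part in only one of them.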
Achilleas Koutsou
e2fe4b8de2 spec: require osbuild v55
The new osbuild input schema, for which we added support in
https://github.com/osbuild/osbuild-composer/pull/2578, requires osbuild
v55 or newer.
2022-05-13 20:22:23 +02:00
Juan Abia
99649ee142 generate-all-test-cases: generate all manifests
regenerate all manifests without image-info and add new ones
2022-05-13 21:01:37 +03:00
Tomas Hozza
287e63735c RHEL-84: panic error on tar image on s390x
Building `tar` image for `s390x` on RHEL-84 ends with panic:
"s390x image must have a partition table, this is a programming error"

A tar image should not need a partition table, so this error does not
make sense.
2022-05-13 21:01:37 +03:00
Juan Abia
967c8734a1 test-case-generators: update repos and add img types overrides
Add overrides where needed for `qcow2` and `simplified-image-installer` images on specific distros. Also, some repos needed to be updated to newer versions.
2022-05-13 21:01:37 +03:00
Juan Abia
a87193369d update repos.json
fix typos and add new repos
2022-05-13 21:01:37 +03:00
Tomas Hozza
1aabc1870d Image tests: don't use customizations for rhel-85/90beta edge-installer
Do not apply the user customizations on edge-installer on RHEL-85/90beta,
since they are not supported there yet. The way we generate image test
cases from the `format-request-map.json` causes even the customized image
types to be generated for all distributions automatically.
2022-05-13 21:01:37 +03:00
Tomas Hozza
04f612d758 Manifests test: ensure that every image type has test coverage
Extend the manifests test to ensure that each and every image type of
each architecture and each distribution is covered by at least one
image test case.

Since we now have the ability to generate image test cases for more
complicated image types, which consist only of the manifest, we should
have test coverage for each and every image type.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-05-13 21:01:37 +03:00
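The coverage check described in 04f612d758 amounts to comparing the full distro/architecture/image-type matrix against the set of existing test cases. The sketch below uses made-up data and a hypothetical `uncovered` helper. The real test is written against composer's own distro definitions.

```python
def uncovered(matrix, test_cases):
    """List (distro, arch, image_type) combinations without a test case.

    `matrix` maps distro -> arch -> list of image type names.
    `test_cases` is an iterable of (distro, arch, image_type) tuples.
    """
    covered = set(test_cases)
    return sorted(
        (distro, arch, image_type)
        for distro, arches in matrix.items()
        for arch, image_types in arches.items()
        for image_type in image_types
        if (distro, arch, image_type) not in covered
    )


# Illustrative matrix and test cases only.
matrix = {
    "rhel-84": {"x86_64": ["qcow2", "tar"], "s390x": ["qcow2", "tar"]},
}
test_cases = [
    ("rhel-84", "x86_64", "qcow2"),
    ("rhel-84", "x86_64", "tar"),
    ("rhel-84", "s390x", "qcow2"),
]

missing = uncovered(matrix, test_cases)
```

A test built on this idea would fail whenever `missing` is non-empty and report the combinations that still need a test case.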
Tomas Hozza
5be81326eb generate-all-test-cases: capture manifests output
Capture stdout and stderr output when running generate-test-cases in the manifests command.
This is helpful for debugging test case generation failures.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-05-13 21:01:37 +03:00
Tomas Hozza
40c095a850 Tools: fetch image test case generation matrix from composer
Add a simple tool `osbuild-composer-image-definitions` which dumps the
matrix of all distributions, architectures and image type names
supported by composer as JSON to stdout.

Default to fetching the image test case generation matrix directly from
composer. This eliminates the need to update a JSON source file with
this information every time a new distro or image type is added to
composer.

Delete the previously used JSON source file with the image test case
generation matrix.

Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-05-13 21:01:37 +03:00
Tomas Hozza
035b616308 test-case-generators: add request definitions for edge-raw-image
Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-05-13 21:01:37 +03:00
Tomas Hozza
97cb091ef7 test-case-generators: add RHEL-87/91 and F36 repos
Signed-off-by: Tomas Hozza <thozza@redhat.com>
2022-05-13 21:01:37 +03:00
Achilleas Koutsou
1ff36bce9a tools/format-request-map: add 'core' group to qcow2 customize
Removed in 2beb707def, possibly
accidentally.
The affected manifests were not regenerated based on this change so
they all already contain the core group.
2022-05-13 13:25:34 +02:00
Ondřej Budai
056e095419 github: pin fedora:35 for the pylint check
Fedora 36 ships pylint 2.13 that newly reports:

dnf-json:436:20: E0601: Using variable 'cache_state' before assignment (used-before-assignment)

As dnf-json is pending a big rewrite
(https://github.com/osbuild/osbuild-composer/pull/2537),
I decided to pin fedora to 35 and let the other PR decide how to proceed in
order to prevent any conflicts.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-13 12:03:26 +02:00
Thomas Lavocat
c00aae0a4a worker: provide the region for the ASG
Previously, autoscaling group discovery failed with the error:
Error getting the Autoscaling instances: MissingRegion MissingRegion:
could not find region configuration
2022-05-13 11:52:30 +02:00
Diaa Sami
5a4488c829 templates/composer: fix access to private repos
update secret name to the correct one
2022-05-12 14:49:22 +02:00
Diaa Sami
941fe3513f templates/composer: add missing fluentd-config volume 2022-05-12 14:02:00 +02:00
Sanne Raymaekers
809afbd0ad templates/composer: Specify registry for fluentd-hec image 2022-05-12 11:03:17 +02:00
Diaa Sami
33711d7d51 composer: add support for logrus syslog hook
Which will be used on crc in the log forwarding setup
NeededBy: COMPOSER-1285
2022-05-12 11:02:27 +02:00
Diaa Sami
631133eabb templates/composer: give access to private quay repos 2022-05-12 10:30:54 +02:00
Diaa Sami
ca83eccc47 templates/composer: add fluentd sidecar
The sidecar receives logs from the service and forwards them to Splunk
HEC
2022-05-12 10:30:54 +02:00
Ondřej Budai
069d08fa64 Schutzfile: update Fedora snapshots to 20220504
We need the latest golang-github-getkin-kin-openapi in order to unblock
https://github.com/osbuild/osbuild-composer/pull/1953

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-12 09:56:09 +02:00
Ondřej Budai
de46e85712 cloudapi: make Repository.rhsm optional
I think that we can spare cloudapi users from writing "rhsm": "false"
into the requests, so I decided to make this property optional and default
to false.

This is nice because it matches the behaviour of Weldr repositories and
sources so we can also use test/data/repositories without any changes after
openapi validation is enabled.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-11 13:46:47 +02:00
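The effect of making `rhsm` optional in de46e85712 can be sketched as a defaulting step applied to each repository in a request. The helper name and the `baseurl` field are assumptions for illustration, and the sketch uses a JSON boolean for the default. In composer itself the default comes from the openAPI spec and the generated Go types.

```python
def normalize_repository(repo):
    """Return a copy of `repo` with `rhsm` defaulting to False when absent."""
    return {**repo, "rhsm": repo.get("rhsm", False)}


# Illustrative repositories; the URLs are placeholders.
without_rhsm = normalize_repository({"baseurl": "https://example.com/repo"})
with_rhsm = normalize_repository({"baseurl": "https://example.com/repo", "rhsm": True})
```

A request that omits `rhsm` is then treated the same as one that sets it to false, while an explicit value is kept unchanged.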
Jordi Gil
616258ee25 distro: housekeeping with cpu arch and arch.Name() 2022-05-10 19:53:41 +02:00