Commit graph

120 commits

Author SHA1 Message Date
Ondřej Budai
5315264f2e packer: pin the vector version
See the comment inline.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-06-07 09:08:22 +02:00
Sanne Raymaekers
968023f950 templates/composer: Map db secrets to maintenance container 2022-06-04 12:48:17 +02:00
Sanne Raymaekers
71c78991a6 cloudapi: Drop bucket from composer config
This value is set in the worker config. In future it might also be
passed through the api to upload into target accounts, but it should
never be set in composer.
2022-06-01 12:03:12 +02:00
Ondřej Budai
34fb2b6001 templates: add Fedora prod tenant to the ACL
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-27 17:19:19 +01:00
Sanne Raymaekers
973b209060 templates/composer: Add resources requests/limits to db migration 2022-05-27 15:09:42 +02:00
Sanne Raymaekers
b91400fd92 templates/composer: Add podAntiAffinity rule based on hostname
Linter output:
Specify anti-affinity in your pod specification to ensure that the
orchestrator attempts to schedule replicas on different nodes. Using
podAntiAffinity, specify a labelSelector that matches pods for the
deployment, and set the topologyKey to kubernetes.io/hostname.
2022-05-27 15:09:42 +02:00
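The linter advice quoted above can be sketched as a small helper that builds the affinity stanza. This is only an illustration, not the template from this commit: the helper name, the `app: composer` label, and the choice of the "preferred" (soft) rule rather than the "required" one are assumptions. The field names are the standard Kubernetes pod spec fields.

```python
import json


def pod_anti_affinity(match_labels, weight=100):
    """Return an affinity stanza that spreads matching pods across nodes.

    Hypothetical helper: uses the soft ("preferred") rule, so the scheduler
    tries to place replicas on different hosts but can still co-locate them.
    """
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,
                    "podAffinityTerm": {
                        # Must match the labels on the deployment's own pods.
                        "labelSelector": {"matchLabels": match_labels},
                        # One topology domain per node.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # "app: composer" is an assumed label, used only for illustration.
    print(json.dumps(pod_anti_affinity({"app": "composer"}), indent=2))
```

The resulting mapping would sit under `spec.template.spec.affinity` of a deployment.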
Sanne Raymaekers
2208cb1122 .github: Add kube-linter check 2022-05-27 15:09:42 +02:00
Sanne Raymaekers
edcc0866b3 templates/dashboards: Bump dashboard versions
[skip ci]
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
01e2caf95e templates/dashboards: Set default timerange to 28 days
All our SLOs apply to a 28d period. The default state of the board
should reflect that.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
be6f6f04b8 templates/dashboards: Rename composer latency titles
These measure latency across all requests, not just compose requests.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c4d529be5c templates/dashboards: Add thresholds to duration/latency graphs
Show the threshold where we have an SLO target.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
2da910d3e4 templates/dashboards: Bump duration/latency gauges to 95p
This reflects the SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
4eb4894c3a templates/dashboards: Reverse order in duration/latency graphs
In these graphs p99 isn't very important. If 1% of jobs are slow that's
fine. The p50 and p95 slices are the important ones, so reorder and
recolor the duration graphs to reflect this.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
060d3ae85d templates/dashboards: Bump worker latency slo variable to 0.95
This reflects the actual SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
16491149fc templates/dashboards: Reduce the interval
The interval dictates the granularity of the graphs. As the interval
decreases, spikes and dips become more pronounced. 28 days as an
interval doesn't actually show much, reduce this to 6h by default which
is a happy medium.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
8a51b5db39 templates/dashboards: Remove max from compose req success budget
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
eded793788 templates/dashboards: Remove max from build error rate budget
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c1a44b6813 templates/dashboards: Bump grafana schema version
This makes the following diffs smaller.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
a8adb59995 templates/composer: Enable specific maintenance parts
Similar to DRY_RUN, these values should be overwritten in app-interface
per namespace. At some point the maintenance specific to the CRC tenant
(aws and gcp maintenance) should run in the workers namespace rather
than the composer namespace. Granularity is needed for this.
2022-05-14 16:21:21 +02:00
Diaa Sami
5a4488c829 templates/composer: fix access to private repos
update secret name to the correct one
2022-05-12 14:49:22 +02:00
Diaa Sami
941fe3513f templates/composer: add missing fluentd-config volume 2022-05-12 14:02:00 +02:00
Sanne Raymaekers
809afbd0ad templates/composer: Specify registry for fluentd-hec image 2022-05-12 11:03:17 +02:00
Diaa Sami
631133eabb templates/composer: give access to private quay repos 2022-05-12 10:30:54 +02:00
Diaa Sami
ca83eccc47 templates/composer: add fluentd sidecar
The sidecar receives logs from the service and forwards them to Splunk
HEC
2022-05-12 10:30:54 +02:00
Sanne Raymaekers
02debc0cda templates/composer: Parametrize tenants in acl
This will allow us to specify tenants in the acl per namespace.
2022-05-10 15:40:38 +02:00
Sanne Raymaekers
1ded72b4dc templates/packer: Set region in vector config
Vector 0.21 needs the region to be set, otherwise the healthcheck
fails.
2022-04-19 13:24:33 +02:00
Sanne Raymaekers
11890682b7 templates/composer: Drop unused variables 2022-03-28 12:02:37 +02:00
Sanne Raymaekers
eba355bb60 templates/composer: Remove unused acl claims
This leaves fedora and consoledot tenants.
2022-03-28 11:38:48 +02:00
Ondřej Budai
fc86ffd968 container: fix liveness probe
We don't have permissions to write to /run when running on OpenShift so let's
just use /tmp and change the filename to prevent any conflicts.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-25 14:02:12 +01:00
Sanne Raymaekers
9368b60401 templates/composer: Add prod service accounts owner 2022-03-23 16:43:10 +01:00
Tom Gundersen
d3cd3197c0 container: make liveness probe independent of webserver
Currently liveness and readiness are treated the same. However, their
behaviour at shutdown is meant to be different. When a service is not ready
no new connections are made to it, and when a service is not live it can be
cleaned up.

By considering our service live if and only if it listens to HTTP requests we
don't have the opportunity to clean up after we stop listening to new requests.

Leave readiness probes as they are, and instead use a file in the filesystem to
indicate when the service is live. It is created before composer is spawned and
deleted once composer exits.
2022-03-22 14:17:37 +01:00
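The file-based liveness approach described above can be sketched as a small wrapper: create a marker file, run the service, and remove the marker once the service exits. This is a minimal sketch of the idea, not the container entrypoint from this commit. The function name and the marker path are hypothetical, and the child process here is only a stand-in for composer.

```python
import os
import subprocess
import sys
import tempfile


def run_with_liveness_file(cmd, live_path):
    """Run cmd, keeping live_path present for as long as the process runs."""
    # Create the marker before the service is spawned.
    with open(live_path, "w"):
        pass
    try:
        # Blocks until the service exits, including any shutdown cleanup
        # it performs after it stops accepting new requests.
        return subprocess.run(cmd).returncode
    finally:
        # Remove the marker once the service is gone, so a file-existence
        # liveness probe starts failing only at this point.
        os.remove(live_path)


if __name__ == "__main__":
    # Hypothetical marker path, chosen only for this example.
    marker = os.path.join(tempfile.gettempdir(), "example-service-live")
    # Stand-in service: exits 0 if the marker exists while it is running.
    child = [
        sys.executable,
        "-c",
        "import os, sys; sys.exit(0 if os.path.exists(sys.argv[1]) else 1)",
        marker,
    ]
    code = run_with_liveness_file(child, marker)
    print(code, os.path.exists(marker))
```

A liveness probe would then only need to check that the marker file exists, while the readiness probe keeps using the HTTP endpoint.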
Sanne Raymaekers
f0a17d19f0 templates/composer: Add stage service accounts owner 2022-03-21 12:57:32 +01:00
Sanne Raymaekers
2023f7731d worker: Support client_credentials grant type in client
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.

Unlike offline tokens, they also have the added benefit of not expiring
automatically when they're not used, and they can be expired immediately
when desired.
2022-03-21 09:43:43 +01:00
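For context, the client_credentials grant (RFC 6749, section 4.4) exchanges a client ID and secret for an access token with a single POST to the token endpoint. The sketch below only builds such a request and does not send it. It is not the worker's actual client code: the function name, the URL, and the credentials are placeholders, and sending the credentials in the form body rather than via HTTP Basic authentication is an assumption.

```python
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client_credentials token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and credentials, not real values.
    req = build_token_request("https://sso.example.com/token", "my-id", "my-secret")
    print(req.get_method(), req.full_url)
    print(req.data.decode())
```

On success the token endpoint answers with a JSON document containing an `access_token`, which the client then sends as a bearer token.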
Ondřej Budai
9ca74694a7 packer: use unique name tag for Fedora workers
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-16 12:58:05 +01:00
Gianluca Zuccarelli
19e2fb7fb5 template: composer dashboard queries
Tidy up the queries for the composer dashboard
and make them more readable in grafana. Additionally,
add some fallback values for when empty query results
are returned from prometheus.
2022-03-14 16:11:05 +01:00
Gianluca Zuccarelli
1f2fd8cb76 templates: worker depsolve error display
Fix the display of the depsolve error rate
panel. The panel had an incorrect min value of
3 (or 300%).
2022-03-14 16:11:05 +01:00
Ondřej Budai
418ae32cf8 packer: fix the secret ID variable in get_koji_creds.sh
Oops, we should probably start testing this.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-14 10:27:28 +01:00
Ondřej Budai
424a741de6 packer: make subscribing optional
We don't want to subscribe Fedora.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 22:31:40 +01:00
Ondřej Budai
c46376aea2 packer: add support for koji credentials
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2dd5ae7bca packer: skip retrieving of creds if their ARN is not specified
So we can have workers without public cloud creds.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
4c0ba50ea1 packer: remove config tinkering from worker_service.sh
Let's set each cloud section of the config in the respective cloud script.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2813507ac9 packer: split worker_external_creds.sh into one script per cloud
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
2e7815bf53 packer: move worker-config creation to ansible
I think it untangles the initialization a bit and allows me to do some more
refactorings.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-13 09:08:11 +01:00
Ondřej Budai
72de1b3bbe packer: don't save the AMIs on PRs
This should save us a ton of resources as we don't use AMIs from PRs.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
ad15179faf packer: build Fedora images
The decision logic which jobs to run is quite confusing but that's how we
roll for now:

Jenkins builds RHEL images only on main
Schutzbot builds RHEL images only in PRs
Schutzbot builds Fedora images on both PRs and on main

To achieve this, the commit re-enables running Packer on main on Schutzbot.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
ec070612ff packer: remove RHEL and x86_64-specific bits
Arch was easy.

For passing the repository distribution and osbuild_commit (it can be
different for each distro), I decided to use ansible
inventory directories. This adds a bit of structure but I think it's
the cleanest solution.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
cd394bf67d packer: add default to aws auth variables
So you don't have to pass these if packer is supposed to find them
on its own (instance profile, local profile).

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
4ae71d3f3d packer: move all RHEL-specific options to a source block
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
22ec89f956 packer: add more tags identifying the image
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00
Ondřej Budai
7301ea6b9d packer: use newer (=faster) instances
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 09:06:43 +01:00