Tomáš Hozza
6ae8904f5a
templates/packer: add comment to get_aws_creds.sh
...
Add a comment explaining why it is important to set the AWS bucket in
the worker configuration, even if the `AWS_ACCOUNT_IMAGE_BUILDER_ARN` is
empty.
2022-10-11 13:23:18 +02:00
Tomáš Hozza
09daa75adf
templates/packer: set the GCP bucket in the worker configuration
...
Similar to AWS, set the GCP bucket in the worker configuration.
2022-10-11 13:23:18 +02:00
Diaa Sami
5ffb9e693e
tools/appsre: remove monit setup code & scripts
...
It has not worked since we moved workers to app-sre.
2022-10-04 16:26:08 +02:00
Ondřej Budai
f25dca793d
packer: remove Fedora 35
...
Our workers already run on Fedora 36 so there's no need to build F35 anymore.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-09-30 14:52:24 +02:00
Diaa Sami
98eda72499
templates/packer: update amazon plugin
2022-09-27 10:47:32 +02:00
Diaa Sami
06fbd926ae
app-sre: Update AMIs to rhel-9.0
2022-09-27 10:47:32 +02:00
Sanne Raymaekers
5c12076b4f
templates/packer: Allow token url to be set by cloud-init vars
...
Hardcoding the token url renders the image useless if it ever needs to
be changed.
2022-09-22 14:15:26 +02:00
Ondřej Budai
8f97c4788c
packer: add fedora 36
...
F35 is going EOL soon, so let's update. I want to ditch F35 as soon as possible
after this is merged, but I want to have some overlap just to be sure.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-09-22 11:22:46 +02:00
Diaa Sami
5dda08a20a
templates/composer.yml: update splunk port for splunk cloud
...
using an openshift template variable
2022-09-22 10:40:22 +02:00
Sanne Raymaekers
183e10e466
templates/packer: append distro and arch to the ami name
...
Because the rhel-8 images share the same name, and `force_deregister` is
true, packer will always deregister one of them.
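
The collision described above can be sketched as follows. This is a minimal Python illustration, not the Packer template itself; the helper `ami_name`, the base name `worker`, and the distro/arch strings are hypothetical.

```python
def ami_name(base: str, distro: str, arch: str) -> str:
    """Build an AMI name that is unique per distro and architecture."""
    return f"{base}-{distro}-{arch}"


# Hypothetical build matrix: two rhel-8 builds that differ only by architecture.
builds = [("rhel-8", "x86_64"), ("rhel-8", "aarch64")]

# Without a suffix every build maps to the same name, so with
# force_deregister enabled a later build would deregister an earlier one.
shared = {"worker" for _distro, _arch in builds}

# With the suffix each build keeps its own name.
unique = {ami_name("worker", distro, arch) for distro, arch in builds}
```

Here `shared` collapses to a single name while `unique` keeps one name per build, which is the property the commit relies on.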
2022-09-15 20:27:59 +02:00
Sanne Raymaekers
b5d1c8866a
templates/dashboards: Bump worker dashboard version
2022-09-14 19:43:47 +02:00
Sanne Raymaekers
db978c32bd
templates/dashboards: Fix tenant name to org id mapping
...
The crc stage tenant and fedora stage tenant were mixed up.
2022-09-14 19:43:47 +02:00
Sanne Raymaekers
cb38a92a39
templates/dashboards: Expand job wait duration panels
2022-09-14 19:43:47 +02:00
Diaa Sami
819a63e50e
templates/packer: reasonable aws_polling limits for rhel AWS builds
2022-09-09 12:08:29 +02:00
Diaa Sami
46d36a0e73
Revert "appsre: disable aarch64 AMI creation until issue is resolved"
...
This reverts commit 84f46eebdb.
2022-09-09 12:08:29 +02:00
Diaa Sami
84f46eebdb
appsre: disable aarch64 AMI creation until issue is resolved
...
Since PR #2718 was merged, generation of AMIs has been failing with 'ResourceNotReady: exceeded wait attempts'.
The issue is tracked in #2961.
2022-09-07 12:28:40 +02:00
Sanne Raymaekers
ab3bd7d94f
templates/packer: Increase aws timeouts for rhel-8-aarch64
...
This job is failing with "ResourceNotReady: exceeded wait attempts".
https://www.packer.io/plugins/builders/amazon#resourcenotready-error
2022-09-05 14:39:12 +02:00
Diaa Sami
ec0a1944b4
appsre-ansible: support aarch64
...
- Make ansible playbooks arch-agnostic
- Extract embedded bash script into separate file with parameters
- Update packer template to support aarch64
- Convert parts of bash script to python code that can start multi-arch instances to build RPMs
2022-09-05 12:08:57 +02:00
Gianluca Zuccarelli
1fb6a574cb
templates: filter worker dashboard on arch
...
Add the ability to filter the build job
types by architecture using the `arch`
dropdown.
2022-08-03 13:38:52 +02:00
Sanne Raymaekers
14208d872b
templates/dashboards: Add brew tenants
...
Also:
- Gives tenants a nice display name.
- Makes "All" the default
2022-08-01 21:45:06 +01:00
Sanne Raymaekers
9347a30775
templates/dashboards: Drop arch from osbuild jobtype
...
This changed in #2845, and the dashboards stopped working properly as
they were looking for `osbuild+:arch`.
Keep the glob however, to also capture older metrics. The glob can be
removed after 1 month, as that's how long metrics are stored.
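
The effect of keeping a glob can be sketched in Python as follows. The pattern `osbuild*` and the helper `is_osbuild_job` are assumptions for illustration; the actual dashboard query syntax is not shown in this commit, and a pattern this broad would also match any other job type that starts with `osbuild`.

```python
from fnmatch import fnmatchcase

# Assumed pattern: a trailing glob matches both the old and the new job type names.
PATTERN = "osbuild*"


def is_osbuild_job(job_type: str) -> bool:
    """Return True for the new 'osbuild' name and for older arch-suffixed names."""
    return fnmatchcase(job_type, PATTERN)


new_style = is_osbuild_job("osbuild")          # name without the arch suffix
old_style = is_osbuild_job("osbuild:x86_64")   # hypothetical older, arch-suffixed name
unrelated = is_osbuild_job("depsolve")         # hypothetical unrelated job type
```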
2022-08-01 13:37:28 +02:00
Sanne Raymaekers
a221de5db7
templates/composer: Remove non-existent secret
...
The secret not existing causes the deployment to fail during a
validation stage.
```
[ERROR] [openshift_base.py:_validate_resources_used_exist] - [Deployment/composer] Secret db does not exist
```
2022-07-28 11:24:25 +02:00
Chloe Kaubisch
86971ca312
templates: update dashboards to include tenant
...
Add a tenant variable to the composer dashboard, with the option
to select multiple tenants. Add tenant filter to queries accordingly.
link to dashboard: https://grafana.stage.devshift.net/d/image-builder-worker-with-tenant/image-builder-worker?orgId=1
2022-07-18 18:55:13 +02:00
Ondřej Budai
767283b2d9
packer: use 8.6 as a base for RHEL images
...
Let's stay updated!
Also, let's remove 8.4 and 8.5 from Schutzfile, I strongly believe that it's
not used anywhere.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-07-05 11:54:12 +02:00
Ondřej Budai
5315264f2e
packer: pin the vector version
...
See the comment inline.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-06-07 09:08:22 +02:00
Sanne Raymaekers
968023f950
templates/composer: Map db secrets to maintenance container
2022-06-04 12:48:17 +02:00
Sanne Raymaekers
71c78991a6
cloudapi: Drop bucket from composer config
...
This value is set in the worker config. In future it might also be
passed through the api to upload into target accounts, but it should
never be set in composer.
2022-06-01 12:03:12 +02:00
Ondřej Budai
34fb2b6001
templates: add Fedora prod tenant to the ACL
...
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-05-27 17:19:19 +01:00
Sanne Raymaekers
973b209060
templates/composer: Add resources requests/limits to db migration
2022-05-27 15:09:42 +02:00
Sanne Raymaekers
b91400fd92
templates/composer: Add podAntiAffinity rule based on hostname
...
Linter output:
Specify anti-affinity in your pod specification to ensure that the
orchestrator attempts to schedule replicas on different nodes. Using
podAntiAffinity, specify a labelSelector that matches pods for the
deployment, and set the topologyKey to kubernetes.io/hostname.
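
The linter's recommendation can be sketched as the following pod spec fragment, written as a Python dictionary and serialized to JSON. It assumes the "preferred" (soft) form of the rule, since the linter text says the orchestrator "attempts" to spread replicas; the `app: composer` label and the weight are hypothetical and are not taken from the template.

```python
import json

# Hypothetical label that selects the deployment's own pods.
pod_labels = {"app": "composer"}

affinity = {
    "podAntiAffinity": {
        # Soft rule: the scheduler tries to honor it but may still co-locate pods.
        "preferredDuringSchedulingIgnoredDuringExecution": [
            {
                "weight": 100,
                "podAffinityTerm": {
                    "labelSelector": {"matchLabels": pod_labels},
                    # Spread replicas across distinct nodes.
                    "topologyKey": "kubernetes.io/hostname",
                },
            }
        ]
    }
}

rendered = json.dumps({"affinity": affinity}, indent=2)
```

The `rendered` string would sit under the pod template's `spec` in a deployment manifest.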
2022-05-27 15:09:42 +02:00
Sanne Raymaekers
2208cb1122
.github: Add kube-linter check
2022-05-27 15:09:42 +02:00
Sanne Raymaekers
edcc0866b3
templates/dashboards: Bump dashboard versions
...
[skip ci]
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
01e2caf95e
templates/dashboards: Set default timerange to 28 days
...
All our SLOs apply to a 28d period. The default state of the board
should reflect that.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
be6f6f04b8
templates/dashboards: Rename composer latency titles
...
These measure latency across all requests, not just compose requests.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c4d529be5c
templates/dashboards: Add thresholds to duration/latency graphs
...
Show the threshold where we have an SLO target.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
2da910d3e4
templates/dashboards: Bump duration/latency gauges to 95p
...
This reflects the SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
4eb4894c3a
templates/dashboards: Reverse order in duration/latency graphs
...
In these graphs p99 isn't very important. If 1% of jobs are slow that's
fine. The p50 and p95 slices are the important ones, so reorder and
recolor the duration graphs to reflect this.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
060d3ae85d
templates/dashboards: Bump worker latency slo variable to 0.95
...
This reflects the actual SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
16491149fc
templates/dashboards: Reduce the interval
...
The interval dictates the granularity of the graphs. As the interval
decreases, spikes and dips become more pronounced. 28 days as an
interval doesn't actually show much, so reduce this to 6h by default, which
is a happy medium.
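
The relationship between interval and granularity can be sketched in Python as follows. The sample series and the helper `bucket_means` are hypothetical; the sketch uses plain averaging over fixed-size buckets and does not reproduce the actual dashboard queries.

```python
from statistics import mean


def bucket_means(samples, bucket_size):
    """Average consecutive samples in fixed-size buckets."""
    return [
        mean(samples[i:i + bucket_size])
        for i in range(0, len(samples), bucket_size)
    ]


# Hypothetical series: a steady value of 1 with a single spike of 9.
samples = [1, 1, 1, 9, 1, 1, 1, 1]

coarse = bucket_means(samples, 8)  # one large interval hides the spike
fine = bucket_means(samples, 2)    # smaller intervals make it visible
```

With one bucket covering the whole series the spike is averaged down to 2, while buckets of two samples show a peak of 5.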
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
8a51b5db39
templates/dashboards: Remove max from compose req success budget
...
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
eded793788
templates/dashboards: Remove max from build error rate budget
...
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c1a44b6813
templates/dashboards: Bump grafana schema version
...
This makes the following diffs smaller.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
a8adb59995
templates/composer: Enable specific maintenance parts
...
Similar to DRY_RUN, these values should be overwritten in app-interface
per namespace. At some point the maintenance specific to the CRC tenant
(aws and gcp maintenance) should run in the workers namespace rather
than the composer namespace. Granularity is needed for this.
2022-05-14 16:21:21 +02:00
Diaa Sami
5a4488c829
templates/composer: fix access to private repos
...
update secret name to the correct one
2022-05-12 14:49:22 +02:00
Diaa Sami
941fe3513f
templates/composer: add missing fluentd-config volume
2022-05-12 14:02:00 +02:00
Sanne Raymaekers
809afbd0ad
templates/composer: Specify registry for fluentd-hec image
2022-05-12 11:03:17 +02:00
Diaa Sami
631133eabb
templates/composer: give access to private quay repos
2022-05-12 10:30:54 +02:00
Diaa Sami
ca83eccc47
templates/composer: add fluentd sidecar
...
The sidecar receives logs from the service and forwards them to Splunk
HEC
2022-05-12 10:30:54 +02:00
Sanne Raymaekers
02debc0cda
templates/composer: Parametrize tenants in acl
...
This will allow us to specify tenants in the acl per namespace.
2022-05-10 15:40:38 +02:00
Sanne Raymaekers
1ded72b4dc
templates/packer: Set region in vector config
...
Vector 0.21 needs the region to be set; otherwise the healthcheck will
fail.
2022-04-19 13:24:33 +02:00