Commit graph

262 commits

Author SHA1 Message Date
Sanne Raymaekers
e56248d3c8 templates: Add production worker account to acl 2022-02-25 16:57:13 +01:00
Sanne Raymaekers
b05723a37e templates/composer: Verify against mass sso and rh sso 2022-02-24 09:48:12 +01:00
Gianluca Zuccarelli
8e8d99336f templates/worker: fix depsolve error rate
The depsolve error rate had the incorrect query
and was returning the error rate for the build
jobs. This has now been fixed.
2022-02-22 19:55:14 +00:00
Ondřej Budai
5d304d2957 packer: make the worker image smaller
This should save us some money. 10 GB is the size of the underlying
RHEL 8.5 AMI so this should be the minimum.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-02-18 09:24:07 +01:00
Sanne Raymaekers
a173a3513d tools/appsre-build-worker-packer: Run on subscribed 8.5 machine 2022-02-09 16:54:22 +01:00
Gianluca Zuccarelli
e8d7519c7d templates/dashboard: worker metric queries
The prometheus queries have been updated with
the correct namespace for the job metrics.
Additionally, this commit fixes some of the
queries to add fallback values when the
query results are returned empty.
2022-02-09 14:09:50 +01:00
Sanne Raymaekers
a739151c71 Revert "templates: Add dnf-json template"
This reverts commit 8cb3900dd6.
2022-02-08 14:05:48 +01:00
Sanne Raymaekers
4956e48a0b service-maintenance: Skip db cleanup
Let's enable the cloud cleanup first, and then move on to the db.
2022-02-07 20:42:45 +01:00
Gianluca Zuccarelli
dbf396db2b templates/dashboards: worker error metrics
Update the grafana dashboard for the workers
to show information on the success rate for
osbuild and depsolve jobs.
2022-02-07 20:40:37 +01:00
Sanne Raymaekers
8cb3900dd6 templates: Add dnf-json template 2022-02-06 14:48:32 +00:00
sanne
8a8ed14319 templates/dashboards: Fixed grafana uids
This way we get a nice URL `.../d/image-builder-(composer|worker)`.
2022-01-19 12:27:33 +01:00
sanne
ef6c5df9fa templates/packer: Make cdn host check less sensitive 2022-01-18 17:00:17 +01:00
sanne
68e98244b9 templates/packer: Correct priority for worker rpms
A lower priority value means higher precedence. Currently, the images built
through AppSRE's infra install the worker from epel.
2022-01-17 14:30:11 +01:00
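The priority rule in the commit above can be illustrated with a small sketch. It assumes the dnf convention that a repository's `priority` is an integer where a smaller value wins. The repository names and numbers below are hypothetical and are not taken from the actual template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repo:
    name: str
    priority: int  # dnf convention: a smaller number means a preferred repo


def preferred_repo(repos):
    """Return the repo that wins when several offer the same package.

    Simplified model: only the priority value is compared.
    """
    return min(repos, key=lambda r: r.priority)


# Hypothetical values, for illustration only.
epel = Repo("epel", priority=99)
worker_rpms = Repo("worker-rpms", priority=5)

print(preferred_repo([epel, worker_rpms]).name)
```

If the worker repository were given the larger number instead, the same function would pick `epel`, which is the situation the commit message describes.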
sanne
3c729be3c5 tools/appsre-build-worker-packer: Add image_users variable
packer will share the ami with those users.
2022-01-11 14:30:19 +01:00
sanne
d08147864a osbuild-service-maintenance: Map AWS secrets 2022-01-11 12:57:02 +01:00
sanne
4797ac281a osbuild-service-maintenance: Rework GCP credentials mapping
Because of the way the gcp secrets are stored for the workers, and how
the mapping from vault to openshift works (unable to map a multiple key
secret into a single json file), there's a bit of juggling required to
get the gcp credentials in the right format.
2022-01-11 12:57:02 +01:00
sanne
71da979c81 tools: AppSRE packer build 2022-01-05 22:13:55 +01:00
Ondřej Budai
8d81da7d7b packer: remove /var/lib/osbuild-composer check
This directory is not used on worker instances. It was a left-over from the
times when this AMI was also used for running composer.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-01-04 16:17:59 +01:00
Ondřej Budai
9d0ae3bc1f packer: add initialization scripts
The worker needs quite a lot of configuration involving secrets. Baking them
in the AMI is just awful so we need to fetch them during the instance startup.

Previously, this was all done using cloud-init. This makes the cloud-init
config huge and it is also very hard to test.

This commit moves all the configuration scripts into the image itself.
Cloud-init still needs to be used to push the secret variables into the
instance. The configuration scripts are run after cloud-init. They pick up
the secrets and initialize the worker correctly.

These scripts were adopted from
75b752a1c0
(private repository).

During the adoption, some changes had to be applied to make shellcheck happy.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-01-04 16:17:59 +01:00
Ondřej Budai
5697b43ad6 packer: update to RHEL 8.5
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-01-04 16:17:59 +01:00
sanne
60d4f5a751 composer: Disable artifacts for the service
When backed by a DB, composer has no need of a queue directory.

This also addresses "Error moving artifacts for job" logging noise.

Signed-off-by: sanne <sanne.raymaekers@gmail.com>
2021-12-16 17:04:08 +00:00
Gianluca Zuccarelli
10f34de88b templates: add worker dashboard
Add an initial dashboard for the job metrics.
For now, the dashboard includes graphs and
burn rates for osbuild job duration and depsolve
job duration
2021-12-15 08:52:52 +00:00
sanne
98abdf1902 templates: Max concurrent requests is required for the maintenance job 2021-12-08 10:31:33 +01:00
sanne
4224b2231b templates: CronJob is part of the batch/v1 api 2021-12-07 11:52:49 +01:00
sanne
0379cb5796 templates: Add maintenance cronjob 2021-12-06 22:51:24 +01:00
Alex Njaastad
0731857d6c fix uid 2021-12-03 18:38:50 +00:00
Alex Njaastad
595a6fea70 fix version, error-budget interval 2021-12-03 18:38:50 +00:00
Alex Njaastad
a389dae79d fix slo numbers 2021-12-03 18:38:50 +00:00
Alex Njaastad
72109bb775 more dashboard fixes 2021-12-03 18:38:50 +00:00
Alex Njaastad
79caf7b536 add more panels 2021-12-03 18:38:50 +00:00
Alex Njaastad
3cf41cddcd fix interval variable 2021-12-03 18:38:50 +00:00
Alex Njaastad
50bcdf7bc4 dashboard updates 2021-12-03 18:38:50 +00:00
Ondřej Budai
8bf2dd55a2 packer: remove osbuild-composer.service override
We no longer use this AMI for composer, so we don't need this override.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
2bd2e3d1bc packer: install just osbuild-composer-worker
We don't actually need a composer in these images, so let's just install
the worker.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
b799605f51 packer: install monit and vector
Previously, monit and vector RPMs were embedded directly in the
image-builder-packer repository. This was not ideal because hosting big
binary files in git is always ugly.

This commit brings back monit and vector:

- monit is installed from EPEL
- vector is installed from the upstream RPM repository

Ansible was dropped because we don't need it in the image.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
fbebe4c2cf packer: adjust ansible playbook filepath
We want an absolute path, otherwise packer doesn't know where to find the
playbook if called from a wrong directory.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
b619e4875e packer: rework variables
osbuild and composer commit SHAs now must be passed into packer using
variables, no defaults are defined. Also, packer is no longer responsible
for naming the AMIs, the name is also passed as a variable.

imagebuilder_packer_sha was dropped entirely as the packer configuration
now lives directly in osbuild-composer repository.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
0fb3634c2c packer: remove forwarding to console
Console support in AWS EC2 is very basic. We now use vector that works much
better than console so we can just drop the forwarding and rely on vector
dumping the logs into cloudwatch.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
15c46544b6 packer/monit: remove verify_worker_connection
This is currently not working because workers in aoc no longer use mTLS.
Definitely something we want to fix in the future I think.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
cc81e919ca packer: drop RH IT certificate
I think it was needed for internal workers - not needed anymore.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
1b289cc27e packer: import image-builder-packer repository
/templates/packer now contains a copy of image-builder-packer repository
as of b8a4b45f93890090de24e3d043e2d958948fc3c5

Changes:
- LICENSE file was dropped (it was redundant)
- README file was dropped (no longer needed)
- GitHub workflows were removed (will be replaced by schutzbot)
- RPMs were removed (they were huge, will be installed in a different way)

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Gianluca Zuccarelli
3443fb8771 templates: update dashboard metrics
Update the composer dashboard to make use of the
namespaced metrics.
2021-11-19 22:48:25 +01:00
Ondřej Budai
8f0d685b70 template: bump postgres max conns to 20
We actually need 2 * 16 connections at minimum (one worker waits for two
jobs). Let's bump the maximum connection count even moar.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-19 13:25:51 +01:00
Ondřej Budai
c3a8fc19a2 templates: bump max postgres connections to 10
By default, pgxpool.Pool has 4 connections (or number of cpus if higher).
Currently, we have 3 replicas, that means max 3*4=12 DB connections.

The dequeue operation is actually blocking - when a worker is waiting for
a job, one connection is blocked. My theory is that with 16 workers, we just
don't have enough connections, which causes all sorts of weird slowdowns.

This commit bumps the number of connections per replica to 10, therefore
we should be at 30 connections in total.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-19 13:17:10 +01:00
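The connection arithmetic in this commit message can be checked with a short sketch. The replica count, pool sizes, and worker count come from the message. The function names are illustrative, and `conns_per_worker` is an assumption: it models one blocked connection per waiting worker here, while the follow-up commit 8f0d685b70 argues for two. The last line also assumes that the limit of 20 in that follow-up applies per replica.

```python
def total_connections(replicas: int, pool_size: int) -> int:
    """Connections available across all composer replicas."""
    return replicas * pool_size


def blocked_connections(workers: int, conns_per_worker: int = 1) -> int:
    """Connections held by workers blocked in a dequeue call."""
    return workers * conns_per_worker


def headroom(replicas: int, pool_size: int, workers: int,
             conns_per_worker: int = 1) -> int:
    """Connections left over; a negative value means the pool is exhausted."""
    return (total_connections(replicas, pool_size)
            - blocked_connections(workers, conns_per_worker))


# Default pgxpool size of 4 with 3 replicas and 16 waiting workers.
print(headroom(replicas=3, pool_size=4, workers=16))    # 12 - 16 = -4

# After this commit: 10 connections per replica.
print(headroom(replicas=3, pool_size=10, workers=16))   # 30 - 16 = 14

# If each worker waits for two jobs, 30 is still too few.
print(headroom(replicas=3, pool_size=10, workers=16, conns_per_worker=2))  # -2

# With 20 connections per replica there is room again.
print(headroom(replicas=3, pool_size=20, workers=16, conns_per_worker=2))  # 28
```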
Sanne Raymaekers
1fdc18856a Revert "templates: Add prometheus scrape annotations to composer-api"
This reverts commit 7f86dae69b.
2021-11-10 15:24:24 +01:00
sanne
7f86dae69b templates: Add prometheus scrape annotations to composer-api 2021-11-10 15:13:53 +01:00
Gianluca Zuccarelli
47c41a0b8d templates: add latency metrics to dashboard
Update the grafana dashboard with metrics
for latency requests, including error budget
burn for compose latency.
2021-11-02 00:23:57 +00:00
Ondřej Budai
01445cfdfb templates: fix liveness/readiness check url
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-29 13:36:16 +02:00
Ondřej Budai
7cf02091d1 templates: add s3 bucket name
Composer API v2 requires a bucket name to be set in composer configuration.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-29 11:50:37 +01:00
Gianluca Zuccarelli
57250f5496 templates: update dashboard config map
Minor fix for the capitalisation of `image-builder`
in the grafana configmap.
2021-10-28 22:17:45 +01:00