244 commits

Author SHA1 Message Date
Ondřej Budai
9d0ae3bc1f packer: add initialization scripts
The worker needs quite a lot of configuration involving secrets. Baking them
in the AMI is just awful so we need to fetch them during the instance startup.

Previously, this was all done using cloud-init, which made the cloud-init
config huge and very hard to test.

This commit moves all the configuration scripts into the image itself.
Cloud-init still needs to be used to push the secret variables into the
instance. The configuration scripts are run after cloud-init. They pick up
the secrets and initialize the worker correctly.

These scripts were adopted from
75b752a1c0
(private repository).

During the adoption, some changes had to be applied to make shellcheck happy.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-01-04 16:17:59 +01:00
Ondřej Budai
5697b43ad6 packer: update to RHEL 8.5
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-01-04 16:17:59 +01:00
sanne
60d4f5a751 composer: Disable artifacts for the service
When backed by a DB, composer has no need of a queue directory.

This also addresses "Error moving artifacts for job" logging noise.

Signed-off-by: sanne <sanne.raymaekers@gmail.com>
2021-12-16 17:04:08 +00:00
Gianluca Zuccarelli
10f34de88b templates: add worker dashboard
Add an initial dashboard for the job metrics.
For now, the dashboard includes graphs and
burn rates for the osbuild and depsolve job
durations.
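Burn rate is the usual way to express how fast an SLO's error budget is being spent. As a minimal sketch, assuming the duration SLO is phrased as "a given fraction of jobs finishes within some threshold" (the 95% target and the job counts below are hypothetical, not taken from the dashboard), the computation looks like this:

```go
package main

import "fmt"

// burnRate returns how fast the error budget is consumed: the observed ratio
// of bad events (here: jobs slower than the duration threshold) divided by the
// ratio the SLO allows. A value of 1.0 spends the budget exactly over the SLO
// window; anything above 1.0 exhausts it early.
func burnRate(badRatio, sloTarget float64) float64 {
	return badRatio / (1 - sloTarget)
}

func main() {
	const slo = 0.95 // hypothetical target: 95% of jobs within the threshold

	// Hypothetical observation: 100 of 1000 osbuild jobs exceeded the threshold.
	bad := 100.0 / 1000.0
	fmt.Printf("burn rate: %.2f\n", burnRate(bad, slo)) // burn rate: 2.00
}
```

A burn rate of 2 means the budget for the whole window would be gone halfway through it.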
2021-12-15 08:52:52 +00:00
sanne
98abdf1902 templates: Max concurrent requests is required for the maintenance job 2021-12-08 10:31:33 +01:00
sanne
4224b2231b templates: CronJob is part of the batch/v1 api 2021-12-07 11:52:49 +01:00
sanne
0379cb5796 templates: Add maintenance cronjob 2021-12-06 22:51:24 +01:00
Alex Njaastad
0731857d6c fix uid 2021-12-03 18:38:50 +00:00
Alex Njaastad
595a6fea70 fix version, error-budget interval 2021-12-03 18:38:50 +00:00
Alex Njaastad
a389dae79d fix slo numbers 2021-12-03 18:38:50 +00:00
Alex Njaastad
72109bb775 more dashboard fixes 2021-12-03 18:38:50 +00:00
Alex Njaastad
79caf7b536 add more panels 2021-12-03 18:38:50 +00:00
Alex Njaastad
3cf41cddcd fix interval variable 2021-12-03 18:38:50 +00:00
Alex Njaastad
50bcdf7bc4 dashboard updates 2021-12-03 18:38:50 +00:00
Ondřej Budai
8bf2dd55a2 packer: remove osbuild-composer.service override
We no longer use this AMI for composer, so we don't need this override.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
2bd2e3d1bc packer: install just osbuild-composer-worker
We don't actually need a composer in these images, so let's just install
the worker.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
b799605f51 packer: install monit and vector
Previously, monit and vector RPMs were embedded directly in the
image-builder-packer repository. This was not ideal because hosting big
binary files in git is always ugly.

This commit brings back monit and vector:

- monit is installed from EPEL
- vector is installed from the upstream RPM repository

Ansible was dropped because we don't need it in the image.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
fbebe4c2cf packer: adjust ansible playbook filepath
We want an absolute path; otherwise, packer doesn't know where to find the
playbook if it is called from a different directory.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
b619e4875e packer: rework variables
The osbuild and composer commit SHAs must now be passed to packer as
variables; no defaults are defined. Also, packer is no longer responsible
for naming the AMIs; the name is passed in as a variable as well.

imagebuilder_packer_sha was dropped entirely, as the packer configuration
now lives directly in the osbuild-composer repository.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
0fb3634c2c packer: remove forwarding to console
Console support in AWS EC2 is very basic. We now use vector, which works much
better than the console, so we can just drop the forwarding and rely on vector
to ship the logs to CloudWatch.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
15c46544b6 packer/monit: remove verify_worker_connection
This is currently not working because workers in aoc no longer use mTLS.
It is definitely something we want to fix in the future, I think.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
cc81e919ca packer: drop RH IT certificate
I think it was needed for internal workers; it is not needed anymore.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Ondřej Budai
1b289cc27e packer: import image-builder-packer repository
/templates/packer now contains a copy of the image-builder-packer repository
as of commit b8a4b45f93890090de24e3d043e2d958948fc3c5.

Changes:
- LICENSE file was dropped (it was redundant)
- README file was dropped (no longer needed)
- GitHub workflows were removed (will be replaced by schutzbot)
- RPMs were removed (they were huge, will be installed in a different way)

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-01 16:08:11 +00:00
Gianluca Zuccarelli
3443fb8771 templates: update dashboard metrics
Update the composer dashboard to make use of the
namespaced metrics.
2021-11-19 22:48:25 +01:00
Ondřej Budai
8f0d685b70 template: bump postgres max conns to 20
We actually need at least 2 * 16 = 32 connections (each worker waits for two
jobs). Let's bump the maximum connection count even more.
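The sizing argument can be checked with a few lines of Go. This is a sketch under stated assumptions: one connection stays blocked per waiting dequeue, there are 3 composer replicas (the figure given in c3a8fc19a2, not in this commit), and "max conns" is a per-replica limit. The helper names are hypothetical.

```go
package main

import "fmt"

// requiredConns is the minimum number of DB connections needed when each
// worker keeps jobsPerWorker dequeue calls blocked at once
// (assumption: one blocked connection per waiting dequeue).
func requiredConns(workers, jobsPerWorker int) int {
	return workers * jobsPerWorker
}

// poolCapacity is the total number of connections across all composer
// replicas, assuming each replica has its own pool capped at maxConnsPerReplica.
func poolCapacity(replicas, maxConnsPerReplica int) int {
	return replicas * maxConnsPerReplica
}

func main() {
	need := requiredConns(16, 2) // 16 workers, two waiting jobs each
	for _, perReplica := range []int{4, 10, 20} {
		have := poolCapacity(3, perReplica) // assumption: 3 replicas
		fmt.Printf("max conns %2d per replica: %2d total, need %d, enough=%v\n",
			perReplica, have, need, have >= need)
	}
}
```

With these assumptions, 10 per replica (30 in total) is still short of 32, while 20 per replica (60 in total) leaves headroom.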

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-19 13:25:51 +01:00
Ondřej Budai
c3a8fc19a2 templates: bump max postgres connections to 10
By default, pgxpool.Pool has at most 4 connections (or the number of CPUs, if
higher). Currently, we have 3 replicas, which means at most 3*4=12 DB
connections.

The dequeue operation is actually blocking: when a worker is waiting for
a job, one connection is blocked. My theory is that with 16 workers, we simply
don't have enough connections, which causes all sorts of weird slowdowns.

This commit bumps the maximum number of connections per replica to 10,
so we should have 30 connections in total.
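pgx documents the pool default as the greater of 4 and `runtime.NumCPU()`, and `pgxpool.ParseConfig` accepts a `pool_max_conns` parameter in the connection string. How the template actually hands the limit to composer is not shown in this commit; the sketch below only assumes a URL-style DSN, uses the standard library alone, and the host, user, and database names are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"runtime"
	"strconv"
)

// defaultMaxConns mirrors the documented pgxpool default:
// the greater of 4 and the number of CPUs.
func defaultMaxConns() int {
	if n := runtime.NumCPU(); n > 4 {
		return n
	}
	return 4
}

// withPoolMaxConns returns dsn with the pool_max_conns query parameter set,
// which pgxpool.ParseConfig reads from URL-style connection strings.
// Existing parameters are kept; query keys are re-encoded in sorted order.
func withPoolMaxConns(dsn string, maxConns int) (string, error) {
	u, err := url.Parse(dsn)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("pool_max_conns", strconv.Itoa(maxConns))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	fmt.Println("default max conns on this machine:", defaultMaxConns())

	// Placeholder DSN, not the real composer configuration.
	dsn, err := withPoolMaxConns("postgres://composer@db.example:5432/composer?sslmode=require", 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(dsn)
	// postgres://composer@db.example:5432/composer?pool_max_conns=10&sslmode=require
}
```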

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-19 13:17:10 +01:00
Sanne Raymaekers
1fdc18856a Revert "templates: Add prometheus scrape annotations to composer-api"
This reverts commit 7f86dae69b.
2021-11-10 15:24:24 +01:00
sanne
7f86dae69b templates: Add prometheus scrape annotations to composer-api 2021-11-10 15:13:53 +01:00
Gianluca Zuccarelli
47c41a0b8d templates: add latency metrics to dashboard
Update the Grafana dashboard with request
latency metrics, including the error budget
burn for compose latency.
2021-11-02 00:23:57 +00:00
Ondřej Budai
01445cfdfb templates: fix liveness/readiness check url
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-29 13:36:16 +02:00
Ondřej Budai
7cf02091d1 templates: add s3 bucket name
Composer API v2 requires a bucket name to be set in composer configuration.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-10-29 11:50:37 +01:00
Gianluca Zuccarelli
57250f5496 templates: update dashboard config map
Minor fix for the capitalisation of `image-builder`
in the Grafana configmap.
2021-10-28 22:17:45 +01:00
Gianluca Zuccarelli
22aed692f1 templates: add grafana dashboard
Add an initial Grafana dashboard
reporting the compose success rate,
the error budget, and the total
number of composes.
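The success rate and the remaining error budget follow directly from the compose counts. A minimal sketch, assuming an availability SLO expressed as a target success ratio; the 95% target and the counts in `main` are hypothetical, and the helper is not the dashboard's actual query.

```go
package main

import "fmt"

// composeStats returns the compose success rate and the fraction of the error
// budget still left for a window, given total and failed composes and an SLO
// target in (0, 1). A negative budgetLeft means the budget is overspent.
func composeStats(total, failed int, sloTarget float64) (successRate, budgetLeft float64) {
	if total == 0 {
		return 1, 1 // no composes: nothing failed, no budget spent
	}
	successRate = float64(total-failed) / float64(total)
	allowedFailures := float64(total) * (1 - sloTarget)
	budgetLeft = 1 - float64(failed)/allowedFailures
	return successRate, budgetLeft
}

func main() {
	// Hypothetical window: 1000 composes, 20 failures, 95% success objective.
	rate, left := composeStats(1000, 20, 0.95)
	fmt.Printf("success rate: %.2f, error budget left: %.2f\n", rate, left)
	// success rate: 0.98, error budget left: 0.60
}
```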
2021-10-28 21:17:55 +01:00
Tom Gundersen
6a671112f0 templates: hook up simple probes and default limits
Use fetching the OpenAPI spec as a simple readiness/liveness probe, as
there is not much else we can or need to verify.

Set the default CPU and memory limits in accordance with AppSRE
requirements.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-10-27 22:51:35 +01:00
Tom Gundersen
b0f36fccd3 templates: add service account
Avoid using the default service account; use a dedicated one instead.

This follows the guidelines from AppSRE and is what was done for
image-builder.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-10-27 22:50:40 +01:00
Tom Gundersen
cfe9f7a87f templates: image-builder-ci access to composer
This should all move to app-interface, as it is configuration, and
we should distinguish between staging and production.

But for now, enable this where it is.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2021-10-26 10:39:50 +02:00
sanne
97fe226c8a templates: Claims based on user_ids 2021-10-19 08:30:15 +01:00
sanne
d25ae71fef worker: Configurable timeout for RequestJob
This is backwards compatible as long as the timeout is 0 (never
time out), which is the default.

For the dbjobqueue, a timeout is detected when the underlying error is
context.Canceled, context.DeadlineExceeded, or a net.Error whose Timeout()
returns true. For the fsjobqueue, only the first two are considered.
2021-10-19 00:12:18 +01:00
sanne
93dec413f3 templates: Name service ports 2021-10-11 22:41:36 +01:00
sanne
2c92473936 templates: Name services after endpoints 2021-10-11 09:52:21 +02:00
sanne
4b48c194a3 templates: Duplicate value in composer config
[skip ci]
2021-10-07 12:18:35 +02:00
sanne
973c1c4795 templates: Port names should be less than 15 characters
[skip ci]
2021-10-07 12:03:21 +02:00
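Kubernetes validates container port names as IANA service names (IANA_SVC_NAME), so the limit is at most 15 characters rather than strictly fewer. The sketch below restates those rules in Go; it is modelled on the checks Kubernetes performs but is not its code, and the port names in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	portNameCharset   = regexp.MustCompile(`^[-a-z0-9]+$`)
	portNameOneLetter = regexp.MustCompile(`[a-z]`)
)

// portNameErrors lists every IANA_SVC_NAME rule the name breaks:
// at most 15 characters, only a-z, 0-9 and '-', at least one letter,
// no consecutive hyphens, and no leading or trailing hyphen.
func portNameErrors(name string) []string {
	var errs []string
	if len(name) > 15 {
		errs = append(errs, "must be no more than 15 characters")
	}
	if !portNameCharset.MatchString(name) {
		errs = append(errs, "must contain only a-z, 0-9 and '-'")
	}
	if !portNameOneLetter.MatchString(name) {
		errs = append(errs, "must contain at least one letter")
	}
	if strings.Contains(name, "--") {
		errs = append(errs, "must not contain consecutive hyphens")
	}
	if len(name) > 0 && (name[0] == '-' || name[len(name)-1] == '-') {
		errs = append(errs, "must not begin or end with a hyphen")
	}
	return errs
}

func main() {
	for _, n := range []string{"composer-api", "composer-worker-api", "Composer"} {
		errs := portNameErrors(n)
		fmt.Printf("%q valid=%v %v\n", n, len(errs) == 0, errs)
	}
}
```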
sanne
14370e3c49 templates: Make sure ports are unquoted
[skip ci]
2021-10-07 11:56:02 +02:00
sanne
4e56f04dd7 templates: Composer OSD template 2021-10-05 16:45:55 +02:00