42 commits

Author SHA1 Message Date
Sanne Raymaekers
786f44e7e7 templates/dashboards: human readable job duration targets
Also makes the default 40m, which is the new SLO target for osbuild
jobs.
2024-07-04 12:46:19 +02:00
Sanne Raymaekers
55439fc6d3 templates/dashboards: remove active worker count
It's misleading since it counts the number of workers that have
registered to the current composer pods; it doesn't actually keep track
of the active workers.

Remove it and keep the worker-api stats as a proxy for active workers.
2024-06-12 17:20:01 +02:00
Sanne Raymaekers
c886d6c1f5 templates/dashboards: fix community-stage tenant variable
A space is necessary before and after the colon separating the key and
the value.

[skip ci]
2024-05-08 12:59:34 +02:00
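The separator rule described in the message above can be illustrated with a small sketch. It assumes the tenant variable is a Grafana custom variable written as comma-separated `key : value` pairs. The helper name, the tenant names, and the org ids below are hypothetical and are not taken from the dashboard.

```python
def parse_custom_variable(raw: str) -> dict:
    """Map display names to values for a 'key : value, key : value' string.

    Hypothetical helper: only ' : ' (a colon with a space on each side) is
    treated as the key/value separator, so an entry written without the
    spaces is kept whole and used as both display name and value.
    """
    options = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        key, sep, value = entry.partition(" : ")
        if sep:
            options[key.strip()] = value.strip()
        else:
            # No spaced colon found: the entry is not split at all.
            options[entry] = entry
    return options


# Made-up tenant entries: the first is well-formed, the second lacks the spaces.
parsed = parse_custom_variable("community-stage : 123, other-tenant:456")
```

With this input, `community-stage` maps to `123`, while the entry without spaces stays as the single string `other-tenant:456`, which is the kind of breakage the commit fixes.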
Sanne Raymaekers
e607f3b629 dashboards/worker-general: bump version 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
f6acb31dd8 dashboards/worker-general: add community-stage tenant 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
2eea99d008 dashboards/worker-general: min intervals and multi tooltip mode 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
10d2e272a4 dashboards/worker-general: add active worker count 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
95ae8ed917 dashboards/worker-general: fix tenant query 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
ac9f4a2c81 dashboards/worker-general: update schema 2024-04-22 13:05:39 +02:00
Sanne Raymaekers
44426bb48f templates/dashboards: add community stage service to orgs 2024-02-05 11:38:53 +01:00
Sanne Raymaekers
f05a5b59f3 dashboards: drop API section from worker job stats dashboard
Renames the worker dashboard to worker job stats dashboard.

Drops the interval variable and relies solely on $__range and
$__rate_interval.
2023-10-03 11:48:37 +02:00
Sanne Raymaekers
715bdba1bf dashboards/worker: default to showing the past 6 hours
The worker dashboard contains slow queries; running these on 28 days of
data takes a very long time (and they often time out).
2023-08-24 17:01:23 +02:00
Sanne Raymaekers
a2a3a2602c templates/dashboards/worker: add arch label to job wait duration
Display the wait duration of jobs per architecture.
2023-03-21 12:34:09 +01:00
Sanne Raymaekers
b13865d361 templates/dashboards/worker: edit thresholds
95th percentile duration is now a fixed colour, as it's tricky to get
dynamic thresholds based on the job type.

Budget remaining thresholds are now only green at infinity, turn yellow
below 4 weeks, and turn red when budget consumption would only last 3
weeks (out of 4).
2023-03-21 12:34:09 +01:00
Sanne Raymaekers
63d5132aa6 templates/dashboards/worker: change panel alignment
This aligns vertical dividers between panels across rows.
2023-03-21 12:34:09 +01:00
Sanne Raymaekers
865bb98034 templates/dashboards/worker: bump version 2023-03-21 12:34:09 +01:00
Sanne Raymaekers
5a9f8d3457 templates/dashboards/worker: show request throughput per path 2023-03-21 12:34:09 +01:00
Sanne Raymaekers
26a521f54d templates/dashboards/worker: use jobtype variable for job stats
This removes the rows of panels per job type, and uses the jobtype
variable.
2023-03-21 12:34:09 +01:00
Sanne Raymaekers
5d2f84cb9e templates/dashboards/worker: add target duration 2023-03-21 12:34:09 +01:00
Sanne Raymaekers
0b7e94b097 templates/dashboard/worker: add job type variable 2023-03-21 12:34:09 +01:00
Gianluca Zuccarelli
5aae10c951 templates/dashboards: update worker queries
The workers now use a new metric to record all
http requests. This commit updates the worker dashboard
to use the new `image_builder_worker_request_count`
query.
2023-01-09 16:52:16 +01:00
Sanne Raymaekers
b5d1c8866a templates/dashboards: Bump worker dashboard version 2022-09-14 19:43:47 +02:00
Sanne Raymaekers
db978c32bd templates/dashboards: Fix tenant name to org id mapping
The crc stage tenant and fedora stage tenant were mixed up.
2022-09-14 19:43:47 +02:00
Sanne Raymaekers
cb38a92a39 templates/dashboards: Expand job wait duration panels 2022-09-14 19:43:47 +02:00
Gianluca Zuccarelli
1fb6a574cb templates: filter worker dashboard on arch
Add the ability to filter the build job
types by architecture using the `arch`
dropdown.
2022-08-03 13:38:52 +02:00
Sanne Raymaekers
14208d872b templates/dashboards: Add brew tenants
Also:
- Gives tenants a nice display name.
- Makes "All" the default.
2022-08-01 21:45:06 +01:00
Sanne Raymaekers
9347a30775 templates/dashboards: Drop arch from osbuild jobtype
This changed in #2845, and the dashboards stopped working properly as
they were looking for `osbuild+:arch`.

Keep the glob however, to also capture older metrics. The glob can be
removed after 1 month, as that's how long metrics are stored.
2022-08-01 13:37:28 +02:00
Chloe Kaubisch
86971ca312 templates: update dashboards to include tenant
Add a tenant variable to the composer dashboard, with the option
to select multiple tenants. Add tenant filter to queries accordingly.

link to dashboard: https://grafana.stage.devshift.net/d/image-builder-worker-with-tenant/image-builder-worker?orgId=1
2022-07-18 18:55:13 +02:00
Sanne Raymaekers
edcc0866b3 templates/dashboards: Bump dashboard versions
[skip ci]
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
c4d529be5c templates/dashboards: Add thresholds to duration/latency graphs
Show the threshold where we have an SLO target.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
2da910d3e4 templates/dashboards: Bump duration/latency gauges to 95p
This reflects the SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
4eb4894c3a templates/dashboards: Reverse order in duration/latency graphs
In these graphs p99 isn't very important. If 1% of jobs are slow that's
fine. The p50 and p95 slices are the important ones, so reorder and
recolor the duration graphs to reflect this.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
060d3ae85d templates/dashboards: Bump worker latency slo variable to 0.95
This reflects the actual SLO target of 95%.
2022-05-17 19:06:25 +02:00
Sanne Raymaekers
16491149fc templates/dashboards: Reduce the interval
The interval dictates the granularity of the graphs. As the interval
decreases, spikes and dips become more pronounced. 28 days as an
interval doesn't actually show much, so reduce this to 6h by default,
which is a happy medium.
2022-05-17 19:06:25 +02:00
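The effect of the interval on granularity, as described in the message above, can be sketched with plain averaging. This is only an illustration with made-up data; the dashboard itself uses Prometheus queries, and `bucket_averages` is a hypothetical helper standing in for the interval.

```python
def bucket_averages(samples, bucket_size):
    """Average consecutive samples in fixed-size buckets.

    The bucket size plays the role of the dashboard interval: larger
    buckets smooth spikes away, smaller buckets keep them visible.
    """
    averages = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        averages.append(sum(bucket) / len(bucket))
    return averages


# Made-up hourly job durations in minutes, with a single 70-minute spike.
durations = [10] * 11 + [70] + [10] * 12  # 24 samples

coarse = bucket_averages(durations, 24)  # one wide bucket -> [12.5]
fine = bucket_averages(durations, 6)     # four buckets -> [10.0, 20.0, 10.0, 10.0]
```

The single wide bucket barely moves from the baseline, while the narrower buckets show clearly where the spike happened.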
Sanne Raymaekers
eded793788 templates/dashboards: Remove max from build error rate budget
Values over 100% are useful as those actually impact the error budget.
2022-05-17 19:06:25 +02:00
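Why values over 100% matter can be shown with a small sketch. It assumes the error budget is the allowed failure fraction `1 - slo` and that consumption is the observed error rate divided by that budget. The 95% SLO, the job counts, and the function name are illustrative assumptions, not values taken from the dashboard.

```python
def budget_consumed(failed, total, slo=0.95):
    """Return the fraction of the error budget used (1.0 means 100%).

    Assumption: budget = 1 - slo, consumption = error rate / budget.
    """
    if total == 0:
        return 0.0
    allowed_error_rate = 1.0 - slo
    return (failed / total) / allowed_error_rate


# Made-up example: 10 failed builds out of 100 against a 95% SLO.
consumed = budget_consumed(10, 100)  # roughly 2.0, i.e. about 200% of the budget
capped = min(consumed, 1.0)          # a max of 100% would hide the overspend
```

With a maximum in place, the panel would show 100% whether the budget was just exhausted or exceeded twice over, which is the information this commit keeps visible.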
Sanne Raymaekers
c1a44b6813 templates/dashboards: Bump grafana schema version
This makes the following diffs smaller.
2022-05-17 19:06:25 +02:00
Gianluca Zuccarelli
1f2fd8cb76 templates: worker depsolve error display
Fix the display of the depsolve error rate
panel. The panel had an incorrect min value of
3 (or 300%).
2022-03-14 16:11:05 +01:00
Gianluca Zuccarelli
8e8d99336f templates/worker: fix depsolve error rate
The depsolve error rate had the incorrect query
and was returning the error rate for the build
jobs. This has now been fixed.
2022-02-22 19:55:14 +00:00
Gianluca Zuccarelli
e8d7519c7d templates/dashboard: worker metric queries
The Prometheus queries have been updated with
the correct namespace for the job metrics.
Additionally, this commit fixes some of the
queries to add fallback values when the
query results are returned empty.
2022-02-09 14:09:50 +01:00
Gianluca Zuccarelli
dbf396db2b templates/dashboards: worker error metrics
Update the grafana dashboard for the workers
to show information on the success rate for
osbuild and depsolve jobs.
2022-02-07 20:40:37 +01:00
sanne
8a8ed14319 templates/dashboards: Fixed grafana uids
This way we get a nice URL `.../d/image-builder-(composer|worker)`.
2022-01-19 12:27:33 +01:00
Gianluca Zuccarelli
10f34de88b templates: add worker dashboard
Add an initial dashboard for the job metrics.
For now, the dashboard includes graphs and
burn rates for osbuild job duration and depsolve
job duration.
2021-12-15 08:52:52 +00:00