22 commits

Author SHA1 Message Date
Sanne Raymaekers
2837b2a3ad prometheus: split off request timing information into separate mw
Tracks the worker api in addition to the composer api.
2023-06-28 15:08:37 +02:00
Sanne Raymaekers
06038b2af6 internal/prometheus: add tenant to http and status metrics
2023-06-28 15:08:37 +02:00
Sanne Raymaekers
a25e0f4adb prometheus: add arch label to dequeue metrics
Only add the arch label for osbuild job types, as the finish metrics
behave similarly. Having arch labels on dequeue metrics for any other
job type (but not on the finish metrics) would produce weird results.
2023-03-09 18:47:57 +01:00
Jakub Rusz
3cdfa9d7f0 internal/prometheus: add more buckets for job durations
We were hitting the limit on stage, let's increase it.
2023-02-08 12:33:10 +01:00
Gianluca Zuccarelli
25faf5ab60 internal/prometheus: remove compose fail metrics
We have switched how 5xx errors are being recorded
internally and we are now recording all failures
for all endpoints. As a result, a dedicated metric
only for compose failures is no longer required.
2023-01-12 12:55:01 +01:00
Gianluca Zuccarelli
5457b9fba2 metrics: update status metrics label
Openshift overrides the `service` label for
all metrics in the cluster. Update the label
from `service` to `subsystem` for the status
metrics query. This helps us differentiate
between requests from composer and the worker
server.
2022-12-02 09:25:40 +01:00
Gianluca Zuccarelli
8756ea717d prometheus: middleware to record 5xx errors
Create a custom middleware function
to measure 5xx requests for all composer
& worker routes and not just the `/composer`
endpoint. The result is a prometheus metric
that contains info on the request status code,
path & method.

A helper function has been added to clean the
dynamic parameters in the path routes to reduce
metric cardinality.
2022-11-30 11:14:29 +01:00
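The path-cleaning helper described in 8756ea717d can be pictured with a small Go sketch. It is an illustration under assumptions, not the code from the commit: the name `cleanPath`, the `:id` placeholder, the example route, and the premise that the dynamic parameters are UUIDs are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidSegment matches a path segment that starts with a canonical UUID.
// Assumption: the dynamic route parameters are UUIDs; real routes may differ.
var uuidSegment = regexp.MustCompile(
	`/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`)

// cleanPath (hypothetical name) replaces every UUID segment with a fixed
// placeholder so that all requests to one route share one label value.
func cleanPath(path string) string {
	return uuidSegment.ReplaceAllString(path, "/:id")
}

func main() {
	// Hypothetical route, used only to show the effect.
	fmt.Println(cleanPath("/api/v1/composes/123e4567-e89b-12d3-a456-426614174000/metadata"))
	// prints "/api/v1/composes/:id/metadata"
}
```

Collapsing every ID into one placeholder keeps the number of distinct path label values bounded by the number of routes rather than by the number of requests.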
Gianluca Zuccarelli
33e53398a6 prometheus: add status metrics
Add a helper function to register the same metrics
for both the worker and composer - the only difference
being the subsystem name. The function checks if the
metric has already been registered and, if so, returns
the already registered metric.
2022-11-30 11:14:29 +01:00
Gianluca Zuccarelli
8e82b223af prometheus: move constants to a single file
Move the constants to a single file and export them.
They can later be used externally with the ocm metrics.
2022-11-30 11:14:29 +01:00
Gianluca Zuccarelli
9f4e765657 metrics: build jobs arch label
Add the architecture label to build jobs
which will enable filtering and monitoring
build jobs by architecture. Build job results
contain the `arch` field in the results struct,
this is then used to pass to the metrics, where
there is a value, otherwise it is set to an
empty string.
2022-07-27 13:37:14 +02:00
Chloe Kaubisch
873798514b prometheus: add tenant label
Include a tenant label for all prometheus metrics. Modify the
jobstatus function in the worker to return the channel so it
can be passed to prometheus.
2022-06-07 16:35:03 +02:00
Tom Gundersen
4eeaebd40b prometheus/job: measure time spent pending rather than queued
We are interested in the time from when a job could be dequeued until
it actually is, but a job whose dependencies are not yet finished
cannot be dequeued.

Change the logic to measure the time since the last dependency was
dequeued rather than when the job was queued.

The purpose of this metric is to have an alert fire in case we have too
few workers processing jobs.
2022-05-14 17:47:38 +01:00
Gianluca Zuccarelli
80f24dbd61 metrics: change job metrics namespace
Currently the job metrics are namespaced with the composer
subsystem, i.e. `composer_worker`. Since we plan to split
the components to their own namespaces in app interface,
the worker subsystem should be split too.
2022-02-08 15:57:12 +01:00
Gianluca Zuccarelli
290472dfdf metrics: add worker error metrics
This commit introduces the collection of error
metrics since it is now possible to differentiate
between internal errors and user input errors.
Additionally, the error status is reported for
job duration metrics.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
bce12b7bea metrics: extract metric collection
Refactor the current metric collection to make use
of re-usable functions, since some of the same queries
are repeated. This will also make it easier to move
the collection of metrics from the job queue.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
e165db63ea metrics: add additional buckets
The change between the 32s bucket and the 64s bucket is too drastic
for measuring the duration of depsolve jobs. At present, 90% of the
depsolve jobs have a duration between 32s and 64s, making the 32s
bucket too sensitive and the 64s bucket not sensitive enough.
2021-12-15 19:53:11 +00:00
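The idea behind e165db63ea, adding finer boundaries where most observations fall, can be sketched in Go. The commit message does not state which buckets were added, so the helper name, the evenly spaced boundaries, and the example values below are assumptions for illustration only.

```go
package main

import "fmt"

// withIntermediate (hypothetical helper) returns a copy of the sorted bucket
// bounds with n evenly spaced extra bounds inserted between lo and hi.
// lo is expected to be one of the existing bounds.
func withIntermediate(bounds []float64, lo, hi float64, n int) []float64 {
	out := make([]float64, 0, len(bounds)+n)
	step := (hi - lo) / float64(n+1)
	for _, b := range bounds {
		out = append(out, b)
		if b == lo {
			for i := 1; i <= n; i++ {
				out = append(out, lo+step*float64(i))
			}
		}
	}
	return out
}

func main() {
	// Assumed power-of-two bounds in seconds; the real list may differ.
	bounds := []float64{16, 32, 64, 128}
	fmt.Println(withIntermediate(bounds, 32, 64, 3))
	// prints "[16 32 40 48 56 64 128]"
}
```

With extra boundaries inside the 32s to 64s range, a quantile estimate for depsolve durations no longer has to interpolate across the whole interval.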
Gianluca Zuccarelli
1a709eda5c metrics: add initial job metrics
Add job metrics to track the number of
pending/running jobs, the duration of
the jobs and how long the jobs spent in
the job queue.
2021-12-08 21:49:43 +00:00
Gianluca Zuccarelli
91f2457363 metrics: add prometheus namespaces
Make use of the prometheus namespace and subsystem
to give the metrics consistent namespaces in openshift.
2021-11-19 22:48:25 +01:00
Gianluca Zuccarelli
f8199ec41d prometheus: add middleware function
Add middleware function to track request count
and measure the latency of compose requests.
2021-10-29 20:36:18 +01:00
Gianluca Zuccarelli
dfa6a48f5d prometheus: compose latency metric
Add metric to measure the latency
of requests made to the composer
cloud api.
2021-10-29 20:36:18 +01:00
Chloe Kaubisch
f749078b0d prometheus: update metrics
Change the name of total https requests to be more specific.
Add a new counter for failed compose requests.
2021-10-29 17:09:45 +01:00
Chloe Kaubisch
4c800f29a7 worker: add metrics
Use prometheus to gather metrics.
2021-07-23 21:54:28 +02:00