Commit graph

27 commits

Author SHA1 Message Date
Sanne Raymaekers
c4360a67f5 jobqueue: handle escaped null bytes in postgres
Postgres doesn't accept `\u0000` in the jsonb datatype. Switch to the
json datatype which is larger and slower, but accepts escaped null
bytes.

As we don't actually query or index the result jsonb directly, the
impact of this should be minimal.

See: https://www.postgresql.org/docs/current/datatype-json.html
2025-07-25 13:10:10 +02:00
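To illustrate where an escaped null byte can come from, here is a minimal sketch using Go's `encoding/json`. The `jobResult` type and the helper names are hypothetical and not taken from the jobqueue code; the only point is that a raw `0x00` byte in a string is serialized as the `\u0000` escape, which is the sequence the commit message says `jsonb` rejects.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// jobResult is a hypothetical result payload, used only for illustration.
type jobResult struct {
	Log string `json:"log"`
}

// encodeResult serializes a result to JSON text, as it might be done
// before the document is written to a json or jsonb column.
func encodeResult(r jobResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// hasEscapedNull reports whether the JSON text contains the \u0000 escape.
// This is a plain substring check, good enough for a demonstration only.
func hasEscapedNull(doc string) bool {
	return strings.Contains(doc, `\u0000`)
}

func main() {
	doc, err := encodeResult(jobResult{Log: "build output\x00trailing"})
	if err != nil {
		panic(err)
	}
	fmt.Println(doc)
	fmt.Println(hasEscapedNull(doc))
}
```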
Brian C. Lane
c06064c1e2 dbjobqueue: Add DeleteJob to database job queue
This adds SQL to delete jobs and dependencies, and implements the
database version of the DeleteJob function.

Related: RHEL-60120
2025-06-05 10:32:56 +02:00
Brian C. Lane
5cddc4223d dbjobqueue: Add AllRootJobIDs implementation
Related: RHEL-60120
2025-06-05 10:32:56 +02:00
Brian C. Lane
d8285a0b74 jobqueue: Add DeleteJob function
This allows jobs to be deleted from the database.
Currently only implemented by fsjobqueue. The function for
dbjobqueue currently returns nil.

This will remove all the job files used by the root job UUID as long as
no other job depends on them, i.e. it starts at the top and moves down
the dependency tree until it finds a job that is also used by another
job, removes the job to be deleted from its dependants list, and moves
back up the tree, deleting only jobs with empty dependants lists.

Related: RHEL-60120
2025-06-05 10:32:56 +02:00
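The pruning rule described in this commit message can be sketched on an in-memory graph. This is a simplified variant, not the fsjobqueue implementation: the `job` and `queue` types, the job names, and the top-down recursion are all assumptions made for the example. It deletes a job only when nothing depends on it, then unlinks it from each dependency and tries to delete that dependency in turn.

```go
package main

import "fmt"

// job is a hypothetical in-memory stand-in for a stored job.
type job struct {
	dependencies []string // jobs this job needs
	dependants   []string // jobs that need this job
}

// queue maps job IDs to jobs.
type queue map[string]*job

// without returns a copy of list with id removed.
func without(list []string, id string) []string {
	var out []string
	for _, v := range list {
		if v != id {
			out = append(out, v)
		}
	}
	return out
}

// deleteJob removes id and every dependency that no other job still uses.
// A job that still has dependants is left in place.
func (q queue) deleteJob(id string) {
	j, ok := q[id]
	if !ok || len(j.dependants) > 0 {
		return
	}
	delete(q, id)
	for _, dep := range j.dependencies {
		if d, ok := q[dep]; ok {
			d.dependants = without(d.dependants, id)
			q.deleteJob(dep)
		}
	}
}

// exampleQueue builds two root jobs that share one depsolve job.
func exampleQueue() queue {
	return queue{
		"root-a":   {dependencies: []string{"build-a"}},
		"root-b":   {dependencies: []string{"build-b"}},
		"build-a":  {dependencies: []string{"depsolve"}, dependants: []string{"root-a"}},
		"build-b":  {dependencies: []string{"depsolve"}, dependants: []string{"root-b"}},
		"depsolve": {dependants: []string{"build-a", "build-b"}},
	}
}

func main() {
	q := exampleQueue()
	q.deleteJob("root-a")
	fmt.Println(len(q), q["depsolve"].dependants) // 3 [build-b]
	q.deleteJob("root-b")
	fmt.Println(len(q)) // 0
}
```

In this sketch the shared `depsolve` job survives the first deletion because `build-b` still depends on it, and disappears only once the second root job is deleted.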
Brian C. Lane
87c0462a33 jobqueue: Add AllRootJobIDs function to jobqueue
This lists the root job UUIDs (the jobs with no dependants).
Currently only implemented by fsjobqueue. The function for
dbjobqueue currently returns nil.

Related: RHEL-60120
2025-02-03 17:27:31 -08:00
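As a small illustration of what "root job" means here, the following sketch filters a dependants map for jobs that nothing depends on. The map layout and the function name `rootJobIDs` are assumptions for the example and do not reflect how either queue backend stores its data.

```go
package main

import (
	"fmt"
	"sort"
)

// rootJobIDs returns the IDs of all jobs with no dependants.
// dependants maps a job ID to the IDs of the jobs that depend on it.
func rootJobIDs(dependants map[string][]string) []string {
	var roots []string
	for id, deps := range dependants {
		if len(deps) == 0 {
			roots = append(roots, id)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(roots)
	return roots
}

func main() {
	roots := rootJobIDs(map[string][]string{
		"depsolve": {"build"},
		"build":    {"upload"},
		"upload":   nil,
	})
	fmt.Println(roots) // [upload]
}
```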
Florian Schüller
d3e3474fb7 internal/worker/server: return an error on depsolve timeout HMS-2989
Fixes the special case where no worker is available, we generate an
internal timeout and cancel the depsolve including all follow-up jobs,
but no error is propagated.
2024-11-19 13:55:38 +01:00
Sanne Raymaekers
056b3c5ea6 jobqueue: return if a job was requeued or not
2024-11-07 17:18:48 +01:00
Sanne Raymaekers
09445a1030 dbjobqueue: correct error wrapping
Preserve context.DeadlineExceeded errors through correct error
wrapping. This will reduce error-level logging noise in the worker.
2024-07-31 13:34:13 +02:00
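A minimal sketch of the wrapping pattern this commit message refers to, assuming a hypothetical `dequeue` function that fails when its context expires. Wrapping with `%w` keeps `context.DeadlineExceeded` in the error chain, so a caller can still detect it with `errors.Is` and, for example, log it at a lower level.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// dequeue is a hypothetical queue call that blocks until ctx is done.
func dequeue(ctx context.Context) error {
	<-ctx.Done()
	// %w keeps ctx.Err() in the error chain; %v would flatten it to text.
	return fmt.Errorf("error dequeuing job: %w", ctx.Err())
}

// dequeueWithTimeout calls dequeue with a deadline of d from now.
func dequeueWithTimeout(d time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()
	return dequeue(ctx)
}

// timedOut reports whether err was caused by an expired deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	err := dequeueWithTimeout(10 * time.Millisecond)
	fmt.Println(err)
	fmt.Println(timedOut(err)) // true
}
```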
Sanne Raymaekers
1b4935c325 jobqueue: add channel to workers
Stores the channel alongside the worker.
2024-04-19 14:32:07 +02:00
Sanne Raymaekers
de548c36f3 pkg/jobqueue: fix worker status update query
The workers table should be updated, not the heartbeats. Currently every
worker is re-registering every minute.
2024-02-02 15:24:57 +01:00
Sanne Raymaekers
ac854b7cc8 pkg/jobqueue: add arch to worker
2023-12-14 21:25:32 +01:00
Sanne Raymaekers
d784075d31 jobqueue: add ability to track workers
2023-12-06 17:22:36 +01:00
Ondřej Budai
7edbaf6b43 dbjobqueue: put all SQL queries in dequeueMaybe into a transaction
This is needed to ensure atomicity of the whole dequeue operation.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2023-04-14 16:37:04 +02:00
Ondřej Budai
c3f6baad7f dbjobqueue: put all SQL queries into dequeueMaybe
Let's move all SQL queries together. In the following commit, we will actually
put all of them into a transaction in order to ensure atomicity.

This isn't a functional change, just code shuffling.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2023-04-14 16:37:04 +02:00
Ondřej Budai
464ce568b2 dbjobqueue: put all DequeueByID queries into a transaction
If inserting a heartbeat or querying dependencies fails, we don't want to
actually dequeue the job from the database.

The failures may be:
- context timeout/cancellation
- network issues

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2023-04-14 16:37:04 +02:00
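The all-or-nothing behaviour described in this commit message can be sketched without a database. Everything below is hypothetical: the `tx` interface, the step names, and `fakeTx` are stand-ins invented for the example, and a real queue would use the transaction type of its database driver. The sketch only shows that a failure in a later step rolls back the earlier ones.

```go
package main

import (
	"errors"
	"fmt"
)

// tx is a hypothetical, minimal transaction interface.
type tx interface {
	Exec(step string) error
	Commit() error
	Rollback() error
}

// dequeueByID runs every step inside one transaction, so a failure in a
// later step also undoes the earlier ones.
func dequeueByID(t tx) (err error) {
	defer func() {
		if err != nil {
			_ = t.Rollback()
		}
	}()
	steps := []string{"mark job as started", "insert heartbeat", "query dependencies"}
	for _, step := range steps {
		if err = t.Exec(step); err != nil {
			return fmt.Errorf("%s: %w", step, err)
		}
	}
	return t.Commit()
}

// fakeTx records applied steps and fails on the step named in failOn.
type fakeTx struct {
	failOn    string
	applied   []string
	committed bool
}

func (f *fakeTx) Exec(step string) error {
	if step == f.failOn {
		return errors.New("connection lost")
	}
	f.applied = append(f.applied, step)
	return nil
}

func (f *fakeTx) Commit() error   { f.committed = true; return nil }
func (f *fakeTx) Rollback() error { f.applied = nil; return nil }

func main() {
	bad := &fakeTx{failOn: "insert heartbeat"}
	fmt.Println(dequeueByID(bad))
	fmt.Println(bad.committed, len(bad.applied)) // false 0
}
```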
Ondřej Budai
571b959cc1 dbjobqueue: make jobDependencies and dependants accept transactions
We will need this in following commits in order to make dequeuing atomic.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2023-04-14 16:37:04 +02:00
Sanne Raymaekers
0a9cf9b6a7 dbjobqueue: check context errors after trying to dequeue
This fixes a race condition where the context might have been canceled
or timed out in between the preliminary check and trying to dequeue, and
consequently returning the wrong error.

Instead of doing the preliminary check, just check for the context
errors when trying to dequeue.
2022-11-08 07:37:32 -05:00
Sanne Raymaekers
26b8e2ff6e dbjobqueue: acquire a new connection for each listen query
This fixes a bug where the listen function would keep trying to use a
closed, unrecoverable connection to listen for a notification. This
continued failing, which essentially made the queue instance useless.
2022-11-08 07:37:32 -05:00
Tom Gundersen
626530818d worker/server: requeue unresponsive jobs
If a job is unresponsive the worker has most likely crashed or been shut
down and the in-progress job has been lost.

Instead of failing these jobs, requeue them up to two times. Once a job is lost
a third time it fails. This avoids infinite loops.

This is implemented by extending FinishJob to RequeueOrFinishJob. It takes a
max number of requeues as an argument, and if that is 0, it has the same
behavior as FinishJob used to have.

If the maximum number of requeues has not yet been reached, then the running
job is returned to pending state to be picked up again.
2022-11-02 15:26:00 +01:00
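The requeue rule above can be sketched as a small state transition. The `job` type, the state names, and the lowercase `requeueOrFinish` function are hypothetical and only model the behaviour described in the commit message: with a maximum of two requeues a job fails the third time it is lost, and with a maximum of zero it fails immediately.

```go
package main

import "fmt"

type state int

const (
	pending state = iota
	running
	failed
)

// job is a hypothetical in-memory job with a requeue counter.
type job struct {
	state   state
	retries int
}

// requeueOrFinish returns the job to pending if it has been requeued fewer
// than maxRetries times, and marks it failed otherwise. It reports whether
// the job was requeued.
func requeueOrFinish(j *job, maxRetries int) bool {
	if j.retries >= maxRetries {
		j.state = failed
		return false
	}
	j.retries++
	j.state = pending
	return true
}

func main() {
	j := &job{state: running}
	for lost := 1; lost <= 3; lost++ {
		requeued := requeueOrFinish(j, 2)
		fmt.Println(lost, requeued)
		if requeued {
			j.state = running // picked up again by a worker
		}
	}
	fmt.Println(j.state == failed) // true
}
```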
Sanne Raymaekers
8fcbeaadb3 dbjobqueue: Backoff after listener error
The connection can be interrupted for short periods of time, which is
fine. To avoid spamming the logs with error messages, back off for half
a second.
2022-09-30 16:38:43 +02:00
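A minimal sketch of a fixed backoff after a listener error, assuming a hypothetical `listen` callback that fails a few times before succeeding. The function names and the retry-until-success loop are illustrative only; the half-second pause in `main` is the value named in the commit message.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// listenLoop calls listen until it succeeds or ctx is done, pausing for
// backoff after each failure instead of retrying (and logging) in a tight loop.
func listenLoop(ctx context.Context, listen func() error, backoff time.Duration) (attempts int, err error) {
	for {
		attempts++
		if err = listen(); err == nil {
			return attempts, nil
		}
		select {
		case <-ctx.Done():
			return attempts, ctx.Err()
		case <-time.After(backoff):
		}
	}
}

// flakyListener returns a listen function that fails the given number of
// times before succeeding. It exists only to drive the example.
func flakyListener(failures int) func() error {
	return func() error {
		if failures > 0 {
			failures--
			return errors.New("connection interrupted")
		}
		return nil
	}
}

// runDemo wires the pieces together with a background context.
func runDemo(failures int, backoff time.Duration) (int, error) {
	return listenLoop(context.Background(), flakyListener(failures), backoff)
}

func main() {
	attempts, err := runDemo(2, 500*time.Millisecond)
	fmt.Println(attempts, err) // 3 <nil>
}
```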
Gianluca Zuccarelli
5a4d22cc6d pkg/dbjobqueue: fix enqueue/dequeue race condition
Currently there is a race condition that occurs between the
dbjobqueue enqueue and dequeue functions. Both queries make
use of the postgres `now()` timestamp function which returns
the timestamp of when the transaction started and not when
the statement is executed. As a result, a job's `started_at`
timestamp can be earlier than its `queued_at` timestamp,
causing a constraint violation.

Since the dequeue query will never be executed before the
enqueue query, changing the postgres timestamp function to
`statement_timestamp()` resolves this issue.
2022-09-14 12:44:46 +02:00
Lukas Zapletal
a8afca4634 Introduce logging adapter for jobqueue
2022-09-09 16:27:38 +02:00
Lukas Zapletal
b03a131f13 dbjobqueue: use background context when closing listener
2022-09-02 12:52:50 +02:00
Sanne Raymaekers
0fe3f1b2ae jobqueue: Query job dependents
2022-08-30 16:14:52 +02:00
Ondřej Budai
9def545570 dbjobqueue: fix bad errors.As usages
errors.As is meant to check whether err (or other error in its chain) can
be assigned to the value that target is pointing at.

Let's consider this example:

errors.As(err, &pgx.ErrNoRows)

pgx.ErrNoRows (and pgx.ErrTxClosed) is typed as error, thus in all
errors.As calls, the target is typed as *error. Err is always an error.
So this call is basically asking whether error can be assigned to error.
If err != nil, this is always true, thus this check doesn't make any sense
over a plain err != nil.

Go 1.19 now checks this issue and if it's found, it refuses to compile the
code, see:

https://go-review.googlesource.com/c/tools/+/339889

This commit changes usages of errors.As() to errors.Is(). The Is() method
doesn't check assignability but equality (the only difference between Is()
and a plain old == operator is that Is() also inspects the whole error chain).

This fixes the check because now, we are basically checking if err (or
any other error in its chain) == pgx.ErrTxClosed which is exactly what we
want.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-07-27 18:29:59 +02:00
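The difference between `==` and `errors.Is` for a sentinel error can be shown with the standard library alone. In this sketch `errNoRows` is a hypothetical sentinel standing in for `pgx.ErrNoRows`, and `queryJob` is an invented function that wraps it; neither is taken from the dbjobqueue code.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRows is a hypothetical sentinel standing in for pgx.ErrNoRows.
var errNoRows = errors.New("no rows in result set")

// queryJob is an invented lookup that always fails with a wrapped sentinel.
func queryJob(id string) error {
	return fmt.Errorf("querying job %s: %w", id, errNoRows)
}

// isNoRows reports whether err, or any error in its chain, is errNoRows.
func isNoRows(err error) bool {
	return errors.Is(err, errNoRows)
}

func main() {
	err := queryJob("1234")
	fmt.Println(err == errNoRows)                           // false: the wrapper is a different value
	fmt.Println(isNoRows(err))                              // true: Is walks the error chain
	fmt.Println(isNoRows(errors.New("connection refused"))) // false
}
```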
Simon de Vlieger
78ae275c61 jobqueue: store an expiry date
This introduces an expiry date (default: 14 days from insert date) and
adjusts the service-maintenance script to delete jobs that are older than
the expiration date.
2022-07-13 17:26:04 +02:00
Sanne Raymaekers
03b57f002c jobqueue: Move jobqueue out of internal
2022-07-04 15:37:28 +02:00