Migration 9 alters the result column in the jobs table is relied on for
compose statuses. Because it has to be kept consistent across
migrations, add a test to verify this.
As a side effect, the test itself handles the migration now, so remove
that part from the tests GHA.
Postgres doesn't accept `\u0000` in the jsonb datatype. Switch to the
json datatype which is larger and slower, but accepts escaped null
bytes.
As we don't actually query or index the result jsonb directly, the
impact of this should be minimal.
See: https://www.postgresql.org/docs/current/datatype-json.html
This adds tests for retrieving all root jobs, and deleting jobs
and unused dependencies. These tests are run against the fsjobqueue for
unit testing, and against dbjobqueue for integration testing.
Resolves: RHEL-60120
I got confused as the jobqueue interface is asymmetric.
It expects an object and returns a json.RawMessage
and when handing over to postgres this is abstracted
away by postgres
Fixes the special case that if no worker is available and we
generate an internal timeout and cancel the depsolve including all
followup jobs, no error was propagated.
If a job is unresponsive the worker has most likely crashed or been shut
down and the in-progress job been lost.
Instead of failing these jobs, requeue them up to two times. Once a job is lost
a third time it fails. This avoids infinite loops.
This is implemented by extending FinishJob to RequeuOrFinish job. It takes a
max number of requeues as an argument, and if that is 0, it has the same
behavior as FinishJob used to have.
If the maximum number of requeues has not yet been reached, then the running
job is returned to pending state to be picked up again.
Include a tenant label for all prometheus metrics. Modify
jobstatus function in the worker accordingly to return channel
so it can be passed to prometheus.
Previously, all dequeuers (goroutines waiting for a job to be dequeued) were
listening for new messages on postgres channel jobs (LISTEN jobs). This didn't
scale well as each dequeuer required to have its own DB connection and the
number of DB connections is hard-limited in the pool's config.
I changed the logic to work somewhat differently: dbjobqueue.New() now spawns
a goroutine that listens on the postgres channel. If there's a new message,
the goroutine just wakes up all dequeuers using a standard go channel.
Go channels are cheap so this should scale much better.
A test was added that confirms that 100 dequeuers are not a big deal now. This
test failed when I tried to run on it on the previous commit. I tried even 1000
locally and it was still fine.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
jobqueue.Job must return the channel specified in jobqueue.Enqueue during
the whole lifecycle of the given job.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.
Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.
Thus, this is a non-functional change.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.
Additionally, catch a few more error cases:
- if OSBuildResult is nil, then we failed to invoke osbuild
- make sure the same JobResult handling is done for osbuild-koji, as for osbuild
We will soon need to dequeue a job using its ID. This commit adds ability
to do that to the Jobqueue interface. As always, the fsjobqueue implementation
is slightly naive but it should fine for the usecases that it's designed for.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Previously, we deleted empty channels when a job was dequeued. This is
completely wrong because there still might be some clients waiting for
a job. This commit removes the cleanup and adds a regression test.
Note that this has the potential to leak memory if we ever use a lot of
job types. Currently, we have just handful of them, so this is fine.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
This is backwards compatible, as long as the timeout is 0 (never
timeout), which is the default.
In case of the dbjobqueue the underlying timeout is due to
context.Canceled, context.DeadlineExceeded, or net.Error with Timeout()
true. For the fsjobqueue only the first two are considered.
fsjobqueue_test contained tests that are generically testing the
JobQueue interface. Split those out into its own package `jobqueuetest`.
These tests will be useful when implementing a new package that conforms
to the JobQueue interface.