148 commits

Author SHA1 Message Date
Eng Zer Jun
00ea3eb285 test: use T.TempDir to create temporary test directory
The directory created by `T.TempDir` is automatically removed when the
test and all its subtests complete.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2022-04-05 09:27:43 +02:00
Ondřej Budai
e9ce9370c6 dbjobqueue: actually use the transaction object when a tx is created
Transactions are tied to a connection so this is actually not a functional
change. Nevertheless, I think it's nice to explicitly state that we are
using a transaction.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-22 17:49:22 +01:00
Ondřej Budai
187eb188da dbjobqueue: wait for listener to become ready before returning from New
Otherwise, there might be an already waiting dequeuer and if something is
enqueued before `sqlListen` is called, we will lose this notification.

Also, a small log message was added when shutting down the listener.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-22 17:49:22 +01:00
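The readiness handshake described in the commit above can be sketched with a plain Go channel. This is an illustration under assumptions, not the dbjobqueue code: the `listener` type and its methods are hypothetical, and setting `subscribed` stands in for the real `sqlListen` call.

```go
package main

import "sync"

// listener is a hypothetical stand-in for the goroutine that subscribes
// to database notifications.
type listener struct {
	mu         sync.Mutex
	subscribed bool
	done       chan struct{}
}

// newListener starts the listening goroutine and returns only after the
// goroutine has reported that its subscription is in place.
func newListener() *listener {
	l := &listener{done: make(chan struct{})}
	ready := make(chan struct{})
	go l.run(ready)
	<-ready // block until the subscription exists
	return l
}

func (l *listener) run(ready chan<- struct{}) {
	l.mu.Lock()
	l.subscribed = true // stands in for the real subscribe call
	l.mu.Unlock()

	close(ready) // signal the constructor that it may return
	<-l.done     // stay alive until shutdown
}

// isSubscribed reports whether the subscription has been set up.
func (l *listener) isSubscribed() bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.subscribed
}

// close shuts the listener goroutine down.
func (l *listener) close() { close(l.done) }
```

Because the constructor blocks on `ready`, anything enqueued after it returns is guaranteed to happen after the subscription was made.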
Ondřej Budai
c4c7f44fcb dbjobqueue: reimplement the jobqueue to use only one listening connection
Previously, all dequeuers (goroutines waiting for a job to be dequeued) were
listening for new messages on the postgres channel jobs (LISTEN jobs). This didn't
scale well as each dequeuer required its own DB connection and the
number of DB connections is hard-limited in the pool's config.

I changed the logic to work somewhat differently: dbjobqueue.New() now spawns
a goroutine that listens on the postgres channel. If there's a new message,
the goroutine just wakes up all dequeuers using a standard go channel.
Go channels are cheap so this should scale much better.

A test was added that confirms that 100 dequeuers are not a big deal now. This
test failed when I tried to run it on the previous commit. I tried even 1000
locally and it was still fine.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 16:04:52 +01:00
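One way to picture the design in the commit above is a small fan-out: a single goroutine receives the database notification and wakes every waiting dequeuer through Go channels. The `notifier` type below is a hypothetical sketch and is not the structure used in dbjobqueue.

```go
package main

import "sync"

// notifier fans a single wake-up signal out to all registered dequeuers.
type notifier struct {
	mu      sync.Mutex
	waiters map[chan struct{}]struct{}
}

func newNotifier() *notifier {
	return &notifier{waiters: make(map[chan struct{}]struct{})}
}

// subscribe registers a dequeuer and returns its wake-up channel.
func (n *notifier) subscribe() chan struct{} {
	// Buffered so that a wake-up is kept while the dequeuer is busy.
	c := make(chan struct{}, 1)
	n.mu.Lock()
	n.waiters[c] = struct{}{}
	n.mu.Unlock()
	return c
}

// unsubscribe removes a dequeuer that no longer waits for jobs.
func (n *notifier) unsubscribe(c chan struct{}) {
	n.mu.Lock()
	delete(n.waiters, c)
	n.mu.Unlock()
}

// notifyAll wakes every dequeuer. In this sketch it would be called by
// the single listening goroutine once per database notification.
func (n *notifier) notifyAll() {
	n.mu.Lock()
	defer n.mu.Unlock()
	for c := range n.waiters {
		select {
		case c <- struct{}{}:
		default: // a wake-up is already pending for this dequeuer
		}
	}
}
```

The non-blocking send keeps the listener from stalling on a slow dequeuer; a woken dequeuer is expected to re-check the queue itself, so coalescing several wake-ups into one loses nothing.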
Ondřej Budai
c8dbe0de74 dbjobqueue: remove unused variables from Dequeue
Removing queued_at and started_at is pretty straightforward, it wasn't needed.
Removing token might seem concerning but basically we were just pulling
the same value from DB as we were pushing there. I think there's no value in
doing that.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-11 16:04:52 +01:00
Ondřej Budai
2765d2d9a8 jobqueuetest: add a test for multiple channels
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
32080e6202 jobqueuetest: modify testArgs to test also channels
jobqueue.Job must return the channel specified in jobqueue.Enqueue during
the whole lifecycle of the given job.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
4c31b04a65 jobqueuetest: add channel arg to the pushTestJob helper
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
Ondřej Budai
7bfcee36f8 jobqueue: introduce the concept of channels
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be one of the
specified channels. Of course, the job types are also respected. The dequeued
job's type will also always be one of the specified types.

Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.

Thus, this is a non-functional change.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-03-08 12:07:00 +01:00
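The matching rule from the commit above (a job qualifies only if both its type and its channel are in the caller's lists) can be sketched as follows. The `job` struct and the `dequeue` signature are hypothetical simplifications and do not reproduce the JobQueue interface.

```go
package main

// job is a hypothetical, simplified view of a queued job.
type job struct {
	ID      int
	Type    string
	Channel string
}

// contains reports whether s is an element of list.
func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// dequeue returns the first pending job whose type is in jobTypes and
// whose channel is in channels, together with the remaining jobs.
// The boolean is false when no pending job satisfies both criteria.
func dequeue(pending []job, jobTypes, channels []string) (job, []job, bool) {
	for i, j := range pending {
		if contains(jobTypes, j.Type) && contains(channels, j.Channel) {
			rest := append(append([]job{}, pending[:i]...), pending[i+1:]...)
			return j, rest, true
		}
	}
	return job{}, pending, false
}
```

Passing `[]string{""}` as the channel list mirrors the non-functional migration described above, where every caller uses the empty channel name.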
Ondřej Budai
9c80a17ee5 fsjobqueue: refactor to allow dequeuing by multiple criteria
The previous implementation of fsjobqueue is amazing but it has its drawbacks:
- dequeueing can be done only based on a job type
- it's limited to 100 jobs per job type

As we soon want to be able to dequeue by other criteria as well (job channel),
we need to refactor the queue.

The new implementation is more naive but also more flexible. It basically
works like the dbjobqueue - dequeueing goroutines listen for newly added
jobs. When that happens, a signal is sent to all of them and they all inspect
all pending jobs and dequeue ones that match their needs. Those that don't find
a suitable job wait for the next signal.

This is certainly a slower implementation as every time a new job is added into
the queue, all dequeueing goroutines will have to iterate over all
pending jobs. I think that's fine because fsjobqueue is not recommended
for composer instances with heavy load.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2022-02-16 17:14:36 +01:00
Gianluca Zuccarelli
6c4caec022 metrics: move metrics to worker server
For simplicity, the collection of the job metrics
was carried out in the job queue. This was only
being done in the dbqueue and not in the fsqueue.
This PR refactors the metric collection and moves
the job metrics to the worker server, by adding a
wrapper function to enqueueing jobs so that the
metrics only have to be recorded in one place when
queueing a job.
2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli
bce12b7bea metrics: extract metric collection
Refactor the current metric collection to make use
of re-usable functions, since some of the same queries
are repeated. This will also make it easier to move
the collection of metrics from the job queue.
2022-02-03 23:40:42 +00:00
Tom Gundersen
b32ab36e1d worker/server: typesafe Job and JobStatus
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.

Additionally, catch a few more error cases:
 - if OSBuildResult is nil, then we failed to invoke osbuild
 - make sure the same JobResult handling is done for osbuild-koji, as for osbuild
2022-02-01 20:28:40 +00:00
Ondřej Budai
ab3990b90a dbjobqueue: fix FinishJob not returning an error if already finished
Reported by covscan

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-12-18 00:14:07 +00:00
Gianluca Zuccarelli
1a709eda5c metrics: add initial job metrics
Add job metrics to track the number of
pending/running jobs, the duration of
the jobs and how long the jobs spent in
the job queue.
2021-12-08 21:49:43 +00:00
sanne
c43ad2b22a osbuild-service-maintenance: Clean up expired images
2021-12-03 00:14:09 +00:00
Ondřej Budai
14b29ae98a dbjobqueue: don't log when context's deadline was exceeded
This happens rather often as we limit the request job timeout to 20s on the
service.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-25 08:20:22 +01:00
Diaa Sami
df73b835c3 jobqueue: improve logging
Add job ID where it's missing
2021-11-16 19:16:34 +01:00
Diaa Sami
9c6438c8f4 jobqueue: include dependent job IDs when logging
2021-11-16 19:16:34 +01:00
Ondřej Budai
d3a3dbafed jobqueue: add DequeueByID
We will soon need to dequeue a job using its ID. This commit adds ability
to do that to the Jobqueue interface. As always, the fsjobqueue implementation
is slightly naive but it should be fine for the use cases that it's designed for.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-14 10:17:03 +01:00
Ondřej Budai
2ecc48727f fsjobqueue: factor out finished deps check
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-14 10:17:03 +01:00
Ondřej Budai
5f4db72777 fsjobqueue: do not delete empty channels
Previously, we deleted empty channels when a job was dequeued. This is
completely wrong because there still might be some clients waiting for
a job. This commit removes the cleanup and adds a regression test.

Note that this has the potential to leak memory if we ever use a lot of
job types. Currently, we have just a handful of them, so this is fine.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2021-11-14 10:17:03 +01:00
Diaa Sami
751ef84fc1 jobqueue: Better logging
Use appropriate logging levels for different statements
Log important jobqueue events with relevant information
2021-10-19 10:03:39 +01:00
sanne
d25ae71fef worker: Configurable timeout for RequestJob
This is backwards compatible, as long as the timeout is 0 (never
timeout), which is the default.

In case of the dbjobqueue the underlying timeout is due to
context.Canceled, context.DeadlineExceeded, or net.Error with Timeout()
true. For the fsjobqueue only the first two are considered.
2021-10-19 00:12:18 +01:00
sanne
9fab5def90 dbjobqueue: Reduce error noise in rollback check
If the transaction is already closed don't log the rollback failure as
an error, it means it was successfully committed.
2021-08-20 15:42:57 +02:00
Lars Karlitski
9c2c92f729 jobqueue: Introduce jobqueue backed by a postgres database
Co-authored-by: sanne <sanne.raymaekers@gmail.com>
2021-07-28 21:52:31 +01:00
Lars Karlitski
871c6e9cbb fsjobqueue: make canceling a finished job an error
This mirrors FinishJob(), which also errors when the job is already
finished.
2021-07-28 21:52:31 +01:00
Lars Karlitski
30492bfc60 jobqueue: move fsjobqueue's generic tests into new package
fsjobqueue_test contained tests that are generically testing the
JobQueue interface. Split those out into its own package `jobqueuetest`.

These tests will be useful when implementing a new package that conforms
to the JobQueue interface.
2021-07-28 21:52:31 +01:00
sanne
4385c39d66 worker: Introduce heartbeats
An occupied worker checks about every 15 seconds if its current job was
cancelled. Use this to introduce a heartbeat mechanism, where if
composer hasn't heard from the worker in 2 minutes, the job times out
and is set to fail.
2021-07-08 21:14:38 +01:00
sanne
0fcb44e617 worker: Move job tokens to the queue itself
This removes state from the worker server, as it no longer contains the
list of running jobs. Instead only the queue knows if jobs are running
or not.
2021-07-08 21:14:38 +01:00
Achilleas Koutsou
668fb003ef jobqueue: Replace JobArgs() with Job()
JobArgs() function replaced with more general Job() function that
returns all the parameters used to originally define a job during
Enqueue(). This new function enables access to the type of a job in the
queue, which wasn't available until now (except when Dequeueing).
2021-01-19 10:37:51 +01:00
Achilleas Koutsou
6967333759 jobqueue: Add JobArgs() method
JobArgs() returns the arguments submitted with a job in raw form.
Since the structure of the args is opaque to the job queue, it's the
responsibility of the caller to deserialize the arguments.

Args retrieval is added to the existing TestArgs() function.
2021-01-16 13:39:30 +01:00
Lars Karlitski
cb894ccf68 jobqueue: remove testjobqueue
testjobqueue did not implement the JobQueue interface correctly (noted
in its package comment), making it impossible to write tests for
JobQueue itself.

Replace its use everywhere with fsjobqueue operating on a temporary
directory.
2021-01-12 12:19:25 +01:00
Ondřej Budai
a6df2877a3 fsjobqueue: accept jobs of any type
Soon, we want to begin tagging the jobs with the name of its submitter.
The simplest way to add a tag to a job is to put it into its type string.
However, as we don't know (and don't want to know) the submitters' names when
osbuild-composer is initialized, we need to be able to push arbitrary job
types into the jobqueue.

This commit therefore lifts the restriction that a jobqueue accepts only
a predefined set of job types. Now, jobqueue clients can push jobs of
arbitrary names.

Signed-off-by: Ondřej Budai <ondrej@budai.cz>
2020-11-12 15:30:30 +00:00
Ondřej Budai
e007f9964e fsjobqueue: extract channelSize constant
2020-11-12 15:30:30 +00:00
Tom Gundersen
c777a18df0 jobqueue: expose dependencies when querying status
The status of a job may depend on the status of its dependencies,
as we do not repeat for instance the failed state in each dependent
job.

Return also the list of dependencies so these can be queried too.
2020-11-11 18:16:42 +01:00
Tom Gundersen
11d0da0b5c jobqueue/JobStatus: return result as json.RawMessage
Similarly to the recent changes to Dequeue(), let the caller unmarshal the
returned JSON. This allows us to pass the result on without being able
to unmarshal it.

In follow-up patches, we will pass results of jobs to dependent jobs,
but the worker API does not know about the different job types, nor how
to unmarshal them.
2020-11-11 18:16:42 +01:00
Tom Gundersen
e277501ca3 jobqueue: return dependencies on dequeue
Once a job has been enqueued, there is no way to query its dependencies.

This makes dequeue more symmetric to enqueue by returning the
dependencies that were passed to enqueue, allowing the caller to
query the dependencies and their results.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-11-11 18:16:42 +01:00
Tom Gundersen
e72b14bdd1 jobqueue: do not sort dependencies
While dependencies are purely internal, sorting and pruning them is a
reasonable optimization. However, we wish to expose them in follow-up
commits and then we want them to remain unchanged from the input.

Nothing in the internal logic seems to rely on the fact the dependencies
were sorted.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-11-11 18:16:42 +01:00
Tom Gundersen
f355bdd426 testjobqueue: initialize dependants map
Adding to a nil map leads to panic. We must ensure that all maps are
initialized before use.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-11-11 18:16:42 +01:00
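The failure mode named in the commit above is general Go behavior: reading from a nil map is fine, but assigning to one panics. The `queue` type below is a hypothetical reduction that shows the fix, which is to initialize the map in the constructor.

```go
package main

// queue is a hypothetical reduction of a job queue that tracks which
// jobs depend on which other jobs.
type queue struct {
	dependants map[string][]string
}

// newQueue initializes every map up front, so that later writes cannot
// hit a nil map.
func newQueue() *queue {
	return &queue{dependants: make(map[string][]string)}
}

// addDependant records that job id depends on job dep. Appending to the
// missing (nil) slice is fine; only the map itself must be non-nil.
func (q *queue) addDependant(dep, id string) {
	q.dependants[dep] = append(q.dependants[dep], id)
}
```

A zero-value `queue{}` would panic in `addDependant` with "assignment to entry in nil map", which is the class of bug the commit fixes.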
Lars Karlitski
59e73a686a worker: generalize job types in the server
The worker server was heavily tied to OSBuildJob(Result). Untie it so
that it can deal with different job types in the future.

This necessitates a change in the jobqueue: Dequeue() now returns the
job type, as well as job arguments as json.RawMessage. This is so that
the server can wait on multiple job types with different argument
types.

The weldr, composer, and koji APIs continue to use only "osbuild" jobs.
2020-11-09 14:17:19 +01:00
Lars Karlitski
27e8e4b5d5 jobqueue: allow canceling jobs
This is not exposed to a worker yet. It will continue the job and get an
error when it tries to update the job's status to finished.
2020-05-13 16:45:09 +02:00
Lars Karlitski
b795ca25a2 fsjobqueue: remove unnecessary variable declaration
2020-05-13 16:45:09 +02:00
Lars Karlitski
8df143fabe fsjobqueue: pass accepted job types to New()
This makes the queue more type safe and allows getting rid of the
`pendingChannel` and `pendingChannels` helpers, which only existed to
create not-yet-existing pending channels.
2020-05-13 16:45:09 +02:00
Lars Karlitski
3240f11647 fsjobqueue: factor common functionality into maybeEnqueue()
2020-05-13 16:45:09 +02:00
Lars Karlitski
e599a95067 fsjobqueue: use a single mutex to protect all fields
In addition, it ensures that all public functions operate on `db`
in transactions.
2020-05-13 16:45:09 +02:00
Lars Karlitski
e03c1fff65 fsjobqueue: factor reflect.Select out of Dequeue()
Hopefully this makes it easier to read.
2020-05-13 16:45:09 +02:00
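For readers unfamiliar with `reflect.Select`: it performs a `select` over a set of channels whose size is only known at run time, which is why it is worth isolating in a helper. The function below is a hypothetical sketch with `int` payloads and is not the helper extracted by the commit above.

```go
package main

import "reflect"

// selectAny blocks until one of chans delivers a value or is closed.
// It returns the index of the chosen channel, the received value, and
// false if that channel was closed instead of delivering a value.
// With an empty slice it blocks forever, like an empty select.
func selectAny(chans []chan int) (int, int, bool) {
	cases := make([]reflect.SelectCase, len(chans))
	for i, c := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(c),
		}
	}

	chosen, value, ok := reflect.Select(cases)
	if !ok {
		return chosen, 0, false // the chosen channel was closed
	}
	return chosen, int(value.Int()), true
}
```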
Lars Karlitski
6773c01722 jobqueue: drop JobStatus type
The enum is redundant information that can be deduced from the job's
times: queuedAt, startedAt, and finishedAt. Not having it reduces the
potential for inconsistent state.
2020-05-13 16:45:09 +02:00
Lars Karlitski
e1805d5f62 fsjobqueue: only update dependants when necessary
This is the same as the code in Enqueue(). There is no need to add a job
to q.dependants if all of its dependencies have finished.
2020-05-13 16:45:09 +02:00
Lars Karlitski
459181a650 fsjobqueue: fix spelling errors and json field names
2020-05-13 16:45:09 +02:00