debian-forge-composer

Author	SHA1	Message	Date
Florian Schüller	ca3f0a190f	internal/jobqueue/jobqueuetest/jobqueuetest: fix DB tests I got confused as the jobqueue interface is asymmetric. It expects an object and returns a json.RawMessage and when handing over to postgres this is abstracted away by postgres	2024-11-19 13:55:38 +01:00
Florian Schüller	d3e3474fb7	internal/worker/server: return an error on depsolve timeout HMS-2989 Fixes the special case that if no worker is available and we generate an internal timeout and cancel the depsolve including all followup jobs, no error was propagated.	2024-11-19 13:55:38 +01:00
Sanne Raymaekers	056b3c5ea6	jobqueue: return if a job was requeued or not	2024-11-07 17:18:48 +01:00
Florian Schüller	ece16307c6	jobqueuetest: avoid warning and provide a valid JSON Not needed for the test but just generates a useless warning	2024-11-06 15:16:42 +01:00
Florian Schüller	00d3f07d08	Makefile: implement `make db-tests` enables the option to run the DB tests locally that are executed in the github actions	2024-11-06 15:16:42 +01:00
Sanne Raymaekers	1b4935c325	jobqueue: add channel to workers Stores the channel alongside the worker.	2024-04-19 14:32:07 +02:00
Sanne Raymaekers	ac854b7cc8	pkg/jobqueue: add arch to worker	2023-12-14 21:25:32 +01:00
Sanne Raymaekers	d784075d31	jobqueue: add ability to track workers	2023-12-06 17:22:36 +01:00
Brian C. Lane	aca748bc14	Don't Panic in getComposeStatus and skip invalid jobs in fsjobqueue New This handles corrupt job json files by skipping them. They still exist, and errors are logged, but the system keeps working. If one or more of the json files in /var/lib/osbuild-composer/jobs/ becomes corrupt they can stop the osbuild-composer service from starting, or stop commands like 'composer-cli compose status' from working because they quit on the first error and miss any jobs that aren't broken.	2023-11-20 13:34:40 +01:00
Ondřej Budai	cac9327b44	update to go 1.19 UBI and the oldest supported Fedora (37) now all have go 1.19, so we are cleared to switch. gofmt now reformats comments in certain cases, so that explains the formatting changes in this commit. See https://go.dev/doc/go1.19#go-doc Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2023-07-21 19:18:00 +02:00
Tom Gundersen	626530818d	worker/server: requeue unresponsive jobs If a job is unresponsive the worker has most likely crashed or been shut down and the in-progress job has been lost. Instead of failing these jobs, requeue them up to two times. Once a job is lost a third time it fails. This avoids infinite loops. This is implemented by extending FinishJob to RequeueOrFinishJob. It takes a max number of requeues as an argument, and if that is 0, it has the same behavior as FinishJob used to have. If the maximum number of requeues has not yet been reached, then the running job is returned to pending state to be picked up again.	2022-11-02 15:26:00 +01:00
Sanne Raymaekers	0fe3f1b2ae	jobqueue: Query job dependents	2022-08-30 16:14:52 +02:00
Simon de Vlieger	78ae275c61	jobqueue: store an expiry date This introduces an expiry date (default: 14 days from insert date) and adjust the service-maintenance script to delete jobs that are older than the expiration date.	2022-07-13 17:26:04 +02:00
Sanne Raymaekers	03b57f002c	jobqueue: Move jobqueue out of internal	2022-07-04 15:37:28 +02:00
Sanne Raymaekers	d9bd19404d	osbuild-service-maintenance: Move maintenance queries out of jobqueue	2022-07-04 15:37:28 +02:00
Sanne Raymaekers	ff408aa68f	osbuild-service-maintenance: Vacuum tables Call vacuum analyze after each chunk of updates, and dump vacuum stats at the beginning and end of the db cleanup. Nulling results can increase size on disk, but calling vacuum analyze will free up space within the table (not on disk) and reuse the space for new inserts and updates.	2022-06-08 21:12:46 +02:00
Sanne Raymaekers	8bfc6c9961	dbjobqueue: Filter maintenance queries based on results Jobs that already had their results nulled, shouldn't be included in the maintenance job.	2022-06-08 21:12:46 +02:00
Chloe Kaubisch	873798514b	prometheus: add tenant label Include a tenant label for all prometheus metrics. Modify jobstatus function in the worker accordingly to return channel so it can be passed to prometheus.	2022-06-07 16:35:03 +02:00
Sanne Raymaekers	9b119fa4cf	osbuild-service-maintenance: Delete results from select jobs Instead of deleting records, delete the results from the manifest and depsolve jobs. This redacts sensitive data which the manifest can contain, and this conserves space.	2022-06-03 14:38:53 +02:00
Sanne Raymaekers	9bff4a4f0f	dbjobqueue: Alter foreign key constraints When deleting rows from the job table, make sure the delete is cascaded to the dependencies and heartbeat tables.	2022-06-02 18:45:24 +02:00
Eng Zer Jun	00ea3eb285	test: use `T.TempDir` to create temporary test directory The directory created by `T.TempDir` is automatically removed when the test and all its subtests complete. Reference: https://pkg.go.dev/testing#T.TempDir Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>	2022-04-05 09:27:43 +02:00
Ondřej Budai	e9ce9370c6	dbjobqueue: actually use the transaction object when a tx is created Transactions are tied to a connection so this is actually not a functional change. Nevertheless, I think it's nice to explicitly state that we are using a transaction. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-22 17:49:22 +01:00
Ondřej Budai	187eb188da	dbjobqueue: wait for listener to become ready before returning from New Otherwise, there might be an already waiting dequeuer and if something is enqueued before `sqlListen` is called, we will lose this notification. Also, a small log message was added when shutting down the listener. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-22 17:49:22 +01:00
Ondřej Budai	c4c7f44fcb	dbjobqueue: reimplement the jobqueue to use only one listening connection Previously, all dequeuers (goroutines waiting for a job to be dequeued) were listening for new messages on postgres channel jobs (LISTEN jobs). This didn't scale well as each dequeuer required its own DB connection and the number of DB connections is hard-limited in the pool's config. I changed the logic to work somewhat differently: dbjobqueue.New() now spawns a goroutine that listens on the postgres channel. If there's a new message, the goroutine just wakes up all dequeuers using a standard go channel. Go channels are cheap so this should scale much better. A test was added that confirms that 100 dequeuers are not a big deal now. This test failed when I tried to run it on the previous commit. I tried even 1000 locally and it was still fine. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-11 16:04:52 +01:00
Ondřej Budai	c8dbe0de74	dbjobqueue: remove unused variables from Dequeue Removing queued_at and started_at is pretty straightforward, it wasn't needed. Removing token might seem concerning but basically we were just pulling the same value from DB as we were pushing there. I think there's no value in doing that. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-11 16:04:52 +01:00
Ondřej Budai	2765d2d9a8	jobqueuetest: add a test for multiple channels Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-08 12:07:00 +01:00
Ondřej Budai	32080e6202	jobqueuetest: modify testArgs to test also channels jobqueue.Job must return the channel specified in jobqueue.Enqueue during the whole lifecycle of the given job. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-08 12:07:00 +01:00
Ondřej Budai	4c31b04a65	jobqueuetest: add channel arg to the pushTestJob helper Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-08 12:07:00 +01:00
Ondřej Budai	7bfcee36f8	jobqueue: introduce the concept of channels Channels are a concept similar to job types. Callers must specify a channel name when queueing a new job. A list of channels is also specified when dequeueing a job. The dequeued job's channel will always be one of the specified channels. Of course, the job types are also respected. The dequeued job will also always be of one of the specified types. Currently, all calls to jobqueue were changed so all queue operations use an empty channel name and all dequeue operations use a list containing an empty channel. Thus, this is a non-functional change. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-03-08 12:07:00 +01:00
Ondřej Budai	9c80a17ee5	fsjobqueue: refactor to allow dequeuing by multiple criteria Previous implementation of fsjobqueue is amazing but it has its drawbacks: - dequeueing can be done only based on a job type - it's limited to 100 jobs per job type As we soon want to be able to dequeue also by another criterion (job channel), we need to refactor the queue. The new implementation is more naive but also more flexible. It basically works like the dbjobqueue - dequeueing goroutines listen for newly added jobs. When that happens, a signal is sent to all of them and they all inspect all pending jobs and dequeue ones that match their needs. Those that don't find a suitable job wait for the next signal. This is certainly a slower implementation as every time a new job is added into the queue, all dequeueing goroutines will have to iterate over all pending jobs. I think that's fine because fsjobqueue is not recommended for composer instances with heavy load. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2022-02-16 17:14:36 +01:00
Gianluca Zuccarelli	6c4caec022	metrics: move metrics to worker server For simplicity, the collection of the job metrics was carried out in the job queue. This was only being done in the dbqueue and not in the fsqueue. This pr refactors the metric collection and moves the job metrics to the worker server, by adding a wrapper function to enqueueing jobs so that the metrics only have to be recorded in one place when queueing a job.	2022-02-03 23:40:42 +00:00
Gianluca Zuccarelli	bce12b7bea	metrics: extract metric collection Refactor the current metric collection to make use of re-usable functions, since some of the same queries are repeated. This will also make it easier to move the collection of metrics from the job queue.	2022-02-03 23:40:42 +00:00
Tom Gundersen	b32ab36e1d	worker/server: typesafe Job and JobStatus Replace Job() and JobStatus() with typesafe versions, and introduce JobType() for the rare instances where we don't know the type up front. Additionally, catch a few more error cases: - if OSBuildResult is nil, then we failed to invoke osbuild - make sure the same JobResult handling is done for osbuild-koji, as for osbuild	2022-02-01 20:28:40 +00:00
Ondřej Budai	ab3990b90a	dbjobqueue: fix FinishJob not returning an error if already finished Reported by covscan Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2021-12-18 00:14:07 +00:00
Gianluca Zuccarelli	1a709eda5c	metrics: add initial job metrics Add job metrics to track the number of pending/running jobs, the duration of the jobs and how long the jobs spent in the job queue.	2021-12-08 21:49:43 +00:00
sanne	c43ad2b22a	osbuild-service-maintenance: Clean up expired images	2021-12-03 00:14:09 +00:00
Ondřej Budai	14b29ae98a	dbjobqueue: don't log when context's deadline was exceeded This happens rather often as we limit the request job timeout to 20s on the service. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2021-11-25 08:20:22 +01:00
Diaa Sami	df73b835c3	jobqueue: improve logging Add job ID where it's missing	2021-11-16 19:16:34 +01:00
Diaa Sami	9c6438c8f4	jobqueue: include dependent job IDs when logging	2021-11-16 19:16:34 +01:00
Ondřej Budai	d3a3dbafed	jobqueue: add DequeueByID We will soon need to dequeue a job using its ID. This commit adds the ability to do that to the Jobqueue interface. As always, the fsjobqueue implementation is slightly naive but it should be fine for the use cases that it's designed for. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2021-11-14 10:17:03 +01:00
Ondřej Budai	2ecc48727f	fsjobqueue: factor out finished deps check Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2021-11-14 10:17:03 +01:00
Ondřej Budai	5f4db72777	fsjobqueue: do not delete empty channels Previously, we deleted empty channels when a job was dequeued. This is completely wrong because there still might be some clients waiting for a job. This commit removes the cleanup and adds a regression test. Note that this has the potential to leak memory if we ever use a lot of job types. Currently, we have just a handful of them, so this is fine. Signed-off-by: Ondřej Budai <ondrej@budai.cz>	2021-11-14 10:17:03 +01:00
Diaa Sami	751ef84fc1	jobqueue: Better logging Use appropriate logging levels for different statements Log important jobqueue events with relevant information	2021-10-19 10:03:39 +01:00
sanne	d25ae71fef	worker: Configurable timeout for RequestJob This is backwards compatible, as long as the timeout is 0 (never timeout), which is the default. In case of the dbjobqueue the underlying timeout is due to context.Canceled, context.DeadlineExceeded, or net.Error with Timeout() true. For the fsjobqueue only the first two are considered.	2021-10-19 00:12:18 +01:00
sanne	9fab5def90	dbjobqueue: Reduce error noise in rollback check If the transaction is already closed don't log the rollback failure as an error, it means it was successfully committed.	2021-08-20 15:42:57 +02:00
Lars Karlitski	9c2c92f729	jobqueue: Introduce jobqueue backed by a postgres database Co-authored-by: sanne <sanne.raymaekers@gmail.com>	2021-07-28 21:52:31 +01:00
Lars Karlitski	871c6e9cbb	fsjobqueue: make canceling a finished job an error This mirrors FinishJob(), which also errors when the job is already finished.	2021-07-28 21:52:31 +01:00
Lars Karlitski	30492bfc60	jobqueue: move fsjobqueue's generic tests into new package fsjobqueue_test contained tests that are generically testing the JobQueue interface. Split those out into its own package `jobqueuetest`. These tests will be useful when implementing a new package that conforms to the JobQueue interface.	2021-07-28 21:52:31 +01:00
sanne	4385c39d66	worker: Introduce heartbeats An occupied worker checks about every 15 seconds if its current job was cancelled. Use this to introduce a heartbeat mechanism, where if composer hasn't heard from the worker in 2 minutes, the job times out and is set to fail.	2021-07-08 21:14:38 +01:00
sanne	0fcb44e617	worker: Move job tokens to the queue itself This removes state from the worker server, as it no longer contains the list of running jobs. Instead only the queue knows if jobs are running or not.	2021-07-08 21:14:38 +01:00


168 commits