worker: Introduce heartbeats

An occupied worker checks about every 15 seconds whether its current job
has been cancelled. Use this check-in as a heartbeat: if composer hasn't
heard from the worker in 2 minutes, the job times out and is marked as
failed.
sanne 2021-07-05 11:45:16 +02:00 committed by Tom Gundersen
parent 0fcb44e617
commit 4385c39d66
6 changed files with 166 additions and 46 deletions


@@ -0,0 +1,9 @@
# Workers: heartbeat
Workers check in with composer every 15 seconds to see whether their job has
been cancelled. We can use this check-in as a heartbeat: if a worker fails to
check in for over 2 minutes, composer assumes the worker crashed or was stopped
and marks the job as failed.
This mitigates the issue where a job whose worker crashed or was stopped would
remain in a 'building' state forever.