worker: Introduce heartbeats
An occupied worker checks about every 15 seconds whether its current job has been cancelled. Use this check as a heartbeat: if composer hasn't heard from the worker in 2 minutes, the job times out and is marked as failed.
parent 0fcb44e617
commit 4385c39d66
6 changed files with 166 additions and 46 deletions
9
docs/news/unreleased/worker-heartbeat.md
Normal file
@@ -0,0 +1,9 @@
+# Workers: heartbeat
+
+Workers check in with composer every 15 seconds to see whether their job has
+been cancelled. We can use this to introduce a heartbeat. If the worker fails to
+check in for over 2 minutes, composer assumes the worker crashed or was stopped,
+and marks the job as failed.
+
+This mitigates the issue where jobs whose worker crashed or was stopped would
+remain in a 'building' state forever.