worker/server: requeue unresponsive jobs

If a job is unresponsive, the worker has most likely crashed or been shut down, and the in-progress job has been lost. Instead of failing these jobs, requeue them up to two times. Once a job is lost a third time, it fails. This cap avoids infinite requeue loops.

This is implemented by extending FinishJob into RequeueOrFinishJob. It takes the maximum number of requeues as an argument; if that is 0, it behaves exactly as FinishJob used to. If the maximum number of requeues has not yet been reached, the running job is returned to the pending state so that a worker can pick it up again.
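
The requeue rule described above can be sketched as a small standalone simulation. This is only an illustration of the decision logic, not the actual implementation: the function names `requeue_or_finish` and `simulate_lost_job` and the state strings are hypothetical and do not come from the code base.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; all names here are hypothetical.

# Print the state a lost running job moves to, given how many times it
# has already been requeued and the maximum number of requeues allowed.
requeue_or_finish() {
    local retries=$1 max_requeues=$2
    if [ "$retries" -lt "$max_requeues" ]; then
        echo "pending"     # limit not reached: hand the job back to the queue
    else
        echo "finished"    # limit reached (or 0): finish the job as before
    fi
}

# Simulate a job that is lost on every attempt and print how many times
# it was lost before it was finally finished.
simulate_lost_job() {
    local max_requeues=$1 retries=0 losses=0 state="running"
    while [ "$state" != "finished" ]; do
        losses=$((losses + 1))
        state=$(requeue_or_finish "$retries" "$max_requeues")
        if [ "$state" = "pending" ]; then
            retries=$((retries + 1))
        fi
    done
    echo "$losses"
}

simulate_lost_job 2   # limit of two requeues
simulate_lost_job 0   # no requeues, the old FinishJob behavior
```

With a limit of two requeues, the simulated job is finished on its third loss; with a limit of 0, it is finished on the first loss, which matches the previous FinishJob behavior.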
2022-03-18 21:39:32 +00:00 · 2022-03-18 21:39:32 +00:00 · 626530818d
commit 626530818d
parent d02f666a4b
8 changed files with 216 additions and 61 deletions
--- a/test/cases/api.sh
+++ b/test/cases/api.sh
@@ -481,11 +481,29 @@ jq '.customizations.packages = [ "jesuisunpaquetquinexistepas" ]' "$REQUEST_FILE
 sendCompose "$REQUEST_FILE2"
 waitForState "failure"

-# crashed/stopped/killed worker should result in a failed state
+# crashed/stopped/killed worker should result in the job being retried
 sendCompose "$REQUEST_FILE"
 waitForState "building"
 sudo systemctl stop "osbuild-remote-worker@*"
-waitForState "failure"
+RETRIED=0
+for RETRY in {1..10}; do
+    ROWS=$(sudo ${CONTAINER_RUNTIME} exec "${DB_CONTAINER_NAME}" psql -U postgres -d osbuildcomposer -c \
+                "SELECT retries FROM jobs WHERE id = '$COMPOSE_ID' AND retries = 1")
+    if grep -q "1 row" <<< "$ROWS"; then
+        RETRIED=1
+        break
+    else
+        echo "Waiting until job is retried ($RETRY/10)"
+        sleep 30
+    fi
+done
+if [ "$RETRIED" != 1 ]; then
+    echo "Job $COMPOSE_ID wasn't retried after killing the worker"
+    exit 1
+fi
+# remove the job from the queue so the worker doesn't pick it up again
+sudo ${CONTAINER_RUNTIME} exec "${DB_CONTAINER_NAME}" psql -U postgres -d osbuildcomposer -c \
+     "DELETE FROM jobs WHERE id = '$COMPOSE_ID'"
 sudo systemctl start "osbuild-remote-worker@localhost:8700.service"

 # full integration case