This fixes a race condition where the context might have been canceled
or timed out in between the preliminary check and trying to dequeue, and
consequently returning the wrong error.
Instead of doing the preliminary check, just check for the context
errors when trying to dequeue.
This fixes a bug where the listen function would keep trying to use a
closed, unrecoverable connection to listen for a notification. This
continued failing, which essentially made the queue instance useless.
If a job is unresponsive the worker has most likely crashed or been shut
down and the in-progress job been lost.
Instead of failing these jobs, requeue them up to two times. Once a job is lost
a third time it fails. This avoids infinite loops.
This is implemented by extending FinishJob to RequeuOrFinish job. It takes a
max number of requeues as an argument, and if that is 0, it has the same
behavior as FinishJob used to have.
If the maximum number of requeues has not yet been reached, then the running
job is returned to pending state to be picked up again.
The connection can be briefly interrupted for short periods of time,
which is fine. To avoid spamming the logs with error messages, backoff
for half a second.
Currently there is a race condition that occurs between the
dbjobqueue enqueue and dequeue functions. Both queries make
use of the postgres `now()` timestamp function which returns
the timestamp of when the transaction started and not when
the statement is executed. The result of this is a timestamp
for a job's `started_at` field to be earlier than its
`queued_at` field causing a constraint violation.
Since the dequeue query will never be executed before the
enqueue query, changing the postgres timestamp function to
`statement_timestamp()` resolves this issue.
errors.As is meant to check whether err (or other error in its chain) can
be assigned to the value that target is pointing at.
Let's consider this example:
errors.As(err, &pgx.ErrNoRows)
pgx.ErrNoRows (and pgx.ErrTxClosed) is typed as error, thus in all
errors.As calls, the target is typed as *error. Err is always an error.
So this call is basically asking whether error can be assigned to error.
If err != nil, this is always true, thus this check doesn't make any sense
over a plain err != nil.
Go 1.19 now checks this issue and if it's found, it refuses to compile the
code, see:
https://go-review.googlesource.com/c/tools/+/339889
This commit changes usages of errors.As() to errors.Is(). The Is() method
doesn't check assignability but equality (the only different between Is()
and a plain old == operator is that Is() also inspects the whole error chain).
This fixes the check because now, we are basically checking if err (or
any other error in its chain) == pgx.ErrTxClosed which is exactly what we
want.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
This introduces an expiry date (default: 14 days from insert date) and
adjust the service-maintenance script to delete jobs that are older than
the expiration date.