By default, a pgxpool.Pool has at most 4 connections (or the number of
CPUs, if higher). Currently, we have 3 replicas, which means at most
3*4=12 DB connections.
The dequeue operation is actually blocking: when a worker is waiting for
a job, one connection is blocked. My theory is that with 16 workers, we just
don't have enough connections, which causes all sorts of weird slowdowns.
This commit bumps the number of connections per replica to 10, so
we should be at 30 connections in total.
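The arithmetic behind the theory, as a rough sketch. It assumes that
every idle worker holds exactly one connection while blocked in dequeue
and that each replica runs on a machine with at most 4 CPUs. The function
names are made up for illustration and are not part of composer or pgx.

```go
package main

import "fmt"

// defaultMaxConns mirrors the default described above: 4 connections,
// or the number of CPUs if that is higher.
func defaultMaxConns(numCPU int) int {
	if numCPU > 4 {
		return numCPU
	}
	return 4
}

// freeConns returns how many connections remain for other queries when
// every idle worker holds one connection while blocked in dequeue.
// A negative result means some workers cannot even get a connection.
func freeConns(replicas, connsPerReplica, idleWorkers int) int {
	return replicas*connsPerReplica - idleWorkers
}

func main() {
	// Before: 3 replicas with the default pool size, 16 waiting workers.
	fmt.Println("before:", freeConns(3, defaultMaxConns(4), 16))
	// After: 3 replicas with 10 connections each.
	fmt.Println("after: ", freeConns(3, 10, 16))
}
```

With the old default the result is negative, which would match the
observed slowdowns; with 10 connections per replica there is headroom.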
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
RHEL 9.0 AWS API test is failing with
Host key verification failed.
This is probably due to a recent change in openssh deprecating rsa host
keys (or likely rsa keys in general).
- turn off StrictHostKeyChecking when checking groups
- use 'ed25519' type for user ssh keys
Signed-off-by: Achilleas Koutsou <achilleas@koutsou.net>
Workers now depsolve in parallel to image builds, so we can
again move depsolving to the workers. This will help us deal
with increases in traffic as we currently only have one
depsolve handler per pod. It would also avoid any issues with
composer running out of disk space due to dnf metadata caches.
This reverts commit c65b1e9b26.
It no longer makes sense because:
- we don't make any changes to 8.5
- we don't regenerate test manifests for 8.5
- osbuild-composer for 8.5 is in the rhel-8.5.0 branch
Also, the latest-8.5.0 symlink was removed, which broke the CI.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
V2 is compliant with api.openshift.com design guidelines.
Errors are predefined, have codes, and are queryable.
All requests have an operationId set: a unique identifier which is
sortable by time. This is added to the response in case of an error.
All returned objects have the href, id, and kind field set.
2 configurations for the listeners are now possible:
- enableJWT=false with client ssl auth
- enableJWT=true with https
Actual verification of the tokens is handled by
https://github.com/openshift-online/ocm-sdk-go.
An authentication handler is run as the top level handler, before any
routing is done. Routes which do not require authentication should be
listed as exceptions.
Authentication can be restricted using an ACL file which allows
filtering based on JWT claims. For more information see the inline
comments in ocm-sdk/authentication.
As an added quirk the `-v` flag for the osbuild-composer executable was
changed to `-verbose` to avoid flag collision with glog which declares
the `-v` flag in the package `init()` function. The ocm-sdk depends on
glog and pulls it in.
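The collision can be reproduced with the standard flag package alone.
The sketch below uses a private FlagSet instead of the global command
line that glog registers on, so it only imitates the situation:
declaring the same name twice on one flag set panics.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// collides reports whether registering both names on one flag set
// panics, which is what happens when two packages declare the same flag.
func collides(first, second string) (panicked bool) {
	fs := flag.NewFlagSet("example", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep the "flag redefined" message off stderr
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	fs.Bool(first, false, "declared by a dependency")
	fs.Bool(second, false, "declared by the application")
	return false
}

func main() {
	fmt.Println("v and v:      ", collides("v", "v"))
	fmt.Println("v and verbose:", collides("v", "verbose"))
}
```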
The 'google-cloud-sdk' RPM built by Google for RHEL, which provides
the 'gcloud' command, is built only with Python 2. Since Python 2.7
is already EOL in upstream and not available in CentOS Stream 9, we
can not use 'gcloud' from the 'google-cloud-sdk' RPM.
The 'awscli' is not available in RHEL-9 repositories.
The Azure CLI 'az' available in the official upstream repositories has
broken dependencies on RHEL-9 and cannot be installed successfully. To
work around the issue, run the tool from the official container image
provided by Microsoft.
Use the `quay.io/osbuild/cloud-tools` F34-based container image instead
of locally installed cloud CLI tools.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
The RHEL-8.5 and RHEL-9.0 `ami` images are now based on the official
RHEL EC2 images. As a result, they use a different default user -
`ec2-user`.
Fix the `api.sh` test case to use the correct user when testing RHEL-9
`ami` images.
Fix #1632
Signed-off-by: Tomas Hozza <thozza@redhat.com>
Use TEST_ID for any resources created in Azure. Also create all
necessary vm network resources in advance to have predictable names
using TEST_ID as well.
RHEL 9.0 isn't in .gitlab-ci.yml yet, so this doesn't actually change
anything in test runs, but it should make enabling the tests easier.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Redefine the `ami` image type in RHEL-8.5 to be based on RHEL
ec2 images. The pipeline has different default settings, therefore the
common "os" pipeline is not used. The RHEL ec2 images have a different
default size than the original `ami` image definition. The RHEL ec2
images use a different default partitioning scheme. Their configuration
is slightly different for each architecture and the x86_64 version
of the image does not support UEFI.
Update rpmrepo snapshots used to generate RHEL-8.5 x86_64 and aarch64
image test cases.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
The cloud API now exposes a user customization that lets a customer add
a new user with a set of groups and an SSH key.
Testing:
* adds 2 users to the AWS image, accessible with a temp ssh key.
* the first one is in the group wheel, the other is not
Fixes #1574
Instead of inspecting the tarball directly, extract it and use ostree to
verify the ref and commit ID.
Adds some data to the CI artifacts directory:
- Build manifest
- Tarball file list for s3 edge commit with s3 upload
- Build metadata
Previously a bad error code was returned. Fixes #1477.
Testing:
I have two test cases for the solution. The first is a request that
makes depsolve crash, which I triggered by replacing the dnf-json script
with an almost empty one that only throws an exception. The second one
fails because it requests a nonexistent package. The former ends with a
500 error and the latter with a 400.
----8<-----
HTTP/1.1 500 Internal Server Error
Failed to depsolve base packages for ami/x86_64/centos-8: Failed to
depsolve base packages for ami/x86_64/centos-8: unexpected end of JSON
input
----8<-----
HTTP/1.1 400 Bad Request
Content-Length: 226
Failed to depsolve base packages for
ami/x86_64/centos-8: DNF error occured: MarkingErrors: Error occurred
when marking packages for installation: Problems in request:
missing packages: jesuisunpaquetquinexistepas_idonotexist
The `api.sh` test currently always defaults to the "<REGION>-a" zone when
creating an instance from the built image. The resources in a zone may get
exhausted, and the solution is to use a different zone. Currently, even a
CI job retry does not help to mitigate such an error during a CI run.
Modify `api.sh` to pick a random GCP zone for a given region when creating
a compute instance. Use only GCP zones which are "UP".
The `cloud-cleaner` relied on the behavior of `api.sh` to always choose
the "<REGION>-a" zone. Guessing the chosen zone in `cloud-cleaner` is
not viable, but thankfully the instance name is by default unique for
the whole GCP project. Modify `cloud-cleaner` to iterate over all
available zones in the used region and try to delete the specific
instance in each of them.
Make the `ComputeZonesInRegion` method from the `internal/cloud/gcp` package
exported and use it in `cloud-cleaner` for getting the list of available
zones in a region.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
The `test/cases/api.sh` script relied on environment variables specific
to Jenkins for detecting if it is running in a CI environment. If this
was the case, it used other environment variables to construct a
predictable `TEST_ID` which could be used for names of resources created
in the cloud-provider environment as part of the test. This is important to
ensure that `cloud-cleaner` can "guess" resource names and delete them
in case the test script fails to clean up after itself.
With the move from Jenkins to GitLab CI, this stopped working and the
script started to generate a random `TEST_ID`, which cannot be guessed by
the `cloud-cleaner` tool.
Modify the `test/cases/api.sh` to detect the CI environment using the
`CI` environment variable, which is always predefined in the GitLab CI
environment [1].
[1] https://docs.gitlab.com/ee/ci/variables/predefined_variables.html
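The detection itself lives in the shell script; the Go sketch below only
shows the shape of the logic. Which variables the script hashes into
`TEST_ID` is not stated here, so `CI_JOB_ID` is just a stand-in, and the
hashing scheme is an assumption.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"os"
)

// testID returns a predictable ID when running in CI and falls back to
// randomID otherwise. getenv has the same shape as os.Getenv.
func testID(getenv func(string) string, randomID func() string) string {
	if getenv("CI") == "" {
		return randomID()
	}
	// CI_JOB_ID is a stand-in input, see the note above.
	sum := sha256.Sum256([]byte(getenv("CI_JOB_ID")))
	return fmt.Sprintf("%x", sum[:8])
}

func main() {
	local := func() string { return fmt.Sprintf("local-%d", os.Getpid()) }
	fmt.Println(testID(os.Getenv, local))
}
```

Anything that knows the same inputs can recompute the ID, which is what
lets a cleanup tool guess the resource names.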
Signed-off-by: Tomas Hozza <thozza@redhat.com>
An occupied worker checks about every 15 seconds if its current job was
cancelled. Use this to introduce a heartbeat mechanism, where if
composer hasn't heard from the worker in 2 minutes, the job times out
and is set to fail.