Transactions are tied to a connection so this is actually not a functional
change. Nevertheless, I think it's nice to explicitly state that we are
using a transaction.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Otherwise, there might be an already waiting dequeuer and if something is
enqueued before `sqlListen` is called, we will lost this notification.
Also, a small log message was added when shutting down the listener.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Call `Shutdown()` on all http servers. This means we will finish processing
any pending requests (including depsolving), but we will not listen to new
ones.
In particular, we will not answer to the readiness probe, so no new traffic
will be routed to this container.
Once all pending requests have been handled composer will shut down
gracefully and the liveness probe will return failure.
Note that in order for this to work correctly no requests should ever take longer
than the shutdown timeout (by default 30s).
This will allow us to use the service accounts which work against
identity.api.openshift.com. These are much easier to manage, especially
with the new multi-tenancy, as there's a single page to create/expire
them across an account.
They also have the added benefit of not expiring automatically when
they're not used like offline tokens, and immediate expiration when
desired.
Using the simplified installer we were experiencing slow system boots.
Turns out we're incurring into https://bugzilla.redhat.com/show_bug.cgi?id=1839923
This patch just drops the console kargs - to be aligned with the
anaconda installer that doesn't experience this slow down.
The slow down doesn't happen on virtual machines as there's always a
ttyS0 there
Signed-off-by: Antonio Murdaca <runcom@linux.com>
The repo name is already part of the `rpmmd.RepoConfig` structure. Do
not ignore when calling `dnf-json` and and pass it the value.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
Previously, all dequeuers (goroutines waiting for a job to be dequeued) were
listening for new messages on postgres channel jobs (LISTEN jobs). This didn't
scale well as each dequeuer required to have its own DB connection and the
number of DB connections is hard-limited in the pool's config.
I changed the logic to work somewhat differently: dbjobqueue.New() now spawns
a goroutine that listens on the postgres channel. If there's a new message,
the goroutine just wakes up all dequeuers using a standard go channel.
Go channels are cheap so this should scale much better.
A test was added that confirms that 100 dequeuers are not a big deal now. This
test failed when I tried to run on it on the previous commit. I tried even 1000
locally and it was still fine.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Removing queued_at and started_at is pretty straightforward, it wasn't needed.
Removing token might seem concerning but basically we were just pulling
the same value from DB as we were pushing there. I think there's no value in
doing that.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
The `API.arch` member was (mostly) used to read the name of the
architecture.
The only non-name use was for the purposes of reading RPM repositories
from the configuration, in `reporegistry.ReposByArch()`, a thin wrapper
around `reporegistry.ReposByArchName()`.
Removing the `arch` member from the API and using the new `archName`
that is set up in the API constructor lets us control the arch name that
is set without relying on a valid `distro.Arch` object being available
(which would depend on having a valid `distro.Distro` object).
Replaced all calls to `ReposByArch()` with `ReposByArchName()` which
depends on the arch and distro name strings instead of a full
`distro.Arch`.
When the host distribution is not known or supported, instead of failing
with an error, print a warning to the log and initialise the API with
the architecture name and distro name.
This enables running the weldr API on unsupported distros for
cross-distro building.
Guards against a nil arch member when initialising the store.
Add an error object to the ComposeStatus.ImageStatus.
The error object contains a human-readable error reason
and optional details in the case of an error.
This commit adds a very in-depth test for multi-tenancy. It queues several
composes and then runs all jobs belonging to them while checking that
they are run by the correct tenant.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Yeah, we have TestRoute. It has one issue though: It doesn't have support
for passing a custom context. One option is to extend the method with yet
argument but since it already has 9 (!!!), this seems like a huge mess.
Therefore, I decided to invent a new small library for writing API tests.
It uses structs heavily which means that adding features to it doesn't
mean changing 100 lines of code (like adding another arg to TestRoute does).
I hope that we can start using this library more in our tests as it was
designed to be very flexible and powerfule.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
This commit implements multi-tenancy. A tenant is defined based on a value
from JWT claims. The key of this value must be specified in the configuration
file. This allows us to pick different values when using multiple SSOs.
Let me explain more in depth how this works:
Cloud API gets a new compose request. Firstly, it extracts a tenant name from
JWT claims. The considered claims are configured as an array in
cloud_api.jwt.tenant_provider_fields in composer's config file. The channel
name for all jobs belonging to this compose is created by `"org-" + tenant`.
Why is the channel prefixed by "org-"? To give us options in the future. I can
imagine the request having a channel override. This basically means that
multiple tenants can share a channel. A real use-case for this is multiple
Fedora projects sharing one pool of workers.
Why this commit adds a whole new cloud_api section to the config? Because the
current config is a mess and we should stop adding new stuff into the koji
section. As the Koji API is basically deprecated, we will need to remove it
soon nevertheless.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
jobqueue.Job must return the channel specified in jobqueue.Enqueue during
the whole lifecycle of the given job.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Channels are a concept similar to job types. Callers must specify a channel
name when queueing a new job. A list of channels is also specified when
dequeueing a job. The dequeued job's channel will always be from one of the
specified channel. Of course, the job types are also respected. The dequeued
job will also always be from one of the specified type.
Currently, all calls to jobqueue were changed so all queue operations use
an empty channel name and all dequeue operations use a list containing
an empty channel.
Thus, this is a non-functional change.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
Add a new option `GPGKeyFiles` to ImageConfig that indicates which files
containing GPG keys should be imported into rpm. For now it will be used
by the osPipeline in rhel{86,90} to set the corresponding option in the
`org.osbuild.rpm` stage.
The `org.osbuild.rpm` stage gained a new option `gpgkeys.fromtree`
which is a list of paths with files containing gpgkeys that will
be imported after the package installation phase is done.
rpmmd_mock fixture are complex and unneeded in the context of cloudapi, let's
just copy 3 lines from them and drop the dependency.
Signed-off-by: Ondřej Budai <ondrej@budai.cz>
The edge installer and edge simplified installer build roots have
diverged, i.e. the latter need clevis/luks, so define a new pkg
set for the simplified installer extending the edge installer one.
Co-Authored-By: Christian Kellner <christian@kellner.me>
This is only required in RHEL9.0, but best practice is to always pin these things
down. Also increases uniformity between distros.
Simplify a bit the volid generator by making it require `rand.Rand` rather than
`io.Reader`, and hence eliminating the need for error handling.
In LUKSContainer and LVMLogicalVolume we neglected to clone the Payload
which means we would modify the base PartitionTable when manipulating
the clone.
Specify a size for the root filesystem in the partition table,
which basically equates to a minimum size. In reality all image
types specify a larger image size and thus we enlarge the root
file system to more than the specified size for plain layouts.
But if we auto-convert an partition layout to LVM we need a size
for the root partition.
Does not change any existing manifests.
This does not apply for ostree based systems like the simplified
installer.
Whenever we create a new mountpoint due to a user customization,
ensure the layout uses LVM, i.e. convert plain layouts to it, if
needed. It uses the existing lvm-ification code but enhances it
so that we also create a `/boot` partition in case it does not
yet exist.
Adjust the existing tests that assumed we can not create more
than 4 partitions on mbr layouts, since that is now not true
anymore.
Since udev will probe block devices it is advisable to hold a lock
on the device when modifying its partition table or the superblock
of the filesystem (see [1]). osbuild loopback devices do support
this via the `lock` option. Set this option for all operation that
involve changing block device "metadata" that could potentionally
race with udev, such as sfdisk, mkfs, creating a luks2 container
and creating LVM2 volume groups and logical volumes.
NB: osbuild also has its own device inhibition logic to prevent
udev/lvm2 from auto activating devices and in general to limit the
interaction between the host and devices used by osbuild. See [2]
for more information.
NB: this also locks the loopback device in situation where we the
it is strickly not the right thing to do, e.g. when creating a fs
on a logical voume that is located on a loopback device, since in
this case the device we would need to lock is the logical volume.
Sadly, LVM/DM devices are exempt from block device locking. But,
due to a bug in osbuild < 50, the udev inhibitor does *not* work
for loopback devices and therefore we have to use the actual lock
to preven LVM device auto-activation via `69-dm-lvm-metad.rules`.
The change was implemented by adding a new boolean to `getDevices`
indicating if the loopback device should be locked or not. Once
we depend on osbuild 50 we can change the logic in `getDevices`
to only lock the loopback device if the number of devices is one,
i.e. we are working directly on the loopback device.
[1] https://systemd.io/BLOCK_DEVICE_LOCKING/
[2] /usr/lib/udev/rules.d/10-osbuild-inhibitor.rules
Whenever we create a new mountpoint due to a user customization,
ensure the layout uses LVM, i.e. convert plain layouts to it, if
needed. This does not apply to rpm-ostree based systems, e.g. the
simplified installer since they will be using LUKS in 9.0.
Add "lvm2" to the build pipeline and thus generate new manifests
and image infos.
Co-Authored-By: Achilleas Koutsou <achilleas@koutsou.net>
Specify a size for the root filesystem in the partition table,
which basically equates to a minimum size. In reality all image
types specify a lager image size and thus we enlarge the root
file system to more than the specified size for plain layouts.
But if we auto-convert an partiton layout to LVM we need a size
for the root partition.
Does not change any existing manifests.
This does not apply for ostree based systems like the simplified
installer.
Re-introduce the VolumeContainer interface but with a different
meaning: it is supposed to be implemented by all container that
contain volumes and as a result have themselves a size, like eg
LVM2, LUKS2 and PartitionTable (the latter is not yet included).
The sole method on the interface for now is MetadataSize, which
should return the metadata for the container itself.
Use that new `VolumeContainer.MetadataSize` method when we up-
date the sizes of elements in `resizeEntitybranch`.