This commit introduces the collection of error
metrics since it is now possible to differentiate
between internal errors and user input errors.
Additionally, the error status is reported for
job duration metrics.
For simplicity, the collection of the job metrics
was carried out in the job queue. This was only
being done in the dbqueue and not in the fsqueue.
This pr refactors the metric collection and moves
the job metrics to the worker server, by adding a
wrapper function to enqueueing jobs so that the
metrics only have to be recorded in one place when
queueing a job.
Refactor the current metric collection to make use
of re-usable functions, since some of the same queries
are repeated. This will also make it easier to move
the collection of metrics from the job queue.
The internal GCP package used `pkg.go.dev/google.golang.org/api` [1] to
interact with Compute Engine API. Modify the package to use the new and
idiomatic `pkg.go.dev/cloud.google.com/go` [2] library for interacting
with the Compute Engine API. The new library have been already used to
interact with the Cloudbuild and Storage APIs. The new library was not
used for Compute Engine since the beginning, because at that time, it
didn't support Compute Engine.
Update go.mod and vendored packages.
[1] https://github.com/googleapis/google-api-go-client
[2] https://github.com/googleapis/google-cloud-go
Signed-off-by: Tomas Hozza <thozza@redhat.com>
Disable loging in via password authentication since this is an
official Amazon marketplace requirement
Linux-based AMIs must not allow SSH password authentication.
Disable password authentication via your sshd_config file by
setting PasswordAuthentication to NO.
Section "Security policies" from
https://docs.aws.amazon.com/marketplace/latest/userguide/product-and-ami-policies.html
Disable loging in via password authentication since this is an
official Amazon marketplace requirement
Linux-based AMIs must not allow SSH password authentication.
Disable password authentication via your sshd_config file by
setting PasswordAuthentication to NO.
Section "Security policies" from
https://docs.aws.amazon.com/marketplace/latest/userguide/product-and-ami-policies.html
Disable loging in via password authentication since this is an
official Amazon marketplace requirement
Linux-based AMIs must not allow SSH password authentication.
Disable password authentication via your sshd_config file by
setting PasswordAuthentication to NO.
Section "Security policies" from
https://docs.aws.amazon.com/marketplace/latest/userguide/product-and-ami-policies.html
Clean up some implementation aspects of the Fedora distro definition:
- Do not have default Fedora distro version and use `fedora` as the
package name in all places that use it, instead of `fedora33`.
- Fix bugs when wrong (Fedora 33) values were returned by `OSTreeRef()`
and `Releasever()` for newer Fedora releases.
- Test Fedora 35 in package unit tests.
- Add unit test for `OSTreeRef()` method.
- Use architecture name constants from `distro` package, instead of
string literals.
Fix#1802
Signed-off-by: Tomas Hozza <thozza@redhat.com>
The QEMU assembler in Fedora distro definition for UEFI systems used
longer than allowed label for the VFAT filesystem of the EFI System
Partition. The maximum allowed label length is 11 characters.
This worked before with dosfstools, but in 2018, they added a label
validation [1]. This change got into the v4.2 release of dosfstools,
released in Jan 2021. And subsequently since F34, this new version of
dosfstools is present in Fedora repositories.
[1] ca54953476
Signed-off-by: Tomas Hozza <thozza@redhat.com>
Enable F34 testing on AWS as there is nothing blocking it. F34 is not
yet supported on `rhos-01` as there is no runner definition.
Remove F33 repositories for testing and add repo definitions for F34 and
F35.
Signed-off-by: Tomas Hozza <thozza@redhat.com>
There's conflicting ansible versions in the 86 nightlies and epel. There
should be a correct combination of plugins which fixes the callback on
86. But let's drop it to unblock for now.
`json_query` requires python3-jmespath which, while available in the
repos, it can sometimes cause issues when the ansible interpreter is
different from the system interpreter.
The `json_query` is only used in a handful of locations that can easily
be served by `jq`, which we use in other places already.
This is for now only supported for koji builds.
A lot of code moved around for this, but functionally not much changed. PostCompose() now only parses the input, and queueing of all the jobs have been factored out in separate functions. PostCompose() is mostly agnostic to koji/non-koji requests.
Replace Job() and JobStatus() with typesafe versions, and introduce JobType()
for the rare instances where we don't know the type up front.
Additionally, catch a few more error cases:
- if OSBuildResult is nil, then we failed to invoke osbuild
- make sure the same JobResult handling is done for osbuild-koji, as for osbuild
Before accessing a field of the `OSBuildOutput`, which itself is a
field of the `osbuildKojiResults` struct, check if it is actually
is set (non-nill), otherwise dereferencing it will crash the
worker.
The field will be null if osbuild has not been invoked at all or
if osbuild crashed or refused to accept the input.
The manifest is of type distro.Manifest, which is an alias for a
byte array, i.e. it is already in marshalled form. There is no
need to marshal it again before passing it to osbuild.
This only extends the API, the backend can still only deal with composes of a single build.
I aimed to keep the API practically backwards compatible, i.e., no current consumer of it should notice the change. I hope I didn't mess that up.
fixup: image statuses
In addition to individual image status, have an
overall status that captures success or failure
of the compose as a whole.
This is not as fine grained, and only distinguishes
between "pending", "failure" and "success".
This captures other jobs than the image builds, which
is relevant for the koji composes, which consists also
of koji-init and koji-finalize, in addition to the build
jobs.
For now upload requests are required if and only if we are not
using koji. When using the koji integration the produced artifacts
are uploaded to koji only. In the future we may want to support
also uploading to the cloud providers.
Extend the compose endpoints to have minimal koji support.
This is intended to replace the current koji API so that it
can be consumed through api.openshift.com.
We may need to use several SSO providers, so extend our
configuration to allow that.
Based on PoC from Sanne:
```
package main
import (
"net/http"
"log"
"github.com/openshift-online/ocm-sdk-go/authentication"
"github.com/openshift-online/ocm-sdk-go/logging"
)
type H struct{}
func (h *H) ServeHTTP(w http.ResponseWriter, r *http.Request) {
log.Println("HURRAY")
}
func main() {
logBuilder := logging.NewGoLoggerBuilder()
logger, err := logBuilder.Build()
if err != nil {
panic(err)
}
aH, err := authentication.NewHandler().
KeysURL("https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/certs").
KeysURL("https://identity.api.openshift.com/auth/realms/rhoas/protocol/openid-connect/certs").
Logger(logger).Next(&H{}).Build()
if err != nil {
panic(err)
}
log.Fatal(http.ListenAndServe(":8080", aH))
}
```
It should be `1048576` (exactly 512 MiB), like it is for all other
distributions. It somehow got mingled in when the distribution was
forked off from 8.5/9.0 beta (1048676 to 1048576 strongly suggests
a sed command was involved, so we blame that).
When we compute the overall status of a koji compose, the individual
build jobs are checked. Currently, a job is considered a failure, if
a build job has output (`OSBuildOutput`) and the output's `Success`
field is `false`. But `OSBuildOutput` will be `nil` when osbuild
crashed or refused the manifest input. Therefore the job status is
a failure if `OSBuildOutput` is `nil`, since if osbuild was run,
and the run was successful we must have a non-`OSBuildOutput` field.
This check got inverted during the work on "Worker errors backwards
compatibility". As a consequence, osbuild was never run and the
result structure `buildResult.OSBuildOutput` was `nil` Since the
overall status reporting is not complete, and does not take this,
i.e. `buildResult.OSBuildOutput`, being `nil` as an error case,
the overall status was reported as "success". See the function
`composeStatusFromJobStatus` in `internal/kojiapi/server.go`.
Add a special cases for the root user to the work-around for ssh keys in
OSTree commits.
See 93e54cd872 for the original,
equivalent change in RHEL 8.6.
koji depends on uccr/kerbi go module which depends on kerberos C header
files. Make sure new commers know about what to install in order to
compile the project.
Signed-off-by: Roy Golan <rgolan@redhat.com>