This commit keeps track of individual errors as curl will only report
the last download operation success/failure via it's exit code.
There is "--fail-early" but the downside of that is that abort all in
progress downloads too.
curl keeps a global parser state. This means that if there are
multiple "cacert =" values they are just overriden and the last
one wins. This is why the `test_curl_download_many_mixed_certs`
test did not work - the second `cacert = ` overwrites the previous
one.
To fix this we need to use `--next` when we need to change options
on a per url (like `cacert`) basis. With `--next` curl starts a
new parser state for the next url (but keeps the options for the
previous ones set). This commit does that in a slightly naive
way by just repeating our options for each url. Technically
we could sort the sources so that we have less repetition but
other then slightly smaller auto-generated files it has no
advantage.
With this commit the `test_curl_download_many_mixed_certs` test
works.
When investigating https://github.com/osbuild/osbuild-composer/pull/4247
we found that it would fail when a download required two sets of
`--cacert` keys. This commits adds a test for this that fails on
the centos9 7.76.1 version.
When using `--write-out` we are not using %{json} because older curl
(7.76) will write {"http_connect":000} which python cannot parse.
So we had a custom `--write-out` with `\1xc` as "record" separators
between the fields. This is a bit old-school and not very extensible
so Achilleas had the idea to still use json but "define" our own
subset via the variables that curl provides. This commit does that.
As part of the investigation of the CI failure in
https://github.com/osbuild/osbuild-composer/pull/4247
we noticed that curl can return a return_code of `0` even
when it did not downloaded all the urls in a `--config` provided
file. This seems to be curl version dependent, I had a hard
time writing a test-case with the real curl (8.6.0) that
reproduces this so I went with mocking it. We definietly saw
this failure with the centos 9 version (7.76).
Our current code is buggy and assumes that the exit status
of curl is always non-zero if any download fails but that is
only the case when `--fail-early` is used.
The extra paranoia will not hurt even when relying on the
exit code of curl is fixed.
Disable `curl --parallel` by default until the failure in
https://github.com/osbuild/osbuild-composer/pull/4247
is fully understood. It can be enabled via the environment:
```
OSBUILD_SOURCES_CURL_USE_PARALLEL=1
```
in the osbuild-composer test.
When using a modern curl we can download download multiple urls
in parallel which avoids connection setup overhead and is generally
more efficient. Use when it's detected.
TODO: ensure both old and new curl are tested automatically via
the testsuite.
Modern curl (7.68+) has a --parallel option that will download
multiple sources in parallel. This commit adds detection for this
feature as it is only available after RHEL 8.
In addition we need some more feature to properly support --parallel,
i.e. `--write-out` with json and exitcode options. This bumps the
requirements to 7.75+ which is still fine, centos9/RHEL9 have
7.76.
Instead of passing the url and options on the commandline this
commit moves it into a config file. This is not useful just yet
but it will be once we download multiple urls per curl instance.
Setting the user-agent using `--header` is broken in combination with
`--location`, `--proxy`, and an https endpoint which redirects. The
user-agent sent to the proxy changes after the client is redirected,
tripping up proxies.
For more information see https://issues.redhat.com/browse/RHEL-45364
Currently, osbuild downloads are identified as coming from `curl`. This
is unfortunate because some RPM mirrors block requests from curl. Let's
"fix" that by introducing our own user-agent. While this can certainly
be seen as "circumventing" a policy, I think that this change is
actually helpful: Now, the mirror maintainers can actually distinguish
osbuild requests from regular curl calls. If they want to block osbuild,
they certainly can, we have no power there, but at least this allows
more fine-grained filtering. Also, our new user-agent contains our
domain name, so if there's a problem, they can contact us.
The curl source is the only source left that uses "transform". And
here the name is very generic but in fact we only do a single thing:
we add secrets for subscriptions for for mtls to the download.
So rename to make it clear what this is all about.
This commit is somewhat poor, sorry for that. It mostly adds
workaround so that the osbuild sources can emit some progress
reporting as well. Without that the user experience is rather poor
and there is a long delay before any sort of progress can be
reported (even before the normal stages run).
With it the user experience is still not good but slightly better,
i.e. the progress monitor will report that the sources have
started downloading and curl will generated some log output. No
real progress unfortunately (sources subprogress will jump from
zero to 100%).
Add support for the `--insecure` curl flag, which makes curl skip the
verification step when making secure connections (e.g., https://).
This allows osbuild to download files from servers configured with
SSL/TLS but whose certificate cannot be validated.
This is supported for configuring repository sources in
osbuild-composer.
Before, the download method was defined in the inherited class of each
program. With the same kind of workflow redefined every time. This
contribution aims at making the workflow more clear and to generalize
what can be in the SourceService class.
The download worklow is as follow:
Setup -> Filter -> Prepare -> Download
The setup mainly step sets up caches. Where the download data will be
stored in the end.
The filter step is used to discard some of the items to download based
on some criterion. By default, it is used to verify if an item is
already in the cache using the item's checksum.
The Prepare step goes from each element and let the overloading step the
ability to alter each item before downloading it. This is used mainly
for the curl command which for rhel must generate the subscriptions.
Then the download step will call fetch_one for each item. Here the
download can be performed sequentially or in parallel depending on the
number of workers selected.
Introduce a new class member `content_type` that specifies what type of
items the source will store in the cache. Use that to generalize the
setup step, which is shared across all sources.
Some RPMs might be very large, and limiting the total download time
might lead to failed build even in cases where downloading is making
progress. Instead, set a minimum download speed (1kbps). If the
minimum is not surpassed for 30 seconds in a row, the download fails
and is retried. This follows the logic employed by DNF.
Adjust the number of retries to 10 and the connection timeout to 30,
in order to match what DNF does. One difference is that DNF does 10
retries across all downloads, whereas we do it per download, this
could be changed in a follow-up.
Old:
- a download taking more than 5 minutes is unconditionally aborted
New:
- slow but working downloads will never be aborted
- downloads will be stalled for at most five minutes
in total before being aborted
- time spent making progress does not count towards
the five minutes
Signed-off-by: Tom Gundersen <teg@jklm.no>
Port sources to also use the host services infrastructure that is
used by inputs, devices and mounts. Sources are a bit different
from the other services that they don't run for the duration of
the stage but are run before anything is built. By using the same
infrastructure we re-use the process management and inter process
communcation. Additionally, this will forward all messages from
sources to the existing monitoring framework.
Adapt all existing sources and tests.
This moves the check for already downloaded files earlier so
that if all files are already downloaded we don't need to
load the secrets.
This is faster, but also it allows a pre-seeded object store
to run the manifest on a system (like a VM) that isn't subscribed.
The previous version covered too few use cases, more specifically a
single subscription. That is of course not the case for many hosts, so
osbuild needs to understand subscriptions.
When running org.osbuild.curl source, read the
/etc/yum.repos.d/redhat.repo file and load the system subscriptions from
there. While processing each url, guess which subscription is tied to
the url and use the CA certificate, client certificate, and client key
associated with this subscription. It must be done this way because the
depsolving and fetching of RPMs may be performed on different hosts and
the subscription credentials are different in such case.
More detailed description of why this approach was chosen is available
in osbuild-composer git: https://github.com/osbuild/osbuild-composer/pull/1405
Now that there is a common utility function to verify the checksum
of a file, use that.
Also fix the json schema entry for the property to have to correct
minium and maximum digest length, given the supported algorithm,
which is 32 (md5) and 128 (sha512) characters.
Since the `sources.SourcesServer` has been removed, nothing is
using the export functionality anymore. Inputs are now used to
make content in the store available to stages. Remove all the
export logic from org.osbuild.curl.
The `org.osbuild.files` source provides files, but might in the
future not be the only one that does. Therefore rename it to
match the internal tool that is being used to fetch the files.
This is done for most other osbuild modules that target tools.
The format v1 loader is adapted to make this change transparent
for users of the v1 format, so we are backwards compatible.
Change the MPP depsolve preprocessor so that for format v2 based
manifest `org.osbuild.curl` source is used. Also rename the
corresponding source test. Adapt the format v2 mod test to use
the curl source.