Commit graph

38 commits

Author SHA1 Message Date
Michael Vogt
b4b865fddf Revert "sources(curl): disable curl --parallel by default"
This reverts commit 9bef57d5a6.
2024-08-28 07:46:37 +02:00
Michael Vogt
a229d46b1e sources(curl): manually keep track of failed URLs
This commit keeps track of individual errors as curl will only report
the last download operation success/failure via it's exit code.

There is "--fail-early" but the downside of that is that abort all in
progress downloads too.
2024-08-28 07:46:37 +02:00
Michael Vogt
a50dbb14c2 sources(curl): use --next for each url in curl config
curl keeps a global parser state. This means that if there are
multiple "cacert =" values they are just overriden and the last
one wins. This is why the `test_curl_download_many_mixed_certs`
test did not work - the second `cacert = ` overwrites the previous
one.

To fix this we need to use `--next` when we need to change options
on a per url (like `cacert`) basis. With `--next` curl starts a
new parser state for the next url (but keeps the options for the
previous ones set). This commit does that in a slightly naive
way by just repeating our options for each url. Technically
we could sort the sources so that we have less repetition but
other then slightly smaller auto-generated files it has no
advantage.

With this commit the `test_curl_download_many_mixed_certs` test
works.
2024-08-28 07:46:37 +02:00
Michael Vogt
6ccd5d5cfe test: add test that download mixed https content
When investigating https://github.com/osbuild/osbuild-composer/pull/4247
we found that it would fail when a download required two sets of
`--cacert` keys. This commits adds a test for this that fails on
the centos9 7.76.1 version.
2024-08-28 07:46:37 +02:00
Michael Vogt
46db834dee sources(curl): use json like output inside of custom record
When using `--write-out` we are not using %{json} because older curl
(7.76) will write {"http_connect":000} which python cannot parse.

So we had a custom `--write-out` with `\1xc` as "record" separators
between the fields. This is a bit old-school and not very extensible
so Achilleas had the idea to still use json but "define" our own
subset via the variables that curl provides. This commit does that.
2024-07-30 11:12:03 +02:00
Michael Vogt
16667ef260 sources(curl): error if curl exists 0 but there are downloads left
As part of the investigation of the CI failure in
https://github.com/osbuild/osbuild-composer/pull/4247
we noticed that curl can return a return_code of `0` even
when it did not downloaded all the urls in a `--config` provided
file. This seems to be curl version dependent, I had a hard
time writing a test-case with the real curl (8.6.0) that
reproduces this so I went with mocking it. We definietly saw
this failure with the centos 9 version (7.76).

Our current code is buggy and assumes that the exit status
of curl is always non-zero if any download fails but that is
only the case when `--fail-early` is used.

The extra paranoia will not hurt even when relying on the
exit code of curl is fixed.
2024-07-17 11:39:35 +02:00
Michael Vogt
9bef57d5a6 sources(curl): disable curl --parallel by default
Disable `curl --parallel` by default until the failure in
https://github.com/osbuild/osbuild-composer/pull/4247

is fully understood. It can be enabled via the environment:
```
OSBUILD_SOURCES_CURL_USE_PARALLEL=1
```
in the osbuild-composer test.
2024-07-08 18:00:59 +02:00
Michael Vogt
4697a3fb84 sources: do not use %{json} when generating curl output
We cannot use `curl --write-out %{json}` because older curl
(7.76 from RHEL9/Centos9) will write `{"http_connect":000}`
which python cannot parse.
2024-07-04 11:53:40 +02:00
Michael Vogt
018c15aae8 sources: run all tests for curl with both old and new curl
To ensure there are no regressions with the old curl make
sure to run all tests that fetch_all() with both old and
new curl.
2024-07-04 11:53:40 +02:00
Michael Vogt
0d3a153c78 sources: add new _fetch_all_new_curl() helper
When using a modern curl we can download download multiple urls
in parallel which avoids connection setup overhead and is generally
more efficient. Use when it's detected.

TODO: ensure both old and new curl are tested automatically via
the testsuite.
2024-07-04 11:53:40 +02:00
Michael Vogt
974c8adff9 source: add helper to detect if curl parallel download is available
Modern curl (7.68+) has a --parallel option that will download
multiple sources in parallel. This commit adds detection for this
feature as it is only available after RHEL 8.

In addition we need some more feature to properly support --parallel,
i.e. `--write-out` with json and exitcode options. This bumps the
requirements to 7.75+ which is still fine, centos9/RHEL9 have
7.76.
2024-07-04 11:53:40 +02:00
Michael Vogt
d20713d7af curl: add gen_curl_download_config() and use in download
Instead of passing the url and options on the commandline this
commit moves it into a config file. This is not useful just yet
but it will be once we download multiple urls per curl instance.
2024-07-04 11:53:40 +02:00
Sanne Raymaekers
2e5a9335c9 sources/curl: use --user-agent option to set the user-agent
Setting the user-agent using `--header` is broken in combination with
`--location`, `--proxy`, and an https endpoint which redirects. The
user-agent sent to the proxy changes after the client is redirected,
tripping up proxies.

For more information see https://issues.redhat.com/browse/RHEL-45364
2024-07-02 16:15:56 +02:00
Ondřej Budai
af0e849081 sources/curl: Use our own User-Agent
Currently, osbuild downloads are identified as coming from `curl`. This
is unfortunate because some RPM mirrors block requests from curl. Let's
"fix" that by introducing our own user-agent. While this can certainly
be seen as "circumventing" a policy, I think that this change is
actually helpful: Now, the mirror maintainers can actually distinguish
osbuild requests from regular curl calls. If they want to block osbuild,
they certainly can, we have no power there, but at least this allows
more fine-grained filtering. Also, our new user-agent contains our
domain name, so if there's a problem, they can contact us.
2024-04-30 03:10:44 +02:00
Sanne Raymaekers
b90a5027dc sources(curl): set HTTP proxy through the environment 2024-04-08 11:56:05 +02:00
Andre Marianiello
7e0e30fd8f curl: fix RHSM url retrieval 2024-03-29 13:02:11 +01:00
Michael Vogt
352bf5cd52 curl: rename "transform" to "amend_secrets"
The curl source is the only source left that uses "transform". And
here the name is very generic but in fact we only do a single thing:
we add secrets for subscriptions for for mtls to the download.

So rename to make it clear what this is all about.
2024-03-19 14:21:57 +01:00
Michael Vogt
f214c69a98 osbuild: add workaround to integrate sources into progress reporting
This commit is somewhat poor, sorry for that. It mostly adds
workaround so that the osbuild sources can emit some progress
reporting as well. Without that the user experience is rather poor
and there is a long delay before any sort of progress can be
reported (even before the normal stages run).

With it the user experience is still not good but slightly better,
i.e. the progress monitor will report that the sources have
started downloading and curl will generated some log output. No
real progress unfortunately (sources subprogress will jump from
zero to 100%).
2024-03-12 16:44:12 +01:00
Sanne Raymaekers
29159189f1 sources/curl: add org.osbuild.mtls secrets support
If `org.osbuild.mtls` is passed as a secret name, look for the mtls data
in the environment.
2024-03-11 11:09:37 +01:00
Simon de Vlieger
f9b55ff6a0 sources: rename download -> fetch_all
Not all sources download things and `fetch_all` is consistent with
`fetch_one`.
2024-01-26 09:58:48 +01:00
Simon de Vlieger
2c42c46c48 sources: move parallelisation into source
This moves the parallelisation decisions into the sources themselves,
making the `download` method abstract inside `osbuild` itself.
2024-01-26 09:58:48 +01:00
Simon de Vlieger
ea6085fae6 osbuild: run isort on all files 2022-09-12 13:32:51 +02:00
Achilleas Koutsou
f699720dbd sources/curl: quote URL paths before downloading
Some package versions [1] can contain carets and other characters that
curl doesn't like.  These need to be URL encoded.

Interestingly, the documented way of replacing components in a parsed
URL from urllib in Python is by calling the (seemingly private)
`_replace()` method [2].

[1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_snapshots
[2] https://docs.python.org/3/library/urllib.parse.html#url-parsing
2022-08-31 22:28:54 +01:00
Simon de Vlieger
3fd864e5a9 osbuild: fix optional-types
Optional types were provided in places but were not always correct. Add
mypy checking and fix those that fail(ed).
2022-07-13 17:31:37 +02:00
Achilleas Koutsou
c8073b5836 sources: support calling curl with --insecure
Add support for the `--insecure` curl flag, which makes curl skip the
verification step when making secure connections (e.g., https://).
This allows osbuild to download files from servers configured with
SSL/TLS but whose certificate cannot be validated.

This is supported for configuring repository sources in
osbuild-composer.
2022-06-14 22:13:39 +02:00
Simon de Vlieger
7b0e1fe5fd sources: curl max_workers 2 * num_cpus
This changes the curl source to use the number of cpus times two
for its thread count. A conservative number but a commonly used
default.
2022-05-24 19:45:23 +02:00
Thomas Lavocat
1de74ce2c9 sources: generalizing download method
Before, the download method was defined in the inherited class of each
program. With the same kind of workflow redefined every time. This
contribution aims at making the workflow more clear and to generalize
what can be in the SourceService class.

The download worklow is as follow:
Setup -> Filter -> Prepare -> Download

The setup mainly step sets up caches. Where the download data will be
stored in the end.

The filter step is used to discard some of the items to download based
on some criterion. By default, it is used to verify if an item is
already in the cache using the item's checksum.

The Prepare step goes from each element and let the overloading step the
ability to alter each item before downloading it. This is used mainly
for the curl command which for rhel must generate the subscriptions.

Then the download step will call fetch_one for each item. Here the
download can be performed sequentially or in parallel depending on the
number of workers selected.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
128845da3c sources: tidy the download method
Only the "items to download" need to be passed as parameters. The rest
is unpacked as attributes during the Setup step of the workflow.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
92fe237f24 sources: introduce per-source content_type
Introduce a new class member `content_type` that specifies what type of
items the source will store in the cache. Use that to generalize the
setup step, which is shared across all sources.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
34cd9ef9f0 sources: generalize cache generation
Introduce a `setup` step in the workflow that is responsible of
generating the cache folder. This is then used in each download method.
2022-05-11 04:32:42 -05:00
Tom Gundersen
e175529f7c sources/curl: don't limit total download time
Some RPMs might be very large, and limiting the total download time
might lead to failed build even in cases where downloading is making
progress. Instead, set a minimum download speed (1kbps). If the
minimum is not surpassed for 30 seconds in a row, the download fails
and is retried. This follows the logic employed by DNF.

Adjust the number of retries to 10 and the connection timeout to 30,
in order to match what DNF does. One difference is that DNF does 10
retries across all downloads, whereas we do it per download, this
could be changed in a follow-up.

Old:
 - a download taking more than 5 minutes is unconditionally aborted

New:
 - slow but working downloads will never be aborted
 - downloads will be stalled for at most five minutes
   in total before being aborted
 - time spent making progress does not count towards
   the five minutes

Signed-off-by: Tom Gundersen <teg@jklm.no>
2022-03-16 14:48:03 +01:00
Christian Kellner
c902a7a754 sources: port to host services
Port sources to also use the host services infrastructure that is
used by inputs, devices and mounts. Sources are a bit different
from the other services that they don't run for the duration of
the stage but are run before anything is built. By using the same
infrastructure we re-use the process management and inter process
communcation. Additionally, this will forward all messages from
sources to the existing monitoring framework.
Adapt all existing sources and tests.
2021-09-22 00:00:20 +02:00
Alexander Larsson
072b75d78e org.osbuild.curl: Don't load secrets if not needed
This moves the check for already downloaded files earlier so
that if all files are already downloaded we don't need to
load the secrets.

This is faster, but also it allows a pre-seeded object store
to run the manifest on a system (like a VM) that isn't subscribed.
2021-09-22 00:00:20 +02:00
Martin Sehnoutka
ee3760e1ba sources/curl: Implement new way of getting RHSM secrets
The previous version covered too few use cases, more specifically a
single subscription. That is of course not the case for many hosts, so
osbuild needs to understand subscriptions.

When running org.osbuild.curl source, read the
/etc/yum.repos.d/redhat.repo file and load the system subscriptions from
there. While processing each url, guess which subscription is tied to
the url and use the CA certificate, client certificate, and client key
associated with this subscription. It must be done this way because the
depsolving and fetching of RPMs may be performed on different hosts and
the subscription credentials are different in such case.

More detailed description of why this approach was chosen is available
in osbuild-composer git: https://github.com/osbuild/osbuild-composer/pull/1405
2021-06-04 18:23:05 +01:00
Christian Kellner
3ebfc6f657 sources/curl: use util.checksum.verify_file
Now that there is a common utility function to verify the checksum
of a file, use that.
Also fix the json schema entry for the property to have to correct
minium and maximum digest length, given the supported algorithm,
which is 32 (md5) and 128 (sha512) characters.
2021-05-12 14:26:16 +02:00
Christian Kellner
518940cfe0 sources/curls: refactor downloading code
Now that the `export` functionality is gone, the download code
can be simplified, since we are not downloading a subset of the
urls, but all of them.
2021-04-29 12:58:01 +02:00
Christian Kellner
5c19360cbe sources/curl: remove export functionality
Since the `sources.SourcesServer` has been removed, nothing is
using the export functionality anymore. Inputs are now used to
make content in the store available to stages. Remove all the
export logic from org.osbuild.curl.
2021-04-29 12:58:01 +02:00
Christian Kellner
81c8374d3e sources: rename org.osbuild.{files -> curl}
The `org.osbuild.files` source provides files, but might in the
future not be the only one that does. Therefore rename it to
match the internal tool that is being used to fetch the files.
This is done for most other osbuild modules that target tools.

The format v1 loader is adapted to make this change transparent
for users of the v1 format, so we are backwards compatible.

Change the MPP depsolve preprocessor so that for format v2 based
manifest `org.osbuild.curl` source is used. Also rename the
corresponding source test. Adapt the format v2 mod test to use
the curl source.
2021-02-12 19:27:08 +01:00
Renamed from sources/org.osbuild.files (Browse further)