debian-forge

Author	SHA1	Message	Date
Michael Vogt	b4b865fddf	Revert "sources(curl): disable `curl --parallel` by default" This reverts commit `9bef57d5a6`.	2024-08-28 07:46:37 +02:00
Michael Vogt	a229d46b1e	sources(curl): manually keep track of failed URLs This commit keeps track of individual errors as curl will only report the last download operation success/failure via its exit code. There is "--fail-early" but the downside of that is that it aborts all in-progress downloads too.	2024-08-28 07:46:37 +02:00
Michael Vogt	a50dbb14c2	sources(curl): use `--next` for each url in curl config curl keeps a global parser state. This means that if there are multiple "cacert =" values they are just overridden and the last one wins. This is why the `test_curl_download_many_mixed_certs` test did not work - the second `cacert = ` overwrites the previous one. To fix this we need to use `--next` when we need to change options on a per url (like `cacert`) basis. With `--next` curl starts a new parser state for the next url (but keeps the options for the previous ones set). This commit does that in a slightly naive way by just repeating our options for each url. Technically we could sort the sources so that we have less repetition but other than slightly smaller auto-generated files it has no advantage. With this commit the `test_curl_download_many_mixed_certs` test works.	2024-08-28 07:46:37 +02:00
Michael Vogt	6ccd5d5cfe	test: add test that download mixed https content When investigating https://github.com/osbuild/osbuild-composer/pull/4247 we found that it would fail when a download required two sets of `--cacert` keys. This commit adds a test for this that fails on the centos9 7.76.1 version.	2024-08-28 07:46:37 +02:00
Michael Vogt	46db834dee	sources(curl): use json like output inside of custom record When using `--write-out` we are not using %{json} because older curl (7.76) will write {"http_connect":000} which python cannot parse. So we had a custom `--write-out` with `\1xc` as "record" separators between the fields. This is a bit old-school and not very extensible so Achilleas had the idea to still use json but "define" our own subset via the variables that curl provides. This commit does that.	2024-07-30 11:12:03 +02:00
Michael Vogt	16667ef260	sources(curl): error if curl exists 0 but there are downloads left As part of the investigation of the CI failure in https://github.com/osbuild/osbuild-composer/pull/4247 we noticed that curl can return a return_code of `0` even when it did not download all the urls in a `--config` provided file. This seems to be curl version dependent, I had a hard time writing a test-case with the real curl (8.6.0) that reproduces this so I went with mocking it. We definitely saw this failure with the centos 9 version (7.76). Our current code is buggy and assumes that the exit status of curl is always non-zero if any download fails but that is only the case when `--fail-early` is used. The extra paranoia will not hurt even when relying on the exit code of curl is fixed.	2024-07-17 11:39:35 +02:00
Michael Vogt	9bef57d5a6	sources(curl): disable `curl --parallel` by default Disable `curl --parallel` by default until the failure in https://github.com/osbuild/osbuild-composer/pull/4247 is fully understood. It can be enabled via the environment: ``` OSBUILD_SOURCES_CURL_USE_PARALLEL=1 ``` in the osbuild-composer test.	2024-07-08 18:00:59 +02:00
Michael Vogt	4697a3fb84	sources: do not use `%{json}` when generating curl output We cannot use `curl --write-out %{json}` because older curl (7.76 from RHEL9/Centos9) will write `{"http_connect":000}` which python cannot parse.	2024-07-04 11:53:40 +02:00
Michael Vogt	018c15aae8	sources: run all tests for curl with both old and new curl To ensure there are no regressions with the old curl make sure to run all tests that fetch_all() with both old and new curl.	2024-07-04 11:53:40 +02:00
Michael Vogt	0d3a153c78	sources: add new _fetch_all_new_curl() helper When using a modern curl we can download multiple urls in parallel which avoids connection setup overhead and is generally more efficient. Use when it's detected. TODO: ensure both old and new curl are tested automatically via the testsuite.	2024-07-04 11:53:40 +02:00
Michael Vogt	974c8adff9	source: add helper to detect if curl parallel download is available Modern curl (7.68+) has a --parallel option that will download multiple sources in parallel. This commit adds detection for this feature as it is only available after RHEL 8. In addition we need some more features to properly support --parallel, i.e. `--write-out` with json and exitcode options. This bumps the requirements to 7.75+ which is still fine, centos9/RHEL9 have 7.76.	2024-07-04 11:53:40 +02:00
Michael Vogt	d20713d7af	curl: add gen_curl_download_config() and use in download Instead of passing the url and options on the commandline this commit moves it into a config file. This is not useful just yet but it will be once we download multiple urls per curl instance.	2024-07-04 11:53:40 +02:00
Sanne Raymaekers	2e5a9335c9	sources/curl: use `--user-agent` option to set the user-agent Setting the user-agent using `--header` is broken in combination with `--location`, `--proxy`, and an https endpoint which redirects. The user-agent sent to the proxy changes after the client is redirected, tripping up proxies. For more information see https://issues.redhat.com/browse/RHEL-45364	2024-07-02 16:15:56 +02:00
Ondřej Budai	af0e849081	sources/curl: Use our own User-Agent Currently, osbuild downloads are identified as coming from `curl`. This is unfortunate because some RPM mirrors block requests from curl. Let's "fix" that by introducing our own user-agent. While this can certainly be seen as "circumventing" a policy, I think that this change is actually helpful: Now, the mirror maintainers can actually distinguish osbuild requests from regular curl calls. If they want to block osbuild, they certainly can, we have no power there, but at least this allows more fine-grained filtering. Also, our new user-agent contains our domain name, so if there's a problem, they can contact us.	2024-04-30 03:10:44 +02:00
Sanne Raymaekers	b90a5027dc	sources(curl): set HTTP proxy through the environment	2024-04-08 11:56:05 +02:00
Andre Marianiello	7e0e30fd8f	curl: fix RHSM url retrieval	2024-03-29 13:02:11 +01:00
Michael Vogt	352bf5cd52	curl: rename "transform" to "amend_secrets" The curl source is the only source left that uses "transform". And here the name is very generic but in fact we only do a single thing: we add secrets for subscriptions or for mtls to the download. So rename to make it clear what this is all about.	2024-03-19 14:21:57 +01:00
Michael Vogt	f214c69a98	osbuild: add workaround to integrate sources into progress reporting This commit is somewhat poor, sorry for that. It mostly adds a workaround so that the osbuild sources can emit some progress reporting as well. Without that the user experience is rather poor and there is a long delay before any sort of progress can be reported (even before the normal stages run). With it the user experience is still not good but slightly better, i.e. the progress monitor will report that the sources have started downloading and curl will generate some log output. No real progress unfortunately (sources subprogress will jump from zero to 100%).	2024-03-12 16:44:12 +01:00
Sanne Raymaekers	29159189f1	sources/curl: add org.osbuild.mtls secrets support If `org.osbuild.mtls` is passed as a secret name, look for the mtls data in the environment.	2024-03-11 11:09:37 +01:00
Simon de Vlieger	f9b55ff6a0	sources: rename `download` -> `fetch_all` Not all sources download things and `fetch_all` is consistent with `fetch_one`.	2024-01-26 09:58:48 +01:00
Simon de Vlieger	2c42c46c48	sources: move parallelisation into source This moves the parallelisation decisions into the sources themselves, making the `download` method abstract inside `osbuild` itself.	2024-01-26 09:58:48 +01:00
Simon de Vlieger	ea6085fae6	osbuild: run isort on all files	2022-09-12 13:32:51 +02:00
Achilleas Koutsou	f699720dbd	sources/curl: quote URL paths before downloading Some package versions [1] can contain carets and other characters that curl doesn't like. These need to be URL encoded. Interestingly, the documented way of replacing components in a parsed URL from urllib in Python is by calling the (seemingly private) `_replace()` method [2]. [1] https://docs.fedoraproject.org/en-US/packaging-guidelines/Versioning/#_snapshots [2] https://docs.python.org/3/library/urllib.parse.html#url-parsing	2022-08-31 22:28:54 +01:00
Simon de Vlieger	3fd864e5a9	osbuild: fix optional-types Optional types were provided in places but were not always correct. Add mypy checking and fix those that fail(ed).	2022-07-13 17:31:37 +02:00
Achilleas Koutsou	c8073b5836	sources: support calling curl with --insecure Add support for the `--insecure` curl flag, which makes curl skip the verification step when making secure connections (e.g., https://). This allows osbuild to download files from servers configured with SSL/TLS but whose certificate cannot be validated. This is supported for configuring repository sources in osbuild-composer.	2022-06-14 22:13:39 +02:00
Simon de Vlieger	7b0e1fe5fd	sources: curl max_workers 2 * num_cpus This changes the curl source to use the number of cpus times two for its thread count. A conservative number but a commonly used default.	2022-05-24 19:45:23 +02:00
Thomas Lavocat	1de74ce2c9	sources: generalizing download method Before, the download method was defined in the inherited class of each program, with the same kind of workflow redefined every time. This contribution aims at making the workflow clearer and at generalizing what can live in the SourceService class. The download workflow is as follows: Setup -> Filter -> Prepare -> Download The Setup step mainly sets up the caches where the downloaded data will be stored in the end. The Filter step is used to discard some of the items to download based on some criterion. By default, it is used to verify if an item is already in the cache using the item's checksum. The Prepare step goes over each element and gives the overloading step the ability to alter each item before downloading it. This is used mainly for the curl command which for rhel must generate the subscriptions. Then the Download step will call fetch_one for each item. Here the download can be performed sequentially or in parallel depending on the number of workers selected.	2022-05-11 04:32:42 -05:00
Thomas Lavocat	128845da3c	sources: tidy the download method Only the "items to download" need to be passed as parameters. The rest is unpacked as attributes during the Setup step of the workflow.	2022-05-11 04:32:42 -05:00
Thomas Lavocat	92fe237f24	sources: introduce per-source content_type Introduce a new class member `content_type` that specifies what type of items the source will store in the cache. Use that to generalize the setup step, which is shared across all sources.	2022-05-11 04:32:42 -05:00
Thomas Lavocat	34cd9ef9f0	sources: generalize cache generation Introduce a `setup` step in the workflow that is responsible for generating the cache folder. This is then used in each download method.	2022-05-11 04:32:42 -05:00
Tom Gundersen	e175529f7c	sources/curl: don't limit total download time Some RPMs might be very large, and limiting the total download time might lead to failed builds even in cases where downloading is making progress. Instead, set a minimum download speed (1kbps). If the minimum is not surpassed for 30 seconds in a row, the download fails and is retried. This follows the logic employed by DNF. Adjust the number of retries to 10 and the connection timeout to 30, in order to match what DNF does. One difference is that DNF does 10 retries across all downloads, whereas we do it per download, this could be changed in a follow-up. Old: - a download taking more than 5 minutes is unconditionally aborted New: - slow but working downloads will never be aborted - downloads will be stalled for at most five minutes in total before being aborted - time spent making progress does not count towards the five minutes Signed-off-by: Tom Gundersen <teg@jklm.no>	2022-03-16 14:48:03 +01:00
Christian Kellner	c902a7a754	sources: port to host services Port sources to also use the host services infrastructure that is used by inputs, devices and mounts. Sources are a bit different from the other services in that they don't run for the duration of the stage but are run before anything is built. By using the same infrastructure we re-use the process management and inter process communication. Additionally, this will forward all messages from sources to the existing monitoring framework. Adapt all existing sources and tests.	2021-09-22 00:00:20 +02:00
Alexander Larsson	072b75d78e	org.osbuild.curl: Don't load secrets if not needed This moves the check for already downloaded files earlier so that if all files are already downloaded we don't need to load the secrets. This is faster, but also it allows a pre-seeded object store to run the manifest on a system (like a VM) that isn't subscribed.	2021-09-22 00:00:20 +02:00
Martin Sehnoutka	ee3760e1ba	sources/curl: Implement new way of getting RHSM secrets The previous version covered too few use cases, more specifically a single subscription. That is of course not the case for many hosts, so osbuild needs to understand subscriptions. When running org.osbuild.curl source, read the /etc/yum.repos.d/redhat.repo file and load the system subscriptions from there. While processing each url, guess which subscription is tied to the url and use the CA certificate, client certificate, and client key associated with this subscription. It must be done this way because the depsolving and fetching of RPMs may be performed on different hosts and the subscription credentials are different in such a case. More detailed description of why this approach was chosen is available in osbuild-composer git: https://github.com/osbuild/osbuild-composer/pull/1405	2021-06-04 18:23:05 +01:00
Christian Kellner	3ebfc6f657	sources/curl: use util.checksum.verify_file Now that there is a common utility function to verify the checksum of a file, use that. Also fix the json schema entry for the property to have the correct minimum and maximum digest length, given the supported algorithms, which is 32 (md5) and 128 (sha512) characters.	2021-05-12 14:26:16 +02:00
Christian Kellner	518940cfe0	sources/curls: refactor downloading code Now that the `export` functionality is gone, the download code can be simplified, since we are not downloading a subset of the urls, but all of them.	2021-04-29 12:58:01 +02:00
Christian Kellner	5c19360cbe	sources/curl: remove export functionality Since the `sources.SourcesServer` has been removed, nothing is using the export functionality anymore. Inputs are now used to make content in the store available to stages. Remove all the export logic from org.osbuild.curl.	2021-04-29 12:58:01 +02:00
Christian Kellner	81c8374d3e	sources: rename org.osbuild.{files -> curl} The `org.osbuild.files` source provides files, but might in the future not be the only one that does. Therefore rename it to match the internal tool that is being used to fetch the files. This is done for most other osbuild modules that target tools. The format v1 loader is adapted to make this change transparent for users of the v1 format, so we are backwards compatible. Change the MPP depsolve preprocessor so that for format v2 based manifest `org.osbuild.curl` source is used. Also rename the corresponding source test. Adapt the format v2 mod test to use the curl source.	2021-02-12 19:27:08 +01:00

38 commits
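
Commit f699720dbd above describes quoting URL paths by replacing a component of a parsed URL through `_replace()`. The following is only a minimal sketch of that idea using the Python standard library, not the code from the commit itself; the helper name `quote_url_path` and the example URLs are hypothetical.

```python
from urllib.parse import quote, urlparse


def quote_url_path(url: str) -> str:
    """Percent-encode only the path component of `url` (hypothetical helper)."""
    parts = urlparse(url)
    # urlparse() returns a namedtuple; _replace() is the documented way
    # to build a copy with one component swapped out.
    return parts._replace(path=quote(parts.path)).geturl()


# A caret in a package version is encoded as %5E; scheme and host are untouched.
print(quote_url_path("https://example.com/pkgs/foo-1.0^20220831git.rpm"))
```

Note that `quote()` also encodes `%`, so in this sketch a path that is already percent-encoded would be encoded a second time.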