Commit graph

36 commits

Author SHA1 Message Date
Michael Vogt
0abdfb9041 jsoncomm: transparently handle huge messages via fds
The existing jsoncomm is a work of beautiy. For very big arguments
however the used `SOCK_SEQPACKET` hits the limitations of the
kernel network buffer size (see also [0]). This lead to various
workarounds in #824,#1331,#1836 where parts of the request are
encoded as part of the json method call and parts are done via
a side-channel via fd-passing.

This commit changes the code so that the fd channel is automatically
and transparently created and the workarounds are removed. A test
is added that ensures that very big messages can be passed.

[0] https://github.com/osbuild/osbuild/pull/1833
2024-09-17 19:27:03 +02:00
Michael Vogt
77a61da760 osbuild: drop libdir from download() methods
The libdir is passed down for sources but it is never used in
any of our sources. As this is confusing and we want to eventually
support multiple libdirs remove this code.

It looks like the libdir for soruces was added a long time ago in 8423da3
but there is no indication if/how it is/was supposed to get used and
AFACT from going over the git history it was very used.

SourceService:dispatch() never sends "libdir" to the actual sources,
so it is not an even technically an API break.
2024-08-26 19:58:55 +02:00
Michael Vogt
1fc7ead2f4 sources: transform() is only used in the curl sources, remove from ABC 2024-03-19 14:21:57 +01:00
Michael Vogt
f214c69a98 osbuild: add workaround to integrate sources into progress reporting
This commit is somewhat poor, sorry for that. It mostly adds
workaround so that the osbuild sources can emit some progress
reporting as well. Without that the user experience is rather poor
and there is a long delay before any sort of progress can be
reported (even before the normal stages run).

With it the user experience is still not good but slightly better,
i.e. the progress monitor will report that the sources have
started downloading and curl will generated some log output. No
real progress unfortunately (sources subprogress will jump from
zero to 100%).
2024-03-12 16:44:12 +01:00
Simon de Vlieger
f9b55ff6a0 sources: rename download -> fetch_all
Not all sources download things and `fetch_all` is consistent with
`fetch_one`.
2024-01-26 09:58:48 +01:00
Simon de Vlieger
2c42c46c48 sources: move parallelisation into source
This moves the parallelisation decisions into the sources themselves,
making the `download` method abstract inside `osbuild` itself.
2024-01-26 09:58:48 +01:00
Tomáš Hozza
bf3e096735 Fix errors reported by new version of mypy
Fix the following errors:

```
osbuild/util/lvm2.py:117: error: Only instance methods can be decorated with @property
osbuild/api.py:50: error: Only instance methods can be decorated with @property
osbuild/sources.py:85: error: Only instance methods can be decorated with @property
```

Chaining of `@classmethod` and `@property` has been deprecated since
Python 3.11 with a note that chaining didn't work correctly in some
cases.

Relevant links:
https://github.com/python/mypy/issues/13746
https://docs.python.org/3.11/whatsnew/3.11.html#language-builtins

Signed-off-by: Tomáš Hozza <thozza@redhat.com>
2023-04-12 11:57:18 +02:00
Simon de Vlieger
ea6085fae6 osbuild: run isort on all files 2022-09-12 13:32:51 +02:00
Simon de Vlieger
3fd864e5a9 osbuild: fix optional-types
Optional types were provided in places but were not always correct. Add
mypy checking and fix those that fail(ed).
2022-07-13 17:31:37 +02:00
Thomas Lavocat
1de74ce2c9 sources: generalizing download method
Before, the download method was defined in the inherited class of each
program. With the same kind of workflow redefined every time. This
contribution aims at making the workflow more clear and to generalize
what can be in the SourceService class.

The download worklow is as follow:
Setup -> Filter -> Prepare -> Download

The setup mainly step sets up caches. Where the download data will be
stored in the end.

The filter step is used to discard some of the items to download based
on some criterion. By default, it is used to verify if an item is
already in the cache using the item's checksum.

The Prepare step goes from each element and let the overloading step the
ability to alter each item before downloading it. This is used mainly
for the curl command which for rhel must generate the subscriptions.

Then the download step will call fetch_one for each item. Here the
download can be performed sequentially or in parallel depending on the
number of workers selected.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
0953cf64e0 sources: provide an unverified tmpdir
Some downloading program need a global unverified tmpdir to work within
before storing the definitive data. Provide this in the workflow
directly.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
128845da3c sources: tidy the download method
Only the "items to download" need to be passed as parameters. The rest
is unpacked as attributes during the Setup step of the workflow.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
92fe237f24 sources: introduce per-source content_type
Introduce a new class member `content_type` that specifies what type of
items the source will store in the cache. Use that to generalize the
setup step, which is shared across all sources.
2022-05-11 04:32:42 -05:00
Thomas Lavocat
34cd9ef9f0 sources: generalize cache generation
Introduce a `setup` step in the workflow that is responsible of
generating the cache folder. This is then used in each download method.
2022-05-11 04:32:42 -05:00
Christian Kellner
879c56a3b5 sources: pass items via temporary file
Since source were converted to host services it now uses a unix
socket instead of stdin to pass the arguments, which includes
the list of items to download. The latter can become quite big,
in fact too big to fit into a single package (NB: SOCK_SEQPACKET
is used for the underlying transport).
Therefore write the actual items to a temporary file and pass
the fd of it along the message.
2021-09-24 08:27:19 +01:00
Christian Kellner
c902a7a754 sources: port to host services
Port sources to also use the host services infrastructure that is
used by inputs, devices and mounts. Sources are a bit different
from the other services that they don't run for the duration of
the stage but are run before anything is built. By using the same
infrastructure we re-use the process management and inter process
communcation. Additionally, this will forward all messages from
sources to the existing monitoring framework.
Adapt all existing sources and tests.
2021-09-22 00:00:20 +02:00
Christian Kellner
aa19a1c4c0 sources: remove server and get method
The usage of the `sources.SourcesServer` and `sources.get` have
been removed from `Stage.run`, which was the only usage throughout
osbuild and thus it is not needed anymore and can be removed.
2021-04-29 12:58:01 +02:00
Christian Kellner
fa9c288988 sources: source itself controls cache sub-dir
Instead of supplying the full cache dir, i.e. the directory in
the store where the source will place the fetched resources, to
the source, only supply the root folder of the cache and let
the source itself create the desired sub-directory. This allows
the source to determine what type of resource it provides. This
makes the final directory independent of the name of the source:
a `org.osbuild.curl` source can place file-like resource in the
`org.osbuild.files` sub-directory. Then the `org.osbuild.files`
input can be used to get those from the cache directory.
2021-02-12 19:27:08 +01:00
Christian Kellner
931eac23c3 sources: introduce source items
All sources fetch various types of `items`, the specific nature
of which is dependent on the source type, but they are all
identifyable by a opaque identifier. In order for osbuild to
check that all the inputs that a stage needs are are indeed
contained in the manifest description, osbuild must learn what
ids are fetched by what source. This is done by standarzing
the common "items" part, i.e. the "id" -> "options for that id"
mapping that is common to all sources.
For the version 1 of the format, extract the files and ostree
the item information from the respective options.
Adapt the sources (files, ostree) so that they use the new items
information, but also fall back to the old style; the latter is
needed since the sources tests still uses the SourceServer.
2021-02-10 15:44:24 +01:00
Christian Kellner
8423da368a sources: introduce the Source class
Very much like, stages and inputs, a new `Source` calls represents
a source type with its options. A `download` method on the `Source`
class can be used to only donwload content to the cache.
2021-02-06 12:04:30 +01:00
Christian Kellner
8388a64ffa api: remove 'addr' param from message dispatcher
Now that jsoncomm is using a connection oriented protocol, the
`addr` parameter is not needed[*] and can thus be removed from
the `BaseAPI._message` message dispatcher. Adapt all usages
of it, including the tests.

[*] sendmsg ignores the destination parameter for connection
oriented sockets.
2020-07-29 02:16:20 +01:00
Christian Kellner
f8514e782c sources: use high level message dispatcher
Use the new `BaseAPI._message` high level message dispatcher that
is more convenient to use.
2020-07-27 12:50:38 +01:00
Christian Kellner
0c7284572e osbuild: auto-generate socket addresses for APIs
Rely on the ability of `BaseAPI` to auto-generate socket addresses
when no one was provided. The `BuildRoot` does not rely on the
sockets being created in the `BuildRoot.api` directory anymore and
will instead bind-mount each individual socket address to the well
known location via the `BaseAPI.endpoint` identifier.
Convert all API providers to take the `socket_address` as an
optional keyword argument.
2020-07-27 12:50:38 +01:00
Christian Kellner
bc81e68727 api: each API defines its 'endpoint' name
Add a new abstract class property to `BaseAPI` called `endpoint`,
meant to be implemented by deriving classes in order to identify
the end point name for the API provider.
Implement the new property in all existing API providers.
2020-07-27 12:50:38 +01:00
Christian Kellner
3ae056bf0f sources: port to use api.BaseAPI
Now that `api.BaseAPI` provides the basic scaffolding for API
servers, use that base class and remove the code duplication.
2020-07-27 12:50:38 +01:00
Christian Kellner
291fadd0b2 pylint: increase max attributes to 10
In three places we have more than 7 instances attributes, but less
then 10; instead of disabling the warning for all these cases,
increase the limit to a reasonable size of 10 and re-enable the
warnings in all the places.
2020-07-21 13:25:04 +02:00
Christian Kellner
e3eccbe491 osbuild: remove ability to pass in secrets
The way secrets work has been changed via commit 372b117: instead
of passing them in via the command line, the information how to
obtain secrets are encoded along the sources themselves.
The only stage that still has support for the old style way is the
deprecated org.osbuild.dnf stage, which might be removed in the
near future.
2020-07-10 11:44:15 +02:00
Christian Kellner
1896047bae sources: pass the library dir to the sources
The idea is that source can themselves spawn other modules, esp.
new secrets modules. For this they need to know the library dir,
aka 'libdir' throughout the osbuild source. Therefore change the
SourceServer to directly get the library directory instead of
just the sub-directory to the sources. Then pass the library
directory to via the JSON API to the source.
Adjust all usage of the SourceServer, including the tests.
2020-05-20 14:43:33 +02:00
David Rheinsberg
2bdb287731 sources: make cleanup explicit
This changes the sources module to explicitly cleanup event-loops.
Additionally, the implementation is protected against re-entrency which
we do not support (and do not need).

We did occasionally get the following exception when running
source-servers:

        /usr/lib/python3.8/asyncio/base_events.py:654: ResourceWarning: unclosed event loop <_UnixSelectorEventLoop running=False closed=False debug=False>
          _warn(f"unclosed event loop {self!r}", ResourceWarning, source=self)
        ResourceWarning: Enable tracemalloc to get the object allocation traceback

        Exception ignored in: <function BaseEventLoop.__del__ at 0x7f92589d14c0>
        Traceback (most recent call last):
          File "/usr/lib/python3.8/asyncio/base_events.py", line 656, in __del__
            self.close()
          File "/usr/lib/python3.8/asyncio/unix_events.py", line 58, in close
            super().close()
          File "/usr/lib/python3.8/asyncio/selector_events.py", line 92, in close
            self._close_self_pipe()
          File "/usr/lib/python3.8/asyncio/selector_events.py", line 99, in _close_self_pipe
            self._remove_reader(self._ssock.fileno())
          File "/usr/lib/python3.8/asyncio/selector_events.py", line 274, in _remove_reader
            key = self._selector.get_key(fd)
          File "/usr/lib/python3.8/selectors.py", line 190, in get_key
            return mapping[fileobj]
          File "/usr/lib/python3.8/selectors.py", line 71, in __getitem__
            fd = self._selector._fileobj_lookup(fileobj)
          File "/usr/lib/python3.8/selectors.py", line 225, in _fileobj_lookup
            return _fileobj_to_fd(fileobj)
          File "/usr/lib/python3.8/selectors.py", line 42, in _fileobj_to_fd
            raise ValueError("Invalid file descriptor: {}".format(fd))
        ValueError: Invalid file descriptor: -1

This is triggered when an event-loop is not closed explicitly via
`event_loop.close()`. It then tries to cleanup explicitly. The problem
here is that python has no knowledge of in which order it should
collect GC'ed objects. This might end up more or less random. Therefore,
file-descriptors might be closed in arbitrary order, leading to the
event-loop being unable to unregister its internal objects.

I am not entirely sure whether this is the case here. However, the error
definitely triggers on the internal event-loop socketpair, which there
is no other external access to. Furthermore, this socketpair is only set
to -1 in its own __del__ function. So unless we have a memory
corruption, I see nothing else that could trigger this.

With this fix in place, I can run `test_sources.py` in a loop without
triggering the bug.

It is quite likely that our other `*Server` classes need the same fix. I
did not verify, yet.
2020-04-24 11:48:16 +02:00
David Rheinsberg
4ad4da4658 osbuild: convert to jsoncomm
Convert the hard-coded DGRAM communication to util.jsoncomm. This
avoids hard-coding any IPC-details and simplifies the callers quite a
bit.
2020-04-21 13:47:38 +02:00
Tom Gundersen
7817ae5e8b sources: add org.osbuild.files source
This source adds support for downloaded files. The files are
indexed by their content hash, and the only option is their URL.

The main usecase for this will be downloading rpms. Allowing depsolving
to be done outside of osbuild, network access to be restricted and
downloaded rpms to be reused between runs.

Each source is now passed two additional arguments, a cache directory
and an output directory. Both are in the source's namespace, and
the source is responsible for managing them. Each directory may
contain contents from previous runs, but neither is ever guaranteed
to do so.

Downloaded contents may be saved to the cache and resued between
runs, and the requested content should be written to the output dir.
If secrets are used, the source must only ever write contents to
the output that corresponds to the available secrets (rather than
contents from the cache from previous runs).

Each stage is passed an additional argument, a sources directory.
The directory is read-only, and contains a subdirectory named after
each used source, which will contain the requseted contents when
the `Get()` call returns (if the source uses this functionality).

Based on a patch by Lars Karlitski.

Signed-off-by: Tom Gundersen <teg@jklm.no>
2020-02-06 19:01:12 +01:00
Lars Karlitski
12b5c6aaa4 sources: bump maximum message size to 64k
These messages contain certificate data, which is quite large.

We should probably use streaming sockets in the future.
2020-01-09 23:55:43 +01:00
Lars Karlitski
e123715bc6 osbuild: introduce secrets
Add a new command line option `--secrets`, which accepts a JSON file
that is structured similarly to a source file. It is should contain data
that is necessary to fetch content, but shouldn't appear in any logs.
2020-01-09 23:55:43 +01:00
Lars Karlitski
02ad4e3810 sources: fail gracefully when a source returns invalid JSON
Include the actual output of the source to help debugging.
2020-01-09 23:55:43 +01:00
Lars Karlitski
b9b2f99123 osbuild: create API sockets in the thread they're used in
This might (hopefully) fix a race in destructing the asyncio.EventLoop
that's used in all API classes, which leads to warnings about unhandled
exceptions on CI.

This also puts their creation closer to where the client-side sockets
are created.
2019-12-25 17:48:26 +01:00
Lars Karlitski
510e2b1e94 osbuild: introduce sources
Pipelines encode which source content they need in the form of
repository metadata checksums (or rpm checksums). In addition, they
encode where they fetch that source content from in the form of URLs.
This is overly specific and doesn't have to be in the pipeline's hash:
the checksum is enough to specify an image.

In practice, this precluded using alternative ways of getting at source
packages, such as local mirrors, which could speed up development.

Introduce a new osbuild API: sources. With it, a stage can query for a
way to fetch source content based on checksums.

The first such source is `org.osbuild.dnf`, which returns repository
configuration for a metadata checksum. Note that the dnf stage continues
to verify that the content it received matches the checksum it expects.

Sources are implemented as programs, living in a `sources` directory.
They are run on the host (i.e., uncontained) right now. Each source gets
passed options, which are taken from a new command line argument to
osbuild, and an array of checksums for which to return content.

This API is only available to stages right now.
2019-12-23 01:12:38 +01:00