The existing jsoncomm is a work of beautiy. For very big arguments
however the used `SOCK_SEQPACKET` hits the limitations of the
kernel network buffer size (see also [0]). This lead to various
workarounds in #824,#1331,#1836 where parts of the request are
encoded as part of the json method call and parts are done via
a side-channel via fd-passing.
This commit changes the code so that the fd channel is automatically
and transparently created and the workarounds are removed. A test
is added that ensures that very big messages can be passed.
[0] https://github.com/osbuild/osbuild/pull/1833
We recently hit the issue that `osbuild` crashed with:
```
Unable to decode response body "Traceback (most recent call last):
File \"/usr/bin/osbuild\", line 33, in <module>
sys.exit(load_entry_point('osbuild==124', 'console_scripts', 'osbuild')())
File \"/usr/lib/python3.9/site-packages/osbuild/main_cli.py\", line 181, in osbuild_cli
r = manifest.build(
File \"/usr/lib/python3.9/site-packages/osbuild/pipeline.py\", line 477, in build
res = pl.run(store, monitor, libdir, debug_break, stage_timeout)
File \"/usr/lib/python3.9/site-packages/osbuild/pipeline.py\", line 376, in run
results = self.build_stages(store,
File \"/usr/lib/python3.9/site-packages/osbuild/pipeline.py\", line 348, in build_stages
r = stage.run(tree,
File \"/usr/lib/python3.9/site-packages/osbuild/pipeline.py\", line 213, in run
data = ipmgr.map(ip, store)
File \"/usr/lib/python3.9/site-packages/osbuild/inputs.py\", line 94, in map
reply, _ = client.call_with_fds(\"map\", {}, fds)
File \"/usr/lib/python3.9/site-packages/osbuild/host.py\", line 373, in call_with_fds
kind, data = self.protocol.decode_message(ret)
File \"/usr/lib/python3.9/site-packages/osbuild/host.py\", line 83, in decode_message
raise ProtocolError(\"message empty\")
osbuild.host.ProtocolError: message empty
cannot run osbuild: exit status 1" into osbuild result: invalid character 'T' looking for beginning of value
...
input/packages (org.osbuild.files): Traceback (most recent call last):
input/packages (org.osbuild.files): File "/usr/lib/osbuild/inputs/org.osbuild.files", line 226, in <module>
input/packages (org.osbuild.files): main()
input/packages (org.osbuild.files): File "/usr/lib/osbuild/inputs/org.osbuild.files", line 222, in main
input/packages (org.osbuild.files): service.main()
input/packages (org.osbuild.files): File "/usr/lib/python3.11/site-packages/osbuild/host.py", line 250, in main
input/packages (org.osbuild.files): self.serve()
input/packages (org.osbuild.files): File "/usr/lib/python3.11/site-packages/osbuild/host.py", line 284, in serve
input/packages (org.osbuild.files): self.sock.send(reply, fds=reply_fds)
input/packages (org.osbuild.files): File "/usr/lib/python3.11/site-packages/osbuild/util/jsoncomm.py", line 407, in send
input/packages (org.osbuild.files): n = self._socket.sendmsg([serialized], cmsg, 0)
input/packages (org.osbuild.files): ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
input/packages (org.osbuild.files): OSError: [Errno 90] Message too long
```
The underlying issue is that the reply of the `map()` call is too
big for the buffer that `jsoncomm` uses. This problem existed before
for the args of map and was fixed by introducing a temporary file
in https://github.com/osbuild/osbuild/pull/1331 (and similarly
before in https://github.com/osbuild/osbuild/pull/824).
This commit writes the return values also into a file. This should
fix the crash above and make the function more symetrical as well.
Alternative/complementary version of
https://github.com/osbuild/osbuild/pull/1833
Closes: HMS-4537
When `jsoncomm` fails because the message is too big it currently
does not indicate just how big the message was. This commit adds
this information so that it's easier for us to determine what to
do about it.
We could also include a pointer to `/proc/sys/net/core/wmem_defaults`
but it seems we want to not require fiddling with that so let's
not do it for now.
See also https://github.com/osbuild/osbuild/pull/1838
Often, a message is being sent and followed by a call to `recv`
to wait for a reply. Create a simple helper `send_and_recv` that
does both in one method.
Add a simple check for that helper to the tests.
Add a new constructor method that allows creating a `Socket` from
an existing file-descriptor of a socket. This might be need when
the socket was passed to a child process.
Add a simple test for the new constructor method.
Add a new constructor method, `Socket.new_pair`, to create a pair
of connected sockets (via `socketpair`) and wrap both sides via
`jsoncomm.Socket`.
Add a simple test to check it.
Explicit re-raise the BufferError exception in recv from the orignal
JSONDecodeError, so the latter gets recorded as the underlying cause.
Uncovered by pylint 2.6.0: W0707: "Not using raise from makes the
traceback inaccurate, because the message implies there is a bug in
the exception-handling code itself, which is a separate situation
than wrapping an exception."
Now that jsoncomm.Socket is using a connection-oriented socket,
the destination in `socket.sendmsg` is ignored and thus can and
should be dropped from the `jsoncomm.Socket.send` method.
Adjust the tests accordingly.
Switch to use a connection oriented datagram based protocol, i.e.
`SOCK_SEQPACKET`, instead of `SOCK_DGRAM`. It sill preserves
message boundaries, but since it is connection oriented the client
nor the server do not need to specify the destination addresses
of the peer in sendmsg/recvmesg. Moreover, the host will be able
to send messages to the client, even if the latter is sandboxed
with a separate network namespace. In the `SOCK_DRAM` case the
auto-bound address of the client would not be visible to the host
and thus sending messages would to it would fail.
Adapt the jsoncomm tests as well as `BaseAPI`.
Implement `accept` and `listen`, that call the equivalent methods
on the underlying socket; this prepares the move to a connection
oriented socket, i.e. `SOCK_SEQPACKET`.
Add a new `blocking` property to get and set the blocking state
of the underlying socket. In Python this is tied to the timeout
setting of the `socket.Socket`, i.e. non-blocking means having
any timeout specified, including "0" for not waiting at all.
Blocking means having a timeout value of `None`.
The getter is emulating the logic of `Socket.getblocking`, which
was added in 3.7, and we need to stay compatible with 3.6.
The logic is implemented in `Modules/socketmodule.c` in Python.
FdSet does derive directly and only from `object`. Not specifying
any base classes is the same as specifying an empty list of base
classes; therefore get rid of the empty list.
Add support for `util.types.PathLike` paths for socket addresses,
instead of just plain strings. Test it by using pathlib.Path to
create the address in the corresponding test.
Using `[]` as default value for arguments makes `pylint` complain. The
reason is that it creates an array statically at the time the function
is parsed, rather than dynamically on invocation of the function. This
means, when you append to this array, you change the global instance and
every further invocation of that function works on this modified array.
While our use-cases are safe, this is indeed a common pitfall. Lets
avoid using this and resort to `None` instead.
This silences a lot of warnings from pylint about "dangerous use of []".
Add a new module that implements a simple JSON communication channel.
This is meant to replace all our hard-coded SOCK_DGRAM code that is
copied all over the place.
This is intentionally left simple. It only supports synchronous
operations, trivial JSON encoding and decoding, and uses a message-based
transport mode.