Support the s390x bootloader zipl (z Initial Program Loader). We
supply the parameters for the kernel+initrd as well es the target,
i.e. the boot partition where the bootmap is creating, the device,
here called 'targetbase', to install the bootloader on, including
parameters describing the device (type, blocksize) and also the
offset of the partition containing the target from the start of
device (in sectors).
The kernel and initrd are found via the bootloader entry, ignoring
the rescue kernel.
Since zipl needs the device as well as access to the boot partition
the image is bound to a loopback device. Also keep the filesystem
tree mounted during the execution of the zipl installation.
Include the `bootloader` options in the STAGE_OPTS json schema.
Commit 8fcf7d5c4… introduce the `bootloader` option but the
corresponding schema entry was omitted.
With the introduction of the `bootloader` option, grub2 legacy
installation setting changed. Before, grub2 legacy installation
was dependent on the partition scheme, i.e. only when dos/mbr
layout was used grub2 got installed. After the change the default
is to install it unless `bootloader.type" is explicitly set, even
if the partition layout is GPT. But a legacy grub2 installation
on GPT requires a BIOS boot partition, so the new default is not
right for the case of pure (non-hyrid) UEFI images.
Therefore revert to the old behavior of only defaulting to grub2
legacy if the option is not explicitly set *and* the partition
layout is "dos"/"mbr".
Adapt the f30-qcow2-gpt sample, which is non-uefi grub2 legacy
but with GPT and a bios boot partition, to explicitly request
the grub2 bootloader.
As noted in earlier commits the grub2 boot image needs to be patched
to contain the position of the grub2 core. By default, the location
in the boot image is hard-coded to be the mbr gap (sector 1) but for
GPT partition schemes a separate BIOS boot partition is used that is
located at a "random" location. Refactor the code to generalize the
boot image patching, where the default mbr gap location is just a
special case of the general.
The GRUB2 bootloader in legacy mode, i.e. non-EFI mode, consists of
several stages. The fist one place in the in the Master Boot Record
of the disk will load and execute the next, second stage, consisting
of core modules and the grub kernel. The first bit is also known as
'boot' and the second as 'core'. When the 'MBR' partition layout is
being used, there is a gap between the Master Boot Record (MBR) and
the first partition (for historical and performance reasons). The
core image normally is placed into this gap (call the MBR gap).
When the partition layout is 'gpt' there is no standard gap that can
be used, instead a special partition ("BIOS boot" [1]) needs to be
created that can store the grub2 core image. Additionally, the 'boot'
image need to modified to point the sector of that partition. The
core image itself also needs to be modified with the information of
the location its own second sector. The location of the pointers
were taken from the grub2 source ([2] at commit [3]). For the 'boot'
image it is 'GRUB_BOOT_MACHINE_KERNEL_SECTOR' (0x5c) from 'pc/boot.h'
and for the core image "0x200 - GRUB_BOOT_MACHINE_LIST_SIZE (12)" to
be found in 'pc/diskboot.S'.
[1] https://en.wikipedia.org/wiki/BIOS_boot_partition
[2] https://github.com/rhboot/grub2
[3] 2a2e10c1b39672de3d5da037a50d5c371f49b40d
Extract the small piece of code that writes the grub2's boot image,
i.e. the first stage of the bootloader that will in turn jump to
the second stage. Currently the position of the core is hard-coded
to be the MBR gap, i.e. the gap between the MBR and the start of
the first partition. This is not a necessity, e.g. when using a
dedicated BIOS boot partition on GPT partition layouts. This re-
factoring should make it easier to add code dealing with such
situations.
Introduce support for ppc64le (Open Firmware). The main difference
to x86 legacy, i.e. non-efi, is that no stage 1 is required because
the core image is stored on a special 'PReP' partition, which must
be marked as bootable. The firmware then looks for that partition
and directly loads the core from there and executes it.
Introduce a `platform` parameter for the grub installer code which
controls various platform depended aspects, including a) the path
for the modules, b) what modules are compiled into the core, c) if
the boot image is written to the MBR and 4) where to write the core
image, i.e. mbr-gap or PReP partition.
Extract the function that writes the grub2 core to the image file.
The only supported location currently is the MBR gap, which is the
gap between the Master Boot Record and the first partition, which
for historical and performance reasons was aligned to a certain
sector (used to be 64 but now is even larger with 2048). In the
future other locations for the grub2 core will be supported such
as the PReP partition (ppc64le) or bios-boot (GPT hybrid booting).
Make the bootloader selection explicit by introducing a new option
called `bootloader`, which is an object, containing the `type` and
options belonging to the bootloader. For now only boot-loader that
is supported is "grub2".
Instead of hard-coding "msdos1", determine this partition id
dynamically based on the partition table type and the index
of the partition that contains /boot/grub2, which normally is
either a separate boot partition or the root partition. In
order to be able to do so, set the index of each Partition
when the partition information is read back via `sfdisk`.
NB: partition indexes start at 1 for grub2.
The filesystem module that grub2 needs to have in the core image
is the filesystem containing the grub modules, specifically the
"normal.mod", as well as the grub configuration. In the standard
case, which is also what osbuild uses, this is /boot/grub2; thus
we actually do want the filesystem containing that directory and
its type not the root filesystem.
Explain the concept and reason behind the grub2 core as well as the
details behind the selection of the core modules that get included.
Also elaborate a bit on the MBR gap. For more details about this see
https://en.wikipedia.org/wiki/GNU_GRUB#Version_2_(GRUB_2)
NB: This commit also changes the order of the grub modules, which in
turn changes the layout of the core.img and thus the hash value used
in the test; adapt those value to reflect the changed core.img.
The GPT (GUID Partition Table) standard for partition layout supports
giving partition a name in the Partition object as well as in the
option for the qemu stage when specifying the partition layout.
Introduce a method on the PartitionTable that returns the partition
containing the root filesystem. NB: this does not have to be the
first partition (which could be the EFI partition, or something
else), so we have to iterate through the partitions until we find
it.
Instead of having dictionaries representing the partition table,
partitions and filesystems together with some functions operating
on them, have proper python objects with methods. In the future
these objects could be extract and properly tested as well.
Add mkfs_vfat and hook it up into the generic mkfs_for_type()
dispatcher function. Install grub2 to the MBR only if the partition
table is of type "MBR".
Introduce two new assembler options `pttype` and `partitions` to
allow fine grained control over how the partition table is created.
The first one controls the partition type, either `mbr` (default,
when the key is missing) or `gpt`; if specified the `partitions`
key must contain a list of objects describing the individual
partitions (`start`, `size`, `type`) together with a `filesystem`
object describing the filesystem (`type`, `uuid`, `mountpoint`) to
be created on that partition.
In the case the `pttype` option is missing, the legacy mode is used
where `root_fs_uuid` and `root_fs_type` need to be specified.
Use the newly available partition information in the install_grub2
method: detect which module to use for the root filesystem and
assert the second stage fits between the MBR and the first partition.
Introduce a generic mkfs_for_type() function that will dispatch
to the correct mkfs function depending on the type. Additionally
refactor the partition creation and mounting code to handle more
than one partition.
Part of the refactoring to support uefi/gpt: the method that creates
the partition table now returns an array of dictionaries corresponding
to the individual partitions that have been created together with the
information for the filesystem that this partition should end up with.
Prepare the stage for uefi/gpt support by extracting the code that
installs GRUB and creates the partitions into its own functions.
Should not have any effect on the actual data written to the image.
Commit 283281f broke compression by appending the argument last to the
tar command line. It needs to appear before the file.
Fix that and add a test.
[teg: add minor fix]
This introduces the `root_fs_type` option on the org.osbuild.rawfs
assembler. It only accepts "ext4" and "xfs" values right now and
defaults to "ext4" to preserve backwards compatibility.
This introduces the `root_fs_type` option on the org.osbuild.qemu
assembler. It only accepts "ext4" and "xfs" values right now and
defaults to "ext4" to preserve backwards compatibility.
This commit adds semi-structured documentation to all osbuild stages and
assemblers. The variables added work like this:
* STAGE_DESC: Short description of the stage.
* STAGE_INFO: Longer documentation of the stage, including expected
behavior, required binaries, etc.
* STAGE_OPTS: A JSON Schema describing the stage's expected/allowed
options. (see https://json-schema.org/ for details)
It also has a little unittest to check stageinfo - specifically:
1. All (executable) stages in stages/* and assemblers/ must define strings named
STAGE_DESC, STAGE_INFO, and STAGE_OPTS
2. The contents of STAGE_OPTS must be valid JSON (if you put '{' '}'
around it)
3. STAGE_OPTS, if non-empty, should have a "properties" object
4. if STAGE_OPTS lists "required" properties, those need to be present
in the "properties" object.
The test is *not* included in .travis.yml because I'm not sure we want
to fail the build for this, but it's still helpful as a lint-style
check.
Rather than relying on the offset parameter, simply run mkfs on the
loopback device which is anyway being set up. This also allows us
not to specify the size explicitly.
Before this patch mkfs would complain (uneccesarily) about the
backing file containing a partition table. This is a false positive
as the partition table is in the region of the file before the
passed offset.
Signed-off-by: Tom Gundersen <teg@jklm.no>
We know the root partition we want, as we are setting it up. There
is no need to search for it by filesystem UUID. This simplifies the
setup and means the level 1.5 bootloader is always the same, and
not dependent on an embedded UUID.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Similar to the existing test, but uses qemu-nbd to mount the generated
image.
Using unittest.TestCase.subTest() for now, which means that the tests
aren't very independent. I think this is fine in this case, because
we're testing images independently from each other, reusing the base
tree in the store.
Closing the socket is the responsibility of whoever opened it.
Fix this in the only user (qemu assembler) by using socket() in a `with`
block, which closes the socket on exit.
Background:
grub2 works in three stages:
- The first stage is found in the first 440 bytes of the master
boot record, and its only purpose is to load and execute the
second stage. This stage is static, and just copied from the rpm
without modification.
- The second stage is found in the gap between the MBR and the
first partition, and may be up to 31kB in size. This stage is
specific to the host and must contain the instructions for
finding the right file system and subdirectory for the grub2
config and modules on the host, as well as the modules needed
to do this.
- The third stage is found in the `normal` module, which loads
grub2.conf, which in turn may load more modules and perform
arbitrary instructions.
Problem:
grub2-install is responsible for installing all these stages on the
target image. This goes against our design, as modifications outside
the filesystem should happen in the assembler, but modifications to
the filesystem should happen in a stage. In particular, we don't
want the contents of the image to differ in any way from the output
tree that is stored in our content store (the output of our last
stage). This causes a practical problem at the moment, as our
selinux stage is ran before the assembler, and as such the grub
modules do not get selinux labels applied.
It turns out that we could split grub2-install in two as we want,
by passing `--no-bootsector` to it to install only the modules,
and copy/genereta the two first stages as files under /boot and
then run `grub2-bios-setup` to write the stages from /boot into
the image where they belong.
Regrettably, this does not work as both `grub2-install` and
`grub2-bios-setup` introspect the system and block devices they
are being run on to generate the right configuration. This is not
what we want, as we would like to specifcy the config explicitly
and run them independently of the target image. The specific bug
we get in both cases is that the canonical path containing our
object store cannot be found.
Before osbuild this was not a problem, as other installers would
instal and assemble everything directly in the target image as a
loopback device. Something we explicitly do not want to do.
Solution:
This patch essentially reimplements grub2-install, or rather the
parts of it that we need. One change in behavior from the upstream
tool is that we no longer write the level one and level two boot
loaders to /boot before moving them into place, but just write them
directly where they belong (so they do not end up on the
filesystem).
The parts that copy files into /boot are now in the grub2 installer
and the parts that write the level one/two bootloaders are in the
qemu assembler.
This achieves a few principles I think we should always adher to:
- never run tools from the target image (no chroot)
- don't read/copy files from the target image that was written
by other stages. We already try to avoid sharing state, and
by treating the image as write-only, we avoid accidentally
sharing state through the target tree.
Based-on-suggestions-from: Javier Martinez Canillas <javierm@redhat.com>
With-god-like-debugging-and-fixes-by: Lars Karlitski <lubreni@redhat.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Otherwise, sfdik would pick one at random. We want our images to be
reproducible to the extent possible, so we must move all randomness
out of the assemblers when we can.
Signed-off-by: Tom Gundersen <teg@jklm.no>
Opt in to supporting the most common ones, if we want to support more
we can add support as the need arises.
Signed-off-by: Tom Gundersen <teg@jklm.no>
In tests we often use tar assembler as final stage. This means we
compress the image tree and decompress it right away. For this purposes
it is nice to have option to not have any compression. Actually,
this could very drastically improve CI running time.
A better option would be not to use tar at all and instead let osbuild
just dump the resulting tree. However, we felt this behaviour needs
more discussion and we need a fix asap.
Directory /tmp is hosted on tmpfs. Therefore the image size could be
limited by memory size. By moving the image to /var/tmp we assure that
the file is hosted on disk allowing us to build bigger images or build
images on memory-constrained machines (e.g. CI).
Don't try to guess how much room the filesystem will take up. In
practice, most people will want to specify a size anyway, depending on
their use case.
As is typical for osbuild, there are no convenience features for the
pipeline (it's not meant to be written manually). `size` must be given
in bytes and it must be a multiple of 512.
Assemblers are always run in their own, clean environment and can be
sure that there's only one instance of themselves running. Remove the
extra layer of temporary directory and use static names.
We used to let mkfs.ext4 initialize the filesystem for us, but it
turns out that the metadata attributes of the root directory were
not being initialized from the source tree. In particular, this
meant that the SELinu labels were left as unconfined_t, rather
than root_t, which would not allow us to boot in enforcing mode.
An alternative approach might be to fixup the root inode manually,
while still doing the rest using mkfs.ext4, but let's leave that
for the future if it turns out to be worth it.
Signed-off-by: Tom Gundersen <teg@jklm.no>