Linux Operating System Bootable containers enabled for AI Training
In order to run accelerated AI workloads, we've prepared bootc container images for the major AI platforms.
Makefile targets
| Target | Description |
|---|---|
| amd | Create bootable container for AMD platform |
| deepspeed | DeepSpeed container for optimization deep learning |
| disk-amd | Create disk image from bootable container for AMD platform |
| disk-intel | Create disk image from bootable container for Intel platform |
| disk-nvidia | Create disk image from bootable container for Nvidia platform |
| instruct-amd | Create instruct lab image for bootable container for AMD platform |
| instruct-intel | Create instruct lab image for bootable container for Intel platform |
| instruct-nvidia | Create instruct lab image for bootable container for Nvidia platform |
| intel | Create bootable container for Intel Habanalabs platform |
| nvidia | Create bootable container for NVidia platform |
| vllm | Containerized inference/serving engine for LLMs |
Makefile variables
| Variable | Description | Default |
|---|---|---|
| FROM | Overrides the base image for the Containerfiles | quay.io/centos-bootc/centos-bootc:stream9 |
| REGISTRY | Container Registry for storing container images | quay.io |
| REGISTRY_ORG | Container Registry organization | ai-lab |
| IMAGE_NAME | Container image name | platform (i.e. amd) |
| IMAGE_TAG | Container image tag | latest |
| CONTAINER_TOOL | Container tool used for build | podman |
| CONTAINER_TOOL_EXTRA_ARGS | Container tool extra arguments | |
| VENDOR | Container image vendor label | |
Note: AI content is huge and requires a lot of disk space >200GB free to build.
How to build InstructLab containers
In order to do AI Training you need to build instructlab container images.
Simply execute make instruct-<platform>. For example:
- make instruct-amd
- make instruct-intel
- make instruct-nvidia
Once you have these container images built it is time to build vllm.
How to build the vllm inference engine
- make vllm
How to build the deepspeed deepspeed container
- make deepspeed
How to build bootc container images
In order to build the images (by default based on CentOS Stream), a simple make <platform> should be enough. For example to build the nvidia, amd and intel bootc containers, respectively:
make nvidia
make amd
make intel
How to build bootc container images based on Red Hat Enterprise Linux
In order to build the training images based on Red Hat Enterprise Linux bootc images, the appropriate base container image must be used in the FROM field and the build process must be run on an entitled Red Hat 9.x Enterprise Linux with a valid subscription.
For example:
make nvidia FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make amd FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make intel FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
Of course, the other Makefile variables are still available, so the following is a valid build command:
make nvidia REGISTRY=myregistry.com REGISTRY_ORG=ai-training IMAGE_NAME=nvidia IMAGE_TAG=v1 FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
How to build disk images
bootc-image-builder produces disk images using a bootable container as input. Disk images can be used to directly provision a host The process will write the disk image in -bootc/build
IMPORTANT: osbuild-selinux package needs to be installed for bootc-image-builder to work in a SELinux enabled host
To invoke bootc-image-builder, execute make disk-
make disk-nvidia
or
make disk-nvidia DISK_TYPE=ami BOOTC_IMAGE=quay.io/ai-lab/nvidia-bootc-custom:latest
In addition to the variables common to all targets, a few extra can be defined to customize disk image creation
| Variable | Description | Default |
|---|---|---|
| BOOTC_IMAGE | Image to use as input | $REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG |
| DISK_TYPE | Type of image to build | qcow2 |
| IMAGE_BUILDER_CONFIG | Path to a build-config file | EMPTY |
Image builder config file is documented in bootc-image-builder README
The following image disk types are currently available:
| Disk type | Target environment |
|---|---|
ami |
Amazon Machine Image |
qcow2 (default) |
QEMU |
vmdk |
VMDK usable in vSphere, among others |
anaconda-iso |
An unattended Anaconda installer that installs to the first disk found. |
raw |
Unformatted raw disk. |
Images customized for cloud providers
For building images customized for each supported cloud provider, please read the cloud providers section
Troubleshooting
Sometimes, interrupting the build process may lead to wanting a complete restart of the process. For those cases, we can instruct podman to start from scratch and discard the cached layers. This is possible by passing the --no-cache parameter to the build process by using the CONTAINER_TOOL_EXTRA_ARGS variable:
make <platform> CONTAINER_TOOL_EXTRA_ARGS="--no-cache"
The building of accelerated images requires a lot of temporary disk space. In case you need to specify a directory for temporary storage, this can be done with the TMPDIR environment variable:
make <platform> TMPDIR=/path/to/tmp