Linux Operating System Bootable containers enabled for AI Training
===
In order to run accelerated AI workloads, we've prepared [bootc](https://github.com/containers/bootc) container images for the major AI platforms.
# Makefile targets
| Target | Description |
|-----------------|---------------------------------------------------------------------|
| amd | Create bootable container for AMD platform |
| deepspeed | DeepSpeed container for deep learning optimization |
| disk-amd | Create disk image from bootable container for AMD platform |
| disk-intel | Create disk image from bootable container for Intel platform |
| disk-nvidia | Create disk image from bootable container for NVIDIA platform |
| instruct-amd | Create InstructLab image for bootable container for AMD platform |
| instruct-intel | Create InstructLab image for bootable container for Intel platform |
| instruct-nvidia | Create InstructLab image for bootable container for NVIDIA platform |
| intel | Create bootable container for Intel Habana Labs platform |
| nvidia | Create bootable container for NVIDIA platform |
| vllm | Containerized inference/serving engine for LLMs |
# Makefile variables
| Variable | Description | Default |
|---------------------------|-------------------------------------------------|---------------------------------------------|
| FROM | Overrides the base image for the Containerfiles | `quay.io/centos-bootc/centos-bootc:stream9` |
| REGISTRY | Container Registry for storing container images | `quay.io` |
| REGISTRY_ORG | Container Registry organization | `ai-lab` |
| IMAGE_NAME | Container image name | platform (e.g. `amd`) |
| IMAGE_TAG | Container image tag | `latest` |
| CONTAINER_TOOL | Container tool used for build | `podman` |
| CONTAINER_TOOL_EXTRA_ARGS | Container tool extra arguments | ` ` |
| VENDOR | Container image vendor label | ` ` |
Note: AI content is large. Building it requires more than 200 GB of free disk space.
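The disk space requirement can be checked before starting a build. The following is a minimal sketch, not part of the Makefile: `REQUIRED_GB`, `BUILD_DIR` and both function names are hypothetical, and it assumes the filesystem holding `BUILD_DIR` (the current directory by default) is the one the build writes to. Container storage may live on a different filesystem on your host.

```shell
#!/usr/bin/env bash
# Threshold taken from the note above (treated here as GiB, an approximation).
REQUIRED_GB=200

# free_gb DIR: print the free space on the filesystem holding DIR, in whole GiB.
free_gb() {
    # -P forces single-line POSIX output, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# has_enough_space AVAILABLE REQUIRED: succeed only if AVAILABLE > REQUIRED.
has_enough_space() {
    [ "$1" -gt "$2" ]
}

# BUILD_DIR is a hypothetical variable; it defaults to the current directory.
avail=$(free_gb "${BUILD_DIR:-.}")
if has_enough_space "$avail" "$REQUIRED_GB"; then
    echo "OK: ${avail} GiB free"
else
    echo "WARNING: only ${avail} GiB free, builds need more than ${REQUIRED_GB} GiB" >&2
fi
```

Point `BUILD_DIR` at the directory that backs your container storage if it is not on the same filesystem as the checkout.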
# How to build InstructLab containers
AI training requires the InstructLab container images.
To build them, execute `make instruct-<platform>`. For example:
* `make instruct-amd`
* `make instruct-intel`
* `make instruct-nvidia`
Once these container images are built, build vLLM.
# How to build the vLLM inference engine
* `make vllm`
# How to build the DeepSpeed container
* `make deepspeed`
# How to build bootc container images
To build the images (by default based on CentOS Stream), a simple `make <platform>` should be enough. For example, to build the `nvidia`, `amd` and `intel` bootc containers, respectively:
```
make nvidia
make amd
make intel
```
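The build steps described so far can be chained for a single platform. The sketch below is illustrative only: `build_steps`, `run_build` and `DRY_RUN` are hypothetical names, and the order simply follows the sections of this README (InstructLab, vLLM, DeepSpeed, then the bootc image). Whether the Makefile enforces that order is an assumption. With `DRY_RUN=1`, the default here, the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# build_steps PLATFORM: print the make targets in the order this README introduces them.
build_steps() {
    local platform=$1
    case "$platform" in
        amd|intel|nvidia) ;;
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    printf '%s\n' "instruct-$platform" vllm deepspeed "$platform"
}

# run_build PLATFORM: run each step, or only print it when DRY_RUN=1 (the default).
run_build() {
    local steps target
    steps=$(build_steps "$1") || return 1
    for target in $steps; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "make $target"
        else
            # Stop at the first failing target.
            make "$target" || return 1
        fi
    done
}

run_build nvidia
```

Set `DRY_RUN=0` in the environment to actually invoke `make` for each target.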
## How to build bootc container images based on Red Hat Enterprise Linux
To build the training images based on Red Hat Enterprise Linux bootc images, set the `FROM` variable to the appropriate base container image and run the build on an *entitled Red Hat Enterprise Linux 9.x* host with a valid subscription.
For example:
```
make nvidia FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make amd FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
make intel FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
```
Of course, the other Makefile variables are still available, so the following is a valid build command:
```
make nvidia REGISTRY=myregistry.com REGISTRY_ORG=ai-training IMAGE_NAME=nvidia IMAGE_TAG=v1 FROM=registry.redhat.io/rhel9/rhel-bootc:9.4
```
# How to build disk images
bootc-image-builder produces disk images using a bootable container as input. Disk images can be used to directly provision a host.
The process writes the disk image to `<platform>-bootc/build`.
IMPORTANT: The `osbuild-selinux` package must be installed for bootc-image-builder to work on an SELinux-enabled host.
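The `osbuild-selinux` requirement can be verified up front. This is a rough sketch that assumes an RPM-based host with the standard `getenforce` and `rpm` tools; the function names are hypothetical, and on hosts without those tools the check simply reports that it does not apply.

```shell
#!/usr/bin/env bash
# selinux_enabled: succeed if getenforce exists and reports anything but "Disabled".
selinux_enabled() {
    command -v getenforce >/dev/null 2>&1 || return 1
    [ "$(getenforce 2>/dev/null)" != "Disabled" ]
}

# package_installed NAME: succeed if the RPM database lists NAME.
package_installed() {
    command -v rpm >/dev/null 2>&1 && rpm -q "$1" >/dev/null 2>&1
}

if selinux_enabled && ! package_installed osbuild-selinux; then
    echo "osbuild-selinux is missing; install it before building disk images" >&2
else
    echo "osbuild-selinux check passed or not applicable"
fi
```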
To invoke bootc-image-builder, execute `make disk-<platform>`:
```
make disk-nvidia
```
or
```
make disk-nvidia DISK_TYPE=ami BOOTC_IMAGE=quay.io/ai-lab/nvidia-bootc-custom:latest
```
In addition to the variables common to all targets, a few extra variables can be defined to customize disk image creation:
| Variable | Description | Default |
|-----------------------|-----------------------------------|--------------------------------------------------|
| BOOTC_IMAGE | Image to use as input | `$REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG` |
| DISK_TYPE | Type of image to build | `qcow2` |
| IMAGE_BUILDER_CONFIG | Path to a build-config file | `EMPTY` |
The image builder config file is documented in the [bootc-image-builder README](https://github.com/osbuild/bootc-image-builder?tab=readme-ov-file#-build-config).
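To illustrate how the default `BOOTC_IMAGE` value is assembled, here is a small sketch that mirrors the `$REGISTRY/$REGISTRY_ORG/$IMAGE_NAME:$IMAGE_TAG` pattern in plain shell. The `image_ref` function is hypothetical, and its fallbacks are copied from the variables table in this README; the Makefile's real defaults (the `IMAGE_NAME` default in particular) may differ.

```shell
#!/usr/bin/env bash
# image_ref PLATFORM: print REGISTRY/REGISTRY_ORG/IMAGE_NAME:IMAGE_TAG,
# falling back to the defaults documented in this README.
image_ref() {
    local registry=${REGISTRY:-quay.io}
    local org=${REGISTRY_ORG:-ai-lab}
    local name=${IMAGE_NAME:-$1}   # documented default: the platform name
    local tag=${IMAGE_TAG:-latest}
    echo "$registry/$org/$name:$tag"
}

image_ref nvidia
```

Overriding a variable changes only that component. For example, `IMAGE_TAG=v1 image_ref nvidia` keeps the registry, organization and name and swaps the tag.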
The following disk image types are currently available:
| Disk type | Target environment |
|-----------------------|---------------------------------------------------------------------------------------|
| `ami` | [Amazon Machine Image](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html) |
| `qcow2` **(default)** | [QEMU](https://www.qemu.org/) |
| `vmdk` | [VMDK](https://en.wikipedia.org/wiki/VMDK) usable in vSphere, among others |
| `anaconda-iso` | An unattended Anaconda installer that installs to the first disk found. |
| `raw` | Unformatted [raw disk](https://en.wikipedia.org/wiki/Rawdisk). |
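A wrapper script can reject an unsupported `DISK_TYPE` before a long build starts. The sketch below hard-codes the types from the table above; `is_valid_disk_type` is a hypothetical helper, and the list should be treated as a snapshot because the types bootc-image-builder supports may change. The script only prints the `make` command it would run.

```shell
#!/usr/bin/env bash
# is_valid_disk_type TYPE: succeed if TYPE is one of the disk types listed above.
is_valid_disk_type() {
    case "$1" in
        ami|qcow2|vmdk|anaconda-iso|raw) return 0 ;;
        *) return 1 ;;
    esac
}

# qcow2 is the documented default.
disk_type=${DISK_TYPE:-qcow2}
if is_valid_disk_type "$disk_type"; then
    echo "make disk-nvidia DISK_TYPE=$disk_type"
else
    echo "unsupported DISK_TYPE: $disk_type" >&2
fi
```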
# Images customized for cloud providers
To build images customized for each supported cloud provider, read the [cloud providers section](cloud).
# Troubleshooting
If a build is interrupted, you may want to restart it from scratch. In that case, instruct `podman` to discard the cached layers by passing the `--no-cache` parameter to the build through the `CONTAINER_TOOL_EXTRA_ARGS` variable:
```
make <platform> CONTAINER_TOOL_EXTRA_ARGS="--no-cache"
```
Building accelerated images requires a lot of temporary disk space. To use a specific directory for temporary storage, set the `TMPDIR` environment variable:
```
make <platform> TMPDIR=/path/to/tmp
```