# Building from Source (Advanced)

🛑 **Note for App Developers:** You do **not** need to build this project from
source to use it in your apps. If you are using Kotlin, Swift, or Python,
please use our pre-built SDKs. More details in the
[technical overview](https://ai.google.dev/edge/litert-lm/overview).

This section provides instructions for compiling the core LiteRT-LM C++
framework from scratch. You should only follow these steps if you are:

*   **A core contributor** fixing bugs or adding features to the LiteRT-LM
    engine.
*   **A native C++ developer** who requires custom compilation flags for an
    embedded system.

- [Deploy to Windows](#deploy_to_windows)
- [Deploy to Linux](#deploy_to_linux)
- [Deploy to MacOS](#deploy_to_macos)
- [Deploy to Android](#deploy_to_android)

## Build and Run

This guide provides the necessary steps to build and execute a Large Language
Model (LLM) on your device. Follow the instructions below to build and run the
sample code.

### Prerequisites

-   **Git**: To clone the repository and manage versions.
-   **Bazel (version 7.6.1)**: This project uses `bazel` as its build system.

#### Get the Source Code

Current stable branch tag: see the
[latest release](https://github.com/google-ai-edge/LiteRT-LM/releases/latest).

First, clone the repository to your local machine. We strongly recommend
checking out the latest stable release tag to ensure you are working with a
stable version of the code.

**Clone the repository:**

```
git clone https://github.com/google-ai-edge/LiteRT-LM.git
cd LiteRT-LM
```

**Fetch the latest tags from the remote repository:**

```
git fetch --tags
```

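Optionally, you can list the available tags newest-first to find the latest
release:

```
git tag --sort=-creatordate | head
```
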
**Check out the
[latest stable release](https://github.com/google-ai-edge/LiteRT-LM/releases/latest):**

To start working, create a new branch from the stable tag. This is the
recommended approach for development.

```
git checkout -b <my-feature-branch> <release-tag, e.g. "v0.8.0">
```

You are now on a local branch created from the tag and ready to work.

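For example, if the latest tag were `v0.8.0` (the branch name below is just an
illustration):

```
git checkout -b my-feature-branch v0.8.0
```
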
#### Install Bazel

This project requires Bazel version **7.6.1**. You can skip this if you already
have it set up.

The easiest way to manage Bazel versions is to install it via
[Bazelisk](https://github.com/bazelbuild/bazelisk). Bazelisk will automatically
download and use the correct Bazel version specified in the project's
`.bazelversion` file.

Alternatively, you can install Bazel manually by following the official
installation [instructions](https://bazel.build/install) for your platform.

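With Bazelisk installed, a quick way to confirm that it resolves the pinned
version (run from the repository root):

```
cat .bazelversion    # prints the pinned version, 7.6.1
bazelisk version     # downloads (if needed) and runs that Bazel version
```
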
### Build and Run the Demo

**LiteRT-LM** allows you to deploy and run LLMs on various platforms, including
Android, Linux, MacOS, and Windows. `runtime/engine/litert_lm_main.cc` is a
[demo](#demo-usage) that shows how to initialize and interact with the model.
Please check the corresponding section below, depending on your target
deployment device and your development platform.

Make sure [Git LFS](https://git-lfs.com) is installed, and run `git lfs pull` to
fetch the latest prebuilt binaries.

> Note: In order to run on GPU on all platforms, we need to take a few extra
> steps (see the sketch after this note):
>
> 1.  Add `--define=litert_link_capi_so=true`
>     `--define=resolve_symbols_in_exec=false` to the build command.
> 1.  `mkdir -p <test_dir>; cp <your litert_lm_main> <test_dir>; cp
>     ./prebuilt/<your OS>/<shared libraries> <test_dir>/` and make sure the
>     prebuilt .so/.dll/.dylib files are in the same directory as the
>     `litert_lm_main` binary.
> 1.  Running GPU on Windows needs DirectXShaderCompiler. See
>     [this Note](../../README.md#windows_gpu) for more details.

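As a rough illustration of steps 1 and 2 on a Unix-like host (the `<your OS>`
placeholder stands for the matching `prebuilt/` subfolder, and the test
directory name here is arbitrary):

```
# Step 1: build with the GPU-related defines.
bazel build //runtime/engine:litert_lm_main \
    --define=litert_link_capi_so=true \
    --define=resolve_symbols_in_exec=false

# Step 2: stage the binary next to the prebuilt shared libraries.
mkdir -p /tmp/litert_lm_test
cp bazel-bin/runtime/engine/litert_lm_main /tmp/litert_lm_test/
cp ./prebuilt/<your OS>/*.so /tmp/litert_lm_test/
```
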
<details> <span id="deploy_to_windows"></span>

<summary><strong>Deploy to Windows</strong></summary>

Building on Windows requires several prerequisites to be installed first.

#### Prerequisites

1.  **Visual Studio 2022** - Download from
    https://visualstudio.microsoft.com/downloads/ and install. Make sure it
    installs the MSVC toolchain for all users, usually under
    `C:\Program Files`.
2.  **Git for Windows** - Install from https://git-scm.com/download/win
    (includes Git Bash, needed for flatbuffer generation scripts).
3.  **Python 3.13** - Download from https://www.python.org/downloads/ and
    install for all users.
4.  **Bazel** - Install using the Windows Package Manager (winget):
    `winget install --id=Bazel.Bazelisk -e`.
5.  **Java** - Install from https://www.oracle.com/java/technologies/downloads/
    and set `JAVA_HOME` to point at the JDK directory.
6.  **Enable long paths** - Make sure `LongPathsEnabled` is set in the Registry
    (see the PowerShell snippet after this list). If needed, use
    `bazelisk --output_base=C:\bzl` to shorten the output path further.
    Otherwise, compilation errors related to file permissions can occur.
7.  Download the `.litertlm` model from the
    [Supported Models and Performance](../../README.md#supported-models-and-performance)
    section.

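For item 6, the registry value lives under the standard Windows file-system
key; a sketch for checking and enabling it from an elevated PowerShell:

```powershell
# Check whether long paths are enabled (1 = enabled).
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled

# Enable long paths (requires administrator privileges).
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Value 1
```
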
#### Building and Running

Once you've downloaded the `.litertlm` file, set the path for convenience:

```powershell
$Env:MODEL_PATH = "C:\path\to\your_model.litertlm"
```

Build the binary:

```powershell
# Build litert_lm_main for Windows.
bazelisk build //runtime/engine:litert_lm_main --config=windows
```

Run the binary (make sure you run the following command in **PowerShell**):

```powershell
# Run litert_lm_main.exe with a model .litertlm file.
bazel-bin\runtime\engine\litert_lm_main.exe `
    --backend=cpu `
    --model_path=$Env:MODEL_PATH
```

</details>

<details> <span id="deploy_to_linux"></span>

<summary><strong>Deploy to Linux / Embedded</strong></summary>

`clang` is used to build LiteRT-LM on Linux. Build `litert_lm_main`, a CLI
executable, to run models on CPU. Note that you should download the `.litertlm`
model from the
[Supported Models and Performance](../../README.md#supported-models-and-performance)
section.

Note that one can also deploy the model to a Raspberry Pi using the same setup
and commands in this section.

Once you've downloaded the `.litertlm` file, set the path for convenience:

```
export MODEL_PATH=<path to your .litertlm file>
```

Build the binary:

```
bazel build //runtime/engine:litert_lm_main
```

Run the binary:

```
bazel-bin/runtime/engine/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

</details>

<details> <span id="deploy_to_macos"></span>

<summary><strong>Deploy to MacOS</strong></summary>

Xcode command line tools include clang. Run `xcode-select --install` if they
are not already installed. Note that you should download the `.litertlm` model
from the
[Supported Models and Performance](../../README.md#supported-models-and-performance)
section.

Once you've downloaded the `.litertlm` file, set the path for convenience:

```
export MODEL_PATH=<path to your .litertlm file>
```

Build the binary:

```
bazel build //runtime/engine:litert_lm_main
```

Run the binary:

```
bazel-bin/runtime/engine/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

</details>

<details> <span id="deploy_to_android"></span>

<summary><strong>Deploy to Android</strong></summary>

To be able to interact with your Android device, please make sure you've
properly installed the
[Android Debug Bridge](https://developer.android.com/tools/adb) and have a
connected device that can be accessed via `adb`.

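A quick way to verify the connection before proceeding:

```
adb devices   # the device should be listed as "device", not "unauthorized"
```
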
**Note:** If you are interested in trying out LiteRT-LM with NPU acceleration,
please check out [this page](https://ai.google.dev/edge/litert/next/npu) for
more information about how to sign up for the Early Access Program.

<details>

<summary><strong>Develop in Linux</strong></summary>

To be able to build the binary for Android, one needs to install NDK r28b or
newer from https://developer.android.com/ndk/downloads#stable-downloads.
Specific steps are:

-   Download the `.zip` file from
    https://developer.android.com/ndk/downloads#stable-downloads.
-   Unzip the `.zip` file to your preferred location (say
    `/path/to/AndroidNDK/`).
-   Set `ANDROID_NDK_HOME` to point to the NDK directory. It should be
    something like:

    ```
    export ANDROID_NDK_HOME=/path/to/AndroidNDK/
    ```

*Tip: make sure your `ANDROID_NDK_HOME` points to the directory that has
`README.md` in it.*

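A quick sanity check for the tip above:

```
ls "$ANDROID_NDK_HOME/README.md"   # an error here means ANDROID_NDK_HOME is wrong
```
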
With the above set up, let's try to build the `litert_lm_main` binary:

```
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
```

</details>

<details>

<summary><strong>Develop in MacOS</strong></summary>

Xcode command line tools include clang. Run `xcode-select --install` if they
are not already installed.

To be able to build the binary for Android, one needs to install NDK r28b or
newer from https://developer.android.com/ndk/downloads#stable-downloads.
Specific steps are:

-   Download the `.dmg` file from
    https://developer.android.com/ndk/downloads#stable-downloads.
-   Open the `.dmg` file and move the `AndroidNDK*` file to your preferred
    location (say `/path/to/AndroidNDK/`).
-   Set `ANDROID_NDK_HOME` to point to the NDK directory. It should be
    something like:

    ```
    export ANDROID_NDK_HOME=/path/to/AndroidNDK/AndroidNDK*.app/Contents/NDK/
    ```

*Tip: make sure your `ANDROID_NDK_HOME` points to the directory that has
`README.md` in it.*

With the above set up, let's try to build the `litert_lm_main` binary:

```
bazel build --config=android_arm64 //runtime/engine:litert_lm_main
```

</details>

After the binary is successfully built, we can now try to run the model on
device. In order to run the binary on your Android device, we have to push a
few assets / binaries. First set your `DEVICE_FOLDER` and make sure you have
write access to it (typically you can put things under `/data/local/tmp/`):

```
export DEVICE_FOLDER=/data/local/tmp/
adb shell mkdir -p $DEVICE_FOLDER
```

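You can confirm write access with a quick check (the `.write_test` file name is
arbitrary):

```
adb shell "touch $DEVICE_FOLDER/.write_test && rm $DEVICE_FOLDER/.write_test" &&
    echo "write access OK"
```
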
To run with the **CPU** backend, simply push the main binary and the
`.litertlm` model to the device and run:

```
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER
adb shell $DEVICE_FOLDER/litert_lm_main \
    --backend=cpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

To run with the **GPU** backend, we need additional `.so` files. They are
located in the `prebuilt/` subfolder in the repo (we currently only support
`arm64`):

```
# Skip model push if it is already there
adb push $MODEL_PATH $DEVICE_FOLDER/model.litertlm
adb push prebuilt/android_arm64/*.so $DEVICE_FOLDER
adb push bazel-bin/runtime/engine/litert_lm_main $DEVICE_FOLDER
adb shell LD_LIBRARY_PATH=$DEVICE_FOLDER \
    $DEVICE_FOLDER/litert_lm_main \
    --backend=gpu \
    --model_path=$DEVICE_FOLDER/model.litertlm
```

</details>

### Demo Usage <span id="demo-usage"></span>

`litert_lm_main` is a demo for running and evaluating large language models
(LLMs) using our LiteRT
[Engine/Conversation interface](../api/cpp/conversation.md). It provides basic
functionality such as the following:

-   generating text based on a user-provided prompt;
-   executing inference on various hardware backends, e.g. CPU / GPU;
-   benchmarking prefill and decode speeds and monitoring peak memory
    consumption during the run;
-   running in both synchronous and asynchronous execution modes.

<details>

<summary><strong>Example commands</strong></summary>

Below are a few example commands (please update them accordingly when using
`adb`):

**Run the model with the default prompt**

```
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH
```

**Benchmark the model performance**

```
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH \
    --benchmark \
    --benchmark_prefill_tokens=1024 \
    --benchmark_decode_tokens=256 \
    --async=false
```

*Tip: when benchmarking on Android devices, remember to use `taskset` to pin
the executable to the main cores to get consistent numbers, e.g. `taskset f0`.*

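For instance, the benchmark command above pinned with `taskset` on a device
(the `f0` mask comes from the tip; adjust it to your device's core layout):

```
adb shell taskset f0 $DEVICE_FOLDER/litert_lm_main \
    --backend=cpu \
    --model_path=$DEVICE_FOLDER/model.litertlm \
    --benchmark \
    --benchmark_prefill_tokens=1024 \
    --benchmark_decode_tokens=256 \
    --async=false
```
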
**Run the model with your own prompt**

```
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --input_prompt="Write me a song" \
    --model_path=$MODEL_PATH
```

A more detailed description of each flag is in the following table:

| Flag Name | Description | Default Value |
| :--- | :--- | :--- |
| `backend` | Executor backend to use for LLM execution (e.g., cpu, gpu). | `"gpu"` |
| `model_path` | Path to the `.litertlm` file for LLM execution. | `""` |
| `input_prompt` | Input prompt to use for testing LLM execution. | `"What is the tallest building in the world?"` |
| `benchmark` | Benchmark the LLM execution. | `false` |
| `benchmark_prefill_tokens` | If benchmark is true and this value is > 0, the benchmark will use this number to set the prefill tokens, regardless of the input prompt. If this is non-zero, `async` must be `false`. | `0` |
| `benchmark_decode_tokens` | If benchmark is true and this value is > 0, the benchmark will use this number to set the number of decode steps, regardless of the input prompt. | `0` |
| `async` | Run the LLM execution asynchronously. | `true` |
| `report_peak_memory_footprint` | Report peak memory footprint. | `false` |

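For example, to benchmark and also report peak memory, combining flags from the
table above:

```
<path to binary directory>/litert_lm_main \
    --backend=cpu \
    --model_path=$MODEL_PATH \
    --benchmark \
    --benchmark_prefill_tokens=1024 \
    --benchmark_decode_tokens=256 \
    --async=false \
    --report_peak_memory_footprint
```
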
</details>