Transformers documentation
ExecuTorch
ExecuTorch is a lightweight runtime for model inference on edge devices. It exports a PyTorch model into a portable, ahead-of-time format. A small C++ runtime plans memory and dispatches operations to hardware-specific backends. Execution and memory behavior are known before the model runs on device, so inference overhead is low.
Export a Transformers model with the optimum-executorch library.
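The CLI invocation below can also be scripted from Python by shelling out to `optimum-cli`. A hedged sketch with a hypothetical helper (`build_export_command` is not part of optimum) that assembles the same argument list for `subprocess.run`:

```python
import shlex

def build_export_command(model_id: str, task: str, recipe: str, output_dir: str) -> list[str]:
    """Build the optimum-cli argument list for an ExecuTorch export."""
    return [
        "optimum-cli", "export", "executorch",
        "--model", model_id,
        "--task", task,
        "--recipe", recipe,
        f"--output_dir={output_dir}",
    ]

cmd = build_export_command(
    "HuggingFaceTB/SmolLM2-135M-Instruct",
    "text-generation", "xnnpack", "./smollm2_exported",
)
print(shlex.join(cmd))
# To actually run the export: subprocess.run(cmd, check=True)
```

Keeping the arguments as a list avoids shell-quoting issues when model IDs or paths contain special characters.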
CLI
Python
optimum-cli export executorch \
  --model "HuggingFaceTB/SmolLM2-135M-Instruct" \
  --task "text-generation" \
  --recipe "xnnpack" \
  --output_dir="./smollm2_exported"

Transformers integration
The export process uses several Transformers components.
- from_pretrained() loads the model weights in safetensors format.
- Optimum applies graph optimizations and runs torch.export to create a model.pte file targeting your hardware backend.
- AutoTokenizer or AutoProcessor loads the tokenizer or processor files and runs during inference.
- At runtime, a C++ runner class executes the .pte file on the ExecuTorch runtime.
#include <executorch/extension/llm/runner/text_llm_runner.h>
#include <iostream>

using namespace executorch::extension::llm;

int main() {
  // Load tokenizer and create runner
  auto tokenizer = load_tokenizer("path/to/tokenizer.json", nullptr, std::nullopt, 0, 0);
  auto runner = create_text_llm_runner("path/to/model.pte", std::move(tokenizer));

  // Load the model
  runner->load();

  // Configure generation
  GenerationConfig config;
  config.max_new_tokens = 100;
  config.temperature = 0.8f;

  // Generate text with streaming output
  runner->generate("The capital of France is", config,
      [](const std::string& token) { std::cout << token << std::flush; },
      nullptr);

  return 0;
}

Resources
- ExecuTorch docs
- torch.export docs
- Exporting to production guide