Spaces:
Running
Running
| # LiteRT-LM Python API | |
| The Python API of LiteRT-LM for **Linux and MacOS** (Windows support is upcoming). | |
| Features like **multi-modality** and **tools use** are supported, while **GPU | |
| acceleration** is upcoming. | |
| ## Introduction | |
| Here is a sample terminal chat app built with the Python API: | |
| ```python | |
| import litert_lm | |
| litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR) # Hide log for TUI app | |
| with litert_lm.Engine("path/to/model.litertlm") as engine: | |
| with engine.create_conversation() as conversation: | |
| while True: | |
| user_input = input("\n>>> ") | |
| for chunk in conversation.send_message_async(user_input): | |
| print(chunk["content"][0]["text"], end="", flush=True) | |
| ``` | |
|  | |
| ## Getting Started | |
| LiteRT-LM is available as a Python library. You can install the nightly version from PyPI: | |
| ```bash | |
| # Using pip | |
| pip install litert-lm-nightly | |
| # Using uv | |
| uv pip install litert-lm-nightly | |
| ``` | |
| ### 1. Initialize the Engine | |
| The `Engine` is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the `with` statement) ensures that native resources are released promptly. | |
| **Note:** Initializing the engine can take several seconds to load the model. | |
| ```python | |
| import litert_lm | |
| # Initialize with the model path and optionally specify the backend. | |
| # backend can be Backend.CPU (default). GPU support is upcoming. | |
| with litert_lm.Engine( | |
| "path/to/your/model.litertlm", | |
| backend=litert_lm.Backend.CPU, | |
| # Optional: Pick a writable dir for caching compiled artifacts. | |
| # cache_dir="/tmp/litert-lm-cache" | |
| ) as engine: | |
| # ... Use the engine to create a conversation ... | |
| pass | |
| ``` | |
| ### 2. Create a Conversation | |
| A `Conversation` manages the state and history of your interaction with the model. | |
| ```python | |
| # Optional: Configure system instruction and initial messages | |
| messages = [ | |
| {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]}, | |
| ] | |
| # Create the conversation | |
| with engine.create_conversation(messages=messages) as conversation: | |
| # ... Interact with the conversation ... | |
| pass | |
| ``` | |
| ### 3. Sending Messages | |
| You can send messages synchronously or asynchronously (streaming). | |
| **Synchronous Example:** | |
| ```python | |
| # Simple string input | |
| response = conversation.send_message("What is the capital of France?") | |
| print(response["content"][0]["text"]) | |
| # Or with full message structure | |
| # response = conversation.send_message({"role": "user", "content": "..."}) | |
| ``` | |
| **Asynchronous (Streaming) Example:** | |
| ```python | |
| # sendMessageAsync returns an iterator of response chunks | |
| stream = conversation.send_message_async("Tell me a long story.") | |
| for chunk in stream: | |
| # Chunks are dictionaries containing pieces of the response | |
| for item in chunk.get("content", []): | |
| if item.get("type") == "text": | |
| print(item["text"], end="", flush=True) | |
| print() | |
| ``` | |
| ### 4. Multi-Modality | |
| Note: This requires models with multi-modality support, such as [Gemma3n](https://huggingface.co/google/gemma-3n-E2B-it-litert-lm). | |
| ```python | |
| # Initialize with vision and/or audio backends if needed | |
| with litert_lm.Engine( | |
| "path/to/multimodal_model.litertlm", | |
| audio_backend=litert_lm.Backend.CPU, | |
| # vision_backend=litert_lm.Backend.CPU, (GPU support is upcoming) | |
| ) as engine: | |
| with engine.create_conversation() as conversation: | |
| user_message = { | |
| "role": "user", | |
| "content": [ | |
| {"type": "audio", "path": "/path/to/audio.wav"}, | |
| {"type": "text", "text": "Describe this audio."}, | |
| ], | |
| } | |
| response = conversation.send_message(user_message) | |
| print(response["content"][0]["text"]) | |
| ``` | |
| ### 5. Defining and Using Tools | |
| Note: This requires models with tool support, such as [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it). | |
| You can define Python functions as tools that the model can call automatically. | |
| ```python | |
| def add_numbers(a: float, b: float) -> float: | |
| """Adds two numbers. | |
| Args: | |
| a: The first number. | |
| b: The second number. | |
| """ | |
| return a + b | |
| # Register the tool in the conversation | |
| tools = [add_numbers] | |
| with engine.create_conversation(tools=tools) as conversation: | |
| # The model will call add_numbers automatically if it needs to sum values | |
| response = conversation.send_message("What is 123 + 456?") | |
| print(response["content"][0]["text"]) | |
| ``` | |
| LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model. | |