# LiteRT-LM Python API
The Python API of LiteRT-LM supports **Linux and macOS** (Windows support is
upcoming). Features like **multi-modality** and **tool use** are supported,
while **GPU acceleration** is upcoming.
## Introduction
Here is a sample terminal chat app built with the Python API:
```python
import litert_lm

litert_lm.set_min_log_severity(litert_lm.LogSeverity.ERROR)  # Hide logs for the TUI app

with litert_lm.Engine("path/to/model.litertlm") as engine:
    with engine.create_conversation() as conversation:
        while True:
            user_input = input("\n>>> ")
            for chunk in conversation.send_message_async(user_input):
                print(chunk["content"][0]["text"], end="", flush=True)
```
![](../kotlin/demo.gif)
## Getting Started
LiteRT-LM is available as a Python library. You can install the nightly version from PyPI:
```bash
# Using pip
pip install litert-lm-nightly
# Using uv
uv pip install litert-lm-nightly
```
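After installation, a trivial sanity check is that the package imports cleanly:

```python
# Sanity check: the package imports without error after installation.
import litert_lm

print("litert_lm is available")
```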
### 1. Initialize the Engine
The `Engine` is the entry point to the API. It handles model loading and resource management. Using it as a context manager (with the `with` statement) ensures that native resources are released promptly.
**Note:** Initializing the engine can take several seconds to load the model.
```python
import litert_lm

# Initialize with the model path and optionally specify the backend.
# backend can be Backend.CPU (default). GPU support is upcoming.
with litert_lm.Engine(
    "path/to/your/model.litertlm",
    backend=litert_lm.Backend.CPU,
    # Optional: pick a writable dir for caching compiled artifacts.
    # cache_dir="/tmp/litert-lm-cache"
) as engine:
    # ... Use the engine to create a conversation ...
    pass
```
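Since the engine owns the loaded model, a single `Engine` instance can presumably serve several conversations in turn without reloading. A sketch reusing the pattern above:

```python
with litert_lm.Engine("path/to/your/model.litertlm") as engine:
    # Each conversation created from the engine has its own history.
    with engine.create_conversation() as first:
        print(first.send_message("Hello!")["content"][0]["text"])
    with engine.create_conversation() as second:
        print(second.send_message("Hi, starting fresh.")["content"][0]["text"])
```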
### 2. Create a Conversation
A `Conversation` manages the state and history of your interaction with the model.
```python
# Optional: configure a system instruction and initial messages
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a helpful assistant."}]},
]

# Create the conversation
with engine.create_conversation(messages=messages) as conversation:
    # ... Interact with the conversation ...
    pass
```
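Because the conversation keeps the history, later turns can refer back to earlier ones. A minimal sketch using `send_message` (introduced in the next step):

```python
with engine.create_conversation() as conversation:
    conversation.send_message("My name is Ada.")
    # The first turn stays in the history, so the model can answer this.
    response = conversation.send_message("What is my name?")
    print(response["content"][0]["text"])
```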
### 3. Sending Messages
You can send messages synchronously or asynchronously (streaming).
**Synchronous Example:**
```python
# Simple string input
response = conversation.send_message("What is the capital of France?")
print(response["content"][0]["text"])
# Or with full message structure
# response = conversation.send_message({"role": "user", "content": "..."})
```
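The commented-out form above takes a full message dictionary. Using the same content structure as the system message in step 2, a structured user message would look like this:

```python
response = conversation.send_message({
    "role": "user",
    "content": [{"type": "text", "text": "What is the capital of France?"}],
})
print(response["content"][0]["text"])
```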
**Asynchronous (Streaming) Example:**
```python
# send_message_async returns an iterator of response chunks
stream = conversation.send_message_async("Tell me a long story.")
for chunk in stream:
    # Chunks are dictionaries containing pieces of the response
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            print(item["text"], end="", flush=True)
print()
```
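If you also need the complete reply once streaming ends, you can accumulate the text chunks yourself; a small sketch built on the same chunk structure:

```python
parts = []
for chunk in conversation.send_message_async("Summarize the story."):
    for item in chunk.get("content", []):
        if item.get("type") == "text":
            parts.append(item["text"])

full_text = "".join(parts)
print(full_text)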
### 4. Multi-Modality
Note: This requires models with multi-modality support, such as [Gemma3n](https://huggingface.co/google/gemma-3n-E2B-it-litert-lm).
```python
# Initialize with vision and/or audio backends if needed
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    audio_backend=litert_lm.Backend.CPU,
    # vision_backend=litert_lm.Backend.CPU,  # GPU support is upcoming
) as engine:
    with engine.create_conversation() as conversation:
        user_message = {
            "role": "user",
            "content": [
                {"type": "audio", "path": "/path/to/audio.wav"},
                {"type": "text", "text": "Describe this audio."},
            ],
        }
        response = conversation.send_message(user_message)
        print(response["content"][0]["text"])
```
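Image input presumably follows the same pattern through `vision_backend`. A sketch assuming an `"image"` content type that mirrors the audio example (the type name and engine setup are assumptions, not confirmed by this guide):

```python
# Assumes the engine was created with vision_backend set (see above) and
# that image content uses an "image" type analogous to "audio".
user_message = {
    "role": "user",
    "content": [
        {"type": "image", "path": "/path/to/image.png"},
        {"type": "text", "text": "Describe this image."},
    ],
}
response = conversation.send_message(user_message)
print(response["content"][0]["text"])
```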
### 5. Defining and Using Tools
Note: This requires models with tool support, such as [FunctionGemma](https://huggingface.co/google/functiongemma-270m-it).
You can define Python functions as tools that the model can call automatically.
```python
def add_numbers(a: float, b: float) -> float:
    """Adds two numbers.

    Args:
        a: The first number.
        b: The second number.
    """
    return a + b

# Register the tool in the conversation
tools = [add_numbers]
with engine.create_conversation(tools=tools) as conversation:
    # The model will call add_numbers automatically if it needs to sum values
    response = conversation.send_message("What is 123 + 456?")
    print(response["content"][0]["text"])
```
LiteRT-LM uses the function's docstring and type hints to generate the tool schema for the model.
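Richer docstrings therefore translate into richer schemas. For example, a hypothetical weather tool (the function name and its stub body are illustrations, not part of the library):

```python
def get_weather(city: str, unit: str = "celsius") -> str:
    """Returns the current temperature for a city.

    Args:
        city: The city name, e.g. "Paris".
        unit: Either "celsius" or "fahrenheit".
    """
    # Stub for illustration; a real tool would call a weather service.
    return f"It is 21 degrees {unit} in {city}."

with engine.create_conversation(tools=[get_weather]) as conversation:
    response = conversation.send_message("How warm is it in Paris?")
    print(response["content"][0]["text"])
```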