Reka Edge
Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. It is specifically optimized to deliver industry-leading performance in image understanding, video analysis, object detection, and agentic tool use.
Learn more about Reka Edge in our announcement blog post.
Quick Start
🤗 Transformers (macOS)
The easiest way to run the model is with the included example.py script. It uses PEP 723 inline metadata, so uv resolves dependencies automatically with no manual install step:

```shell
uv run example.py --image media/hamburger.jpg --prompt "What is in this image?"
```
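PEP 723 inline metadata is a specially formatted comment block at the top of a script that declares its dependencies, which uv reads before running. A sketch of what such a header looks like (the exact pin list in the repo's example.py may differ):

```python
# /// script
# requires-python = ">=3.12"
# dependencies = [
#     "transformers==4.57.3",
#     "torch",
#     "pillow",
# ]
# ///
```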
Inline snippet
If you prefer not to use the script, install dependencies manually and paste the code below:
```shell
uv pip install "transformers==4.57.3" torch torchvision pillow tiktoken imageio einops av
```
```python
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "RekaAI/reka-edge-2603"

# Load processor and model
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.float16,
).eval()

# Move to MPS (Apple Silicon GPU)
device = torch.device("mps")
model = model.to(device)

# Prepare an image + text query
image_path = "media/hamburger.jpg"  # included in the model repo
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "What is in this image?"},
        ],
    }
]

# Tokenize using the chat template
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
)

# Move tensors to device (integer tensors such as input_ids keep their dtype)
for key, val in inputs.items():
    if isinstance(val, torch.Tensor):
        if val.is_floating_point():
            inputs[key] = val.to(device=device, dtype=torch.float16)
        else:
            inputs[key] = val.to(device=device)

# Generate
with torch.inference_mode():
    # Stop on the <sep> token (end-of-turn) in addition to the default EOS
    sep_token_id = processor.tokenizer.convert_tokens_to_ids("<sep>")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,
        eos_token_id=[processor.tokenizer.eos_token_id, sep_token_id],
    )

# Decode only the generated tokens
input_len = inputs["input_ids"].shape[1]
new_tokens = output_ids[0, input_len:]
output_text = processor.tokenizer.decode(new_tokens, skip_special_tokens=True)

# Strip any trailing <sep> turn-boundary marker
output_text = output_text.replace("<sep>", "").strip()
print(output_text)
```
Video queries
The model also accepts video inputs. Use --video instead of --image:
```shell
uv run example.py --video media/dashcam.mp4 --prompt "Is this person falling asleep?"
```
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "video": "media/dashcam.mp4"},
            {"type": "text", "text": "Is this person falling asleep?"},
        ],
    }
]
```
Object detection queries
Given an input image, use a prompt of the form `Detect: {expression}` to instruct the model to perform object detection, where `{expression}` describes one or more objects.
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Detect: red car, man in the white"},
        ],
    }
]
```
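The detection response format is model-specific and not documented here. Assuming, purely for illustration, that the model replies with one `label: [x1, y1, x2, y2]` line per object (a hypothetical format — check the model's actual output), a minimal parser could look like:

```python
import re

# Hypothetical example response; the real model's detection format may differ.
response = "red car: [120, 44, 310, 200]\nman in the white: [5, 60, 90, 240]"

def parse_detections(text):
    """Parse 'label: [x1, y1, x2, y2]' lines into (label, box) pairs."""
    pattern = re.compile(r"^(.+?):\s*\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]$")
    detections = []
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            label = match.group(1).strip()
            box = tuple(int(g) for g in match.groups()[1:])
            detections.append((label, box))
    return detections

print(parse_detections(response))
```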
Text-only queries
Omit the image entry from the content list:
```python
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is the capital of France?"},
        ],
    }
]
```
Then run the same tokenization and generation steps as above.
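Since the message shapes above differ only in the optional media entry, a small helper (hypothetical — not part of the repo) can assemble the messages list for any query type:

```python
def build_messages(prompt, image=None, video=None):
    """Build the chat-template messages list for a text, image, or video query."""
    content = []
    if image is not None:
        content.append({"type": "image", "image": image})
    if video is not None:
        content.append({"type": "video", "video": video})
    content.append({"type": "text", "text": prompt})
    return [{"role": "user", "content": content}]

# Text-only query
print(build_messages("What is the capital of France?"))
# Image query
print(build_messages("What is in this image?", image="media/hamburger.jpg"))
```

The same tokenization and generation code then works unchanged for all three query types.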
Notes for macOS
- **MPS and dtype:** Apple's MPS backend does not support `bfloat16`; always use `torch.float16`. Do not use `device_map="auto"`, as it is not compatible with MPS. Load the model on CPU first, then call `.to("mps")`.
- **Pinned transformers:** This checkpoint was exported with `transformers==4.57.3`. Using a different version may cause loading errors or incorrect behavior.
- **Memory:** The model requires ~14 GB in float16. A Mac with 32 GB of unified memory is recommended to leave headroom for the OS and generation buffers.
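The ~14 GB figure is just the weight storage: 7B parameters at 2 bytes each in float16. A quick back-of-the-envelope check (weights only; the KV cache and activation buffers need extra headroom on top):

```python
params = 7e9         # 7B parameters
bytes_per_param = 2  # float16 stores each parameter in 2 bytes
weights_gb = params * bytes_per_param / 1e9  # decimal GB
print(f"~{weights_gb:.0f} GB for weights alone")  # prints "~14 GB for weights alone"
```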
vLLM
For high-throughput serving, you can use the vllm-reka plugin. This plugin extends standard vLLM to support Reka's custom architectures and optimized tokenizer.
Installation
Please follow our vllm-reka installation instructions to install the plugin along with vLLM.
Serving the Model
You can start the OpenAI-compatible API server by running the script serve.sh in vllm-reka with $MODEL_PATH set to RekaAI/reka-edge-2603.
```shell
bash serve.sh
```
We enable BitsAndBytes quantization by default to reduce memory usage. To disable quantization, remove the --quantization flag from serve.sh.
Querying the Server
Once the server is running, you can send requests using the OpenAI API format:
```python
import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="EMPTY",
    timeout=3600,
)

# Video query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "video_url", "video_url": {"url": "https://example.com/video.mp4"}},
                {"type": "text", "text": "Describe the video"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Image query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "What is in this image?"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Object detection query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
                {"type": "text", "text": "Detect: green banana"},
            ],
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)

# Text-only query
response = client.chat.completions.create(
    model="RekaAI/reka-edge-2603",
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ],
    stop=["\n\n<sep>"],
)
print(response.choices[0].message.content)
```
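The examples above pass media by public URL. If your images are local files, many OpenAI-compatible servers (including vLLM) also accept base64 data URLs in the `image_url` field; a sketch, assuming the server supports data URLs:

```python
import base64

def to_data_url(path, mime="image/png"):
    """Encode a local image file as a base64 data URL for the image_url field."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Use in a request:
#   {"type": "image_url", "image_url": {"url": to_data_url("media/hamburger.jpg", "image/jpeg")}}
```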
Notes
`trust_remote_code=True` is required because the model uses custom architecture code (`Yasa2ForConditionalGeneration`) that is bundled in this repository and loaded via the `auto_map` config.
Requirements
Edge Deployment Devices
- Mac devices with Apple Silicon
  - OS: macOS 13+
  - Minimum: 24 GB memory
  - Recommended: 32 GB+ memory
- Linux and Windows Subsystem for Linux (WSL) PCs
  - Minimum: 24 GB GPU memory and 24 GB+ system memory
  - Recommended: 32 GB+ GPU memory and 32 GB+ system memory
- Nvidia Robotics & Edge AI systems
  - Jetson Thor
  - Jetson AGX Orin (both 32 GB and 64 GB variants)
Custom Deployment Options
With quantization, Reka Edge can also be run on:
- Jetson Orin Nano
- Samsung S25
- Qualcomm Snapdragon XR2 Gen 3 devices
- Apple iPhone, iPad, and Vision Pro
Reach out for support deploying Reka Edge to a custom edge compute platform.
Software Requirements
- Python: 3.12+
- uv (recommended): handles dependencies automatically