| | --- |
| | license: apple-amlr |
| | base_model: |
| | - mistralai/Mistral-7B-Instruct-v0.2 |
| | tags: |
| | - rag |
| | - compression |
| | - retrieval |
| | - instruction-tuned |
| | - generation |
| | library_name: transformers |
| | --- |
| | |
| |
|
| | # CLaRa-7B-Instruct (Compression-16 & 128) |
| |
|
| | **CLaRa-7B-Instruct** is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×). |
| | It supports instruction-following QA directly from compressed document representations. |
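
To make the two compression rates concrete, the sketch below estimates how many compressed vectors a small set of retrieved documents would occupy. It assumes a rate of N× means roughly N input document tokens per compressed vector, rounded up per document. That reading, the helper name `compressed_length`, and the token counts are illustrative assumptions, not part of the released code.

```python
import math


def compressed_length(num_tokens: int, rate: int) -> int:
    """Estimate the number of compressed vectors for one document.

    Assumes `rate` input tokens map to one compressed vector (hypothetical
    interpretation of the 16x / 128x figures), rounding up.
    """
    if num_tokens < 0 or rate <= 0:
        raise ValueError("num_tokens must be >= 0 and rate must be > 0")
    return math.ceil(num_tokens / rate)


# Hypothetical token counts for three retrieved documents.
doc_token_counts = [256, 256, 256]

for rate in (16, 128):
    total = sum(compressed_length(n, rate) for n in doc_token_counts)
    print(f"{rate}x: {sum(doc_token_counts)} tokens -> about {total} compressed vectors")
```

Under these assumptions, 768 document tokens shrink to about 48 vectors at 16× and about 6 at 128×, which is why the higher rate trades answer fidelity for a much shorter generator context.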
| |
|
| | **Training recipe:** Starting from the base semantic compression model, the model is instruction-tuned on QA-style tasks. |
| | **Benchmarks:** Strong instruction-following performance under 16× compression. |
| |
|
| | --- |
| |
|
| | ## More details and usage examples |
| |
|
| | Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659) |
| | GitHub: https://github.com/apple/ml-clara |
| |
|
| | Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa |
| |
|
| |
|
| | --- |
| |
|
| | ## Example Usage (Instruction-Tuned Inference) |
| |
|
| | ```python |
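| | from transformers import AutoModel |
| | |
| | # Load the compression-16 checkpoint from a local copy; adjust the path to your environment. |
| | # trust_remote_code=True lets transformers run the custom model code bundled with the checkpoint. |
| | unirag = AutoModel.from_pretrained( |
| |     "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16", |
| |     trust_remote_code=True |
| | ).to("cuda") |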
| | |
| | # Nested list: in this example, one inner list of candidate documents for the single question. |
| | documents = [ |
| |     [ |
| |         "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...", |
| |         "Hagsatera is a genus of flowering plants from the orchid family...", |
| |         "Alsobia is a genus of flowering plants in the family Gesneriaceae..." |
| |     ] |
| | ] |
| | |
| | questions = [ |
| |     "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?" |
| | ] |
| | |
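| | # Instruction-tuned usage: answer the question from the compressed documents. |
| | out = unirag.generate_from_text( |
| |     questions=questions, |
| |     documents=documents, |
| |     max_new_tokens=64  # upper bound on the length of the generated answer |
| | ) |
| | |
| | print("Generated answer:", out) |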