	---
	license: apple-amlr
	base_model:
	- mistralai/Mistral-7B-Instruct-v0.2
	tags:
	- rag
	- compression
	- retrieval
	- instruction-tuned
	- generation
	library_name: transformers
	---


	# CLaRa-7B-Instruct (Compression-16 & 128)

	The CLaRa-7B-Instruct model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×).
	It supports instruction-following QA directly from compressed document representations.

	- Training recipe: instruction tuning on QA-style tasks, built on top of the base semantic compression model.
	- Benchmarks: strong instruction-following performance under 16× compression.
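
To make the two compression factors concrete, here is a minimal sketch of the arithmetic. It assumes (this is not confirmed by the model card) that an N× ratio means roughly N input tokens are represented by one compressed vector. The helper name `compressed_length` is hypothetical and is not part of the CLaRa code.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Estimate how many compressed vectors represent a document.

    Assumption: a ratio of N means about N input tokens per compressed
    vector, rounded up so any non-empty document yields at least one vector.
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# A hypothetical 256-token document under the two ratios named above.
for ratio in (16, 128):
    print(f"{ratio}x: {compressed_length(256, ratio)} vectors")
```

Under this assumption, the higher ratio trades a much shorter compressed input for less capacity per document, so the two checkpoints are best compared on your own data.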

	---

	## More details and usage examples

	- Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)
	- GitHub: https://github.com/apple/ml-clara
	- Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa


	---

	## Example Usage (Instruction-Tuned Inference)
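
The example in this section passes `questions` as a flat list of strings and `documents` as a list of lists, with one inner list of candidate documents per question. Assuming that one-to-one pairing is what the model expects (this is inferred from the example, not confirmed by the model card), a small pre-flight check can catch mismatched batches before any GPU work starts. The helper `check_batch` is hypothetical and is not part of the CLaRa code.

```python
from typing import List


def check_batch(questions: List[str], documents: List[List[str]]) -> None:
    """Raise ValueError if the batch does not match the assumed layout."""
    # Assumed layout: documents[i] holds the candidates for questions[i].
    if len(questions) != len(documents):
        raise ValueError(
            f"{len(questions)} questions but {len(documents)} document lists"
        )
    for i, docs in enumerate(documents):
        if not docs:
            raise ValueError(f"question {i} has no candidate documents")
        if not all(isinstance(d, str) for d in docs):
            raise ValueError(f"question {i} has a non-string document")


# Placeholder strings only; real inputs would be full questions and passages.
check_batch(
    questions=["question text"],
    documents=[["document text A", "document text B"]],
)
```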

	```python
	from transformers import AutoModel

	# Local path to the compression-16 checkpoint; point this at your own copy.
	# trust_remote_code=True loads the custom model code shipped with the checkpoint.
	unirag = AutoModel.from_pretrained(
	    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
	    trust_remote_code=True,
	).to("cuda")

	# A batch with one question; its candidate documents go in one inner list.
	documents = [
	    [
	        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
	        "Hagsatera is a genus of flowering plants from the orchid family...",
	        "Alsobia is a genus of flowering plants in the family Gesneriaceae...",
	    ]
	]

	questions = [
	    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
	]

	# Instruction-tuned usage: answer the question from the compressed documents.
	out = unirag.generate_from_text(
	    questions=questions,
	    documents=documents,
	    max_new_tokens=64,
	)

	print("Generated answer:", out)
	```