whw06 committed on May 27

Commit

be2fcd9

verified ·

1 Parent(s): bdd929f

Update README.md

Files changed (1)

README.md +252 -1

@@ -1,3 +1,254 @@
 ---
-license: mit
 ---

 ---
+license: apache-2.0
+language:
+  - en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+  - mira
+  - mid-training
+  - data-selection
+  - rubric-scorer
+  - source-aware
+  - moe
+  - qwen3
+base_model: Qwen/Qwen3.5-35B-A3B-Base
 ---
+# MIRA-Text-Group3
+A student scorer from **MIRA** (Mid-training Rubric Anchoring for Source-Aware Data Selection), fine-tuned to score **code-task documentation (PR / issue / wiki)** along a group-specific set of anchor rubric dimensions.
+> 📄 **Paper**: *MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection* (EMNLP 2026)
+> 💻 **Code**: https://github.com/Multilingual-Multimodal-NLP/mira
+---
+## TL;DR
+MIRA is a source-aware data selection framework for heterogeneous **mid-training** corpora. Instead of applying a single global quality rubric, MIRA (1) clusters sources into capability-coherent groups, (2) lets a frontier teacher (Kimi-K2.6) freely propose rubric dimensions and *anchors* them per group, (3) distills the anchored teacher into a lightweight **per-group student scorer**, and (4) applies reliability-aware aggregation with per-source retention thresholds.
+**This repository is one of those student scorers** — variant **3** in the **Text** family, specialized for **code-task documentation (PR / issue / wiki)**. Given an in-distribution record, it produces a numerical score and a short rationale for every anchor dimension in this group's rubric.
+---
+## Model summary
+| | |
+|---|---|
+| **Architecture** | Mixture-of-Experts decoder (35B total / ≈3B active params) |
+| **Base model** | [Qwen3.5-35B-A3B-Base](https://huggingface.co/Qwen) |
+| **Fine-tuning** | Full-parameter SFT on Kimi-K2.6 anchored teacher labels |
+| **Domain** | Long code-related documents covering pull-requests, issues, repo wikis, Stack-Overflow notebooks, and templated bug-fix / file-localization / test-generation instructions. Strongest intra-group similarity: `ct_fixbug ↔ ct_unit_generation = 0.949`. |
+| **Anchor rubric** | 15 group-specific dimensions (`group_D_dim_anchors.jsonl` in the project repo) |
+| **Source count** | 6 text sources |
+| **Output** | Structured (score, rationale) per anchor dimension |
+| **Precision** | BF16 |
+| **License** | Apache-2.0 (inherits from Qwen3) |
+---
+## Sources covered
+This scorer is calibrated for the following mid-training sources in the **Text / Code-task documentation** group:
+| Source | Description |
+|---|---|
+| `pr_issue` | PR / Issue learning notes |
+| `deepwiki` | Repository wiki documentation (0420 refresh) |
+| `stackoverflow_notebook` | Stack-Overflow-style notebooks |
+| `ct_file_loc` | GitHub problem → file-localization template |
+| `ct_fixbug` | Bug-solving instruction template |
+| `ct_unit_generation` | Unit-test generation template |
+The full source-grouping report (KMeans k=4 / 5 clusters, intra-group cosine similarities) is in the [project repo](https://github.com/Multilingual-Multimodal-NLP/mira).
+---
+## Anchor dimensions (15 slots)
+The scoring rubric for this group was discovered via Kimi-K2.6 free-form judging and clustered into 15 anchor dimensions (KMeans k=15 over the group's dim-score embeddings). Dimensions below are sorted by cluster size; larger clusters dominate the corpus and carry more signal. Anchor names are read verbatim from this group's `group_D_dim_anchors.jsonl`; **some names recur across slots** because semantically related but distinct rubric facets were clustered separately by the teacher.
+| Slot | Dimension | Cluster size |
+|---|---|---:|
+| **A1** | Practical Actionability | 83,742 |
+| **A2** | Practical Actionability | 76,628 |
+| **A3** | Analytical Depth | 76,326 |
+| **A4** | Pedagogical Clarity | 69,324 |
+| **A5** | Reasoning Transparency | 61,419 |
+| **A6** | Signal-to-Noise Ratio | 60,615 |
+| **A7** | Repository Tree Navigation | 55,411 |
+| **A8** | Document Structure & Formatting | 47,952 |
+| **A9** | Practical Actionability | 42,272 |
+| **A10** | Signal-to-Noise Ratio | 40,637 |
+| **A11** | Training Utility | 36,132 |
+| **A12** | Code Snippet Fidelity | 32,340 |
+| **A13** | Format Compliance (SEARCH/REPLACE) | 23,085 |
+| **A14** | Safety & Harmlessness | 17,185 |
+| **A15** | Output Format Adherence | 12,529 |
+The scorer outputs one `[Ai] <dimension>: <score>/10 — <rationale>` line per slot, plus `overall`, `training_recommendation`, `domain_tag`, and `brief`.
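As a rough illustration of that response shape, the sketch below renders the expected skeleton from the 15 anchor names in the table above, using the placeholder wording of the prompt template under Deployment. The `ANCHORS` list and the `render_output_template` helper are hypothetical names introduced here; this is not the project's prompt builder.

```python
# Anchor names copied from the slot table above, in slot order A1..A15.
ANCHORS = [
    "Practical Actionability",
    "Practical Actionability",
    "Analytical Depth",
    "Pedagogical Clarity",
    "Reasoning Transparency",
    "Signal-to-Noise Ratio",
    "Repository Tree Navigation",
    "Document Structure & Formatting",
    "Practical Actionability",
    "Signal-to-Noise Ratio",
    "Training Utility",
    "Code Snippet Fidelity",
    "Format Compliance (SEARCH/REPLACE)",
    "Safety & Harmlessness",
    "Output Format Adherence",
]


def render_output_template(anchors):
    """Return the response skeleton: one line per slot, then the four trailing fields."""
    lines = [
        f"[A{i}] {name}: <score>/10 — <justification>"
        for i, name in enumerate(anchors, start=1)
    ]
    lines += [
        "overall: <0-100>",
        "training_recommendation: <keep | downsample | drop>",
        "domain_tag: <short tag>",
        "brief: <one-sentence summary>",
    ]
    return "\n".join(lines)


print(render_output_template(ANCHORS))
```

Slots are keyed by index rather than by name because several names recur (A1, A2, and A9 share "Practical Actionability").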
+---
+## Where this model fits in the MIRA pipeline
+```
+┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐
+│ 1. Rubric        │  │ 2. Anchored      │  │ 3. Reliability   │  │ 4. Data          │
+│    Discovery     │→ │    Judge         │→ │    Aggregation   │→ │    Selection     │
+│ (Kimi-K2.6,      │  │    Distillation  │  │ (mask unreliable │  │ (per-source      │
+│  free-form       │  │ ◀── THIS MODEL   │  │  src×dim cells)  │  │  retention)      │
+│  judging)        │  │                  │  │                  │  │                  │
+└──────────────────┘  └──────────────────┘  └──────────────────┘  └──────────────────┘
+```
+`MIRA-Text-Group3` lives in Stage 2: it scores the full **Text / Code-task documentation** corpus so that downstream stages can apply reliability masking and source-aware retention.
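To make the hand-off to Stages 3 and 4 concrete, here is a minimal sketch of how masked aggregation and per-source retention could be wired together. The aggregation rule (a plain mean over unmasked slots), the record layout, the masked cell, and the threshold values are all assumptions made for illustration; the paper's actual rule and thresholds are not given in this card.

```python
from statistics import mean


def aggregate(scores, source, unreliable_cells):
    """Mean of the slot scores whose (source, slot) cell is not masked.

    Assumption: a plain mean; the real aggregation may weight slots differently.
    """
    kept = [s for slot, s in scores.items() if (source, slot) not in unreliable_cells]
    return mean(kept) if kept else None


def select(records, unreliable_cells, thresholds):
    """Return ids of records whose masked aggregate reaches their source's threshold."""
    selected = []
    for rec in records:
        agg = aggregate(rec["scores"], rec["source"], unreliable_cells)
        if agg is not None and agg >= thresholds[rec["source"]]:
            selected.append(rec["id"])
    return selected


# Hypothetical inputs: three scored records, one masked cell, two thresholds.
records = [
    {"id": "r1", "source": "pr_issue", "scores": {"A1": 8, "A3": 6, "A7": 4}},
    {"id": "r2", "source": "ct_fixbug", "scores": {"A1": 8, "A3": 7, "A7": 2}},
    {"id": "r3", "source": "ct_fixbug", "scores": {"A1": 6, "A3": 5, "A7": 9}},
]
unreliable_cells = {("ct_fixbug", "A7")}          # src x dim cell to ignore
thresholds = {"pr_issue": 6.0, "ct_fixbug": 7.0}  # per-source retention cut-offs

print(select(records, unreliable_cells, thresholds))
```

In this toy input, masking `A7` for `ct_fixbug` changes the outcome for `r2`: its unmasked mean would fall below the threshold, while the masked mean of 7.5 clears it.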
+---
+## Intended use
+- **Primary**: Score code-task documentation (PR / issue / wiki) on this group's anchor dimensions to drive source-aware data selection and filtering.
+- **Secondary**: Research on rubric distillation, semantic quality scoring, and reliability diagnostics for heterogeneous training corpora.
+**Not intended for**:
+- General-purpose chat or instruction following — fine-tuned to emit structured scores, not freeform dialogue.
+- Single-shot quality judgments without the anchor-dimension prompt template — outputs will be miscalibrated.
+- Records outside the **Text / Code-task documentation** group; use the matching sibling scorer instead.
+---
+## Deployment
+The scorer is designed to be served via **vLLM** behind an OpenAI-compatible endpoint and called in batch from the MIRA scoring pipeline.
+### 1. Serve with vLLM (recommended)
+```bash
+vllm serve whw06/MIRA-Text-Group3 \
+    --tensor-parallel-size 8 \
+    --dtype bfloat16 \
+    --max-model-len 65536 \
+    --max-num-batched-tokens 131072 \
+    --gpu-memory-utilization 0.9 \
+    --trust-remote-code \
+    --port 8000
+```
+**Why these values** (verified on H200 141GB during the paper's per-source evaluation):
+- `max-model-len=65536` — 2× the mid-training cutoff. Records can hit ~60K tokens for densely-tokenized sources; 40K runs into prompt-overflow errors.
+- `max-num-batched-tokens=131072` — supports two full-length sequences per scheduling step.
+- `gpu-memory-utilization=0.9` — 35B BF16 weights take ~70GB, leaving ~57GB KV cache. Roughly 4 concurrent 65K-context sequences per GPU.
+- 8-way tensor parallel works well for the 35B MoE on a single 8×H200/A100 node.
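The figures in the bullets above can be reproduced with a few lines of arithmetic. This sketch only restates the numbers quoted there; the variable names are introduced here for illustration. Note that the ~57GB figure follows when the full 70GB of weights is counted against a single GPU's budget. Under 8-way tensor parallelism each GPU would normally hold only a shard of the weights, so this reads as a conservative estimate.

```python
GPU_MEMORY_GB = 141           # H200 capacity quoted above
UTILIZATION = 0.9             # --gpu-memory-utilization
PARAMS = 35e9                 # total parameters
BYTES_PER_PARAM = 2           # BF16
MAX_MODEL_LEN = 65536         # --max-model-len
MAX_BATCHED_TOKENS = 131072   # --max-num-batched-tokens

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9          # 70 GB of BF16 weights
budget_gb = GPU_MEMORY_GB * UTILIZATION              # memory vLLM may claim per GPU
kv_cache_gb = budget_gb - weights_gb                 # remainder for KV cache
full_length_seqs_per_step = MAX_BATCHED_TOKENS // MAX_MODEL_LEN
implied_cutoff = MAX_MODEL_LEN // 2                  # "2x the mid-training cutoff"

print(f"weights: {weights_gb:.0f} GB")
print(f"budget: {budget_gb:.1f} GB, KV cache: {kv_cache_gb:.1f} GB")
print(f"full-length sequences per scheduling step: {full_length_seqs_per_step}")
print(f"implied mid-training cutoff: {implied_cutoff} tokens")
```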
+### 2. Call from Python
+```python
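+from openai import OpenAI
+# SYSTEM_PROMPT and USER_PROMPT are placeholders; build the real prompts with the
+# prompt builder in the project repo's scoring/score_text_anchored.py.
+SYSTEM_PROMPT = "<system prompt with group-D anchor calibration references>"
+USER_PROMPT = "<record text followed by the [A1]..[A15] template>"
+# Point the OpenAI client at the local vLLM server started above.
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")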
+resp = client.chat.completions.create(
+    model="whw06/MIRA-Text-Group3",
+    messages=[
+        {"role": "system", "content": SYSTEM_PROMPT},   # group-D anchor calibration
+        {"role": "user",   "content": USER_PROMPT},     # record + [A1]..[A15] template
+    ],
+    temperature=0.7,
+    top_p=0.95,
+    max_tokens=2048,
+)
+print(resp.choices[0].message.content)
+```
+### 3. Prompt template
+The user message asks for one structured line per anchor dimension (top-15 of this group):
+```
+[A1] {anchor_dim_1}: <score>/10 — <justification>
+[A2] {anchor_dim_2}: <score>/10 — <justification>
+...
+[A15] {anchor_dim_15}: <score>/10 — <justification>
+overall: <0-100>
+training_recommendation: <keep | downsample | drop>
+domain_tag: <short tag>
+brief: <one-sentence summary>
+```
+The system prompt embeds the **top-12 anchor calibration references** (canonical examples from clustering) so the student matches the teacher's scoring scale. The full prompt builder, anchor JSONL files, and output parser are in the project repo's `scoring/score_text_anchored.py`.
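For readers who want to consume responses without the project's parser, the following is a minimal sketch of parsing the line format shown in the template above. It is not the parser from `scoring/score_text_anchored.py`; the regular expressions, the `parse_scorer_output` helper, and the sample response are assumptions written for this example.

```python
import re

# One "[Ai] <dimension>: <score>/10 <dash> <justification>" line per slot.
SLOT_RE = re.compile(
    r"^\[A(\d+)\]\s*(.+?):\s*(\d+(?:\.\d+)?)\s*/\s*10\s*[—–-]\s*(.*)$"
)
# The four trailing "key: value" fields.
FIELD_RE = re.compile(r"^(overall|training_recommendation|domain_tag|brief):\s*(.*)$")


def parse_scorer_output(text):
    """Split a scorer response into per-slot scores and trailing fields."""
    slots, fields = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        m = SLOT_RE.match(line)
        if m:
            idx, name, score, why = m.groups()
            slots[f"A{idx}"] = {
                "dimension": name,
                "score": float(score),
                "justification": why,
            }
            continue
        m = FIELD_RE.match(line)
        if m:
            fields[m.group(1)] = m.group(2).strip()
    return slots, fields


# Made-up response fragment, used only to exercise the parser.
SAMPLE = """[A1] Practical Actionability: 7/10 — Steps are concrete and reproducible.
[A13] Format Compliance (SEARCH/REPLACE): 3/10 — Blocks are malformed.
overall: 62
training_recommendation: downsample
domain_tag: bugfix
brief: Usable fix walkthrough with formatting problems."""

slots, fields = parse_scorer_output(SAMPLE)
print(slots["A1"]["score"], fields["training_recommendation"])
```

Lines that match neither pattern are skipped silently; a production parser would more likely flag responses that are missing slots.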
+---
+## Training details
+| | |
+|---|---|
+| **Teacher** | Kimi-K2.6 (free-form rubric discovery in Phase 1; anchored re-scoring in Phase 2) |
+| **Training data** | Kimi-K2.6 anchored labels on this group's Phase-2 corpus, split into a distillation set + a held-out validation split for reliability diagnostics |
+| **Loss** | Standard next-token CE over (score, rationale) labels for every anchor dimension |
+| **Hyperparameters** | Held constant across all MIRA student scorers; full settings in paper Appendix A.4 |
+| **Validation** | Per-dimension teacher–student MAE and Spearman ρ on a held-out split; dimensions failing reliability thresholds are masked **post-hoc** (Figure 3 in the paper) |
+Training loss / step curve is preserved in `trainer_state.json` for full reproducibility.
+---
+## Headline results (from the paper)
+End-to-end downstream evaluation: Qwen2.5-Coder-14B mid-trained on **25B-token MIRA-selected subsets** vs. baselines, then SFT, evaluated on 9 code benchmarks across 4 categories.
+| Method                | Code Gen | MultiPL-E | SQL (EX) | SWE-Multi | **Macro Avg** |
+|-----------------------|---------:|---------:|---------:|----------:|--------------:|
+| Base + SFT (no mid)   |    53.91 |    72.57 |    64.24 |      3.67 |         48.60 |
+| Raw Mixture (50B)     |    53.71 |    67.42 |    94.18 |     40.00 |         63.83 |
+| Random (25B)          |    52.71 |    71.44 |    91.03 |     35.00 |         63.23 |
+| DataMan (25B)         |    53.82 |    71.38 |    93.84 |     33.00 |         63.01 |
+| DSIR (25B)            |    48.74 |    67.26 |    95.20 |     27.00 |         59.55 |
+| PPL (25B)             |    50.52 |    57.74 |    90.66 |     20.00 |         54.73 |
+| MIRA-Global (25B)     |    53.12 |    67.84 |    94.26 |     32.00 |         61.81 |
+| **MIRA-Group (25B)**  | **54.53**|    71.85 |    94.08 |     36.33 |     **64.20** |
+| MIRA-Source (25B)     |    54.18 | **72.84**|    94.38 |     30.33 |         62.93 |
+**MIRA-Group matches the full 50B-token raw mixture on the macro average (64.20 vs. 63.83) while using only half the tokens**, and outperforms all 25B-token selection baselines on that metric. This scorer is one of the 12 student models used by the MIRA-Group variant.
+---
+## Sibling models
+MIRA releases one student scorer per source-group variant. Use the matching scorer for each record's format:
+- **Agent**: [whw06/MIRA-Agent-Group1](https://huggingface.co/whw06/MIRA-Agent-Group1) · [-Group2](https://huggingface.co/whw06/MIRA-Agent-Group2) · [-Group3](https://huggingface.co/whw06/MIRA-Agent-Group3) · [-Group4](https://huggingface.co/whw06/MIRA-Agent-Group4)
+- **QA**: [whw06/MIRA-QA-Group1](https://huggingface.co/whw06/MIRA-QA-Group1) · [-Group2](https://huggingface.co/whw06/MIRA-QA-Group2) · [-Group3](https://huggingface.co/whw06/MIRA-QA-Group3) · [-Group4](https://huggingface.co/whw06/MIRA-QA-Group4) · [-Group5](https://huggingface.co/whw06/MIRA-QA-Group5)
+- **Text**: [whw06/MIRA-Text-Group1](https://huggingface.co/whw06/MIRA-Text-Group1) · [-Group2](https://huggingface.co/whw06/MIRA-Text-Group2) · **MIRA-Text-Group3 (this model)**
+---
+## Limitations
+- MIRA addresses **source-aware filtering** only. Source discovery, mixture-ratio design, curriculum scheduling, deduplication and contamination control remain orthogonal concerns.
+- This scorer is calibrated against the **Text / Code-task documentation** group; cross-domain transfer is not advised — use the matching sibling for other source formats.
+- Some anchor dimensions exhibit high teacher–student MAE and are **masked post-hoc** during aggregation (see paper §3.4). The model still emits scores for masked dimensions; downstream consumers should re-apply the reliability mask from the project repository.
+- Calibrated on 6 sources within this group; behavior on out-of-distribution formats is unverified.
+---
+## Citation
+```bibtex
+@inproceedings{wang2026mira,
+  title     = {MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection},
+  author    = {Wang, Haowen and Du, Yaxin and Yang, Jian and Wu, Jiajun and
+               Liu, Shukai and Zhang, Yuxuan and Wang, Pingjie and Chen, Siheng and
+               Zheng, Tuney and Zhou, Ming and Liu, Xianglong},
+  booktitle = {Proceedings of the 2026 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
+  year      = {2026}
+}
+```
+---
+## Acknowledgments
+Built on [Qwen3.5-35B-A3B-Base](https://huggingface.co/Qwen) and the [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) training stack. Teacher labels generated with [Kimi-K2.6](https://moonshot.ai).