Improve model card: Add metadata, paper link, code link, project blog link and usage instructions
This PR enriches the model card for `JacobiForcing_Coder_7B_v1` by adding:
- A link to the paper: [Fast and Accurate Causal Parallel Decoding using Jacobi Forcing](https://huggingface.co/papers/2512.14681).
- The `license: apache-2.0` metadata.
- `library_name: transformers`, since the model is built on Transformers and uses its tokenizer and generation utilities.
- `pipeline_tag: text-generation`, so the model's functionality is classified correctly.
- A link to the GitHub repository and project blog.
- A "Usage" section with code examples directly from the GitHub README to demonstrate how to use the model for inference.
Together, these changes improve the discoverability and usability of the model.

README.md (changed):

---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# Jacobi Forcing: Fast and Accurate Causal Parallel Decoding

This repository contains the `JacobiForcing_Coder_7B_v1` model, presented in the paper [Fast and Accurate Causal Parallel Decoding using Jacobi Forcing](https://huggingface.co/papers/2512.14681).

Base Model: [Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)

Training Data (Jacobi trajectories): https://huggingface.co/datasets/JacobiForcing/OpenCodeInstruct_training_data_n32
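
If you want to inspect the training data, it should load with the standard `datasets` API. A minimal sketch; the split name is an assumption, so check the dataset card for the actual configuration:

```python
from datasets import load_dataset

# "train" is a hypothetical split name; see the dataset card for the actual one.
ds = load_dataset("JacobiForcing/OpenCodeInstruct_training_data_n32", split="train")
print(ds[0])  # one Jacobi-trajectory training record
```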

Jacobi Forcing is a novel training technique that converts large language models (LLMs) into native causal parallel decoders. It maintains the causal autoregressive backbone and addresses the AR-to-diffusion mismatch by training the model to handle noisy future blocks along its own Jacobi decoding trajectories.
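
For intuition about those trajectories, here is a minimal sketch of plain Jacobi decoding, the fixed-point iteration the trajectories come from. It assumes a generic Hugging Face-style causal LM whose forward pass returns `.logits`; it is illustrative only, not the model's actual accelerated inference path (for that, see `eagenerate` under Usage):

```python
import torch

@torch.no_grad()
def jacobi_decode_block(model, input_ids, block_size=16, max_iters=32):
    """Decode one block of future tokens by Jacobi fixed-point iteration."""
    prefix_len = input_ids.shape[1]
    # Initialize the future block with arbitrary draft tokens; any init converges.
    block = torch.zeros(
        (input_ids.shape[0], block_size), dtype=torch.long, device=input_ids.device
    )
    for _ in range(max_iters):
        seq = torch.cat([input_ids, block], dim=1)
        logits = model(seq).logits  # one forward pass scores every position
        # Greedy next-token prediction for all block positions, in parallel.
        new_block = logits[:, prefix_len - 1 : prefix_len + block_size - 1].argmax(-1)
        if torch.equal(new_block, block):
            break  # fixed point reached: the block matches greedy AR decoding
        block = new_block
    return block
```

Each loop iteration produces one point on a Jacobi trajectory; the intermediate, still-noisy blocks are exactly what Jacobi Forcing trains the model to handle.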

It achieves up to $4.5\times$ more tokens per forward pass and up to a $4\times$ wall-clock speedup on coding and math tasks, while retaining near-AR generation quality.

You can find more details on the project blog: [Jacobi Forcing Blog](https://hao-ai-lab.github.io/blogs/jacobi-forcing/)

The official code repository is available here: [GitHub Repository](https://github.com/hao-ai-lab/JacobiForcing)

## Usage

You can try the chatbot demo locally or use the provided Python inference code.

### Local Chatbot Demo

```bash
# modify the script to use your local path
streamlit run applications/jacobi_model_chat.py
```

### Inference with Code

You can use the provided `eagenerate` function for accelerated generation, similar to `generate` from Hugging Face Transformers. Here is an example:

```python
from eagle.model.ea_model import EaModel
from fastchat.model import get_conversation_template
import torch

# Example paths; replace with your own if using local weights.
base_model_path = "Qwen/Qwen2.5-Coder-7B-Instruct"
EAGLE_model_path = "JacobiForcing/JacobiForcing_Coder_7B_v1"  # or your local path to the weights

model = EaModel.from_pretrained(
    base_model_path=base_model_path,
    ea_model_path=EAGLE_model_path,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    device_map="auto",
    total_token=-1,  # automatically configure the number of draft tokens
)
model.eval()

your_message = "Hello"
conv = get_conversation_template("vicuna")  # use the conversation template matching your base model
conv.append_message(conv.roles[0], your_message)
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = model.tokenizer([prompt]).input_ids
input_ids = torch.as_tensor(input_ids).cuda()

output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
output = model.tokenizer.decode(output_ids[0])
print(output)
```
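
To sanity-check throughput on your own hardware, you can time the call above. This is a minimal sketch, reusing `model` and `input_ids` from the example and assuming `eagenerate` returns the prompt plus the completion (as the decode call above suggests); for a wall-clock speedup figure, compare against the base model's plain `generate` under identical sampling settings:

```python
import time

torch.cuda.synchronize()
start = time.perf_counter()
output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# Count only newly generated tokens (output is assumed to include the prompt).
new_tokens = output_ids.shape[1] - input_ids.shape[1]
print(f"{new_tokens} tokens in {elapsed:.2f}s ({new_tokens / elapsed:.1f} tok/s)")
```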