---
library_name: transformers
license: other
base_model: Qwen/Qwen3.5-9B
language:
- en
tags:
- text-generation
- creative-writing
- long-form-generation
- reinforcement-learning
- grpo
- story-generation
- transformers
pipeline_tag: text-generation
datasets:
- rishanthrajendhran/POLARIS
---
# POLARIS-no-HRI-9B
POLARIS-no-HRI-9B is the matched ablation variant of [POLARIS-9B](https://huggingface.co/rishanthrajendhran/POLARIS-9B).
It uses the same GRPO training recipe with the same structured Story Quality reward, identical
hyperparameters, and the same training data, but without human-reference injection (HRI).
Instead of 5 policy rollouts + 1 injected human-written story per group, it was trained with 6 policy
rollouts with no reference anchor.
It is a strong creative-writing model in its own right, substantially better than the base
Qwen3.5-9B, but it lags POLARIS-9B most noticeably at far-transfer lengths (8–12k words).
## Comparison with POLARIS-9B
The gap between this model and POLARIS-9B is small at in-distribution lengths and grows at
longer requested lengths, consistent with HRI's role in maintaining gradient pressure toward
stronger writing as generation extends beyond the training range.
**Story Quality by requested length** (GPT-5.4 judge, 180 held-out prompts)
| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate | Slope |
|-------|-----------|-----------------|-----------------|-----------|-------|
| POLARIS-9B | 57.4 | 48.2 | 44.1 | 52.1 | -3.0 |
| **POLARIS-no-HRI-9B** | **56.5** | **47.0** | **37.7** | **49.7** | **-3.8** |
| Qwen3.5-9B (base) | 35.1 | 8.7 | -11.8 | 18.5 | -10.8 |
| Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 42.8 | -5.9 |
Slope is the linear fit across the six length buckets (points per step). A steeper negative
slope indicates faster quality degradation as requested length increases.
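As an illustration of how a slope of this kind can be obtained, here is a minimal sketch of an ordinary least-squares fit of score against bucket index. The per-bucket scores are hypothetical placeholders (the table reports only three aggregated ranges), and `fit_slope` is an illustrative helper, not the evaluation code.

```python
def fit_slope(scores):
    """Least-squares slope of scores against bucket index 0..n-1 (points per step)."""
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    # Covariance of (index, score) divided by variance of the index.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical scores for six length buckets, for illustration only.
hypothetical_scores = [50.0, 48.0, 46.0, 44.0, 42.0, 40.0]
slope = fit_slope(hypothetical_scores)
print(f"slope = {slope:.1f} points per step")
```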
**EQ-Bench Longform by requested length** (GPT-5.4 judge, uniform aggregation)
| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate |
|-------|-----------|-----------------|-----------------|-----------|
| POLARIS-9B | 63.1 | 57.5 | 54.3 | 59.8 |
| **POLARIS-no-HRI-9B** | **62.1** | **55.7** | **51.6** | **58.2** |
| Qwen3.5-9B (base) | 50.2 | 37.2 | 30.3 | 42.6 |
**Length adherence** (generated / requested word count)
| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | All |
|-------|-----------|-----------------|-----------------|-----|
| POLARIS-9B | 0.99 | 0.87 | 0.72 | 0.90 |
| **POLARIS-no-HRI-9B** | **0.94** | **0.86** | **0.70** | **0.87** |
| Qwen3.5-9B (base) | 1.09 | 0.96 | 0.88 | 1.01 |
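The length-adherence ratio can be sketched as follows. This assumes words are counted by splitting on whitespace; the word counter used in the actual evaluation is not specified here, and both function names are illustrative.

```python
def word_count(text: str) -> int:
    """Count words by whitespace splitting (an assumption, see above)."""
    return len(text.split())


def length_ratio(generated_text: str, requested_words: int) -> float:
    """Generated word count divided by requested word count."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    return word_count(generated_text) / requested_words


# A synthetic 1,880-word "story" against a 2,000-word request.
story = "word " * 1880
ratio = length_ratio(story, 2000)
print(f"length ratio = {ratio:.2f}")
```

A ratio below 1.0 means the model undershot the requested length; a ratio above 1.0 means it overshot.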
**OOD benchmarks**
| Model | WritingBench (D4) | LongBench-Write | EQ-Bench Creative |
|-------|------------------|-----------------|-------------------|
| POLARIS-9B | 7.9 | 81.2 | 70.3 |
| **POLARIS-no-HRI-9B** | **7.8** | **82.1** | **69.7** |
| Qwen3.5-9B (base) | 6.8 | 67.1 | 59.2 |
On OOD benchmarks the two variants are essentially tied. The HRI advantage is concentrated at
long requested lengths on the story-generation evaluations above, where narrative coherence and
arc completion must be sustained over many thousands of tokens.
## Intended Use
- Long-form story generation (short stories, flash fiction, narrative scenes)
- Creative writing (essays, book reviews, podcast scripts, etc.)
## Out-of-Scope Use
- Factual or knowledge-intensive writing where correctness matters
- Legal, medical, or financial content
- Reproducing or recovering the withheld training stories
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rishanthrajendhran/POLARIS-no-HRI-9B"

# Load the tokenizer and model; dtype and device placement are chosen automatically.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# State the target length explicitly in the prompt.
prompt = (
    "Write a 2000-word story about an archivist who discovers that missing "
    "library books are returning with handwritten notes from the future."
)
messages = [{"role": "user", "content": prompt}]

# Render the chat template as text, with thinking enabled.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample with settings from the recommended ranges.
outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.10,
)

# Decode only the newly generated tokens, skipping the prompt.
generated = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(generated)
```
## Recommended Generation Settings
Identical to POLARIS-9B.
| Setting | Value | Notes |
|---------|-------|-------|
| `temperature` | 0.4-1.0 | Lower temperatures (0.4-0.6) recommended for long-form story writing |
| `top_p` | 0.95 | |
| `top_k` | 20 | |
| `repetition_penalty` | 1.0-1.10 | |
| `presence_penalty` | 0.0-1.5 | Do not set `repetition_penalty` and `presence_penalty` together |
| `max_new_tokens` | 14336 | Minimum recommended for 8–12k target lengths |
| `enable_thinking` | True | |
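One way to keep these settings consistent is a small helper that refuses to combine the two penalties. This is a sketch only: `build_sampling_params` is a hypothetical function, and the returned dictionary uses the setting names from the table. Which of those names a given inference backend accepts varies, so the mapping to a specific API is an assumption left to the caller.

```python
def build_sampling_params(
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=None,
    presence_penalty=None,
    max_new_tokens=14336,
):
    """Collect sampling settings, rejecting both penalties being active at once."""
    # A penalty counts as active only if it differs from its neutral value.
    rep_active = repetition_penalty is not None and repetition_penalty != 1.0
    pres_active = presence_penalty is not None and presence_penalty != 0.0
    if rep_active and pres_active:
        raise ValueError("set repetition_penalty or presence_penalty, not both")

    params = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }
    if rep_active:
        params["repetition_penalty"] = repetition_penalty
    if pres_active:
        params["presence_penalty"] = presence_penalty
    return params


params = build_sampling_params(temperature=0.6, repetition_penalty=1.10)
print(params)
```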
## Prompting
Include an explicit length request in the prompt:
```
Write a 3000-word story about [premise].
```
At far-transfer lengths (8–12k), this model undershoots more than POLARIS-9B (length ratio
≈ 0.70 vs 0.72). For generation targets above 6k words, POLARIS-9B is the recommended variant.
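The prompting advice above can be sketched as a pair of small helpers. Both function names are hypothetical, and returning this model for targets of 6k words or fewer is a simplification made for the sketch: the guidance only states that POLARIS-9B is preferred above 6k words.

```python
POLARIS_9B = "rishanthrajendhran/POLARIS-9B"
POLARIS_NO_HRI_9B = "rishanthrajendhran/POLARIS-no-HRI-9B"


def build_prompt(premise: str, target_words: int) -> str:
    """Build a prompt with an explicit length request."""
    return f"Write a {target_words}-word story about {premise}."


def recommended_variant(target_words: int) -> str:
    """Pick POLARIS-9B for targets above 6,000 words, otherwise this model."""
    return POLARIS_9B if target_words > 6000 else POLARIS_NO_HRI_9B


prompt = build_prompt("a lighthouse keeper who receives letters from the sea", 3000)
print(prompt)
print(recommended_variant(3000))
print(recommended_variant(10000))
```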
## Known Limitations
The same qualitative failure modes present in POLARIS-9B apply here (stylistic overloading
and local coherence failures), since both models share the same base, training data, and reward.
The key additional limitation of this variant relative to POLARIS-9B:
**Steeper quality degradation at long lengths.** Story Quality slope is -3.8 vs -3.0 for
POLARIS-9B. At 8–12k words, the gap to POLARIS-9B is 6.4 Story Quality points, compared to
~1–2 points at in-distribution lengths. If your use case involves prompts requesting long
stories, POLARIS-9B is the better choice.
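The per-range gaps can be recomputed from the Story Quality table. The snippet below copies the two relevant rows from that table and subtracts them; it introduces no new measurements.

```python
# Story Quality scores copied from the comparison table in this card.
story_quality = {
    "POLARIS-9B": {"ID": 57.4, "Near OOD": 48.2, "Far OOD": 44.1},
    "POLARIS-no-HRI-9B": {"ID": 56.5, "Near OOD": 47.0, "Far OOD": 37.7},
}

# Gap = POLARIS-9B score minus POLARIS-no-HRI-9B score, rounded to one decimal.
gaps = {
    bucket: round(
        story_quality["POLARIS-9B"][bucket] - story_quality["POLARIS-no-HRI-9B"][bucket], 1
    )
    for bucket in ("ID", "Near OOD", "Far OOD")
}

for bucket, gap in gaps.items():
    print(f"{bucket}: {gap} points")
```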
## Training
Identical to POLARIS-9B except for the group composition.
| Parameter | Value |
|-----------|-------|
| Base model | Qwen3.5-9B |
| Training algorithm | GRPO |
| Training data | ~1,388 prompt–story pairs from 100 short-story anthologies |
| Max reference length | 4,000 words |
| GPUs | 4× A100 80GB |
| Training time | ~48 hours |
| Compute cost | ~$400 |
| Judge cost | ~$60 (Gemini 3 Flash, flex tier) |
| Training steps | 160 |
| Batch size | 8 GRPO groups |
| Group size | 6 policy rollouts (no human reference) |
| HRI | **Disabled** |
| Online reward judge | Gemini 3 Flash |
| Evaluation judge | GPT-5.4 |
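To make the group composition concrete, here is a minimal sketch of the group-relative advantage computation commonly used in GRPO: each reward is normalized by the mean and standard deviation of its group. This is the generic formulation, not necessarily the exact variant used in training. The reward values are hypothetical and `group_advantages` is an illustrative name.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize each reward by its group's mean and (population) std deviation."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps guards against division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one group of 6 policy rollouts (no human reference).
rollout_rewards = [42.0, 55.0, 38.0, 61.0, 47.0, 50.0]
advantages = group_advantages(rollout_rewards)

for reward, adv in zip(rollout_rewards, advantages):
    print(f"reward={reward:5.1f}  advantage={adv:+.2f}")
```

With HRI enabled, one of the six group members would be a human-written story rather than a policy rollout. How that reference's reward enters the update is not described in this card, so it is left out of the sketch.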
## Citation
```bibtex
@misc{rajendhran2026polarisguidingsmallmodels,
title={POLARIS: Guiding Small Models to Write Long Stories},
author={Rishanth Rajendhran and Jenna Russell and Mohit Iyyer and John Frederick Wieting},
year={2026},
eprint={2606.04095},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2606.04095},
}
```