---
library_name: transformers
license: other
base_model: Qwen/Qwen3.5-9B
language:
- en
tags:
- text-generation
- creative-writing
- long-form-generation
- reinforcement-learning
- grpo
- story-generation
- transformers
pipeline_tag: text-generation
datasets:
- rishanthrajendhran/POLARIS
---

# POLARIS-no-HRI-9B

POLARIS-no-HRI-9B is the matched ablation variant of [POLARIS-9B](https://huggingface.co/rishanthrajendhran/POLARIS-9B).
It uses the same GRPO training recipe with the same structured Story Quality reward, identical
hyperparameters, and the same training data — but without human-reference injection (HRI).
Instead of 5 policy rollouts plus 1 injected human-written story per group, each group contained
6 policy rollouts and no reference anchor.
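
To picture what the group composition changes, here is a minimal sketch assuming the standard GRPO group-normalized advantage, (reward − group mean) / group std. The reward values are hypothetical, and how POLARIS treats the injected story in the loss is not described in this card.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Group-normalized advantages: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards, for illustration only.
policy_rewards = [40.0, 45.0, 50.0, 55.0, 60.0]
sixth_rollout_reward = 50.0   # an ordinary sixth policy rollout
human_story_reward = 80.0     # a strong human-written reference

# This model's setup: six policy rollouts, no reference in the group.
no_hri = group_advantages(policy_rewards + [sixth_rollout_reward])

# POLARIS-9B's setup: five policy rollouts plus one human-written story.
with_hri = group_advantages(policy_rewards + [human_story_reward])

# The same rollout (reward 55) is judged against a higher bar with HRI.
print(f"no HRI:   {no_hri[3]:+.2f}")
print(f"with HRI: {with_hri[3]:+.2f}")
```

In this toy group, the human story raises the group mean, so a policy rollout that looked above average without the anchor is only average with it.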

It is a strong creative-writing model in its own right — substantially better than the base
Qwen3.5-9B — but lags POLARIS-9B most noticeably at far-transfer lengths (8–12k words).

## Comparison with POLARIS-9B

The gap between this model and POLARIS-9B is small at in-distribution lengths and grows at
longer requested lengths, consistent with HRI's role in maintaining gradient pressure toward
stronger writing as generation extends beyond the training range.

**Story Quality by requested length** (GPT-5.4 judge, 180 held-out prompts)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate | Slope |
|-------|-----------|-----------------|-----------------|-----------|-------|
| POLARIS-9B | 57.4 | 48.2 | 44.1 | 52.1 | −3.0 |
| **POLARIS-no-HRI-9B** | **56.5** | **47.0** | **37.7** | **49.7** | **−3.8** |
| Qwen3.5-9B (base) | 35.1 | 8.7 | −11.8 | 18.5 | −10.8 |
| Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 42.8 | −5.9 |

Slope is the gradient of a linear fit across the six length buckets (points per step). A steeper
negative slope indicates faster quality degradation as requested length increases.
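
A slope of this kind can be computed as below. This is a sketch that assumes an ordinary least-squares fit with the bucket index (0 to 5) as the x-axis, so one step is one bucket. The scores are hypothetical placeholders, not the per-bucket results behind the table.

```python
def bucket_slope(scores):
    """Least-squares slope of score against bucket index (points per bucket)."""
    n = len(scores)
    x_mean = (n - 1) / 2
    y_mean = sum(scores) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(scores))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


# Hypothetical scores for six length buckets, shortest to longest.
hypothetical_scores = [50.0, 48.0, 46.0, 44.0, 42.0, 40.0]
slope = bucket_slope(hypothetical_scores)
print(slope)
```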

**EQ-Bench Longform by requested length** (GPT-5.4 judge, uniform aggregation)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate |
|-------|-----------|-----------------|-----------------|-----------|
| POLARIS-9B | 63.1 | 57.5 | 54.3 | 59.8 |
| **POLARIS-no-HRI-9B** | **62.1** | **55.7** | **51.6** | **58.2** |
| Qwen3.5-9B (base) | 50.2 | 37.2 | 30.3 | 42.6 |

**Length adherence** (generated / requested word count)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | All |
|-------|-----------|-----------------|-----------------|-----|
| POLARIS-9B | 0.99 | 0.87 | 0.72 | 0.90 |
| **POLARIS-no-HRI-9B** | **0.94** | **0.86** | **0.70** | **0.87** |
| Qwen3.5-9B (base) | 1.09 | 0.96 | 0.88 | 1.01 |
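
The ratio can be reproduced on your own generations with a few lines. This sketch assumes words are counted by whitespace splitting, which may differ from the counting used for the table.

```python
def length_ratio(generated_text, requested_words):
    """Generated word count divided by requested word count."""
    return len(generated_text.split()) / requested_words


def mean_length_ratio(samples):
    """Average ratio over (generated_text, requested_words) pairs."""
    ratios = [length_ratio(text, words) for text, words in samples]
    return sum(ratios) / len(ratios)


# Toy example: a 4-word output for a 5-word request.
print(length_ratio("one two three four", 5))
```

A value below 1.0 means the output undershoots the requested length.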

**OOD benchmarks**

| Model | WritingBench (D4) | LongBench-Write | EQ-Bench Creative |
|-------|------------------|-----------------|-------------------|
| POLARIS-9B | 7.9 | 81.2 | 70.3 |
| **POLARIS-no-HRI-9B** | **7.8** | **82.1** | **69.7** |
| Qwen3.5-9B (base) | 6.8 | 67.1 | 59.2 |

On OOD benchmarks the two variants are essentially tied; the HRI advantage is concentrated at
long requested lengths on the in-domain story task, where narrative coherence and arc completion
must be sustained over many thousands of tokens.

## Intended Use

- Long-form story generation (short stories, flash fiction, narrative scenes)
- Creative writing (essays, book reviews, podcast scripts, etc.)

## Out-of-Scope Use

- Factual or knowledge-intensive writing where correctness matters
- Legal, medical, or financial content
- Reproducing or recovering the withheld training stories

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rishanthrajendhran/POLARIS-no-HRI-9B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

prompt = (
    "Write a 2000-word story about an archivist who discovers that missing "
    "library books are returning with handwritten notes from the future."
)

messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

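# max_new_tokens must cover the thinking trace as well as the story;
# raise it for longer targets (see Recommended Generation Settings).
outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.10,
)

# Decode only the newly generated tokens, skipping the prompt.
generated = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(generated)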
```
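
With `enable_thinking=True`, the decoded text may contain a reasoning trace ahead of the story. The helper below is a sketch that assumes the trace ends with a Qwen-style `</think>` tag and that the tag survives decoding; `split_thinking` is a hypothetical name, not part of any library.

```python
def split_thinking(generated, end_tag="</think>"):
    """Split decoded output into (reasoning, story) at the last end tag."""
    head, sep, tail = generated.rpartition(end_tag)
    if not sep:
        # No end tag found: treat the whole output as the story.
        return "", generated.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, story = split_thinking("<think>Outline the arc.</think>\n\nThe archivist...")
print(story)
```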

## Recommended Generation Settings

Identical to POLARIS-9B.

| Setting | Value | Notes |
|---------|-------|-------|
| `temperature` | 0.4–1.0 | Lower temperatures (0.4–0.6) recommended for long-form story writing |
| `top_p` | 0.95 | |
| `top_k` | 20 | |
| `repetition_penalty` | 1.0–1.10 | |
| `presence_penalty` | 0.0–1.5 | Do not set `repetition_penalty` and `presence_penalty` together |
| `max_new_tokens` | 14336 | Minimum recommended for 8–12k target lengths |
| `enable_thinking` | True | |
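
The table can be encoded as a small guard so the two penalties are never set together. `build_sampling_kwargs` is a hypothetical helper, and its key names mirror the table rather than any particular backend, so map them to whatever your inference stack expects.

```python
def build_sampling_kwargs(
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=None,
    presence_penalty=None,
    max_new_tokens=14336,
):
    """Return sampling settings, enforcing the ranges from the table above."""
    if repetition_penalty is not None and presence_penalty is not None:
        raise ValueError("Set repetition_penalty or presence_penalty, not both.")
    if not 0.4 <= temperature <= 1.0:
        raise ValueError("temperature is outside the recommended 0.4-1.0 range.")

    kwargs = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }
    if repetition_penalty is not None:
        if not 1.0 <= repetition_penalty <= 1.10:
            raise ValueError("repetition_penalty is outside the 1.0-1.10 range.")
        kwargs["repetition_penalty"] = repetition_penalty
    if presence_penalty is not None:
        if not 0.0 <= presence_penalty <= 1.5:
            raise ValueError("presence_penalty is outside the 0.0-1.5 range.")
        kwargs["presence_penalty"] = presence_penalty
    return kwargs


settings = build_sampling_kwargs(temperature=0.5, repetition_penalty=1.05)
print(settings)
```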

## Prompting

Include an explicit length request in the prompt:

```
Write a 3000-word story about [premise].
```

At far-transfer lengths (8–12k words), this model undershoots more than POLARIS-9B (length ratio
≈ 0.70 vs 0.72). For generation targets above 6k words, POLARIS-9B is the recommended variant.
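
The prompt format and the 6k-word guidance can be wrapped in two small helpers. Both function names are hypothetical; the repository IDs are the ones referenced in this card.

```python
NO_HRI_MODEL = "rishanthrajendhran/POLARIS-no-HRI-9B"
HRI_MODEL = "rishanthrajendhran/POLARIS-9B"


def build_story_prompt(premise, target_words):
    """Format a prompt with an explicit length request."""
    return f"Write a {target_words}-word story about {premise}."


def suggested_variant(target_words):
    """Pick the variant recommended above for the requested length."""
    return HRI_MODEL if target_words > 6000 else NO_HRI_MODEL


prompt = build_story_prompt("a lighthouse keeper who collects lost letters", 3000)
print(prompt)
print(suggested_variant(3000))
```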

## Known Limitations

The same qualitative failure modes present in POLARIS-9B apply here — stylistic overloading
and local coherence failures — since both models share the same base, training data, and reward.
The key additional limitation of this variant relative to POLARIS-9B:

**Steeper quality degradation at long lengths.** Story Quality slope is −3.8 vs −3.0 for
POLARIS-9B. At 8–12k words, the gap to POLARIS-9B is 6.4 Story Quality points, compared to
0.9 points at in-distribution lengths and 1.2 points at near-OOD lengths. If your use case
involves prompts requesting long stories, POLARIS-9B is the better choice.
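
The per-range gaps follow directly from the Story Quality table above. The dictionary keys here are shorthand labels chosen for this sketch.

```python
# Story Quality scores copied from the comparison table.
story_quality = {
    "POLARIS-9B": {"id": 57.4, "near_ood": 48.2, "far_ood": 44.1},
    "POLARIS-no-HRI-9B": {"id": 56.5, "near_ood": 47.0, "far_ood": 37.7},
}

hri = story_quality["POLARIS-9B"]
no_hri = story_quality["POLARIS-no-HRI-9B"]

# Gap in Story Quality points (POLARIS-9B minus this model) per length range.
gaps = {bucket: round(hri[bucket] - no_hri[bucket], 1) for bucket in hri}
print(gaps)
```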

## Training

Identical to POLARIS-9B except for the group composition.

| Parameter | Value |
|-----------|-------|
| Base model | Qwen3.5-9B |
| Training algorithm | GRPO |
| Training data | ~1,388 prompt–story pairs from 100 short-story anthologies |
| Max reference length | 4,000 words |
| GPUs | 4× A100 80GB |
| Training time | ~48 hours |
| Compute cost | ~$400 |
| Judge cost | ~$60 (Gemini 3 Flash, flex tier) |
| Training steps | 160 |
| Batch size | 8 GRPO groups |
| Group size | 6 policy rollouts (no human reference) |
| HRI | **Disabled** |
| Online reward judge | Gemini 3 Flash |
| Evaluation judge | GPT-5.4 |

## Citation

```bibtex
@misc{rajendhran2026polarisguidingsmallmodels,
      title={POLARIS: Guiding Small Models to Write Long Stories}, 
      author={Rishanth Rajendhran and Jenna Russell and Mohit Iyyer and John Frederick Wieting},
      year={2026},
      eprint={2606.04095},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.04095}, 
}
```