Instructions for using rishanthrajendhran/POLARIS-9B with libraries and local apps.

Libraries

How to use rishanthrajendhran/POLARIS-9B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="rishanthrajendhran/POLARIS-9B")
# The text-generation pipeline takes text-only chat messages
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
pipe(messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("rishanthrajendhran/POLARIS-9B")
model = AutoModelForMultimodalLM.from_pretrained("rishanthrajendhran/POLARIS-9B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))


vLLM

How to use rishanthrajendhran/POLARIS-9B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rishanthrajendhran/POLARIS-9B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
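
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. `build_chat_request` and `chat` are illustrative helper names, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "rishanthrajendhran/POLARIS-9B"


def build_chat_request(prompt, base_url="http://localhost:8000"):
    """Build the same POST request as the curl command above."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, base_url="http://localhost:8000"):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, base_url)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# With a server running:
#   print(chat("What is the capital of France?"))
```

Passing `base_url="http://localhost:30000"` should target an SGLang server on port 30000 in the same way, since both expose an OpenAI-compatible endpoint.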

Use Docker

docker model run hf.co/rishanthrajendhran/POLARIS-9B

SGLang

How to use rishanthrajendhran/POLARIS-9B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "rishanthrajendhran/POLARIS-9B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "rishanthrajendhran/POLARIS-9B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rishanthrajendhran/POLARIS-9B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use rishanthrajendhran/POLARIS-9B with Docker Model Runner:
```
docker model run hf.co/rishanthrajendhran/POLARIS-9B
```


	---
	library_name: transformers
	license: other
	base_model: Qwen/Qwen3.5-9B
	language:
	- en
	tags:
	- text-generation
	- creative-writing
	- long-form-generation
	- reinforcement-learning
	- grpo
	- story-generation
	- transformers
	pipeline_tag: text-generation
	datasets:
	- rishanthrajendhran/POLARIS
	extra_gated_prompt: "You agree not to share the model with others and not to use it for malicious purposes (e.g., attempting to extract the copyrighted human-written stories it was trained on)."
	extra_gated_fields:
	  Company: text
	  Country: country
	  Specific date: date_picker
	  I want to use this model for:
	    type: select
	    options:
	      - Research
	      - Education
	      - label: Other
	        value: other
	  I agree to use this model for non-commercial use ONLY: checkbox
	---

	# POLARIS-9B

	POLARIS-9B is a 9B-parameter model for long-form English story generation, trained with a
	reinforcement-learning recipe on top of Qwen3.5-9B. Although it was trained only on stories of
	up to 4k words, it largely maintains rubric quality on prompts requesting stories of up to 12k
	words, 3× its training length. In pairwise evaluation it ranks above all tested open-weight
	models on EQ-Bench Creative Writing Elo, and a blinded human study finds it preferred to
	Qwen3.5-9B (67.5% winrate) and on par with Qwen3.5-27B (51.2% winrate).

	The training recipe — POLARIS (Policy Optimization with LLM-as-a-judge rewards and
	Anchored-Reference Injection for Storywriting) — uses two core components: a frontier LLM
	judge with a structured 16-dimension Story Quality rubric as the online reward, and
	human-reference injection (HRI), where a teacher-forced human-written story is inserted into
	each GRPO group as a high-reward anchor. The full training run costs approximately $500 in
	compute and judge calls (4×A100 80GB, ~48 hours).
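
To make the role of the injected reference concrete, here is a minimal sketch of group-relative advantages for one GRPO group of 6 (5 policy rollouts plus 1 human reference). It assumes the standard GRPO normalization (reward minus group mean, divided by group standard deviation). The reward values are invented for illustration, and the exact normalization and loss used in the POLARIS recipe are not specified here.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantage: (r - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical judge scores for 5 policy rollouts (illustrative only).
policy_rewards = [40.0, 45.0, 50.0, 55.0, 60.0]
# Hypothetical score for the injected human-written reference.
human_reward = 80.0

without_anchor = group_advantages(policy_rewards)
with_anchor = group_advantages(policy_rewards + [human_reward])

# The high-reward anchor raises the group mean (50 -> 55 here), so a rollout
# needs a higher reward to earn a positive advantage; the reference itself
# receives the largest advantage in the group.
for reward, before, after in zip(policy_rewards, without_anchor, with_anchor):
    print(f"reward {reward:.0f}: {before:+.2f} -> {after:+.2f}")
print(f"human reference: {with_anchor[-1]:+.2f}")
```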

	## Results

	Pairwise Elo (Gemini 3 Flash judge, dual-position)

	| Rank | Model | EQ-Bench Creative Elo |
	|------|-------|----------------------|
	| 1 | GPT-5.4 | 1911 |
	| 2 | Claude Opus 4.6 | 1783 |
	| 3 | POLARIS-9B | 1661 |
	| 4 | Gemini 3.1 Pro | 1627 |
	| 5 | Gemini 3 Flash | 1620 |
	| 6 | Gemma 4 31B | 1514 |
	| 7 | Qwen3.5-27B | 1503 |
	| 9 | Qwen3.5-9B (base) | 1352 |
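
As a rough guide to reading the rating gaps, the sketch below converts an Elo difference into an expected pairwise score. It assumes the conventional Elo logistic scale (a 400-point gap corresponds to 10:1 odds); the leaderboard's fitting procedure may differ, so treat the numbers as an approximation of the judge's implied preference, not as human winrates.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the conventional Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings copied from the table above.
polaris, qwen_27b, qwen_9b = 1661, 1503, 1352

print(f"POLARIS-9B vs. Qwen3.5-27B: {expected_score(polaris, qwen_27b):.2f}")
print(f"POLARIS-9B vs. Qwen3.5-9B:  {expected_score(polaris, qwen_9b):.2f}")
```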

	Story Quality by requested length (GPT-5.4 judge, 180 held-out prompts)

	| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Length ratio (8–12k) |
	|-------|-----------|-----------------|-----------------|----------------------|
	| POLARIS-9B | 57.4 | 48.2 | 44.1 | 0.72 |
	| Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 0.82 |
	| Qwen3.5-9B (base) | 35.1 | 8.7 | −11.8 | 0.88 |
	| Gemma 4 31B | 53.9 | 49.7 | 47.1 | 0.36 |

	Length ratio is the generated word count divided by the requested word count (1.0 = exact).
	Gemma 4 31B maintains quality at long lengths by writing substantially shorter stories than
	requested; POLARIS-9B is among the few open-weight models in our comparison that largely avoid
	quality collapse, length runaway, and severe under-generation at far-transfer lengths.
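
The length ratio is straightforward to compute for your own generations. The sketch below counts words by splitting on whitespace, which is an assumption; the word counter used for the reported numbers is not specified here, and `length_ratio` is an illustrative name.

```python
def length_ratio(story, requested_words):
    """Generated / requested word count; 1.0 means the target was met exactly."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    return len(story.split()) / requested_words


# A 7,200-word output for a 10,000-word request gives a ratio of 0.72.
dummy_story = " ".join(["word"] * 7200)
print(length_ratio(dummy_story, 10_000))
```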

	Human evaluation (60 prompt–generation pairs, blinded, two annotators)

	| Comparison | POLARIS-9B winrate | 95% CI |
	|-----------|-------------------|--------|
	| vs. Qwen3.5-9B | 67.5% | [55.0, 80.0] |
	| vs. Qwen3.5-27B | 51.2% | [38.8, 58.8] |

	Annotator comments most often highlight stronger atmosphere, voice, and scene realization
	relative to the base model.

	## Intended Use

	- Long-form story generation (short stories, flash fiction, narrative scenes, etc.)
	- Creative writing (essays, book reviews, podcast scripts, etc.)

	POLARIS-9B is trained on short-story anthology data and transfers well to related narrative
	tasks. Within WritingBench, it performs strongest on categories closest to its training
	distribution: character design, fan fiction, novel manuscript, and podcast scripting.

	## Out-of-Scope Use

	- Factual or knowledge-intensive writing where correctness matters
	- Legal, medical, or financial content
	- Reproducing or recovering the withheld training stories

	## Usage

	POLARIS-9B uses extended thinking during generation. Enable thinking and provide an adequate
	token budget for long stories.

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "rishanthrajendhran/POLARIS-9B"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype="auto",
	    device_map="auto",
	)

	prompt = (
	    "Write a 2000-word story about an archivist who discovers that missing "
	    "library books are returning with handwritten notes from the future."
	)

	messages = [{"role": "user", "content": prompt}]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,
	    # Enable thinking; important for quality
	    enable_thinking=True,
	)
	inputs = tokenizer(text, return_tensors="pt").to(model.device)

	outputs = model.generate(
	    **inputs,
	    max_new_tokens=8192,
	    do_sample=True,
	    temperature=0.6,
	    top_p=0.95,
	    top_k=20,
	    repetition_penalty=1.10,
	)

	# Decode only the newly generated tokens (skip the prompt)
	generated = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
	# Strip the thinking trace; keep only the story.
	# Assumes a Qwen-style "</think>" tag closes the trace; if the tag is
	# absent (e.g., generation was cut off), the full text is kept.
	story = generated.split("</think>")[-1].strip()
	print(story)
	```

	## Recommended Generation Settings

	These match the settings used in the paper's main evaluation.

	| Setting | Value | Notes |
	|---------|-------|-------|
	| `temperature` | 0.4–1.0 | Lower temperature (0.4–0.6) is recommended for long-form story writing |
	| `top_p` | 0.95 | |
	| `top_k` | 20 | |
	| `repetition_penalty` | 1.0–1.10 | |
	| `presence_penalty` | 0.0–1.5 | Do not set `repetition_penalty` and `presence_penalty` together |
	| `max_new_tokens` | 14336 | Minimum recommended for 8–12k target lengths |
	| `enable_thinking` | True | Thinking traces are used at generation time |

	Thinking tokens count toward `max_new_tokens`, but the thinking trace is stripped before
	evaluation. If the model produces very short stories, increasing `max_new_tokens` is usually
	the first thing to try.
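
The ranges in the table, including the rule against combining the two penalties, can be checked before a run. The helper below is a hypothetical sketch, not part of any library: `build_sampling_params` is an illustrative name, and its defaults are taken from the table and the usage example.

```python
def build_sampling_params(
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=None,
    presence_penalty=None,
    max_new_tokens=14336,
):
    """Collect sampling settings, checking them against the recommended ranges."""
    if not 0.4 <= temperature <= 1.0:
        raise ValueError("temperature should be within 0.4-1.0")
    if repetition_penalty is not None and presence_penalty is not None:
        raise ValueError("set repetition_penalty or presence_penalty, not both")

    params = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }
    if repetition_penalty is not None:
        if not 1.0 <= repetition_penalty <= 1.10:
            raise ValueError("repetition_penalty should be within 1.0-1.10")
        params["repetition_penalty"] = repetition_penalty
    if presence_penalty is not None:
        if not 0.0 <= presence_penalty <= 1.5:
            raise ValueError("presence_penalty should be within 0.0-1.5")
        params["presence_penalty"] = presence_penalty
    return params


params = build_sampling_params(repetition_penalty=1.10)
print(params)
```

The key names follow the table. Which of them a given backend accepts is not checked here, so they may need renaming or filtering before being passed on.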

	## Prompting

	Include an explicit length request in the prompt. POLARIS-9B was trained with length-stratified
	prompts and uses the requested word count to calibrate output length. Example:

	```
	Write a 3000-word story about [premise].
	```
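
When generating many prompts, the template can be filled in programmatically. This is a small sketch; `build_story_prompt` is an illustrative helper name, and the premise is borrowed from the usage example.

```python
def build_story_prompt(premise, words):
    """Fill the recommended template with an explicit word-count request."""
    if words <= 0:
        raise ValueError("words must be positive")
    return f"Write a {words}-word story about {premise}."


premise = (
    "an archivist who discovers that missing library books are returning "
    "with handwritten notes from the future"
)
# Chat-format message list, ready for a chat template or an OpenAI-style API.
messages = [{"role": "user", "content": build_story_prompt(premise, 3000)}]
print(messages[0]["content"])
```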

	At far-transfer lengths (8–12k words), the model undershoots somewhat (length ratio ≈ 0.72,
	aggregated across the far-OOD bucket). This is still much closer to the target than Gemma 4 31B,
	a much larger open-weight model that writes about 0.36× the requested length while maintaining
	its quality scores.

	## Known Limitations

	Stylistic overloading. The model can push too hard on specificity, jargon, or figurative
	density, making prose feel effortful to read even when individual sentences are well-crafted.
	Annotators flagged this as a recurring pattern.

	Local coherence failures. Contradictory details and confusing transitions appear in some
	outputs, particularly in longer stories. The narrative usually stays on track, but individual
	passages may lose logical consistency.

	Length undershooting at far transfer. On prompts requesting 8–12k words, the model
	generates approximately 72% of the requested length on average. Quality is preserved relative
	to other open-weight models, but the full length target is not reliably met.

	Story-writing distribution. The training data is short-story anthology fiction (literary
	realism, horror/gothic, sci-fi, regional/folk writing). Performance on non-narrative writing
	categories (biography, essays, book reviews) is noticeably weaker.

	Single-seed training. The reported checkpoint reflects one training run. Seed-to-seed
	variance has not been characterized.

	## Training

	| Parameter | Value |
	|-----------|-------|
	| Base model | Qwen3.5-9B |
	| Training algorithm | GRPO |
	| Training data | ~1,388 prompt–story pairs from 100 short-story anthologies |
	| Max reference length | 4,000 words |
	| GPUs | 4× A100 80GB |
	| Training time | ~48 hours |
	| Compute cost | ~$400 |
	| Judge cost | ~$60 (Gemini 3 Flash, flex tier) |
	| Training steps | 160 |
	| Batch size | 8 GRPO groups |
	| Group size | 6 (5 policy rollouts + 1 injected human reference) |
	| Online reward judge | Gemini 3 Flash |
	| Evaluation judge | GPT-5.4 |
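
The table's figures can be combined into a few derived quantities. The arithmetic below assumes that every training step processes exactly one batch of 8 groups; the per-GPU-hour price is back-calculated from the approximate compute cost and is not a quoted rate.

```python
# Figures taken from the training table above.
steps = 160
groups_per_step = 8
policy_rollouts_per_group = 5  # each group also holds 1 injected human reference
gpus = 4
hours = 48
compute_cost_usd = 400  # approximate
judge_cost_usd = 60     # approximate

# Assumption: every step processes exactly one batch of 8 groups.
total_groups = steps * groups_per_step
policy_rollouts = total_groups * policy_rollouts_per_group
gpu_hours = gpus * hours

print(f"GRPO groups:       {total_groups}")
print(f"Policy rollouts:   {policy_rollouts}")
print(f"GPU-hours:         {gpu_hours}")
print(f"Compute per GPU-h: ${compute_cost_usd / gpu_hours:.2f}")
print(f"Total cost:        ${compute_cost_usd + judge_cost_usd}")
```

Under these assumptions the run covers 1,280 groups and 6,400 policy rollouts, and the combined cost of about $460 is consistent with the "approximately $500" figure quoted earlier.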

	The human-written stories used in training are derived from commercially purchased anthologies
	and are not released. The associated prompt dataset is released separately.

	## Citation

	```bibtex
	@misc{rajendhran2026polarisguidingsmallmodels,
	  title={POLARIS: Guiding Small Models to Write Long Stories},
	  author={Rishanth Rajendhran and Jenna Russell and Mohit Iyyer and John Frederick Wieting},
	  year={2026},
	  eprint={2606.04095},
	  archivePrefix={arXiv},
	  primaryClass={cs.CL},
	  url={https://arxiv.org/abs/2606.04095},
	}
	```