| library_name: transformers | |
| license: other | |
| base_model: Qwen/Qwen3.5-9B | |
| language: | |
| - en | |
| tags: | |
| - text-generation | |
| - creative-writing | |
| - long-form-generation | |
| - reinforcement-learning | |
| - grpo | |
| - story-generation | |
| - transformers | |
| pipeline_tag: text-generation | |
| datasets: | |
| - rishanthrajendhran/POLARIS | |
| extra_gated_prompt: "You agree to not share the model with others and also not use it for malicious purposes (eg. attempt to extract copyrighted human-written stories it was trained on)" | |
| extra_gated_fields: | |
| Company: text | |
| Country: country | |
| Specific date: date_picker | |
| I want to use this model for: | |
| type: select | |
| options: | |
| - Research | |
| - Education | |
| - label: Other | |
| value: other | |
| I agree to use this model for non-commercial use ONLY: checkbox | |
| # POLARIS-9B | |
| POLARIS-9B is a 9B-parameter model for long-form English story generation, trained with a | |
| reinforcement-learning recipe on top of Qwen3.5-9B. Despite training only on stories up to 4k | |
| words, it maintains rubric quality on prompts requesting stories up to 12k words — 3× its | |
| training length. In pairwise evaluation it ranks above all tested open-weight models on | |
| EQ-Bench Creative Writing Elo, and a blinded human study finds it preferred to Qwen3.5-9B | |
| (67.5% winrate) and on par with Qwen3.5-27B (51.2% winrate). | |
| The training recipe — **POLARIS** (Policy Optimization with LLM-as-a-judge rewards and | |
| Anchored-Reference Injection for Storywriting) — uses two core components: a frontier LLM | |
| judge with a structured 16-dimension Story Quality rubric as the online reward, and | |
| human-reference injection (HRI), where a teacher-forced human-written story is inserted into | |
| each GRPO group as a high-reward anchor. The full training run costs approximately $500 in | |
| compute and judge calls (4×A100 80GB, ~48 hours). | |
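The effect of inserting a high-reward anchor into a GRPO group can be illustrated with the usual group-relative advantage, where each reward is standardized against the group's mean and standard deviation. The sketch below is not the paper's implementation: the function name `group_advantages`, the `eps` term, and all reward values are assumptions made for illustration, and it leaves out how the teacher-forced reference enters the policy-gradient loss.

```python
from statistics import mean, pstdev


def group_advantages(policy_rewards, reference_reward, eps=1e-6):
    """Standardize rewards within one group that includes an injected reference.

    Hypothetical helper: the reference reward is appended as the last group member.
    """
    rewards = list(policy_rewards) + [reference_reward]
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the whole group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up judge scores: 5 policy rollouts plus 1 human-written reference.
advantages = group_advantages([42.0, 38.5, 51.0, 47.0, 40.5], reference_reward=68.0)
for label, adv in zip(["rollout"] * 5 + ["reference"], advantages):
    print(f"{label:9s} advantage = {adv:+.2f}")
```

With a reference that scores well above the rollouts, the group mean rises, so only rollouts that approach the reference keep a positive advantage.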
| ## Results | |
| **Pairwise Elo (Gemini 3 Flash judge, dual-position)** | |
| | Rank | Model | EQ-Bench Creative Elo | | |
| |------|-------|----------------------| | |
| | 1 | GPT-5.4 | 1911 | | |
| | 2 | Claude Opus 4.6 | 1783 | | |
| | **3** | **POLARIS-9B** | **1661** | | |
| | 4 | Gemini 3.1 Pro | 1627 | | |
| | 5 | Gemini 3 Flash | 1620 | | |
| | 6 | Gemma 4 31B | 1514 | | |
| | 7 | Qwen3.5-27B | 1503 | | |
| | 9 | Qwen3.5-9B (base) | 1352 | | |
| **Story Quality by requested length** (GPT-5.4 judge, 180 held-out prompts) | |
| | Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Length ratio (8–12k) | | |
| |-------|-----------|-----------------|-----------------|----------------------| | |
| | POLARIS-9B | 57.4 | 48.2 | 44.1 | 0.72 | | |
| | Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 0.82 | | |
| | Qwen3.5-9B (base) | 35.1 | 8.7 | −11.8 | 0.88 | | |
| | Gemma 4 31B | 53.9 | 49.7 | 47.1 | 0.36 | | |
| Length ratio is generated / requested word count (1.0 = exact). Gemma 4 31B maintains quality | |
| at long lengths by writing substantially shorter stories than requested; POLARIS-9B is among the few | |
| open-weight models in our comparison that largely avoid quality collapse, length runaway, *and* | |
| severe under-generation at far-transfer lengths. | |
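A minimal sketch of the length-ratio metric defined above. It assumes words are counted by whitespace splitting, which the card does not specify; `length_ratio` is a hypothetical helper name.

```python
def length_ratio(story: str, requested_words: int) -> float:
    """Generated / requested word count, counting whitespace-separated words."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    return len(story.split()) / requested_words


# A 7,200-word output for a 10,000-word request gives the far-OOD ratio of 0.72.
story = "word " * 7200
print(round(length_ratio(story, 10_000), 2))  # 0.72
```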
| **Human evaluation** (60 prompt–generation pairs, blinded, two annotators) | |
| | Comparison | POLARIS-9B winrate | 95% CI | | |
| |-----------|-------------------|--------| | |
| | vs. Qwen3.5-9B | 67.5% | [55.0, 80.0] | | |
| | vs. Qwen3.5-27B | 51.2% | [38.8, 58.8] | | |
| Annotator comments most often highlight stronger atmosphere, voice, and scene realization | |
| relative to the base model. | |
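The card does not say how the confidence intervals were computed. One common choice for a win rate is a percentile bootstrap over the individual judgments, sketched below. The helper `bootstrap_winrate_ci` and the 27-of-40 outcome list are illustrative assumptions, not the study's data or method.

```python
import random


def bootstrap_winrate_ci(wins, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a win rate; `wins` is a list of 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(wins)
    # Resample the judgments with replacement and record each resampled win rate.
    rates = sorted(sum(rng.choices(wins, k=n)) / n for _ in range(n_resamples))
    lo = rates[int((alpha / 2) * n_resamples)]
    hi = rates[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(wins) / n, lo, hi


# Hypothetical outcomes: 27 wins out of 40 judgments (win rate 0.675).
outcomes = [1] * 27 + [0] * 13
rate, lo, hi = bootstrap_winrate_ci(outcomes)
print(f"winrate={rate:.3f}, 95% CI=[{lo:.3f}, {hi:.3f}]")
```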
| ## Intended Use | |
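| - Long-form story generation (short stories, flash fiction, narrative scenes, etc.) | |
| - Other creative writing (essays, book reviews, podcast scripts, etc.); quality is weaker on non-narrative categories such as essays and book reviews (see Known Limitations) | |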
| POLARIS-9B is trained on short-story anthology data and transfers well to related narrative | |
| tasks. Within WritingBench, it performs strongest on categories closest to its training | |
| distribution: character design, fan fiction, novel manuscript, and podcast scripting. | |
| ## Out-of-Scope Use | |
| - Factual or knowledge-intensive writing where correctness matters | |
| - Legal, medical, or financial content | |
| - Reproducing or recovering the withheld training stories | |
| ## Usage | |
| POLARIS-9B uses extended thinking during generation. Enable thinking and provide an adequate token | |
| budget for long stories. | |
| ```python | |
| from transformers import AutoModelForCausalLM, AutoTokenizer | |
| model_id = "rishanthrajendhran/POLARIS-9B" | |
| tokenizer = AutoTokenizer.from_pretrained(model_id) | |
| model = AutoModelForCausalLM.from_pretrained( | |
| model_id, | |
| torch_dtype="auto", | |
| device_map="auto", | |
| ) | |
| prompt = ( | |
| "Write a 2000-word story about an archivist who discovers that missing " | |
| "library books are returning with handwritten notes from the future." | |
| ) | |
| messages = [{"role": "user", "content": prompt}] | |
| text = tokenizer.apply_chat_template( | |
| messages, | |
| tokenize=False, | |
| add_generation_prompt=True, | |
| # Enable thinking — important for quality | |
| enable_thinking=True, | |
| ) | |
| inputs = tokenizer(text, return_tensors="pt").to(model.device) | |
| outputs = model.generate( | |
| **inputs, | |
| max_new_tokens=8192, | |
| do_sample=True, | |
| temperature=0.6, | |
| top_p=0.95, | |
| top_k=20, | |
| repetition_penalty=1.10, | |
| ) | |
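| # Decode only the newly generated tokens (thinking trace followed by the story) | |
| generated = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True) | |
| # Strip the thinking trace; assumes it ends with </think>, as in Qwen3-family chat templates | |
| story = generated.split("</think>")[-1].strip() | |
| print(story) | |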
| ``` | |
| ## Recommended Generation Settings | |
| These match the settings used in the paper's main evaluation. | |
| | Setting | Value | Notes | | |
| |---------|-------|-------| | |
| | `temperature` | 0.4–1.0 | Lower temperatures (0.4–0.6) are recommended for long-form story writing | |
| | `top_p` | 0.95 | | |
| | `top_k` | 20 | | |
| | `repetition_penalty` | 1.0–1.10 | | |
| | `presence_penalty` | 0.0–1.5 | Do not set `repetition_penalty` and `presence_penalty` together | |
| | `max_new_tokens` | 14336 | Minimum recommended for 8–12k target lengths | |
| | `enable_thinking` | True | Thinking traces are used at generation time | | |
| The thinking trace counts toward `max_new_tokens` but is stripped before evaluation. If the | |
| model is producing very short stories, increasing `max_new_tokens` is usually the first thing to | |
| try. | |
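The constraints in the table can be checked before a request is sent. The helper below is a hypothetical sketch: the name `build_sampling_settings` and the choice to reject out-of-range values are assumptions. The key names mirror the table; whether a given backend accepts each of them is not covered here.

```python
def build_sampling_settings(temperature=0.6, top_p=0.95, top_k=20,
                            repetition_penalty=None, presence_penalty=None,
                            max_new_tokens=14336):
    """Return a settings dict that respects the recommended ranges."""
    if not 0.4 <= temperature <= 1.0:
        raise ValueError("temperature should be within 0.4-1.0")
    if repetition_penalty is not None and presence_penalty is not None:
        raise ValueError("set repetition_penalty or presence_penalty, not both")
    settings = {
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "max_new_tokens": max_new_tokens,
    }
    if repetition_penalty is not None:
        if not 1.0 <= repetition_penalty <= 1.10:
            raise ValueError("repetition_penalty should be within 1.0-1.10")
        settings["repetition_penalty"] = repetition_penalty
    if presence_penalty is not None:
        if not 0.0 <= presence_penalty <= 1.5:
            raise ValueError("presence_penalty should be within 0.0-1.5")
        settings["presence_penalty"] = presence_penalty
    return settings


settings = build_sampling_settings(temperature=0.5, repetition_penalty=1.10)
print(settings)
```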
| ## Prompting | |
| Include an explicit length request in the prompt. POLARIS-9B was trained with length-stratified | |
| prompts and uses the requested word count to calibrate output length. Example: | |
| ``` | |
| Write a 3000-word story about [premise]. | |
| ``` | |
| At far-transfer lengths (8–12k words), the model undershoots somewhat (length ratio ≈ 0.72 | |
| aggregated across the far-OOD bucket). This is still far closer to the target than Gemma 4 31B, a much larger open-weight model that | |
| writes 0.36× the requested length while maintaining its quality scores. | |
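A small sketch of building a length-explicit prompt and estimating the likely output length at far transfer. The helper names and the premise are hypothetical. Applying the aggregate 0.72 ratio to a single request is only a rough guide, since the figure is an average over the far-OOD bucket.

```python
FAR_OOD_LENGTH_RATIO = 0.72  # aggregate ratio reported for 8-12k word requests


def build_story_prompt(premise: str, target_words: int) -> str:
    """Format a prompt with an explicit word-count request."""
    return f"Write a {target_words}-word story about {premise}."


def expected_words(target_words: int, ratio: float = FAR_OOD_LENGTH_RATIO) -> int:
    """Rough expected output length under the given length ratio."""
    return round(target_words * ratio)


prompt = build_story_prompt("a lighthouse keeper who receives letters from the sea", 10000)
print(prompt)
print(expected_words(10000))  # 7200
```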
| ## Known Limitations | |
| **Stylistic overloading.** The model can push too hard on specificity, jargon, or figurative | |
| density, making prose feel effortful to read even when individual sentences are well-crafted. | |
| Annotators flagged this as a recurring pattern. | |
| **Local coherence failures.** Contradictory details and confusing transitions appear in some | |
| generations, particularly in longer stories. The narrative usually stays on track, but individual | |
| passages may lose logical consistency. | |
| **Length undershooting at far transfer.** On prompts requesting 8–12k words, the model | |
| generates approximately 72% of the requested length on average. Quality is preserved relative | |
| to other open-weight models, but the full length target is not reliably met. | |
| **Story-writing distribution.** The training data is short-story anthology fiction (literary | |
| realism, horror/gothic, sci-fi, regional/folk writing). Performance on non-narrative writing | |
| categories (biography, essays, book reviews) is noticeably weaker. | |
| **Single-seed training.** The reported checkpoint reflects one training run. Seed-to-seed | |
| variance has not been characterized. | |
| ## Training | |
| | Parameter | Value | | |
| |-----------|-------| | |
| | Base model | Qwen3.5-9B | | |
| | Training algorithm | GRPO | | |
| | Training data | ~1,388 prompt–story pairs from 100 short-story anthologies | | |
| | Max reference length | 4,000 words | | |
| | GPUs | 4× A100 80GB | | |
| | Training time | ~48 hours | | |
| | Compute cost | ~$400 | | |
| | Judge cost | ~$60 (Gemini 3 Flash, flex tier) | | |
| | Training steps | 160 | | |
| | Batch size | 8 GRPO groups | | |
| | Group size | 6 (5 policy rollouts + 1 injected human reference) | | |
| | Online reward judge | Gemini 3 Flash | | |
| | Evaluation judge | GPT-5.4 | | |
| The human-written stories used in training are derived from commercially purchased anthologies | |
| and are not released. The associated prompt dataset is released separately. | |
| ## Citation | |
| ```bibtex | |
| @misc{rajendhran2026polarisguidingsmallmodels, | |
| title={POLARIS: Guiding Small Models to Write Long Stories}, | |
| author={Rishanth Rajendhran and Jenna Russell and Mohit Iyyer and John Frederick Wieting}, | |
| year={2026}, | |
| eprint={2606.04095}, | |
| archivePrefix={arXiv}, | |
| primaryClass={cs.CL}, | |
| url={https://arxiv.org/abs/2606.04095}, | |
| } | |
| ``` |