---
library_name: transformers
license: other
base_model: Qwen/Qwen3.5-9B
language:
- en
tags:
- text-generation
- creative-writing
- long-form-generation
- reinforcement-learning
- grpo
- story-generation
- transformers
pipeline_tag: text-generation
datasets:
- rishanthrajendhran/POLARIS
---

# POLARIS-no-HRI-9B

POLARIS-no-HRI-9B is the matched ablation variant of [POLARIS-9B](https://huggingface.co/rishanthrajendhran/POLARIS-9B).
It uses the same GRPO training recipe, the same structured Story Quality reward, identical
hyperparameters, and the same training data, but it is trained without human-reference injection (HRI).
Instead of 5 policy rollouts plus 1 injected human-written story per group, each group contains 6 policy
rollouts with no reference anchor.

It is a strong creative-writing model in its own right, substantially better than the base
Qwen3.5-9B, but it lags POLARIS-9B, most noticeably at far-transfer lengths (8–12k words).

## Comparison with POLARIS-9B

The gap between this model and POLARIS-9B is small at in-distribution lengths and grows at
longer requested lengths, consistent with HRI's role in maintaining gradient pressure toward
stronger writing as generation extends beyond the training range.

**Story Quality by requested length** (GPT-5.4 judge, 180 held-out prompts)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate | Slope |
|-------|-----------|-----------------|-----------------|-----------|-------|
| POLARIS-9B | 57.4 | 48.2 | 44.1 | 52.1 | −3.0 |
| **POLARIS-no-HRI-9B** | **56.5** | **47.0** | **37.7** | **49.7** | **−3.8** |
| Qwen3.5-9B (base) | 35.1 | 8.7 | −11.8 | 18.5 | −10.8 |
| Qwen3.5-27B | 51.5 | 38.7 | 24.6 | 42.8 | −5.9 |

Slope is the slope of a linear fit of Story Quality across the six underlying length buckets
(points per bucket step). A more negative slope indicates faster quality degradation as the
requested length increases.
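
The slope definition can be sketched as an ordinary least-squares fit of bucket scores against bucket index. This is a minimal sketch assuming the six buckets are ordered by requested length and treated as equally spaced steps indexed 0 to 5; `length_slope` is a hypothetical helper, and because per-bucket scores are not reported in this card, the example uses synthetic numbers only.

```python
def length_slope(bucket_scores: list[float]) -> float:
    """Least-squares slope of scores vs. bucket index (points per bucket step).

    Assumes buckets are ordered by requested length and equally spaced,
    so bucket i is placed at x = i.
    """
    n = len(bucket_scores)
    if n < 2:
        raise ValueError("need at least two buckets to fit a slope")
    mean_x = (n - 1) / 2  # mean of 0, 1, ..., n-1
    mean_y = sum(bucket_scores) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(bucket_scores))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


# Synthetic scores (not results from this card) that drop 2 points per bucket.
print(length_slope([10.0, 8.0, 6.0, 4.0, 2.0, 0.0]))  # -2.0
```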

**EQ-Bench Longform by requested length** (GPT-5.4 judge, uniform aggregation)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | Aggregate |
|-------|-----------|-----------------|-----------------|-----------|
| POLARIS-9B | 63.1 | 57.5 | 54.3 | 59.8 |
| **POLARIS-no-HRI-9B** | **62.1** | **55.7** | **51.6** | **58.2** |
| Qwen3.5-9B (base) | 50.2 | 37.2 | 30.3 | 42.6 |

**Length adherence** (generated / requested word count)

| Model | ID (1–4k) | Near OOD (4–8k) | Far OOD (8–12k) | All |
|-------|-----------|-----------------|-----------------|-----|
| POLARIS-9B | 0.99 | 0.87 | 0.72 | 0.90 |
| **POLARIS-no-HRI-9B** | **0.94** | **0.86** | **0.70** | **0.87** |
| Qwen3.5-9B (base) | 1.09 | 0.96 | 0.88 | 1.01 |
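
Length adherence is the ratio of generated to requested word count, so 1.0 means on target and values below 1.0 mean the story came out short. A minimal sketch, assuming words are counted by whitespace splitting (the card does not name its word counter) and that a bucket value is the plain mean of per-sample ratios; both helpers are hypothetical.

```python
from statistics import fmean


def length_ratio(generated: str, requested_words: int) -> float:
    """Generated / requested word count; words are counted by whitespace split."""
    if requested_words <= 0:
        raise ValueError("requested_words must be positive")
    return len(generated.split()) / requested_words


def mean_length_ratio(samples: list[tuple[str, int]]) -> float:
    """Average per-sample ratios (assumed aggregation, not confirmed by the card)."""
    return fmean(length_ratio(text, words) for text, words in samples)


print(length_ratio("The archive was quiet tonight.", 10))  # 0.5
```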

**OOD benchmarks**

| Model | WritingBench (D4) | LongBench-Write | EQ-Bench Creative |
|-------|-------------------|-----------------|-------------------|
| POLARIS-9B | 7.9 | 81.2 | 70.3 |
| **POLARIS-no-HRI-9B** | **7.8** | **82.1** | **69.7** |
| Qwen3.5-9B (base) | 6.8 | 67.1 | 59.2 |

On OOD benchmarks the two variants are essentially tied; the HRI advantage is concentrated at
long requested lengths, especially the far-OOD 8–12k range, where narrative coherence and arc
completion must be sustained over many thousands of tokens.

## Intended Use

- Long-form story generation (short stories, flash fiction, narrative scenes)
- Creative writing (essays, book reviews, podcast scripts, etc.)

## Out-of-Scope Use

- Factual or knowledge-intensive writing where correctness matters
- Legal, medical, or financial content
- Reproducing or recovering the withheld training stories

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rishanthrajendhran/POLARIS-no-HRI-9B"

# torch_dtype="auto" loads the weights in the checkpoint's dtype;
# device_map="auto" places them on the available device(s).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Include an explicit target length in the prompt.
prompt = (
    "Write a 2000-word story about an archivist who discovers that missing "
    "library books are returning with handwritten notes from the future."
)
messages = [{"role": "user", "content": prompt}]

# Render the chat template as text; enable_thinking=True is passed through
# to the chat template to turn on thinking mode.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample using settings from the recommended ranges below.
outputs = model.generate(
    **inputs,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.10,
)

# Decode only the newly generated tokens (everything after the prompt).
generated = tokenizer.decode(
    outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True
)
print(generated)
```
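
Because `enable_thinking=True` is set, the decoded text may contain the model's reasoning ahead of the story. A minimal sketch for separating the two, assuming the model keeps Qwen3-style `<think>...</think>` delimiters from its base chat template (this card does not confirm the exact format); `split_thinking` is a hypothetical helper.

```python
def split_thinking(decoded: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, story).

    Assumes Qwen3-style <think>...</think> delimiters. The opening tag may be
    absent if the chat template already inserted it into the prompt.
    """
    marker = "</think>"
    if marker not in decoded:
        # No reasoning block found: treat the whole output as the story.
        return "", decoded.strip()
    # Split at the last closing tag so everything after it is the story.
    reasoning, _, story = decoded.rpartition(marker)
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, story.strip()


reasoning, story = split_thinking("<think>\nPlan the arc.\n</think>\n\nThe archive was quiet.")
print(story)  # The archive was quiet.
```

Applied to `generated` from the Usage example, the second element of the returned tuple is the story text alone.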

## Recommended Generation Settings

These settings are identical to those recommended for POLARIS-9B.

| Setting | Value | Notes |
|---------|-------|-------|
| `temperature` | 0.4–1.0 | Lower temperatures (0.4–0.6) are recommended for long-form story writing |
| `top_p` | 0.95 | |
| `top_k` | 20 | |
| `repetition_penalty` | 1.0–1.10 | |
| `presence_penalty` | 0.0–1.5 | Do not set `repetition_penalty` and `presence_penalty` together |
| `max_new_tokens` | 14336 | Recommended minimum for 8–12k-word targets |
| `enable_thinking` | True | Chat-template argument, passed to `apply_chat_template` |
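
The table can be checked programmatically before a run. A minimal sketch; `make_sampling_settings` is a hypothetical helper, and treating 1.0 (`repetition_penalty`) and 0.0 (`presence_penalty`) as the "not set" values is an assumption. It returns backend-neutral names taken from the table, so they may need renaming for a given engine.

```python
import warnings
from typing import Optional


def make_sampling_settings(
    temperature: float = 0.6,
    repetition_penalty: float = 1.0,
    presence_penalty: float = 0.0,
    max_new_tokens: int = 14336,
    target_words: Optional[int] = None,
) -> dict:
    """Hypothetical helper: validate settings against the recommended ranges."""
    if not 0.4 <= temperature <= 1.0:
        raise ValueError("temperature outside the recommended 0.4-1.0 range")
    if not 1.0 <= repetition_penalty <= 1.10:
        raise ValueError("repetition_penalty outside the recommended 1.0-1.10 range")
    if not 0.0 <= presence_penalty <= 1.5:
        raise ValueError("presence_penalty outside the recommended 0.0-1.5 range")
    # Assumption: 1.0 and 0.0 are the neutral values, i.e. the penalty is "not set".
    if repetition_penalty != 1.0 and presence_penalty != 0.0:
        raise ValueError("set repetition_penalty or presence_penalty, not both")
    if target_words is not None and target_words >= 8000 and max_new_tokens < 14336:
        warnings.warn("max_new_tokens is below the recommended minimum for 8-12k-word targets")
    return {
        "temperature": temperature,
        "top_p": 0.95,
        "top_k": 20,
        "repetition_penalty": repetition_penalty,
        "presence_penalty": presence_penalty,
        "max_new_tokens": max_new_tokens,
    }


print(make_sampling_settings(temperature=0.5)["top_k"])  # 20
```

With Transformers, drop `presence_penalty` (it is not a `generate` argument) and pass the rest with `do_sample=True`; serving engines use their own names, for example `max_tokens` in vLLM's `SamplingParams`.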

## Prompting

It is recommended to include an explicit length request in the prompt:

```
Write a 3000-word story about [premise].
```

At far-transfer lengths (8–12k words), this model undershoots the requested length slightly more
than POLARIS-9B does (length ratio ≈ 0.70 vs 0.72). For generation targets above 6k words,
POLARIS-9B is the recommended variant.
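
A minimal sketch of building prompts in the recommended format; `build_story_prompt` is a hypothetical helper, and its 6,000-word warning threshold simply mirrors the recommendation above.

```python
import warnings


def build_story_prompt(premise: str, target_words: int) -> str:
    """Format a prompt with an explicit length request, as recommended above."""
    if target_words <= 0:
        raise ValueError("target_words must be positive")
    if target_words > 6000:
        # Mirrors the card's advice: prefer POLARIS-9B for targets above 6k words.
        warnings.warn("for targets above 6k words, POLARIS-9B is the recommended variant")
    premise = premise.strip().rstrip(".")
    return f"Write a {target_words}-word story about {premise}."


print(build_story_prompt("a lighthouse keeper who hears the sea apologize", 3000))
# Write a 3000-word story about a lighthouse keeper who hears the sea apologize.
```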

## Known Limitations

The qualitative failure modes present in POLARIS-9B, namely stylistic overloading and local
coherence failures, also apply here, since both models share the same base model, training data,
and reward.

The key additional limitation of this variant relative to POLARIS-9B:

**Steeper quality degradation at long lengths.** The Story Quality slope is −3.8, vs −3.0 for
POLARIS-9B. At 8–12k words, the gap to POLARIS-9B is 6.4 Story Quality points, compared with
0.9 points at in-distribution lengths (1–4k) and 1.2 points at near-OOD lengths (4–8k). If your
use case involves prompts requesting long stories, POLARIS-9B is the better choice.
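
The per-bucket gaps between the two variants can be recomputed directly from the Story Quality table. The scores below are copied from that table; this is only an arithmetic check, not a new result.

```python
# Story Quality scores copied from the comparison table in this card.
polaris_9b = {"ID (1-4k)": 57.4, "Near OOD (4-8k)": 48.2, "Far OOD (8-12k)": 44.1}
no_hri_9b = {"ID (1-4k)": 56.5, "Near OOD (4-8k)": 47.0, "Far OOD (8-12k)": 37.7}

# Round to one decimal place, matching the precision of the table.
gaps = {bucket: round(polaris_9b[bucket] - no_hri_9b[bucket], 1) for bucket in polaris_9b}
print(gaps)  # {'ID (1-4k)': 0.9, 'Near OOD (4-8k)': 1.2, 'Far OOD (8-12k)': 6.4}
```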

## Training

Identical to POLARIS-9B except for the group composition.

| Parameter | Value |
|-----------|-------|
| Base model | Qwen3.5-9B |
| Training algorithm | GRPO |
| Training data | ~1,388 prompt–story pairs from 100 short-story anthologies |
| Max reference length | 4,000 words |
| GPUs | 4× A100 80GB |
| Training time | ~48 hours |
| Compute cost | ~$400 |
| Judge cost | ~$60 (Gemini 3 Flash, flex tier) |
| Training steps | 160 |
| Batch size | 8 GRPO groups |
| Group size | 6 policy rollouts (no human reference) |
| HRI | **Disabled** |
| Online reward judge | Gemini 3 Flash |
| Evaluation judge | GPT-5.4 |
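
The only difference from POLARIS-9B is what fills each GRPO group. A minimal sketch of the two group compositions and of the group-normalized advantage commonly used in GRPO (reward minus the group mean, divided by the group standard deviation). This card does not specify the exact advantage normalization POLARIS uses or how an injected human reference enters the policy loss, so `build_group` and `group_advantages` are hypothetical illustrations, not the training code.

```python
from statistics import fmean, pstdev
from typing import Optional


def build_group(policy_rollouts: list[str], human_reference: Optional[str] = None) -> list[str]:
    """Assemble one GRPO group: policy rollouts, plus an optional human-written story (HRI)."""
    group = list(policy_rollouts)
    if human_reference is not None:
        group.append(human_reference)
    return group


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-normalized advantages: (r - mean) / (std + eps) within one group."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# POLARIS-9B (HRI): 5 policy rollouts + 1 human reference per group.
hri_group = build_group([f"rollout_{i}" for i in range(5)], human_reference="human_story")
# POLARIS-no-HRI-9B: 6 policy rollouts, no reference anchor.
no_hri_group = build_group([f"rollout_{i}" for i in range(6)])
print(len(hri_group), len(no_hri_group))  # 6 6
```

Under this normalization, a human story that the judge scores highly raises the group mean, pushing weaker rollouts toward negative advantages; without it, rollouts are compared only against each other.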

## Citation

```bibtex
@misc{rajendhran2026polarisguidingsmallmodels,
      title={POLARIS: Guiding Small Models to Write Long Stories},
      author={Rishanth Rajendhran and Jenna Russell and Mohit Iyyer and John Frederick Wieting},
      year={2026},
      eprint={2606.04095},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2606.04095},
}
```