---
language:
- en
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- scientific-evaluation
- citation-prediction
- preference-learning
- GRPO
pipeline_tag: text-generation
library_name: transformers
---

# SciJudge-Qwen3-4B

SciJudge-Qwen3-4B is a fine-tuned language model for **scientific paper evaluation**. Given the metadata (title, abstract, and publication date) of two academic papers, it predicts which of the two has the higher citation count, a task that serves as a proxy for assessing research impact and "scientific taste."

This model accompanies the paper **[AI Can Learn Scientific Taste](https://arxiv.org/abs/2603.14473)**.

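
The comparison task above can be stated as data: each example pairs two papers, the target is the letter of the more-cited one, and quality on the task is pairwise accuracy. The following is a minimal sketch under stated assumptions: `PaperRecord`, `pair_label`, and `pairwise_accuracy` are hypothetical names, the `citations` field stands in for whatever citation source was actually used, and returning `None` for tied pairs (so they can be skipped) is an assumption, not a documented property of the training data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PaperRecord:
    # Hypothetical record: the model only sees title, abstract, and date;
    # `citations` is the ground-truth count used to build the label.
    title: str
    abstract: str
    date: str
    citations: int


def pair_label(a: PaperRecord, b: PaperRecord) -> Optional[str]:
    """Return 'A' or 'B' for the more-cited paper, or None for a tie (assumed to be skipped)."""
    if a.citations == b.citations:
        return None
    return "A" if a.citations > b.citations else "B"


def pairwise_accuracy(predictions: Sequence[Optional[str]], labels: Sequence[str]) -> float:
    """Fraction of pairs whose predicted letter matches the label; None predictions count as wrong."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    a = PaperRecord("Paper one", "...", "2023-01-15", citations=120)
    b = PaperRecord("Paper two", "...", "2023-02-01", citations=45)
    print(pair_label(a, b))  # A
    print(pairwise_accuracy(["A", "B", None, "A"], ["A", "A", "B", "A"]))  # 0.5
```

Counting unparseable predictions as wrong, rather than dropping them, keeps accuracy comparable between models that sometimes fail to produce a valid answer.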
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "OpenMOSS-Team/SciJudge-4B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Prompt format from this model card; replace each "..." with the corresponding paper's metadata.
system_prompt = (
    "You are a helpful assistant. You first think about the reasoning process "
    "in your mind and then provide the user with the answer."
)
user_prompt = (
    "Today is 2025-12-10. Based on the titles, abstracts, and publication dates of the following "
    "two papers A and B, determine which paper has a higher citation count.\n"
    "Show your reasoning process in <reason> </reason> tags. And return the final answer in "
    "<answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\n"
    "Paper A:\nTitle: ...\nAbstract: ...\nDate: ...\n\n"
    "Paper B:\nTitle: ...\nAbstract: ...\nDate: ..."
)
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
# do_sample=True is needed for temperature, top_p, and top_k to take effect.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
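
Because the prompt follows a fixed template and the model returns its verdict inside `<answer>` tags, it is convenient to fill the template from real metadata and parse the reply programmatically. This is a minimal sketch that assumes the exact template shown above: `Paper`, `build_user_prompt`, and `parse_answer` are hypothetical helpers, not part of any released code, and exposing the "Today is" date as a `today` parameter (defaulting to the 2025-12-10 used above) is our assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    # Hypothetical container for the three fields the prompt uses.
    title: str
    abstract: str
    date: str  # assumed YYYY-MM-DD, matching the "Today is" line


def build_user_prompt(paper_a: Paper, paper_b: Paper, today: str = "2025-12-10") -> str:
    """Fill the user-prompt template from the usage example with real metadata."""
    def block(label: str, p: Paper) -> str:
        return f"Paper {label}:\nTitle: {p.title}\nAbstract: {p.abstract}\nDate: {p.date}"

    return (
        f"Today is {today}. Based on the titles, abstracts, and publication dates of the following "
        "two papers A and B, determine which paper has a higher citation count.\n"
        "Show your reasoning process in <reason> </reason> tags. And return the final answer in "
        "<answer> </answer> tags. The final answer should contain only 'A' or 'B'.\n\n"
        f"{block('A', paper_a)}\n\n{block('B', paper_b)}"
    )


# Accepts optional whitespace around the letter inside the tags.
_ANSWER_RE = re.compile(r"<answer>\s*([AB])\s*</answer>")


def parse_answer(response: str) -> Optional[str]:
    """Return 'A' or 'B' from the last well-formed <answer> tag, or None if there is none."""
    matches = _ANSWER_RE.findall(response)
    return matches[-1] if matches else None


if __name__ == "__main__":
    a = Paper("Title of paper A", "Abstract of paper A", "2024-03-01")
    b = Paper("Title of paper B", "Abstract of paper B", "2024-06-15")
    print(build_user_prompt(a, b))
    print(parse_answer("<reason>...</reason>\n<answer>B</answer>"))  # B
```

In the usage example, `user_prompt` could be produced with `build_user_prompt(paper_a, paper_b)` and the verdict read with `parse_answer(response)`; a `None` result flags a reply that did not follow the requested format. Taking the last tag is a defensive choice in case the reasoning text mentions the tag itself.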

## Training Details

- **Base model:** Qwen3-4B-Instruct-2507
- **Training method:** GRPO (Group Relative Policy Optimization) with the DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) loss
- **Training data:** 720,341 preference pairs built from arXiv papers
- **Learning rate:** 8e-7 (cosine schedule, 5% warmup)
- **Batch size:** 8 per device × 64 GPUs × 2 gradient-accumulation steps = 1,024 effective
- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
- **Precision:** bfloat16
- **KL coefficient (β):** 0.03
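
GRPO samples a group of responses per prompt and uses each response's reward, normalized by the group's mean and standard deviation, as its advantage; DAPO's loss then averages the clipped surrogate over every token in the group instead of averaging within each response first. The plain-Python sketch below illustrates those two steps only. Several pieces are assumptions rather than reported settings for this model: the binary correctness reward, the population standard deviation with `eps=1e-6`, and the asymmetric clip bounds `eps_low=0.2`, `eps_high=0.28` (DAPO's "clip-higher" idea). The KL penalty weighted by β = 0.03 is omitted for brevity.

```python
import math
from typing import List, Optional, Sequence


def binary_reward(predicted: Optional[str], label: str) -> float:
    # Assumed reward: 1.0 if the parsed answer names the more-cited paper, else 0.0.
    return 1.0 if predicted == label else 0.0


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO: center and scale each reward by its group's mean and (population) std."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def dapo_token_loss(
    ratios: Sequence[Sequence[float]],  # per-token pi_new/pi_old ratios, one list per response
    advantages: Sequence[float],        # one advantage per response, shared by its tokens
    eps_low: float = 0.2,
    eps_high: float = 0.28,
) -> float:
    """Clipped surrogate averaged over all tokens in the group, negated so lower is better."""
    total, n_tokens = 0.0, 0
    for seq_ratios, adv in zip(ratios, advantages):
        for r in seq_ratios:
            clipped = min(max(r, 1.0 - eps_low), 1.0 + eps_high)
            # Pessimistic (min) of the unclipped and clipped objectives, as in PPO.
            total += min(r * adv, clipped * adv)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("no tokens to average over")
    return -total / n_tokens


if __name__ == "__main__":
    # Four sampled answers to one prompt whose correct label is "A".
    rewards = [binary_reward(p, "A") for p in ["A", "B", "B", "A"]]
    advs = group_advantages(rewards)
    print([round(x, 3) for x in advs])  # [1.0, -1.0, -1.0, 1.0]
    print(dapo_token_loss([[1.0, 1.0, 1.0], [1.0]], [1.0, -1.0]))  # -0.5
```

In the last line, token-level averaging gives -0.5 because the longer response contributes three tokens, whereas averaging each response first and then across responses would give 0. When every response in a group earns the same reward, all advantages are zero and the group contributes no gradient; DAPO's dynamic sampling filters such groups out of the batch.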

## Citation

```bibtex
@article{scijudge2025,
  title={AI Can Learn Scientific Taste},
  year={2025},
  eprint={2603.14473},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2603.14473}
}
```