---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-14B
---

# Paper-Summarizer-Qwen3-14B

A fine-tuned Qwen3-14B model specialized for generating structured summaries of scientific research papers in a standardized JSON format.
## Model Description

This model is part of [Project OSSAS](https://github.com/context-labs/laion-data-explorer), developed in collaboration with LAION and Wynd Labs to democratize access to scientific knowledge by creating structured summaries of research papers at scale.

**Base Model**: Qwen3-14B
**Training Data**: 110,000 curated research papers
**Performance**: 73.9% accuracy on QA evaluation, comparable to GPT-5 (74.6%)
**Cost Efficiency**: 98% lower cost than closed-source alternatives

The model generates a comprehensive structured summary of each paper as a single JSON object. Input texts are classified as SCIENTIFIC_TEXT, PARTIAL_SCIENTIFIC_TEXT, or NON_SCIENTIFIC_TEXT, and key research elements such as methodology, results, claims, and limitations are extracted.

The model supports papers up to 131K tokens.
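Requests whose prompt exceeds the context window will be rejected, so it can be worth pre-checking paper length before submission. A minimal sketch; the ~4-characters-per-token ratio and the reserved output budget are rough assumptions, not figures from this model card — use the model's tokenizer for exact counts:

```python
# Rough guard against exceeding the 131K-token context window.
# Assumes ~4 characters per token (a common heuristic for English text).
MAX_MODEL_TOKENS = 131_072
CHARS_PER_TOKEN = 4  # assumption, not an exact figure

def fits_context(paper_text: str, reserved_tokens: int = 8_192) -> bool:
    """Return True if the paper likely fits alongside the prompt and output budget."""
    budget = (MAX_MODEL_TOKENS - reserved_tokens) * CHARS_PER_TOKEN
    return len(paper_text) <= budget

def truncate_to_context(paper_text: str, reserved_tokens: int = 8_192) -> str:
    """Crudely cut a paper down to the estimated character budget."""
    budget = (MAX_MODEL_TOKENS - reserved_tokens) * CHARS_PER_TOKEN
    return paper_text[:budget]
```

Crude character-based truncation can cut mid-sentence; for production use, truncate at section or paragraph boundaries instead.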
## Usage

### Serving the Model
```bash
vllm serve inference-net/Paper-Summarizer-Qwen3-14B \
  --port 8000 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --data-parallel-size 1 \
  --tensor-parallel-size 1 \
  --max-num-seqs 32 \
  --max-model-len 131072 \
  --max-num-batched-tokens 8192 \
  --gpu-memory-utilization 0.90 \
  --enable-prefix-caching \
  --enable-chunked-prefill
```
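Loading a 14B model can take a while, and vLLM exposes an OpenAI-compatible API, so one way to wait for the server to come up is to poll the `/v1/models` endpoint. A standard-library-only sketch (the helper name and timeouts are illustrative):

```python
import json
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str, timeout_s: float = 300.0) -> bool:
    """Poll /v1/models until the server answers or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
                data = json.load(resp)
                # The served model appears in the "data" list once loading is done.
                return len(data.get("data", [])) > 0
        except (urllib.error.URLError, OSError):
            time.sleep(0.5)
    return False
```

For example, `wait_for_server("http://localhost:8000")` returns `True` once the model above is ready to accept requests.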
### Making Requests
```python
import requests

# System prompt (required)
system_prompt = """[Insert the full system prompt from the prompt.txt file -
see the full prompt in the model repository]"""

# User prompt: the paper text to summarize
paper_text = """
Title: Your Paper Title
Authors: Author 1, Author 2
Abstract: ...
[Full paper content]
"""

# API request
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "inference-net/Paper-Summarizer-Qwen3-14B",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": paper_text}
        ],
        "temperature": 0.2
    },
    timeout=600
)
response.raise_for_status()

result = response.json()
summary = result["choices"][0]["message"]["content"]
print(summary)
```
### System Prompt

The model requires a specific system prompt that defines the JSON schema and extraction instructions. The prompt instructs the model to:

1. **Classify** the text as SCIENTIFIC_TEXT, PARTIAL_SCIENTIFIC_TEXT, or NON_SCIENTIFIC_TEXT
2. **Extract** structured information including:
   - Title, authors, publication year
   - Research context and hypotheses
   - Methodological details
   - Key results with quantitative data
   - Claims with supporting evidence
   - Limitations and ethical considerations

The full system prompt is available in the model repository's `prompt.txt` file.
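Because the system prompt is required on every call, a tiny helper can make it impossible to omit by accident (the function name is ours, not part of the repo):

```python
def build_messages(system_prompt: str, paper_text: str) -> list:
    """Assemble the chat messages; the system prompt must never be omitted."""
    if not system_prompt or not system_prompt.strip():
        raise ValueError("system prompt is required; see prompt.txt in the model repo")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": paper_text},
    ]
```

The returned list can be passed directly as the `messages` field of the chat-completions request shown above.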
### Output Format

The model outputs a single valid JSON object with this structure:
```json
{
  "article_classification": "SCIENTIFIC_TEXT",
  "reason": null,
  "summary": {
    "title": "",
    "authors": "",
    "publication_year": null,
    "field_subfield": "",
    "executive_summary": "",
    "research_context": "",
    "methodological_details": "",
    "key_results": "",
    "claims": [...],
    "contradictions_and_limitations": "",
    ...
  }
}
```
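When consuming the output it is prudent to parse defensively: chat models occasionally wrap JSON in markdown fences, and the classification field is the first thing worth validating. A hedged sketch (the helper name is illustrative, not from the repo):

```python
import json

ALLOWED = {"SCIENTIFIC_TEXT", "PARTIAL_SCIENTIFIC_TEXT", "NON_SCIENTIFIC_TEXT"}

def parse_summary(raw: str) -> dict:
    """Parse the model's response and sanity-check the classification field."""
    text = raw.strip()
    # Strip markdown code fences if the model added them.
    if text.startswith("```"):
        text = text.split("\n", 1)[1]
        text = text.rsplit("```", 1)[0]
    obj = json.loads(text)
    cls = obj.get("article_classification")
    if cls not in ALLOWED:
        raise ValueError(f"unexpected classification: {cls!r}")
    return obj
```

`json.loads` will raise `json.JSONDecodeError` on malformed output, so callers can retry the request rather than silently store a broken summary.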
## Performance

### LLM-as-a-Judge Evaluation
- **Score**: 4.207/5.0
- **Comparison**: Within 15% of GPT-5 (4.805/5.0)

### QA Dataset Evaluation
- **Accuracy**: 73.9%
- **Comparison**: Ties with Gemini 2.5 Flash, nearly matches GPT-5 (74.6%)

### Throughput
- **Requests/sec**: 0.43
- **Input Tokens/sec**: 7,516.54
- **Output Tokens/sec**: 2,588.30
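The throughput figures imply an average request size, which is a useful back-of-the-envelope check when planning batch jobs:

```python
# Averages derived from the reported throughput figures above.
requests_per_sec = 0.43
input_tokens_per_sec = 7_516.54
output_tokens_per_sec = 2_588.30

avg_input_tokens = input_tokens_per_sec / requests_per_sec    # tokens per paper
avg_output_tokens = output_tokens_per_sec / requests_per_sec  # tokens per summary
print(round(avg_input_tokens), round(avg_output_tokens))
```

At a rough 4 characters per token, ~17.5K input tokens corresponds to about 70K characters per paper, broadly consistent with the 81,334-character average reported under Training Details.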
## Training Details

- **Training Set**: 100,000 papers
- **Validation Set**: 10,000 papers
- **Average Paper Length**: 81,334 characters
- **Training Approach**: Post-training on summaries generated by frontier models (GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro)
## Limitations

- May generate subtle factual errors (hallucinations) for fine-grained details
- Context limit (131K tokens) may truncate extremely long documents
- Unified schema may not capture all domain-specific nuances
- Summaries are research aids, not replacements for primary sources in high-stakes scenarios
## Related Resources

- **Paper Visualization Website**: https://laion.inference.net
- **Visualization Repository**: https://github.com/context-labs/laion-data-explorer
- **Alexandria Paper**: https://arxiv.org/abs/2502.19413
- **Nemotron Variant**: inference-net/Paper-Summarizer-Nemotron-12B
## License

Apache 2.0 (see the `license` field in the model card metadata).
## Acknowledgments

This work was made possible through collaboration with:
- LAION
- Wynd Labs
- Inference.net
- Contributors to bethgelab, PeS2o, Common Pile, and OpenAlex