library_name: transformers
---

# XtraGPT: Context-Aware and Controllable Academic Paper Revision for Human-AI Collaboration

<p align="center">
<a href="https://arxiv.org/abs/2505.11336">
<img alt="arXiv" src="https://img.shields.io/badge/arXiv-2505.11336-b31b1b.svg">
</a>
</p>

## Model Overview

**XtraGPT** is a family of open-source large language models (LLMs) designed specifically for **human-AI collaborative academic paper revision**. Unlike general-purpose models, which often perform only surface-level polishing, XtraGPT is fine-tuned to **understand the full context** of a research paper and execute specific, **criteria-guided** revision instructions.

The models were trained on a dataset of 140,000 high-quality instruction-revision pairs derived from top-tier conference (ICLR) papers.

**Key Features:**
* **Context-aware:** Processes the full paper context so that revisions stay consistent with the global narrative.
* **Controllable:** Follows specific user instructions aligned with 20 academic writing criteria across 6 sections (Abstract, Introduction, etc.).
* **Iterative workflow:** Designed to support the "Human-AI Collaborative" (HAC) lifecycle, in which authors retain creative control.

**Available Model Sizes:**
* **1.5B** (based on Qwen/Qwen2.5-1.5B-Instruct)
* **3B** (based on meta-llama/Llama-3.2-3B-Instruct)
* **7B** (based on Qwen/Qwen2.5-7B-Instruct)
* **14B** (based on microsoft/phi-4)

---

## Inference with Transformers

To use XtraGPT with the standard Hugging Face `transformers` library, format your input using the specific tags `<PAPER_CONTENT>`, `<SELECTED_CONTENT>`, and `<QUESTION>`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Select the model size: "XtraGPT-1.5B", "XtraGPT-3B", "XtraGPT-7B", or "XtraGPT-14B"
model_name = "Xtra-Computing/XtraGPT-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Prompt template tailored for XtraGPT
prompt_template = """Act as an expert model for improving articles **PAPER_CONTENT**.
The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.
<PAPER_CONTENT>
{paper_content}
</PAPER_CONTENT>
<SELECTED_CONTENT>
{selected_content}
</SELECTED_CONTENT>
<QUESTION>
{user_question}
</QUESTION>"""

# Example data (from the "Attention Is All You Need" abstract)
paper_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
selected_content = "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
user_question = "help me make it more concise."

# Format the input
formatted_prompt = prompt_template.format(
    paper_content=paper_content,
    selected_content=selected_content,
    user_question=user_question
)

messages = [
    {"role": "user", "content": formatted_prompt}
]

# Apply the chat template
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate (do_sample=True so the low temperature actually takes effect;
# without it, transformers ignores the temperature setting)
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=16384,
    do_sample=True,
    temperature=0.1
)

# Strip the prompt tokens from the output
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```

-----

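The prompt construction above can be factored into a small reusable helper when revising many passages. This is a hypothetical convenience function for illustration, not part of the model repository; it only formats the prompt and does not touch the model:

```python
# Hypothetical helper (not part of the XtraGPT repo): builds the XtraGPT
# user prompt from its three components, using the template from the
# example above.
XTRAGPT_TEMPLATE = """Act as an expert model for improving articles **PAPER_CONTENT**.
The output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.
Focus on clear, concise, and evidence-based improvements that align with the overall context of the paper.
<PAPER_CONTENT>
{paper_content}
</PAPER_CONTENT>
<SELECTED_CONTENT>
{selected_content}
</SELECTED_CONTENT>
<QUESTION>
{user_question}
</QUESTION>"""


def build_xtragpt_prompt(paper_content: str, selected_content: str, user_question: str) -> str:
    """Return the fully formatted XtraGPT user prompt."""
    return XTRAGPT_TEMPLATE.format(
        paper_content=paper_content,
        selected_content=selected_content,
        user_question=user_question,
    )


prompt = build_xtragpt_prompt(
    "Paper text...",
    "Selected sentence.",
    "help me make it more concise.",
)
```

The returned string can be passed directly as the `content` of a single user message, exactly as in the `transformers` example above.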
## Inference with vLLM

XtraGPT is compatible with vLLM for high-throughput inference.

### 1. Launch the Server

Replace `XtraGPT-14B` with your specific model variant.

```bash
python -m vllm.entrypoints.openai.api_server \
    --port 8088 \
    --model Xtra-Computing/XtraGPT-14B \
    --served-model-name xtragpt \
    --max-model-len 16384 \
    --gpu-memory-utilization 0.95
```

### 2. Send a Request (Client Side)

```bash
curl http://127.0.0.1:8088/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "xtragpt",
        "messages": [
            {
                "role": "user",
                "content": "Please improve the selected content based on the following. Act as an expert model for improving articles **PAPER_CONTENT**.\nThe output needs to answer the **QUESTION** on **SELECTED_CONTENT** in the input. Avoid adding unnecessary length, unrelated details, overclaims, or vague statements.\nFocus on clear, concise, and evidence-based improvements that align with the overall context of the paper.\n<PAPER_CONTENT>\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.\n</PAPER_CONTENT>\n<SELECTED_CONTENT>\nThe dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration.\n</SELECTED_CONTENT>\n<QUESTION>\nhelp me make it more concise.\n</QUESTION>"
            }
        ],
        "temperature": 0.1,
        "max_tokens": 16384,
        "stream": false
    }'
```

Note that the OpenAI-compatible endpoint expects `max_tokens` (not `max_new_tokens`) to cap the completion length.

-----

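The same request can also be issued from Python. Below is a minimal sketch using only the standard library, assuming the server launched above is listening on `127.0.0.1:8088`; the `build_request` and `send_request` helpers are illustrative, not part of the repository:

```python
import json
import urllib.request


def build_request(prompt: str, model: str = "xtragpt") -> dict:
    """Build an OpenAI-compatible chat-completions payload for the vLLM server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.1,
        "max_tokens": 16384,
        "stream": False,
    }


def send_request(payload: dict, url: str = "http://127.0.0.1:8088/v1/chat/completions") -> dict:
    """POST the payload to the server and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = build_request("help me make it more concise.")
# result = send_request(payload)  # uncomment once the server is running
```

The revised text would then be available at `result["choices"][0]["message"]["content"]`, following the standard OpenAI chat-completions response shape.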
## Model License

This model is released under the **ModelGo Zero License 2.0 (MG0-2.0)**.

MG0-2.0 is a highly permissive open model license designed to encourage the widest possible adoption and collaboration. It allows unrestricted use, reproduction, distribution, and creation of derivative works, including for commercial purposes, without requiring attribution or imposing copyleft restrictions.

For more details on the license terms, please visit [ModelGo.li](https://www.modelgo.li/) or refer to the `LICENSE` file in the repository.

-----

176
+
177
+ If you use XtraGPT in your research, please cite our paper:
178
+
179
  ```
180
  @misc{chen2025xtragpt,
181
  title={XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision},