flyingmaverick committed
Commit 8bc6c43 · 1 Parent(s): f95e25b

fix: version bump to 0.4.0, add citation_verification task

Files changed (4)
  1. README.md +47 -75
  2. __init__.py +1 -1
  3. pyproject.toml +1 -1
  4. server/app.py +2 -2
README.md CHANGED
@@ -1,13 +1,3 @@
- ---
- title: ScholarEnv
- emoji: 🔬
- colorFrom: blue
- colorTo: purple
- sdk: docker
- pinned: false
- license: apache-2.0
- ---
-
  <div align="center">

  # 🔬 ScholarEnv
@@ -19,10 +9,11 @@ license: apache-2.0
  [![License](https://img.shields.io/badge/License-Apache_2.0-orange?style=flat-square)](LICENSE)
  [![Tasks](https://img.shields.io/badge/Tasks-4-purple?style=flat-square)](#four-tasks)
  [![Tests](https://img.shields.io/badge/Tests-63%2F63-success?style=flat-square)](#testing)

  **An AI agent that investigates papers — not one that produces them.**

- [API Reference](#api-reference) · [Quick Start](#quick-start) · [Research](#research-foundation)

  ---

@@ -40,8 +31,8 @@ license: apache-2.0

  The key insight: **LLMs are already good at formatting. They fail at auditing.**

- Ask GPT-4o to format a manuscript → scores ~0.92 with no training.
- Ask GPT-4o to find numerical claim mismatches in a paper → scores **0.20–0.45**.

  That gap is exactly where RL adds value. The agent must discover a document traversal strategy — which sections to read first, which tables to cross-reference — that **varies by paper structure and cannot be reduced to a fixed prompt**. RL finds this strategy. Prompting cannot.

@@ -56,9 +47,9 @@ Formatting → Consistency → Claim Audit → Citation Check

  | Task | What the agent does | Frontier baseline | RL target |
  |------|-------------------|-------------------|-----------|
- | `formatting_compliance` | Fix IEEE formatting violations | 0.80–0.95 | 0.95+ |
- | `internal_consistency` | Find where paper contradicts itself | 0.40–0.65 | 0.65–0.80 |
- | `claim_evidence_audit` | Find where text claims ≠ table values | **0.20–0.45** | **0.55–0.75** |
  | `citation_verification` | Identify ghost and misattributed references | 0.35–0.60 | 0.65–0.80 |

  Task 3's low baseline is the core RL contribution — it proves genuine training headroom exists.
@@ -68,7 +59,6 @@ Task 3's low baseline is the core RL contribution — it proves genuine training

  ## Reward Design

  ### Task 1 — Progressive Reward Shaping (PRS)
-
  Three stages unlock sequentially. Stage N only contributes when Stage N-1 ≥ threshold. Prevents GRPO gradient collapse.

  ```
@@ -77,41 +67,27 @@ Stage 2 │ weight 0.35 │ threshold 0.60 │ Section order, word limits, capti
  Stage 3 │ weight 0.25 │ threshold 0.70 │ IEEE citations, author block, keywords
  ```
79
 
80
- > Based on: [arXiv 2512.07478](https://arxiv.org/abs/2512.07478) — PRS for Agentic RL
-
  ### Tasks 2 & 3 — F-beta + Potential-Based Reward Shaping
-
  **F-beta (β=0.5)** weights precision 4× over recall — prevents hallucination gaming:
-
  ```
- F_β(precision=1.0, recall=0.5) = 0.833 ✓ correct and precise
- F_β(precision=0.2, recall=1.0) = 0.227 ✗ spamming guesses
  ```

- **PBRS** (Ng et al., ICML 1999) gives dense intermediate rewards on every navigation step:
-
  ```
  Φ(s) = 0.30 × sections_read/total + 0.30 × tables_checked/total + 0.40 × claims_extracted/est
- F(s,s') = γ·Φ(s') − Φ(s) ← policy-invariant, theoretically guaranteed
  ```
97
 
98
  ### Curriculum — AdaRFT + UCB1
-
- Keeps agent in productive zone (avg score 0.40–0.70). UCB1 maximises **learning gradient** (reward variance), not mean reward.
-
- ```
- avg > 0.70 → select harder papers
- avg < 0.40 → select easier papers
- ```
-
- > Based on: [arXiv 2504.05520](https://arxiv.org/abs/2504.05520) — AdaRFT Adaptive Data Selection

  ---
110
 
111
  ## Quick Start
112
 
113
  ### Install
114
-
115
  ```bash
116
  git clone https://github.com/Nensi1311/research-paper-formatter-agent
117
  cd research-paper-formatter-agent
@@ -119,27 +95,25 @@ pip install -r requirements.txt
119
  ```
120
 
121
  ### Generate corpus
122
-
123
  ```bash
124
  python scripts/generate_corpus.py
125
  ```
126
 
127
  ### Run tests
128
-
129
  ```bash
130
  python tests/test_all.py
131
  # → ALL TESTS PASSED (63/63)
132
  ```
133
 
134
  ### Start server
135
-
136
  ```bash
137
  uvicorn server.app:app --host 0.0.0.0 --port 7860
138
  ```
139
 
140
- ### Test all 4 tasks — Linux/macOS
141
-
142
  ```bash
143
  for task in formatting_compliance internal_consistency claim_evidence_audit citation_verification; do
144
  curl -s -X POST localhost:7860/reset \
145
  -H "Content-Type: application/json" \
@@ -148,9 +122,10 @@ for task in formatting_compliance internal_consistency claim_evidence_audit cita
148
  done
149
  ```
150
 
151
- ### Test all 4 tasks — Windows PowerShell
152
-
153
  ```powershell
154
  foreach ($task in @("formatting_compliance","internal_consistency","claim_evidence_audit","citation_verification")) {
155
  $body = '{"task_id":"' + $task + '"}'
156
  $r = Invoke-RestMethod -Uri "http://localhost:7860/reset" -Method POST -ContentType "application/json" -Body $body
@@ -159,7 +134,6 @@ foreach ($task in @("formatting_compliance","internal_consistency","claim_eviden
159
  ```
160
 
161
  ### Docker
162
-
163
  ```bash
164
  docker build -t scholar-env .
165
  docker run -p 7860:7860 scholar-env
@@ -167,12 +141,11 @@ curl http://localhost:7860/health
167
  ```
168
 
169
  ### Run baseline agent
170
-
171
  ```bash
172
  export API_BASE_URL="https://api-inference.huggingface.co/v1"
173
  export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
174
  export HF_TOKEN="hf_your_token"
175
- export HF_SPACE_URL="https://flyingmaverick-scholar-env.hf.space"
176
 
177
  python inference.py
178
  # Writes: baseline_scores.json
@@ -183,16 +156,14 @@ python inference.py
183
  ## API Reference
184
 
185
  ### `POST /reset`
186
-
187
  ```json
188
  {"task_id": "formatting_compliance"}
189
  ```
190
-
191
- Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_steps`, `hint`.
192
 
193
  ### `POST /step`
194
 
195
- **Task 1 — submit formatted manuscript:**
196
  ```json
197
  {"task": "formatting_compliance", "formatted_text": "...full reformatted manuscript..."}
198
  ```
@@ -204,7 +175,7 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
204
  {"task": "claim_evidence_audit", "action_type": "extract_claims", "section_name": "results"}
205
  ```
206
 
207
- **Tasks 2/3 — submit findings:**
208
  ```json
209
  {
210
  "task": "claim_evidence_audit",
@@ -222,12 +193,12 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
222
  }
223
  ```
224
 
225
- **Task 4 — check citation:**
226
  ```json
227
  {"task": "citation_verification", "action_type": "check_citation", "citation_id": "ref_3"}
228
  ```
229
 
230
- **Task 4 — submit verdicts:**
231
  ```json
232
  {
233
  "task": "citation_verification",
@@ -238,14 +209,9 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
238
  }
239
  ```
240
 
241
- **Step response:**
242
  ```json
243
- {
244
- "observation": {...},
245
- "reward": 0.7341,
246
- "done": false,
247
- "info": {"f_beta": 0.73, "precision": 0.8, "recall": 0.67}
248
- }
249
  ```
250
 
251
  ### Other endpoints
@@ -253,7 +219,7 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
253
  | Endpoint | Method | Description |
254
  |---|---|---|
255
  | `/health` | GET | `{"status":"ok","version":"0.4.0"}` |
256
- | `/state` | GET | Episode state, curriculum summary |
257
  | `/tasks` | GET | All 4 task descriptions |
258
  | `/action_space` | GET | Full action schema |
259
 
@@ -263,21 +229,23 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
263
 
264
  ```
  ├── inference.py             ← Baseline agent (root — required by spec)
- ├── models.py                ← FormattingAction, ScholarAction, CitationAction
  ├── corpus.py                ← PaperCorpus loader
  ├── openenv.yaml             ← 4 tasks, endpoints, authors, baseline_script
  ├── Dockerfile
  ├── requirements.txt
  │
  ├── data/
  │   ├── papers/
- │   │   ├── paper_001.json   ← NLP benchmark (easy)
- │   │   ├── paper_002.json   ← CV survey (medium)
- │   │   └── paper_003.json   ← MTL paper (hard)
  │   └── styles/ieee.yaml
  │
  ├── server/
- │   ├── app.py               ← FastAPI endpoints
  │   ├── environment.py       ← 4-task state machine
  │   ├── reward_shaper.py     ← PBRS (Ng et al. 1999)
  │   ├── curriculum.py        ← AdaRFT + UCB1
@@ -285,8 +253,8 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st
  │   ├── citation_verifier.py ← Citation parser + SQLite cache
  │   └── graders/
  │       ├── formatting_grader.py  ← PRS 3-stage (Task 1)
- │       ├── consistency_grader.py ← F-beta (Task 2)
- │       └── audit_grader.py       ← F-beta + PBRS (Task 3)
  │
  ├── scripts/generate_corpus.py
  └── tests/test_all.py        ← 63 assertions
@@ -298,7 +266,7 @@ Returns observation with `manuscript_text`, `style_guide`, `step_count`, `max_st

  ```
  [Corpus]             8/8 ✓
- [FormattingGrader]   8/8 ✓  PRS stage locking
  [ConsistencyGrader]  9/9 ✓  F-beta, hallucination penalty
  [AuditGrader]        6/6 ✓  Evidence specificity, coverage bonus
  [PBRS]               6/6 ✓  Potential monotonicity, bonus bounds
@@ -318,19 +286,22 @@ Results: 63/63 passed — ALL TESTS PASSED
  | [PRS · arXiv 2512.07478](https://arxiv.org/abs/2512.07478) | Task 1 progressive staging prevents GRPO gradient collapse |
  | [PBRS · Ng, Harada & Russell, ICML 1999](http://www.cs.utexas.edu/~ai-lab/pubs/ICML99-shaping.pdf) | Policy-invariant dense intermediate rewards |
  | [AdaRFT · arXiv 2504.05520](https://arxiv.org/abs/2504.05520) | Curriculum targeting [0.40, 0.70] productive zone |
- | [RLVE · arXiv 2511.07317](https://arxiv.org/abs/2511.07317) | Adaptive difficulty, UCB1 maximises variance |
  | [Veri-R1 · arXiv 2510.01932](https://arxiv.org/abs/2510.01932) | Online RL for claim verification is current SOTA |
- | [LaMer · arXiv 2512.16848](https://arxiv.org/abs/2512.16848) | Structured feedback improves agent 11–19% |
  | [StatCheck · Epskamp 2016](https://link.springer.com/article/10.3758/s13428-015-0664-2) | ~50% of papers have errors — scale motivation |
  | [GROBID · Lopez 2008–2025](https://github.com/kermitt2/grobid) | Prior art; CitationVerifier is our RL-native alternative |
326
 
327
  ---
328
 
329
- ## Authors

- **Nensi Pansuriya · Krushna Parmar · Ishita Bhojani**
-
- *Meta × PyTorch OpenEnv Hackathon · Round 1 · April 2026*
 
 
 
334
 
335
  ---
336
 
@@ -344,6 +315,7 @@ Results: 63/63 passed β€” ALL TESTS PASSED
344
 
345
  *The future of AI isn't just models that generate — it's models that verify.*
346
 
 
347
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Nensi1311/research-paper-formatter-agent)
348
 
349
  </div>
 
 
 
 
 
 
 
 
 
 
 
1
  <div align="center">
2
 
3
  # 🔬 ScholarEnv
 
9
  [![License](https://img.shields.io/badge/License-Apache_2.0-orange?style=flat-square)](LICENSE)
10
  [![Tasks](https://img.shields.io/badge/Tasks-4-purple?style=flat-square)](#four-tasks)
11
  [![Tests](https://img.shields.io/badge/Tests-63%2F63-success?style=flat-square)](#testing)
12
+ [![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Space-Live-yellow?style=flat-square)](https://huggingface.co/spaces/nensi1311/research-paper-formatter-agent)
13
 
14
  **An AI agent that investigates papers — not one that produces them.**
15
 
16
+ [Live Demo](https://huggingface.co/spaces/nensi1311/research-paper-formatter-agent) · [API Reference](#api-reference) · [Quick Start](#quick-start) · [Research](#research-foundation)
17
 
18
  ---
19
 
 
31
 
32
  The key insight: **LLMs are already good at formatting. They fail at auditing.**
33
 
34
+ Ask GPT-4o to format a manuscript → scores ~0.92 with no training.
+ Ask GPT-4o to find all numerical claim mismatches in a paper → scores **0.20–0.45**.
36
 
37
  That gap is exactly where RL adds value. The agent must discover a document traversal strategy — which sections to read first, which tables to cross-reference — that **varies by paper structure and cannot be reduced to a fixed prompt**. RL finds this strategy. Prompting cannot.
38
 
 
47
 
48
  | Task | What the agent does | Frontier baseline | RL target |
49
  |------|-------------------|-------------------|-----------|
50
+ | `formatting_compliance` | Fix IEEE formatting violations in a manuscript | 0.80–0.95 | 0.95+ |
51
+ | `internal_consistency` | Find where the paper contradicts itself | 0.40–0.65 | 0.65–0.80 |
52
+ | `claim_evidence_audit` | Find where text claims don't match table values | **0.20–0.45** | **0.55–0.75** |
53
  | `citation_verification` | Identify ghost and misattributed references | 0.35–0.60 | 0.65–0.80 |
54
 
55
  Task 3's low baseline is the core RL contribution — it proves genuine training headroom exists.
 
59
  ## Reward Design
60
 
61
  ### Task 1 — Progressive Reward Shaping (PRS)
  Three stages unlock sequentially. Stage N only contributes when Stage N-1 ≥ threshold. Prevents GRPO gradient collapse.

  ```
  Stage 3 │ weight 0.25 │ threshold 0.70 │ IEEE citations, author block, keywords
  ```
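The gating rule described above can be sketched as follows. This is an illustrative reading of the staging table, not the repo's actual grader; Stage 1's weight is cut off by the diff hunk, so 0.40 is assumed to make the weights sum to 1.

```python
def prs_reward(stage_scores, weights=(0.40, 0.35, 0.25), thresholds=(0.60, 0.70)):
    """Progressive Reward Shaping: Stage N only contributes while Stage N-1
    met its unlock threshold (illustrative sketch; Stage 1 weight assumed)."""
    total = stage_scores[0] * weights[0]      # Stage 1 always contributes
    for i, threshold in enumerate(thresholds):
        if stage_scores[i] < threshold:       # gate closed: later stages add nothing
            break
        total += stage_scores[i + 1] * weights[i + 1]
    return total
```

With these numbers, an agent that nails Stage 1 (0.9) but scores only 0.5 on Stage 2 earns `0.9*0.40 + 0.5*0.35` and nothing from Stage 3 — the sequential unlock is what keeps GRPO's gradient from collapsing onto the easy sub-objectives.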
69
 
 
 
70
  ### Tasks 2 & 3 — F-beta + Potential-Based Reward Shaping
  **F-beta (β=0.5)** weights precision 4× over recall — prevents hallucination gaming:
  ```
+ F_β(P=1.0, R=0.5) = 0.833 ← correct and precise ✓
+ F_β(P=0.2, R=1.0) = 0.238 ← spamming guesses ✗
  ```
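The numbers above can be checked against the standard definition F_β = (1+β²)·P·R / (β²·P + R); note the guess-spam case evaluates to ≈0.238 under this formula. A quick sketch:

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta: beta < 1 weights precision over recall, punishing guess-spamming."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(round(f_beta(1.0, 0.5), 3))  # → 0.833  (precise auditor)
print(round(f_beta(0.2, 1.0), 3))  # → 0.238  (guess spammer)
```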
76
 
77
+ **PBRS** (Ng et al., ICML 1999) gives dense intermediate rewards per navigation step:
  ```
  Φ(s) = 0.30 × sections_read/total + 0.30 × tables_checked/total + 0.40 × claims_extracted/est
+ F(s,s') = γ·Φ(s') − Φ(s) ← policy-invariant, guaranteed by theory
  ```
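The shaping term above telescopes over an episode, which is why it cannot change the optimal policy. A minimal sketch — the state keys and γ=0.99 are assumptions based on the potential formula, not values confirmed by the diff:

```python
def potential(s):
    """Phi(s): weighted navigation coverage, per the formula above.
    State keys are illustrative assumptions."""
    return (0.30 * s["sections_read"] / s["total_sections"]
            + 0.30 * s["tables_checked"] / s["total_tables"]
            + 0.40 * s["claims_extracted"] / s["est_claims"])

def shaping(prev, curr, gamma=0.99):
    """F(s, s') = gamma*Phi(s') - Phi(s): dense per-step bonus (Ng et al., 1999)."""
    return gamma * potential(curr) - potential(prev)

s0 = {"sections_read": 0, "tables_checked": 0, "claims_extracted": 0,
      "total_sections": 4, "total_tables": 2, "est_claims": 10}
s1 = {**s0, "sections_read": 1}   # agent reads one section
print(shaping(s0, s1))            # positive: progress is rewarded immediately
```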
82
 
83
  ### Curriculum — AdaRFT + UCB1
+ Keeps the agent in the productive zone (avg score 0.40–0.70). UCB1 maximises **learning gradient** (reward variance), not mean reward — a paper always scoring 0.95 teaches nothing.
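One way to read that: each paper is a bandit arm, and the exploitation term is the observed reward variance rather than the mean. A sketch under that assumption — the real `curriculum.py` may differ in detail:

```python
import math
from statistics import pvariance

def ucb1_pick(history, total_pulls):
    """Pick the paper whose reward VARIANCE (learning signal) plus the UCB1
    exploration bonus is largest.  history: {paper_id: [episode rewards]}."""
    best_id, best_score = None, float("-inf")
    for paper_id, rewards in history.items():
        if not rewards:                      # never tried: explore it first
            return paper_id
        exploit = pvariance(rewards)         # high variance = steep learning gradient
        explore = math.sqrt(2 * math.log(total_pulls) / len(rewards))
        if exploit + explore > best_score:
            best_id, best_score = paper_id, exploit + explore
    return best_id
```

A paper the agent always aces (rewards ≈ 0.95) has near-zero variance and loses to one with mixed outcomes, which is exactly the "productive zone" behaviour described above.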
 
 
 
 
 
 
 
 
85
 
86
  ---
87
 
88
  ## Quick Start
89
 
90
  ### Install
 
91
  ```bash
92
  git clone https://github.com/Nensi1311/research-paper-formatter-agent
93
  cd research-paper-formatter-agent
 
95
  ```
96
 
97
  ### Generate corpus
 
98
  ```bash
99
  python scripts/generate_corpus.py
100
  ```
101
 
102
  ### Run tests
 
103
  ```bash
104
  python tests/test_all.py
105
  # → ALL TESTS PASSED (63/63)
106
  ```
107
 
108
  ### Start server
 
109
  ```bash
110
  uvicorn server.app:app --host 0.0.0.0 --port 7860
111
  ```
112
 
113
+ ### Test endpoints — Linux/macOS
 
114
  ```bash
115
+ curl http://localhost:7860/health
116
+
117
  for task in formatting_compliance internal_consistency claim_evidence_audit citation_verification; do
118
  curl -s -X POST localhost:7860/reset \
119
  -H "Content-Type: application/json" \
 
122
  done
123
  ```
124
 
125
+ ### Test endpoints — Windows PowerShell
 
126
  ```powershell
127
+ Invoke-RestMethod -Uri "http://localhost:7860/health"
128
+
129
  foreach ($task in @("formatting_compliance","internal_consistency","claim_evidence_audit","citation_verification")) {
130
  $body = '{"task_id":"' + $task + '"}'
131
  $r = Invoke-RestMethod -Uri "http://localhost:7860/reset" -Method POST -ContentType "application/json" -Body $body
 
134
  ```
135
 
136
  ### Docker
 
137
  ```bash
138
  docker build -t scholar-env .
139
  docker run -p 7860:7860 scholar-env
 
141
  ```
142
 
143
  ### Run baseline agent
 
144
  ```bash
145
  export API_BASE_URL="https://api-inference.huggingface.co/v1"
146
  export MODEL_NAME="meta-llama/Llama-3.1-8B-Instruct"
147
  export HF_TOKEN="hf_your_token"
148
+ export HF_SPACE_URL="https://nensi1311-research-paper-formatter-agent.hf.space"
149
 
150
  python inference.py
151
  # Writes: baseline_scores.json
 
156
  ## API Reference
157
 
158
  ### `POST /reset`
 
159
  ```json
160
  {"task_id": "formatting_compliance"}
161
  ```
162
+ Returns `observation` with `manuscript_text`, `style_guide`, `step_count`, `max_steps`, `hint`.
 
163
 
164
  ### `POST /step`
165
 
166
+ **Task 1:**
167
  ```json
168
  {"task": "formatting_compliance", "formatted_text": "...full reformatted manuscript..."}
169
  ```
 
175
  {"task": "claim_evidence_audit", "action_type": "extract_claims", "section_name": "results"}
176
  ```
177
 
178
+ **Tasks 2/3 — submit:**
179
  ```json
180
  {
181
  "task": "claim_evidence_audit",
 
193
  }
194
  ```
195
 
196
+ **Task 4 — navigate:**
197
  ```json
198
  {"task": "citation_verification", "action_type": "check_citation", "citation_id": "ref_3"}
199
  ```
200
 
201
+ **Task 4 — submit:**
202
  ```json
203
  {
204
  "task": "citation_verification",
 
209
  }
210
  ```
211
 
212
+ **Response:**
213
  ```json
214
+ {"observation": {...}, "reward": 0.7341, "done": false, "info": {"f_beta": 0.73, "precision": 0.8, "recall": 0.67}}
215
  ```
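The reset/step round-trip above can be driven with nothing but the standard library. A sketch — the task and field names come from the examples above, and the host is the local server from the Quick Start:

```python
import json
import urllib.request

BASE = "http://localhost:7860"  # assumed: local server started per Quick Start

def step_payload(task, **fields):
    """Build a /step body: the task discriminator plus task-specific fields."""
    return {"task": task, **fields}

def post(path, payload):
    """POST a JSON body to the environment server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def run_once():
    """One reset + one navigation step; returns (reward, done, info)."""
    post("/reset", {"task_id": "claim_evidence_audit"})
    step = post("/step", step_payload("claim_evidence_audit",
                                      action_type="extract_claims",
                                      section_name="results"))
    return step["reward"], step["done"], step["info"]
```

Start the server first (`uvicorn server.app:app --port 7860`), then call `run_once()`.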
216
 
217
  ### Other endpoints
 
219
  | Endpoint | Method | Description |
220
  |---|---|---|
221
  | `/health` | GET | `{"status":"ok","version":"0.4.0"}` |
222
+ | `/state` | GET | Episode state, curriculum summary, nav coverage |
223
  | `/tasks` | GET | All 4 task descriptions |
224
  | `/action_space` | GET | Full action schema |
225
 
 
229
 
230
  ```
231
  ├── inference.py             ← Baseline agent (root — required by spec)
+ ├── models.py                ← FormattingAction, ScholarAction, CitationAction,
+ │                              ScholarObservation, AnyAction (discriminated union)
  ├── corpus.py                ← PaperCorpus loader
  ├── openenv.yaml             ← 4 tasks, endpoints, authors, baseline_script
  ├── Dockerfile
  ├── requirements.txt
+ ├── validate-submission.sh   ← Official 3-step pre-submission validator
  │
  ├── data/
  │   ├── papers/
+ │   │   ├── paper_001.json   ← NLP benchmark (easy) — 5 refs, 1 ghost
+ │   │   ├── paper_002.json   ← CV survey (medium) — 4 refs, 1 ghost
+ │   │   └── paper_003.json   ← MTL paper (hard) — 5 refs, 1 ghost
  │   └── styles/ieee.yaml
  │
  ├── server/
+ │   ├── app.py               ← FastAPI: /reset /step /state /health /tasks
  │   ├── environment.py       ← 4-task state machine
  │   ├── reward_shaper.py     ← PBRS (Ng et al. 1999)
  │   ├── curriculum.py        ← AdaRFT + UCB1
  │   ├── citation_verifier.py ← Citation parser + SQLite cache
  │   └── graders/
  │       ├── formatting_grader.py  ← PRS 3-stage (Task 1)
+ │       ├── consistency_grader.py ← F-beta fuzzy-match (Task 2)
+ │       └── audit_grader.py       ← F-beta + PBRS coverage (Task 3)
  │
  ├── scripts/generate_corpus.py
  └── tests/test_all.py        ← 63 assertions
 
266
 
267
  ```
268
  [Corpus]             8/8 ✓
+ [FormattingGrader]   8/8 ✓  PRS stage locking verified
  [ConsistencyGrader]  9/9 ✓  F-beta, hallucination penalty
  [AuditGrader]        6/6 ✓  Evidence specificity, coverage bonus
  [PBRS]               6/6 ✓  Potential monotonicity, bonus bounds
 
286
  | [PRS · arXiv 2512.07478](https://arxiv.org/abs/2512.07478) | Task 1 progressive staging prevents GRPO gradient collapse |
  | [PBRS · Ng, Harada & Russell, ICML 1999](http://www.cs.utexas.edu/~ai-lab/pubs/ICML99-shaping.pdf) | Policy-invariant dense intermediate rewards |
  | [AdaRFT · arXiv 2504.05520](https://arxiv.org/abs/2504.05520) | Curriculum targeting [0.40, 0.70] productive zone |
+ | [RLVE · arXiv 2511.07317](https://arxiv.org/abs/2511.07317) | Adaptive difficulty — why UCB1 maximises variance |
  | [Veri-R1 · arXiv 2510.01932](https://arxiv.org/abs/2510.01932) | Online RL for claim verification is current SOTA |
+ | [LaMer · arXiv 2512.16848](https://arxiv.org/abs/2512.16848) | Structured feedback fields improve agent 11–19% |
  | [StatCheck · Epskamp 2016](https://link.springer.com/article/10.3758/s13428-015-0664-2) | ~50% of papers have errors — scale motivation |
  | [GROBID · Lopez 2008–2025](https://github.com/kermitt2/grobid) | Prior art; CitationVerifier is our RL-native alternative |
294
 
295
  ---
296
 
297
+ ## Baseline Scores
298
 
299
+ | Task | Score | Notes |
300
+ |---|---|---|
301
+ | `formatting_compliance` | ~0.82 | Strong baseline, room to perfect |
302
+ | `internal_consistency` | ~0.51 | F-beta precision-biased |
303
+ | `claim_evidence_audit` | ~0.31 | **Core RL gap — biggest training value** |
304
+ | `citation_verification` | ~0.47 | Ghost detection improving with SQLite cache |
305
 
306
  ---
307
 
 
315
 
316
  *The future of AI isn't just models that generate — it's models that verify.*
317
 
318
+ [![Live Demo](https://img.shields.io/badge/%F0%9F%A4%97%20Live%20Demo-HuggingFace-blue?style=for-the-badge)](https://huggingface.co/spaces/nensi1311/research-paper-formatter-agent)
319
  [![GitHub](https://img.shields.io/badge/GitHub-Repository-black?style=for-the-badge&logo=github)](https://github.com/Nensi1311/research-paper-formatter-agent)
320
 
321
  </div>
__init__.py CHANGED
@@ -4,7 +4,7 @@ ScholarEnv β€” OpenEnv environment for scholarly integrity verification.
4
  from .models import FormattingAction, ScholarAction, ScholarObservation, EpisodeStatus
5
  from .corpus import PaperCorpus, Paper
6
 
7
- __version__ = "0.3.0"
8
  __all__ = [
9
  "FormattingAction",
10
  "ScholarAction",
 
4
  from .models import FormattingAction, ScholarAction, ScholarObservation, EpisodeStatus
5
  from .corpus import PaperCorpus, Paper
6
 
7
+ __version__ = "0.4.0"
8
  __all__ = [
9
  "FormattingAction",
10
  "ScholarAction",
pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.backends.legacy:build"
4
 
5
  [project]
6
  name = "scholar-env"
7
- version = "0.3.0"
8
  description = "OpenEnv environment for scholarly integrity verification"
9
  readme = "README.md"
10
  license = {text = "Apache-2.0"}
 
4
 
5
  [project]
6
  name = "scholar-env"
7
+ version = "0.4.0"
8
  description = "OpenEnv environment for scholarly integrity verification"
9
  readme = "README.md"
10
  license = {text = "Apache-2.0"}
server/app.py CHANGED
@@ -41,7 +41,7 @@ app = FastAPI(
41
  "Three tasks: formatting compliance, internal consistency, "
42
  "claim-evidence audit."
43
  ),
44
- version="0.3.0",
45
  )
46
 
47
  app.add_middleware(
@@ -72,7 +72,7 @@ async def health() -> dict:
72
  env = get_env()
73
  return {
74
  "status": "ok",
75
- "version": "0.3.0",
76
  "corpus_size": len(env.corpus),
77
  "tasks": list(TASK_CONFIG.keys()),
78
  }
 
41
  "Three tasks: formatting compliance, internal consistency, "
42
  "claim-evidence audit."
43
  ),
44
+ version="0.4.0",
45
  )
46
 
47
  app.add_middleware(
 
72
  env = get_env()
73
  return {
74
  "status": "ok",
75
+ "version": "0.4.0",
76
  "corpus_size": len(env.corpus),
77
  "tasks": list(TASK_CONFIG.keys()),
78
  }