PaperAudit Qwen3 8B (SFT + RL)

Model Overview

PaperAudit_Qwen3_8B_sft_rl is a medium-scale model trained specifically for academic paper error detection and automated review tasks. It is based on Qwen3 8B and was optimized through Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (RLHF).

Model Information

  • Base Model: Qwen3 8B
  • Model Parameters: ~8 billion parameters
  • Training Method: Supervised Fine-Tuning (SFT) + Reinforcement Learning from Human Feedback (RLHF)
  • Model Architecture: Qwen3ForCausalLM
  • Context Length: 40,960 tokens
  • Data Type: bfloat16

Model Features

  • Balanced Performance: 8B parameter scale, achieving a good balance between performance and efficiency
  • Specialized Optimization: Specifically optimized for academic paper error detection and review tasks
  • Reinforcement Learning: Aligned with human preferences through RLHF to improve review quality and error detection accuracy
  • Long Context Support: Supports 40K tokens context length, suitable for processing complete academic papers

Training Data

This model is trained on the PaperAudit_Dataset, which includes:

  • Academic papers downloaded from OpenReview
  • Structured paper content (processed via LlamaParse and LLM)
  • Synthetic error data for training error detection models
  • Human review feedback data

For more details about the dataset, please visit: https://huggingface.co/datasets/mayiwen/PaperAudit_Dataset

Usage

Install Dependencies

pip install transformers torch accelerate

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "mayiwen/PaperAudit_Qwen3_8B_sft_rl"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Inference Example

# Prepare input (paper error detection task)
prompt = """Please detect errors in the following academic paper paragraph:

[Paper content...]

Please identify errors and provide correction suggestions."""

# Qwen3 is a chat model: format the request with its chat template
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Encode input
inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # Qwen tokenizers may lack a pad token
    )

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(response)
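The decoded response is free text; for downstream tooling it usually helps to parse it into structured findings. The numbered "Error:" / "Suggestion:" format below is an illustrative assumption, not the model's documented output format — adapt the pattern to whatever style this fine-tune actually produces.

```python
import re

def parse_findings(response: str) -> list[tuple[str, str]]:
    """Extract (error, suggestion) pairs from a numbered review response.

    Assumes a hypothetical format like:
        1. Error: ...
        Suggestion: ...
    """
    pattern = re.compile(
        r"\d+\.\s*Error:\s*(?P<error>.+?)\s*Suggestion:\s*(?P<suggestion>.+?)(?=\n\d+\.|\Z)",
        re.DOTALL,
    )
    return [
        (m["error"].strip(), m["suggestion"].strip())
        for m in pattern.finditer(response)
    ]

sample = """1. Error: The sample size is misreported in Table 2.
Suggestion: Verify n=120 against Section 3.1.
2. Error: Equation (4) drops a factor of 1/2.
Suggestion: Re-derive from Equation (3)."""

findings = parse_findings(sample)
# findings[0] -> ("The sample size is misreported in Table 2.",
#                 "Verify n=120 against Section 3.1.")
```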

Application Scenarios

  • Academic paper error detection
  • Automated paper review
  • Academic writing quality assessment
  • Paper content analysis and feedback generation
  • Academic review assistant tools

Model Architecture Details

  • Hidden Size: 4096
  • Intermediate Size: 12288
  • Number of Attention Heads: 32
  • Number of Key-Value Heads: 8 (Grouped Query Attention)
  • Number of Hidden Layers: 36
  • Vocabulary Size: 151,936
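The numbers above can be sanity-checked with a back-of-the-envelope parameter count. This sketch ignores layer norms and Qwen3's per-head q/k norms, and assumes untied input/output embeddings, so it slightly underestimates the exact total:

```python
# Rough parameter count from the architecture table above
hidden, inter, layers = 4096, 12288, 36
heads, kv_heads, vocab = 32, 8, 151_936
head_dim = hidden // heads  # 128

embed = vocab * hidden                       # token embedding matrix
attn = (hidden * heads * head_dim            # q_proj
        + 2 * hidden * kv_heads * head_dim   # k_proj + v_proj (GQA: 8 KV heads)
        + heads * head_dim * hidden)         # o_proj
mlp = 3 * hidden * inter                     # gate, up, down projections
total = layers * (attn + mlp) + 2 * embed    # + untied lm_head

print(f"~{total / 1e9:.2f}B parameters")         # ~8.19B
print(f"~{total * 2 / 1e9:.1f} GB in bfloat16")  # weights alone, 2 bytes/param
```

The weights alone come to roughly 16 GB in bfloat16, which is where the memory guidance in the Notes section comes from.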

Performance Advantages

Compared to the 3B variant, the 8B model handles complex paper-analysis tasks better. It can:

  • More accurately identify subtle academic errors
  • Provide more detailed and professional review comments
  • Better understand academic writing norms and standards

Notes

  • This model is specifically optimized for academic paper review tasks and may require further fine-tuning for other domains
  • It is recommended to use bfloat16 precision to save memory and improve inference speed
  • For long document processing, appropriate context window management strategies are recommended
  • Inference in bfloat16 requires roughly 17–20 GB of GPU memory (about 16.4 GB for the weights alone, plus activations and KV cache)
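One simple context-window management strategy is to split a paper into paragraph-aligned chunks that fit within a token budget, leaving headroom below the 40K-token context for the prompt template and the generated review. The sketch below uses a rough characters-per-token ratio as a proxy; in practice, count tokens with the model's tokenizer instead.

```python
def chunk_paper(text: str, max_tokens: int = 32_000,
                chars_per_token: int = 4) -> list[str]:
    """Split a paper into paragraph-aligned chunks under a token budget.

    chars_per_token is a crude proxy; replace with real tokenizer counts
    for production use.
    """
    budget = max_tokens * chars_per_token  # budget in characters
    chunks, current, size = [], [], 0
    for para in text.split("\n\n"):
        # Flush the current chunk if adding this paragraph would overflow it
        if size + len(para) > budget and current:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(para)
        size += len(para) + 2  # +2 for the "\n\n" separator
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Example: ten 100-character paragraphs with a ~240-character budget
paper = "\n\n".join(["x" * 100] * 10)
chunks = chunk_paper(paper, max_tokens=60, chars_per_token=4)
# -> 5 chunks of two paragraphs each
```

Each chunk can then be sent through the inference example above, and the per-chunk findings merged afterwards.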

Related Resources

  • Training Dataset: PaperAudit_Dataset
  • PaperAudit Project: For more details, please refer to the PaperAudit project documentation

License

Please refer to the license terms of the base model Qwen3.
