PaperAudit Llama3.2 3B (SFT + RL)

Model Overview

PaperAudit_Llama3.2_3B_sft_rl is a lightweight model trained for academic paper error detection and automated review. It is based on Llama 3.2 3B Instruct and was optimized with Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (RLHF).

Model Information

  • Base Model: Llama 3.2 3B Instruct
  • Model Parameters: ~3 billion parameters
  • Training Method: Supervised Fine-Tuning (SFT) + Reinforcement Learning from Human Feedback (RLHF)
  • Model Architecture: LlamaForCausalLM
  • Context Length: 131,072 tokens
  • Data Type: bfloat16

Model Features

  • Lightweight and Efficient: 3B parameter scale, suitable for resource-constrained environments
  • Specialized Optimization: Specifically optimized for academic paper error detection and review tasks
  • Reinforcement Learning: Aligned with human preferences through RLHF to improve review quality and error detection accuracy
  • Long Context: Supports ultra-long context (131K tokens), suitable for processing complete academic papers

Training Data

This model was trained on the PaperAudit_Dataset, which includes:

  • Academic papers downloaded from OpenReview
  • Structured paper content (processed via LlamaParse and LLM)
  • Synthetic error data for training error detection models
  • Human review feedback data

For more details about the dataset, please visit: https://huggingface.co/datasets/mayiwen/PaperAudit_Dataset

Usage

Install Dependencies

pip install transformers torch accelerate

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "mayiwen/PaperAudit_Llama3.2_3B_sft_rl"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Inference Example

# Prepare input (paper error detection task)
prompt = """Please detect errors in the following academic paper paragraph:

[Paper content...]

Please identify errors and provide correction suggestions."""

# The base model is instruction-tuned, so format the prompt with its chat template
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # Llama tokenizers define no pad token
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

Application Scenarios

  • Academic paper error detection
  • Automated paper review
  • Academic writing quality assessment
  • Paper content analysis and feedback generation

Model Architecture Details

  • Hidden Size: 3072
  • Intermediate Size: 8192
  • Number of Attention Heads: 24
  • Number of Key-Value Heads: 8 (Grouped Query Attention)
  • Number of Hidden Layers: 28
  • Vocabulary Size: 128,256
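As a sanity check, these dimensions roughly reproduce the stated ~3B parameter count. The sketch below is a back-of-the-envelope estimate: it omits the small norm weights and assumes tied input/output embeddings, as in Llama 3.2.

```python
# Approximate Llama-style parameter count from the dimensions above.
hidden = 3072
intermediate = 8192
n_heads = 24
n_kv_heads = 8
n_layers = 28
vocab = 128_256

head_dim = hidden // n_heads                 # 128 per head
embed = vocab * hidden                       # token embeddings (tied with lm_head)
attn = (hidden * hidden) * 2 \
     + (hidden * n_kv_heads * head_dim) * 2  # Q and O projections + smaller K and V (GQA)
mlp = 3 * hidden * intermediate              # gate, up, and down projections
total = embed + n_layers * (attn + mlp)

print(f"{total / 1e9:.2f}B parameters")      # ≈ 3.2B
```

Note how Grouped Query Attention shrinks the K and V projections: with 8 key-value heads instead of 24, each is a third of the size of the Q projection.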

Notes

  • This model is specifically optimized for academic paper review tasks and may require further fine-tuning for other domains
  • It is recommended to use bfloat16 precision to save memory and improve inference speed
  • For long document processing, appropriate context window management strategies are recommended

Related Resources

  • Training Dataset: PaperAudit_Dataset
  • PaperAudit Project: For more details, please refer to the PaperAudit project documentation

License

Please refer to the license terms of the base model Llama 3.2.
