PaperAudit Llama3.2 3B (SFT + RL)

Model Overview

PaperAudit_Llama3.2_3B_sft_rl is a lightweight model trained for academic paper error detection and automated review. It is based on Llama 3.2 3B Instruct and was optimized with Supervised Fine-Tuning (SFT) followed by Reinforcement Learning from Human Feedback (RLHF).

Model Information

  • Base Model: Llama 3.2 3B Instruct
  • Model Parameters: ~3 billion parameters
  • Training Method: Supervised Fine-Tuning (SFT) + Reinforcement Learning from Human Feedback (RLHF)
  • Model Architecture: LlamaForCausalLM
  • Context Length: 131,072 tokens
  • Data Type: bfloat16

Model Features

  • Lightweight and Efficient: 3B parameter scale, suitable for resource-constrained environments
  • Specialized Optimization: Specifically optimized for academic paper error detection and review tasks
  • Reinforcement Learning: Aligned with human preferences through RLHF to improve review quality and error detection accuracy
  • Long Context: Supports ultra-long context (131K tokens), suitable for processing complete academic papers

Training Data

This model was trained on the PaperAudit_Dataset, which includes:

  • Academic papers downloaded from OpenReview
  • Structured paper content (processed via LlamaParse and LLM)
  • Synthetic error data for training error detection models
  • Human review feedback data

For more details about the dataset, please visit: https://huggingface.co/datasets/mayiwen/PaperAudit_Dataset

Usage

Install Dependencies

pip install transformers torch accelerate

Load Model

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "mayiwen/PaperAudit_Llama3.2_3B_sft_rl"  # or a local checkpoint directory
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

Inference Example

# Prepare input (paper error detection task)
prompt = """Please detect errors in the following academic paper paragraph:

[Paper content...]

Please identify errors and provide correction suggestions."""

# The base model is instruction-tuned, so format the prompt with its chat template
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt"
).to(model.device)

# Generate response
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.7,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id  # Llama tokenizers define no pad token
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)

Application Scenarios

  • Academic paper error detection
  • Automated paper review
  • Academic writing quality assessment
  • Paper content analysis and feedback generation

Model Architecture Details

  • Hidden Size: 3072
  • Intermediate Size: 8192
  • Number of Attention Heads: 24
  • Number of Key-Value Heads: 8 (Grouped Query Attention)
  • Number of Hidden Layers: 28
  • Vocabulary Size: 128,256
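As a sanity check, these dimensions roughly reproduce the stated ~3B parameter count. The sketch below is a back-of-the-envelope estimate: it omits the small norm weights and assumes tied input/output embeddings, as in Llama 3.2.

```python
# Approximate Llama-style parameter count from the dimensions above.
hidden = 3072
intermediate = 8192
n_heads = 24
n_kv_heads = 8
n_layers = 28
vocab = 128_256

head_dim = hidden // n_heads                 # 128 per head
embed = vocab * hidden                       # token embeddings (tied with lm_head)
attn = (hidden * hidden) * 2 \
     + (hidden * n_kv_heads * head_dim) * 2  # Q and O projections + smaller K and V (GQA)
mlp = 3 * hidden * intermediate              # gate, up, and down projections
total = embed + n_layers * (attn + mlp)

print(f"{total / 1e9:.2f}B parameters")      # ≈ 3.2B
```

Note how Grouped Query Attention shrinks the K and V projections: with 8 key-value heads instead of 24, each is a third of the size of the Q projection.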

Notes

  • This model is specifically optimized for academic paper review tasks and may require further fine-tuning for other domains
  • It is recommended to use bfloat16 precision to save memory and improve inference speed
  • For long document processing, appropriate context window management strategies are recommended

Related Resources

  • Training Dataset: PaperAudit_Dataset
  • PaperAudit Project: For more details, please refer to the PaperAudit project documentation

License

Please refer to the license terms of the base model Llama 3.2.
