---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---

# Qwen3-32B Summarization LoRA Adapter

A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, fine-tuned to summarize supplemental information for articles. The adapter was trained with multi-turn reinforcement learning on rollouts from the DataSeer summarization votes dataset (human preference data).

## Model Details

### Model Description

This adapter fine-tunes the Qwen3-32B base model for improved summarization using the LoRA technique.

- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725

### Model Architecture

- **Base model:** Qwen3-32B (32.8B parameters)
- **LoRA configuration:**
  - Rank (r): 8
  - Alpha: 32
  - Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0
- **Precision:** bfloat16

## Training Details

### Training Data

The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively voted examples (`label=True`).

### Training Configuration

- **Training epochs:** 3
- **Learning rate:** 1e-5
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8 (1 per device × 8 accumulation steps)
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens

### Training Results

- **Final training loss:** 0.5931
- **Mean token accuracy:** 84.41%
- **Total training steps:** 93
- **Training runtime:** 56.6 minutes (3,398 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 4.56e-8

### Hardware & Performance

- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~57 minutes
- **Memory optimizations:** Gradient checkpointing, bfloat16 precision

## Usage

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```

## Environmental Impact

Training ran on 8x NVIDIA H100 GPUs for approximately 57 minutes. The run was relatively efficient because the LoRA approach trains only a small fraction of the total model parameters (on the order of 0.1%).
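
## Example: Generating a Summary

A minimal inference sketch that builds on the loading snippet above. The chat-template call and the `enable_thinking` flag follow standard Qwen3 tokenizer conventions, but the prompt wording, input text, and generation settings are illustrative assumptions; this card does not specify the exact prompt format used during training.

```python
# Assumes `model` and `tokenizer` from the loading snippet above.
document = "...supplemental information text to summarize..."  # placeholder input

messages = [
    {
        "role": "user",
        "content": f"Summarize the following supplemental information:\n\n{document}",
    }
]

# Build the Qwen3 chat prompt; enable_thinking=False requests plain
# (non-reasoning) output from Qwen3 models.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```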
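
## Reproducing the Configuration

The LoRA settings listed under "Model Architecture" map directly onto a PEFT `LoraConfig`. The sketch below is a reconstruction from those documented values, not the original training script:

```python
from peft import LoraConfig

# LoRA settings as documented under "Model Architecture" above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```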
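
Likewise, the hyperparameters under "Training Configuration" can be expressed as standard `transformers` `TrainingArguments`. This only mirrors the documented values; the multi-turn RL training loop itself is not shown, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Hyperparameters as documented under "Training Configuration" above
training_args = TrainingArguments(
    output_dir="qwen3-32b-summarization-lora",  # placeholder path
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
```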