---
library_name: transformers
datasets:
- DataSeer/si-summarization-votes-r1-081725
base_model: Qwen/Qwen3-32B
tags:
- lora
- supervised-fine-tuning
- summarization
- qwen3
---

# Qwen3-32B Summarization LoRA Adapter

A LoRA (Low-Rank Adaptation) adapter for the Qwen3-32B model, fine-tuned to summarize supplemental information for articles. The adapter was trained with multi-turn reinforcement learning on rollouts from the DataSeer summarization votes dataset (human preference data).

## Model Details

### Model Description

This adapter fine-tunes the Qwen3-32B base model for improved summarization using the LoRA technique.

- **Developed by:** DataSeer
- **Model type:** Causal Language Model (LoRA Adapter)
- **Language:** English
- **Base model:** Qwen/Qwen3-32B
- **Training approach:** Multi-turn RL with LoRA
- **Dataset:** DataSeer/si-summarization-votes-r1-081725

### Model Architecture

- **Base model:** Qwen3-32B (32.8B parameters)
- **LoRA configuration:**
  - Rank (r): 8
  - Alpha: 32
  - Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
  - Dropout: 0
- **Precision:** bfloat16

## Training Details

### Training Data

The model was trained on the `DataSeer/si-summarization-votes-r1-081725` dataset, which contains summarization rollouts with annotator votes. The dataset was filtered to include only positively voted examples (`label=True`).

### Training Configuration

- **Training epochs:** 3
- **Learning rate:** 1e-5
- **Batch size:** 1 per device
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8 (1 per device × 8 accumulation steps)
- **Learning rate scheduler:** Cosine
- **Optimizer:** AdamW (torch fused)
- **Precision:** bfloat16
- **Gradient checkpointing:** Enabled
- **Max sequence length:** 18,893 tokens

### Training Results

- **Final training loss:** 0.5931
- **Mean token accuracy:** 84.41%
- **Total training steps:** 93
- **Training runtime:** 56.6 minutes (3,398 seconds)
- **Training samples per second:** 0.216
- **Final learning rate:** 4.56e-8

### Hardware & Performance

- **Hardware:** 8x NVIDIA H100 80GB HBM3
- **Training time:** ~57 minutes
- **Memory optimizations:** Gradient checkpointing, bfloat16 precision

## Usage

### Loading the Model

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-32B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")

# Attach the LoRA adapter to the base model
model = PeftModel.from_pretrained(base_model, "path/to/adapter")
```

## Environmental Impact

Training ran on 8x NVIDIA H100 GPUs for approximately 57 minutes. The run was relatively efficient because the LoRA approach trains only a small fraction of the total model parameters (on the order of 0.1%).
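
## Example: Generating a Summary

A minimal inference sketch that builds on the loading snippet above. The chat-template call and the `enable_thinking` flag follow standard Qwen3 tokenizer conventions, but the prompt wording, input text, and generation settings are illustrative assumptions; this card does not specify the exact prompt format used during training.

```python
# Assumes `model` and `tokenizer` from the loading snippet above.
document = "...supplemental information text to summarize..."  # placeholder input

messages = [
    {
        "role": "user",
        "content": f"Summarize the following supplemental information:\n\n{document}",
    }
]

# Build the Qwen3 chat prompt; enable_thinking=False requests plain
# (non-reasoning) output from Qwen3 models.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=False,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
summary = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(summary)
```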
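
## Reproducing the Configuration

The LoRA settings listed under "Model Architecture" map directly onto a PEFT `LoraConfig`. The sketch below is a reconstruction from those documented values, not the original training script:

```python
from peft import LoraConfig

# LoRA settings as documented under "Model Architecture" above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "v_proj", "k_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```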
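
Likewise, the hyperparameters under "Training Configuration" can be expressed as standard `transformers` `TrainingArguments`. This only mirrors the documented values; the multi-turn RL training loop itself is not shown, and `output_dir` is a placeholder:

```python
from transformers import TrainingArguments

# Hyperparameters as documented under "Training Configuration" above
training_args = TrainingArguments(
    output_dir="qwen3-32b-summarization-lora",  # placeholder path
    num_train_epochs=3,
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",
    bf16=True,
    gradient_checkpointing=True,
)
```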