cometadata/affiliation-parsing-lora-Qwen3-4B


parthsarin committed on Sep 8, 2025

Commit c635454 (verified) · 1 parent: d947b15

Update README.md


README.md +59 -172


@@ -1,199 +1,86 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
 ## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+license: apache-2.0
+base_model:
+- Qwen/Qwen3-4B
 ---
+# Affiliation Parsing LoRA
+This model is a fine-tuned version of [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B), trained with Group Relative Policy Optimization (GRPO) and LoRA to extract and parse author affiliations from academic paper content.
+## Model Description
+- **Base Model**: Qwen3-4B (4.0B parameters)
+- **Training Method**: Group Relative Policy Optimization (GRPO) with LoRA
+- **Task**: Author affiliation extraction and parsing from academic paper content
+- **Training Data**: arXiv author affiliations dataset with PDF content and corresponding author/affiliation annotations
 ## Training Details
+### Training Configuration
+- **Training Algorithm**: Dr. GRPO ("GRPO Done Right", `dr_grpo`)
+- **Learning Rate**: 1e-5 with cosine scheduler and 3% warmup ratio
+- **Training Epochs**: 0.36 (about a third of one pass over the training data)
+- **Batch Size**: 1 per device, 8 gradient accumulation steps
+- **LoRA Configuration**:
+  - Rank (r): 8
+  - Alpha: 16
+  - Dropout: 0.01
+  - Target modules: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
+### Training Metrics
+- **Total Training Steps**: 890
+- **Total Tokens Processed**: 62,074,442
+- **Final Training Loss**: 0.075
+- **Answer Reward**: 2.21 ± 0.65
+- **Format Reward**: 0.925 ± 0.16
+### Hardware
+- **GPUs**: 8x NVIDIA H100 80GB HBM3
+- **Training Time**: ~23.9 hours (86,125 seconds)
+- **Precision**: bfloat16
+## Reward Functions
+The model was trained with two reward functions:
+1. **Format Reward**: Evaluates whether the generated output follows the expected structured format for author and affiliation data (normalized to a 0-1 scale)
+2. **Answer Reward**: Assesses the accuracy of extracted author names and affiliations against ground truth annotations (not normalized to a 0-1 scale)
+## Usage
+The model processes academic paper content (up to ~6,000 tokens) and extracts structured author and affiliation information. It uses a system prompt that guides the model to parse author details from PDF content.
+### Expected Input Format
+The model expects text extracted from the PDF of an academic paper, truncated to approximately 6,000 tokens to match the truncation applied during training.
+### Training Data Processing
+- **Max Prompt Length**: 7,000 tokens
+- **Max Completion Length**: 2,000 tokens
+- **Input Truncation**: PDF content truncated to 6,000 tokens during preprocessing
+## Performance
+The figures below are the rewards reported under Training Metrics:
+- **Format compliance**: Mean format reward of 0.925 ± 0.16 on a 0-1 scale
+- **Content extraction**: Mean answer reward of 2.21 ± 0.65; this reward is not normalized to a 0-1 scale, so it does not translate directly into an accuracy figure
+- **Consistent output**: The spread of the format reward (± 0.16) is small relative to its mean, which suggests that most outputs follow the expected structure
+## Training Infrastructure
+- **Cluster**: SLURM-managed HPC environment
+- **Node**: Single node with 8 H100 GPUs
+- **Memory**: 2.1 TB total system memory
+- **CUDA Version**: 12.8
+## Limitations
+- Trained specifically on academic paper content for affiliation extraction
+- Input limited to ~6,000 tokens due to truncation during training
+- Performance may vary on paper formats significantly different from arXiv content
+- Rewards other than the format reward are not normalized to a 0-1 scale, which makes absolute performance hard to assess
+## Model Output
+The model generates structured author and affiliation data extracted from academic paper content, following the format patterns learned during GRPO training with the specified reward functions.
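
The sketch below is not part of the README in this commit. It is a small sanity check that derives a few quantities from the figures the README reports (LoRA rank and alpha, batch settings, step count, token count, wall-clock time). The names `derived_figures` and the constant names are hypothetical, chosen only for illustration. The sequences-per-step value assumes plain data parallelism across the 8 GPUs with one sequence per batch slot; how that maps to prompts versus sampled completions depends on the GRPO trainer, which the README does not specify.

```python
# Figures copied from the README's Training Configuration, Training Metrics,
# and Hardware sections. Everything derived below is illustrative arithmetic.
LORA_RANK = 8
LORA_ALPHA = 16
PER_DEVICE_BATCH_SIZE = 1
GRAD_ACCUM_STEPS = 8
NUM_GPUS = 8
TOTAL_STEPS = 890
TOTAL_TOKENS = 62_074_442
TRAINING_SECONDS = 86_125


def derived_figures() -> dict[str, float]:
    """Derive summary quantities from the reported training figures."""
    return {
        # Standard LoRA scaling applied to the low-rank update: alpha / r.
        "lora_scaling": LORA_ALPHA / LORA_RANK,
        # ASSUMPTION: plain data parallelism, one sequence per batch slot.
        "sequences_per_optimizer_step": (
            PER_DEVICE_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_GPUS
        ),
        # Wall-clock time converted from the reported seconds.
        "training_hours": TRAINING_SECONDS / 3600,
        # Aggregate throughput across all GPUs.
        "tokens_per_second": TOTAL_TOKENS / TRAINING_SECONDS,
        # Average wall-clock time per reported training step.
        "seconds_per_step": TRAINING_SECONDS / TOTAL_STEPS,
    }


if __name__ == "__main__":
    for name, value in derived_figures().items():
        print(f"{name}: {value:,.2f}")
```

The derived training time agrees with the README's "~23.9 hours" figure, which is a quick way to confirm that the reported seconds and hours are consistent with each other.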