---
license: mit
base_model: gpt2-large
tags:
- natural-language-inference
- lora
- peft
- gpt2
- multinli
- text-classification
datasets:
- nyu-mll/multi_nli
language:
- en
pipeline_tag: text-classification
---

# GPT2-Large LoRA Fine-tuned for Natural Language Inference

This model is a LoRA (Low-Rank Adaptation) fine-tuned version of GPT2-large for Natural Language Inference (NLI) on the MultiNLI dataset.

## Model Details

- **Base Model**: GPT2-large (774M parameters)
- **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
- **Trainable Parameters**: ~2.3M (0.3% of total parameters)
- **Dataset**: MultiNLI (50K training samples)
- **Task**: Natural Language Inference (3-class classification)

## Performance

- **Test Accuracy (Matched)**: ~79.22%
- **Test Accuracy (Mismatched)**: ~80.38%
- **Training Method**: Parameter-efficient fine-tuning with LoRA
- **Hardware**: Trained on a 36 GB vGPU

## Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load tokenizer and base model
tokenizer = AutoTokenizer.from_pretrained("gpt2-large")
base_model = AutoModelForCausalLM.from_pretrained("gpt2-large")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "hilaryc112/LoRA-GPT2-Project")

# Format input
premise = "A person is outdoors, on a horse."
hypothesis = "A person is at a diner, ordering an omelette."
input_text = f"Premise: {premise}\nHypothesis: {hypothesis}\nRelationship:"

# Tokenize and generate
inputs = tokenizer(input_text, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=10, pad_token_id=tokenizer.eos_token_id)

# Decode only the newly generated tokens (everything after the prompt)
prediction = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
print(f"Prediction: {prediction}")  # Should output: contradiction, neutral, or entailment
```

## Training Configuration

```json
{
  "model_name": "gpt2-large",
  "max_length": 512,
  "lora_r": 16,
  "lora_alpha": 32,
  "lora_dropout": 0.1,
  "target_modules": ["c_attn", "c_proj", "c_fc"],
  "num_epochs": 3,
  "train_batch_size": 4,
  "eval_batch_size": 12,
  "gradient_accumulation_steps": 6,
  "learning_rate": 0.0002,
  "weight_decay": 0.01,
  "max_grad_norm": 1.0,
  "use_fp16": true,
  "gradient_checkpointing": true,
  "logging_steps": 100,
  "eval_steps": 500,
  "save_steps": 500,
  "save_total_limit": 3,
  "early_stopping_patience": 5,
  "data_dir": "./processed_data",
  "output_dir": "./gpt2_lora_multinli",
  "seed": 42,
  "use_wandb": false,
  "_comments": {
    "effective_batch_size": "6 * 6 = 36 (optimized for 36G vGPU)",
    "memory_optimization": "FP16 + gradient checkpointing enabled",
    "lora_config": "Rank 16 with alpha 32 for good performance/efficiency balance",
    "target_modules": "GPT2 attention and MLP layers for comprehensive adaptation",
    "training_data": "Uses 50K samples from MultiNLI training set (configured in preprocessing)",
    "evaluation_data": "Uses local dev files for matched/mismatched evaluation",
    "training_adjustments": "Reduced epochs to 2 and LR to 1e-4 for better training with real data",
    "eval_frequency": "Less frequent evaluation (every 500 steps) due to larger dataset"
  }
}
```

## Dataset Format

The model was trained on a text-to-text format:

```
Premise: [premise text]
Hypothesis: [hypothesis text]
Relationship: [entailment/neutral/contradiction]
```

## Files

- `adapter_config.json`: LoRA adapter configuration
- `adapter_model.safetensors`: LoRA adapter weights
- `training_config.json`: Training hyperparameters and settings

## Citation

If you use this model, please cite:

```bibtex
@misc{gpt2-lora-multinli,
  title={GPT2-Large LoRA Fine-tuned for Natural Language Inference},
  author={HilaryKChen},
  year={2024},
  howpublished={\url{https://huggingface.co/hilaryc112/LoRA-GPT2-Project}}
}
```

## License

This model is released under the MIT License.