---
language: en
tags:
- question-answering
- squad
- gpt2
- fine-tuned
license: mit
---

# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a context passage and a question, it generates a short answer grounded in the provided context.

## Model Description

- **Base Model**: GPT-2 (124M parameters)
- **Training Data**: Stanford Question Answering Dataset (SQuAD)
- **Task**: Question Answering
- **Framework**: PyTorch with Hugging Face Transformers

## Training Details

The model was fine-tuned using:

- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1
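With gradient accumulation, the optimizer sees an effective batch of 16 × 8 = 128 examples per update. A quick sanity check in plain Python (variable names are illustrative, not from the training code):

```python
# Hyperparameters from the list above
batch_size = 16
gradient_accumulation_steps = 8

# Gradients are accumulated over 8 micro-batches before each optimizer
# step, so a single update covers 16 * 8 = 128 examples.
effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128
```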

## Usage

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate an answer (beam search combined with sampling; temperature and
# top_p only take effect because do_sample=True)
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract the answer text after the "Answer:" marker
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
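The extraction step at the end of the snippet is plain string handling: the decoded generation echoes the prompt, so splitting on the `Answer:` marker isolates the model's continuation. Factored into a standalone helper (the function name is illustrative, not part of the repository):

```python
def extract_answer(generated_text: str) -> str:
    """Return the text after the last 'Answer:' marker, stripped of whitespace."""
    return generated_text.split("Answer:")[-1].strip()

# Example with a decoded output that echoes the prompt
decoded = (
    "Context: Paris is the capital and largest city of France. "
    "Question: What is the capital of France? Answer: Paris"
)
print(extract_answer(decoded))  # Paris
```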

## Performance and Limitations

The model performs best with:

- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)

Limitations:

- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations

## Example Questions

```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```

Expected outputs:

- "George Washington"
- "20 percent"

## Training Infrastructure

The model was trained on an RTX 4090 GPU using:

- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch size scaling
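Gradient accumulation scales the effective batch size because, for equal-sized micro-batches, averaging the per-micro-batch mean losses reproduces the full-batch mean. A toy numeric check, independent of any framework:

```python
# 128 toy per-example losses split into 8 micro-batches of 16
losses = [float(i) for i in range(128)]
micro_batches = [losses[i:i + 16] for i in range(0, 128, 16)]

# Full-batch mean loss
full_batch_mean = sum(losses) / len(losses)

# Accumulated estimate: average of the per-micro-batch means
accumulated = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

print(abs(full_batch_mean - accumulated) < 1e-9)  # True
```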

## Citation

If you use this model, please cite:

```bibtex
@misc{chatmachine_v1,
  author       = {Houcine BDK},
  title        = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```

## License

This model is released under the MIT License.