---
language: en
tags:
- question-answering
- squad
- gpt2
- fine-tuned
license: mit
---
# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD
This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a passage of context and a question about it, the model generates a short answer grounded in that context.
## Model Description
- **Base Model**: GPT-2 (124M parameters)
- **Training Data**: Stanford Question Answering Dataset (SQuAD)
- **Task**: Question Answering
- **Framework**: PyTorch with Hugging Face Transformers
## Training Details
The model was fine-tuned using the following settings (a rough `TrainingArguments` equivalent is sketched after the list):
- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1
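The original training script is not part of this card. As a rough guide, the hyperparameters above map onto Hugging Face `TrainingArguments` approximately as follows; reading the batch size as per-device, and the output directory, are assumptions:

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above.
# Assumptions: batch size 16 is per device; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="chatmachine_v1",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # effective batch size: 16 * 8 = 128
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                       # bfloat16 mixed precision
)
```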
## Usage
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the fine-tuned model reuses the stock GPT-2 tokenizer
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"
# Generate answer
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,                    # SQuAD answers are short
    temperature=0.3,                      # low temperature favors factual output
    do_sample=True,
    top_p=0.9,                            # nucleus sampling
    num_beams=4,                          # beam-sample decoding
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
# Extract answer
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
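The settings above combine beam search with sampling (transformers' beam-sample mode); together with the low temperature, this favors short, stable, factual answers. Keeping the `Context: ... Question: ... Answer:` prompt shape exactly as shown should give the best results, since it mirrors the format the card uses throughout.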
## Performance and Limitations
The model performs best with:
- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)
Limitations:
- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations
## Example Questions
```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```
```
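To actually run these cases, the generation snippet from the Usage section can be wrapped in a small helper. The `answer_question` function below is illustrative, not part of the model, and assumes `model` and `tokenizer` are already loaded as shown above:

```python
def answer_question(context, question):
    """Illustrative helper wrapping the generation snippet from the Usage section."""
    prompt = f"Context: {context} Question: {question} Answer:"
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.3,
        do_sample=True,
        top_p=0.9,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Keep only the text generated after the "Answer:" marker
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()

for case in test_cases:
    print(answer_question(case["context"], case["question"]))
```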
Expected outputs:
- "George Washington"
- "20 percent"
## Training Infrastructure
The model was trained on an RTX 4090 GPU using:
- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch-size scaling (16 × 8 = an effective batch size of 128; see the sketch below)
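The training loop itself is not included in this card. As a minimal sketch of how the last two points typically combine in PyTorch (`model`, `optimizer`, and `train_loader` are placeholders, not the actual training code):

```python
import torch

accumulation_steps = 8  # matches the training details above

optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    # bfloat16 autocast: mixed precision without needing a gradient scaler
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accumulation_steps  # scale for accumulation
    loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```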
## Citation
If you use this model, please cite:
```bibtex
@misc{chatmachine_v1,
  author       = {Houcine BDK},
  title        = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```
## License
This model is released under the MIT License.