---
language: en
tags:
- question-answering
- squad
- gpt2
- fine-tuned
license: mit
---
# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD
This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a passage of context and a question about it, the model generates a short answer grounded in that context.
## Model Description
- **Base Model**: GPT-2 (124M parameters)
- **Training Data**: Stanford Question Answering Dataset (SQuAD)
- **Task**: Question Answering
- **Framework**: PyTorch with Hugging Face Transformers
## Training Details
The model was fine-tuned using the following settings (a rough `TrainingArguments` equivalent is sketched after the list):
- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1
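The original training script is not part of this card. As a rough guide, the hyperparameters above map onto Hugging Face `TrainingArguments` approximately as follows; reading the batch size as per-device, and the output directory, are assumptions:

```python
from transformers import TrainingArguments

# Approximate mapping of the hyperparameters listed above.
# Assumptions: batch size 16 is per device; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="chatmachine_v1",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=8,   # effective batch size: 16 * 8 = 128
    warmup_steps=1000,
    weight_decay=0.1,
    bf16=True,                       # bfloat16 mixed precision
)
```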
## Usage
```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer
# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # the fine-tuned model reuses the stock GPT-2 tokenizer
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"
# Generate answer
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,                    # SQuAD answers are short
    temperature=0.3,                      # low temperature favors factual output
    do_sample=True,
    top_p=0.9,                            # nucleus sampling
    num_beams=4,                          # beam-sample decoding
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)
# Extract answer
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
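The settings above combine beam search with sampling (transformers' beam-sample mode); together with the low temperature, this favors short, stable, factual answers. Keeping the `Context: ... Question: ... Answer:` prompt shape exactly as shown should give the best results, since it mirrors the format the card uses throughout.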
## Performance and Limitations
The model performs best with:
- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)
Limitations:
- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations
## Example Questions
```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```
```
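To actually run these cases, the generation snippet from the Usage section can be wrapped in a small helper. The `answer_question` function below is illustrative, not part of the model, and assumes `model` and `tokenizer` are already loaded as shown above:

```python
def answer_question(context, question):
    """Illustrative helper wrapping the generation snippet from the Usage section."""
    prompt = f"Context: {context} Question: {question} Answer:"
    inputs = tokenizer(prompt, return_tensors="pt", padding=True)
    outputs = model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.3,
        do_sample=True,
        top_p=0.9,
        num_beams=4,
        early_stopping=True,
        pad_token_id=tokenizer.pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )
    # Keep only the text generated after the "Answer:" marker
    return tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()

for case in test_cases:
    print(answer_question(case["context"], case["question"]))
```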
Expected outputs:
- "George Washington"
- "20 percent"
## Training Infrastructure
The model was trained on an RTX 4090 GPU using:
- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch-size scaling (16 × 8 = an effective batch size of 128; see the sketch below)
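The training loop itself is not included in this card. As a minimal sketch of how the last two points typically combine in PyTorch (`model`, `optimizer`, and `train_loader` are placeholders, not the actual training code):

```python
import torch

accumulation_steps = 8  # matches the training details above

optimizer.zero_grad()
for step, batch in enumerate(train_loader):
    # bfloat16 autocast: mixed precision without needing a gradient scaler
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = model(**batch).loss / accumulation_steps  # scale for accumulation
    loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```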
## Citation
If you use this model, please cite:
```bibtex
@misc{chatmachine_v1,
  author       = {Houcine BDK},
  title        = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year         = {2024},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```
## License
This model is released under the MIT License.