---
language: en
tags:
- question-answering
- squad
- gpt2
- fine-tuned
license: mit
---

# ChatMachine_v1: GPT-2 Fine-tuned on SQuAD

This model is a GPT-2 variant fine-tuned on the Stanford Question Answering Dataset (SQuAD) for question answering. Given a context passage and a question, it generates a short answer grounded in the provided context.

## Model Description

- **Base Model**: GPT-2 (124M parameters)
- **Training Data**: Stanford Question Answering Dataset (SQuAD)
- **Task**: Question Answering
- **Framework**: PyTorch with Hugging Face Transformers

## Training Details

The model was fine-tuned using:

- Mixed precision training (bfloat16)
- Learning rate: 2e-5
- Batch size: 16
- Gradient accumulation steps: 8
- Warmup steps: 1000
- Weight decay: 0.1
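With gradient accumulation, the optimizer sees an effective batch of 16 × 8 = 128 examples per update. A quick sanity check in plain Python (variable names are illustrative, not from the training code):

```python
# Hyperparameters from the list above
batch_size = 16
gradient_accumulation_steps = 8

# Gradients are accumulated over 8 micro-batches before each optimizer
# step, so a single update covers 16 * 8 = 128 examples.
effective_batch_size = batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 128
```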

## Usage

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("houcine-bdk/chatMachine_v1")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Format your input
context = "Paris is the capital and largest city of France."
question = "What is the capital of France?"
input_text = f"Context: {context} Question: {question} Answer:"

# Generate an answer (beam search combined with sampling; temperature and
# top_p only take effect because do_sample=True)
inputs = tokenizer(input_text, return_tensors="pt", padding=True)
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    temperature=0.3,
    do_sample=True,
    top_p=0.9,
    num_beams=4,
    early_stopping=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# Extract the answer text after the "Answer:" marker
answer = tokenizer.decode(outputs[0], skip_special_tokens=True).split("Answer:")[-1].strip()
print(f"Answer: {answer}")
```
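The extraction step at the end of the snippet is plain string handling: the decoded generation echoes the prompt, so splitting on the `Answer:` marker isolates the model's continuation. Factored into a standalone helper (the function name is illustrative, not part of the repository):

```python
def extract_answer(generated_text: str) -> str:
    """Return the text after the last 'Answer:' marker, stripped of whitespace."""
    return generated_text.split("Answer:")[-1].strip()

# Example with a decoded output that echoes the prompt
decoded = (
    "Context: Paris is the capital and largest city of France. "
    "Question: What is the capital of France? Answer: Paris"
)
print(extract_answer(decoded))  # Paris
```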

## Performance and Limitations

The model performs best with:

- Simple, focused questions
- Clear, concise context
- Factual questions (who, what, when, where)

Limitations:

- May struggle with complex, multi-part questions
- Performance depends on the clarity and relevance of the provided context
- Best suited for short, focused answers rather than lengthy explanations

## Example Questions

```python
test_cases = [
    {
        "context": "George Washington was the first president of the United States, serving from 1789 to 1797.",
        "question": "Who was the first president of the United States?"
    },
    {
        "context": "The brain uses approximately 20 percent of the body's total energy consumption.",
        "question": "How much of the body's energy does the brain use?"
    }
]
```

Expected outputs:

- "George Washington"
- "20 percent"

## Training Infrastructure

The model was trained on an RTX 4090 GPU using:

- PyTorch with CUDA optimizations
- Mixed precision training (bfloat16)
- Gradient accumulation for effective batch size scaling
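Gradient accumulation scales the effective batch size because, for equal-sized micro-batches, averaging the per-micro-batch mean losses reproduces the full-batch mean. A toy numeric check, independent of any framework:

```python
# 128 toy per-example losses split into 8 micro-batches of 16
losses = [float(i) for i in range(128)]
micro_batches = [losses[i:i + 16] for i in range(0, 128, 16)]

# Full-batch mean loss
full_batch_mean = sum(losses) / len(losses)

# Accumulated estimate: average of the per-micro-batch means
accumulated = sum(sum(mb) / len(mb) for mb in micro_batches) / len(micro_batches)

print(abs(full_batch_mean - accumulated) < 1e-9)  # True
```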

## Citation

If you use this model, please cite:

```bibtex
@misc{chatmachine_v1,
  author       = {Houcine BDK},
  title        = {ChatMachine_v1: GPT-2 Fine-tuned on SQuAD},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/houcine-bdk/chatMachine_v1}}
}
```

## License

This model is released under the MIT License.