# GPT-2 Fine-Tuning on Custom Dataset
## Overview

This project fine-tunes a GPT-2 model on a custom dataset extracted from `EU_ACT.pdf`. The model is trained via transfer learning and uploaded to the Hugging Face Hub for deployment.
## Features
- Uses a pre-trained GPT-2 model (Transfer Learning)
- Processes text data from a PDF
- Tokenizes and fine-tunes the model
- Uploads the trained model to Hugging Face
- Automatically disables Weights & Biases logging
## Project Structure

```
├── fine_tune_gpt2.py   # Main script for training
├── EU_ACT.pdf          # Custom dataset (input PDF)
├── README.md           # Documentation
└── image.png           # Hugging Face metadata UI screenshot (optional)
```
## Installation

Run the following command to install the required libraries:

```shell
pip install transformers datasets torch tokenizers accelerate huggingface_hub
```
## Hugging Face Authentication

Log in with your Hugging Face token, replacing `your-api-key-here` with your actual token:

```python
from huggingface_hub import login

login(token="your-api-key-here")
```

Alternatively, run `huggingface-cli login` in a terminal.
## Training the Model

Run the script to start training:

```shell
python fine_tune_gpt2.py
```
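The actual contents of `fine_tune_gpt2.py` are not shown here; the following is a hypothetical sketch of the pipeline the README describes (extract PDF text, tokenize, fine-tune, disable Weights & Biases logging). The helper name `chunk_text`, the output directory, and all hyperparameters are illustrative assumptions, not the project's exact settings.

```python
# Hypothetical sketch of fine_tune_gpt2.py -- names and settings are illustrative.
import re


def chunk_text(text: str, block_size: int = 512) -> list[str]:
    """Normalize whitespace and split text into fixed-size character chunks."""
    text = re.sub(r"\s+", " ", text).strip()
    return [text[i:i + block_size] for i in range(0, len(text), block_size)]


def main() -> None:
    # Heavy imports kept local so chunk_text stays importable without them.
    from pypdf import PdfReader
    from datasets import Dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    # 1. Extract raw text from the PDF dataset.
    raw = " ".join(page.extract_text() or "" for page in PdfReader("EU_ACT.pdf").pages)
    ds = Dataset.from_dict({"text": chunk_text(raw)})

    # 2. Tokenize; GPT-2 has no pad token, so reuse EOS.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    ds = ds.map(lambda b: tokenizer(b["text"], truncation=True, max_length=512),
                batched=True, remove_columns=["text"])

    # 3. Fine-tune with a causal-LM collator (labels are the shifted inputs).
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="./gpt2-finetuned",
                               num_train_epochs=3,
                               per_device_train_batch_size=2,
                               report_to="none"),  # disables W&B logging
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    trainer.save_model("./gpt2-finetuned")
    tokenizer.save_pretrained("./gpt2-finetuned")


if __name__ == "__main__":
    main()
```

Keeping the training code behind the `__main__` guard means the chunking helper can be imported and tested without pulling in `torch` or downloading model weights.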
## Uploading the Model

After training, the model is automatically uploaded to the Hugging Face Hub. Make sure these variables are set in the script:

```python
hf_username = "your_hf_username"  # Replace with your HF username
repo_name = "gpt2-Transfer-euact"
```
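The upload itself is not shown in this README; a minimal sketch of how the two variables above could feed into a `push_to_hub` call is given below. The `repo_id` helper, the `./gpt2-finetuned` directory, and the function names are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch of the upload step -- directory and names are illustrative.


def repo_id(hf_username: str, repo_name: str) -> str:
    """Build the full Hub repo id, e.g. 'user/gpt2-Transfer-euact'."""
    return f"{hf_username}/{repo_name}"


def upload(model_dir: str = "./gpt2-finetuned",
           hf_username: str = "your_hf_username",
           repo_name: str = "gpt2-Transfer-euact") -> None:
    # Import locally so repo_id stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    target = repo_id(hf_username, repo_name)
    # Push both the model weights and the tokenizer files to the same repo.
    AutoModelForCausalLM.from_pretrained(model_dir).push_to_hub(target)
    AutoTokenizer.from_pretrained(model_dir).push_to_hub(target)
```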
## Pipeline

Load the fine-tuned model from the Hub and generate text:

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
repo_name = "sssdddwd/gpt2-Transfer-euact"  # Replace with your actual repo
model_pipeline = pipeline("text-generation", model=repo_name)

# Generate text
prompt = "The European AI Act aims to"
output = model_pipeline(prompt, max_length=100, num_return_sequences=1)

# Print output
print(output[0]["generated_text"])
```
## Metadata UI Reference

See `image.png` for a screenshot of the Hugging Face metadata UI.
## License

You can add a license by editing the license field in the model card metadata on Hugging Face.
## Contact

For issues or improvements, reach out via GitHub or Hugging Face discussions.