
GPT-2 XL Fine-Tuning on EU_ACT Dataset

Overview

This project fine-tunes the GPT-2 XL model (openai-community/gpt2-xl) on text extracted from EU_ACT.pdf. The fine-tuned model is then uploaded to the Hugging Face Hub for easy access and deployment.

πŸ“Œ Features

  • Uses Hugging Face Transformers for model training
  • Data preprocessing: Extracts text from PDF and cleans it
  • Tokenizer: GPT-2 XL tokenizer with a padding token assigned (GPT-2 defines none by default)
  • Fine-tuning on extracted dataset
  • Mixed Precision Training (fp16) for faster computation
  • Uploads Model to Hugging Face Hub
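Mixed-precision training is typically switched on with a single flag in TrainingArguments. The sketch below is illustrative only; every hyperparameter value here is an assumption, not taken from the actual script:

```python
from transformers import TrainingArguments

# Illustrative values only; the fine-tuning script's real settings may differ.
args = TrainingArguments(
    output_dir="gpt2-xl-euact1",
    per_device_train_batch_size=1,   # GPT-2 XL is large, so keep batches small
    gradient_accumulation_steps=8,   # simulate a larger effective batch size
    num_train_epochs=3,
    fp16=True,                       # mixed precision (fp16) for faster training
    save_strategy="epoch",
)
```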

πŸ“ Project Structure

.
β”œβ”€β”€ EU_ACT.pdf                # Dataset (PDF format)
β”œβ”€β”€ gpt2xl_finetune.py        # Fine-tuning script
β”œβ”€β”€ gpt2-xl-euact1/           # Trained model output
β”œβ”€β”€ README.md                 # Documentation

πŸ”§ Installation

Make sure Python is installed, then install the required libraries:

pip install transformers datasets torch huggingface_hub PyPDF2

πŸš€ Usage

Run the script to fine-tune GPT-2 XL:

python gpt2xl_finetune.py

This will:

  1. Extract text from EU_ACT.pdf
  2. Tokenize and preprocess the data
  3. Fine-tune GPT-2 XL
  4. Save and upload the model to Hugging Face Hub
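The extraction and cleaning in steps 1-2 can be sketched as follows. This is a minimal example, assuming PyPDF2 as listed in the installation section; the function names `clean_text` and `extract_and_clean` are illustrative, not the script's actual names:

```python
import re

def clean_text(raw: str) -> str:
    """Strip non-printable / non-ASCII characters and normalize whitespace."""
    text = re.sub(r"[^\x20-\x7E\n]", " ", raw)  # keep printable ASCII + newlines
    return re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace

def extract_and_clean(pdf_path: str) -> str:
    """Read every page of a PDF and return one cleaned string."""
    from PyPDF2 import PdfReader  # imported lazily so the cleaner works standalone
    reader = PdfReader(pdf_path)
    pages = [page.extract_text() or "" for page in reader.pages]
    return clean_text("\n".join(pages))

# Example: text = extract_and_clean("EU_ACT.pdf")
```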

πŸ“Š Model Training Pipeline

  1. Load and Preprocess Data
    • Extract text from PDF
    • Clean the text (strip special characters, normalize whitespace, etc.)
  2. Tokenization
    • Convert text to tokens using GPT-2 XL tokenizer
  3. Fine-Tuning
    • Train using Trainer API with TrainingArguments
  4. Save & Upload
    • Save model locally and upload it to Hugging Face Hub
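Between tokenization and fine-tuning, causal-LM training usually packs the tokenized corpus into fixed-length blocks. A minimal sketch of that packing; the helper name and block size are assumptions, not taken from the script:

```python
def group_into_blocks(token_ids, block_size=512):
    """Split a flat list of token ids into equal-length training blocks,
    dropping the trailing remainder so every block is exactly block_size."""
    n = (len(token_ids) // block_size) * block_size
    return [token_ids[i:i + block_size] for i in range(0, n, block_size)]

# For causal LM fine-tuning, the labels are the input ids themselves:
# examples = [{"input_ids": b, "labels": b} for b in group_into_blocks(ids)]
```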

🎯 Model Upload Link

Once training is complete, the model will be available at:

https://huggingface.co/sssdddwd/gpt2-xl-Transfer-euact1

πŸ“Œ Example Code to Use the Fine-Tuned Model

from transformers import GPT2Tokenizer, GPT2LMHeadModel

# Load the fine-tuned model and tokenizer from the Hub
tokenizer = GPT2Tokenizer.from_pretrained("sssdddwd/gpt2-xl-Transfer-euact1")
model = GPT2LMHeadModel.from_pretrained("sssdddwd/gpt2-xl-Transfer-euact1")

text = "EU regulations state that"
inputs = tokenizer(text, return_tensors="pt")

# GPT-2 has no pad token, so reuse the EOS token id to avoid a generation warning
outputs = model.generate(**inputs, max_length=100, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

πŸ“Œ Author: Shreyash Darade
βœ… Last Updated: Feb 2025
πŸš€ Powered by Hugging Face & Transformers
