---
language: en
license: mit
library_name: pytorch
tags:
- text-generation
- gpt
- transformers
- language-model
- alice-in-wonderland
- literature
datasets:
- alice-in-wonderland
metrics:
- perplexity
pipeline_tag: text-generation
---
# 1st Demo GPT Based Architecture Model
## Model Description
This is a **GPT-based transformer language model** trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained entirely on a single work of classic literature rather than fine-tuned from an existing checkpoint.
## Model Details
- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Architecture**: Custom transformer-based language model
- **Training Data**: Alice's Adventures in Wonderland by Lewis Carroll
- **Language**: English
- **Library**: PyTorch
- **Model Size**: ~4.2M parameters (based on `complete_gpt_model.pth`); an illustrative configuration sketch follows below
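The exact hyperparameters are defined in `Notebook1.ipynb`. Purely as an illustration of the kind of configuration a GPT model of this scale typically uses (the values below are placeholders, not the actual settings of this checkpoint):
```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical values for illustration only -- the real hyperparameters
    # live in Notebook1.ipynb and may differ.
    vocab_size: int = 5000   # determined by tokenizer.pkl
    block_size: int = 128    # context window (tokens)
    n_layer: int = 4         # transformer blocks
    n_head: int = 4          # attention heads per block
    n_embd: int = 256        # embedding / hidden dimension
    dropout: float = 0.1
```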
## Training Details
### Dataset
- **Source**: Alice's Adventures in Wonderland (complete text)
- **Size**: 1,033 lines of text
- **Preprocessing**: Custom tokenization (character-level or subword), with the tokenizer stored in `tokenizer.pkl`; a minimal character-level example is sketched below
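The repository ships the actual tokenizer as `tokenizer.pkl`; as a hedged illustration of what a simple character-level tokenizer over `dataset.txt` could look like (not necessarily the scheme used here):
```python
class CharTokenizer:
    """Minimal character-level tokenizer -- illustrative only."""
    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return ''.join(self.itos[i] for i in ids)

# Build the vocabulary from the training text
with open('dataset.txt', 'r', encoding='utf-8') as f:
    tokenizer = CharTokenizer(f.read())
```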
### Training Configuration
- **Epochs**: 3 (checkpoint files available for each epoch)
- **Optimizer**: Likely AdamW (standard for transformer models)
- **Training Files** (see the note on checkpoint sizes after this list):
  - `checkpoint_epoch_1.pth` (12.2MB)
  - `checkpoint_epoch_2.pth` (12.2MB)
  - `checkpoint_epoch_3.pth` (12.2MB)
  - `best_model.pth` (4.14MB) - Best performing checkpoint
  - `complete_gpt_model.pth` (4.20MB) - Final trained model
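The per-epoch checkpoints (~12.2MB) are roughly three times the size of the model-only files (~4.2MB), which is consistent with saving the AdamW optimizer state (two moment buffers per parameter) alongside the weights. A hypothetical helper illustrating that layout (not taken from the notebook):
```python
import torch

def save_checkpoint(model, optimizer, epoch, path):
    # Assumed checkpoint layout: weights plus optimizer state, which would
    # account for the larger per-epoch files compared to the model-only ones.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)
```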
## Files in this Repository
| File | Size | Description |
|------|------|-------------|
| `complete_gpt_model.pth` | 4.20MB | Final trained model weights |
| `best_model.pth` | 4.14MB | Best performing model checkpoint |
| `checkpoint_epoch_1.pth` | 12.2MB | Training checkpoint after epoch 1 |
| `checkpoint_epoch_2.pth` | 12.2MB | Training checkpoint after epoch 2 |
| `checkpoint_epoch_3.pth` | 12.2MB | Training checkpoint after epoch 3 |
| `tokenizer.pkl` | 37.3KB | Custom tokenizer for the model |
| `dataset.txt` | 51KB | Training dataset (Alice in Wonderland) |
| `Notebook1.ipynb` | 4.1MB | Training notebook with implementation |
## Usage
### Loading the Model
```python
import torch
import pickle

# Load the tokenizer
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Load the model
model = torch.load('complete_gpt_model.pth', map_location='cpu')
model.eval()
```
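Note that the snippet above assumes the file stores the full pickled model object (which is why `model.eval()` can be called directly on the result). On PyTorch 2.6 and later, `torch.load` defaults to `weights_only=True`, so loading such a file may require opting out explicitly:
```python
# Needed on PyTorch 2.6+ when the checkpoint contains a pickled nn.Module
model = torch.load('complete_gpt_model.pth', map_location='cpu', weights_only=False)
```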
### Text Generation
The exact sampling loop depends on how the model's forward pass is implemented; the sketch below assumes `model(ids)` returns logits of shape `(batch, seq_len, vocab_size)` and that the tokenizer exposes `encode`/`decode` methods.
```python
def generate_text(model, tokenizer, prompt, max_length=100):
    model.eval()
    with torch.no_grad():
        # Tokenize input (assumes tokenizer.encode returns a list of token ids)
        ids = torch.tensor(tokenizer.encode(prompt), dtype=torch.long).unsqueeze(0)
        # Generate text one token at a time
        for _ in range(max_length):
            logits = model(ids)                             # assumed shape: (1, T, vocab)
            probs = torch.softmax(logits[:, -1, :], dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            ids = torch.cat([ids, next_id], dim=1)
        # Decode the generated ids back to text
        return tokenizer.decode(ids[0].tolist())

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```
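The sketch above samples from the full softmax distribution at each step. Depending on the model, greedy decoding (`torch.argmax`), temperature scaling (dividing the logits by a temperature before the softmax), or top-k filtering may produce more coherent text; you may also need to truncate `ids` to the model's context window before each forward pass.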
## Model Performance
The model has been trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (`Notebook1.ipynb`).
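Since perplexity is the metric listed for this model, it can be recomputed outside the notebook as the exponential of the average cross-entropy over held-out tokens. A minimal sketch, assuming the model returns logits of shape `(batch, seq_len, vocab_size)` and a DataLoader yields `(inputs, targets)` token batches:
```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, data_loader):
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for inputs, targets in data_loader:
        logits = model(inputs)                     # assumed shape: (B, T, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1), reduction='sum')
        total_loss += loss.item()
        total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```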
### Expected Outputs
Given the training on Alice in Wonderland, the model should generate text in a similar style to Lewis Carroll's writing, with:
- Victorian-era English vocabulary and sentence structure
- Whimsical and fantastical content
- Character references from the original story
- Descriptive and narrative prose style
## Training Process
The training was conducted using:
1. **Data Preprocessing**: Text cleaning and tokenization
2. **Model Architecture**: Custom GPT implementation
3. **Training Loop**: 3 epochs with checkpoint saving
4. **Validation**: Best model selection based on validation metrics (a sketch of the loop follows below)
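The authoritative implementation is in `Notebook1.ipynb`; the following is only a hedged sketch of the process described above, where `train_loader`, `val_loader`, the learning rate, and the loss computation are assumptions for illustration:
```python
import torch
import torch.nn.functional as F

def train(model, train_loader, val_loader, epochs=3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed optimizer / lr
    best_val_loss = float('inf')
    for epoch in range(1, epochs + 1):
        model.train()
        for inputs, targets in train_loader:          # assumed (B, T) token batches
            logits = model(inputs)                    # assumed shape: (B, T, vocab)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Per-epoch checkpoint (weights + optimizer state)
        torch.save({'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()},
                   f'checkpoint_epoch_{epoch}.pth')

        # Validation pass; keep the best-performing model seen so far
        model.eval()
        val_loss, batches = 0.0, 0
        with torch.no_grad():
            for inputs, targets in val_loader:
                logits = model(inputs)
                val_loss += F.cross_entropy(logits.view(-1, logits.size(-1)),
                                            targets.view(-1)).item()
                batches += 1
        if val_loss / batches < best_val_loss:
            best_val_loss = val_loss / batches
            torch.save(model, 'best_model.pth')
```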
## Limitations
- **Dataset Size**: Trained on a single book, limiting vocabulary and style diversity
- **Domain Specificity**: Optimized for Lewis Carroll's writing style
- **Scale**: Relatively small model compared to modern large language models
- **Context Length**: Limited context window typical of smaller transformer models
## Ethical Considerations
- This model is trained on public domain literature (Alice in Wonderland)
- The training data is from 1865 and may contain outdated language or concepts
- The model is intended for educational and demonstration purposes
## Citation
If you use this model, please cite:
```bibtex
@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}
```
## License
This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.
## Contact
For questions or issues, please open an issue in this repository or contact the model author.
---
*This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.*