---
language: en
license: mit
library_name: pytorch
tags:
- text-generation
- gpt
- transformers
- language-model
- alice-in-wonderland
- literature
datasets:
- alice-in-wonderland
metrics:
- perplexity
pipeline_tag: text-generation
---

# 1st Demo GPT Based Architecture Model

## Model Description

This is a **GPT-based transformer language model** trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained entirely on a single work of classic literature.

## Model Details

- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Architecture**: Custom transformer-based language model
- **Training Data**: Alice's Adventures in Wonderland by Lewis Carroll
- **Language**: English
- **Library**: PyTorch
- **Model Size**: ~4.2M parameters (based on `complete_gpt_model.pth`)

## Training Details

### Dataset

- **Source**: Alice's Adventures in Wonderland (complete text)
- **Size**: 1,033 lines of text
- **Preprocessing**: Custom tokenization (character-level or subword); the tokenizer is implemented in `Notebook1.ipynb` and saved as `tokenizer.pkl`

### Training Configuration

- **Epochs**: 3 (a checkpoint file is available for each epoch)
- **Optimizer**: Likely AdamW (standard for transformer models)
- **Training Files**:
  - `checkpoint_epoch_1.pth` (12.2MB)
  - `checkpoint_epoch_2.pth` (12.2MB)
  - `checkpoint_epoch_3.pth` (12.2MB)
  - `best_model.pth` (4.14MB) - best-performing checkpoint
  - `complete_gpt_model.pth` (4.20MB) - final trained model

## Files in this Repository

| File | Size | Description |
|------|------|-------------|
| `complete_gpt_model.pth` | 4.20MB | Final trained model weights |
| `best_model.pth` | 4.14MB | Best-performing model checkpoint |
| `checkpoint_epoch_1.pth` | 12.2MB | Training checkpoint after epoch 1 |
| `checkpoint_epoch_2.pth` | 12.2MB | Training checkpoint after epoch 2 |
| `checkpoint_epoch_3.pth` | 12.2MB | Training checkpoint after epoch 3 |
| `tokenizer.pkl` | 37.3KB | Custom tokenizer for the model |
| `dataset.txt` | 51KB | Training dataset (Alice in Wonderland) |
| `Notebook1.ipynb` | 4.1MB | Training notebook with implementation |

## Usage

### Loading the Model

```python
import torch
import pickle

# Load the tokenizer
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Load the model (the file stores a full pickled model object;
# on PyTorch 2.6+ you may need to pass weights_only=False)
model = torch.load('complete_gpt_model.pth', map_location='cpu')
model.eval()
```

### Text Generation

```python
def generate_text(model, tokenizer, prompt, max_length=100):
    # Greedy decoding sketch -- the exact interfaces (tokenizer.encode/decode,
    # the model's forward signature, and its context window) are defined in
    # Notebook1.ipynb and may differ from what is assumed here.
    model.eval()
    with torch.no_grad():
        # Tokenize the prompt and add a batch dimension
        tokens = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

        for _ in range(max_length):
            # If the model has a fixed context window, crop here,
            # e.g. tokens = tokens[:, -block_size:]
            logits = model(tokens)                     # (1, seq_len, vocab_size)
            next_id = logits[:, -1, :].argmax(dim=-1)  # greedy choice of the next token
            tokens = torch.cat([tokens, next_id.unsqueeze(0)], dim=1)

    return tokenizer.decode(tokens[0].tolist())

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```

## Model Performance

The model was trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (`Notebook1.ipynb`).

### Expected Outputs

Given the training data, the model should generate text in a style similar to Lewis Carroll's writing, with:

- Victorian-era English vocabulary and sentence structure
- Whimsical and fantastical content
- Character references from the original story
- Descriptive and narrative prose style

## Training Process

The training was conducted in four stages:

1. **Data Preprocessing**: Text cleaning and tokenization
2. **Model Architecture**: Custom GPT implementation
3. **Training Loop**: 3 epochs with checkpoint saving (see the sketch below)
4. **Validation**: Best model selection based on validation metrics
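The full implementation lives in `Notebook1.ipynb`; as a rough illustration of the checkpoint layout listed above, a minimal epoch loop might look like the sketch below. The names `train_loader`, `val_loader`, and `evaluate` are placeholders rather than the notebook's actual identifiers, and AdamW is only an assumption.

```python
import torch

# Hypothetical sketch of a 3-epoch training loop with per-epoch and best-model
# checkpoints; `model`, `train_loader`, `val_loader`, and `evaluate` stand in
# for the objects defined in Notebook1.ipynb.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
criterion = torch.nn.CrossEntropyLoss()
best_val_loss = float("inf")

for epoch in range(1, 4):
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        logits = model(inputs)  # (batch, seq_len, vocab_size)
        loss = criterion(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        optimizer.step()

    # Full checkpoint after every epoch (model + optimizer state)
    torch.save({"epoch": epoch,
                "model_state_dict": model.state_dict(),
                "optimizer_state_dict": optimizer.state_dict()},
               f"checkpoint_epoch_{epoch}.pth")

    # Keep the best-performing weights separately
    val_loss = evaluate(model, val_loader)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pth")

# Save the final model object once training is done
torch.save(model, "complete_gpt_model.pth")
```

If the per-epoch checkpoints do include optimizer state as sketched here, that would also explain why each `checkpoint_epoch_N.pth` is roughly three times larger than the final model file.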
## Limitations

- **Dataset Size**: Trained on a single book, limiting vocabulary and style diversity
- **Domain Specificity**: Optimized for Lewis Carroll's writing style
- **Scale**: Relatively small model compared to modern large language models
- **Context Length**: Limited context window, typical of smaller transformer models

## Ethical Considerations

- This model is trained on public domain literature (Alice in Wonderland)
- The training data is from 1865 and may contain outdated language or concepts
- The model is intended for educational and demonstration purposes

## Citation

If you use this model, please cite:

```bibtex
@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}
```

## License

This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.

## Contact

For questions or issues, please open an issue in this repository or contact the model author.

---

*This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.*