---
language: en
license: mit
library_name: pytorch
tags:
- text-generation
- gpt
- transformers
- language-model
- alice-in-wonderland
- literature
datasets:
- alice-in-wonderland
metrics:
- perplexity
pipeline_tag: text-generation
---
# 1st Demo GPT Based Architecture Model
## Model Description
This is a **GPT-based transformer language model** trained from scratch on Lewis Carroll's "Alice's Adventures in Wonderland". It demonstrates a custom implementation of the GPT architecture for text generation, trained entirely on a single work of classic literature rather than fine-tuned from an existing checkpoint.
## Model Details
- **Model Type**: GPT (Generative Pre-trained Transformer)
- **Architecture**: Custom transformer-based language model
- **Training Data**: Alice's Adventures in Wonderland by Lewis Carroll
- **Language**: English
- **Library**: PyTorch
- **Model Size**: ~4.2M parameters (based on `complete_gpt_model.pth`); an illustrative configuration sketch follows below
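The exact hyperparameters are defined in `Notebook1.ipynb`. Purely as an illustration of the kind of configuration a GPT model of this scale typically uses (the values below are placeholders, not the actual settings of this checkpoint):
```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical values for illustration only -- the real hyperparameters
    # live in Notebook1.ipynb and may differ.
    vocab_size: int = 5000   # determined by tokenizer.pkl
    block_size: int = 128    # context window (tokens)
    n_layer: int = 4         # transformer blocks
    n_head: int = 4          # attention heads per block
    n_embd: int = 256        # embedding / hidden dimension
    dropout: float = 0.1
```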
## Training Details
### Dataset
- **Source**: Alice's Adventures in Wonderland (complete text)
- **Size**: 1,033 lines of text
- **Preprocessing**: Custom tokenization (character-level or subword), with the tokenizer stored in `tokenizer.pkl`; a minimal character-level example is sketched below
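The repository ships the actual tokenizer as `tokenizer.pkl`; as a hedged illustration of what a simple character-level tokenizer over `dataset.txt` could look like (not necessarily the scheme used here):
```python
class CharTokenizer:
    """Minimal character-level tokenizer -- illustrative only."""
    def __init__(self, text):
        chars = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for ch, i in self.stoi.items()}

    def encode(self, s):
        return [self.stoi[ch] for ch in s]

    def decode(self, ids):
        return ''.join(self.itos[i] for i in ids)

# Build the vocabulary from the training text
with open('dataset.txt', 'r', encoding='utf-8') as f:
    tokenizer = CharTokenizer(f.read())
```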
### Training Configuration
- **Epochs**: 3 (checkpoint files available for each epoch)
- **Optimizer**: Likely AdamW (standard for transformer models)
- **Training Files** (see the note on checkpoint sizes after this list):
  - `checkpoint_epoch_1.pth` (12.2MB)
  - `checkpoint_epoch_2.pth` (12.2MB)
  - `checkpoint_epoch_3.pth` (12.2MB)
  - `best_model.pth` (4.14MB) - Best performing checkpoint
  - `complete_gpt_model.pth` (4.20MB) - Final trained model
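The per-epoch checkpoints (~12.2MB) are roughly three times the size of the model-only files (~4.2MB), which is consistent with saving the AdamW optimizer state (two moment buffers per parameter) alongside the weights. A hypothetical helper illustrating that layout (not taken from the notebook):
```python
import torch

def save_checkpoint(model, optimizer, epoch, path):
    # Assumed checkpoint layout: weights plus optimizer state, which would
    # account for the larger per-epoch files compared to the model-only ones.
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)
```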
## Files in this Repository
| File | Size | Description |
|------|------|-------------|
| `complete_gpt_model.pth` | 4.20MB | Final trained model weights |
| `best_model.pth` | 4.14MB | Best performing model checkpoint |
| `checkpoint_epoch_1.pth` | 12.2MB | Training checkpoint after epoch 1 |
| `checkpoint_epoch_2.pth` | 12.2MB | Training checkpoint after epoch 2 |
| `checkpoint_epoch_3.pth` | 12.2MB | Training checkpoint after epoch 3 |
| `tokenizer.pkl` | 37.3KB | Custom tokenizer for the model |
| `dataset.txt` | 51KB | Training dataset (Alice in Wonderland) |
| `Notebook1.ipynb` | 4.1MB | Training notebook with implementation |
## Usage
### Loading the Model
```python
import torch
import pickle

# Load the tokenizer
with open('tokenizer.pkl', 'rb') as f:
    tokenizer = pickle.load(f)

# Load the model
model = torch.load('complete_gpt_model.pth', map_location='cpu')
model.eval()
```
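Note that the snippet above assumes the file stores the full pickled model object (which is why `model.eval()` can be called directly on the result). On PyTorch 2.6 and later, `torch.load` defaults to `weights_only=True`, so loading such a file may require opting out explicitly:
```python
# Needed on PyTorch 2.6+ when the checkpoint contains a pickled nn.Module
model = torch.load('complete_gpt_model.pth', map_location='cpu', weights_only=False)
```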
### Text Generation
The exact sampling loop depends on how the model's forward pass is implemented; the sketch below assumes `model(ids)` returns logits of shape `(batch, seq_len, vocab_size)` and that the tokenizer exposes `encode`/`decode` methods.
```python
def generate_text(model, tokenizer, prompt, max_length=100):
    model.eval()
    with torch.no_grad():
        # Tokenize input (assumes tokenizer.encode returns a list of token ids)
        ids = torch.tensor(tokenizer.encode(prompt), dtype=torch.long).unsqueeze(0)
        # Generate text one token at a time
        for _ in range(max_length):
            logits = model(ids)                             # assumed shape: (1, T, vocab)
            probs = torch.softmax(logits[:, -1, :], dim=-1)
            next_id = torch.multinomial(probs, num_samples=1)
            ids = torch.cat([ids, next_id], dim=1)
        # Decode the generated ids back to text
        return tokenizer.decode(ids[0].tolist())

# Example usage
prompt = "Alice was beginning to get very tired"
generated = generate_text(model, tokenizer, prompt)
print(generated)
```
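The sketch above samples from the full softmax distribution at each step. Depending on the model, greedy decoding (`torch.argmax`), temperature scaling (dividing the logits by a temperature before the softmax), or top-k filtering may produce more coherent text; you may also need to truncate `ids` to the model's context window before each forward pass.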
## Model Performance
The model has been trained for 3 epochs on the Alice in Wonderland dataset. Performance metrics and loss curves can be found in the training notebook (`Notebook1.ipynb`).
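Since perplexity is the metric listed for this model, it can be recomputed outside the notebook as the exponential of the average cross-entropy over held-out tokens. A minimal sketch, assuming the model returns logits of shape `(batch, seq_len, vocab_size)` and a DataLoader yields `(inputs, targets)` token batches:
```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, data_loader):
    model.eval()
    total_loss, total_tokens = 0.0, 0
    for inputs, targets in data_loader:
        logits = model(inputs)                     # assumed shape: (B, T, vocab)
        loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                               targets.view(-1), reduction='sum')
        total_loss += loss.item()
        total_tokens += targets.numel()
    return math.exp(total_loss / total_tokens)
```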
### Expected Outputs
Given the training on Alice in Wonderland, the model should generate text in a similar style to Lewis Carroll's writing, with:
- Victorian-era English vocabulary and sentence structure
- Whimsical and fantastical content
- Character references from the original story
- Descriptive and narrative prose style
## Training Process
The training was conducted using:
1. **Data Preprocessing**: Text cleaning and tokenization
2. **Model Architecture**: Custom GPT implementation
3. **Training Loop**: 3 epochs with checkpoint saving
4. **Validation**: Best model selection based on validation metrics (a sketch of the loop follows below)
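The authoritative implementation is in `Notebook1.ipynb`; the following is only a hedged sketch of the process described above, where `train_loader`, `val_loader`, the learning rate, and the loss computation are assumptions for illustration:
```python
import torch
import torch.nn.functional as F

def train(model, train_loader, val_loader, epochs=3):
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)  # assumed optimizer / lr
    best_val_loss = float('inf')
    for epoch in range(1, epochs + 1):
        model.train()
        for inputs, targets in train_loader:          # assumed (B, T) token batches
            logits = model(inputs)                    # assumed shape: (B, T, vocab)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Per-epoch checkpoint (weights + optimizer state)
        torch.save({'epoch': epoch,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()},
                   f'checkpoint_epoch_{epoch}.pth')

        # Validation pass; keep the best-performing model seen so far
        model.eval()
        val_loss, batches = 0.0, 0
        with torch.no_grad():
            for inputs, targets in val_loader:
                logits = model(inputs)
                val_loss += F.cross_entropy(logits.view(-1, logits.size(-1)),
                                            targets.view(-1)).item()
                batches += 1
        if val_loss / batches < best_val_loss:
            best_val_loss = val_loss / batches
            torch.save(model, 'best_model.pth')
```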
## Limitations
- **Dataset Size**: Trained on a single book, limiting vocabulary and style diversity
- **Domain Specificity**: Optimized for Lewis Carroll's writing style
- **Scale**: Relatively small model compared to modern large language models
- **Context Length**: Limited context window typical of smaller transformer models
## Ethical Considerations
- This model is trained on public domain literature (Alice in Wonderland)
- The training data is from 1865 and may contain outdated language or concepts
- The model is intended for educational and demonstration purposes
## Citation
If you use this model, please cite:
```bibtex
@misc{karthik2024alice_gpt,
  title={1st Demo GPT Based Architecture Model},
  author={Karthik},
  year={2024},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/karthik-2905/1st_Demo_GPT_Based_Architecture_Model}
}
```
## License
This model is released under the MIT License. The training data (Alice's Adventures in Wonderland) is in the public domain.
## Contact
For questions or issues, please open an issue in this repository or contact the model author.
---
*This model was created as a learning exercise to demonstrate GPT architecture implementation and training on classic literature.*