---
language: en
license: mit
tags:
- pytorch
- language-model
- transformer
- tiny-shakespeare
library_name: transformers
model_name: mini-language-model
pipeline_tag: text-generation
---
|
|
|
|
|
# Mini Language Model
|
|
|
|
|
## 🧠 Model Description
|
|
This is a toy decoder-only language model based on a `TransformerDecoder` architecture. It was trained from scratch on the [Tiny Shakespeare dataset](https://huggingface.co/datasets/tiny_shakespeare) using PyTorch.
|
|
|
|
|
The goal was to explore autoregressive language modeling with minimal resources, using only lightweight libraries such as `torch.nn` and `transformers`.
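The original `MiniDecoderModel` class is not included in this card, so the sketch below is a hypothetical reconstruction of what such a model might look like. All dimensions (`d_model=16`, 2 heads, 2 layers) are guesses, chosen with tied input/output embeddings so that the total lands near the reported ~900k parameters.

```python
import torch
import torch.nn as nn

class MiniDecoderModel(nn.Module):
    """Hypothetical sketch of a tiny decoder-only LM; not the original code."""

    def __init__(self, vocab_size=50257, d_model=16, nhead=2, num_layers=2, max_len=64):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerDecoderLayer(
            d_model, nhead, dim_feedforward=4 * d_model, batch_first=True
        )
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.token_emb.weight  # weight tying (assumed)

    def forward(self, ids):
        seq_len = ids.size(1)
        x = self.token_emb(ids) + self.pos_emb(torch.arange(seq_len, device=ids.device))
        # Causal mask: each position may only attend to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(seq_len).to(ids.device)
        # Decoder-only trick: feed the sequence to itself as "memory".
        h = self.decoder(tgt=x, memory=x, tgt_mask=mask, memory_mask=mask)
        return self.lm_head(h)  # (batch, seq_len, vocab_size) logits
```

The weight tying is an assumption: without it, the 50257-token embedding alone would push the parameter count well past the reported ~900k.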
|
|
|
|
|
## 🏋️ Training Details
|
|
- **Architecture**: TransformerDecoder
- **Tokenizer**: GPT2Tokenizer from Hugging Face
- **Vocabulary Size**: 50257 (from GPT-2)
- **Sequence Length**: 64 tokens
- **Batch Size**: 8
- **Epochs**: 5
- **Learning Rate**: 1e-3
- **Number of Parameters**: ~900k
- **Hardware**: Trained on CPU (Google Colab)
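The training script itself is not part of this card. The following is a minimal sketch of a loop consistent with the hyperparameters above (64-token windows, batch size 8, 5 epochs, learning rate 1e-3); the Adam optimizer and the dataset path are assumptions, and `MiniDecoderModel` refers to the hypothetical sketch in the Model Description section.

```python
import torch
import torch.nn.functional as F
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
text = open("tiny_shakespeare.txt").read()        # placeholder path
data = torch.tensor(tokenizer.encode(text))

seq_len, batch_size = 64, 8
model = MiniDecoderModel()                        # hypothetical class from above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # optimizer assumed

model.train()
for epoch in range(5):
    for step in range((data.size(0) - 1) // (seq_len * batch_size)):
        # Slice a batch of contiguous 64-token windows plus one extra token
        # so that targets are the inputs shifted left by one position.
        offset = step * seq_len * batch_size
        batch = torch.stack([
            data[offset + i * seq_len : offset + i * seq_len + seq_len + 1]
            for i in range(batch_size)
        ])
        inputs, targets = batch[:, :-1], batch[:, 1:]
        logits = model(inputs)
        loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch}: train loss {loss.item():.3f}")
```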
|
|
|
|
|
## 📊 Evaluation
|
|
The model was evaluated on a 10% validation split. Training and validation loss both decreased steadily, but the model is not expected to produce coherent long-form text given the small dataset and model size.
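The exact evaluation code is not published; below is a sketch of how a validation loss over a 10% held-out split might be computed, reusing `data` and `model` from the training sketch above.

```python
import math
import torch
import torch.nn.functional as F

split = int(0.9 * data.size(0))                   # 90/10 train/validation split
val_data = data[split:]

model.eval()
losses = []
with torch.no_grad():
    for start in range(0, val_data.size(0) - 65, 64):
        window = val_data[start : start + 65].unsqueeze(0)   # (1, 65)
        logits = model(window[:, :-1])
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), window[:, 1:].reshape(-1)
        )
        losses.append(loss.item())

val_loss = sum(losses) / len(losses)
print(f"val loss {val_loss:.3f}, perplexity {math.exp(val_loss):.1f}")
```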
|
|
|
|
|
## 🚀 Intended Use
|
|
This model is intended for educational purposes only. It is **not suitable for production use**.
|
|
|
|
|
## 🚫 Limitations
|
|
- Trained only on a tiny dataset (Tiny Shakespeare)
- Small architecture with limited capacity
- Limited ability to generalize or to generate meaningful long-form text
|
|
|
|
|
## 💬 Example Usage (Python)
|
|
```python
import torch
from transformers import GPT2Tokenizer

from model import MiniDecoderModel  # assuming you restore the class definition

tokenizer = GPT2Tokenizer.from_pretrained("Pavloria/mini-language-model")
model = MiniDecoderModel(...)  # load your config
model.load_state_dict(torch.load("pytorch_model.bin"))
model.eval()
```
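The snippet above stops after loading the weights. Here is a hedged sketch of greedy autoregressive generation with this setup; the 64-token context limit comes from the training details, and the model interface follows the hypothetical sketch in the Model Description section.

```python
prompt = "ROMEO:"
ids = tokenizer.encode(prompt, return_tensors="pt")

with torch.no_grad():
    for _ in range(50):                       # generate 50 new tokens
        context = ids[:, -64:]                # respect the 64-token window
        logits = model(context)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```

Sampling from the softmax with a temperature, rather than taking the argmax, usually produces livelier Shakespeare-style output from small models like this one.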
|
|
|