Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding

Model Details

Model Description

This is a custom GRU-based language model trained on a dataset of short stories, designed for text generation tasks.

Uses

Direct Use

This model can be used for generating short stories and text completion tasks.

Downstream Use

Fine-tune the model on specific domains for specialized text generation.

Out-of-Scope Use

Not intended for production use without further validation.

Training Details

Training Data

The model was trained on the aditya-6122/tinystories-custom-dataset-18542-v2-test dataset.

Training Procedure

  • Training Regime: Standard language model training with cross-entropy loss
  • Epochs: 5
  • Batch Size: 128
  • Learning Rate: 0.001
  • Optimizer: Adam (assumed)
  • Hardware: Apple Silicon MPS (if available) or CPU
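The training regime above (next-token prediction with cross-entropy loss, Adam, MPS-or-CPU device selection) can be sketched as follows. This is a minimal illustration, not the author's actual script; the assumption that the model returns a `(logits, hidden)` pair is mine.

```python
import torch
import torch.nn as nn

def train_step(model, batch, optimizer, loss_fn):
    """One next-token-prediction step: inputs are tokens [0..n-1],
    targets are the same sequence shifted by one, tokens [1..n].
    Assumes model(x) returns (logits, hidden)."""
    inputs, targets = batch[:, :-1], batch[:, 1:]
    logits, _ = model(inputs)  # (batch, seq, vocab)
    loss = loss_fn(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Device selection as described in the card: MPS if available, else CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
```

With the hyperparameters listed above, the optimizer would be constructed as `torch.optim.Adam(model.parameters(), lr=0.001)` and `train_step` called over batches of 128 sequences for 5 epochs.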

Tokenizer

The model uses the aditya-6122/tinystories-tokenizer-vb-18542-byte_level_bpe-v3-test tokenizer.

Model Architecture

  • Architecture Type: RNN-based language model with GRU cells
  • Embedding Dimension: 512
  • Hidden Dimension: 1024
  • Vocabulary Size: 18542
  • Architecture Diagram: See model_arch.jpg for visual representation
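The architecture described above can be sketched in PyTorch as below. The layer names, the single GRU layer, and the forward signature are assumptions on my part; the actual class used to produce model.bin may differ (check model_arch.jpg and the state dict keys).

```python
import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    """GRU language model matching the card's stated dimensions:
    embedding 512, hidden 1024, vocab 18542. Layer names are assumed."""
    def __init__(self, vocab_size, embedding_dimension=512, hidden_dimension=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dimension)
        self.gru = nn.GRU(embedding_dimension, hidden_dimension, batch_first=True)
        self.fc = nn.Linear(hidden_dimension, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.embedding(x)              # (batch, seq, embed)
        out, hidden = self.gru(emb, hidden)  # (batch, seq, hidden)
        logits = self.fc(out)                # (batch, seq, vocab)
        return logits, hidden
```

Returning the hidden state alongside the logits lets generation reuse the GRU state instead of re-encoding the full prefix at every step.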

Files

  • model.bin: The trained model weights in PyTorch format.
  • tokenizer.json: The tokenizer configuration.
  • model_arch.jpg: Architecture diagram showing the GRU model structure.

How to Use

Since this is a custom model, you'll need to load it using the provided code:

import torch
from your_language_model import LanguageModel  # Replace with actual import
from tokenizers import Tokenizer

# Load tokenizer
tokenizer = Tokenizer.from_file("tokenizer.json")

# Load model
vocab_size = tokenizer.get_vocab_size()
model = LanguageModel(vocab_size=vocab_size, embedding_dimension=512, hidden_dimension=1024)
model.load_state_dict(torch.load("model.bin", map_location="cpu"))  # map_location avoids MPS/CPU device mismatches
model.eval()

# Generate text
input_text = "Once upon a time"

# Tokenize and generate [Add your Generation Logic]
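One way to fill in the generation step is a simple autoregressive sampling loop. This is a sketch under the assumption that the model returns a `(logits, hidden)` pair as in the loading snippet above; adapt it to the real forward signature.

```python
import torch

def generate(model, tokenizer, prompt, max_new_tokens=100, temperature=1.0):
    """Autoregressive sampling: encode the prompt, then sample one
    token at a time, reusing the GRU hidden state between steps."""
    model.eval()
    ids = tokenizer.encode(prompt).ids
    tokens = torch.tensor([ids], dtype=torch.long)
    hidden = None
    with torch.no_grad():
        logits, hidden = model(tokens, hidden)
        for _ in range(max_new_tokens):
            next_logits = logits[0, -1] / temperature  # logits for last position
            probs = torch.softmax(next_logits, dim=-1)
            next_id = torch.multinomial(probs, 1)      # sample one token
            ids.append(next_id.item())
            logits, hidden = model(next_id.view(1, 1), hidden)
    return tokenizer.decode(ids)
```

For example, `generate(model, tokenizer, "Once upon a time", max_new_tokens=200)` would continue the prompt; lowering `temperature` makes the sampling greedier.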

Limitations

  • This is a basic RNN model and may not perform as well as transformer-based models.
  • Trained on limited data; it may exhibit biases present in the training dataset.
  • Not optimized for production deployment.

Ethical Considerations

Users should be aware of potential biases in generated text and use the model responsibly.

Citation

If you use this model, please cite:

@misc{vanilla-rnn-gru-like,
  title={Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding},
  author={Aditya Wath},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/aditya-6122/Tiny-Stories-GRU-LanguageModel-ByteLevelEncoding}
}