Training in progress, step 200
- README.md +53 -104
- model.safetensors +1 -1
- tokenizer.json +0 -0
- training_args.bin +1 -1

README.md
CHANGED

@@ -1,124 +1,73 @@

---
-license: mit
tags:
-datasets:
-- wikitext
---

-## Model Details

-- **Layers**: 4 transformer layers
-- **Embedding Size**: 128 dimensions
-- **Attention Heads**: 4 heads
-- **Context Length**: 128 tokens
-- **Vocab Size**: 2000 tokens
-- **Training Data**: WikiText-2 (5,000 samples)
-- **Training Time**: 10 epochs on my laptop
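
For orientation, these specifications map onto a `GPT2Config` roughly as follows. This is a sketch rather than the author's actual code, but with every other field left at its default it reproduces the 1,065,728-parameter count quoted in the Limitations section below.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny GPT-2 matching the specs above (all other fields left at their defaults).
config = GPT2Config(
    vocab_size=2000,   # custom BPE vocabulary
    n_positions=128,   # context length
    n_embd=128,        # embedding size
    n_layer=4,         # transformer layers
    n_head=4,          # attention heads
)
model = GPT2LMHeadModel(config)

# With tied input/output embeddings this comes to 1,065,728 parameters.
print(sum(p.numel() for p in model.parameters()))
```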

-## Usage

-```
-from transformers import pipeline
-
-# Load the model from the Hub
-generator = pipeline('text-generation', model='Tanaybh/nano-gpt-from-scratch')
-
-# Generate a continuation (sampling settings here are illustrative)
-output = generator("The ", max_length=50)
-print(output[0]['generated_text'])
-```

-I gave it the prompt: **"The "**

-Not bad for a tiny model trained in a few hours, right?

-## Training Details
-
-I trained this model from scratch using:
-
-- Custom BPE tokenizer (trained on the same data)
-- GPT-2 architecture (just way smaller)
-- AdamW optimizer with a learning rate of 0.0005
-- Batch size of 8
-- Trained for 10 epochs
-
-The whole thing runs on a regular laptop - no fancy GPU clusters needed!
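
The tokenizer is the least standard piece of that list, so here is a minimal sketch of how a 2,000-token byte-level BPE tokenizer could be trained with the `tokenizers` library. The data slice, pre-tokenizer, and special token are assumptions; the card does not give the exact settings.

```python
from datasets import load_dataset
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Assumed data selection: the same 5,000 WikiText-2 samples mentioned above.
wiki = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:5000]")

# Byte-level BPE with the 2,000-token vocabulary from the spec list.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
trainer = trainers.BpeTrainer(vocab_size=2000, special_tokens=["<|endoftext|>"])
tokenizer.train_from_iterator((row["text"] for row in wiki), trainer=trainer)

tokenizer.save("tokenizer.json")
```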

-## Limitations
-
-Let's be real here:
-
-- This model is TINY. Like, really tiny. It has 1,065,728 parameters vs GPT-3's 175 billion.
-- It was only trained on 5,000 Wikipedia samples, so its knowledge is... limited.
-- It might generate weird or nonsensical text sometimes. That's part of the charm!
-- Maximum context length is only 128 tokens, so don't expect long conversations.
-- It's a base model with no instruction tuning, so it just continues text rather than following commands.
-
-## Why I Made This
-
-I wanted to understand how language models work by building one myself. Sure, I could've just fine-tuned a pre-trained model, but where's the fun in that? This project taught me about:
-
-- Tokenizer training
-- Transformer architecture
-- Training dynamics
-- How LLMs actually generate text
-
-Plus, now I can say I trained a language model from scratch on my laptop. Pretty cool, right?
-
-## Future Improvements
-
-Some things I might try:
-
-- Train on more data (maybe the full WikiText dataset)
-- Experiment with different model sizes
-- Try different tokenizer configurations
-- Add instruction tuning
-- Fine-tune it for specific tasks
-
-## License
-
-MIT - Feel free to use this however you want! Learn from it, break it, improve it. That's what it's here for.
-
-## Acknowledgments
-
-Built with:
-
-- 🤗 Hugging Face Transformers
-- PyTorch
-- The WikiText dataset
-- Too much coffee ☕
-
----
-
-**Note**: This is a learning project and experimental model. Use it for fun and education, not production systems!
-
-If you found this interesting or helpful, feel free to star the repo or reach out. Always happy to chat about ML stuff!
-
-*Last updated: October 05, 2025*

---
+library_name: transformers
tags:
+- generated_from_trainer
+model-index:
+- name: nano-gpt-from-scratch
+  results: []
---

+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->

+# nano-gpt-from-scratch

+This model was trained from scratch (no base checkpoint) on the WikiText-2 dataset.
+It achieves the following results on the evaluation set:
+- Loss: 4.5459
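
For a rough sense of scale, a cross-entropy loss of 4.5459 corresponds to a perplexity of about 94, assuming the reported value is the usual token-level cross-entropy in nats:

```python
import math

# Validation loss (token-level cross-entropy, nats) -> perplexity
print(math.exp(4.5459))  # ~94.2
```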

+## Model description

+More information needed

+## Intended uses & limitations

+More information needed

+## Training and evaluation data

+More information needed

+## Training procedure

+### Training hyperparameters

+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- optimizer: fused AdamW (adamw_torch_fused) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 10
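
Expressed as `TrainingArguments`, those hyperparameters look roughly like the following sketch; the `output_dir` value is an assumption, everything else mirrors the list above.

```python
from transformers import TrainingArguments

# Hyperparameters from the list above; output_dir is assumed.
args = TrainingArguments(
    output_dir="nano-gpt-from-scratch",
    learning_rate=5e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch_fused",
    lr_scheduler_type="linear",
    num_train_epochs=10,
)
```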

+### Training results

+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 5.9904        | 0.5510 | 200  | 5.9804          |
+| 5.5822        | 1.1019 | 400  | 5.5805          |
+| 5.3387        | 1.6529 | 600  | 5.3769          |
+| 5.2461        | 2.2039 | 800  | 5.2384          |
+| 5.1487        | 2.7548 | 1000 | 5.1084          |
+| 4.9265        | 3.3058 | 1200 | 5.0110          |
+| 4.8586        | 3.8567 | 1400 | 4.9200          |
+| 4.762         | 4.4077 | 1600 | 4.8474          |
+| 4.7138        | 4.9587 | 1800 | 4.7803          |
+| 4.6343        | 5.5096 | 2000 | 4.7298          |
+| 4.5071        | 6.0606 | 2200 | 4.6909          |
+| 4.5473        | 6.6116 | 2400 | 4.6554          |
+| 4.4326        | 7.1625 | 2600 | 4.6202          |
+| 4.4636        | 7.7135 | 2800 | 4.5988          |
+| 4.4093        | 8.2645 | 3000 | 4.5789          |
+| 4.4083        | 8.8154 | 3200 | 4.5609          |
+| 4.3798        | 9.3664 | 3400 | 4.5515          |
+| 4.3871        | 9.9174 | 3600 | 4.5459          |

+### Framework versions

+- Transformers 4.57.0
+- Pytorch 2.8.0
+- Datasets 4.0.0
+- Tokenizers 0.22.1

model.safetensors
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:d0bfd430a3a8fdeb867c532f4416848ecf83c0606cb3a46ba037f513641d35c2
size 4267952

tokenizer.json
CHANGED

The diff for this file is too large to render. See raw diff.

training_args.bin
CHANGED

@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:fceb742c74051ec1b94c19ac5051f3da403efa8386edf50dce7942e91550145a
size 5841