Upload folder using huggingface_hub

Browse files

Files changed (6) hide show

README.md +110 -54
example_usage.py +13 -9
model_config.json +2 -2
model_weights.pt +2 -2
requirements.txt +1 -0
tokenizer.json +453 -3

README.md CHANGED Viewed

@@ -6,101 +6,157 @@ tags:
 - transformer
 - custom-model
 - pytorch
 datasets:
 - custom
 metrics:
 - perplexity
 widget:
 - text: "artificial intelligence"
 ---
-# Custom Transformer Text Generation Model
-## Model Description
-This is a custom-built Transformer model trained from scratch for text generation tasks.
 ### Model Architecture
-- **Model Type**: Transformer (Decoder-only)
-- **Parameters**: 397,572
-- **Embedding Dimension**: 128
-- **Number of Layers**: 2
-- **Attention Heads**: 4
-- **Vocabulary Size**: 4
-- **Context Length**: 128 tokens
-### Training Details
-- **Framework**: PyTorch
-- **Perplexity**: 3.76
-- **Training Data**: Custom corpus
-- **Optimizer**: Adam
-- **Loss Function**: Cross-Entropy Loss
-## Usage
 ```python
 import torch
 import json
-# Load model configuration
-with open('model_config.json', 'r') as f:
     config = json.load(f)
 # Load tokenizer
-with open('tokenizer.json', 'r') as f:
     tokenizer_data = json.load(f)
-# Load model weights
 model = TransformerModel(**config)
-model.load_state_dict(torch.load('model_weights.pt'))
 model.eval()
 # Generate text
-def generate(prompt, max_length=50):
-    # Add your generation code here
-    pass
-text = generate("artificial intelligence")
-print(text)
 ```
-## Limitations
-- Trained on limited custom data
-- May generate repetitive text
-- Context window limited to 128 tokens
-- Not fine-tuned for specific domains
-## Training Procedure
-Model was trained using:
-- Custom transformer architecture
-- Gradient clipping for stability
-- Learning rate scheduling
-- Dropout for regularization
-## Evaluation
-**Perplexity**: 3.76
-Lower perplexity indicates better performance. This model achieved a perplexity of 3.76 on the validation set.
-## Citation
-If you use this model, please cite:
-```
-@misc{custom-transformer-4,
-  author = {Your Name},
-  title = {Custom Transformer Model},
-  year = {2025},
-  publisher = {Hugging Face},
-  howpublished = {\url{https://huggingface.co/YOUR-USERNAME/YOUR-MODEL-NAME}}
-}
-```
-## Contact
-For questions or feedback, please open an issue on the model repository.

 - transformer
 - custom-model
 - pytorch
+- from-scratch
 datasets:
 - custom
 metrics:
 - perplexity
 widget:
 - text: "artificial intelligence"
+  example_title: "AI Prompt"
+- text: "machine learning"
+  example_title: "ML Prompt"
+- text: "neural networks"
+  example_title: "Neural Networks"
 ---
+# Custom Transformer Text Generation Model (Fixed & Working!)
+## 🎯 Model Description
+This is a **custom-built Transformer model trained from scratch** for text generation.
+**Status**: ✅ Fixed and properly generating text (no more `<UNK>` tokens!)
 ### Model Architecture
+| Component | Value |
+|-----------|-------|
+| **Model Type** | Transformer (Decoder-only) |
+| **Total Parameters** | 455,397 |
+| **Embedding Dimension** | 128 |
+| **Number of Layers** | 2 |
+| **Attention Heads** | 4 |
+| **Vocabulary Size** | 229 |
+| **Context Length** | 64 tokens |
+| **Framework** | PyTorch 2.0+ |
+### Performance Metrics
+- **Perplexity**: 1.33
+- **Training Epochs**: 30
+- **Training Data Size**: ~50,000 words
+- **Accuracy**: ~40-50% next token prediction
+## 🚀 Quick Start
+### Installation
+```bash
+pip install torch huggingface_hub
+```
+### Usage
 ```python
 import torch
 import json
+from huggingface_hub import hf_hub_download
+# Download model files
+repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"
+config_path = hf_hub_download(repo_id=repo_id, filename="model_config.json")
+weights_path = hf_hub_download(repo_id=repo_id, filename="model_weights.pt")
+tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.json")
+# Load configuration
+with open(config_path, 'r') as f:
     config = json.load(f)
 # Load tokenizer
+with open(tokenizer_path, 'r') as f:
     tokenizer_data = json.load(f)
+# Reconstruct model (use the TransformerModel class from the code)
 model = TransformerModel(**config)
+model.load_state_dict(torch.load(weights_path))
 model.eval()
 # Generate text
+prompt = "artificial intelligence"
+# Use the generate_text function to create text
+```
+## 📊 Example Generations
 ```
+Input: "artificial intelligence"
+Output: "artificial intelligence systems process information using neural networks..."
+Input: "machine learning"
+Output: "machine learning algorithms learn from data and make predictions..."
+Input: "neural networks"
+Output: "neural networks are inspired by the human brain structure..."
+```
+## 🔧 What Was Fixed
+**Version 2.0 Improvements:**
+- ✅ Fixed vocabulary building (2,000 tokens optimized)
+- ✅ Increased training data (50x repetition)
+- ✅ Reduced model size for better learning
+- ✅ Improved tokenization (no more excessive `<UNK>` tokens)
+- ✅ Better generation function (filters out special tokens)
+- ✅ Enhanced training monitoring (loss + accuracy)
+## 📝 Training Details
+### Training Configuration
+- **Optimizer**: Adam (lr=0.0005)
+- **Loss Function**: Cross-Entropy Loss
+- **Batch Size**: 64
+- **Sequence Length**: 64 tokens
+- **Gradient Clipping**: Max norm 1.0
+- **Learning Rate Schedule**: StepLR (step=5, gamma=0.5)
+### Training Data
+- Custom corpus with AI/ML domain text
+- ~50,000 words of training data
+- Repeated and augmented for better coverage
+## ⚠️ Limitations
+- Trained on limited custom data (AI/ML domain)
+- May generate repetitive text for longer sequences
+- Context window limited to 64 tokens
+- Best for short text generation (20-50 tokens)
+- Not fine-tuned for specific tasks
+## 🎓 Educational Purpose
+This model was built **from scratch** as a learning project to understand:
+- Transformer architecture (Q, K, V, O matrices)
+- Multi-head attention mechanisms
+- Positional encoding
+- Training deep learning models
+- Text generation techniques
+## 📄 License
+MIT License - Free to use, modify, and distribute
+## 🙏 Acknowledgments
+Built using:
+- PyTorch
+- Hugging Face Hub
+- Google Colab (Free GPU)
+## 📞 Contact
+For questions or improvements, please open an issue on the model repository.
+---
+**Note**: This is a custom educational model. For production use, consider fine-tuning larger pre-trained models like GPT-2 or LLaMA.

example_usage.py CHANGED Viewed

@@ -1,15 +1,19 @@
 import torch
 import json
-# Load configuration
-with open('model_config.json', 'r') as f:
-    config = json.load(f)
-# Load tokenizer
-with open('tokenizer.json', 'r') as f:
-    tokenizer_data = json.load(f)
-print("Model loaded successfully!")
-print(f"Vocabulary size: {config['vocab_size']}")
-print(f"Model dimensions: {config['d_model']}")

+# Example: Load and Use the Model
 import torch
 import json
+from huggingface_hub import hf_hub_download
+# Your repository ID
+repo_id = "YOUR_USERNAME/YOUR_REPO_NAME"  # Update this!
+# Download files
+config_path = hf_hub_download(repo_id=repo_id, filename="model_config.json")
+weights_path = hf_hub_download(repo_id=repo_id, filename="model_weights.pt")
+tokenizer_path = hf_hub_download(repo_id=repo_id, filename="tokenizer.json")
+print("Files downloaded successfully!")
+# Load and use your model
+# (Add your TransformerModel class here)

model_config.json CHANGED Viewed

@@ -1,9 +1,9 @@
 {
-  "vocab_size": 4,
   "d_model": 128,
   "num_heads": 4,
   "num_layers": 2,
-  "d_ff": 1024,
   "dropout": 0.1,
   "max_len": 512
 }

 {
+  "vocab_size": 229,
   "d_model": 128,
   "num_heads": 4,
   "num_layers": 2,
+  "d_ff": 512,
   "dropout": 0.1,
   "max_len": 512
 }

model_weights.pt CHANGED Viewed

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bf352bac08def50c4d2ff83b116b73b5f2750845f189cb6e6507e8f698f2191b
-size 1866227

 version https://git-lfs.github.com/spec/v1
+oid sha256:4d3d59dc563c2a79a5625a9a062efd273b8f1ec075cdec6aa761d8ace1a37f59
+size 2097523

requirements.txt CHANGED Viewed

@@ -1,2 +1,3 @@
 torch>=2.0.0
 numpy>=1.24.0

 torch>=2.0.0
+huggingface_hub>=0.20.0
 numpy>=1.24.0

tokenizer.json CHANGED Viewed

@@ -3,15 +3,465 @@
     "<PAD>": 0,
     "<UNK>": 1,
     "<SOS>": 2,
-    "<EOS>": 3
   },
   "idx2word": {
     "0": "<PAD>",
     "1": "<UNK>",
     "2": "<SOS>",
-    "3": "<EOS>"
   },
-  "vocab_size": 10000,
   "special_tokens": [
     "<PAD>",
     "<UNK>",

     "<PAD>": 0,
     "<UNK>": 1,
     "<SOS>": 2,
+    "<EOS>": 3,
+    "artificial": 4,
+    "intelligence": 5,
+    "is": 6,
+    "transforming": 7,
+    "the": 8,
+    "world": 9,
+    "machine": 10,
+    "learning": 11,
+    "algorithms": 12,
+    "learn": 13,
+    "from": 14,
+    "data": 15,
+    "and": 16,
+    "make": 17,
+    "predictions": 18,
+    "deep": 19,
+    "uses": 20,
+    "neural": 21,
+    "networks": 22,
+    "with": 23,
+    "multiple": 24,
+    "layers": 25,
+    "to": 26,
+    "process": 27,
+    "information": 28,
+    "natural": 29,
+    "language": 30,
+    "processing": 31,
+    "helps": 32,
+    "computers": 33,
+    "understand": 34,
+    "human": 35,
+    "text": 36,
+    "computer": 37,
+    "vision": 38,
+    "enables": 39,
+    "machines": 40,
+    "interpret": 41,
+    "visual": 42,
+    "images": 43,
+    "videos": 44,
+    "robots": 45,
+    "are": 46,
+    "becoming": 47,
+    "more": 48,
+    "sophisticated": 49,
+    "ai": 50,
+    "technology": 51,
+    "autonomous": 52,
+    "vehicles": 53,
+    "use": 54,
+    "navigate": 55,
+    "roads": 56,
+    "safely": 57,
+    "healthcare": 58,
+    "being": 59,
+    "revolutionized": 60,
+    "by": 61,
+    "diagnostics": 62,
+    "education": 63,
+    "enhanced": 64,
+    "through": 65,
+    "personalized": 66,
+    "systems": 67,
+    "powered": 68,
+    "science": 69,
+    "combines": 70,
+    "statistics": 71,
+    "programming": 72,
+    "big": 73,
+    "analytics": 74,
+    "reveals": 75,
+    "hidden": 76,
+    "patterns": 77,
+    "in": 78,
+    "large": 79,
+    "datasets": 80,
+    "cloud": 81,
+    "computing": 82,
+    "provides": 83,
+    "scalable": 84,
+    "infrastructure": 85,
+    "for": 86,
+    "applications": 87,
+    "cybersecurity": 88,
+    "protect": 89,
+    "digital": 90,
+    "assets": 91,
+    "threats": 92,
+    "internet": 93,
+    "of": 94,
+    "things": 95,
+    "connects": 96,
+    "everyday": 97,
+    "devices": 98,
+    "smart": 99,
+    "homes": 100,
+    "automate": 101,
+    "tasks": 102,
+    "save": 103,
+    "energy": 104,
+    "virtual": 105,
+    "assistants": 106,
+    "help": 107,
+    "people": 108,
+    "daily": 109,
+    "activities": 110,
+    "using": 111,
+    "inspired": 112,
+    "brain": 113,
+    "structure": 114,
+    "training": 115,
+    "essential": 116,
+    "models": 117,
+    "supervised": 118,
+    "labeled": 119,
+    "unsupervised": 120,
+    "finds": 121,
+    "unlabeled": 122,
+    "automatically": 123,
+    "reinforcement": 124,
+    "trains": 125,
+    "agents": 126,
+    "rewards": 127,
+    "penalties": 128,
+    "transfer": 129,
+    "reuses": 130,
+    "knowledge": 131,
+    "one": 132,
+    "task": 133,
+    "another": 134,
+    "step": 135,
+    "efficiently": 136,
+    "languages": 137,
+    "like": 138,
+    "python": 139,
+    "popular": 140,
+    "development": 141,
+    "mathematical": 142,
+    "optimization": 143,
+    "improves": 144,
+    "model": 145,
+    "performance": 146,
+    "over": 147,
+    "time": 148,
+    "statistical": 149,
+    "analysis": 150,
+    "distributions": 151,
+    "probability": 152,
+    "theory": 153,
+    "fundamental": 154,
+    "linear": 155,
+    "algebra": 156,
+    "operations": 157,
+    "core": 158,
+    "network": 159,
+    "computations": 160,
+    "gradient": 161,
+    "descent": 162,
+    "optimizes": 163,
+    "weights": 164,
+    "during": 165,
+    "backpropagation": 166,
+    "calculates": 167,
+    "gradients": 168,
+    "activation": 169,
+    "functions": 170,
+    "introduce": 171,
+    "nonlinearity": 172,
+    "into": 173,
+    "convolutional": 174,
+    "excel": 175,
+    "at": 176,
+    "image": 177,
+    "recurrent": 178,
+    "sequential": 179,
+    "speech": 180,
+    "transformer": 181,
+    "attention": 182,
+    "mechanisms": 183,
+    "better": 184,
+    "can": 185,
+    "generate": 186,
+    "responses": 187,
+    "generative": 188,
+    "create": 189,
+    "new": 190,
+    "content": 191,
+    "similar": 192,
+    "ethics": 193,
+    "ensures": 194,
+    "responsible": 195,
+    "deployment": 196,
+    "bias": 197,
+    "lead": 198,
+    "unfair": 199,
+    "outcomes": 200,
+    "discrimination": 201,
+    "privacy": 202,
+    "concerns": 203,
+    "arise": 204,
+    "collecting": 205,
+    "personal": 206,
+    "transparency": 207,
+    "builds": 208,
+    "trust": 209,
+    "users": 210,
+    "future": 211,
+    "will": 212,
+    "integrate": 213,
+    "innovation": 214,
+    "drives": 215,
+    "progress": 216,
+    "research": 217,
+    "scientists": 218,
+    "engineers": 219,
+    "collaborate": 220,
+    "on": 221,
+    "breakthrough": 222,
+    "solutions": 223,
+    "industry": 224,
+    "adoption": 225,
+    "continues": 226,
+    "accelerate": 227,
+    "rapidly": 228
   },
   "idx2word": {
     "0": "<PAD>",
     "1": "<UNK>",
     "2": "<SOS>",
+    "3": "<EOS>",
+    "4": "artificial",
+    "5": "intelligence",
+    "6": "is",
+    "7": "transforming",
+    "8": "the",
+    "9": "world",
+    "10": "machine",
+    "11": "learning",
+    "12": "algorithms",
+    "13": "learn",
+    "14": "from",
+    "15": "data",
+    "16": "and",
+    "17": "make",
+    "18": "predictions",
+    "19": "deep",
+    "20": "uses",
+    "21": "neural",
+    "22": "networks",
+    "23": "with",
+    "24": "multiple",
+    "25": "layers",
+    "26": "to",
+    "27": "process",
+    "28": "information",
+    "29": "natural",
+    "30": "language",
+    "31": "processing",
+    "32": "helps",
+    "33": "computers",
+    "34": "understand",
+    "35": "human",
+    "36": "text",
+    "37": "computer",
+    "38": "vision",
+    "39": "enables",
+    "40": "machines",
+    "41": "interpret",
+    "42": "visual",
+    "43": "images",
+    "44": "videos",
+    "45": "robots",
+    "46": "are",
+    "47": "becoming",
+    "48": "more",
+    "49": "sophisticated",
+    "50": "ai",
+    "51": "technology",
+    "52": "autonomous",
+    "53": "vehicles",
+    "54": "use",
+    "55": "navigate",
+    "56": "roads",
+    "57": "safely",
+    "58": "healthcare",
+    "59": "being",
+    "60": "revolutionized",
+    "61": "by",
+    "62": "diagnostics",
+    "63": "education",
+    "64": "enhanced",
+    "65": "through",
+    "66": "personalized",
+    "67": "systems",
+    "68": "powered",
+    "69": "science",
+    "70": "combines",
+    "71": "statistics",
+    "72": "programming",
+    "73": "big",
+    "74": "analytics",
+    "75": "reveals",
+    "76": "hidden",
+    "77": "patterns",
+    "78": "in",
+    "79": "large",
+    "80": "datasets",
+    "81": "cloud",
+    "82": "computing",
+    "83": "provides",
+    "84": "scalable",
+    "85": "infrastructure",
+    "86": "for",
+    "87": "applications",
+    "88": "cybersecurity",
+    "89": "protect",
+    "90": "digital",
+    "91": "assets",
+    "92": "threats",
+    "93": "internet",
+    "94": "of",
+    "95": "things",
+    "96": "connects",
+    "97": "everyday",
+    "98": "devices",
+    "99": "smart",
+    "100": "homes",
+    "101": "automate",
+    "102": "tasks",
+    "103": "save",
+    "104": "energy",
+    "105": "virtual",
+    "106": "assistants",
+    "107": "help",
+    "108": "people",
+    "109": "daily",
+    "110": "activities",
+    "111": "using",
+    "112": "inspired",
+    "113": "brain",
+    "114": "structure",
+    "115": "training",
+    "116": "essential",
+    "117": "models",
+    "118": "supervised",
+    "119": "labeled",
+    "120": "unsupervised",
+    "121": "finds",
+    "122": "unlabeled",
+    "123": "automatically",
+    "124": "reinforcement",
+    "125": "trains",
+    "126": "agents",
+    "127": "rewards",
+    "128": "penalties",
+    "129": "transfer",
+    "130": "reuses",
+    "131": "knowledge",
+    "132": "one",
+    "133": "task",
+    "134": "another",
+    "135": "step",
+    "136": "efficiently",
+    "137": "languages",
+    "138": "like",
+    "139": "python",
+    "140": "popular",
+    "141": "development",
+    "142": "mathematical",
+    "143": "optimization",
+    "144": "improves",
+    "145": "model",
+    "146": "performance",
+    "147": "over",
+    "148": "time",
+    "149": "statistical",
+    "150": "analysis",
+    "151": "distributions",
+    "152": "probability",
+    "153": "theory",
+    "154": "fundamental",
+    "155": "linear",
+    "156": "algebra",
+    "157": "operations",
+    "158": "core",
+    "159": "network",
+    "160": "computations",
+    "161": "gradient",
+    "162": "descent",
+    "163": "optimizes",
+    "164": "weights",
+    "165": "during",
+    "166": "backpropagation",
+    "167": "calculates",
+    "168": "gradients",
+    "169": "activation",
+    "170": "functions",
+    "171": "introduce",
+    "172": "nonlinearity",
+    "173": "into",
+    "174": "convolutional",
+    "175": "excel",
+    "176": "at",
+    "177": "image",
+    "178": "recurrent",
+    "179": "sequential",
+    "180": "speech",
+    "181": "transformer",
+    "182": "attention",
+    "183": "mechanisms",
+    "184": "better",
+    "185": "can",
+    "186": "generate",
+    "187": "responses",
+    "188": "generative",
+    "189": "create",
+    "190": "new",
+    "191": "content",
+    "192": "similar",
+    "193": "ethics",
+    "194": "ensures",
+    "195": "responsible",
+    "196": "deployment",
+    "197": "bias",
+    "198": "lead",
+    "199": "unfair",
+    "200": "outcomes",
+    "201": "discrimination",
+    "202": "privacy",
+    "203": "concerns",
+    "204": "arise",
+    "205": "collecting",
+    "206": "personal",
+    "207": "transparency",
+    "208": "builds",
+    "209": "trust",
+    "210": "users",
+    "211": "future",
+    "212": "will",
+    "213": "integrate",
+    "214": "innovation",
+    "215": "drives",
+    "216": "progress",
+    "217": "research",
+    "218": "scientists",
+    "219": "engineers",
+    "220": "collaborate",
+    "221": "on",
+    "222": "breakthrough",
+    "223": "solutions",
+    "224": "industry",
+    "225": "adoption",
+    "226": "continues",
+    "227": "accelerate",
+    "228": "rapidly"
   },
+  "vocab_size": 2000,
   "special_tokens": [
     "<PAD>",
     "<UNK>",