Upload folder using huggingface_hub
- README.md +96 -3
- pytorch_model.bin +3 -0
- special_tokens_map.json +6 -0
- tokenizer.json +0 -0
- tokenizer_config.json +44 -0
README.md
CHANGED
@@ -1,3 +1,96 @@
---
model-index:
- name: gemma-from-scratch
  results: []
---

# My Gemma-like Model from Scratch

This model is a custom implementation of a Gemma-like architecture, trained from scratch.

## Training Details
- **Architecture**: An 18-layer decoder-only transformer with Grouped-Query Attention (see the sketch after this list).
- **Data**: Trained on the Wikitext-2 dataset.
- **Training Script**: The training script is available on GitHub at [https://github.com/your_github_repo](https://github.com/your_github_repo).
- **Parameters**: 330.64 million trainable parameters in total.
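
For intuition, the sketch below shows the tensor-level idea behind Grouped-Query Attention with this model's head counts (8 query heads sharing 1 key/value head, per the config in the loading example further down). It is an illustration only, not the repository's `MultiHeadAttention` implementation, and all names in it are assumptions.

```python
import torch

# Illustrative GQA sketch (not the repo's MultiHeadAttention): with
# num_attention_heads=8 and num_key_value_heads=1, all query heads share
# one K/V head, shrinking the KV projections and the KV cache.
batch, seq_len, hidden_size = 2, 16, 1024
num_q_heads, num_kv_heads = 8, 1
head_dim = hidden_size // num_q_heads  # 128

q = torch.randn(batch, num_q_heads, seq_len, head_dim)
k = torch.randn(batch, num_kv_heads, seq_len, head_dim)
v = torch.randn(batch, num_kv_heads, seq_len, head_dim)

# Expand K/V so each group of query heads attends to the shared head.
group_size = num_q_heads // num_kv_heads
k = k.repeat_interleave(group_size, dim=1)
v = v.repeat_interleave(group_size, dim=1)

# Standard scaled dot-product attention (causal mask omitted for brevity).
scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = torch.softmax(scores, dim=-1) @ v  # (batch, 8, seq_len, head_dim)
```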

### Checkpointing
The training script includes a checkpointing mechanism. It automatically saves the model's progress every 50 steps and at the end of each epoch to a file named `checkpoint.pt`, and you can resume training by simply re-running the script. The final model is saved as `pytorch_model.bin`.
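
As a rough illustration of that save/resume pattern, here is a minimal sketch under assumed names; the real checkpoint contents and schedule are defined by the training script itself.

```python
import os
import torch

# Hypothetical sketch of the save/resume pattern described above; the
# actual training script defines its own checkpoint format and schedule.
def save_checkpoint(model, optimizer, epoch, step, path="checkpoint.pt"):
    torch.save({
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": optimizer.state_dict(),
        "epoch": epoch,
        "step": step,
    }, path)

def load_checkpoint(model, optimizer, path="checkpoint.pt"):
    # Resume transparently: if no checkpoint exists yet, start from scratch.
    if not os.path.exists(path):
        return 0, 0
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state_dict"])
    optimizer.load_state_dict(ckpt["optimizer_state_dict"])
    return ckpt["epoch"], ckpt["step"]
```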

### Early Stopping
To prevent overfitting, the training process includes early stopping based on validation loss: the script monitors the loss on a dedicated validation set and stops training if it does not improve for 2 consecutive epochs.
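
A minimal sketch of that patience-of-2 logic follows; the callables and names here are placeholders, not the script's actual code.

```python
# Hedged sketch of early stopping with a patience of 2 epochs, as described
# above; the training script's actual implementation may differ.
def train_with_early_stopping(train_one_epoch, evaluate, num_epochs=20, patience=2):
    """train_one_epoch and evaluate are caller-supplied callables;
    evaluate() must return the current validation loss."""
    best_val_loss = float("inf")
    epochs_without_improvement = 0
    for epoch in range(num_epochs):
        train_one_epoch()
        val_loss = evaluate()
        if val_loss < best_val_loss:
            best_val_loss = val_loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                print(f"Stopping early at epoch {epoch}: "
                      f"no improvement for {patience} epochs.")
                break
```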

## Loading and Chatting with the Model

Since this model uses a custom architecture, the model class definitions from the training script must be available in your code before the weights can be loaded.

Here's a step-by-step guide to get started:

1. **Install Required Libraries**:
   ```bash
   pip install torch huggingface-hub tokenizers
   ```

2. **Copy the Model Architecture**:
   Copy `GemmaForCausalLM` and all of its required sub-classes (`RMSNorm`, `RotaryPositionalEmbedding`, `MultiHeadAttention`, `MLP`, `TransformerBlock`) from the training script into your new Python file (an illustrative sketch of one such component appears after this list).

3. **Load the Model and Tokenizer**:
   ```python
   import torch
   from huggingface_hub import hf_hub_download
   from tokenizers import Tokenizer

   # Define your model's hyperparameters (must match the trained weights)
   config = {
       "vocab_size": 30000,
       "hidden_size": 1024,
       "num_attention_heads": 8,
       "num_key_value_heads": 1,
       "num_layers": 18,
       "intermediate_size": 4096,
       "max_position_embeddings": 32768,
       "attention_dropout": 0.0,
       "hidden_dropout": 0.0,
       "sliding_window": 512,
       "device": "cuda" if torch.cuda.is_available() else "cpu"
   }

   # Instantiate the custom model and load the weights
   model = GemmaForCausalLM(config)
   model_path = hf_hub_download(repo_id="your_username/gemma-from-scratch", filename="pytorch_model.bin")
   model.load_state_dict(torch.load(model_path, map_location=config["device"]))
   model.to(config["device"]).eval()

   # Load the tokenizer
   tokenizer_path = hf_hub_download(repo_id="your_username/gemma-from-scratch", filename="tokenizer.json")
   tokenizer = Tokenizer.from_file(tokenizer_path)
   ```

4. **Generate Text**:
   ```python
   def generate_text(model, tokenizer, prompt, max_length=50):
       input_ids = tokenizer.encode(prompt).ids
       input_tensor = torch.tensor(input_ids).unsqueeze(0).to(config["device"])

       with torch.no_grad():
           for _ in range(max_length):
               logits, _ = model(input_tensor)
               next_token_logits = logits[:, -1, :]
               # Greedy decoding: always pick the most likely next token
               next_token = torch.argmax(next_token_logits, dim=-1).unsqueeze(0)
               input_tensor = torch.cat([input_tensor, next_token], dim=-1)

               # Stop if we generate the end-of-sequence token
               if next_token.item() == tokenizer.token_to_id("</s>"):
                   break

       return tokenizer.decode(input_tensor[0].tolist(), skip_special_tokens=True)

   # Example usage
   prompt = "The early bird catches the worm, but the second mouse gets the "
   generated_text = generate_text(model, tokenizer, prompt)
   print("Generated Text:")
   print(generated_text)
   ```
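
As referenced in step 2, the class definitions must come from the training script itself. Purely for illustration, here is a minimal, standard RMSNorm sketch of the kind such a script typically contains; it is an assumption, not the repository's actual class, so copy the real definitions rather than this one.

```python
import torch
import torch.nn as nn

# Illustrative, standard RMSNorm (normalize by root-mean-square, then scale
# by a learned weight). This is a sketch, not the training script's actual
# RMSNorm; use the real class definitions from the script.
class RMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(hidden_size))

    def forward(self, x):
        # Root-mean-square normalization over the last (hidden) dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```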

> **Note**: This model is for demonstration purposes. Its custom architecture is not directly compatible with the Hugging Face `transformers` library out of the box, so to use the model you must also include the full model class definitions in your script.

pytorch_model.bin
ADDED

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b09cd820360ea70af27bdebbb673412f9389b5e32b21363fba79e7a7a0d7313e
size 1322686639

special_tokens_map.json
ADDED

@@ -0,0 +1,6 @@
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "pad_token": "<pad>",
  "unk_token": "<unk>"
}

tokenizer.json
ADDED

The diff for this file is too large to render.

tokenizer_config.json
ADDED

@@ -0,0 +1,44 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "<unk>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "<s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "</s>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "<pad>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<s>",
  "clean_up_tokenization_spaces": false,
  "eos_token": "</s>",
  "extra_special_tokens": {},
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<pad>",
  "tokenizer_class": "PreTrainedTokenizerFast",
  "unk_token": "<unk>"
}