---
license: apache-2.0
tags:
- custom-architecture
- from-scratch
- language-model
- non-transformer
- tensorflow
---

# TERA V2

A language model built entirely from scratch. No pretrained weights. No standard transformers.

## Architecture

TERA V2 uses a custom non-transformer architecture with the following components:

- **Time Mix** for sequence mixing
- **Token Shift** for position encoding
- **GroupNorm** for normalization
- **Channel Mix** with **Squared ReLU** for the feed-forward path
- **Stochastic Depth** for regularization
- **Untied Embeddings** (separate input and output embedding matrices)
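
The card does not include the layer implementations, so here is a minimal NumPy sketch of two of the components named above, Token Shift and Channel Mix with Squared ReLU. The shapes, the 50/50 shift ratio, and the `d_ff = 512` width are illustrative assumptions, not TERA V2's actual values.

```python
import numpy as np

def token_shift(x):
    """Token Shift: blend each position with the previous position's features,
    a cheap way to inject relative-position information.
    The 50/50 blend here is illustrative; in practice the ratio is learned."""
    shifted = np.roll(x, 1, axis=0)   # x: (seq, d_model); roll returns a copy
    shifted[0] = 0.0                  # no "previous token" at position 0
    return 0.5 * x + 0.5 * shifted

def channel_mix(x, w_in, w_out):
    """Channel Mix: a per-position feed-forward with Squared ReLU, i.e. relu(h)**2."""
    h = x @ w_in
    h = np.maximum(h, 0.0) ** 2
    return h @ w_out

rng = np.random.default_rng(0)
seq, d_model, d_ff = 8, 128, 512      # d_ff is an assumed 4 * d_model
x = rng.standard_normal((seq, d_model))
w_in = rng.standard_normal((d_model, d_ff)) * 0.02
w_out = rng.standard_normal((d_ff, d_model)) * 0.02

y = channel_mix(token_shift(x), w_in, w_out)
print(y.shape)  # (8, 128)
```

In the full model, GroupNorm would wrap these sublayers and Stochastic Depth would randomly skip whole residual branches during training.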

## Model Specifications

| Specification | Value |
|---------------|-------|
| Parameters | ~726K |
| Vocabulary Size | 510 |
| Context Length | 32 tokens |
| Hidden Size (d_model) | 128 |
| Attention Heads | 4 |
| Layers | 3 |
| Framework | TensorFlow / Keras |
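
The specifications above would correspond to a model_config.json roughly like the following. The exact key names in the shipped file are not documented in this card, so these are illustrative:

```json
{
  "vocab_size": 510,
  "d_model": 128,
  "n_heads": 4,
  "n_layers": 3,
  "context_length": 32
}
```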

## Training Details

- Trained from scratch on clean question-answer pairs
- No pretrained weights were used at any stage
- Custom BPE-lite tokenizer trained on the same data
- Loss function: sigmoid cross-entropy
- Optimizer: Adam with a cosine learning rate schedule
- Training format: Q: question / A: answer
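
A cosine learning rate schedule decays the rate from a peak value to a floor along a half cosine wave. The card does not state TERA V2's actual peak or minimum learning rate, so the values below are illustrative:

```python
import math

def cosine_lr(step, total_steps, peak_lr=3e-4, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps.
    peak_lr and min_lr are illustrative defaults, not TERA V2's actual values."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # start of training: full peak_lr (0.0003)
print(cosine_lr(500, 1000))   # halfway: half of peak_lr (0.00015)
print(cosine_lr(1000, 1000))  # end of training: decayed to ~0.0
```

This per-step rate would be fed to the Adam optimizer, e.g. via a Keras `LearningRateSchedule`.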

## How To Use

1. Download all files from this repository
2. Install TensorFlow
3. Load the tokenizer from tokenizer.json
4. Build the model using model_config.json
5. Load weights from model.weights.h5
6. Format input as: Q: your question here / A:
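
The steps above can be sketched as follows. The class and function names (`Tokenizer`, `build_model`) are assumptions inferred from the file list in this card, not a documented API; the file paths come from the Files Included section.

```python
import json

def format_prompt(question: str) -> str:
    # Prompt format from this card: "Q: your question here / A:"
    # (the "/" separator is reproduced exactly as the card writes it)
    return f"Q: {question} / A:"

# Loading steps (commented out because the module APIs are assumed):
# from tokenizer import Tokenizer          # tokenizer.py  (assumed class name)
# from model import build_model            # model.py      (assumed function name)
# tokenizer = Tokenizer("tokenizer.json")
# with open("model_config.json") as f:
#     config = json.load(f)
# model = build_model(config)
# model.load_weights("model.weights.h5")   # standard Keras weights-loading call

print(format_prompt("What is the sun?"))  # Q: What is the sun? / A:
```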

## Example Input and Output

Input: Q: What is the sun?
Output: The sun is a star at the center of our solar system.

Input: Q: Hello
Output: Hello! How can I help you today?

## Files Included

| File | Description |
|------|-------------|
| model.py | Model architecture code |
| tokenizer.py | Tokenizer class code |
| model_config.json | Model hyperparameters |
| tokenizer.json | Trained tokenizer vocabulary |
| model.weights.h5 | Trained model weights |
| training_data.py | Training data used |
| loss_history.json | Training loss over epochs |
| training_state.json | Final training stats |

## Live Demo

Try TERA V2 live at: https://huggingface.co/spaces/vedaco/tera.v2

## Created By

**Vedaco Team**

## License

Apache 2.0
|