TERA V2

A language model built entirely from scratch. No pretrained weights. No standard transformers.

Architecture

TERA V2 uses a custom non-transformer architecture with the following components:

  • Time Mix for sequence mixing
  • Token Shift for position encoding
  • GroupNorm for normalization
  • Channel Mix with Squared ReLU for feed-forward
  • Stochastic Depth for regularization
  • Untied Embeddings
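Two of the components above can be illustrated compactly. The sketch below shows Token Shift (blending each token with its predecessor as a lightweight positional signal) and a Channel Mix feed-forward using Squared ReLU. It is a minimal NumPy illustration of the ideas, not the repository's TensorFlow implementation; the mixing coefficient `mu` and the function names are assumptions.

```python
import numpy as np

def token_shift(x, mu=0.5):
    """Blend each token with the previous token.
    x: array of shape (seq_len, d_model).
    mu is an assumed mixing coefficient; the real model may learn it."""
    prev = np.roll(x, 1, axis=0)
    prev[0] = 0.0  # position 0 has no previous token
    return mu * x + (1.0 - mu) * prev

def squared_relu(x):
    """Squared ReLU activation: max(x, 0)^2."""
    return np.maximum(x, 0.0) ** 2

def channel_mix(x, w_in, w_out):
    """Position-wise feed-forward (Channel Mix) with Squared ReLU."""
    return squared_relu(x @ w_in) @ w_out
```

The actual model applies these per layer alongside Time Mix and GroupNorm; this sketch only captures the token-shift and activation math.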

Model Specifications

Specification          Value
Parameters             ~726K
Vocabulary Size        510
Context Length         32 tokens
Hidden Size (d_model)  128
Attention Heads        4
Layers                 3
Framework              TensorFlow / Keras
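For convenience, the same specifications can be kept as a Python dict. The key names below are illustrative and are not necessarily the keys used in model_config.json.

```python
# Model specifications from the table above (key names are assumptions).
TERA_V2_SPEC = {
    "parameters_approx": 726_000,
    "vocab_size": 510,
    "context_length": 32,
    "d_model": 128,
    "num_heads": 4,
    "num_layers": 3,
    "framework": "TensorFlow / Keras",
}
```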

Training Details

  • Trained from scratch on clean question-answer pairs
  • No pretrained weights were used at any stage
  • Custom BPE-lite tokenizer trained on the same data
  • Loss function: Sigmoid cross-entropy
  • Optimizer: Adam with cosine learning rate schedule
  • Training format: Q: question / A: answer
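The training format above can be rendered with a small helper. This is a sketch of one plausible reading of the format; the actual separator between the Q: and A: segments (assumed here to be a newline) is not specified in this card.

```python
def format_example(question, answer):
    """Render a question-answer pair in the Q:/A: training format.
    Assumes a newline separates the two segments."""
    return f"Q: {question}\nA: {answer}"
```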

How To Use

  1. Download all files from this repository
  2. Install TensorFlow
  3. Load the tokenizer from tokenizer.json
  4. Build the model using model_config.json
  5. Load weights from model.weights.h5
  6. Format input as: Q: your question here / A:
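The steps above can be sketched as follows. The class and function names imported from tokenizer.py and model.py (`Tokenizer`, `from_json`, `build_model`, `encode`) are assumptions about this repository's code, as is the JSON layout of the config; only the file names and the Q:/A: prompt format come from this card.

```python
import json

def format_prompt(question):
    """Step 6: format the input for the model (newline separator assumed)."""
    return f"Q: {question}\nA:"

def load_tera_v2(question):
    # Assumed entry points in this repo's tokenizer.py and model.py.
    from tokenizer import Tokenizer
    from model import build_model

    tok = Tokenizer.from_json("tokenizer.json")   # step 3 (assumed method)
    with open("model_config.json") as f:
        config = json.load(f)                     # step 4
    model = build_model(config)                   # step 4 (assumed function)
    model.load_weights("model.weights.h5")        # step 5
    ids = tok.encode(format_prompt(question))     # step 6 (assumed method)
    return model, ids
```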

Example Input and Output

Input: Q: What is the sun?

Output: The sun is a star at the center of our solar system.

Input: Q: Hello

Output: Hello! How can I help you today?

Files Included

File                 Description
model.py             Model architecture code
tokenizer.py         Tokenizer class code
model_config.json    Model hyperparameters
tokenizer.json       Trained tokenizer vocabulary
model.weights.h5     Trained model weights
training_data.py     Training data used
loss_history.json    Training loss over epochs
training_state.json  Final training stats

Live Demo

Try TERA V2 live at: https://huggingface.co/spaces/vedaco/tera.v2

Created By

Vedaco Team

License

Apache 2.0
