---
license: apache-2.0
tags:
- custom-architecture
- from-scratch
- language-model
- non-transformer
- tensorflow
---

# TERA V2

A language model built entirely from scratch. No pretrained weights. No standard transformers.

## Architecture

TERA V2 uses a custom non-transformer architecture with the following components:

- **Time Mix** for sequence mixing
- **Token Shift** for position encoding
- **GroupNorm** for normalization
- **Channel Mix** with **Squared ReLU** for the feed-forward path
- **Stochastic Depth** for regularization
- **Untied Embeddings** (separate input and output embedding matrices)
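
The card does not include the layer implementations, so here is a minimal NumPy sketch of two of the components named above, Token Shift and Channel Mix with Squared ReLU. The shapes, the 50/50 shift ratio, and the `d_ff = 512` width are illustrative assumptions, not TERA V2's actual values.

```python
import numpy as np

def token_shift(x):
    """Token Shift: blend each position with the previous position's features,
    a cheap way to inject relative-position information.
    The 50/50 blend here is illustrative; in practice the ratio is learned."""
    shifted = np.roll(x, 1, axis=0)   # x: (seq, d_model); roll returns a copy
    shifted[0] = 0.0                  # no "previous token" at position 0
    return 0.5 * x + 0.5 * shifted

def channel_mix(x, w_in, w_out):
    """Channel Mix: a per-position feed-forward with Squared ReLU, i.e. relu(h)**2."""
    h = x @ w_in
    h = np.maximum(h, 0.0) ** 2
    return h @ w_out

rng = np.random.default_rng(0)
seq, d_model, d_ff = 8, 128, 512      # d_ff is an assumed 4 * d_model
x = rng.standard_normal((seq, d_model))
w_in = rng.standard_normal((d_model, d_ff)) * 0.02
w_out = rng.standard_normal((d_ff, d_model)) * 0.02

y = channel_mix(token_shift(x), w_in, w_out)
print(y.shape)  # (8, 128)
```

In the full model, GroupNorm would wrap these sublayers and Stochastic Depth would randomly skip whole residual branches during training.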

## Model Specifications

| Specification | Value |
|---------------|-------|
| Parameters | ~726K |
| Vocabulary Size | 510 |
| Context Length | 32 tokens |
| Hidden Size (d_model) | 128 |
| Attention Heads | 4 |
| Layers | 3 |
| Framework | TensorFlow / Keras |
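
The specifications above would correspond to a model_config.json roughly like the following. The exact key names in the shipped file are not documented in this card, so these are illustrative:

```json
{
  "vocab_size": 510,
  "d_model": 128,
  "n_heads": 4,
  "n_layers": 3,
  "context_length": 32
}
```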

## Training Details

- Trained from scratch on clean question-answer pairs
- No pretrained weights were used at any stage
- Custom BPE-lite tokenizer trained on the same data
- Loss function: sigmoid cross-entropy
- Optimizer: Adam with a cosine learning rate schedule
- Training format: Q: question / A: answer
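
A cosine learning rate schedule decays the rate from a peak value to a floor along a half cosine wave. The card does not state TERA V2's actual peak or minimum learning rate, so the values below are illustrative:

```python
import math

def cosine_lr(step, total_steps, peak_lr=3e-4, min_lr=0.0):
    """Cosine decay from peak_lr to min_lr over total_steps.
    peak_lr and min_lr are illustrative defaults, not TERA V2's actual values."""
    progress = min(step / total_steps, 1.0)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

print(cosine_lr(0, 1000))     # start of training: full peak_lr (0.0003)
print(cosine_lr(500, 1000))   # halfway: half of peak_lr (0.00015)
print(cosine_lr(1000, 1000))  # end of training: decayed to ~0.0
```

This per-step rate would be fed to the Adam optimizer, e.g. via a Keras `LearningRateSchedule`.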

## How To Use

1. Download all files from this repository
2. Install TensorFlow
3. Load the tokenizer from tokenizer.json
4. Build the model using model_config.json
5. Load weights from model.weights.h5
6. Format input as: Q: your question here / A:
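
The steps above can be sketched as follows. The class and function names (`Tokenizer`, `build_model`) are assumptions inferred from the file list in this card, not a documented API; the file paths come from the Files Included section.

```python
import json

def format_prompt(question: str) -> str:
    # Prompt format from this card: "Q: your question here / A:"
    # (the "/" separator is reproduced exactly as the card writes it)
    return f"Q: {question} / A:"

# Loading steps (commented out because the module APIs are assumed):
# from tokenizer import Tokenizer          # tokenizer.py  (assumed class name)
# from model import build_model            # model.py      (assumed function name)
# tokenizer = Tokenizer("tokenizer.json")
# with open("model_config.json") as f:
#     config = json.load(f)
# model = build_model(config)
# model.load_weights("model.weights.h5")   # standard Keras weights-loading call

print(format_prompt("What is the sun?"))  # Q: What is the sun? / A:
```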

## Example Input and Output

Input: Q: What is the sun?
Output: The sun is a star at the center of our solar system.

Input: Q: Hello
Output: Hello! How can I help you today?

## Files Included

| File | Description |
|------|-------------|
| model.py | Model architecture code |
| tokenizer.py | Tokenizer class code |
| model_config.json | Model hyperparameters |
| tokenizer.json | Trained tokenizer vocabulary |
| model.weights.h5 | Trained model weights |
| training_data.py | Training data used |
| loss_history.json | Training loss over epochs |
| training_state.json | Final training stats |

## Live Demo

Try TERA V2 live at: https://huggingface.co/spaces/vedaco/tera.v2

## Created By

**Vedaco Team**

## License

Apache 2.0
|