# Nepal Supreme Court Judgement Model Trained from Scratch Using the Gemma 3 270M Architecture

This model is pretrained from scratch on a dataset of Nepal Supreme Court judgments.
## Model Details

- Model: built from scratch
- Tokenizer: google/gemma-3-270m-it
- Architecture: Gemma 3 270M (270M parameters)
- Training Data: Nepal Supreme Court judgments (1,400+ documents, 70k+ rows)
- Context Length: 2048 tokens
- Vocabulary Size: 256,000 tokens
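The 270M parameter figure can be roughly checked against the architecture constants listed below. This is a back-of-envelope sketch, not the exact count: it ignores normalization parameters and assumes tied input/output embeddings and grouped-query attention projections, as in Gemma.

```python
def approx_params(vocab=256_000, emb=640, n_layers=18, n_heads=4,
                  head_dim=256, n_kv=1, hidden=2048):
    """Rough parameter count: tied embeddings plus per-layer attention and gated MLP."""
    embedding = vocab * emb                  # input embedding (tied with output head)
    attn = emb * n_heads * head_dim          # query projection
    attn += 2 * emb * n_kv * head_dim        # key and value projections (grouped-query)
    attn += n_heads * head_dim * emb         # output projection
    mlp = 3 * emb * hidden                   # gate, up, and down projections
    return embedding + n_layers * (attn + mlp)

print(approx_params())  # ~264M, consistent with the advertised 270M
```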
## Training Details
- Framework: PyTorch (from scratch implementation)
- Optimizer: AdamW (lr=1e-4, weight_decay=0.1)
- Scheduler: Linear warmup + Cosine decay
- Precision: bfloat16/float16 mixed precision
- Hardware: Tesla T4 GPU (Google Colab)
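The linear-warmup plus cosine-decay schedule above can be written as a pure function of the step number. The warmup and total step counts here are illustrative assumptions, not values taken from the actual training run:

```python
import math

def lr_at_step(step, max_lr=1e-4, warmup_steps=1000, total_steps=20_000, min_lr=0.0):
    """Linear warmup from 0 to max_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In PyTorch this kind of schedule is typically wrapped with `torch.optim.lr_scheduler.LambdaLR` around the AdamW optimizer.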
## Model Architecture

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

MODEL_CONFIG = {
    "vocab_size": tokenizer.vocab_size,  # match the tokenizer's vocabulary (256,000)
    "context_length": 2048,              # reduced context length for T4 GPU memory constraints
    "emb_dim": 640,
    "n_heads": 4,
    "n_layers": 18,
    "hidden_dim": 2048,
    "head_dim": 256,
    "qk_norm": True,
    "n_kv_groups": 1,
    "rope_local_base": 10_000.0,
    "rope_base": 1_000_000.0,
    "sliding_window": 512,
    "layer_types": [
        "sliding_attention", "sliding_attention", "sliding_attention",
        "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention",
        "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention",
        "sliding_attention", "sliding_attention", "full_attention",
    ],
    "dtype": torch.bfloat16,
    "query_pre_attn_scalar": 256,
}
```
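The 18-entry `layer_types` list follows a regular pattern (five sliding-window layers followed by one full-attention layer, repeated three times), so it can be generated instead of written out by hand. A small sketch:

```python
def make_layer_types(n_layers=18, full_every=6):
    """Every full_every-th layer uses full attention; the rest use sliding-window attention."""
    return [
        "full_attention" if (i + 1) % full_every == 0 else "sliding_attention"
        for i in range(n_layers)
    ]
```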
## Usage

```python
import torch
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load model (you need to implement the architecture or use the provided code)
# See the original implementation for the model architecture

# Generate text
prompt = "सर्वोच्च अदालतको निर्णय अनुसार"  # "According to the Supreme Court's decision"
inputs = tokenizer(prompt, return_tensors="pt")

# ... generation code ...
```
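Since the model is implemented from scratch, generation also has to be written by hand. A minimal greedy-decoding loop is sketched below; it assumes a `model` object whose forward pass maps a `(batch, seq)` tensor of token IDs to `(batch, seq, vocab)` logits, which is how the architecture above would behave:

```python
import torch

@torch.no_grad()
def greedy_generate(model, input_ids, max_new_tokens=50, eos_id=None):
    """Append the argmax token from the last position until max_new_tokens or EOS."""
    for _ in range(max_new_tokens):
        logits = model(input_ids)                              # (batch, seq, vocab)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # (batch, 1)
        input_ids = torch.cat([input_ids, next_id], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return input_ids
```

For better sample quality, temperature or top-k sampling can be substituted for the `argmax` step.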