
Nepal Supreme Court Judgment Model, Trained From Scratch Using the Gemma 3 270M Architecture

This model was pretrained from scratch on a dataset of Nepal Supreme Court judgments.

Model Details

  • Model: built from scratch
  • Tokenizer: google/gemma-3-270m-it
  • Architecture: Gemma 3 270M (270M parameters)
  • Training Data: Nepal Supreme Court judgments (~1,400 documents, 70k+ rows)
  • Context Length: 2048 tokens
  • Vocabulary Size: 256,000 tokens

Training Details

  • Framework: PyTorch (from scratch implementation)
  • Optimizer: AdamW (lr=1e-4, weight_decay=0.1)
  • Scheduler: Linear warmup + Cosine decay
  • Precision: bfloat16/float16 mixed precision
  • Hardware: Tesla T4 GPU (Google Colab)
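The warmup-plus-cosine schedule listed above can be sketched as a plain function of the step count. This is an illustrative sketch, not the training code; `warmup_steps` and `total_steps` are hypothetical values chosen for the example:

```python
import math

def lr_at_step(step, max_lr=1e-4, warmup_steps=1000, total_steps=10_000, min_lr=0.0):
    """Linear warmup to max_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps          # linear ramp from 0
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))

print(lr_at_step(500))     # halfway through warmup: 5e-05
print(lr_at_step(1000))    # peak: 1e-04
print(lr_at_step(10_000))  # fully decayed: 0.0
```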

Model Architecture

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

MODEL_CONFIG = {
    "vocab_size": tokenizer.vocab_size,  # match the tokenizer vocabulary (256,000)
    "context_length": 2048, # Reduced context length for T4 GPU memory constraints
    "emb_dim": 640,
    "n_heads": 4,
    "n_layers": 18,
    "hidden_dim": 2048,
    "head_dim": 256,
    "qk_norm": True,
    "n_kv_groups": 1,
    "rope_local_base": 10_000.0,
    "rope_base": 1_000_000.0,
    "sliding_window": 512,
    "layer_types": [
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention"
    ],
    "dtype": torch.bfloat16,
    "query_pre_attn_scalar": 256,
}
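As a sanity check, the parameter count implied by this configuration can be estimated in plain Python. This is a rough estimate assuming the Gemma 3 layout (grouped-KV attention, gated MLP, output head tied to the embeddings) and ignoring norm layers:

```python
# Values taken from MODEL_CONFIG above.
vocab_size, emb_dim = 256_000, 640
n_layers, n_heads, head_dim = 18, 4, 256
n_kv_groups, hidden_dim = 1, 2048

# Q and O projections use n_heads * head_dim; K and V use n_kv_groups * head_dim.
attn = 2 * emb_dim * n_heads * head_dim + 2 * emb_dim * n_kv_groups * head_dim
# Gated MLP: gate, up, and down projections.
mlp = 3 * emb_dim * hidden_dim

total = vocab_size * emb_dim + n_layers * (attn + mlp)
print(f"~{total / 1e6:.0f}M parameters")  # roughly consistent with the 270M label
```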

Usage

from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load the model (the architecture is implemented from scratch;
# see the original implementation for the model code)

# Generate text
prompt = "सर्वोच्च अदालतको निर्णय अनुसार"
inputs = tokenizer(prompt, return_tensors="pt")
# ... generation code ...
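The generation step is elided above. As an illustration of the decoding loop, here is a minimal greedy decoder in plain Python; `next_logits` is a hypothetical stand-in for a forward pass of the model, and the EOS id is a placeholder:

```python
def greedy_generate(next_logits, prompt_ids, max_new_tokens, eos_id=1):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)  # scores over the vocabulary for the next token
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:      # stop at end-of-sequence
            break
    return ids

# Toy model: favour token 2 until the sequence reaches length 4, then emit EOS (id 1).
toy = lambda ids: [0.0, 1.0, 0.0] if len(ids) >= 4 else [0.0, 0.0, 1.0]
print(greedy_generate(toy, [5, 6], max_new_tokens=5))  # [5, 6, 2, 2, 1]
```

In practice one would use the real model's logits in place of `toy` and sample or pick argmax per step.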
