VedaLM: 85.87M-Parameter Instruction-Following LLM

A decoder-only transformer trained from scratch, inspired by the LLaMA-2 architecture.

Architecture

  • Parameters: 85.87M
  • Layers: 14 transformer blocks
  • Attention: Grouped Query Attention (10Q / 5KV heads)
  • FFN: SwiGLU activation
  • Positional Encoding: RoPE (Rotary Position Embedding)
  • Normalization: RMSNorm
  • Context Length: 1024 tokens
  • Vocabulary: 32,000 BPE tokens
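
As a sanity check on the figures above, the sketch below counts parameters for a LLaMA-style decoder. The list does not state the hidden size, head dimension, or FFN width, so the values used here (hidden size 640, head dimension 64, FFN hidden size 1792, tied input/output embeddings, no bias terms) are assumptions, chosen only because they reproduce the stated total. The actual configuration may differ.

```python
def count_params(vocab, d_model, n_layers, n_q_heads, n_kv_heads,
                 ffn_hidden, tie_embeddings=True):
    """Count weights in a LLaMA-style decoder (no biases, RMSNorm gains only)."""
    head_dim = d_model // n_q_heads
    kv_dim = n_kv_heads * head_dim            # GQA: K and V use fewer heads than Q

    attn = 2 * d_model * d_model              # W_q and W_o
    attn += 2 * d_model * kv_dim              # W_k and W_v
    ffn = 3 * d_model * ffn_hidden            # SwiGLU: gate, up, and down projections
    norms = 2 * d_model                       # two RMSNorm gain vectors per block
    per_layer = attn + ffn + norms

    embed = vocab * d_model                   # token embedding table
    head = 0 if tie_embeddings else vocab * d_model
    final_norm = d_model
    return embed + n_layers * per_layer + final_norm + head


# From the model card: vocab=32000, 14 layers, 10 query heads, 5 KV heads.
# ASSUMED (not stated in the card): d_model=640, ffn_hidden=1792, tied embeddings.
total = count_params(vocab=32_000, d_model=640, n_layers=14,
                     n_q_heads=10, n_kv_heads=5, ffn_hidden=1792)
print(f"{total:,} parameters ({total / 1e6:.2f}M)")
# prints "85,870,720 parameters (85.87M)"
```

Under these assumptions, sharing 5 KV heads across 10 query heads halves the size of the K and V projections (and of the KV cache at inference time) relative to standard multi-head attention.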

Training

  • Pretraining: FineWeb-Edu (120M tokens, 4000 steps)
  • Fine-tuning: OpenHermes-2.5 (80K instruction pairs, 3 epochs)
  • Hardware: 2× NVIDIA Tesla T4
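
The pretraining figures imply a rough per-step token budget, worked out below. This is back-of-the-envelope arithmetic only: it assumes every optimizer step consumes the same number of tokens and that sequences are packed to the full 1024-token context, neither of which is stated above. The 120M token count is also likely rounded.

```python
PRETRAIN_TOKENS = 120_000_000   # from the model card
PRETRAIN_STEPS = 4_000          # from the model card
CONTEXT_LENGTH = 1_024          # from the model card
PARAMS = 85_870_000             # 85.87M, from the model card

# ASSUMPTION: uniform steps and fully packed sequences.
tokens_per_step = PRETRAIN_TOKENS / PRETRAIN_STEPS
sequences_per_step = tokens_per_step / CONTEXT_LENGTH
tokens_per_param = PRETRAIN_TOKENS / PARAMS

print(f"tokens per step:      {tokens_per_step:,.0f}")
print(f"sequences per step:   {sequences_per_step:.1f}")
print(f"tokens per parameter: {tokens_per_param:.2f}")
```

This works out to about 30,000 tokens (roughly 29 full-length sequences) per step, and about 1.4 pretraining tokens per parameter.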

Usage

# See the API Space for usage examples
# huggingface.co/spaces/aryan012234/vedalm-api