VedaLM: 85.87M-Parameter Instruction-Following LLM
A decoder-only transformer trained from scratch, inspired by the LLaMA-2 architecture.
Architecture
- Parameters: 85.87M
- Layers: 14 transformer blocks
- Attention: Grouped Query Attention (10 query heads / 5 key-value heads)
- FFN: SwiGLU activation
- Positional Encoding: RoPE (Rotary Position Embedding)
- Normalization: RMSNorm
- Context Length: 1024 tokens
- Vocabulary: 32,000 BPE tokens
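The hyperparameters listed above can be collected into a small config object, which also makes the grouped-query ratio explicit. This is only an illustrative sketch: the class name `VedaLMConfig` and its field names are hypothetical and are not taken from the VedaLM code base. Values not stated in this card, such as the hidden size, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VedaLMConfig:
    """Hypothetical container for the values stated in this model card."""

    n_layers: int = 14          # transformer blocks
    n_query_heads: int = 10     # query heads in grouped query attention
    n_kv_heads: int = 5         # shared key/value heads
    context_length: int = 1024  # tokens
    vocab_size: int = 32_000    # BPE tokens

    @property
    def queries_per_kv_head(self) -> int:
        # In grouped query attention, each KV head serves a fixed-size group of query heads.
        if self.n_query_heads % self.n_kv_heads != 0:
            raise ValueError("query heads must be a multiple of KV heads")
        return self.n_query_heads // self.n_kv_heads

    @property
    def kv_heads_fraction(self) -> float:
        # Number of KV heads relative to standard multi-head attention
        # with the same number of query heads.
        return self.n_kv_heads / self.n_query_heads


cfg = VedaLMConfig()
print(cfg.queries_per_kv_head)  # 2
print(cfg.kv_heads_fraction)    # 0.5
```

With 10 query heads sharing 5 key-value heads, each key-value head is shared by 2 query heads, so the model keeps half as many key and value heads as standard multi-head attention with the same number of query heads.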
Training
- Pretraining: FineWeb-Edu (120M tokens, 4000 steps)
- Fine-tuning: OpenHermes-2.5 (80K instruction pairs, 3 epochs)
- Hardware: 2× NVIDIA Tesla T4
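For a rough sense of scale, the training figures above can be turned into per-step numbers. The sketch below assumes that the stated counts are rounded, that "steps" means optimizer steps, and that every pretraining sequence fills the full 1024-token context. None of these assumptions is confirmed by this card, so the results are estimates only.

```python
# Figures copied from the model card; treat them as rounded values.
PRETRAIN_TOKENS = 120_000_000
PRETRAIN_STEPS = 4_000
CONTEXT_LENGTH = 1024
PARAMETERS = 85_870_000
SFT_PAIRS = 80_000
SFT_EPOCHS = 3

# Average tokens consumed per pretraining step.
tokens_per_step = PRETRAIN_TOKENS // PRETRAIN_STEPS

# Equivalent number of full-context sequences per step (assumption: no padding).
sequences_per_step = tokens_per_step / CONTEXT_LENGTH

# Pretraining tokens seen per model parameter.
tokens_per_parameter = PRETRAIN_TOKENS / PARAMETERS

# Total instruction pairs processed during fine-tuning, counting repeats across epochs.
sft_examples_seen = SFT_PAIRS * SFT_EPOCHS

print(tokens_per_step)                 # 30000
print(sequences_per_step)              # 29.296875
print(round(tokens_per_parameter, 2))  # 1.4
print(sft_examples_seen)               # 240000
```

Under these assumptions, pretraining processes about 30,000 tokens per step, or roughly 1.4 tokens per parameter overall.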
Usage
# See the API Space for usage examples
# huggingface.co/spaces/aryan012234/vedalm-api