# LLaMA-355M Base Model

A 355M-parameter LLaMA-style language model trained from scratch.
## Architecture
- Type: LLaMA-style Transformer (see the `LlamaConfig` sketch after this list)
- Parameters: 355M
- Layers: 24
- Heads: 16
- Hidden dim: 1024
- Context: 512 tokens
- Vocab: 50257 (tiktoken GPT-2 BPE)
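For reference, the hyperparameters above map onto a Hugging Face `LlamaConfig` roughly as sketched below. This is an illustration, not the repo's actual config; in particular, `intermediate_size` is an assumption, since the card does not state the MLP width.

```python
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=50257,             # tiktoken GPT-2 BPE
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,       # head dim = 1024 / 16 = 64
    max_position_embeddings=512,  # context length
    intermediate_size=2816,       # assumption: MLP width is not stated in the card
    attention_bias=False,         # no bias terms
)
```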
## Features
- RMSNorm instead of LayerNorm (see the PyTorch sketch after this list)
- Rotary Position Embeddings (RoPE)
- SwiGLU activation
- Flash Attention
- No bias terms
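For readers unfamiliar with these components, here is a minimal PyTorch sketch of the standard RMSNorm and SwiGLU formulations used in LLaMA-style models. This follows the common published definitions and is not necessarily this checkpoint's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by 1/RMS(x), with no mean-centering and no bias."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class SwiGLU(nn.Module):
    """Gated MLP used in LLaMA blocks: silu(w1(x)) * w3(x), projected back down by w2."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden, bias=False)  # up projection
        self.w2 = nn.Linear(hidden, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```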
## Training
- Pre-trained on a mix of OpenWebText, AutoMathText, WikiText, and HackerNews
- Fine-tuned on claude-reasoning
- Trained on an RTX 3080 Ti
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YOUR_USER/llama-355m")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 BPE, matching the model's 50257-token vocab
```
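A minimal generation sketch follows; the prompt and decoding settings are illustrative assumptions, not values from the model card.

```python
prompt = "The meaning of life is"
inputs = tokenizer(prompt, return_tensors="pt")
# The context window is 512 tokens, so keep prompt + new tokens within that budget.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```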