LLaMA-355M Base Model

A 355M parameter LLaMA-style language model trained from scratch.

Architecture

  • Type: LLaMA-style Transformer
  • Parameters: 355M
  • Layers: 24
  • Heads: 16
  • Hidden dim: 1024
  • Context: 512 tokens
  • Vocab: 50257 (tiktoken GPT-2 BPE)
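
If the checkpoint follows the stock Hugging Face LLaMA layout, the hyperparameters above map onto a LlamaConfig roughly as sketched below. This is an illustration, not the model's actual config; in particular, intermediate_size (the SwiGLU width) is not stated on this card and is assumed here.

from transformers import LlamaConfig

# Hypothetical config mirroring the table above.
config = LlamaConfig(
    vocab_size=50257,            # tiktoken GPT-2 BPE
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=512,
    intermediate_size=2816,      # assumed ~8/3 * hidden, rounded; not from the card
)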

Features

  • RMSNorm (instead of LayerNorm)
  • Rotary Position Embeddings (RoPE)
  • SwiGLU activation
  • Flash Attention
  • No bias terms
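
For reference, a minimal PyTorch sketch of two of these components follows. It is not this model's source code, just a standard formulation of RMSNorm and a bias-free SwiGLU feed-forward block.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    # RMSNorm: rescale by the root-mean-square of the activations.
    # Unlike LayerNorm, there is no mean subtraction and no bias.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)

class SwiGLU(nn.Module):
    # SwiGLU feed-forward: a SiLU-gated linear unit with all
    # projections bias-free, matching the "no bias terms" choice above.
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))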

Training

  • Pre-trained on a mix of OpenWebText, AutoMathText, WikiText, and HackerNews
  • Fine-tuned on claude-reasoning
  • Trained on an RTX 3080 Ti

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("YOUR_USER/llama-355m")
# The model uses the GPT-2 BPE vocabulary (50257 tokens), so the
# stock GPT-2 tokenizer applies.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
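
A minimal generation example using the objects above; the prompt and sampling settings are illustrative, not recommendations.

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))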