---
license: mit
base_model: meta-llama/Llama-3.3-70B-Instruct
tags:
- tiny-model
- random-weights
- testing
- llama
---

# Llama-3.3-Tiny-Instruct

This is a tiny, randomly initialized version of the meta-llama/Llama-3.3-70B-Instruct model, created for testing and experimentation purposes.

## Model Details

- **Base model**: meta-llama/Llama-3.3-70B-Instruct
- **Seed**: 42
- **Hidden size**: 256
- **Number of layers**: 12
- **Number of attention heads**: 4
- **Vocabulary size**: 128256
- **Max position embeddings**: 131072
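
For reference, a model with this shape can be instantiated from scratch with a stock `LlamaConfig`. This is a minimal sketch, not the exact creation script; settings the card does not list (e.g. `intermediate_size` or the number of key-value heads) are left at library defaults and are assumptions:

```python
import torch
from transformers import LlamaConfig, LlamaForCausalLM

torch.manual_seed(42)  # the seed listed above

# Dimensions taken from the Model Details list; unlisted settings are defaults.
config = LlamaConfig(
    hidden_size=256,
    num_hidden_layers=12,
    num_attention_heads=4,
    vocab_size=128256,
    max_position_embeddings=131072,
)
model = LlamaForCausalLM(config)  # randomly initialized, never trained
```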

## Parameters

- **Total parameters**: ~42,277,376
- **Trainable parameters**: ~42,277,376
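
Both figures can be checked after loading the model; the total and trainable counts match because no weights are frozen:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("AlignmentResearch/Llama-3.3-Tiny-Instruct")
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"total={total:,}  trainable={trainable:,}")  # both ~42.3M
```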

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer (a causal LM class, so that generate() is available)
model = AutoModelForCausalLM.from_pretrained("AlignmentResearch/Llama-3.3-Tiny-Instruct")
tokenizer = AutoTokenizer.from_pretrained("AlignmentResearch/Llama-3.3-Tiny-Instruct")

# Generate text (note: this model has random weights!)
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```

## Important Notes

⚠️ **This model has random weights and is not trained!** It's designed for:

- Testing model loading and inference pipelines
- Benchmarking model architecture
- Educational purposes
- Rapid prototyping where actual model performance isn't needed

The model will generate random/nonsensical text since it hasn't been trained on any data.
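
As a concrete example of the pipeline-testing use case, a smoke test can load the model and check output shapes without caring about output quality. This is an illustrative sketch (the test name and prompt are made up for the example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def test_tiny_llama_forward_shapes():
    repo = "AlignmentResearch/Llama-3.3-Tiny-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(repo)
    inputs = tokenizer("smoke test", return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # One logit vector over the full vocabulary per input position
    assert logits.shape == (1, inputs["input_ids"].shape[1], model.config.vocab_size)
```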

## Creation

This model was created using the `upload_tiny_llama.py` script from the minimal-grpo-trainer repository.