ToyLlama
Llamas trained as an experiment on one RX 6600.
How to use sapbot/toyllama-13m with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="sapbot/toyllama-13m")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("sapbot/toyllama-13m")
model = AutoModelForCausalLM.from_pretrained("sapbot/toyllama-13m")
How to use sapbot/toyllama-13m with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sapbot/toyllama-13m"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "sapbot/toyllama-13m",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
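For scripts, the same completion request can be sent from Python instead of curl. This is a minimal sketch using only the standard library (json, urllib.request); the helper names build_completion_request and complete are hypothetical, and it assumes the server replies in the usual OpenAI completions shape, with the generated text in choices[0]["text"]. Passing http://localhost:30000 as the base URL should target the SGLang server from the next section in the same way.

```python
import json
import urllib.request


def build_completion_request(
    base_url: str,
    prompt: str,
    model: str = "sapbot/toyllama-13m",
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(base_url: str, prompt: str, **kwargs) -> str:
    """Send the request and return the first completion's text (needs a running server)."""
    req = build_completion_request(base_url, prompt, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style completion responses put generated text in choices[i]["text"].
    return body["choices"][0]["text"]
```

With the vLLM server running, complete("http://localhost:8000", "Once upon a time,") should return the same text the curl call prints inside its JSON response. Splitting request construction from sending keeps the payload easy to inspect without a server.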
How to use sapbot/toyllama-13m with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "sapbot/toyllama-13m" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "sapbot/toyllama-13m",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or start the SGLang server with Docker instead:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "sapbot/toyllama-13m" \
--host 0.0.0.0 \
--port 30000
How to use sapbot/toyllama-13m with Docker Model Runner:
docker model run hf.co/sapbot/toyllama-13m
ToyLlama 13M is, as far as I know, about the smallest size that can produce minimally realistic English. (At large max_tokens values, though, it gets stuck in repetitive loops.)
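One common mitigation for such loops (not something this card says test.py uses) is a repetition penalty: before sampling, the logits of tokens that have already been generated are pushed toward "less likely". Below is a minimal pure-Python sketch of the CTRL-style rule, assuming logits arrive as a plain list indexed by token id; apply_repetition_penalty is a hypothetical helper name.

```python
from typing import Iterable, List


def apply_repetition_penalty(
    logits: List[float], previous_ids: Iterable[int], penalty: float = 1.2
) -> List[float]:
    """Return a copy of `logits` with already-generated token ids made less likely.

    CTRL-style rule: positive logits are divided by `penalty` and negative ones
    multiplied by it, so with penalty > 1 both move toward "less likely".
    """
    out = list(logits)  # do not mutate the caller's list
    for tok in set(previous_ids):  # each seen token is penalized once
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out
```

In Transformers, generate() exposes the same idea through its repetition_penalty argument, so the sketch is mainly useful for seeing what that knob does to the scores.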
All samples below were generated with the interactive CLI in test.py (usage: python3 test.py).
>>> Enter prompt (temp: 0.7): Hello
------------------------------------------------------------
Hello! I'm sorry, I can't believe you, but I don't know if you're a
creative person, I'm just a little punk, but I'm sure I can do it. I'm
not really interested in the way I'm really in the computer, and I'm sure
you can't believe me, and I'm sure you're not in the computer system.
I've never met my parents, I've had a chance to have a computer
crime. I'm not sure why I'm a little bit younger than I'd ever been
talking to. I'm really just a little bit surprised, but I don't
think so. I don't
------------------------------------------------------------
>>> Enter prompt (temp: 0.7): The Art of Technology Digest is an open forum dedicated to sharing
------------------------------------------------------------
The Art of Technology Digest is an open forum dedicated to sharing
information among computerists and to the presentation and debate of
diverse views. CuD material may be reprinted for non-profit as long
as the source is cited. Authors hold a presumptive copyright, and
they should be contacted for reprint permission. It is assumed that
non-personal mail to the moderators may be reprinted unless otherwise
specified. Readers are encouraged to submit reasoned articles
relating to computer culture and communication. Articles are
preferred to short responses. Please avoid quoting previous posts
unless absolutely necessary.
DISCLAIMER: The views represented herein do not necessarily represent
------------------------------------------------------------
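The CLI prompt above shows "temp: 0.7". As a rough illustration of what a sampling temperature does (the card does not show test.py's code, so this is not its implementation): logits are divided by the temperature T before the softmax, so T below 1 sharpens the distribution toward the most likely token and T above 1 flattens it. softmax_with_temperature and sample_token are hypothetical names in this pure-Python sketch.

```python
import math
import random
from typing import List, Optional


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(
    logits: List[float], temperature: float = 0.7, rng: Optional[random.Random] = None
) -> int:
    """Draw one token id from the temperature-scaled distribution."""
    rng = rng or random.Random()
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperatures make output more predictable but also make it easier for a small model to lock into the repetitive loops mentioned above.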
Trained for 1 hour on 130M training tokens using one RX 6600 (8GB VRAM).
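A quick sanity check on these figures: 130M tokens in one hour is roughly 36k tokens per second. Using the common 6·N·D rule of thumb for training FLOPs (an approximation that ignores attention and other overheads, and assumes N is the 13M parameter count in the model name), the run comes to about 1e16 FLOPs, or roughly 2.8 TFLOP/s sustained. training_stats is a hypothetical helper.

```python
def training_stats(params: float, tokens: float, seconds: float) -> dict:
    """Back-of-the-envelope numbers for a training run.

    Uses the common ~6 * N * D approximation for total training FLOPs
    (forward + backward), which ignores attention and other overheads.
    """
    total_flops = 6 * params * tokens
    return {
        "tokens_per_second": tokens / seconds,
        "total_flops": total_flops,
        "achieved_tflops": total_flops / seconds / 1e12,
    }


stats = training_stats(params=13e6, tokens=130e6, seconds=3600)
for name, value in stats.items():
    print(f"{name}: {value:.3g}")
```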
The data came from Gutenberg (just 6 MB) and textfiles.com BBS forum text (455 MB), so it has NO knowledge of anything other than English.
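Put in proportion, Gutenberg is only about 1.3% of the 461 MB of training text; the remaining ~98.7% is textfiles.com material. The arithmetic, using only the sizes stated above:

```python
# Corpus sizes in MB, as stated in the model card.
corpus_mb = {"Gutenberg": 6, "textfiles.com (BBS forums)": 455}

total_mb = sum(corpus_mb.values())
shares = {name: size / total_mb for name, size in corpus_mb.items()}

print(f"total: {total_mb} MB")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```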