ToyLlama 30M

The second version of the ToyLlama model. See more ToyLlamas in my profile.

Example generations

All samples below were generated with the interactive CLI in test.py (usage: python3 test.py).

>>> Enter prompt (temp: 0.7): New York is 
------------------------------------------------------------
New York is the
only place to live in.  The newer cities are still a
corporate, non-governmental and governmental people.  The
nationwide government is the largest industrial
community, and the largest private sector.  It is the
best place to live in.  In fact, in the early 1980s the
government is not a major industrial economy.  In the
United States, the largest industrial sector of the
world is the largest industrial and political sector.

The major political sector has not been built up by the
labour-owners of the world, and the economy has
lost its long-term growth.  It has its own largest
------------------------------------------------------------

>>> Enter prompt (temp: 0.7): The best thing
------------------------------------------------------------
The best thing about the GLOBALS was that there was a good chance to
get this information out of the world, but that's a very good idea.
So, there is a good thing about this stuff that I have already
received, but it's always a good idea.  It's just that there is a
few of people who want to know what is going on here.  There is a
reason to be going on here, and that is why I was reading this,
and that is why I didn't.  This is a very good idea.  I've heard
that there is a lot of people that have done it, and I don't know
the truth about it.  It
------------------------------------------------------------

>>> Enter prompt (temp: 0.7): If you would ever travel to New Your, USA, remember that
------------------------------------------------------------
If you would ever travel to New Your, USA, remember that the
street is probably the same, so it is very hard to find a little
splitoff of air, so you can use it.  You can be a lonely wedding
basket, or you can be a lonely wedding, or a lonely wedding.

Next week, the first Saturday of the month was in a frenzy.  The
town was very good for the rest of the week, and the time was
already over.  The time was just as good as the time.  The day was
six years old.  So it would be nice to be a little better to know
------------------------------------------------------------

>>> Enter prompt (temp: 0.7): Two plus two equals
------------------------------------------------------------
Two plus two equals (7.0%) of the total population of the world.

Malatesta, Libya, and Kuwaiti were also involved in the creation of
a group of people who were involved in the research and education
community. This group was called the "Scots" of the Papua
Nagasaki. The group was contacted by the International Institute of
Riverside, the group of people who were involved in the Papua
Solo, the U.S. Army. They were the main members of the group.

On March 5, 1969, the group became involved in the Papua
Association of S
------------------------------------------------------------
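The temp value shown at each prompt controls how sharply the sampler favors high-probability tokens. Below is a minimal sketch of temperature sampling over a model's output logits; it is illustrative only and is not the actual test.py implementation:

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by temperature.

    Temperature < 1.0 sharpens the distribution (more deterministic);
    temperature > 1.0 flattens it (more random).
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=0.7, rng=random):
    """Sample one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1
```

At temp 0.7, as used in the examples above, the highest-logit token gets a larger share of the probability mass than it would at temp 1.0.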

Training information

Trained for 2 hours on 147M training tokens on a single RX 6600 (8GB VRAM), using a training script written by Gemini 3.1 Pro Preview.

Parameter                    Value
Loss                         3.1682
Epochs                       1
grad_norm                    0.6302
Learning rate                5e-4
Batch size                   8
Gradient accumulation steps  4
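With batch size 8 and 4 gradient accumulation steps, each optimizer step sees an effective batch of 32 sequences; 147M tokens over 2 hours also implies a rough throughput. This is back-of-the-envelope arithmetic from the figures above, not from the training logs:

```python
batch_size = 8
grad_accum_steps = 4
effective_batch = batch_size * grad_accum_steps  # sequences per optimizer step
print(effective_batch)  # 32

total_tokens = 147_000_000
train_seconds = 2 * 3600
tokens_per_second = total_tokens / train_seconds
print(round(tokens_per_second))  # ~20417 tokens/s on the RX 6600
```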

Training data

Training data consisted of HF datasets (see "Datasets used to train sapbot/toyllama-30m" below) plus text scraped from several other sites:

  • cows.info.gf (4.6KB)
  • en.wikipedia.org (352.6KB)
  • github.com (454.7KB)
  • gutenberg.org (6MB)
  • habr.com (79.4KB)
  • textfiles.com (455MB)
  • www.reddit.com (3.8KB)

Advanced model information

Detailed architecture parameters for this exact model.

Parameter          Value
Hidden size        512
Intermediate size  1536
Hidden layers      8
Attention heads    8
Key value heads    4
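These hyperparameters are consistent with a ~29.4M-parameter Llama-style model using grouped-query attention (8 query heads sharing 4 KV heads). Below is a rough parameter count, assuming tied embeddings, a SwiGLU MLP, and a vocabulary of 8192; the vocab size is not stated on this card, so that figure is a guess chosen to match the reported 29.4M:

```python
hidden = 512
intermediate = 1536
layers = 8
heads = 8
kv_heads = 4
head_dim = hidden // heads      # 64
vocab = 8192                    # assumption: vocab size is not stated on the card

# Attention (grouped-query): Q and O projections are full-size,
# K and V projections are shrunk to kv_heads * head_dim.
attn = hidden * hidden                   # Q projection
attn += hidden * (kv_heads * head_dim)   # K projection
attn += hidden * (kv_heads * head_dim)   # V projection
attn += hidden * hidden                  # O projection

# Llama-style SwiGLU MLP: gate, up, and down projections.
mlp = 3 * hidden * intermediate

norms = 2 * hidden                       # two RMSNorm weights per layer
per_layer = attn + mlp + norms

# Tied input/output embedding plus a final RMSNorm.
total = layers * per_layer + vocab * hidden + hidden
print(total)  # 29368832, i.e. ~29.4M
```

The total lands on 29.37M, matching the 29.4M reported below, which suggests the guessed vocab size is in the right ballpark.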
Model size: 29.4M parameters, F32 safetensors.

Datasets used to train sapbot/toyllama-30m