🔬 Quark-v2-0.5M

Quark-v2-0.5M is an ultra-lightweight Llama-based model with only 465,504 parameters, the second release in the Quark series. It was trained from scratch to demonstrate how far high-quality data (FineWeb-Edu) can carry an extremely small architecture.

Model Details

  • Architecture: Llama-based
  • Parameters: 465,504
  • Vocabulary Size: 500 (Custom Byte-Level BPE)
  • Hidden Size: 96
  • Intermediate Size: 192
  • Layers: 4
  • Heads: 4
  • Context Length: 256 tokens
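
For reference, these hyperparameters map onto a Hugging Face LlamaConfig roughly as follows. Settings not listed on the card (RoPE, norm epsilon, attention layout) are assumptions, but full multi-head attention with untied input/output embeddings reproduces the stated 465,504 parameters.

from transformers import LlamaConfig, LlamaForCausalLM

# Sketch of a configuration consistent with the numbers above;
# values not listed on the card are assumptions, not the released config.
config = LlamaConfig(
    vocab_size=500,
    hidden_size=96,
    intermediate_size=192,
    num_hidden_layers=4,
    num_attention_heads=4,
    num_key_value_heads=4,        # assume standard multi-head attention (no GQA)
    max_position_embeddings=256,
    tie_word_embeddings=False,    # untied embeddings are needed to reach 465,504 params
)
model = LlamaForCausalLM(config)
print(model.num_parameters())     # 465,504 under these assumptions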

Training

  • Dataset: 1 billion tokens of HuggingFaceFW/fineweb-edu (Sample-10BT)
  • Training Time: ~1.4h on a single Kaggle T4 GPU
  • Final Loss: 2.44
  • Optimizer: AdamW with Cosine Learning Rate Decay
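
The card does not state the learning rate, warmup, or step count; the snippet below is only a minimal sketch of the stated AdamW-plus-cosine-decay setup using standard PyTorch and transformers utilities, with hypothetical hyperparameters and a placeholder dataloader.

import torch
from transformers import get_cosine_schedule_with_warmup

# "model" is the LlamaForCausalLM built as in the configuration sketch above.
# All values here are assumptions; the card only names AdamW + cosine decay.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.1)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=1_000,       # assumed warmup
    num_training_steps=100_000,   # assumed total steps; depends on batch and sequence setup
)

for batch in dataloader:          # placeholder: batches of tokenized FineWeb-Edu text with labels
    loss = model(**batch).loss    # causal LM loss
    loss.backward()
    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()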

Intended Use

Quark is a research project to explore the limits of "Micro-LLMs". It is surprisingly capable of forming grammatically correct English sentences and structured lists, despite fitting into less than 2MB of disk space.

Performance Examples

Prompt: "Artificial intelligence is "
Output: "Artificial intelligence is very possible. In the early 19th century, it has been done in the brain and acids, where they are taking some of the most common reality. This can also have to be lower than any other studies that would not be able to use this factor. If you’ve seen the same part of the world’s little glaucoma, we should need to be able to maintain their important"

Prompt: "The future of science is "
Output:: "The future of science is very easily. In addition to the claims, including many other people who have been done by the greatest ways that they are listed with their respective and built understanding of these two-thirds of the statements. In this example, it was not asked about how these problems can be confused for the next year. There is no one thing that he should als"

Prompt: "Albert Einstein was "
Output:: "Albert Einstein was very difficult to prevent them. - Ask the majority of these families, they were sometimes reported that a country had been developing and building unconsciously involved from their own landscape. Their statement is not considered as a good election for the valuable implantation of the programs and the temperature of the powers of the world"

Benchmarks

Full research: https://lh-tech.de/ai/sub-5m-research.html

How to use

from transformers import LlamaForCausalLM, PreTrainedTokenizerFast

# Load the model and its custom byte-level BPE tokenizer from the Hub
model = LlamaForCausalLM.from_pretrained("LH-Tech-AI/Quark-v2-0.5M")
tokenizer = PreTrainedTokenizerFast.from_pretrained("LH-Tech-AI/Quark-v2-0.5M")

# Generate a short continuation with conservative, low-temperature sampling
prompt = "The scientific method is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.4)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
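
Because the vocabulary has only 500 byte-level BPE entries, text tends to fragment into many short tokens; a quick way to inspect this (the exact splits depend on the released tokenizer):

# Show how a prompt is segmented by the 500-token vocabulary
ids = tokenizer("The scientific method is")["input_ids"]
print(len(ids), tokenizer.convert_ids_to_tokens(ids))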