---
language:
- it
- en
license: apache-2.0
tags:
- text-generation
- causal-lm
- bilingual
- italian
- english
- small-language-model
- trained-from-scratch
- quark
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Quark-135m-Bilingual
results: []
---
## Overview
Quark-135m-Bilingual is a compact bilingual language model for Italian and English, built from scratch by [ThingsAI](https://things-ai.org). It is the second generation of the Quark model family and pairs a custom bilingual BPE tokenizer with a decoder-only transformer that uses GQA, SwiGLU, RMSNorm, and RoPE.
This is the **base pretrained model**. An SFT (instruction-tuned) version trained on bilingual conversational data is available for chat applications.
## Model Details
| | |
|---|---|
| **Parameters** | 135M (143.98M with embeddings) |
| **Architecture** | Decoder-only Transformer |
| **Vocabulary** | 65,536 tokens (custom bilingual BPE) |
| **Context Length** | 2,048 tokens |
| **Precision** | BF16 |
| **Languages** | Italian, English |
| **Tokenizer** | [ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer) |
| **License** | Apache 2.0 |
## Architecture
Quark-135m follows a SmolLM-inspired design optimized for efficiency at small scale:
| Component | Details |
|---|---|
| Attention | Grouped Query Attention (GQA) |
| Heads | 9 query heads, 3 KV heads |
| Head Dimension | 64 |
| Model Dimension | 576 |
| Layers | 30 |
| FFN Dimension | 1,536 |
| FFN Activation | SwiGLU |
| Normalization | RMSNorm (pre-attention & pre-FFN) |
| Positional Encoding | Rotary Position Embeddings (RoPE) |
| Weight Tying | Yes (embedding ↔ LM head) |
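As a rough cross-check, the dimensions in the table can be turned into a parameter count. The sketch below is not taken from the Quark source code. It assumes no bias terms, one weight vector per RMSNorm, a final RMSNorm before the LM head, and a tied embedding counted once. The function name `count_parameters` is hypothetical.

```python
# Parameter-count sketch from the dimensions listed in the tables above.
# Assumptions (not stated in the model card): no bias terms, one RMSNorm
# weight vector per norm, a final RMSNorm, and a tied embedding counted once.

VOCAB = 65_536
D_MODEL = 576
N_LAYERS = 30
N_Q_HEADS = 9
N_KV_HEADS = 3
HEAD_DIM = 64
D_FFN = 1_536


def count_parameters() -> dict:
    """Return approximate parameter counts for the listed architecture."""
    q_dim = N_Q_HEADS * HEAD_DIM    # 9 * 64 = 576
    kv_dim = N_KV_HEADS * HEAD_DIM  # 3 * 64 = 192 (GQA shrinks K and V)

    attention = (
        D_MODEL * q_dim             # query projection
        + 2 * D_MODEL * kv_dim      # key and value projections
        + q_dim * D_MODEL           # output projection
    )
    ffn = 3 * D_MODEL * D_FFN       # SwiGLU: gate, up, and down projections
    norms = 2 * D_MODEL             # pre-attention and pre-FFN RMSNorm
    per_layer = attention + ffn + norms

    embedding = VOCAB * D_MODEL     # shared with the LM head (weight tying)
    total = N_LAYERS * per_layer + D_MODEL + embedding
    return {
        "per_layer": per_layer,
        "embedding": embedding,
        "total": total,
        "embedding_share": embedding / total,
    }


if __name__ == "__main__":
    counts = count_parameters()
    print(f"total:           {counts['total'] / 1e6:.2f}M")
    print(f"embedding:       {counts['embedding'] / 1e6:.2f}M")
    print(f"embedding share: {counts['embedding_share']:.1%}")
```

Under these assumptions the total comes to about 143.95M parameters, close to the 143.98M listed under Model Details, with the embedding matrix accounting for roughly 26% of it.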
## Training
### Pretraining Data
Quark-135m v0.2 was pretrained on **15.7B tokens** from a curated bilingual mix:
| Subset | Weight | Source |
|---|---|---|
| FineWeb-2 (Italian) | 29% | `HuggingFaceFW/fineweb-2` [ita_Latn] |
| CulturaX (Italian) | 14% | `uonlp/CulturaX` [it] |
| Wikipedia (Italian) | 7% | `wikimedia/wikipedia` [20231101.it] |
| FineWeb (English) | 36% | `HuggingFaceFW/fineweb` [sample-10BT] |
| Wikipedia (English) | 7% | `wikimedia/wikipedia` [20231101.en] |
| The Stack (Code) | 7% | `bigcode/the-stack-smol` |
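The mix can be summarized per language with a little arithmetic. The sketch below assumes each weight is the subset's share of the 15.7B training tokens, which the table does not state explicitly. The helper names `group_shares` and `tokens_for` are hypothetical.

```python
# Token-budget sketch for the pretraining mix.
# Assumption: each "Weight" is the subset's share of the 15.7B training tokens.
TOTAL_TOKENS = 15.7e9

# (subset, group, weight in percent), copied from the table above
MIX = [
    ("FineWeb-2 (Italian)", "Italian", 29),
    ("CulturaX (Italian)", "Italian", 14),
    ("Wikipedia (Italian)", "Italian", 7),
    ("FineWeb (English)", "English", 36),
    ("Wikipedia (English)", "English", 7),
    ("The Stack (Code)", "Code", 7),
]


def group_shares(mix):
    """Sum the percentage weights per group (Italian, English, Code)."""
    shares = {}
    for _, group, pct in mix:
        shares[group] = shares.get(group, 0) + pct
    return shares


def tokens_for(pct, total=TOTAL_TOKENS):
    """Convert a percentage weight into an approximate token count."""
    return total * pct / 100


if __name__ == "__main__":
    assert sum(pct for _, _, pct in MIX) == 100  # weights cover the full mix
    for group, pct in group_shares(MIX).items():
        print(f"{group:8s} {pct:3d}%  ~{tokens_for(pct) / 1e9:.2f}B tokens")
```

Read this way, the mix is 50% Italian, 43% English, and 7% code.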
## Chat Format
The model uses a simple chat template:
```
<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>
```
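A prompt in this format can be assembled with a small helper. The sketch below is illustrative only: `build_prompt` is a hypothetical name, and restricting roles to `user` and `assistant` is an assumption based on the two roles shown in the template.

```python
# Minimal prompt builder for the chat template shown above.
# `build_prompt` is a hypothetical helper, not part of the Quark repository.

ALLOWED_ROLES = ("user", "assistant")  # assumption: only the roles shown above


def build_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts into a single prompt string."""
    parts = []
    for message in messages:
        role = message["role"]
        if role not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        # Each turn is: role tag, message text, end marker, one per line.
        parts.append(f"<|{role}|>\n{message['content']}\n<|end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt([{"role": "user", "content": "Ciao, chi sei?"}]))
```

With `add_generation_prompt=True` the string ends with an open `<|assistant|>` turn, which matches the prompt used in the generation example.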
## Tokenizer
Quark-135m v0.2 uses a custom bilingual BPE tokenizer ([ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)) specifically designed for Italian and English:
- **Vocabulary**: 65,536 tokens
- **Type**: Byte-Pair Encoding (BPE)
- **Languages**: Balanced Italian + English coverage
- **Published**: [ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)
## Usage
### Loading the Model
Quark uses a custom architecture, so the model classes must come from the model repository. The snippet below loads only the tokenizer:
```python
import torch
import json
from safetensors.torch import load_file
from transformers import AutoTokenizer
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-135m-v0.2")
# Load model (requires the custom architecture classes; the full
# architecture code is available in the model repository)
```
### Generation Example
```python
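# Assumes `model` and `tokenizer` are already loaded (see "Loading the Model")
# and that `model(ids)` returns raw logits of shape (batch, seq_len, vocab).
device = "cuda" if torch.cuda.is_available() else "cpu"
prompt = "<|user|>\nCos'è l'intelligenza artificiale?\n<|end|>\n<|assistant|>\n"
ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
# Token-by-token generation with temperature and top-k sampling
with torch.no_grad():
    for _ in range(200):
        logits = model(ids)[:, -1, :] / 0.7  # temperature
        topk = torch.topk(logits, 40)  # keep the 40 most likely tokens
        probs = torch.softmax(topk.values, -1)  # renormalize over the top-k
        # Sample a position within the top-k, then map it back to a token id
        idx = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, idx], -1)
        if idx.item() == tokenizer.eos_token_id:
            break
print(tokenizer.decode(ids[0], skip_special_tokens=False))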
```
## Limitations
- **Scale**: At 135M parameters, the model has limited factual knowledge and reasoning capacity
- **Hallucination**: The model frequently generates plausible but incorrect information
- **Mathematics**: Cannot reliably perform arithmetic beyond simple operations
- **Code**: Generates syntactically plausible but often non-functional code
- **Vocabulary overhead**: The 65k vocabulary consumes ~26% of model parameters in the embedding layer, reducing transformer capacity; this is a key lesson for v0.3
- **Pretraining plateau**: Loss plateaued at ~4.6 due to the vocab/parameter ratio imbalance
## Comparison with v0.1
| | Quark-135m v0.1 | Quark-135m v0.2 |
|---|---|---|
| **Tokenizer** | cosmo2 (49k) | QuarkTokenizer (65k) |
| **Languages** | Math-focused (EN) | Bilingual IT+EN |
| **Training Data** | 15B tokens (math-heavy) | 15.7B tokens (bilingual web + code) |
| **Final Loss** | ~3.5-4.0 | 4.635 |
| **Strengths** | Arithmetic, math reasoning | Italian fluency, bilingual chat |
## Citation
```bibtex
@misc{quark2026,
title={Quark: A Family of Compact Bilingual Language Models},
author={Di Nicola, Michelangelo},
year={2026},
publisher={ThingsAI},
url={https://huggingface.co/ThingAI/Quark-135m-v0.2}
}
```
## Links
- [ThingsAI Website](https://things-ai.org)
- [Things Chat](https://chat.things-ai.org)
- [QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)
- [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard)
*Built from scratch by ThingsAI 🇮🇹*