---
language:
- it
- en
license: apache-2.0
tags:
- text-generation
- causal-lm
- bilingual
- italian
- english
- small-language-model
- trained-from-scratch
- quark
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: Quark-135m-Bilingual
  results: []
---

## Overview

Quark-135m-Bilingual is a compact bilingual language model designed for Italian and English, built entirely from scratch by [ThingsAI](https://things-ai.org). It represents the second generation of the Quark model family, featuring a custom bilingual BPE tokenizer and a modern transformer architecture.

This is the **base pretrained model**. An SFT (instruction-tuned) version trained on bilingual conversational data is available for chat applications.

## Model Details

| | |
|---|---|
| **Parameters** | 135M (143.98M with embeddings) |
| **Architecture** | Decoder-only Transformer |
| **Vocabulary** | 65,536 tokens (custom bilingual BPE) |
| **Context Length** | 2,048 tokens |
| **Precision** | BF16 |
| **Languages** | Italian, English |
| **Tokenizer** | [ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer) |
| **License** | Apache 2.0 |

## Architecture

Quark-135m follows a SmolLM-inspired design optimized for efficiency at small scale:

| Component | Details |
|---|---|
| Attention | Grouped Query Attention (GQA) |
| Heads | 9 query heads, 3 KV heads |
| Head Dimension | 64 |
| Model Dimension | 576 |
| Layers | 30 |
| FFN Dimension | 1,536 |
| FFN Activation | SwiGLU |
| Normalization | RMSNorm (pre-attention & pre-FFN) |
| Positional Encoding | Rotary Position Embeddings (RoPE) |
| Weight Tying | Yes (embedding ↔ LM head) |

## Training

### Pretraining Data

Quark-135m v0.2 was pretrained on **15.7B tokens** from a curated bilingual mix:

| Subset | Weight | Source |
|---|---|---|
| FineWeb-2 (Italian) | 29% | `HuggingFaceFW/fineweb-2` [ita_Latn] |
| CulturaX (Italian) | 14% | `uonlp/CulturaX` [it] |
| Wikipedia (Italian) | 7% | `wikimedia/wikipedia` [20231101.it] |
| FineWeb (English) | 36% | `HuggingFaceFW/fineweb` [sample-10BT] |
| Wikipedia (English) | 7% | `wikimedia/wikipedia` [20231101.en] |
| The Stack (Code) | 7% | `bigcode/the-stack-smol` |

## Chat Format

The model uses a simple chat template:

```
<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>
```

## Tokenizer

Quark-135m v0.2 uses a custom bilingual BPE tokenizer ([ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)) specifically designed for Italian and English:

- **Vocabulary**: 65,536 tokens
- **Type**: Byte-Pair Encoding (BPE)
- **Languages**: Balanced Italian + English coverage
- **Published**: [ThingAI/QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)

## Usage

### Loading the Model

Quark uses a custom architecture. To load and run inference:

```python
import torch
import json
from safetensors.torch import load_file
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-135m-v0.2")

# Load model (requires custom architecture classes; see repository)
# Full architecture code available in the model repository
```

### Generation Example

```python
# Assumes `model` has already been built from the repository's architecture
# classes, moved to "cuda", and returns logits of shape (batch, seq, vocab).
prompt = "<|user|>\nCos'è l'intelligenza artificiale?\n<|end|>\n<|assistant|>\n"
ids = tokenizer.encode(prompt, return_tensors="pt").to("cuda")

# Token-by-token generation with temperature and top-k sampling
with torch.no_grad():
    for _ in range(200):
        logits = model(ids)[:, -1, :] / 0.7  # temperature
        topk = torch.topk(logits, 40)  # keep the 40 most likely tokens
        probs = torch.softmax(topk.values, -1)
        # Sample within the top-k set, then map back to vocabulary ids
        idx = topk.indices.gather(-1, torch.multinomial(probs, 1))
        ids = torch.cat([ids, idx], -1)
        if idx.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(ids[0], skip_special_tokens=False))
```

## Limitations

- **Scale**: At 135M parameters, the model has limited factual knowledge and reasoning capacity
- **Hallucination**: The model frequently generates plausible but incorrect information
- **Mathematics**: Cannot reliably perform arithmetic beyond simple operations
- **Code**: Generates syntactically plausible but often non-functional code
- **Vocabulary overhead**: The 65k vocabulary consumes ~26% of model parameters in the embedding layer, reducing transformer capacity (a key lesson for v0.3)
- **Pretraining plateau**: Loss plateaued at ~4.6 due to the vocab/parameter ratio imbalance

## Comparison with v0.1

| | Quark-135m v0.1 | Quark-135m v0.2 |
|---|---|---|
| **Tokenizer** | cosmo2 (49k) | QuarkTokenizer (65k) |
| **Languages** | Math-focused (EN) | Bilingual IT+EN |
| **Training Data** | 15B tokens (math-heavy) | 15.7B tokens (bilingual web + code) |
| **Final Loss** | ~3.5-4.0 | 4.635 |
| **Strengths** | Arithmetic, math reasoning | Italian fluency, bilingual chat |

## Citation

```bibtex
@misc{quark2026,
  title={Quark: A Family of Compact Bilingual Language Models},
  author={Di Nicola, Michelangelo},
  year={2026},
  publisher={ThingsAI},
  url={https://huggingface.co/ThingAI/Quark-135m-v0.2}
}
```

## Links

- 🌐 [ThingsAI Website](https://things-ai.org)
- 💬 [Things Chat](https://chat.things-ai.org)
- 🔤 [QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)
- 📊 [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard)

*Built from scratch by ThingsAI 🇮🇹*
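
The vocabulary overhead noted under Limitations can be checked roughly from the Architecture table. The sketch below is illustrative only: `estimate_params` is a hypothetical helper, not part of the repository, and it assumes bias-free linear layers, a single tied embedding matrix, and one final RMSNorm (the last of these is not stated in the table). Under those assumptions the total comes to about 143.95M, close to the reported 143.98M, with the embedding at roughly 26% of it.

```python
# Hypothetical helper (not from the Quark repository): estimates parameter
# counts from the Architecture table. Assumes no bias terms, tied
# embedding/LM head, and a single final RMSNorm.

def estimate_params(vocab=65_536, d_model=576, n_layers=30,
                    n_q_heads=9, n_kv_heads=3, head_dim=64, d_ffn=1_536):
    embedding = vocab * d_model                       # shared with the LM head
    q_proj = d_model * n_q_heads * head_dim           # query projection
    kv_proj = 2 * d_model * n_kv_heads * head_dim     # key + value (GQA)
    o_proj = n_q_heads * head_dim * d_model           # output projection
    ffn = 3 * d_model * d_ffn                         # SwiGLU: gate, up, down
    norms = 2 * d_model                               # pre-attention + pre-FFN
    per_layer = q_proj + kv_proj + o_proj + ffn + norms
    total = embedding + n_layers * per_layer + d_model  # + assumed final norm
    return {"embedding": embedding, "per_layer": per_layer, "total": total}


budget = estimate_params()
share = budget["embedding"] / budget["total"]
print(f"total: {budget['total'] / 1e6:.2f}M, embedding share: {share:.1%}")
```

Changing `vocab` in the call shows how the embedding share moves with vocabulary size at a fixed transformer shape.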