---
language:
- it
- en
license: apache-2.0
tags:
- text-generation
- causal-lm
- bilingual
- italian
- english
- small-language-model
- trained-from-scratch
- quark
library_name: transformers
pipeline_tag: text-generation
---
# Quark-270M Base: Bilingual Italian-English Language Model

Quark-270M Base is a compact bilingual language model for Italian and English, built entirely from scratch by [ThingsAI](https://things-ai.org). This is the raw pretrained (base) checkpoint: it continues text rather than following instructions.

For conversational use, see [**Quark-270M-Instruct**](https://huggingface.co/ThingAI/Quark-270m-Instruct).
## Model Details

| Property | Value |
|---|---|
| **Parameters** | 252M (with weight tying) |
| **Architecture** | Decoder-only Transformer |
| **Vocabulary** | 65,537 tokens (QuarkTokenizer, bilingual BPE) |
| **Context Length** | 2,048 tokens |
| **Precision** | BF16 |
| **Languages** | Italian, English |
| **License** | Apache 2.0 |
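
As a sanity check on the **Parameters** figure, the count can be reproduced from the dimensions in the Architecture table. This is a back-of-the-envelope sketch, not an inspection of the checkpoint: it assumes no bias terms, one RMSNorm weight vector per sub-layer plus a final norm, and three 768 × 2,048 matrices in each SwiGLU feed-forward. The same numbers also give the BF16 weight size and the per-sequence KV-cache size at the full 2,048-token context.

```python
# Dimensions from the Architecture table; context and precision from Model Details.
VOCAB = 65_537
D_MODEL = 768
N_LAYERS = 32
N_Q_HEADS = 12
N_KV_HEADS = 4
HEAD_DIM = 64
FFN_DIM = 2_048
CONTEXT = 2_048
BYTES_PER_BF16 = 2


def count_params(tied: bool = True) -> int:
    """Estimate the parameter count (assumes no biases; each RMSNorm has one weight vector)."""
    embed = VOCAB * D_MODEL
    attn = (
        D_MODEL * N_Q_HEADS * HEAD_DIM            # W_q: all 12 query heads
        + 2 * D_MODEL * N_KV_HEADS * HEAD_DIM     # W_k, W_v: only 4 KV heads (GQA)
        + N_Q_HEADS * HEAD_DIM * D_MODEL          # W_o
    )
    ffn = 3 * D_MODEL * FFN_DIM                   # SwiGLU gate, up and down projections
    norms = 2 * D_MODEL                           # pre-attention + pre-FFN RMSNorm
    total = embed + N_LAYERS * (attn + ffn + norms) + D_MODEL  # + final RMSNorm
    if not tied:
        total += VOCAB * D_MODEL                  # a separate lm_head matrix
    return total


def kv_cache_bytes(seq_len: int = CONTEXT, kv_heads: int = N_KV_HEADS) -> int:
    """BF16 key + value cache for one sequence, summed over all layers."""
    return 2 * N_LAYERS * kv_heads * HEAD_DIM * seq_len * BYTES_PER_BF16


tied, untied = count_params(tied=True), count_params(tied=False)
MiB = 2**20
print(f"tied:   {tied:,} (~{tied / 1e6:.0f}M)")
print(f"untied: {untied:,} (~{untied / 1e6:.0f}M)")
print(f"BF16 weights: ~{tied * BYTES_PER_BF16 / MiB:.0f} MiB")
print(f"KV cache, {CONTEXT} tokens, GQA (4 KV heads): {kv_cache_bytes() / MiB:.0f} MiB")
print(f"KV cache, {CONTEXT} tokens, if MHA (12 KV heads): {kv_cache_bytes(kv_heads=N_Q_HEADS) / MiB:.0f} MiB")
```

Under these assumptions the tied total is 251,708,928 parameters, which rounds to the 252M above; an untied `lm_head` would add another 50,332,416 (about 50M). Because only 4 KV heads are cached, a full 2,048-token context needs 64 MiB of KV cache per sequence instead of the 192 MiB that 12 KV heads would need.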
## Architecture

| Component | Details |
|---|---|
| Model Dimension | 768 |
| Layers | 32 |
| Attention | Grouped Query Attention (GQA) |
| Query Heads | 12 |
| KV Heads | 4 (each shared by 3 query heads) |
| Head Dimension | 64 |
| FFN Dimension | 2,048 |
| FFN Activation | SwiGLU |
| Normalization | RMSNorm (pre-norm) |
| Positional Encoding | RoPE (θ=10,000) |
| Weight Tying | embed_tokens ↔ lm_head |
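
To make the table concrete, here is a minimal NumPy sketch of one pre-norm decoder block wired as described: RMSNorm, grouped-query attention with RoPE, and a SwiGLU feed-forward, each wrapped in a residual connection. It is an illustration with random weights, not Quark's actual modelling code (which ships with the repository as custom code); the rotate-half RoPE convention, the RMSNorm epsilon of 1e-6, and the absence of bias terms are assumptions.

```python
import numpy as np

# Dimensions from the Architecture table; the weights below are random placeholders.
D_MODEL, N_Q, N_KV, HEAD_DIM, FFN_DIM, ROPE_THETA = 768, 12, 4, 64, 2048, 10_000.0
GROUP = N_Q // N_KV  # 3 query heads share each K/V head

rng = np.random.default_rng(0)


def init(*shape):
    return rng.normal(0.0, 0.02, size=shape).astype(np.float32)


W = {
    "q": init(D_MODEL, N_Q * HEAD_DIM), "k": init(D_MODEL, N_KV * HEAD_DIM),
    "v": init(D_MODEL, N_KV * HEAD_DIM), "o": init(N_Q * HEAD_DIM, D_MODEL),
    "gate": init(D_MODEL, FFN_DIM), "up": init(D_MODEL, FFN_DIM), "down": init(FFN_DIM, D_MODEL),
    "norm1": np.ones(D_MODEL, np.float32), "norm2": np.ones(D_MODEL, np.float32),
}


def rms_norm(x, weight, eps=1e-6):
    # Divide by the root-mean-square of each row; no mean subtraction, no bias.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight


def rope(x):
    # x: (heads, seq, head_dim). Rotate-half RoPE (assumed convention).
    seq = x.shape[1]
    inv_freq = ROPE_THETA ** (-np.arange(0, HEAD_DIM, 2) / HEAD_DIM)
    angles = np.outer(np.arange(seq), inv_freq)            # (seq, head_dim / 2)
    cos = np.concatenate([np.cos(angles)] * 2, axis=-1)    # (seq, head_dim)
    sin = np.concatenate([np.sin(angles)] * 2, axis=-1)
    x1, x2 = x[..., : HEAD_DIM // 2], x[..., HEAD_DIM // 2 :]
    return x * cos + np.concatenate([-x2, x1], axis=-1) * sin


def gqa(x):
    seq = x.shape[0]
    q = (x @ W["q"]).reshape(seq, N_Q, HEAD_DIM).transpose(1, 0, 2)
    k = (x @ W["k"]).reshape(seq, N_KV, HEAD_DIM).transpose(1, 0, 2)
    v = (x @ W["v"]).reshape(seq, N_KV, HEAD_DIM).transpose(1, 0, 2)
    q, k = rope(q), rope(k)
    # Expand 4 K/V heads to 12 so that query head h reads K/V head h // GROUP.
    k, v = np.repeat(k, GROUP, axis=0), np.repeat(v, GROUP, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)  # (N_Q, seq, seq)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # causal mask
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)            # stable softmax
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = (probs @ v).transpose(1, 0, 2).reshape(seq, N_Q * HEAD_DIM)
    return out @ W["o"]


def swiglu(x):
    gate = x @ W["gate"]
    return (gate / (1.0 + np.exp(-gate)) * (x @ W["up"])) @ W["down"]  # SiLU(gate) * up


def decoder_block(x):
    # Pre-norm residual block: normalize, transform, add back to the stream.
    x = x + gqa(rms_norm(x, W["norm1"]))
    return x + swiglu(rms_norm(x, W["norm2"]))


hidden = rng.normal(size=(8, D_MODEL)).astype(np.float32)  # 8 token positions
print(decoder_block(hidden).shape)  # (8, 768)
```

The GQA step is the `np.repeat`: each of the 4 K/V heads is shared by 3 consecutive query heads, so the K/V projections and the KV cache are a third of the size they would be with standard multi-head attention.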
## Pretraining

### Data

Trained on **~10B tokens** from a curated bilingual mix:

| Subset | Weight | Source [config] |
|---|---|---|
| FineWeb-2 (Italian) | 29% | `HuggingFaceFW/fineweb-2` [ita_Latn] |
| CulturaX (Italian) | 14% | `uonlp/CulturaX` [it] |
| Wikipedia (Italian) | 7% | `wikimedia/wikipedia` [20231101.it] |
| FineWeb (English) | 36% | `HuggingFaceFW/fineweb` [sample-10BT] |
| Wikipedia (English) | 7% | `wikimedia/wikipedia` [20231101.en] |
| The Stack (Code) | 7% | `bigcode/the-stack-smol` |

**Language split:** Italian 50% · English 43% · Code 7%
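
Reading the weights as shares of the ~10B training tokens (an assumption; the card does not say whether they are token or document proportions), the per-subset budgets and the language split follow directly. This short sketch computes both and checks that the weights sum to 100%.

```python
import math

# Mixture from the Data table: subset -> (language, weight).
MIX = {
    "FineWeb-2 (Italian)": ("it", 0.29),
    "CulturaX (Italian)": ("it", 0.14),
    "Wikipedia (Italian)": ("it", 0.07),
    "FineWeb (English)": ("en", 0.36),
    "Wikipedia (English)": ("en", 0.07),
    "The Stack (Code)": ("code", 0.07),
}
TOTAL_TOKENS = 10e9  # "~10B", so every budget below is approximate


def token_budgets(mix, total_tokens):
    """Approximate tokens drawn from each subset, assuming weights are token shares."""
    return {name: weight * total_tokens for name, (_, weight) in mix.items()}


def language_split(mix):
    """Sum the subset weights per language."""
    split = {}
    for lang, weight in mix.values():
        split[lang] = split.get(lang, 0.0) + weight
    return split


assert math.isclose(sum(w for _, w in MIX.values()), 1.0), "weights should sum to 100%"

for name, tokens in token_budgets(MIX, TOTAL_TOKENS).items():
    print(f"{name:<22} ~{tokens / 1e9:.1f}B tokens")
print({lang: f"{share:.0%}" for lang, share in language_split(MIX).items()})
```

To build a similar mixture, the Hugging Face `datasets` library's `interleave_datasets` accepts a `probabilities` list that could carry these weights; the card does not say how Quark's mixture was actually assembled.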
### Training Configuration

| Setting | Value |
|---|---|
| **Hardware** | NVIDIA B200 |
| **Total Tokens** | ~10B |
| **Batch Size** | 64 sequences × 4 gradient-accumulation steps = 256 sequences (524,288 tokens) per optimizer step |
| **Sequence Length** | 2,048 tokens |
| **Learning Rate** | 3e-4 → 3e-5 (cosine decay) |
| **Warmup Steps** | 1,000 |
| **Optimizer** | AdamW (β₁=0.9, β₂=0.95) |
| **Precision** | BF16 mixed precision |
| **Throughput** | ~281k tokens/sec |
| **Training Time** | ~10 hours |
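
The figures in this table are mutually consistent, which is easy to verify: the batch size fixes the tokens per optimizer step, the token budget then fixes the number of steps, and the throughput fixes the wall-clock time. The sketch below reproduces that arithmetic and adds one plausible learning-rate schedule matching the table. The linear warmup from zero and a cosine decay spanning all post-warmup steps are assumptions, since the card only gives the endpoints, the cosine shape, and the 1,000 warmup steps.

```python
import math

# Values from the Training Configuration table ("~10B" and "~281k" are approximate).
MICRO_BATCH = 64
GRAD_ACCUM = 4
SEQ_LEN = 2_048
TOTAL_TOKENS = 10_000_000_000
THROUGHPUT = 281_000  # tokens/sec
PEAK_LR, MIN_LR = 3e-4, 3e-5
WARMUP_STEPS = 1_000

TOKENS_PER_STEP = MICRO_BATCH * GRAD_ACCUM * SEQ_LEN  # 524,288
TOTAL_STEPS = TOTAL_TOKENS // TOKENS_PER_STEP         # ~19k optimizer steps


def lr_at(step: int) -> float:
    """Assumed schedule: linear warmup to PEAK_LR, then cosine decay to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS))
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


hours = TOTAL_TOKENS / THROUGHPUT / 3600
print(f"tokens per optimizer step: {TOKENS_PER_STEP:,}")
print(f"optimizer steps: {TOTAL_STEPS:,}")
print(f"wall clock at {THROUGHPUT:,} tokens/sec: {hours:.1f} h")
for s in (0, 500, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {s:>6}: lr = {lr_at(s):.2e}")
```

The run comes to 19,073 optimizer steps and just under 10 hours, matching the table. In PyTorch, a function like `lr_at` could drive `torch.optim.lr_scheduler.LambdaLR` if divided by `PEAK_LR`, because that scheduler multiplies the optimizer's base learning rate by the value the lambda returns.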
## Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ThingAI/Quark-270m-Base"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True loads the custom model code shipped in the repository.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to(device)
# Also harmless if the tokenizer turns out to be a built-in class.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Base model: provide the beginning of a text and let the model continue it.
inputs = tokenizer("L'Italia è un paese", return_tensors="pt").to(device)
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_k=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

> **Note:** This is a base model for text completion. For chat and instruction following, use [Quark-270M-Instruct](https://huggingface.co/ThingAI/Quark-270m-Instruct).
## Limitations

- **Scale:** At 252M parameters, factual knowledge and complex reasoning are limited.
- **Hallucination:** Often generates plausible-sounding but incorrect information.
- **Mathematics:** Arithmetic ability is limited.
- **Code:** May produce syntactically plausible but non-functional code.
## The Quark Family

| Model | Parameters | Type |
|---|---|---|
| [Quark-50M](https://huggingface.co/ThingAI/Quark-50m) | 51M | Base |
| [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) | 135M | Base |
| **Quark-270M Base** | **252M** | **Base** |
| [Quark-270M-Instruct](https://huggingface.co/ThingAI/Quark-270m-Instruct) | 252M | Chat |
## Links

- 🌐 [ThingsAI](https://things-ai.org)
- 💬 [Things Chat](https://chat.things-ai.org)
- 🔤 [QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)

---

*Built from scratch by ThingsAI 🇮🇹*