Instructions for using ThingAI/Quark-270m-Base with libraries and local apps.

Libraries

How to use ThingAI/Quark-270m-Base with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ThingAI/Quark-270m-Base", trust_remote_code=True)
# This is a base model for text completion, so pass a plain prompt rather than chat messages
prompt = "L'Italia è un paese"
pipe(prompt, max_new_tokens=50)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ThingAI/Quark-270m-Base", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use ThingAI/Quark-270m-Base with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ThingAI/Quark-270m-Base"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ThingAI/Quark-270m-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
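
Since the model card describes this checkpoint as a base model for text completion, the plain completions route may be a better fit than the chat route. The following is a minimal Python sketch of such a call. It assumes the vLLM server started above is listening on localhost:8000 and exposes the OpenAI-compatible `/v1/completions` route. The helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumptions: server address and model id match the commands above;
# the helper names below are hypothetical, chosen for this sketch only.
BASE_URL = "http://localhost:8000"
MODEL_ID = "ThingAI/Quark-270m-Base"


def build_completion_request(prompt, max_tokens=64, temperature=0.7):
    """Build a POST request for the OpenAI-compatible completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("L'Italia è un paese")` should return the model's continuation of the prompt.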

SGLang

How to use ThingAI/Quark-270m-Base with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "ThingAI/Quark-270m-Base" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ThingAI/Quark-270m-Base",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ThingAI/Quark-270m-Base" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use ThingAI/Quark-270m-Base with Docker Model Runner:
```
docker model run hf.co/ThingAI/Quark-270m-Base
```

language:
  - it
  - en
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - bilingual
  - italian
  - english
  - small-language-model
  - trained-from-scratch
  - quark
library_name: transformers
pipeline_tag: text-generation

Quark-270M Base — Bilingual Italian-English Language Model

Quark-270M Base is a compact bilingual language model for Italian and English, trained from scratch by ThingsAI. This is the raw pretrained checkpoint, intended for text completion. For conversational use, see Quark-270M-Instruct.

Model Details


Property	Value
Parameters	252M (with weight tying)
Architecture	Decoder-only Transformer
Vocabulary	65,537 tokens (QuarkTokenizer, bilingual BPE)
Context Length	2,048 tokens
Precision	BF16
Languages	Italian, English
License	Apache 2.0

Architecture

Component	Details
Model Dimension	768
Layers	32
Attention	Grouped Query Attention (GQA)
Query Heads	12
KV Heads	4 (3:1 ratio)
Head Dimension	64
FFN Dimension	2,048
FFN Activation	SwiGLU
Normalization	RMSNorm (pre-norm)
Positional Encoding	RoPE (θ=10,000)
Weight Tying	embed_tokens ↔ lm_head
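
The figures in this table can be cross-checked against the 252M parameter count with a little arithmetic. The sketch below assumes bias-free linear layers, two RMSNorm weight vectors per block, and one final RMSNorm. None of these details are stated in the model card, so treat the result as an estimate. It also compares the BF16 key/value cache size with 4 KV heads against a hypothetical 12-head multi-head layout.

```python
# Values from the tables above. Assumptions (not confirmed by the model card):
# no bias terms, two RMSNorm weights per block, one final RMSNorm.
VOCAB = 65_537
D_MODEL = 768
N_LAYERS = 32
N_Q_HEADS = 12
N_KV_HEADS = 4
HEAD_DIM = 64
D_FFN = 2_048
CONTEXT = 2_048
BYTES_BF16 = 2


def count_parameters() -> int:
    """Count weights, with embed_tokens and lm_head sharing one matrix."""
    embedding = VOCAB * D_MODEL  # counted once because of weight tying
    q_proj = D_MODEL * N_Q_HEADS * HEAD_DIM
    kv_proj = 2 * D_MODEL * N_KV_HEADS * HEAD_DIM  # K and V use only 4 heads
    o_proj = N_Q_HEADS * HEAD_DIM * D_MODEL
    swiglu = 3 * D_MODEL * D_FFN  # gate, up, and down projections
    norms = 2 * D_MODEL  # pre-attention and pre-FFN RMSNorm
    per_layer = q_proj + kv_proj + o_proj + swiglu + norms
    final_norm = D_MODEL
    return embedding + N_LAYERS * per_layer + final_norm


def kv_cache_bytes(n_kv_heads: int, tokens: int = CONTEXT) -> int:
    """Bytes of BF16 key and value cache for one sequence of `tokens` tokens."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * BYTES_BF16 * tokens


total = count_parameters()
print(f"parameters: {total:,}")
print(f"KV cache with 4 KV heads:  {kv_cache_bytes(N_KV_HEADS) / 2**20:.0f} MiB")
print(f"KV cache with 12 KV heads: {kv_cache_bytes(N_Q_HEADS) / 2**20:.0f} MiB")
```

Under these assumptions the count comes to 251,708,928, which rounds to the 252M listed in Model Details. At the full 2,048-token context, the 3:1 GQA ratio gives a 64 MiB cache per sequence, one third of the 192 MiB a 12-head layout would need.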

Pretraining

Data

Trained on ~10B tokens from a curated bilingual mix:

Subset	Weight	Source
FineWeb-2 (Italian)	29%	`HuggingFaceFW/fineweb-2` [ita_Latn]
CulturaX (Italian)	14%	`uonlp/CulturaX` [it]
Wikipedia (Italian)	7%	`wikimedia/wikipedia` [20231101.it]
FineWeb (English)	36%	`HuggingFaceFW/fineweb` [sample-10BT]
Wikipedia (English)	7%	`wikimedia/wikipedia` [20231101.en]
The Stack (Code)	7%	`bigcode/the-stack-smol`

Language split: Italian 50% · English 43% · Code 7%
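
As a rough illustration, the weights above can be turned into approximate token budgets per subset. This sketch assumes the weights apply to the ~10B total by token count, which the model card does not state explicitly, so the per-subset numbers are estimates rather than reported figures.

```python
# Weights copied from the table above. Assumption: they are shares of the
# ~10B training tokens, so the derived counts are only approximate.
TOTAL_TOKENS = 10_000_000_000

MIX = {
    "FineWeb-2 (Italian)": ("it", 29),
    "CulturaX (Italian)": ("it", 14),
    "Wikipedia (Italian)": ("it", 7),
    "FineWeb (English)": ("en", 36),
    "Wikipedia (English)": ("en", 7),
    "The Stack (Code)": ("code", 7),
}


def tokens_per_subset(total: int = TOTAL_TOKENS) -> dict:
    """Approximate token budget for each subset."""
    return {name: total * pct // 100 for name, (_, pct) in MIX.items()}


def share_by_language() -> dict:
    """Sum the subset weights per language group."""
    shares = {}
    for lang, pct in MIX.values():
        shares[lang] = shares.get(lang, 0) + pct
    return shares


for name, count in tokens_per_subset().items():
    print(f"{name:<22} ~{count / 1e9:.1f}B tokens")
print(share_by_language())
```

The per-language sums reproduce the stated split of 50% Italian, 43% English, and 7% code.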

Training Configuration


Setting	Value
Hardware	NVIDIA B200
Total Tokens	~10B
Batch Size	64 × 4 grad accum = 256 sequences
Sequence Length	2,048
Learning Rate	3e-4 → 3e-5 (cosine)
Warmup Steps	1,000
Optimizer	AdamW (β₁=0.9, β₂=0.95)
Precision	BF16 mixed precision
Throughput	~281k tokens/sec
Training Time	~10 hours
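
The configuration above implies a step count, a training time, and a learning-rate curve that can be sketched as follows. The linear warmup shape and the choice to end the cosine decay exactly at the last step are assumptions, as the model card only gives the endpoints and the warmup length. The function name `learning_rate` is illustrative.

```python
import math

# Values from the table above; TOTAL_TOKENS and THROUGHPUT are approximate.
BATCH_SEQUENCES = 64 * 4
SEQ_LEN = 2_048
TOTAL_TOKENS = 10_000_000_000
THROUGHPUT = 281_000  # tokens per second
WARMUP_STEPS = 1_000
PEAK_LR = 3e-4
MIN_LR = 3e-5

TOKENS_PER_STEP = BATCH_SEQUENCES * SEQ_LEN
TOTAL_STEPS = TOTAL_TOKENS // TOKENS_PER_STEP
TRAINING_HOURS = TOTAL_TOKENS / THROUGHPUT / 3600


def learning_rate(step: int) -> float:
    """Linear warmup (assumed) followed by cosine decay from PEAK_LR to MIN_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


print(f"tokens per step: {TOKENS_PER_STEP:,}")
print(f"optimizer steps: {TOTAL_STEPS:,}")
print(f"estimated hours: {TRAINING_HOURS:.1f}")
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {learning_rate(step):.2e}")
```

Each optimizer step covers 524,288 tokens, so ~10B tokens correspond to roughly 19,000 steps, and the stated throughput gives a little under 10 hours, in line with the reported training time.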

Usage

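from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the weights in BF16 and move the model to the GPU
model = AutoModelForCausalLM.from_pretrained(
    "ThingAI/Quark-270m-Base",
    trust_remote_code=True,
    torch_dtype="bfloat16",
).cuda()

# trust_remote_code is passed here too, on the assumption that QuarkTokenizer ships as custom code in the repo
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-270m-Base", trust_remote_code=True)

# Sample a continuation of an Italian prompt
inputs = tokenizer("L'Italia è un paese", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7, top_k=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))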

Note: This is a base model for text completion. For chat and instruction following, use Quark-270M-Instruct.

Limitations

Scale: 252M parameters limit factual knowledge and complex reasoning
Hallucination: Generates plausible but often incorrect information
Mathematics: Limited arithmetic capabilities
Code: Can produce syntactically plausible but often non-functional code

The Quark Family

Model	Parameters	Type
Quark-50M	51M	Base
Quark-135M	135M	Base
Quark-270M Base	252M	Base
Quark-270M-Instruct	252M	Chat

Built from scratch by ThingsAI 🇮🇹