---
language:
- it
- en
license: apache-2.0
tags:
- text-generation
- causal-lm
- bilingual
- italian
- english
- small-language-model
- trained-from-scratch
- quark
- instruct
- sft
- chat
library_name: transformers
pipeline_tag: text-generation
---

# Quark-270M-Instruct: Bilingual Chat Model

Quark-270M-Instruct is the **instruction-tuned** version of [Quark-270M Base](https://huggingface.co/ThingAI/Quark-270m-Base), fine-tuned for conversational use in Italian and English. It was built entirely from scratch by [ThingsAI](https://things-ai.org).

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# trust_remote_code=True loads the model's custom code from the repository.
model = AutoModelForCausalLM.from_pretrained(
    "ThingAI/Quark-270m-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).cuda()  # requires a CUDA-capable GPU
model.lm_head.weight = model.embed_tokens.weight  # ensure weight tying
tokenizer = AutoTokenizer.from_pretrained("ThingAI/Quark-270m-Instruct")

# Single-turn prompt in the chat format described in the next section.
prompt = "<|user|>\nCiao, come stai?\n<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7, top_k=40)
# skip_special_tokens=False keeps any special tokens in the printed text.
print(tokenizer.decode(out[0], skip_special_tokens=False))
```

## Chat Format

```
<|user|>
{user message}
<|end|>
<|assistant|>
{model response}
<|end|>
```

Multi-turn:

```
<|user|>
Ciao!
<|end|>
<|assistant|>
Ciao! Come posso aiutarti?
<|end|>
<|user|>
Cos'è l'intelligenza artificiale?
<|end|>
<|assistant|>
```

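The template above can be assembled programmatically. The following is a minimal sketch, not part of the Quark repository: `build_prompt` and `extract_reply` are hypothetical helper names, and the sketch assumes that each tag sits on its own line exactly as shown in the examples above.

```python
from typing import Dict, List

USER_TAG = "<|user|>"
ASSISTANT_TAG = "<|assistant|>"
END_TAG = "<|end|>"


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Render {"role", "content"} dicts in the chat format shown above.

    Hypothetical helper: the final assistant turn is left open so the
    model can complete it.
    """
    tags = {"user": USER_TAG, "assistant": ASSISTANT_TAG}
    parts = []
    for message in messages:
        role = message["role"]
        if role not in tags:
            # The model was not trained with a <|system|> tag.
            raise ValueError(f"unsupported role: {role!r}")
        parts.append(f"{tags[role]}\n{message['content']}\n{END_TAG}\n")
    parts.append(f"{ASSISTANT_TAG}\n")
    return "".join(parts)


def extract_reply(decoded: str) -> str:
    """Return the text of the last assistant turn in decoded output."""
    reply = decoded.rsplit(f"{ASSISTANT_TAG}\n", 1)[-1]
    return reply.split(END_TAG, 1)[0].strip()
```

For a single user message, `build_prompt` produces the same string as the hand-written `prompt` in the Quick Start. `extract_reply` is useful because decoding with `skip_special_tokens=False` returns the whole conversation rather than just the new turn.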
## Model Details

| | |
|---|---|
| **Base Model** | [Quark-270M Base](https://huggingface.co/ThingAI/Quark-270m-Base) |
| **Parameters** | 252M (with weight tying) |
| **Architecture** | Decoder-only Transformer (GQA, SwiGLU, RMSNorm, RoPE) |
| **Vocabulary** | 65,537 tokens |
| **Context Length** | 2,048 tokens |
| **Precision** | BF16 |
| **Languages** | Italian, English |

### Architecture

| | |
|---|---|
| d_model | 768 |
| Layers | 32 |
| Query Heads | 12 |
| KV Heads | 4 |
| Head Dim | 64 |
| FFN Dim | 2,048 |
| Activation | SwiGLU |

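The 252M figure can be cross-checked against the architecture table. The sketch below is an estimate under stated assumptions rather than a reading of the actual model code: it assumes bias-free linear layers, two RMSNorm weight vectors per layer plus one final norm, and no learned positional parameters (RoPE has none).

```python
# Values copied from the tables above.
VOCAB = 65_537
D_MODEL = 768
LAYERS = 32
Q_HEADS = 12
KV_HEADS = 4
HEAD_DIM = 64
FFN_DIM = 2_048


def count_parameters(tie_embeddings: bool = True) -> int:
    """Estimate the parameter count, assuming bias-free linear layers."""
    embedding = VOCAB * D_MODEL
    # GQA: full-width query/output projections, narrower key/value ones.
    q_proj = D_MODEL * (Q_HEADS * HEAD_DIM)
    kv_proj = 2 * D_MODEL * (KV_HEADS * HEAD_DIM)
    o_proj = (Q_HEADS * HEAD_DIM) * D_MODEL
    # SwiGLU uses three matrices: gate, up, and down.
    ffn = 3 * D_MODEL * FFN_DIM
    # Assumed: two RMSNorm weight vectors per layer, plus one final norm.
    norms = 2 * D_MODEL
    per_layer = q_proj + kv_proj + o_proj + ffn + norms
    total = embedding + LAYERS * per_layer + D_MODEL
    if not tie_embeddings:
        total += VOCAB * D_MODEL  # separate output projection
    return total


print(f"{count_parameters() / 1e6:.1f}M")
```

Under these assumptions the estimate is 251,708,928 parameters, which rounds to the stated 252M. Passing `tie_embeddings=False` adds a second 65,537 × 768 matrix for the output projection, which is the cost that weight tying avoids.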
## Training

### Base Pretraining

The base model was pretrained on ~10B tokens of a bilingual mix (Italian 50%, English 43%, code 7%) on NVIDIA B200 hardware. See [Quark-270M Base](https://huggingface.co/ThingAI/Quark-270m-Base) for details.

### SFT (Instruction Tuning)

Fine-tuned on a diverse mix of conversational and instructional data:

| Dataset | Examples | Type |
|---|---|---|
| FreedomIntelligence/alpaca-gpt4-italian | ~52,000 | Italian instructions |
| HuggingFaceH4/no_robots | ~9,500 | English conversations |
| m-a-p/CodeFeedback-Filtered-Instruction | 5,000 | Code instructions |
| yogeshm/text_to_bash (×80) | ~9,900 | Terminal commands |
| Custom chitchat (×100) | ~3,000 | Identity, greetings, basic Q&A |
| **Total** | **~80,000** | |

| | |
|---|---|
| **Hardware** | NVIDIA B200 |
| **Epochs** | 3 |
| **Learning Rate** | 2e-5 (cosine decay) |
| **Batch Size** | 16 × 4 = 64 effective |
| **Sequence Length** | 512 |

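The hyperparameters above imply a rough step count and learning-rate curve. The sketch below is illustrative only: it treats the ~80,000 total as exact, assumes one example per sequence (no packing), and assumes the cosine schedule has no warmup and decays to zero, none of which the card confirms.

```python
import math

EXAMPLES = 80_000          # approximate total from the dataset table
EFFECTIVE_BATCH = 16 * 4   # 64, as listed above
EPOCHS = 3
PEAK_LR = 2e-5

steps_per_epoch = math.ceil(EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def cosine_lr(step: int, min_lr: float = 0.0) -> float:
    """Cosine decay from PEAK_LR to min_lr (assumed: no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (PEAK_LR - min_lr) * (1.0 + math.cos(math.pi * progress))


print(steps_per_epoch, total_steps)
```

With these numbers the run is about 1,250 optimizer steps per epoch and 3,750 in total. The learning rate starts at 2e-5 and reaches half that value at the midpoint of training.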
## Inference Server

Quark-270M-Instruct powers [Things Chat](https://chat.things-ai.org) via a self-hosted FastAPI server with SSE streaming, conversation memory, web search, and content moderation.

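The Things Chat server code is not part of this card, so the following only illustrates what SSE streaming means at the wire level. It is a standard-library sketch: `sse_events` is a hypothetical name, and the JSON payload shape and the closing `[DONE]` marker are assumptions for illustration, not the server's actual protocol.

```python
import json
from typing import Iterable, Iterator


def sse_events(chunks: Iterable[str]) -> Iterator[str]:
    """Wrap generated text chunks as Server-Sent Events frames.

    Each frame is a `data:` line followed by a blank line. The payload
    shape and the final [DONE] marker are assumptions for illustration.
    """
    for chunk in chunks:
        payload = json.dumps({"text": chunk}, ensure_ascii=False)
        yield f"data: {payload}\n\n"
    yield "data: [DONE]\n\n"


for frame in sse_events(["Ciao", "!"]):
    print(frame, end="")
```

In a web framework, a generator like this would typically be returned as a streaming response with the `text/event-stream` content type, so the client can render tokens as they arrive.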
## Limitations

- **252M is small:** Limited factual knowledge, prone to hallucination
- **Mathematics:** Unreliable beyond basic arithmetic
- **Code:** Generates plausible but often non-functional code
- **Context:** 2,048 token window
- **No system prompt:** The model was not trained with `<|system|>` tags

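The context limit matters most in multi-turn chat, where older turns eventually have to be dropped. The sketch below shows one way to do that and is not taken from the Quark code: `trim_history` is a hypothetical helper, the caller supplies the token counter, and `overhead_per_turn` is an assumed allowance for the role and end tags.

```python
from typing import Callable, Dict, List

CONTEXT_LENGTH = 2_048  # from the model details


def trim_history(
    messages: List[Dict[str, str]],
    count_tokens: Callable[[str], int],
    max_new_tokens: int = 150,
    overhead_per_turn: int = 8,
) -> List[Dict[str, str]]:
    """Keep the most recent turns that fit in the context window.

    `overhead_per_turn` is an assumed allowance for the chat tags; measure
    it with the real tokenizer before relying on it.
    """
    budget = CONTEXT_LENGTH - max_new_tokens
    kept: List[Dict[str, str]] = []
    # Walk backwards so the newest turns are kept first.
    for message in reversed(messages):
        cost = count_tokens(message["content"]) + overhead_per_turn
        if cost > budget:
            break
        budget -= cost
        kept.append(message)
    kept.reverse()
    # Drop leading assistant turns so the prompt starts with the user.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

With the tokenizer from the Quick Start, a suitable counter would be `lambda text: len(tokenizer(text)["input_ids"])`. Because the model was not trained with a system tag, the sketch keeps only user and assistant turns and makes no attempt to pin an instruction at the top.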
### Good for

- Self-hosted bilingual chatbot
- Learning about LLM training from scratch
- Terminal command assistance
- Light conversational AI

### Not suited for

- Factual Q&A requiring accuracy
- Complex reasoning or math
- Production-grade code generation
- Safety-critical applications

## The Quark Family

| Model | Parameters | Type |
|---|---|---|
| [Quark-50M](https://huggingface.co/ThingAI/Quark-50m) | 51M | Base |
| [Quark-135M](https://huggingface.co/ThingAI/Quark-135m) | 135M | Base |
| [Quark-270M Base](https://huggingface.co/ThingAI/Quark-270m-Base) | 252M | Base |
| **Quark-270M-Instruct** | **252M** | **Chat** |

## Links

- [ThingsAI](https://things-ai.org)
- [Things Chat](https://chat.things-ai.org)
- [QuarkTokenizer](https://huggingface.co/ThingAI/QuarkTokenizer)
- [Open SLM Leaderboard](https://huggingface.co/spaces/AxiomicLabs/Open_SLM_Leaderboard)

---

*Built from scratch by ThingsAI 🇮🇹*