Update README.md

README.md +56 -36

@@ -1,53 +1,73 @@
----
-library_name: transformers
-model_name: sft_conv
-tags:
-- generated_from_trainer
-- trl
-- sft
-licence: license
----
-## Quick start
 ```python
-from transformers import pipeline
-question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="None", device="cuda")
-output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
-print(output["generated_text"])
-```
-## Training procedure
-This model was trained with SFT.
-### Framework versions
-- TRL: 1.2.0
-- Transformers: 5.6.1
-- Pytorch: 2.4.1+cu124
-- Datasets: 4.8.4
-- Tokenizers: 0.22.2
-## Citations
-Cite TRL as:
-```bibtex
-@software{vonwerra2020trl,
-  title   = {{TRL: Transformers Reinforcement Learning}},
-  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
-  license = {Apache-2.0},
-  url     = {https://github.com/huggingface/trl},
-  year    = {2020}
 }
-```

+# Quark (50M)
+Quark is a lightweight decoder-only language model with approximately 50 million parameters. It is designed for efficient inference on consumer hardware while maintaining reasonable language understanding and generation capabilities.
+## Model Description
+- **Architecture:** SmolLM-style decoder (Grouped-Query Attention, SwiGLU, RMSNorm, RoPE, deep-and-thin layout)
+- **Parameters:** ~50M
+- **Context length:** 2048 tokens
+- **Vocabulary size:** 49,152 (HuggingFaceTB/cosmo2-tokenizer)
+- **Training data:** HuggingFaceTB/smollm-corpus (5B tokens total)
+  - 60% cosmopedia-v2
+  - 30% python-edu
+  - 10% fineweb-edu
+- **Hardware:** RTX 3070 (8 GB VRAM)
+- **License:** MIT
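The data mixture and context length listed above imply some simple token budgets. The sketch below is only illustrative arithmetic: it assumes the percentages apply to the 5B-token total and that sequences are packed to the full 2048-token context, neither of which the card states explicitly. The variable names are made up for this example.

```python
# Illustrative arithmetic only; assumes the shares apply to the 5B-token total.
TOTAL_TOKENS = 5_000_000_000
CONTEXT_LENGTH = 2048

# Percent share of each smollm-corpus subset, as listed in the model card.
mixture_percent = {
    "cosmopedia-v2": 60,
    "python-edu": 30,
    "fineweb-edu": 10,
}

# Integer math avoids floating-point rounding in the per-subset budgets.
token_budget = {
    name: TOTAL_TOKENS * pct // 100 for name, pct in mixture_percent.items()
}

# Number of full-length sequences, assuming every sequence is packed to 2048 tokens.
full_sequences = TOTAL_TOKENS // CONTEXT_LENGTH

for name, tokens in token_budget.items():
    print(f"{name}: {tokens:,} tokens")
print(f"full-context sequences: {full_sequences:,}")
```

Under these assumptions the largest subset, cosmopedia-v2, contributes 3B tokens, and the whole corpus corresponds to roughly 2.4 million full-context sequences.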
+## Intended Uses
+- Lightweight on-device chat
+- Educational experiments with small LMs
+- Fine-tuning for specific tasks (instruction following, code generation, etc.)
+## How to Use
 ```python
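+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_name = "OvercastLab/Quark-50m-Instruct"
+# trust_remote_code=True executes Python code shipped in the model repo; review it before running
+model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+# Tokenize the prompt and generate up to 50 new tokens
+inputs = tokenizer("Hello, how are you?", return_tensors="pt")
+outputs = model.generate(**inputs, max_new_tokens=50)
+# Skip special tokens so only readable text is printed
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))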
+```
+## Training Details
+- **Effective batch size:** 64 sequences per step (micro-batch of 4 sequences × 16 gradient accumulation steps)
+- **Learning rate:** 3e-4 (cosine decay to 3e-5)
+- **Optimizer:** AdamW (β1=0.9, β2=0.95, weight decay=0.1)
+- **Precision:** bfloat16
+- **Total tokens:** 5 billion
+- **Training steps:** ~1.2 million
+- **Checkpoint frequency:** every 2,000 steps
+## Limitations
+- Small parameter count limits factual knowledge and reasoning depth.
+- May produce repetitive or nonsensical outputs when prompted outside its training distribution.
+- The base model is not instruction-tuned; use the -Instruct variant for conversational tasks.
+## Citation
+If you use Quark in your work, please cite:
+```bibtex
+@misc{quark2025,
+  author = {OvercastLab},
+  title = {Quark: A 50M Parameter Lightweight Language Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  howpublished = {\url{https://huggingface.co/OvercastLab/Quark-50m-Instruct}}
 }
+```