Update README.md

README.md

@@ -75,31 +75,6 @@ domain‑specific tasks – for instance, a customer‑support bot, a code revie
 - Short context window (2,048 tokens).
 - Small size means it can make more factual mistakes than larger models.
-## How to Get Started
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-model_name = "OvercastLab/Quark-50m-Instruct"
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
-messages = [
-    {"role": "system", "content": "You are Quark, a helpful assistant."},
-    {"role": "user", "content": "Explain group query attention in one sentence."}
-]
-inputs = tokenizer.apply_chat_template(
-    messages,
-    tokenize=True,
-    add_generation_prompt=True,
-    return_tensors="pt"
-).to(model.device)
-outputs = model.generate(inputs, max_new_tokens=128)
-print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Training Details
 ### Pretraining
@@ -136,4 +111,29 @@ All data was tokenised with the official [Cosmo2 tokenizer](https://huggingface.
 ### Instruction Fine‑tuning
 The base model was fine‑tuned on a curated set of instruction‑following data (details to be released).
-The fine‑tuning used **LoRA** with the same sequence length and a lower learning rate (1e‑4) for a few thousand steps.

 - Short context window (2,048 tokens).
 - Small size means it can make more factual mistakes than larger models.
 ## Training Details
 ### Pretraining
 ### Instruction Fine‑tuning
 The base model was fine‑tuned on a curated set of instruction‑following data (details to be released).
+The fine‑tuning used **LoRA** with the same sequence length and a lower learning rate (1e‑4) for a few thousand steps.
+## How to Use
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+model_name = "OvercastLab/Quark-50m-Instruct"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
+messages = [
+    {"role": "system", "content": "You are Quark, a helpful assistant."},
+    {"role": "user", "content": "Explain group query attention in one sentence."}
+]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt"
+).to(model.device)
+outputs = model.generate(inputs, max_new_tokens=128)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
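
Note on the usage snippet: the README lists a 2,048-token context window, and the snippet requests `max_new_tokens=128`. Assuming the prompt and the generated tokens must fit in that window together, the prompt can be at most 1,920 tokens. The sketch below makes that arithmetic explicit. The helper names `prompt_budget` and `truncate_prompt` are hypothetical and not part of the model repository or of `transformers`. Dropping the oldest tokens is only one possible policy, and it can cut off the system message in a long conversation.

```python
CONTEXT_WINDOW = 2048  # tokens, from the README's limitations list
MAX_NEW_TOKENS = 128   # value passed to generate() in the usage snippet


def prompt_budget(context_window: int = CONTEXT_WINDOW,
                  max_new_tokens: int = MAX_NEW_TOKENS) -> int:
    """Return how many prompt tokens fit while leaving room for generation."""
    if max_new_tokens >= context_window:
        raise ValueError("max_new_tokens must be smaller than the context window")
    return context_window - max_new_tokens


def truncate_prompt(token_ids: list[int], budget: int) -> list[int]:
    """Keep the most recent `budget` token ids, dropping the oldest (hypothetical policy)."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return token_ids[-budget:]


if __name__ == "__main__":
    budget = prompt_budget()
    print(f"Prompt budget: {budget} tokens")
    # Stand-in token ids; in practice these would come from the tokenizer.
    fake_prompt = list(range(2000))
    print(f"Kept {len(truncate_prompt(fake_prompt, budget))} of {len(fake_prompt)} tokens")
```

In practice the token ids would come from `tokenizer.apply_chat_template(...)` before the call to `model.generate`.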