Update README.md

4e29e2d verified about 1 month ago

4.15 kB

	---
	language:
	- en
	license: apache-2.0
	tags:
	- causal-lm
	- text-generation
	- pretrained
	- tpu
	- sykollm
	base_model: SykoSLM/SykoLLM-V6.8
	---

	# SykoLLM-V6.9

	The most powerful model in the SykoLLM family — trained on 8 billion tokens.

	SykoLLM-V6.9 is a 391M parameter causal language model, trained from scratch on a carefully curated mixture of high-quality English datasets. It is the latest and most capable model in the SykoLLM series, surpassing all previous versions in both token count and training quality.

	---

	## Model Details

	\| Property \| Value \|
	\|---\|---\|
	\| Architecture \| Causal Language Model (Phi-3 based) \|
	\| Parameters \| 391,857,152 \|
	\| Context Length \| 1,024 tokens \|
	\| Vocabulary Size \| 50,000 \|
	\| Hidden Size \| 1,024 \|
	\| Intermediate Size \| 3,072 \|
	\| Layers \| 24 \|
	\| Attention Heads \| 8 (GQA: 2 KV heads) \|
	\| Precision \| bfloat16 \|
	\| Language \| English only \|

	---

	## Training Details

	\| Property \| Value \|
	\|---\|---\|
	\| Total Tokens \| ~8 Billion \|
	\| Training Steps \| 30,000 \|
	\| Effective Batch Size \| 256 (16 × 2 × 8 cores) \|
	\| Learning Rate \| 4e-4 (cosine decay) \|
	\| Optimizer \| Adafactor \|
	\| Hardware \| Google TPU v5e-8 \|
	\| Precision \| bfloat16 (XLA native) \|
	\| Weight Decay \| 0.05 \|
	\| Warmup Steps \| 200 \|

	---

	## Training Data

	SykoLLM-V6.9 was trained on a curated mixture of 4 high-quality datasets, interleaved with carefully tuned sampling probabilities:

	\| Dataset \| Sampling \| Description \|
	\|---\|---\|---\|
	\| [openbmb/Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb) \| 25% \| High-quality web text, scored and filtered \|
	\| [openbmb/Ultra-FineWeb-L3](https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3) \| 40% \| Multi-style synthetic English pretraining data \|
	\| [openbmb/UltraData-Math](https://huggingface.co/datasets/openbmb/UltraData-Math) \| 20% \| High-quality mathematical reasoning data \|
	\| [openbmb/UltraChat](https://huggingface.co/datasets/openbmb/UltraChat) \| 15% \| Multi-turn conversational data \|

	All datasets were filtered with a quality score threshold of ≥ 0.85 and additional heuristic filters to remove low-quality, noisy, or excessively long samples.

	---

	## Chat Format

	SykoLLM-V6.9 uses the following chat template:

	```
	<\|user\|>
	Your message here<\|end\|>
	<\|assistant\|>
	Model response here<\|end\|>
	```

	For multi-turn conversations:

	```
	<\|user\|>
	Hello, how are you?<\|end\|>
	<\|assistant\|>
	I'm doing great, thank you for asking!<\|end\|>
	<\|user\|>
	Can you help me with a math problem?<\|end\|>
	<\|assistant\|>
	Of course! What's the problem?<\|end\|>
	```

	---

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "SykoSLM/SykoLLM-V6.9"

	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	torch_dtype=torch.bfloat16,
	device_map="auto",
	trust_remote_code=True,
	)

	prompt = "<\|user\|>\nWhat is the capital of France?<\|end\|>\n<\|assistant\|>\n"
	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

	with torch.no_grad():
	outputs = model.generate(
	**inputs,
	max_new_tokens=256,
	temperature=0.7,
	top_p=0.9,
	do_sample=True,
	)

	print(tokenizer.decode(outputs[0], skip_special_tokens=False))
	```

	---

	## SykoLLM Family

	\| Model \| Tokens \| Notes \|
	\|---\|---\|---\|
	\| SykoLLM-V6.9 \| ~8B \| Most powerful — current \|
	\| SykoLLM-V6.8 \| <8B \| Previous version \|
	\| SykoLLM-V6.6 \| <8B \| Earlier version \|

	---

	## Limitations

	- English only — the model was trained exclusively on English data and does not support other languages.
	- Context length is limited to 1,024 tokens.
	- As a base pretrained model, it may produce outputs that are inaccurate, biased, or inappropriate. Use with appropriate safety measures.
	- Not instruction-tuned — for best results, use the chat format described above.

	---

	## License

	This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

	---

	Trained with ❤️ by [SykoSLM](https://huggingface.co/SykoSLM)