---
language:
- en
license: apache-2.0
tags:
- causal-lm
- text-generation
- pretrained
- tpu
- sykollm
base_model: SykoSLM/SykoLLM-V6.8
---

# SykoLLM-V6.9

**The most powerful model in the SykoLLM family, trained on ~8 billion tokens.**

SykoLLM-V6.9 is a 391M-parameter causal language model trained from scratch on a curated mixture of high-quality English datasets. It is the latest and most capable model in the SykoLLM series, trained on more tokens than any previous version.

---

## Model Details

| Property | Value |
|---|---|
| **Architecture** | Causal language model (Phi-3 architecture) |
| **Parameters** | 391,857,152 |
| **Context Length** | 1,024 tokens |
| **Vocabulary Size** | 50,000 |
| **Hidden Size** | 1,024 |
| **Intermediate Size** | 3,072 |
| **Layers** | 24 |
| **Attention Heads** | 8 query heads, 2 KV heads (grouped-query attention) |
| **Precision** | bfloat16 |
| **Language** | English only |

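The parameter count in the table can be reproduced from the other rows as a sanity check. This card does not state the assumptions behind the sketch below: the standard Hugging Face Phi-3 layout (fused QKV and gate/up projections, no bias terms, RMSNorm weights only) and separate (untied) input and output embeddings. Taken together, those assumptions give exactly 391,857,152.

```python
def phi3_param_count(
    vocab_size: int = 50_000,
    hidden_size: int = 1_024,
    intermediate_size: int = 3_072,
    num_layers: int = 24,
    num_heads: int = 8,
    num_kv_heads: int = 2,
    tie_embeddings: bool = False,  # assumption: lm_head is a separate matrix
) -> int:
    """Count weights of a bias-free Phi-3-style decoder (assumed layout)."""
    head_dim = hidden_size // num_heads  # 1024 / 8 = 128

    # Attention: fused Q/K/V projection (8 query + 2 key + 2 value heads)
    # plus the output projection back to the hidden size.
    qkv = hidden_size * (num_heads + 2 * num_kv_heads) * head_dim
    o_proj = hidden_size * hidden_size

    # MLP: fused gate/up projection plus down projection.
    gate_up = hidden_size * 2 * intermediate_size
    down = intermediate_size * hidden_size

    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP).
    norms = 2 * hidden_size

    per_layer = qkv + o_proj + gate_up + down + norms
    embeddings = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size

    return num_layers * per_layer + embeddings + lm_head + final_norm


print(f"{phi3_param_count():,}")  # 391,857,152
```

Tying the embeddings would remove 51,200,000 parameters. The exact match with untied embeddings therefore suggests that the output head is its own matrix.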

---

## Training Details

| Property | Value |
|---|---|
| **Total Tokens** | ~8 billion |
| **Training Steps** | 30,000 |
| **Effective Batch Size** | 256 sequences (16 × 2 × 8 cores) |
| **Learning Rate** | 4e-4 peak, cosine decay |
| **Optimizer** | Adafactor |
| **Hardware** | Google TPU v5e-8 |
| **Precision** | bfloat16 (XLA native) |
| **Weight Decay** | 0.05 |
| **Warmup Steps** | 200 |

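The numbers in this table agree with each other. 30,000 steps × 256 sequences × 1,024 tokens comes to about 7.86B tokens, which matches the ~8B figure if every sequence is a fully packed block (an assumption). The card does not give the exact shape of the schedule. The sketch below assumes linear warmup over the 200 warmup steps, followed by cosine decay to zero. That is how `transformers.get_cosine_schedule_with_warmup` behaves with its default settings.

```python
import math

PEAK_LR = 4e-4
WARMUP_STEPS = 200
TOTAL_STEPS = 30_000
BATCH_SEQUENCES = 256  # 16 × 2 × 8 cores, per the table
SEQ_LEN = 1_024


def lr_at_step(step: int) -> float:
    """Linear warmup, then cosine decay to zero (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Token budget, assuming every sequence is a fully packed 1,024-token block.
tokens_seen = TOTAL_STEPS * BATCH_SEQUENCES * SEQ_LEN
print(f"{tokens_seen:,} tokens")  # 7,864,320,000 tokens

# Inspect the schedule at a few milestones: start, mid-warmup, peak,
# halfway through the decay, and the final step.
for step in (0, 100, 200, 15_100, 30_000):
    print(step, f"{lr_at_step(step):.2e}")
```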

---

## Training Data

SykoLLM-V6.9 was trained on a curated mixture of four high-quality datasets, interleaved with the following sampling probabilities:

| Dataset | Sampling Probability | Description |
|---|---|---|
| [openbmb/Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb) | 25% | High-quality web text, scored and filtered |
| [openbmb/Ultra-FineWeb-L3](https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3) | 40% | Multi-style synthetic English pretraining data |
| [openbmb/UltraData-Math](https://huggingface.co/datasets/openbmb/UltraData-Math) | 20% | High-quality mathematical reasoning data |
| [openbmb/UltraChat](https://huggingface.co/datasets/openbmb/UltraChat) | 15% | Multi-turn conversational data |

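The card does not say which tool performed the interleaving. One library option is `datasets.interleave_datasets(..., probabilities=[0.25, 0.40, 0.20, 0.15])`. The dependency-free sketch below illustrates the same idea. At each step it picks a source at random in proportion to its sampling probability and emits that source's next example. The toy in-memory sources are placeholders for the four real datasets.

```python
import random
from typing import Dict, Iterator, List

# Sampling probabilities from the table above.
MIXTURE: Dict[str, float] = {
    "Ultra-FineWeb": 0.25,
    "Ultra-FineWeb-L3": 0.40,
    "UltraData-Math": 0.20,
    "UltraChat": 0.15,
}


def interleave(
    sources: Dict[str, Iterator[str]],
    probs: Dict[str, float],
    seed: int = 0,
) -> Iterator[str]:
    """Yield examples by sampling one source per step; stop when one runs dry."""
    rng = random.Random(seed)
    names: List[str] = list(probs)
    weights = [probs[n] for n in names]
    while True:
        name = rng.choices(names, weights=weights, k=1)[0]
        try:
            yield next(sources[name])
        except StopIteration:
            return  # similar to a "first exhausted" stopping strategy


def endless(tag: str) -> Iterator[str]:
    """Toy stand-in source: an infinite stream of tagged examples."""
    i = 0
    while True:
        yield f"{tag}-{i}"
        i += 1


sources = {name: endless(name) for name in MIXTURE}
stream = interleave(sources, MIXTURE, seed=42)
sample = [next(stream) for _ in range(10_000)]
share = sum(s.startswith("Ultra-FineWeb-L3-") for s in sample) / len(sample)
print(f"Ultra-FineWeb-L3 share: {share:.3f}")  # close to 0.40
```

Sampling per example keeps the mixture ratio close to the target throughout training, not just on average over the whole run.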

All datasets were filtered with a quality-score threshold of ≥ 0.85. Additional heuristic filters then removed low-quality, noisy, or excessively long samples.

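The card gives the 0.85 threshold but does not name the score field or the heuristic rules. Here is a minimal sketch of that kind of filter. The `score` and `text` field names, the length cap, and the noise heuristic are all hypothetical placeholders, not the actual pipeline.

```python
from typing import Dict, Iterable, Iterator

QUALITY_THRESHOLD = 0.85  # from the card
MAX_CHARS = 20_000        # hypothetical cap for "excessively long" samples


def looks_noisy(text: str) -> bool:
    """Hypothetical heuristic: flag text dominated by symbols and punctuation."""
    if not text:
        return True
    clean = sum(ch.isalnum() or ch.isspace() for ch in text)
    return clean / len(text) < 0.8


def keep(example: Dict) -> bool:
    """Apply the score threshold plus the (hypothetical) heuristic filters."""
    text = example.get("text", "")  # hypothetical field name
    return (
        example.get("score", 0.0) >= QUALITY_THRESHOLD  # hypothetical field name
        and len(text) <= MAX_CHARS
        and not looks_noisy(text)
    )


def quality_filter(examples: Iterable[Dict]) -> Iterator[Dict]:
    """Lazily yield only the examples that pass every filter."""
    return (ex for ex in examples if keep(ex))
```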

---

## Chat Format

SykoLLM-V6.9 uses the following chat template:

```
<|user|>
Your message here<|end|>
<|assistant|>
Model response here<|end|>
```

For multi-turn conversations:

```
<|user|>
Hello, how are you?<|end|>
<|assistant|>
I'm doing great, thank you for asking!<|end|>
<|user|>
Can you help me with a math problem?<|end|>
<|assistant|>
Of course! What's the problem?<|end|>
```

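Instead of writing the tags by hand, you can assemble prompts from a list of messages. The sketch below follows the template shown above. Each turn is a role tag on its own line, then the message text, then `<|end|>`. A trailing `<|assistant|>` line cues the model to reply. The helper name and the message-dict shape are illustrative assumptions. Before relying on this helper, check whether the tokenizer ships its own template (`tokenizer.chat_template`).

```python
from typing import Dict, List

ROLES = {"user", "assistant"}


def build_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render messages in the SykoLLM-V6.9 chat format (hypothetical helper)."""
    parts = []
    for msg in messages:
        role = msg["role"]
        if role not in ROLES:
            raise ValueError(f"unsupported role: {role!r}")
        # Role tag on its own line, then the content closed by <|end|>.
        parts.append(f"<|{role}|>\n{msg['content']}<|end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


prompt = build_prompt([
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great, thank you for asking!"},
    {"role": "user", "content": "Can you help me with a math problem?"},
])
print(prompt)
```

For a single user message, this produces the same string as the hand-written `prompt` in the Usage section.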

---

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "SykoSLM/SykoLLM-V6.9"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
    trust_remote_code=True,
)

# Single-turn prompt in the SykoLLM chat format
prompt = "<|user|>\nWhat is the capital of France?<|end|>\n<|assistant|>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=256,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Keep special tokens so the <|end|> turn boundary stays visible
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```

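Because the example decodes the full output with `skip_special_tokens=False`, the printed text includes the prompt and the reply. It may also include further text if generation does not stop at `<|end|>`. A small post-processing step can isolate the first assistant reply. This sketch assumes the reply ends at the literal string `<|end|>` in the decoded text. It is a convenience helper, not part of the model's API.

```python
END_TAG = "<|end|>"


def extract_reply(completion: str, end_tag: str = END_TAG) -> str:
    """Return the text before the first end-of-turn marker, trimmed."""
    reply, _, _ = completion.partition(end_tag)
    return reply.strip()


# With the model loaded as above, decode only the newly generated tokens:
#   new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
#   completion = tokenizer.decode(new_tokens, skip_special_tokens=False)
completion = "The capital of France is Paris.<|end|>\n<|user|>\nThanks!"
print(extract_reply(completion))  # The capital of France is Paris.
```

If `<|end|>` is registered as a single token (this card does not confirm it), you can also pass `eos_token_id=tokenizer.convert_tokens_to_ids("<|end|>")` to `model.generate` so that generation stops at the end of the assistant turn.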

---

## SykoLLM Family

| Model | Tokens | Notes |
|---|---|---|
| SykoLLM-V6.9 | **~8B** | **Most powerful (current)** |
| SykoLLM-V6.8 | <8B | Previous version |
| SykoLLM-V6.6 | <8B | Earlier version |

---

## Limitations

- **English only**: the model was trained exclusively on English data and does not support other languages.
- **Context length** is limited to 1,024 tokens, shared between the prompt and the generated output.
- As a base pretrained model, it may produce outputs that are inaccurate, biased, or inappropriate. Use with appropriate safety measures.
- Not instruction-tuned; for best results, use the chat format described above.

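The prompt and the generated tokens must fit together in 1,024 positions, so long conversations need trimming before generation. Below is a minimal sketch. It assumes each formatted turn has already been tokenized into its own list of token IDs, and that dropping the oldest whole turns is acceptable. The function name is hypothetical, and a fuller version might drop user/assistant pairs together.

```python
from typing import List, Sequence

CONTEXT_LENGTH = 1_024  # from the card


def fit_turns(
    turn_token_ids: Sequence[List[int]],
    max_new_tokens: int,
    context_length: int = CONTEXT_LENGTH,
) -> List[List[int]]:
    """Drop the oldest whole turns until prompt + generation fits the window.

    `turn_token_ids` holds one token-ID list per formatted turn, oldest first;
    the final entry (the current user turn plus the assistant cue) is always kept.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    kept = list(turn_token_ids)
    while len(kept) > 1 and sum(len(t) for t in kept) > budget:
        kept.pop(0)  # discard the oldest turn first
    if sum(len(t) for t in kept) > budget:
        raise ValueError("the latest turn alone exceeds the context budget")
    return kept


turns = [[1] * 500, [2] * 400, [3] * 300]  # toy token-ID lists
kept = fit_turns(turns, max_new_tokens=256)
print([len(t) for t in kept])  # [400, 300]
```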

---

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

---

*Trained with ❤️ by [SykoSLM](https://huggingface.co/SykoSLM)*