---
language:
- en
license: apache-2.0
tags:
- moe
- causal-lm
- low-latency
- kitefish
- reasoning
- reasoning-l1
---

# Reasoning-L1-10B

**Reasoning-L1-10B** is a compact, efficient causal language model by KiteFishAI, designed for low-latency inference on consumer hardware with strong reasoning and agentic capabilities.

## Model Details

| Property | Value |
|---|---|
| Model | Reasoning-L1 |
| Parameters | ~11.4B |
| Architecture | Mixture-of-Experts (MoE) |
| VRAM required | ~8 GB |
| Context length | 131K tokens |
| License | Apache 2.0 |

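The ~8 GB VRAM figure is well below the ~22.8 GB that 11.4B fp16 weights alone would need, which suggests the figure assumes quantized weights. A rough back-of-envelope (the 4-bit quantization assumption is ours, not stated in this card; real usage also adds KV-cache and activation overhead):

```python
# Rough weight-memory estimate: params x bits-per-param / 8 bytes.
# The 4-bit figure below is an illustrative assumption, not a
# documented property of this model.
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed for model weights alone, in gigabytes."""
    return n_params * bits_per_param / 8 / 1e9

fp16 = weight_memory_gb(11.4e9, 16)  # too large for an 8 GB card
int4 = weight_memory_gb(11.4e9, 4)   # fits, with headroom for KV cache

print(f"fp16 weights: {fp16:.1f} GB, int4 weights: {int4:.1f} GB")
```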
## Capabilities

- Configurable reasoning effort (low / medium / high)
- Full chain-of-thought support
- Function calling and structured outputs
- Web browsing and Python code execution (agentic)
- Multilingual support

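For function calling, chat-style models typically take a JSON-schema tool definition alongside the message history. The sketch below uses the common OpenAI-style convention; the exact format Reasoning-L1 expects is defined by its chat template, so every field name here (and the system-prompt mechanism for reasoning effort) is an illustrative assumption:

```python
# Hypothetical tool definition in the common JSON-schema convention.
# "get_weather" is an illustrative example, not part of this model.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

messages = [
    # Some reasoning models read the effort level from the system
    # prompt; whether Reasoning-L1 does is an assumption here.
    {"role": "system", "content": "Reasoning effort: high"},
    {"role": "user", "content": "What's the weather in Oslo?"},
]

# With the tokenizer from the Usage section loaded, the prompt would
# be rendered with something like:
# prompt = tokenizer.apply_chat_template(
#     messages, tools=[get_weather_tool], add_generation_prompt=True
# )
print(messages[-1]["content"])
```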
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "KiteFishAI/Reasoning-L1-10B",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "KiteFishAI/Reasoning-L1-10B",
    trust_remote_code=True,
)

inputs = tokenizer("Hello! What can you do?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Ollama

```bash
ollama pull KiteFishAI/Reasoning-L1-10B
ollama run KiteFishAI/Reasoning-L1-10B
```

## License

Apache 2.0: free to use, modify, and distribute commercially.

---
*Built by [KiteFishAI](https://huggingface.co/KiteFishAI)*