Update README.md
# Ophiuchi-Qwen3-14B-Instruct

> Ophiuchi-Qwen3-14B-Instruct is built on the Qwen3-14B architecture (`Qwen3ForCausalLM` backbone) and instruction-tuned to strengthen mathematical reasoning, code generation, and factual accuracy. Leveraging high-quality datasets and a long-context architecture, it is designed to solve complex reasoning tasks and generate accurate, structured content across multiple domains.

## Key Features

1. Mathematical and Logical Reasoning
   Fine-tuned to perform step-by-step reasoning, symbolic logic, and advanced mathematics, supporting educational and technical use cases.

2. Code Generation and Understanding
   Optimized for writing, interpreting, and debugging code across various programming languages, including Python, JavaScript, and C++.

3. Factual Integrity and Precision
   Trained on curated and aligned datasets to enhance accuracy and reduce hallucination in fact-based tasks.

4. Long-Context Support
   Capable of handling up to 128K tokens of input with output generation up to 8K tokens, enabling detailed and comprehensive responses over extended sequences.

5. Instruction-Tuned Alignment
   Demonstrates a strong ability to follow multi-step instructions, maintain conversation context, and produce structured outputs across sessions.

6. Multilingual Proficiency
   Supports over 29 languages, including English, Chinese, French, Spanish, Arabic, Russian, Japanese, and Korean, enabling global communication and translation tasks.
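
A 128K window of this kind is typically reached with YaRN-style RoPE scaling rather than native pretraining length. A minimal illustrative sketch of such a configuration, in the shape Hugging Face Transformers accepts in a config's `rope_scaling` field — the `factor` and base-window values here are assumptions, not this model's published settings:

```python
# Hypothetical YaRN-style RoPE scaling entry (illustrative values only).
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,                              # assumed scaling factor
    "original_max_position_embeddings": 32768,  # assumed native window
}

def effective_context(cfg: dict) -> int:
    """Effective context window implied by the scaling factor."""
    return int(cfg["original_max_position_embeddings"] * cfg["factor"])

print(effective_context(rope_scaling))  # → 131072, i.e. 128K tokens
```

The scaling factor multiplies the native positional range, so a 32K-native model with `factor=4.0` advertises a 128K effective window.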

## Quickstart with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Ophiuchi-Qwen3-14B-Instruct"  # replace with the model's full Hub repo id

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the principles of alignment in large language models."

messages = [
    {"role": "system", "content": "You are a highly capable assistant focused on reasoning, coding, and factual precision."},
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
```
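
Output quality is sensitive to decoding settings. The values below are illustrative defaults for a sketch like the one above, not tuned recommendations published for this model; they can be forwarded to `model.generate` as `model.generate(**model_inputs, **gen_kwargs)`:

```python
# Illustrative decoding configuration (assumed values, not official defaults).
gen_kwargs = {
    "max_new_tokens": 512,       # generation budget
    "do_sample": True,           # enable stochastic sampling
    "temperature": 0.7,          # lower = more deterministic
    "top_p": 0.9,                # nucleus sampling cutoff
    "repetition_penalty": 1.05,  # mild discouragement of loops
}
```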

## Intended Use

* Mathematical and symbolic problem solving
* Code generation and explanation
* Structured response generation in JSON, Markdown, or table formats
* Long-form technical writing and documentation
* Factual question answering and fact-checking
* Educational assistance across STEM domains
* Multilingual conversation and translation tasks
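
For the structured-output use case, it is worth verifying that a response actually parses before consuming it downstream. A small hypothetical helper (`extract_json` is not part of this model card's code):

```python
import json

def extract_json(response: str) -> dict:
    """Pull the first JSON object out of a model response,
    tolerating surrounding prose the model may add."""
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response")
    return json.loads(response[start:end + 1])

# Example with a hypothetical model response:
reply = 'Sure! Here is the config: {"name": "demo", "max_tokens": 8192}'
print(extract_json(reply))  # → {'name': 'demo', 'max_tokens': 8192}
```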

## Limitations

* High computational requirements (A100/H100-class GPUs recommended)
* May still produce hallucinated facts on edge cases or adversarial inputs
* Sensitive to poorly structured or ambiguous prompts
* Early-stage errors may propagate in long outputs
* Less suitable for creative fiction or subjective narrative tasks
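
As a rough check on the hardware note above: the weights of a 14B-parameter model alone, before KV cache and activations, occupy on the order of 26 GiB in bf16. A back-of-envelope estimate, not a measured figure:

```python
def weight_memory_gib(params_billions: float, bytes_per_param: int) -> float:
    """Approximate memory for model weights alone (no KV cache or activations)."""
    return params_billions * 1e9 * bytes_per_param / 1024**3

# 14B parameters at bf16 (2 bytes each):
print(round(weight_memory_gib(14, 2), 1))  # → 26.1 GiB
```

In practice, serving also needs headroom for activations and KV cache (which grows with context length), which is why A100/H100-class accelerators are recommended.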

## References

1. Saxton, D., Grefenstette, E., Hill, F., & Kohli, P. (2019). Analysing Mathematical Reasoning Abilities of Neural Models. arXiv:1904.01557. [https://arxiv.org/pdf/1904.01557](https://arxiv.org/pdf/1904.01557)

2. Peng, B., Quesnelle, J., Fan, H., & Shippole, E. (2023). YaRN: Efficient Context Window Extension of Large Language Models. arXiv:2309.00071. [https://arxiv.org/pdf/2309.00071](https://arxiv.org/pdf/2309.00071)