---
library_name: transformers
tags:
- chemistry
- biology
- finance
- legal
- music
- code
- art
- climate
- medical
- agent
- text-generation-inference
- duchifat-2
- hebrew
- AI
- conversational
- chatty
license: apache-2.0
language:
- he
- en
base_model:
- Raziel1234/Duchifat-2
pipeline_tag: text-generation
---
# 🕊️ Duchifat-2.3-Instruct: The Paradigm Shift in Hebrew AI

**Duchifat-2.3-Instruct** is a state-of-the-art, instruction-tuned Large Language Model developed by **TopAI**. As the flagship of the Duchifat series, this model represents a fundamental breakthrough in how Hebrew is processed, reasoned about, and generated in the LLM era.

## 💎 The "Language-Native" Architecture

The core innovation of **Duchifat-2.3** lies in its **Language-Native Reasoning** engine. While most models suffer from a "Translation Gap" (reasoning in English and then translating to Hebrew), Duchifat-2.3 was architected to bridge this divide.

### 🧠 Native Cognitive Processing
By optimizing the model's internal weights and tokenizer for Hebrew-specific structures, we have achieved a system that:
- **Internalizes Hebrew Logic:** The model's "Chain of Thought" is executed natively in Hebrew, preserving the unique semantic and syntactic nuances of the language.
- **Eliminates Syntactic Artifacts:** Unlike translated models, Duchifat-2.3 produces text that flows naturally, avoiding the stiff and robotic feel of English-to-Hebrew conversion.
- **Enhanced Token Efficiency:** The specialized architecture allows a denser and more accurate representation of Hebrew text, leading to faster inference and better context retention.
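
One rough way to check the token-efficiency claim is to count tokens per word on sample Hebrew text. The sketch below is illustrative only: `tokens_per_word` and `char_tokenize` are hypothetical helpers, not part of the Duchifat codebase, and the character-level tokenizer is a stand-in so the example runs without downloading a model.

```python
from typing import Callable, List


def tokens_per_word(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens per whitespace-separated word (lower is denser)."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


def char_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (an assumption): one token per non-space character."""
    return [ch for ch in text if not ch.isspace()]


sample = "שלום עולם"  # "Hello world": two words of four letters each
print(tokens_per_word(char_tokenize, sample))  # prints 4.0
```

To measure a real tokenizer, pass its `tokenize` method (for example `tokenizer.tokenize` from a loaded `AutoTokenizer`) in place of `char_tokenize` and compare the ratio across models on the same text.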

---

## 🚀 Advanced Instruction Tuning & Alignment

Duchifat-2.3-Instruct has undergone a sophisticated Supervised Fine-Tuning (SFT) process designed to transform a raw base model into a highly capable, mission-aligned assistant.

### 🛡️ Ethical Generalization & Safety
One of the model's most impressive feats is its ability to generalize safety protocols. It doesn't just rely on a static list of blocked words; it understands the **intent and context** of human interaction.
- **Zero-Shot Moderation:** The model can identify and appropriately handle offensive content, slurs, and harmful prompts it has never encountered during training.
- **Value-Locked Alignment:** The "TopAI" safety standards are deeply embedded, ensuring the model remains helpful, harmless, and honest across all domains.

### 🤖 Multi-Domain Mastery
The model is tuned to excel in diverse environments:
- **Technical & Scientific Research:** Deep understanding of AI architecture, software development, and complex data analysis.
- **Creative & Cultural Context:** Native fluency in Israeli idioms, professional drafting, and nuanced storytelling.
- **Logical Reasoning:** High performance in solving complex puzzles and following multi-stage instructions.

---

## 🎨 The Duchifat Persona: A Digital Partner

We believe that interaction is as important as information. Duchifat-2.3-Instruct carries a unique, refined persona:
- **Quirky & Engaging:** It balances professional rigor with an approachable, brand-aligned voice.
- **Adaptive Tone:** Seamlessly shifts between formal technical documentation and casual, helpful conversation.
- **Identity-Aware:** The model "knows" who it is and remains consistent in its role as a specialized AI assistant.

---

## 🏗️ Technical Specifications

- **Developer:** TopAI
- **Architecture:** Causal Decoder-Only Transformer.
- **Primary Objective:** Hebrew-Native Instruction Following.
- **Secondary Capability:** Full English fluency and cross-lingual reasoning.
- **Optimization:** Tuned for high-precision inference and minimal catastrophic forgetting.

---

## 📊 Benchmark Results

The following evaluation was performed using `lm-evaluation-harness` (0-shot) to assess the model's core reasoning and common-sense capabilities.

| Task | Metric | Value | Significance |
| :--- | :--- | :--- | :--- |
| **PIQA** | Accuracy | **53.65%** | Above random guessing (50%) |
| **WinoGrande** | Accuracy | **52.25%** | Above random guessing (50%) |
| **ARC-Easy** | Accuracy (Norm) | **27.86%** | Baseline performance (chance ≈ 25%) |
| **HellaSwag** | Accuracy | **25.94%** | Baseline performance (chance ≈ 25%) |

**Analysis:**
Duchifat-2.3-Instruct performs best on the binary-choice tasks (**PIQA** and **WinoGrande**), where it scores roughly 2 to 4 points above the 50% chance level. On the four-option benchmarks (ARC-Easy and HellaSwag) it stays within about 3 points of the 25% chance level, which may reflect a trade-off seen in models aggressively fine-tuned for conversational alignment and Hebrew-native reasoning.
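
The comparison against random guessing can be made explicit with a little arithmetic. In the sketch below, the accuracies are copied from the table; the chance levels are assumptions (50% for the two-option tasks, approximately 25% for the four-option ones), and `margin_over_chance` is a hypothetical helper name.

```python
# Reported 0-shot accuracies (percent), copied from the benchmark table.
REPORTED = {"PIQA": 53.65, "WinoGrande": 52.25, "ARC-Easy": 27.86, "HellaSwag": 25.94}

# Assumed chance levels (percent): two options -> 50, four options -> about 25.
CHANCE = {"PIQA": 50.0, "WinoGrande": 50.0, "ARC-Easy": 25.0, "HellaSwag": 25.0}


def margin_over_chance(task: str) -> float:
    """Percentage points by which the reported accuracy exceeds the chance level."""
    return round(REPORTED[task] - CHANCE[task], 2)


for task in REPORTED:
    print(f"{task}: +{margin_over_chance(task):.2f} points over chance")
```

Under these assumptions the margins are 3.65 (PIQA), 2.25 (WinoGrande), 2.86 (ARC-Easy), and 0.94 (HellaSwag) percentage points.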

## Use

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Configuration: load the model from the Hub
MODEL_ID = "razielAI/Duchifat-2.3-Instruct"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and model (bfloat16 on GPU, float32 on CPU)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32
).to(device)

def chat():
    print("✨ Duchifat-2 Online (TopAI) | Type 'exit' to quit")
    while True:
        user_input = input("\n👤 User: ")
        # "יציאה" is Hebrew for "exit"
        if user_input.strip().lower() in ["exit", "quit", "יציאה"]:
            break

        # Build the prompt with the special tokens
        prompt = f"<|instruction|>\n{user_input}\n<|assistant|>\n"
        inputs = tokenizer(prompt, return_tensors="pt").to(device)

        # Generate
        with torch.no_grad():
            output_tokens = model.generate(
                **inputs,
                max_new_tokens=256,
                temperature=0.7,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id,
                eos_token_id=tokenizer.encode("<|eos|>", add_special_tokens=False)[0]
            )

        # Decode and show only the assistant's response
        decoded = tokenizer.decode(output_tokens[0], skip_special_tokens=False)
        response = decoded.split("<|assistant|>")[-1].replace("<|eos|>", "").strip()

        print(f"🤖 Duchifat-2: {response}")

if __name__ == "__main__":
    chat()
```
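
For non-interactive use, the prompt template from the chat loop can be factored into small helpers. This is a minimal sketch: the special-token strings are taken from the example above, while `build_prompt` and `extract_response` are hypothetical names rather than functions shipped with the model.

```python
INSTRUCTION_TAG = "<|instruction|>"
ASSISTANT_TAG = "<|assistant|>"
EOS_TAG = "<|eos|>"


def build_prompt(user_input: str) -> str:
    """Wrap a user message in the instruction/assistant template used above."""
    return f"{INSTRUCTION_TAG}\n{user_input.strip()}\n{ASSISTANT_TAG}\n"


def extract_response(decoded: str) -> str:
    """Return the text after the last assistant tag, cut at the first EOS tag."""
    answer = decoded.split(ASSISTANT_TAG)[-1]
    return answer.split(EOS_TAG)[0].strip()


prompt = build_prompt("What is the capital of Israel?")
# Simulated decoder output, used only to demonstrate the parsing step.
decoded = prompt + "Jerusalem.<|eos|>"
print(extract_response(decoded))  # prints Jerusalem.
```

Unlike the `replace("<|eos|>", "")` call in the chat loop, this version discards anything generated after the first end-of-sequence tag, which is a design choice rather than documented model behavior.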

## 🌍 Impact and Mission

Duchifat-2.3-Instruct is more than a model; it is a statement on the future of specialized AI. By working to show that a dedicated, language-native approach can outperform general-purpose "translation" models, **TopAI** aims to set a new standard for the Israeli and global tech ecosystem.

---
**Developed with technical excellence and linguistic precision by TopAI.**