Raziel1234 committed on Mar 8

Commit 16e0c9e (verified) · 1 Parent(s): d2fc4ce

Update README.md

Files changed (1)

README.md +136 -3
@@ -1,3 +1,136 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+language:
+- en
+- he
+base_model:
+- Raziel1234/Duchifat-2
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+- chemistry
+- agent
+- medical
+- climate
+- code
+- art
+- music
+- legal
+- finance
+- biology
+- text-generation-inference
+- Pytorch
+- causal_lm
+---
+# 🚀 Duchifat-V2-Instruct (דוכיפת 2) | Official Model Card
+## 📝 Executive Summary
+**Duchifat-V2-Instruct** is an instruction-tuned version of Duchifat-V2, a 136M-parameter decoder-only Transformer. Developed by **TopAI**, it is optimized for creative content generation, bilingual (Hebrew/English) dialogue, and task-oriented text processing.
+The base model was pre-trained on 3.27 billion tokens. The **Instruct** version adds targeted fine-tuning that turns it from a plain text completer into a **creative writer** that follows prompts in a distinctive, human-like voice.
+
+---
+## 🏗️ Technical Specifications
+| Component | Specification | Description |
+| :--- | :--- | :--- |
+| **Parameters** | 136 Million | Small enough for edge deployment and real-time inference. |
+| **Architecture** | Decoder-only Transformer | Autoregressive (causal) language model. |
+| **Layers / Heads** | 12 / 12 | 12 Transformer layers with 12 attention heads each. |
+| **Context Window** | 1024 Tokens | Upper bound on prompt plus generated tokens. |
+| **Tokenizer** | DictaLM 2.0 | Sub-word tokenizer covering Hebrew and English. |
+| **Training Phase** | Post-5 Epoch Instruct | Tuned for instruction following and consistent end-of-sequence (EOS) termination. |
+---
+## 🎨 Model Capabilities & "The Creative Writer"
+Unlike standard small-scale models, **Duchifat-V2-Instruct** exhibits "Creative Personality." It excels at:
+* **Narrative Writing:** Crafting stories and monologues with emotional depth.
+* **Instruction Following:** Responding to specific system prompts and user constraints.
+* **Bilingual Versatility:** Seamlessly switching between Hebrew and English based on the prompt's linguistic context.
+* **Marketing & Copywriting:** Generating slogans, blog posts, and creative ads.
+> **Note:** Because it was pre-trained on the C4 corpus, the model retains broad general knowledge, which suits it to creative collaboration more than to purely technical tasks.
+---
+## 📊 Training Infrastructure
+* **Dataset:** Curated **C4** (3.27B Tokens) - 50% Hebrew, 50% English.
+* **Fine-Tuning:** Instruction-tuning on high-quality conversational and creative datasets.
+* **Optimization:** AdamW with a focus on preserving the pre-trained knowledge (Knowledge Retention).
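+
+As a rough sanity check on the dataset figures above, the per-language token count and the tokens-per-parameter ratio can be worked out directly. This is a sketch under two assumptions that the card does not state explicitly: the 50/50 split is measured in tokens, and the 136M parameter count from the specifications table applies to the pre-training run.
+
+$$
+N_{\text{he}} = N_{\text{en}} = 0.5 \times 3.27 \times 10^{9} = 1.635 \times 10^{9}\ \text{tokens}
+$$
+
+$$
+\frac{3.27 \times 10^{9}\ \text{tokens}}{1.36 \times 10^{8}\ \text{parameters}} \approx 24\ \text{tokens per parameter}
+$$
+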
+---
+## 💻 Implementation & Inference
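+The Instruct model expects each turn as `Instruction: <user text>` followed by `Content:`, after which it generates the reply. The script below runs an interactive chat loop using this format: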
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+MODEL_ID = "TopAI-1/Duchifat-2-Instruct"
+MAX_CONTEXT = 1024      # context window from the specifications table
+MAX_NEW_TOKENS = 300    # generous budget for creative writing
+MAX_PROMPT_TOKENS = MAX_CONTEXT - MAX_NEW_TOKENS  # prompt + reply must fit in the window
+
+
+def build_inputs(tokenizer, chat_history, device):
+    """Join the history into the Instruction/Content prompt format and tokenize it."""
+    full_prompt = "\n".join(chat_history) + "\nContent:"
+    return tokenizer(full_prompt, return_tensors="pt").to(device)
+
+
+def run_duchifat_chat():
+    print("--- Loading Duchifat-2 (Post 5-Epoch Instruct Training) ---")
+    # trust_remote_code=True executes the model code shipped in the repository
+    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
+    model = AutoModelForCausalLM.from_pretrained(
+        MODEL_ID,
+        trust_remote_code=True,
+        torch_dtype=torch.bfloat16,
+        device_map="auto",
+    )
+    model.eval()
+    model.config.use_cache = False  # KV cache disabled, as in the original script
+    chat_history = []
+    print("--- Model Ready! ---")
+    while True:
+        # Hebrew prompt: "Enter an instruction (or 'exit')"
+        user_input = input("\nהכנס הוראה (או 'יציאה'): ")
+        if user_input.lower() in ["exit", "quit", "יציאה"]:
+            break
+        # Add the current instruction to the history
+        chat_history.append(f"Instruction: {user_input}")
+        inputs = build_inputs(tokenizer, chat_history, model.device)
+        # Context window protection: drop the oldest Instruction/Content pair
+        # until the prompt leaves room for MAX_NEW_TOKENS
+        while inputs.input_ids.shape[1] > MAX_PROMPT_TOKENS and len(chat_history) > 2:
+            chat_history = chat_history[2:]
+            inputs = build_inputs(tokenizer, chat_history, model.device)
+        with torch.no_grad():
+            output_tokens = model.generate(
+                input_ids=inputs.input_ids,
+                attention_mask=inputs.attention_mask,
+                max_new_tokens=MAX_NEW_TOKENS,
+                do_sample=True,
+                temperature=0.75,
+                top_p=0.9,
+                repetition_penalty=1.15,
+                pad_token_id=tokenizer.eos_token_id,
+                use_cache=False,
+            )
+        # Decode only the newly generated tokens so earlier turns are not re-parsed
+        new_tokens = output_tokens[0][inputs.input_ids.shape[1]:]
+        answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+        # Save the response to the history for context
+        chat_history.append(f"Content: {answer}")
+        # Hebrew label: "Duchifat-2"
+        print(f"\nדוכיפת-2: {answer}")
+
+
+if __name__ == "__main__":
+    run_duchifat_chat()