Instructions for using TopAI-1/Duchifat-2-Instruct with libraries, inference providers, notebooks, and local apps.

Libraries

How to use TopAI-1/Duchifat-2-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TopAI-1/Duchifat-2-Instruct", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("TopAI-1/Duchifat-2-Instruct", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use TopAI-1/Duchifat-2-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TopAI-1/Duchifat-2-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TopAI-1/Duchifat-2-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
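
The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server from `vllm serve` is reachable at `http://localhost:8000`, and the helper names `build_completion_request` and `complete` are hypothetical, chosen here for illustration.

```python
import json
import urllib.request

# Assumption: a server started with `vllm serve` is listening locally on port 8000.
COMPLETIONS_URL = "http://localhost:8000/v1/completions"
MODEL_ID = "TopAI-1/Duchifat-2-Instruct"


def build_completion_request(prompt, max_tokens=512, temperature=0.5, url=COMPLETIONS_URL):
    """Build the same POST request as the curl example (hypothetical helper)."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible completions responses carry the text under choices[0].text
    return body["choices"][0]["text"]


# Usage (requires a running server):
#     print(complete("Once upon a time,"))
```

Passing `url="http://localhost:30000/v1/completions"` should target a server on port 30000 instead, since the request body is the same.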

Use Docker

docker model run hf.co/TopAI-1/Duchifat-2-Instruct

SGLang

How to use TopAI-1/Duchifat-2-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TopAI-1/Duchifat-2-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TopAI-1/Duchifat-2-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TopAI-1/Duchifat-2-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TopAI-1/Duchifat-2-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use TopAI-1/Duchifat-2-Instruct with Docker Model Runner:
```
docker model run hf.co/TopAI-1/Duchifat-2-Instruct
```

Duchifat-2-Instruct / README.md

razielAI

Update README.md

f47f1ad verified 2 months ago


---
license: apache-2.0
language:
- en
- he
base_model:
- Raziel1234/Duchifat-2
pipeline_tag: text-generation
library_name: transformers
tags:
- chemistry
- agent
- medical
- climate
- code
- art
- music
- legal
- finance
- biology
- text-generation-inference
- Pytorch
- causal_lm
---

![image](https://cdn-uploads.huggingface.co/production/uploads/6883b03536f0c50bc30bda75/lQPHWEv0giJikJjItR6-X.png)

# 🚀 Duchifat-V2-Instruct (דוכיפת 2) | Official Model Card

## 📝 Executive Summary
Duchifat-V2-Instruct is a fine-tuned, instruction-following version of the Duchifat-V2 base model (136M parameters). Developed by TopAI, it is optimized for creative content generation, bilingual (Hebrew/English) dialogue, and task-oriented text processing.

The base model was pre-trained on 3.27 billion tokens. The Instruct version adds targeted fine-tuning that turns it from a "text completer" into a creative writer that can follow complex prompts in a distinctive, human-like voice.

---

## 🏗️ Technical Specifications
| Component | Specification | Description |
| :--- | :--- | :--- |
| Parameters | 136 million | Optimized for edge deployment and real-time inference. |
| Architecture | Decoder-only Transformer | Enhanced for causal reasoning and fluency. |
| Layers / Heads | 12 / 12 | Deep representation for nuanced semantics. |
| Context Window | 1024 tokens | Supports creative long-form generation. |
| Tokenizer | DictaLM 2.0 | High-efficiency sub-word tokenization for Hebrew/English. |
| Training Phase | Post-5 Epoch Instruct | Refined for instruction-following & EOS consistency. |
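
In a decoder-only model, the 1024-token context window has to hold the prompt and the generated tokens together, so a multi-turn prompt needs trimming before generation. The sketch below illustrates one way to budget for that. The helper names and the word-based token counter are hypothetical stand-ins; a real setup would count tokens with the model's tokenizer.

```python
from typing import Callable, List

MAX_CONTEXT_TOKENS = 1024  # Context window from the specification table


def prompt_budget(max_new_tokens: int, context_window: int = MAX_CONTEXT_TOKENS) -> int:
    """Tokens left for the prompt once room for generation is reserved."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    return context_window - max_new_tokens


def trim_history(turns: List[str], count_tokens: Callable[[str], int], budget: int) -> List[str]:
    """Drop the oldest turns until the joined prompt fits the budget.

    The most recent turn is always kept, even if it alone exceeds the budget.
    """
    kept = list(turns)
    while len(kept) > 1 and count_tokens("\n".join(kept)) > budget:
        kept.pop(0)
    return kept


def count_words(text: str) -> int:
    """Illustrative stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())
```

For example, reserving 300 new tokens leaves `prompt_budget(300)` = 724 tokens for the prompt.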

---

## Benchmarks
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---:|---|---:|---|---|---:|---|---:|
| arc_easy | 1 | none | 0 | acc | ↑ | 0.28 | ± | 0.0641 |
| | | none | 0 | acc_norm | ↑ | 0.28 | ± | 0.0641 |
| boolq | 2 | none | 0 | acc | ↑ | 0.22 | ± | 0.0592 |
| copa | 1 | none | 0 | acc | ↑ | 0.56 | ± | 0.0709 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.32 | ± | 0.0666 |
| | | none | 0 | acc_norm | ↑ | 0.34 | ± | 0.0677 |
| piqa | 1 | none | 0 | acc | ↑ | 0.62 | ± | 0.0693 |
| | | none | 0 | acc_norm | ↑ | 0.48 | ± | 0.0714 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.52 | ± | 0.0714 |
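
The Stderr values are large relative to the scores, which matters when comparing tasks. As a rough check, the sketch below recomputes the standard error of a binary accuracy from the sample variance. The sample size of 50 examples per task is an assumption, not stated in the card; it is inferred only because it reproduces the reported values.

```python
import math


def sample_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n 0/1 outcomes, using the sample (n - 1) variance."""
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# Assumption: 50 examples per task (inferred from the table, not stated in the card).
N_EXAMPLES = 50

# (reported accuracy, reported stderr) pairs copied from the benchmark table
REPORTED = [
    (0.28, 0.0641),
    (0.22, 0.0592),
    (0.56, 0.0709),
    (0.32, 0.0666),
    (0.34, 0.0677),
    (0.62, 0.0693),
    (0.48, 0.0714),
    (0.52, 0.0714),
]

for accuracy, reported_stderr in REPORTED:
    computed = round(sample_stderr(accuracy, N_EXAMPLES), 4)
    print(f"acc={accuracy:.2f} reported={reported_stderr:.4f} computed={computed:.4f}")
```

If that assumption holds, each score carries an uncertainty of roughly ±0.06 to ±0.07, so differences of a few points between tasks or metrics should be read with caution.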

## 🎨 Model Capabilities & "The Creative Writer"
Unlike typical small-scale models, Duchifat-V2-Instruct exhibits a "Creative Personality." It excels at:
* Narrative Writing: Crafting stories and monologues with emotional depth.
* Instruction Following: Responding to specific system prompts and user constraints.
* Bilingual Versatility: Switching between Hebrew and English based on the language of the prompt.
* Marketing & Copywriting: Generating slogans, blog posts, and creative ads.

> Note: Because it was trained on the C4 corpus, the model retains a broad "general knowledge" base, which makes it better suited to acting as a creative partner than as a purely technical agent.
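
The inference script in this card builds prompts from `Instruction:` lines followed by an open `Content:` slot that the model fills in. As a minimal sketch of that format in isolation, the helper below (the name `format_prompt` is hypothetical) assembles a prompt from earlier turns and a new instruction.

```python
from typing import List, Tuple


def format_prompt(history: List[Tuple[str, str]], instruction: str) -> str:
    """Format past (instruction, response) pairs plus a new instruction.

    The prompt ends with "Content:" so the model continues with its response.
    """
    lines = []
    for past_instruction, past_response in history:
        lines.append(f"Instruction: {past_instruction}")
        lines.append(f"Content: {past_response}")
    lines.append(f"Instruction: {instruction}")
    lines.append("Content:")
    return "\n".join(lines)


# Example: one earlier exchange followed by a new request
example = format_prompt(
    [("Write a slogan for a bakery.", "Fresh every morning.")],
    "Now make it rhyme.",
)
print(example)
```

Keeping the formatting in one function makes it easier to apply the same structure to single-turn requests, where `history` is simply an empty list.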

---

## 📊 Training Infrastructure
* Dataset: Curated C4 (3.27B tokens), 50% Hebrew and 50% English.
* Fine-Tuning: Instruction tuning on high-quality conversational and creative datasets.
* Optimization: AdamW, with a focus on preserving pre-trained knowledge (knowledge retention).

---

## 💻 Implementation & Inference

To use the Instruct capabilities, structure the prompt and generation call as follows:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "TopAI-1/Duchifat-2-Instruct"
MAX_CONTEXT_TOKENS = 1024  # Context window of the model
MAX_NEW_TOKENS = 300       # Generation budget, sized for creative writing
# Reserve room for the response so prompt + generation fits in the window
MAX_PROMPT_TOKENS = MAX_CONTEXT_TOKENS - MAX_NEW_TOKENS


def build_prompt(chat_history):
    """Join past turns and open a new "Content:" slot for the model to fill."""
    return "\n".join(chat_history) + "\nContent:"


def run_duchifat_chat():
    print("--- Loading Duchifat-2 (Post 5-Epoch Instruct Training) ---")

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    model.eval()
    model.config.use_cache = False

    chat_history = []

    print("--- Model Ready! ---")

    while True:
        user_input = input("\nEnter an instruction (or 'exit'): ")
        # "יציאה" is Hebrew for "exit"
        if user_input.strip().lower() in ["exit", "quit", "יציאה"]:
            break

        # Add the current instruction to memory
        chat_history.append(f"Instruction: {user_input}")

        # Build the prompt with history
        inputs = tokenizer(build_prompt(chat_history), return_tensors="pt").to(model.device)

        # Context window protection: drop the oldest Instruction/Content pair
        # until the prompt leaves room for MAX_NEW_TOKENS generated tokens
        while inputs.input_ids.shape[1] > MAX_PROMPT_TOKENS and len(chat_history) > 2:
            chat_history = chat_history[2:]
            inputs = tokenizer(build_prompt(chat_history), return_tensors="pt").to(model.device)

        with torch.no_grad():
            output_tokens = model.generate(
                input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                max_new_tokens=MAX_NEW_TOKENS,
                do_sample=True,
                temperature=0.75,
                top_p=0.9,
                repetition_penalty=1.15,
                pad_token_id=tokenizer.eos_token_id,
                use_cache=False,
            )

        # Decode only the newly generated tokens, so text in the prompt
        # (including any "Content:" typed by the user) cannot leak into the answer
        new_tokens = output_tokens[0][inputs.input_ids.shape[1]:]
        answer = tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

        # Save the response to history for context
        chat_history.append(f"Content: {answer}")

        print(f"\nDuchifat-2: {answer}")


if __name__ == "__main__":
    run_duchifat_chat()