Upload INFERENCE.md with huggingface_hub

# Inference Guide: Intellix-v1
This guide explains how to run the **Intellix** model locally with Ollama and in Python with Hugging Face Transformers. It also lists the recommended generation settings.
## 1. Local Inference with Ollama (Recommended)
Ollama is the simplest way to run Intellix locally. It serves the 8-bit quantized GGUF build (`intellix-Q8_0.gguf`).
### Step 1: Create the Model
Make sure `intellix-Q8_0.gguf` and the accompanying `Modelfile` are in your current directory, then run:
```bash
ollama create intellix -f Modelfile
```
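### Step 2: Run the Model
This opens an interactive session. Type your message at the `>>>` prompt:
```bash
ollama run intellix
# Example input at the interactive prompt:
# >>> Hi, who are you?
```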
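### Step 3: API Integration (Next.js / Node.js)
You can call the Ollama HTTP API directly from your application:
```typescript
// Requires a runtime with global fetch (Node.js 18+ or Next.js server code)
// and an ES module context for top-level await.
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "intellix",
    messages: [{ role: "user", content: "Draft a professional update email." }],
    stream: false, // return a single JSON object instead of streamed chunks
  }),
});
if (!response.ok) {
  throw new Error(`Ollama request failed with status ${response.status}`);
}
const data = await response.json();
console.log(data.message.content);
```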
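A Python client can make the same request using only the standard library. The sketch below is a minimal example that assumes Ollama is running on its default port. It also passes the settings from "Deployment Best Practices" through the request's `options` field, which overrides the matching Modelfile parameters for that request only. The helper names `build_chat_payload` and `chat` are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint (same URL as the TypeScript example).
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"

# Stop sequences recommended in "Deployment Best Practices".
STOP_TOKENS = ["<|im_start|>", "<|im_end|>", "User:", "Assistant:"]


def build_chat_payload(prompt: str, model: str = "intellix",
                       repeat_penalty: float = 1.2) -> dict:
    """Build a non-streaming /api/chat request body (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        # Per-request overrides of the Modelfile's sampling parameters.
        "options": {"repeat_penalty": repeat_penalty, "stop": STOP_TOKENS},
    }


def chat(prompt: str, url: str = OLLAMA_CHAT_URL) -> str:
    """POST the request to a running Ollama server and return the reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["message"]["content"]
```

With the server running, `print(chat("Draft a professional update email."))` prints the assistant's reply. Because the payload builder is separate from the network call, you can inspect or test the request without a live server.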
---
## 2. Python Inference with Transformers
Use this route to run the model from Python, for example for research or batch processing.
### Installation
```bash
pip install torch transformers accelerate bitsandbytes
```
`bitsandbytes` is only needed if you load the model in 8-bit or 4-bit precision.
### Implementation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mediusware-ai/intellix"  # Hugging Face Hub repository ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # let accelerate place weights on available GPU(s)/CPU
    torch_dtype=torch.float16,
)

# ChatML-formatted prompt: system turn, user turn, then an open assistant turn.
prompt = "<|im_start|>system\nYou are Intellix, a professional AI assistant developed by Mediusware.<|im_end|>\n<|im_start|>user\nWhat is the capital of Bangladesh?<|im_end|>\n<|im_start|>assistant\n"
# Move inputs to the model's device instead of hard-coding "cuda".
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, repetition_penalty=1.2)
# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```
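The prompt string above follows the ChatML layout. Each turn is wrapped as `<|im_start|>{role}\n{content}<|im_end|>\n`, and a final open `<|im_start|>assistant\n` turn cues the model to reply. Instead of concatenating the string by hand, you can build it from a message list with a small helper. `build_chatml_prompt` below is hypothetical and not part of any library:

```python
def build_chatml_prompt(messages: list[dict], add_generation_prompt: bool = True) -> str:
    """Render [{"role": ..., "content": ...}] messages as a ChatML prompt string."""
    # One ChatML block per turn: <|im_start|>role\ncontent<|im_end|>\n
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave an assistant turn open so the model generates the reply.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Intellix, a professional AI assistant developed by Mediusware."},
    {"role": "user", "content": "What is the capital of Bangladesh?"},
]
prompt = build_chatml_prompt(messages)
```

The repository's tokenizer config may ship a ChatML chat template. This is an assumption, so check `tokenizer.chat_template` first. If it does, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` should produce an equivalent string.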
---
## 3. Deployment Best Practices
- **Repetition Penalty**: Set `repeat_penalty` (Ollama) between `1.2` and `1.5` to reduce repetitive loops in conversation. In Transformers, the equivalent `generate()` argument is `repetition_penalty`.
- **Stop Tokens**: Configure `<|im_start|>`, `<|im_end|>`, `User:`, and `Assistant:` as stop sequences so that generation ends cleanly at the end of the assistant's turn.
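Ollama enforces `stop` sequences itself. Transformers `generate()`, by contrast, stops at the EOS token but not at arbitrary strings by default, and the same applies to a decoding loop of your own. In those cases a simple fallback is to cut the decoded text at the earliest stop sequence. Here is a minimal sketch using a hypothetical helper, `trim_at_stop`:

```python
# Stop sequences from the list above.
STOP_TOKENS = ("<|im_start|>", "<|im_end|>", "User:", "Assistant:")


def trim_at_stop(text: str, stops=STOP_TOKENS) -> str:
    """Cut generated text at the earliest stop sequence, if any is present."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)  # keep the earliest match across all stops
    return text[:cut].rstrip()
```

This runs on the decoded string after generation, so it fixes the output but does not save the compute spent generating past the stop.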
---
## 4. Fluent Conversation Examples
Based on the latest fine-tuning dataset, Intellix is designed to handle queries like:
- **Identity**: "Who are you?" -> "I am Intellix, a highly capable AI assistant developed by Mediusware."
- **Corporate Knowledge**: "Tell me about Mediusware." -> Accurate details about offices in Dhaka and South Carolina.
- **Technical Reasoning**: "What is Python?" -> A professional explanation of its use in AI and automation.
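To check a deployment against the behaviors listed above, a quick smoke test can send each example query and look for an expected keyword in the reply. This is a crude sketch. The prompt-to-keyword table and `run_smoke_tests` are hypothetical, and a keyword match only catches obvious failures such as a generic identity answer.

```python
from typing import Callable

# Hypothetical checks: each example query is paired with a keyword that a
# healthy reply should contain.
SMOKE_TESTS = {
    "Who are you?": "Intellix",
    "Tell me about Mediusware.": "Mediusware",
    "What is Python?": "Python",
}


def run_smoke_tests(send: Callable[[str], str]) -> dict:
    """Send each prompt through `send` and report whether its keyword appears."""
    results = {}
    for prompt, keyword in SMOKE_TESTS.items():
        reply = send(prompt)
        results[prompt] = keyword.lower() in reply.lower()  # case-insensitive match
    return results
```

`send` can be any function that takes a prompt and returns the model's reply, for example a thin wrapper around the Ollama `/api/chat` request from Section 1.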
---
## 5. Troubleshooting
- **Loops**: If the model starts repeating itself, raise the penalty to `1.5`, the top of the recommended range. Use `repeat_penalty` in Ollama or `repetition_penalty` in Transformers.
- **Hallucinations**: The model is fine-tuned to answer as Intellix. If it gives a generic response such as "I don't have a name", check that your prompts use the ChatML template (`<|im_start|>` / `<|im_end|>` turns) shown in Section 2.
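In batch runs, it can help to flag looping outputs automatically before you adjust the penalty. The heuristic below is a rough sketch. The function name and the default of three consecutive repeats are assumptions, not tuned values. It splits a reply into sentences or lines and reports whether the same one appears several times in a row.

```python
import re


def has_repetition_loop(text: str, min_repeats: int = 3) -> bool:
    """Heuristic: True if one sentence/line repeats min_repeats times consecutively."""
    # Split on sentence-ending punctuation or newlines; normalize and drop empties.
    segments = [s.strip().lower() for s in re.split(r"[.!?\n]+", text) if s.strip()]
    run = 1
    for prev, cur in zip(segments, segments[1:]):
        run = run + 1 if cur == prev else 1  # length of the current identical run
        if run >= min_repeats:
            return True
    return False
```

Use `min_repeats` of 2 or more. Because the helper compares only exact consecutive repeats, it will miss loops that paraphrase themselves.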