mediusware-ai committed
Commit 257c308 · verified · 1 Parent(s): ba7774d

Upload INFERENCE.md with huggingface_hub

Files changed (1): INFERENCE.md (added, +84 lines)
# Inference Guide: Intellix-v1

This guide provides instructions on how to run the **Intellix** model across different environments.

## 1. Local Inference with Ollama (Recommended)

Ollama is the easiest way to run Intellix locally with high performance.

### Step 1: Create the Model
Ensure you have the `intellix-Q8_0.gguf` and `Modelfile` in your current directory, then run:
```bash
ollama create intellix -f Modelfile
```
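
For reference, a minimal `Modelfile` for this setup might look like the sketch below. The parameter values are illustrative (they follow the best-practice settings later in this guide), and the template assumes the ChatML format used elsewhere in this document:

```
FROM ./intellix-Q8_0.gguf

# Illustrative settings; see Deployment Best Practices below.
PARAMETER repeat_penalty 1.3
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

# ChatML prompt template matching the model's fine-tuning format.
TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```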

### Step 2: Run the Model
```bash
ollama run intellix
>>> Hi, who are you?
```

### Step 3: API Integration (Next.js / Node.js)
You can call the Ollama API directly from your application:
```typescript
const response = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "intellix",
    messages: [{ role: "user", content: "Draft a professional update email." }],
    stream: false,
  }),
});
const data = await response.json();
console.log(data.message.content);
```
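
If your stack is Python rather than Node.js, the same local endpoint can be called with the standard library alone. This is a sketch assuming the Ollama server is running on its default port; the `build_payload` and `chat` helpers are hypothetical names introduced here:

```python
import json
import urllib.request


def build_payload(prompt: str, model: str = "intellix") -> dict:
    # Same JSON body as the TypeScript example above.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def chat(prompt: str, url: str = "http://localhost:11434/api/chat") -> str:
    # POST the chat request and return the assistant's reply text.
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]


# Usage (requires a running Ollama server):
#   print(chat("Draft a professional update email."))
```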

---

## 2. Python Inference with Transformers

Use this path if you want to run the model in a Python environment for research or batch processing.

### Installation
```bash
pip install transformers accelerate bitsandbytes
```

### Implementation
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mediusware-ai/intellix"  # Hugging Face Hub path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

prompt = "<|im_start|>system\nYou are Intellix, a professional AI assistant developed by Mediusware.<|im_end|>\n<|im_start|>user\nWhat is the capital of Bangladesh?<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
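
Writing the ChatML prompt string by hand is error-prone; a small helper like the hypothetical `to_chatml` below builds the same string from a list of role/content messages, following the tag layout shown above:

```python
def to_chatml(messages: list[dict]) -> str:
    """Render [{'role': ..., 'content': ...}] messages into a ChatML prompt.

    Appends an opening assistant tag so the model continues from there.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    return "".join(parts) + "<|im_start|>assistant\n"


prompt = to_chatml([
    {"role": "system", "content": "You are Intellix, a professional AI assistant developed by Mediusware."},
    {"role": "user", "content": "What is the capital of Bangladesh?"},
])
```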

---

## 3. Deployment Best Practices

- **Repetition Penalty**: Set `repeat_penalty` between `1.2` and `1.5` to prevent conversational loops.
- **Stop Tokens**: Always use `<|im_start|>`, `<|im_end|>`, `User:`, and `Assistant:` as stop tokens to ensure clean turn-taking.
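
With Ollama's REST API, both recommendations map onto the request's `options` field. The sketch below uses an illustrative `repeat_penalty` of `1.3` from the recommended range:

```python
# Generation options implementing the best practices above; pass this dict
# as the "options" field of an Ollama /api/chat or /api/generate request.
GENERATION_OPTIONS = {
    "repeat_penalty": 1.3,  # illustrative value within the 1.2-1.5 range
    "stop": ["<|im_start|>", "<|im_end|>", "User:", "Assistant:"],
}
```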

---

## 4. Fluent Conversation Examples
Based on the latest fine-tuning dataset, Intellix is designed to handle queries like:
- **Identity**: "Who are you?" -> "I am Intellix, a highly capable AI assistant developed by Mediusware."
- **Corporate Knowledge**: "Tell me about Mediusware." -> Accurate details about its offices in Dhaka and South Carolina.
- **Technical Reasoning**: "What is Python?" -> Professional explanations of its use in AI and automation.

## 5. Troubleshooting
- **Loops**: If the model starts repeating itself, raise `repeat_penalty` toward `1.5` in your inference engine (Ollama/Transformers).
- **Hallucinations**: The model is trained to be professional. If it provides a generic "I don't have a name" response, verify that you are using the correct ChatML template.