Update chatNT.py
chatNT.py
CHANGED
@@ -1661,14 +1661,24 @@ class TorchMultiModalPerceiverResamplerBlock(nn.Module):
         )
 
     def mlp(self, x: torch.Tensor) -> torch.Tensor:
+        outs = {}
         x = self.norm_mlp(x)
+        outs["MLP_layer0_layer_norm"] = x.clone()
         if self.use_glu_in_ffn:
             x1, x2 = torch.chunk(self.fc1(x), 2, dim=-1)
             x = self.activation_fn(x1) * x2
         else:
-            x = self.
-
+            x = self.fc1(x)
+            outs["MLP_layer1_fc1"] = x.clone()
+            x = self.activation_fn(x)
+            outs["MLP_layer2_activation"] = x.clone()
 
+        x = self.fc2(x)
+        outs["MLP_layer3_fc2"] = x.clone()
+        outs["x"] = x.clone()
+
+        return outs
+
     def forward(
         self,
         x: torch.Tensor,
@@ -1703,7 +1713,8 @@ class TorchMultiModalPerceiverResamplerBlock(nn.Module):
         outs_news["ATTENTION_layer3_cross_attention_layer_2"] = attn_output.clone()
         x = res + attn_output
 
-
+        mlp_output = self.mlp(x)
+        x = x + mlp_output["x"]
         outs_news["ATTENTION_after_mlp"] = x.clone()
 
         output = {}
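The change above makes `mlp` return a dictionary of intermediate activations, and `forward` reads the final tensor back out of the `"x"` key for the residual connection. A minimal, dependency-free sketch of that pattern follows. It is not the ChatNT implementation: `ToyMLP`, `linear`, `relu`, and `block_forward` are hypothetical names, plain Python lists stand in for tensors, `list(x)` stands in for `x.clone()`, and the layer norm and GLU branch are omitted.

```python
from typing import Callable, Dict, List

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Bias-free linear map: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x: Vector) -> Vector:
    """Element-wise ReLU, used here as a stand-in activation."""
    return [max(0.0, v) for v in x]


class ToyMLP:
    """Hypothetical stand-in for the non-GLU path of the block's `mlp`."""

    def __init__(
        self,
        fc1: List[Vector],
        fc2: List[Vector],
        activation_fn: Callable[[Vector], Vector] = relu,
    ) -> None:
        self.fc1 = fc1
        self.fc2 = fc2
        self.activation_fn = activation_fn

    def __call__(self, x: Vector) -> Dict[str, Vector]:
        outs: Dict[str, Vector] = {}
        x = linear(self.fc1, x)
        outs["MLP_layer1_fc1"] = list(x)  # copy, like x.clone()
        x = self.activation_fn(x)
        outs["MLP_layer2_activation"] = list(x)
        x = linear(self.fc2, x)
        outs["MLP_layer3_fc2"] = list(x)
        outs["x"] = list(x)  # final output, read by the caller
        return outs


def block_forward(mlp: ToyMLP, x: Vector) -> Vector:
    """Residual connection around the MLP, mirroring `x = x + mlp_output["x"]`."""
    mlp_output = mlp(x)
    return [a + b for a, b in zip(x, mlp_output["x"])]


if __name__ == "__main__":
    toy = ToyMLP(fc1=[[1.0, 0.0], [0.0, -1.0]], fc2=[[1.0, 1.0], [0.0, 2.0]])
    print(toy([1.0, 2.0]))
    print(block_forward(toy, [1.0, 2.0]))
```

Copying each intermediate value before storing it keeps the recorded activations independent of later in-place changes, which is presumably why the diff calls `.clone()` at every step. Note that in the diff the `mlp` signature is still annotated `-> torch.Tensor` although the method now returns a dictionary.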