Instructions to use nex-agi/Nex-N2-Pro with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nex-agi/Nex-N2-Pro with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="nex-agi/Nex-N2-Pro")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    pipe(text=messages)

    # Load model directly
    from transformers import AutoProcessor, AutoModelForImageTextToText

    processor = AutoProcessor.from_pretrained("nex-agi/Nex-N2-Pro")
    model = AutoModelForImageTextToText.from_pretrained("nex-agi/Nex-N2-Pro")
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
                {"type": "text", "text": "What animal is on the candy?"}
            ]
        },
    ]
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nex-agi/Nex-N2-Pro with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "nex-agi/Nex-N2-Pro"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nex-agi/Nex-N2-Pro",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
docker model run hf.co/nex-agi/Nex-N2-Pro
- SGLang
How to use nex-agi/Nex-N2-Pro with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "nex-agi/Nex-N2-Pro" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nex-agi/Nex-N2-Pro",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "nex-agi/Nex-N2-Pro" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "nex-agi/Nex-N2-Pro",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use nex-agi/Nex-N2-Pro with Docker Model Runner:
docker model run hf.co/nex-agi/Nex-N2-Pro
Create GGUFs if possible?
Hello,
Thank you guys for the work that you do! I was wondering if it would be possible to release GGUF quants in various sizes for people to run under llama.cpp? It would be a great way to test these models.
Thank you.
I tried using a quant I made myself for llama.cpp, but I received this error at the beginning of model loading. Would you guys know of a solution?
llama_model_load: error loading model: missing tensor 'blk.60.attn_norm.weight'
While converting, I did notice that the layers went from 0 to 59, but llama.cpp is oddly expecting an extra layer 60.
I had the same issue with nex-agi/Nex-N2-mini and vibecoded a fix. So I can't tell you exactly what my agent did, but you can try the same thing to get a working GGUF.
Did you have to regenerate the GGUF, or just patch the llama.cpp project to get your current GGUF to work? If it's the patch, do you think you could upload your version of llama.cpp to GitHub for me to try as well?
Thank you!
config.json advertises mtp_num_hidden_layers: 1, but this uploaded model does not ship the corresponding MTP (Multi-Token Prediction) tensors, which would explain why llama.cpp looks for an extra layer 60 after layers 0 to 59. Try calling convert_hf_to_gguf.py with the --no-mtp parameter.
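If you want to confirm that diagnosis on a checkpoint before converting, something like the rough sketch below might help. It is not part of llama.cpp or Transformers. The function name `find_missing_mtp_layers` is made up for illustration, and it assumes that tensors are named like `model.layers.<index>.<rest>` and that MTP layers are numbered directly after the main layers. The value `num_hidden_layers: 60` is inferred from the 0 to 59 range reported in this thread, not read from the actual config.json.

```python
import re


def find_missing_mtp_layers(config: dict, tensor_names: list[str]) -> list[int]:
    """Return the MTP layer indices that config advertises but no tensor covers.

    Assumption: tensor names contain "layers.<index>." and MTP layers are
    indexed right after the main layers (num_hidden_layers, num_hidden_layers + 1, ...).
    """
    n_main = config["num_hidden_layers"]
    n_mtp = config.get("mtp_num_hidden_layers", 0)

    # Collect every layer index that appears in at least one tensor name.
    pattern = re.compile(r"(?:^|\.)layers\.(\d+)\.")
    present = set()
    for name in tensor_names:
        match = pattern.search(name)
        if match:
            present.add(int(match.group(1)))

    # Any advertised MTP index without tensors is what the loader would miss.
    return [i for i in range(n_main, n_main + n_mtp) if i not in present]


# Hypothetical values mirroring the situation described in this thread.
config = {"num_hidden_layers": 60, "mtp_num_hidden_layers": 1}
names = [f"model.layers.{i}.input_layernorm.weight" for i in range(60)]
print(find_missing_mtp_layers(config, names))  # prints [60]
```

In practice the tensor names could come from the keys of `weight_map` in the checkpoint's `model.safetensors.index.json`, if the repository ships one. A non-empty result would suggest that the config promises MTP layers the upload does not contain.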
Guys, we need IQ3_XXS and IQ4_XS <3