Cosmos-Reason2-2B-Arbor-4bit
masato25/Cosmos-Reason2-2B-Arbor-4bit is an Apple MLX 4-bit quantized derivative of nvidia/Cosmos-Reason2-2B.
This conversion is intended for local inference on Apple Silicon using MLX / MLX-VLM. It preserves the upstream model architecture and, where applicable, the tokenizer/processor files, while quantizing supported linear weights to 4-bit for a smaller memory footprint.
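As a rough illustration of the memory-footprint claim above, the sketch below estimates storage for the quantized weights. It assumes that each group of 64 weights also stores one 16-bit scale and one 16-bit bias, which is a common layout for affine group quantization but is not confirmed here for this repository. The count of 2 billion quantized parameters is a round, hypothetical figure, and layers left unquantized are not counted.

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average bits stored per weight, including per-group scale and bias.

    scale_bits and bias_bits are assumptions (16-bit metadata per group).
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_bytes(n_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store n_params weights at the given average bit width."""
    return n_params * bits_per_weight / 8


# Hypothetical round number; the real quantized-parameter count will differ.
N_PARAMS = 2_000_000_000

bpw = effective_bits_per_weight()          # 4 + 32 / 64
size_4bit = quantized_bytes(N_PARAMS, bpw)
size_16bit = quantized_bytes(N_PARAMS, 16)

print(f"effective bits per weight: {bpw}")
print(f"4-bit estimate:  {size_4bit / 1e9:.3f} GB")
print(f"16-bit estimate: {size_16bit / 1e9:.3f} GB")
print(f"compression ratio: {size_16bit / size_4bit:.2f}x")
```

Under these assumptions the per-group metadata adds half a bit per weight, so the saving relative to 16-bit storage is somewhat less than the nominal 4x.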
Attribution, license, and required notices
- Base model: nvidia/Cosmos-Reason2-2B
- License: NVIDIA Open Model License
- Required attribution: Licensed by NVIDIA Corporation under the NVIDIA Open Model License
- Required Cosmos notice: Built on NVIDIA Cosmos
- This repository is a derivative conversion/quantization and is not the original NVIDIA release.
A NOTICE file is included in this repository. By using, copying, modifying, redistributing, deploying, or making this derivative model available to others, you are responsible for complying with all applicable terms, including but not limited to:
- the NVIDIA Open Model License;
- NVIDIA Trustworthy AI terms and any responsible-use, prohibited-use, or acceptable-use restrictions that apply to the upstream model;
- export-control, sanctions, and other applicable laws and regulations;
- any additional terms, notices, access requirements, or usage instructions presented on the original NVIDIA model page.
If the upstream license, notices, or model-page terms are updated, those upstream terms may impose additional or different obligations. Please review the upstream model page and license before use or redistribution.
No affiliation, sponsorship, endorsement, or trademark grant
This repository is independently prepared and published by the repository owner. It is not affiliated with, sponsored by, approved by, or endorsed by NVIDIA Corporation unless NVIDIA explicitly states otherwise.
The names "NVIDIA", "Cosmos", and other NVIDIA marks are trademarks or registered trademarks of NVIDIA Corporation in the United States and/or other jurisdictions. They are used here only for reasonable descriptive attribution and identification of the upstream base model. No trademark license or other rights in NVIDIA marks are granted by this repository.
Conversion details
- Source: nvidia/Cosmos-Reason2-2B
- Format: MLX / MLX-VLM
- Quantization: 4-bit affine weight quantization
- Group size: 64
- Conversion command:
python3 -m mlx_vlm.convert \
--hf-path nvidia/Cosmos-Reason2-2B \
--mlx-path Cosmos-Reason2-2B-Arbor-4bit \
-q \
--q-bits 4 \
--q-group-size 64 \
--trust-remote-code
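To make "4-bit affine weight quantization" with a group size of 64 more concrete, here is a minimal pure-Python sketch of the general technique. It is not the MLX kernel and does not reproduce its packing or storage format. The function names are hypothetical, and the per-group scale and bias are derived from the group's minimum and maximum, which is the standard affine scheme assumed here.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (codes, scale, bias)


def quantize_group(weights: List[float], bits: int = 4) -> Group:
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1            # 15 for 4-bit
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, bias: float) -> List[float]:
    """Reconstruct approximate floats from codes, scale, and bias."""
    return [c * scale + bias for c in codes]


def quantize_row(row: List[float], group_size: int = 64, bits: int = 4) -> List[Group]:
    """Quantize a weight row group by group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [quantize_group(row[i:i + group_size], bits)
            for i in range(0, len(row), group_size)]


# Toy row of 128 values, so two groups of 64.
row = [i / 127 - 0.5 for i in range(128)]
groups = quantize_row(row)
restored = [x for codes, scale, bias in groups
            for x in dequantize_group(codes, scale, bias)]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups: {len(groups)}, max reconstruction error: {max_error:.6f}")
```

Because each group has its own scale and bias, the rounding error in a group is bounded by half of that group's scale, which is why smaller groups trade extra metadata for lower error.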
Usage
Install MLX-VLM:
pip install -U mlx-vlm
Example text prompt:
python -m mlx_vlm.generate \
--model masato25/Cosmos-Reason2-2B-Arbor-4bit \
--prompt "Explain physical common sense reasoning in 3 bullets." \
--max-tokens 256 \
--temperature 0.0
Example image prompt:
python -m mlx_vlm.generate \
--model masato25/Cosmos-Reason2-2B-Arbor-4bit \
--image /path/to/image.jpg \
--prompt "Describe the scene and reason about what may happen next." \
--max-tokens 256 \
--temperature 0.0
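The two commands above differ only in the optional `--image` argument. If you want to drive them from a script, the sketch below builds the same argument list in Python. The helper name `build_generate_command` is hypothetical, and only the flags shown in the commands above are used. The call that would actually run the model is left commented out, since it requires MLX-VLM on Apple Silicon.

```python
import shlex
from typing import List, Optional

MODEL_ID = "masato25/Cosmos-Reason2-2B-Arbor-4bit"


def build_generate_command(prompt: str,
                           image: Optional[str] = None,
                           max_tokens: int = 256,
                           temperature: float = 0.0,
                           model: str = MODEL_ID) -> List[str]:
    """Return the argv list for `python -m mlx_vlm.generate`."""
    cmd = ["python", "-m", "mlx_vlm.generate", "--model", model]
    if image is not None:
        # Only image prompts pass --image; text prompts omit it.
        cmd += ["--image", image]
    cmd += [
        "--prompt", prompt,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
    ]
    return cmd


text_cmd = build_generate_command(
    "Explain physical common sense reasoning in 3 bullets.")
image_cmd = build_generate_command(
    "Describe the scene and reason about what may happen next.",
    image="/path/to/image.jpg")

# Print shell-quoted commands for inspection.
print(shlex.join(text_cmd))
print(shlex.join(image_cmd))

# To execute on a machine with MLX-VLM installed:
# import subprocess
# subprocess.run(image_cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems when the prompt contains quotes or spaces.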
Intended use
This quantized derivative is intended for experimentation, prototyping, and deployment on Apple Silicon, in scenarios where the upstream license permits such use and local inference is desirable.
You should evaluate whether your intended use is permitted under the upstream NVIDIA Open Model License and related terms. This repository does not expand, waive, or modify any upstream restrictions.
Limitations and safety
Quantization can change output quality, numerical behavior, robustness, and safety characteristics compared with the original model. This repository does not claim improved accuracy, safety, bias mitigation, alignment, or suitability for any particular purpose over the upstream NVIDIA model.
Model outputs may be inaccurate, unsafe, biased, offensive, incomplete, or otherwise unsuitable for your use case. Do not rely on the model as the sole source of truth for medical, legal, financial, safety-critical, or other high-stakes decisions. Evaluate the model for your own task, domain, jurisdiction, and risk profile before deployment or redistribution.
Disclaimer
This derivative model is provided as-is and without warranties or conditions of any kind, express or implied, including without limitation warranties of merchantability, fitness for a particular purpose, title, non-infringement, accuracy, availability, or error-free operation.
To the maximum extent permitted by applicable law, the repository owner is not liable for any direct, indirect, incidental, special, consequential, exemplary, punitive, or other damages arising from or related to use of this repository, the derivative model, or model outputs.
Nothing in this README is legal advice. You are responsible for reviewing and complying with the applicable license terms and laws.