embedl/Cosmos-Reason2-2B-W4A16

@@ -60,41 +60,11 @@ significantly reducing the memory footprint of model weights.
 ---
-## Accuracy
-For comparative evaluation, we report benchmark scores using the [Physical AI Bench Reason Task](https://huggingface.co/spaces/shi-labs/physical-ai-bench-leaderboard).
-> [!WARNING]
-> We have not been able to reproduce the baseline benchmarks reported by [nvidia/Cosmos-Reason2-2B](https://huggingface.co/nvidia/Cosmos-Reason2-2B)
-> on the [Physical AI Bench Leaderboard](https://huggingface.co/spaces/shi-labs/physical-ai-bench-leaderboard),
-> see related issue: https://github.com/nvidia-cosmos/cosmos-reason2/issues/52
-### Overall + Category Scores
-| Model                                                                                             | Overall | Embodied Reasoning | Common Sense |
-|---------------------------------------------------------------------------------------------------|--------:|-------------------:|-------------:|
-| [nvidia/Cosmos-Reason2-2B](https://huggingface.co/nvidia/Cosmos-Reason2-2B) | 50.60 | 53.93 | 47.19 |
-| [embedl/Cosmos-Reason2-2B-NVFP4A16](https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16) |   49.84 |              50.16 |        49.50 |
-| [**embedl/Cosmos-Reason2-2B-W4A16**](https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16)           |   48.68 |              50.49 |        46.85 |
-| [embedl/Cosmos-Reason2-2B-W4A16-Edge2](https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2)   |   50.58 |              53.61 |        47.52 |
-### Subcategory Scores
-| Model                                                                                             |    AV | Physical World |  Time | Space | Agibot | HoloAssist | RoboFail | RoboVQA | BridgeData V2 |
-|---------------------------------------------------------------------------------------------------|------:|---------------:|------:|------:|-------:|-----------:|---------:|--------:|--------------:|
-| [nvidia/Cosmos-Reason2-2B](https://huggingface.co/nvidia/Cosmos-Reason2-2B) | 44.00 | 46.90 | 45.30 | 55.00 | 34.00 | 60.00 | 49.00 | 90.91 | 42.00 |
-| [embedl/Cosmos-Reason2-2B-NVFP4A16](https://huggingface.co/embedl/Cosmos-Reason2-2B-NVFP4A16) | 44.00 |          45.13 | 52.01 | 52.50 |  28.00 |      58.00 |    51.00 |   84.55 |         32.00 |
-| [**embedl/Cosmos-Reason2-2B-W4A16**](https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16)           | 36.00 |          47.79 | 44.30 | 53.75 |  36.00 |      61.00 |    42.00 |   80.91 |         44.00 |
-| [embedl/Cosmos-Reason2-2B-W4A16-Edge2](https://huggingface.co/embedl/Cosmos-Reason2-2B-W4A16-Edge2)  | 45.00 |          44.25 | 48.66 | 52.50 |  32.00 |      59.00 |    54.00 |   85.45 |         43.00 |
----
-## Performance
-On-device performance benchmarks can be explored on [embedl/Edge-Inference-Benchmarks](https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks).
 <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/Cosmos-Reason2-2B-W4A16/screenshot_edge_inference_benchmarks.png" alt="Screenshot Edge Inference Benchmarks" width="75%">
 ---

 ---
+## Benchmarks
+Accuracy and on-device latency benchmarks can be explored on [embedl/Edge-Inference-Benchmarks](https://huggingface.co/spaces/embedl/Edge-Inference-Benchmarks).
 <img src="https://huggingface.co/datasets/embedl/documentation-images/resolve/main/Cosmos-Reason2-2B-W4A16/screenshot_edge_inference_benchmarks.png" alt="Screenshot Edge Inference Benchmarks" width="75%">
 ---