Cosmos-Tokenizer-Surg / explainability.md

Explainability Subcard

Intended Domain

Image compression and reconstruction

Model Type

Convolutional with quantization

Intended Users

Surgeons, Telemedicine Professionals, Medical Robotics Engineers

Output

Types: Image. Formats: Red, Green, Blue (RGB)

Describe how the model works:

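Type: Convolutional Neural Network with Residual and Attention Blocks (distilled from Wan2.1 with 2D convolutions). It works as an autoencoder: an encoder maps the image to a latent representation, the latent space is quantized, and a decoder reconstructs the image from the quantized latents.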
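
The quantization step in such an autoencoder can be illustrated with a minimal sketch. This card does not state which quantization scheme the model uses, so the example below assumes plain uniform scalar quantization as a stand-in; the function names `quantize` and `dequantize` and the `levels` parameter are hypothetical and are not part of the model's API.

```python
import math


def quantize(latent, levels=8):
    """Map each latent value to an integer code in [0, levels - 1].

    Assumption: uniform scalar quantization over (-1, 1) after a tanh
    squash. The actual model may use a different scheme.
    """
    half = (levels - 1) / 2
    codes = []
    for z in latent:
        bounded = math.tanh(z)               # squash to the range [-1, 1]
        index = round((bounded + 1) * half)  # nearest integer code
        codes.append(index)
    return codes


def dequantize(codes, levels=8):
    """Map integer codes back to evenly spaced values in [-1, 1]."""
    half = (levels - 1) / 2
    return [index / half - 1 for index in codes]


# Each code carries log2(levels) bits, e.g. 3 bits for 8 levels.
codes = quantize([-10.0, 0.2, 10.0])
restored = dequantize(codes)
```

The decoder only ever sees the restored values, so any detail finer than the spacing between levels is lost. This is one reason why fine structures such as small text are hard to reconstruct at low bitrates.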

Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:

None

Technical Limitations & Mitigation:

This model may struggle with high-spatial-frequency content, such as text within images. Because the model has not been explicitly evaluated across a substantial and diverse population, its performance may vary, and qualified experts should evaluate it before use in clinical settings. Mitigation: Where accurate text reconstruction is critical, preprocess images to exclude textual content, use a supplementary OCR model to extract and transmit text separately, or increase the image resolution if hardware constraints allow.
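
The first mitigation, excluding textual content before compression, could look roughly like the sketch below. It assumes that text bounding boxes are already available from some separate detector or OCR step, which this card does not specify. The function `mask_regions`, the box layout `(top, left, bottom, right)`, and the list-of-rows image representation are all illustrative assumptions.

```python
def mask_regions(image, boxes, fill=0):
    """Return a copy of `image` with every box filled with `fill`.

    image: list of rows, each row a list of pixel values.
    boxes: iterable of (top, left, bottom, right); bottom and right are
           exclusive. Assumed to come from a separate text detector.
    """
    masked = [row[:] for row in image]  # copy rows so the input is untouched
    height = len(masked)
    width = len(masked[0]) if height else 0
    for top, left, bottom, right in boxes:
        # Clamp each box to the image bounds before filling.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = fill
    return masked


frame = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
clean = mask_regions(frame, [(0, 0, 2, 2)])
```

The masked regions would then be transmitted as text through a side channel rather than through the tokenizer.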

Verified to have met prescribed NVIDIA quality standards:

Yes

Performance Metrics:

PSNR: 34.61 dB, LPIPS: 0.10, bitrate: 0.5 bpp (bits per pixel), latency
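
For readers who want to compute comparable figures on their own data, the sketch below shows the standard definitions of PSNR and bits per pixel. It is not the evaluation code behind the numbers above; the function names and the flat-list pixel representation are assumptions made for illustration, and LPIPS is omitted because it requires a pretrained network.

```python
import math


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(compressed_bytes, width, height):
    """Average number of bits spent per pixel of the original image."""
    return compressed_bytes * 8 / (width * height)


# A 512 x 512 image stored in 16384 bytes corresponds to 0.5 bpp.
rate = bits_per_pixel(16384, 512, 512)
```

Higher PSNR and lower LPIPS both indicate a reconstruction closer to the original, so the two should be read together with the bitrate at which they were measured.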

Potential Known Risks:

This model may inaccurately reproduce text, making it illegible.

Licensing:

Use of this model is governed by the NVIDIA Open Model License.