| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8" /> |
| <meta name="viewport" content="width=device-width" /> |
| <title>MLTest Demo</title> |
| <link rel="stylesheet" href="style.css" /> |
| <link href="https://fonts.googleapis.com/css2?family=Source+Sans+Pro:ital,wght@0,200;0,300;0,400;0,600;0,700;0,900;1,200;1,300;1,400;1,600;1,700;1,900&display=swap" rel="stylesheet"> |
| <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Mono:wght@400;600;700&display=swap" rel="stylesheet"> |
| </head> |
| <body> |
| <div class="container"> |
| <div> |
| <h1>MLTest</h1> |
| <p> |
| This is a demo of MLTest on the dataset |
| <a href="https://huggingface.co/datasets/marmal88/skin_cancer"><code>marmal88/skin_cancer</code></a>. |
| Each image is labeled with one of three kinds of cancer. |
| </p> |
| <p> |
| Five models have been trained on this dataset: two variants of Swin Transformer, ViT, ResNet, and BEiT. The test results for each model can |
| be inspected in the dashboard below. |
| </p> |
| <p> |
| <b>Performance tests</b>: |
| to measure how well each model performs, we compute common performance metrics such as accuracy, |
| precision, recall, and F1 score. |
| </p> |
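| <p> |
| As a rough illustration of the metrics named above, the sketch below computes them from true and predicted labels for one class of interest. |
| It is not MLTest's implementation: the function name <code>classification_metrics</code> and the example labels are hypothetical. |
| </p> |

```python
def classification_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F1, treating `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)

    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for six images; the class ids are illustrative only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred, positive=1)
print(metrics)
```

| <p> |
| Accuracy is computed over all classes, while precision, recall, and F1 are computed for the single class passed as <code>positive</code>. |
| </p> |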
| <p> |
| <b>Failure clusters</b>: |
| these clusters give insight into where the model fails and can be inspected in the "Failure Clusters" tab. |
| They are detected automatically |
| for different combinations of metadata. |
| For example, the BEiT transformer has significantly lower accuracy on images of cancers on the back with class label <code>0</code>. |
| </p> |
| <p> |
| <b>Robustness tests</b>: these tests help ML developers evaluate how well their model performs under varying conditions, |
| such as different levels of brightness, compression, and other types of interference. |
| </p><p> |
| The following robustness tests were enabled for this test case: |
| </p> |
| <ul> |
| <li>Brightness</li> |
| <li>CompressImage</li> |
| <li>Contrast</li> |
| <li>DarkSpots</li> |
| <li>GaussianBlur</li> |
| <li>GaussianNoise</li> |
| <li>Glare</li> |
| <li>GlassBlur</li> |
| <li>HorizontalFlip</li> |
| <li>MedianBlur</li> |
| <li>MotionBlur</li> |
| <li>OilSpots</li> |
| <li>Perspective</li> |
| <li>VerticalFlip</li> |
| </ul> |
|
|
|
|
| <p> |
| The full list of transforms supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/robustness">documentation</a>. |
| </p> |
| <p> |
| <b>Fairness tests</b>: these tests measure whether the model's performance depends |
| on a protected attribute of a person. In this dataset, the age and gender of a subject may be considered |
| protected attributes. |
| </p> |
| <p> |
| We ran two types of fairness tests on the age and gender of each subject. |
| The <code>Equalized Odds</code> test checks that the true positive and false positive rates are equal across the groups defined by a protected attribute. |
| The <code>Predictive Equality</code> test checks that the false positive rates are equal across those groups. |
| </p> |
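| <p> |
| The two fairness criteria described above can be sketched as gaps between per-group rates. |
| This is a minimal sketch, not MLTest's implementation: it assumes binary labels (1 = positive), and the helper names |
| <code>group_rates</code> and <code>fairness_gaps</code> and the example data are hypothetical. |
| </p> |

```python
def group_rates(y_true, y_pred, groups):
    """Return {group: (true positive rate, false positive rate)} for binary labels."""
    rates = {}
    for g in sorted(set(groups)):
        rows = [(t, p) for t, p, s in zip(y_true, y_pred, groups) if s == g]
        positives = [p for t, p in rows if t == 1]  # predictions on actual positives
        negatives = [p for t, p in rows if t == 0]  # predictions on actual negatives
        tpr = sum(positives) / len(positives) if positives else 0.0
        fpr = sum(negatives) / len(negatives) if negatives else 0.0
        rates[g] = (tpr, fpr)
    return rates


def max_gap(values):
    """Largest difference between any two values."""
    values = list(values)
    return max(values) - min(values)


def fairness_gaps(y_true, y_pred, groups):
    """A gap of 0.0 means the criterion holds exactly across all groups."""
    rates = group_rates(y_true, y_pred, groups)
    tpr_gap = max_gap(tpr for tpr, _ in rates.values())
    fpr_gap = max_gap(fpr for _, fpr in rates.values())
    return {
        # Equalized odds needs both TPR and FPR to match across groups.
        "equalized_odds_gap": max(tpr_gap, fpr_gap),
        # Predictive equality only needs FPR to match.
        "predictive_equality_gap": fpr_gap,
    }


# Hypothetical predictions for two groups of a protected attribute.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gaps = fairness_gaps(y_true, y_pred, groups)
print(gaps)
```

| <p> |
| In this example both groups have the same false positive rate, so predictive equality holds, but their true positive rates differ, so equalized odds does not. |
| In practice a test would compare each gap against a tolerance rather than require exact equality. |
| </p> |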
| <p> |
| More fairness tests supported by MLTest can be found in the <a target="_blank" href="https://docs.lakera.ai/configuration/fairness">documentation</a>. |
| </p> |
| </div> |
| |
| <iframe src="https://hf.lakera.ai/projects/skin_cancer_run2"></iframe> |
| </div> |
| </body> |
| </html> |
|
|