Groq / mlagility

danielhn committed on Mar 8, 2023

Commit

1158f50

1 Parent(s): ca29c17

Updated to latest page version

Files changed (2)

app.py +96 -16
graphs.py +185 -10

app.py CHANGED

@@ -6,13 +6,13 @@ import graphs
 from streamlit_helpers import add_filter, slider_filter, Collapsable
 st.set_page_config(
-    page_title="ML Agility tracker",
     page_icon="⚡",
     layout="wide",
 )
 # dashboard title
-st.title("ML Agility tracker ⚡")
 def add_faq() -> None:
@@ -21,10 +21,92 @@ def add_faq() -> None:
     """
     faq = Collapsable()
     faq.add_section(
-        "Why is this so empty?",
         (
-            "Because the FAQ of huggingface website still needs to be written. "
-            "We don't use the same FAQ as in our internal dashboard."
         ),
     )
@@ -72,6 +154,13 @@ with st.sidebar:
 st.markdown("## Summary Results")
 cols = st.columns(2)
 with cols[0]:
     st.markdown("""#### Workload origin""")
@@ -81,18 +170,9 @@ with cols[1]:
     st.markdown("""#### Parameter Size Distribution""")
     graphs.parameter_histogram(report, show_assembled=False)
-st.markdown("""#### Benchmark results""")
-baseline = st.selectbox("Baseline", ("x86", "nvidia", "groq"))
-graphs.speedup_text_summary(report, baseline)
-graphs.speedup_bar_chart(report, baseline)
 # FAQ Block
-cols = st.columns(2)
-with cols[0]:
-    st.markdown("""## About this workload analysis (FAQ)""")
-    add_faq()
 # Detailed data view (table)
 st.markdown("## Detailed Data View")

 from streamlit_helpers import add_filter, slider_filter, Collapsable
 st.set_page_config(
+    page_title="MLAgility tracker",
     page_icon="⚡",
     layout="wide",
 )
 # dashboard title
+st.title("MLAgility tracker ⚡")
 def add_faq() -> None:
     """
     faq = Collapsable()
     faq.add_section(
+        "How is MLAgility different from MLPerf?",
         (
+            "Deep learning pioneers have been judging their progress with the Machine Learning "
+            "Performance (MLPerf) inference benchmark, but have found that the corpus of models "
+            "is small enough that it allows vendors to primarily compete by hand-optimizing "
+            "kernels. MLAgility offers a complementary approach to MLPerf by examining the "
+            "capability of vendors to provide turnkey solutions to a larger corpus of "
+            "off-the-shelf models. By providing a workflow that is representative of the "
+            "mass adoption customer on a variety of ML accelerators and effectively disallowing "
+            "hand-crafted kernels, MLAgility bridges the gap between MLPerf and the mass adoption "
+            "of hardware acceleration."
+        ),
+    )
+    faq.add_section(
+        "Why now for MLAgility?",
+        (
+            "Deep learning algorithms and their associated DL hardware accelerators are "
+            "transitioning from early adoption into mass adoption. Production DL is now "
+            "becoming available to the masses, with a desire to customize models to tackle "
+            "their specific problems, and then take the path of least resistance into "
+            "production. A market has emerged for turnkey solutions that take a model as input "
+            "and provision a cost- and latency-effective acceleration solution, often in the cloud, "
+            "as output."
+        ),
+    )
+    faq.add_section(
+        "Which tool was used to generate those results?",
+        (
+            "All MLAgility results have been generated using the <b>benchit</b> tool v1.0.0, which is part "
+            "of the MLAgility GitHub repository. You can learn more about it "
+            '<a href="https://github.com/groq/mlagility">here</a>.'
+        ),
+    )
+    faq.add_section(
+        "What is the experimental setup for each of the devices?",
+        [
+            "<b>x86</b>: Intel(R) Xeon(R) X40 CPU @ 2.00GHz on Google Cloud (custom: n2, 80 vCPU, 64.00 GiB) and OnnxRuntime version 1.14.0.",
+            "<b>nvidia</b>: NVIDIA A100 40GB on Google Cloud (a2-highgpu-1g) and TensorRT version 22.12-py3.",
+            "<b>groq</b>: GroqChip 1 on self-hosted GroqNode server, GroqFlow version 3.0.2 TestPyPI package, and GroqWare™ Suite version 0.9.2.",
+            (
+                "You can find more details about the methodology "
+                '<a href="https://github.com/groq/mlagility/blob/main/docs/tools_user_guide.md">here</a>.'
+            ),
+        ],
+    )
+    faq.add_section(
+        "What are the current key limitations of those results?",
+        [
+            (
+                "Groq's latency is computed using GroqModel.estimate_latency(), which takes"
+                " into account deterministic compute time and estimates an ideal runtime with"
+                " ideal I/O time. It does not take into account runtime performance."
+            ),
+            "Results currently only represent batch 1 performance on a limited number of models, "
+            "devices, vendors, and runtimes. You can learn more about future directions by reading "
+            'the "What are the future directions of MLAgility?" FAQ section.',
+        ],
+    )
+    faq.add_section(
+        "What are the future directions of MLAgility?",
+        [
+            "Include additional classes of models (e.g. LLMs, GNNs, DLRMs).",
+            "Perform experiments that include sweeps over batch and input sizes.",
+            "Increase the number of devices from existing vendors (e.g. T4, A10, and H100).",
+            "Include devices from additional vendors (e.g. ARM and AMD).",
+            "Increase the number of runtimes supported (e.g. ORT and PyTorch for CUDA, PyTorch for x86).",
+        ],
+    )
+    faq.add_section(
+        "Who runs MLAgility?",
+        (
+            "MLAgility is currently maintained by the following individuals (in alphabetical order): "
+            "Daniel Holanda Noronha, Jeremy Fowers, Kalin Ovtcharov, and Ramakrishnan Sivakumar. We are actively seeking collaborators from across the industry."
+        ),
+    )
+    faq.add_section(
+        "License and Liability",
+        (
+            'THE MLAGILITY BENCHMARK IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR '
+            "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, "
+            "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE "
+            "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER "
+            "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, "
+            "OUT OF OR IN CONNECTION WITH THE BENCHMARK OR THE USE OR OTHER DEALINGS IN THE "
+            "BENCHMARK. Read more about it "
+            '<a href="https://github.com/groq/mlagility/blob/main/LICENSE">here</a>.'
         ),
     )
 st.markdown("## Summary Results")
+graphs.device_funnel(report)
+st.markdown("""#### Benchmark results""")
+baseline = st.selectbox("Baseline", ("x86", "nvidia", "groq"))
+graphs.speedup_text_summary(report, baseline)
+graphs.speedup_bar_chart(report, baseline)
 cols = st.columns(2)
 with cols[0]:
     st.markdown("""#### Workload origin""")
     st.markdown("""#### Parameter Size Distribution""")
     graphs.parameter_histogram(report, show_assembled=False)
 # FAQ Block
+st.markdown("""## About this workload analysis (FAQ)""")
+add_faq()
 # Detailed data view (table)
 st.markdown("## Detailed Data View")
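
Review note on the FAQ lists in app.py: several `faq.add_section` calls pass a list whose items are built from adjacent string literals. Python joins adjacent literals into a single string, which is convenient for wrapping one long item across lines but also means that a missing comma between two intended items silently merges them. The sketch below uses hypothetical placeholder items (not the actual FAQ text) to show the difference.

```python
# Hypothetical FAQ items, used only to illustrate the behaviour.
merged = [
    "First item."
    "Second item.",  # no comma after "First item.", so both literals form one string
    "Third item.",
]

separate = [
    "First item.",
    "Second item.",
    "Third item.",
]

print(len(merged), merged[0])
print(len(separate))
```

Comparing the length of each list against the number of bullets expected on the page is a cheap way to catch this kind of slip during review.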

graphs.py CHANGED

@@ -18,9 +18,9 @@ colors = {
     "ocean_green": "#3ba272",
 }
 device_colors = {
-    "x86": colors["blue"],
-    "nvidia": colors["green"],
-    "groq": colors["orange"],
 }
@@ -35,6 +35,19 @@ class StageCount:
         self.assembles = int(np.sum(df["assembles"]))
 def stages_count_summary(current_df: pd.DataFrame, prev_df: pd.DataFrame) -> None:
     """
     Show count of how many models compile, assemble, etc
@@ -476,14 +489,14 @@ def speedup_bar_chart(df: pd.DataFrame, baseline) -> None:
         )
-def kpi_to_markdown(compute_ratio, device, is_baseline=False, color="blue"):
     title = f"""<br><br>
     <p style="font-family:sans-serif; font-size: 20px;text-align: center;">Median {device} Acceleration ({len(compute_ratio)} models):</p>"""
     if is_baseline:
         return (
             title
-            + f"""<p style="font-family:sans-serif; color:{colors[color]}; font-size: 26px;text-align: center;"> {1}x (Baseline)</p>"""
         )
     if len(compute_ratio) > 0:
@@ -497,8 +510,8 @@ def kpi_to_markdown(compute_ratio, device, is_baseline=False, color="blue"):
     return (
         title
-        + f"""<p style="font-family:sans-serif; color:{colors[color]}; font-size: 26px;text-align: center;"> {kpi_median}x</p>
-    <p style="font-family:sans-serif; color:{colors[color]}; font-size: 20px;text-align: center;"> min {kpi_min}x; max {kpi_max}x</p>
     """
     )
@@ -523,19 +536,19 @@ def speedup_text_summary(df: pd.DataFrame, baseline) -> None:
     x86_text = kpi_to_markdown(
         x86_compute_ratio,
         device="Intel(R) Xeon(R) X40 CPU @ 2.00GHz",
-        color="blue",
         is_baseline=baseline == "x86",
     )
     groq_text = kpi_to_markdown(
         groq_compute_ratio,
         device="GroqChip 1",
-        color="orange",
         is_baseline=baseline == "groq",
     )
     nvidia_text = kpi_to_markdown(
         nvidia_compute_ratio,
         device="NVIDIA A100-PCIE-40GB",
-        color="green",
         is_baseline=baseline == "nvidia",
     )
@@ -613,3 +626,165 @@ def results_table(df: pd.DataFrame):
         df = df[[model_name in x for x in df["Model Name"]]]
     st.dataframe(df, height=min((len(df) + 1) * 35, 35 * 21))

     "ocean_green": "#3ba272",
 }
 device_colors = {
+    "x86": "#0071c5",
+    "nvidia": "#76b900",
+    "groq": "#F55036",
 }
         self.assembles = int(np.sum(df["assembles"]))
+class DeviceStageCount:
+    def __init__(self, df: pd.DataFrame) -> None:
+        self.all_models = len(df)
+        self.base_onnx = int(np.sum(df["onnx_exported"]))
+        self.optimized_onnx = int(np.sum(df["onnx_optimized"]))
+        self.fp16_onnx = int(np.sum(df["onnx_converted"]))
+        self.x86 = df.loc[df.x86_latency != "-", "x86_latency"].count()
+        self.nvidia = df.loc[df.nvidia_latency != "-", "nvidia_latency"].count()
+        self.groq = df.loc[
+            df.groq_estimated_latency != "-", "groq_estimated_latency"
+        ].count()
 def stages_count_summary(current_df: pd.DataFrame, prev_df: pd.DataFrame) -> None:
     """
     Show count of how many models compile, assemble, etc
         )
+def kpi_to_markdown(compute_ratio, device, is_baseline=False, color="#FFFFFF"):
     title = f"""<br><br>
     <p style="font-family:sans-serif; font-size: 20px;text-align: center;">Median {device} Acceleration ({len(compute_ratio)} models):</p>"""
     if is_baseline:
         return (
             title
+            + f"""<p style="font-family:sans-serif; color:{color}; font-size: 26px;text-align: center;"> {1}x (Baseline)</p>"""
         )
     if len(compute_ratio) > 0:
     return (
         title
+        + f"""<p style="font-family:sans-serif; color:{color}; font-size: 26px;text-align: center;"> {kpi_median}x</p>
+    <p style="font-family:sans-serif; color:{color}; font-size: 20px;text-align: center;"> min {kpi_min}x; max {kpi_max}x</p>
     """
     )
     x86_text = kpi_to_markdown(
         x86_compute_ratio,
         device="Intel(R) Xeon(R) X40 CPU @ 2.00GHz",
+        color=device_colors["x86"],
         is_baseline=baseline == "x86",
     )
     groq_text = kpi_to_markdown(
         groq_compute_ratio,
         device="GroqChip 1",
+        color=device_colors["groq"],
         is_baseline=baseline == "groq",
     )
     nvidia_text = kpi_to_markdown(
         nvidia_compute_ratio,
         device="NVIDIA A100-PCIE-40GB",
+        color=device_colors["nvidia"],
         is_baseline=baseline == "nvidia",
     )
         df = df[[model_name in x for x in df["Model Name"]]]
     st.dataframe(df, height=min((len(df) + 1) * 35, 35 * 21))
+def device_funnel(df: pd.DataFrame) -> None:
+    """
+    Show a funnel (Sankey chart) of how many models reach each stage on each device
+    """
+    summ = DeviceStageCount(df)
+    stages = [
+        "All models",
+        "Export to ONNX",
+        "Optimize ONNX file",
+        "Convert to FP16",
+        "Acquire Performance",
+    ]
+    cols = st.columns(len(stages))
+    for idx, stage in enumerate(stages):
+        with cols[idx]:
+            st.markdown(stage)
+    # Show Sankey graph with percentages
+    sk_val = {
+        "All models": f"{summ.all_models} models - 100%",
+        "Convert to ONNX": f"{summ.base_onnx} models - "
+        + str(int(100 * summ.base_onnx / summ.all_models))
+        + "%",
+        "Optimize ONNX file": f"{summ.optimized_onnx} models - "
+        + str(int(100 * summ.optimized_onnx / summ.all_models))
+        + "%",
+        "Converts to FP16": f"{summ.fp16_onnx} models - "
+        + str(int(100 * summ.fp16_onnx / summ.all_models))
+        + "%",
+        "Acquires Nvidia Perf": f"{summ.nvidia} models - "
+        + str(int(100 * summ.nvidia / summ.all_models))
+        + "% (Nvidia)",
+        "Acquires Groq Perf": f"{summ.groq} models - "
+        + str(int(100 * summ.groq / summ.all_models))
+        + "% (Groq)",
+        "Acquires x86 Perf": f"{summ.x86} models - "
+        + str(int(100 * summ.x86 / summ.all_models))
+        + "% (x86)",
+    }
+    option = {
+        "series": {
+            "type": "sankey",
+            "animationDuration": 1,
+            "top": "0%",
+            "bottom": "20%",
+            "left": "0%",
+            "right": "19%",
+            "darkMode": "true",
+            "nodeWidth": 2,
+            "textStyle": {"fontSize": 16},
+            "nodeAlign": "left",
+            "lineStyle": {"curveness": 0},
+            "layoutIterations": 0,
+            "nodeGap": 12,
+            "layout": "none",
+            "emphasis": {"focus": "adjacency"},
+            "data": [
+                {
+                    "name": "All models",
+                    "value": sk_val["All models"],
+                    "itemStyle": {"color": "white", "borderColor": "white"},
+                },
+                {
+                    "name": "Convert to ONNX",
+                    "value": sk_val["Convert to ONNX"],
+                    "itemStyle": {"color": "white", "borderColor": "white"},
+                },
+                {
+                    "name": "Optimize ONNX file",
+                    "value": sk_val["Optimize ONNX file"],
+                    "itemStyle": {"color": "white", "borderColor": "white"},
+                },
+                {
+                    "name": "Converts to FP16",
+                    "value": sk_val["Converts to FP16"],
+                    "itemStyle": {"color": "white", "borderColor": "white"},
+                },
+                {
+                    "name": "Acquires Nvidia Perf",
+                    "value": sk_val["Acquires Nvidia Perf"],
+                    "itemStyle": {
+                        "color": device_colors["nvidia"],
+                        "borderColor": device_colors["nvidia"],
+                    },
+                },
+                {
+                    "name": "Acquires Groq Perf",
+                    "value": sk_val["Acquires Groq Perf"],
+                    "itemStyle": {
+                        "color": device_colors["groq"],
+                        "borderColor": device_colors["groq"],
+                    },
+                },
+                {
+                    "name": "Acquires x86 Perf",
+                    "value": sk_val["Acquires x86 Perf"],
+                    "itemStyle": {
+                        "color": device_colors["x86"],
+                        "borderColor": device_colors["x86"],
+                    },
+                },
+            ],
+            "label": {
+                "position": "insideTopLeft",
+                "borderWidth": 0,
+                "fontSize": 16,
+                "color": "white",
+                "textBorderWidth": 0,
+                "formatter": "{c}",
+            },
+            "links": [
+                {
+                    "source": "All models",
+                    "target": "Convert to ONNX",
+                    "value": summ.all_models,
+                },
+                {
+                    "source": "Convert to ONNX",
+                    "target": "Optimize ONNX file",
+                    "value": summ.optimized_onnx,
+                },
+                {
+                    "source": "Optimize ONNX file",
+                    "target": "Converts to FP16",
+                    "value": summ.fp16_onnx,
+                },
+                {
+                    "source": "Converts to FP16",
+                    "target": "Acquires Nvidia Perf",
+                    "value": int(
+                        summ.nvidia
+                        * summ.fp16_onnx
+                        / (summ.x86 + summ.nvidia + summ.groq)
+                    ),
+                },
+                {
+                    "source": "Converts to FP16",
+                    "target": "Acquires Groq Perf",
+                    "value": int(
+                        summ.groq
+                        * summ.fp16_onnx
+                        / (summ.x86 + summ.nvidia + summ.groq)
+                    ),
+                },
+                {
+                    "source": "Converts to FP16",
+                    "target": "Acquires x86 Perf",
+                    "value": int(
+                        summ.x86 * summ.fp16_onnx / (summ.x86 + summ.nvidia + summ.groq)
+                    ),
+                },
+            ],
+        }
+    }
+    st_echarts(
+        options=option,
+        height="70px",
+    )
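
Review note on `device_funnel`: the percentage labels and the three device link widths all divide by counts taken from the report (`summ.all_models`, and the sum of the x86, nvidia, and groq counts). If the report is empty or no device has a latency result, those divisions would raise `ZeroDivisionError`. A minimal sketch of the same arithmetic with a guard follows; the helper names `percent_of` and `split_proportionally` and the sample counts are assumptions made for illustration and are not part of graphs.py.

```python
def percent_of(count: int, total: int) -> int:
    """Return count as a truncated whole-number percentage of total (0 if total is 0)."""
    if total == 0:
        return 0
    return int(100 * count / total)


def split_proportionally(available: int, counts: dict) -> dict:
    """Split `available` across the keys of `counts` in proportion to each count."""
    denominator = sum(counts.values())
    if denominator == 0:
        return {name: 0 for name in counts}
    return {
        name: int(value * available / denominator)
        for name, value in counts.items()
    }


# Hypothetical counts, not taken from any real report.
label = f"{45} models - {percent_of(45, 60)}%"
links = split_proportionally(40, {"nvidia": 30, "groq": 20, "x86": 50})

print(label)
print(links)
```

With helpers like these, each `sk_val` entry and each device link value could be computed from a single call, which would also keep the repeated expressions in the `option` dictionary in step with each other.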