# AI Provider Test Report

**Generated:** 2026-04-05 02:23:10

**Test Duration:** 23.50s

## Summary

- **Total Tests:** 10
- **Passed:** 9
- **Failed:** 1
- **Success Rate:** 90.0%

## Test Results

### 1. Code Generation PASS

**Task Type:** code

**Provider:** nvidia

**Model:** llama-3.3-70b

**Duration:** 5.41s

**Prompt:**

```
Write a Python function to calculate fibonacci numbers recursively.
```

**Response:**

````
### Recursive Fibonacci Function in Python
#### Overview
The Fibonacci sequence is a series of numbers where a number is the sum of the two preceding ones, usually starting with 0 and 1. This function calculates the nth Fibonacci number using recursion.
#### Code
```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number recursively.
    Args:
        n (int): The position of the Fibonacci number to calculate.
    Returns:
        int: The nth Fibonacci number.
    Raises:
...
````

**Metadata:**

- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 429
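
The response above is truncated inside the docstring. For reference, a minimal complete version of the requested recursive function might look like the sketch below. It is not the model's full output, and raising `ValueError` for negative input is an assumption, since the truncated `Raises:` section does not show what the model chose.

```python
def fibonacci(n: int) -> int:
    """Return the nth Fibonacci number, with fibonacci(0) == 0 and fibonacci(1) == 1.

    Plain recursion: runtime grows exponentially, so this is only practical for small n.
    """
    if n < 0:
        # Assumption: reject negative positions rather than returning a sentinel value.
        raise ValueError("n must be a non-negative integer")
    if n < 2:
        # Base cases: F(0) = 0, F(1) = 1.
        return n
    # Recursive case: F(n) = F(n - 1) + F(n - 2).
    return fibonacci(n - 1) + fibonacci(n - 2)


print([fibonacci(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because each call branches into two more calls, wrapping the function with `functools.lru_cache` is the usual fix when larger `n` is needed.
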

---

### 2. Data Extraction PASS

**Task Type:** extraction

**Provider:** groq

**Model:** llama-3.3-70b-versatile

**Duration:** 0.78s

**Prompt:**

```
Extract the key information from this text: 'John Doe, age 35, lives in New York and works as a software engineer at Tech Corp since 2020.'
```

**Response:**

```
The key information extracted from the text is:
1. **Name**: John Doe
2. **Age**: 35
3. **Location**: New York
4. **Occupation**: Software Engineer
5. **Employer**: Tech Corp
6. **Employment Start Date**: 2020
```

**Metadata:**

- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 132
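
One way a harness could score an extraction test like this is to check that every expected value from the prompt appears somewhere in the response. The sketch below assumes a loose, case-insensitive substring match and a hypothetical `EXPECTED_FIELDS` mapping; the report does not say how the actual harness decided PASS.

```python
# Hypothetical expected values, taken from the prompt text above.
EXPECTED_FIELDS = {
    "name": "John Doe",
    "age": "35",
    "location": "New York",
    "occupation": "software engineer",
    "employer": "Tech Corp",
    "start_year": "2020",
}


def missing_fields(response: str, expected: dict[str, str]) -> list[str]:
    """Return the keys whose expected value does not appear in the response.

    Matching is a case-insensitive substring test, which is deliberately loose.
    """
    haystack = response.lower()
    return [key for key, value in expected.items() if value.lower() not in haystack]


response = """The key information extracted from the text is:
1. **Name**: John Doe
2. **Age**: 35
3. **Location**: New York
4. **Occupation**: Software Engineer
5. **Employer**: Tech Corp
6. **Employment Start Date**: 2020"""

print(missing_fields(response, EXPECTED_FIELDS))  # []
```
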

---

### 3. Reasoning Task PASS

**Task Type:** reasoning

**Provider:** nvidia

**Model:** devstral-2-123b

**Duration:** 5.25s

**Prompt:**

```
If a train travels 120 miles in 2 hours, and another train travels 180 miles in 3 hours, which train is faster and by how much?
```

**Response:**

```
To determine which train is faster and by how much, we'll calculate the speed of each train using the formula:
\[
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
\]
### first-train
- **Distance:** 120 miles
- **Time:** 2 hours
\[
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
\]
### second-train
- **Distance:** 180 miles
- **Time:** 3 hours
\[
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
\]
### comparison
Both tr...
```

**Metadata:**

- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 251
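
The response is cut off before its comparison, but the two speeds it derives already answer the prompt: 120 / 2 = 60 mph and 180 / 3 = 60 mph, so neither train is faster. A trivial check of the same Distance / Time arithmetic, using a hypothetical `average_speed` helper:

```python
def average_speed(distance_miles: float, time_hours: float) -> float:
    """Average speed = distance / time, in miles per hour."""
    if time_hours <= 0:
        raise ValueError("time must be positive")
    return distance_miles / time_hours


speed_1 = average_speed(120, 2)  # first train
speed_2 = average_speed(180, 3)  # second train
difference = abs(speed_1 - speed_2)

print(speed_1, speed_2, difference)  # 60.0 60.0 0.0
```
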

---

### 4. General Question PASS

**Task Type:** general

**Provider:** groq

**Model:** llama-3.3-70b-versatile

**Duration:** 0.92s

**Prompt:**

```
What are the three primary colors?
```

**Response:**

```
The three primary colors are:
1. Red
2. Blue
3. Yellow
These colors cannot be created by mixing other colors together, and they are the base for creating all other colors.
```

**Metadata:**

- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 83

---

### 5. JSON Generation PASS

**Task Type:** code

**Provider:** nvidia

**Model:** devstral-2-123b

**Duration:** 4.44s

**Prompt:**

```
Generate a JSON object representing a user profile with name, email, age, and interests (array).
```

**Response:**

````
Here's a JSON object representing a user profile with the specified fields:
```json
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
```
### explanation
- **"name"**: A string representing the user's full name.
- **"email"**: A string representing the user's email address.
- **"age"**: A number representing the user's age.
- **"interests"**: An array of strings listing the us...
````

**Metadata:**

- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 187
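
The prompt asked for `name`, `email`, `age`, and an `interests` array, and the model nested them under a `userProfile` key. A minimal sketch of how that output could be validated with the standard `json` module; the `REQUIRED` type rules and the `profile_errors` helper are assumptions about what such a test should require, not the harness's actual check.

```python
import json

# The JSON object from the response above, verbatim.
raw = """
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
"""

# Assumed schema: field name -> required Python type after json.loads.
REQUIRED = {"name": str, "email": str, "age": int, "interests": list}


def profile_errors(text: str) -> list[str]:
    """Parse the JSON and list any missing or mistyped profile fields."""
    data = json.loads(text)
    # Accept either a bare profile or one wrapped in "userProfile", as in this response.
    profile = data.get("userProfile", data)
    errors = []
    for field, expected_type in REQUIRED.items():
        if field not in profile:
            errors.append(f"missing: {field}")
        elif not isinstance(profile[field], expected_type):
            errors.append(f"wrong type: {field}")
    # The prompt implies interests is a list of strings.
    if isinstance(profile.get("interests"), list) and not all(
        isinstance(item, str) for item in profile["interests"]
    ):
        errors.append("interests must be strings")
    return errors


print(profile_errors(raw))  # []
```

Note that `isinstance(True, int)` is `True` in Python, so a stricter check would also exclude booleans for `age`.
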

---

### 6. Text Summarization PASS

**Task Type:** general

**Provider:** groq

**Model:** llama-3.3-70b-versatile

**Duration:** 0.33s

**Prompt:**

```
Summarize in one sentence: Artificial intelligence is transforming industries by automating tasks, improving decision-making, and creating new opportunities for innovation.
```

**Response:**

```
Artificial intelligence is revolutionizing various industries by automating tasks, enhancing decision-making, and driving innovation, ultimately transforming the way businesses operate and creating new opportunities.
```

**Metadata:**

- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 96

---

### 7. Math Problem PASS

**Task Type:** reasoning

**Provider:** nvidia

**Model:** llama-3.3-70b

**Duration:** 2.71s

**Prompt:**

```
Solve: 2x + 5 = 15. Show your work.
```

**Response:**

```
To solve for x, we need to isolate x on one side of the equation.
2x + 5 = 15
First, subtract 5 from both sides of the equation:
2x + 5 - 5 = 15 - 5
2x = 10
Next, divide both sides of the equation by 2:
2x / 2 = 10 / 2
x = 5
Therefore, the value of x is 5.
```

**Metadata:**

- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 152
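
The two steps in the response (subtract 5, then divide by 2) are the general recipe for any equation of the form ax + b = c. A small sketch that applies them with exact fractions and substitutes the result back as a check; `solve_linear` is a hypothetical helper, not part of the test harness.

```python
from fractions import Fraction


def solve_linear(a: int, b: int, c: int) -> Fraction:
    """Solve a*x + b = c for x, mirroring the two steps in the response."""
    if a == 0:
        raise ValueError("a must be non-zero for a unique solution")
    rhs = Fraction(c - b)  # step 1: subtract b from both sides -> a*x = c - b
    return rhs / a         # step 2: divide both sides by a     -> x = (c - b) / a


x = solve_linear(2, 5, 15)
assert 2 * x + 5 == 15  # substitute back into the original equation
print(x)  # 5
```
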

---

### 8. Creative Writing PASS

**Task Type:** general

**Provider:** nvidia

**Model:** llama-3.3-70b

**Duration:** 1.33s

**Prompt:**

```
Write a haiku about coding at night.
```

**Response:**

```
Moonlit code unfurls
Midnight keyboard symphony
Glowing screen delight
```

**Metadata:**

- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 62

---

### 9. Code Debug PASS

**Task Type:** code

**Provider:** groq

**Model:** llama-3.3-70b-versatile

**Duration:** 0.76s

**Prompt:**

```
Find the bug in this code: def add(a, b): return a + b + 1
```

**Response:**

````
**Bug Identification**
The bug in the given code is that the function `add(a, b)` is supposed to return the sum of two numbers `a` and `b`, but it is currently adding an extra `1` to the result.
**Corrected Code**
```python
def add(a, b):
    """Returns the sum of two numbers."""
    return a + b
```
**Explanation**
The original code `return a + b + 1` is incrementing the sum by `1`, which is not the expected behavior of an `add` function. The corrected code simply returns the sum of `a` an...
````

**Metadata:**

- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 219
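
A quick way to confirm both the bug and the fix is a pair of assertions on known sums. `add_buggy` is simply the original one-liner from the prompt, renamed (for illustration) so both versions can sit side by side.

```python
def add_buggy(a, b):
    """Original code from the prompt: adds an extra 1."""
    return a + b + 1


def add(a, b):
    """Corrected version from the response: returns the plain sum."""
    return a + b


# (a, b, expected sum) cases exercised against both versions.
cases = [(0, 0, 0), (2, 3, 5), (-4, 4, 0)]

for a, b, expected in cases:
    assert add(a, b) == expected            # fixed version returns the true sum
    assert add_buggy(a, b) == expected + 1  # buggy version is always off by exactly one

print("all cases passed")
```
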

---

### 10. Complex Reasoning FAIL

**Task Type:** reasoning

**Provider:** nvidia

**Model:** devstral-2-123b

**Duration:** 1.56s

**Prompt:**

```
If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly?
```

**Error:**

```
[router] All models failed. Last error: [nvidia] NVIDIA API error (500): {"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"Internal Server Error","param":null,"code":500}}
```
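
The error text suggests the router tries candidate models in turn and, once every attempt has failed, reports only the last error (here an HTTP 500 from NVIDIA). The sketch below illustrates that fallback pattern under stated assumptions: `route`, `RouterError`, `failing_nvidia`, and the one-callable-per-model interface are hypothetical, and the report does not show which models the real router tried or whether it retries.

```python
from typing import Callable, Optional, Sequence, Tuple


class RouterError(RuntimeError):
    """Raised when every candidate model fails (hypothetical name)."""


def route(prompt: str, models: Sequence[Tuple[str, Callable[[str], str]]]) -> str:
    """Try each (model name, call) pair in order; return the first successful reply.

    Only the most recent exception is kept, matching the "Last error:" wording
    in the report; earlier failures are discarded.
    """
    last_error: Optional[Exception] = None
    for _name, call in models:
        try:
            return call(prompt)
        except Exception as exc:  # any provider failure falls through to the next model
            last_error = exc
    raise RouterError(f"[router] All models failed. Last error: {last_error}")


def failing_nvidia(prompt: str) -> str:
    """Stand-in for a provider call that returns HTTP 500."""
    raise ConnectionError("[nvidia] NVIDIA API error (500)")


try:
    route("Can we conclude that some roses fade quickly?", [("devstral-2-123b", failing_nvidia)])
except RouterError as err:
    print(err)  # [router] All models failed. Last error: [nvidia] NVIDIA API error (500)
```

Keeping only the last error keeps the message short, but it hides earlier failures; a router that collected every exception would make multi-model outages easier to diagnose.
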

---

## Provider Performance

| Provider | Tests | Passed | Failed | Success Rate | Avg Duration |
|----------|-------|--------|--------|--------------|--------------|
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
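
The provider table can be recomputed from the per-test results listed above, which is a useful sanity check on the summary figures. A minimal sketch; the `RESULTS` tuple layout and the `provider_stats` helper are assumptions for illustration, not the harness's actual data format.

```python
from collections import defaultdict

# (test number, provider, passed, duration in seconds), copied from the results above.
RESULTS = [
    (1, "nvidia", True, 5.41),
    (2, "groq", True, 0.78),
    (3, "nvidia", True, 5.25),
    (4, "groq", True, 0.92),
    (5, "nvidia", True, 4.44),
    (6, "groq", True, 0.33),
    (7, "nvidia", True, 2.71),
    (8, "nvidia", True, 1.33),
    (9, "groq", True, 0.76),
    (10, "nvidia", False, 1.56),
]


def provider_stats(results):
    """Group results by provider; return counts, success rate (%) and mean duration (s)."""
    grouped = defaultdict(list)
    for _test, provider, passed, duration in results:
        grouped[provider].append((passed, duration))

    stats = {}
    for provider, rows in sorted(grouped.items()):
        tests = len(rows)
        passed = sum(1 for ok, _ in rows if ok)
        stats[provider] = {
            "tests": tests,
            "passed": passed,
            "failed": tests - passed,
            "success_rate": round(100 * passed / tests, 1),
            "avg_duration": sum(d for _, d in rows) / tests,  # unrounded mean
        }
    return stats


for provider, s in provider_stats(RESULTS).items():
    print(f"{provider}: {s['passed']}/{s['tests']} passed, "
          f"{s['success_rate']:.1f}%, avg {s['avg_duration']:.2f}s")
```

Summing all ten durations gives 23.49s, which agrees with the reported **Test Duration** of 23.50s to within the rounding of the individual durations.
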

## Document Flow

```mermaid
flowchart TD
    A[document] --> B[key-sections]
    B --> C[implementation]
    B --> D[operations]
    B --> E[validation]
```

## Related API Reference

| Item | Value |
| --- | --- |
| api-reference | `api-reference.md` |