# AI Provider Test Report
**Generated:** 2026-04-05 02:23:10
**Test Duration:** 23.50s
## Summary
- **Total Tests:** 10
- **Passed:** 9
- **Failed:** 1
- **Success Rate:** 90.0%
## Test Results
### 1. Code Generation PASS
**Task Type:** code
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 5.41s
**Prompt:**
```
Write a Python function to calculate fibonacci numbers recursively.
```
**Response:**
````
### Recursive Fibonacci Function in Python
#### Overview
The Fibonacci sequence is a series of numbers where a number is the sum of the two preceding ones, usually starting with 0 and 1. This function calculates the nth Fibonacci number using recursion.
#### Code
```python
def fibonacci(n):
    """
    Calculate the nth Fibonacci number recursively.
    Args:
        n (int): The position of the Fibonacci number to calculate.
    Returns:
        int: The nth Fibonacci number.
    Raises:
...
````
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 429
---
### 2. Data Extraction PASS
**Task Type:** extraction
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.78s
**Prompt:**
```
Extract the key information from this text: 'John Doe, age 35, lives in New York and works as a software engineer at Tech Corp since 2020.'
```
**Response:**
```
The key information extracted from the text is:
1. **Name**: John Doe
2. **Age**: 35
3. **Location**: New York
4. **Occupation**: Software Engineer
5. **Employer**: Tech Corp
6. **Employment Start Date**: 2020
```
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 132
---
### 3. Reasoning Task PASS
**Task Type:** reasoning
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 5.25s
**Prompt:**
```
If a train travels 120 miles in 2 hours, and another train travels 180 miles in 3 hours, which train is faster and by how much?
```
**Response:**
```
To determine which train is faster and by how much, we'll calculate the speed of each train using the formula:
\[
\text{Speed} = \frac{\text{Distance}}{\text{Time}}
\]
### first-train
- **Distance:** 120 miles
- **Time:** 2 hours
\[
\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
\]
### second-train
- **Distance:** 180 miles
- **Time:** 3 hours
\[
\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
\]
### comparison
Both tr...
```
**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 251
---
### 4. General Question PASS
**Task Type:** general
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.92s
**Prompt:**
```
What are the three primary colors?
```
**Response:**
```
The three primary colors are:
1. Red
2. Blue
3. Yellow
These colors cannot be created by mixing other colors together, and they are the base for creating all other colors.
```
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 83
---
### 5. JSON Generation PASS
**Task Type:** code
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 4.44s
**Prompt:**
```
Generate a JSON object representing a user profile with name, email, age, and interests (array).
```
**Response:**
````
Here's a JSON object representing a user profile with the specified fields:
```json
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
```
### explanation
- **"name"**: A string representing the user's full name.
- **"email"**: A string representing the user's email address.
- **"age"**: A number representing the user's age.
- **"interests"**: An array of strings listing the us...
````
**Metadata:**
- model_used: devstral-2-123b
- provider_used: nvidia
- tokens: 187
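
The report does not say how a PASS is decided for this test. As a minimal sketch of one plausible check, the snippet below parses the JSON body quoted in the response and confirms that the four fields named in the prompt are present with the expected types. `REQUIRED_FIELDS` and `missing_or_mistyped` are hypothetical names introduced for illustration, not part of the test harness.

```python
import json

# The JSON body quoted in the response above.
response_json = """
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
"""

# Fields and types requested by the prompt (hypothetical check, not the harness's own).
REQUIRED_FIELDS = {"name": str, "email": str, "age": int, "interests": list}


def missing_or_mistyped(profile: dict) -> list[str]:
    """Return the names of required fields that are absent or have the wrong type."""
    return [
        field
        for field, expected in REQUIRED_FIELDS.items()
        if not isinstance(profile.get(field), expected)
    ]


profile = json.loads(response_json)["userProfile"]
problems = missing_or_mistyped(profile)
print(problems)  # an empty list means every requested field is present
```

A structural check like this only covers the shape of the output; it says nothing about whether the surrounding explanation in the response is accurate.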
---
### 6. Text Summarization PASS
**Task Type:** general
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.33s
**Prompt:**
```
Summarize in one sentence: Artificial intelligence is transforming industries by automating tasks, improving decision-making, and creating new opportunities for innovation.
```
**Response:**
```
Artificial intelligence is revolutionizing various industries by automating tasks, enhancing decision-making, and driving innovation, ultimately transforming the way businesses operate and creating new opportunities.
```
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 96
---
### 7. Math Problem PASS
**Task Type:** reasoning
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 2.71s
**Prompt:**
```
Solve: 2x + 5 = 15. Show your work.
```
**Response:**
```
To solve for x, we need to isolate x on one side of the equation.
2x + 5 = 15
First, subtract 5 from both sides of the equation:
2x + 5 - 5 = 15 - 5
2x = 10
Next, divide both sides of the equation by 2:
2x / 2 = 10 / 2
x = 5
Therefore, the value of x is 5.
```
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 152
---
### 8. Creative Writing PASS
**Task Type:** general
**Provider:** nvidia
**Model:** llama-3.3-70b
**Duration:** 1.33s
**Prompt:**
```
Write a haiku about coding at night.
```
**Response:**
```
Moonlit code unfurls
Midnight keyboard symphony
Glowing screen delight
```
**Metadata:**
- model_used: llama-3.3-70b
- provider_used: nvidia
- tokens: 62
---
### 9. Code Debug PASS
**Task Type:** code
**Provider:** groq
**Model:** llama-3.3-70b-versatile
**Duration:** 0.76s
**Prompt:**
```
Find the bug in this code: def add(a, b): return a + b + 1
```
**Response:**
````
**Bug Identification**
The bug in the given code is that the function `add(a, b)` is supposed to return the sum of two numbers `a` and `b`, but it is currently adding an extra `1` to the result.
**Corrected Code**
```python
def add(a, b):
    """Returns the sum of two numbers."""
    return a + b
```
**Explanation**
The original code `return a + b + 1` is incrementing the sum by `1`, which is not the expected behavior of an `add` function. The corrected code simply returns the sum of `a` an...
````
**Metadata:**
- model_used: llama-3.3-70b-versatile
- provider_used: groq
- tokens: 219
---
### 10. Complex Reasoning FAIL
**Task Type:** reasoning
**Provider:** nvidia
**Model:** devstral-2-123b
**Duration:** 1.56s
**Prompt:**
```
If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly?
```
**Error:**
```
[router] All models failed. Last error: [nvidia] NVIDIA API error (500): {"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"Internal Server Error","param":null,"code":500}}
```
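
The error message shows the router giving up after its last candidate returned a 500, but the report does not show the router's implementation or which fallbacks were configured. The sketch below illustrates the general pattern under stated assumptions: `ProviderError`, `route`, and the two stub providers are hypothetical names, and the candidate list is invented for the example.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""

    def __init__(self, provider: str, status: int, message: str):
        super().__init__(f"[{provider}] API error ({status}): {message}")
        self.provider = provider
        self.status = status


def route(prompt: str, candidates: Sequence[tuple[str, Callable[[str], str]]]) -> str:
    """Try each (provider, call) pair in order and return the first successful response."""
    last_error = None
    for _provider, call in candidates:
        try:
            return call(prompt)
        except ProviderError as err:
            last_error = err  # remember the failure and move on to the next candidate
    raise RuntimeError(f"[router] All models failed. Last error: {last_error}")


# Stub providers standing in for real API clients (assumptions for this example).
def failing_nvidia(prompt: str) -> str:
    raise ProviderError("nvidia", 500, "Internal Server Error")


def working_groq(prompt: str) -> str:
    return "stub answer"


result = route("example prompt", [("nvidia", failing_nvidia), ("groq", working_groq)])
print(result)
```

With a second provider in the candidate list, a single upstream 500 is absorbed by the fallback. With only the failing candidate, `route` raises the same kind of "All models failed" error seen in this test.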
---
## Provider Performance
| Provider | Tests | Passed | Failed | Success Rate | Avg Duration |
|----------|-------|--------|--------|--------------|-------------|
| groq | 4 | 4 | 0 | 100.0% | 0.70s |
| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
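
The figures in this table can be recomputed from the per-test provider, status, and duration values listed under each test above. The following is a minimal sketch of that aggregation; `RESULTS` and `summarize` are illustrative names, not the report generator's own code.

```python
from collections import defaultdict

# (provider, passed, duration in seconds) for the ten tests in this report.
RESULTS = [
    ("nvidia", True, 5.41),
    ("groq", True, 0.78),
    ("nvidia", True, 5.25),
    ("groq", True, 0.92),
    ("nvidia", True, 4.44),
    ("groq", True, 0.33),
    ("nvidia", True, 2.71),
    ("nvidia", True, 1.33),
    ("groq", True, 0.76),
    ("nvidia", False, 1.56),
]


def summarize(results):
    """Group results by provider and compute counts, success rate, and mean duration."""
    grouped = defaultdict(list)
    for provider, passed, duration in results:
        grouped[provider].append((passed, duration))

    summary = {}
    for provider, rows in grouped.items():
        passed = sum(1 for ok, _ in rows if ok)
        summary[provider] = {
            "tests": len(rows),
            "passed": passed,
            "failed": len(rows) - passed,
            "success_rate": 100.0 * passed / len(rows),
            "avg_duration": sum(d for _, d in rows) / len(rows),
        }
    return summary


for provider, stats in sorted(summarize(RESULTS).items()):
    print(
        f"{provider}: {stats['passed']}/{stats['tests']} passed, "
        f"{stats['success_rate']:.1f}%, avg {stats['avg_duration']:.2f}s"
    )
```

Note that the nvidia average of 3.45s only matches when the failed test's 1.56s is included, so the table's Avg Duration covers all tests, not just passing ones.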
## Document Flow
```mermaid
flowchart TD
A[document] --> B[key-sections]
B --> C[implementation]
B --> D[operations]
B --> E[validation]
```
## Related API Reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |