	# AI Provider Test Report

	Generated: 2026-04-05 02:23:10
	Test Duration: 23.50s

	## Summary

	- Total Tests: 10
	- Passed: 9
	- Failed: 1
	- Success Rate: 90.0%
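
The success rate follows directly from the pass and total counts. A minimal sketch of that arithmetic, where `success_rate` is an illustrative helper name and not something taken from the test harness:

```python
def success_rate(passed: int, total: int) -> float:
    """Return the pass percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * passed / total, 1)


# Counts taken from the summary: 9 of 10 tests passed.
print(f"{success_rate(9, 10):.1f}%")  # 90.0%
```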

	## Test Results

	### 1. Code Generation PASS

	Task Type: code
	Provider: nvidia
	Model: llama-3.3-70b
	Duration: 5.41s

	Prompt:
	```
	Write a Python function to calculate fibonacci numbers recursively.
	```

	Response:
	````
	### Recursive Fibonacci Function in Python
	#### Overview

	The Fibonacci sequence is a series of numbers where a number is the sum of the two preceding ones, usually starting with 0 and 1. This function calculates the nth Fibonacci number using recursion.

	#### Code

	```python
	def fibonacci(n):
	    """
	    Calculate the nth Fibonacci number recursively.

	    Args:
	        n (int): The position of the Fibonacci number to calculate.

	    Returns:
	        int: The nth Fibonacci number.

	    Raises:
	...
	````

	Metadata:
	- model_used: llama-3.3-70b
	- provider_used: nvidia
	- tokens: 429

	---

	### 2. Data Extraction PASS

	Task Type: extraction
	Provider: groq
	Model: llama-3.3-70b-versatile
	Duration: 0.78s

	Prompt:
	```
	Extract the key information from this text: 'John Doe, age 35, lives in New York and works as a software engineer at Tech Corp since 2020.'
	```

	Response:
	```
	The key information extracted from the text is:

	1. Name: John Doe
	2. Age: 35
	3. Location: New York
	4. Occupation: Software Engineer
	5. Employer: Tech Corp
	6. Employment Start Date: 2020
	```

	Metadata:
	- model_used: llama-3.3-70b-versatile
	- provider_used: groq
	- tokens: 132

	---

	### 3. Reasoning Task PASS

	Task Type: reasoning
	Provider: nvidia
	Model: devstral-2-123b
	Duration: 5.25s

	Prompt:
	```
	If a train travels 120 miles in 2 hours, and another train travels 180 miles in 3 hours, which train is faster and by how much?
	```

	Response:
	```
	To determine which train is faster and by how much, we'll calculate the speed of each train using the formula:

	\[
	\text{Speed} = \frac{\text{Distance}}{\text{Time}}
	\]

	### first-train
	- Distance: 120 miles
	- Time: 2 hours

	\[
	\text{Speed}_1 = \frac{120 \text{ miles}}{2 \text{ hours}} = 60 \text{ mph}
	\]

	### second-train
	- Distance: 180 miles
	- Time: 3 hours

	\[
	\text{Speed}_2 = \frac{180 \text{ miles}}{3 \text{ hours}} = 60 \text{ mph}
	\]

	### comparison
	Both tr...
	```

	Metadata:
	- model_used: devstral-2-123b
	- provider_used: nvidia
	- tokens: 251
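
The two speed calculations visible in the response can be checked with a few lines of arithmetic. This is only a verification sketch; `speed_mph` is an illustrative name and the distances and times come from the prompt:

```python
def speed_mph(distance_miles: float, hours: float) -> float:
    """Average speed in miles per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return distance_miles / hours


first = speed_mph(120, 2)
second = speed_mph(180, 3)
print(first, second, first - second)  # 60.0 60.0 0.0
```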

	---

	### 4. General Question PASS

	Task Type: general
	Provider: groq
	Model: llama-3.3-70b-versatile
	Duration: 0.92s

	Prompt:
	```
	What are the three primary colors?
	```

	Response:
	```
	The three primary colors are:
	1. Red
	2. Blue
	3. Yellow

	These colors cannot be created by mixing other colors together, and they are the base for creating all other colors.
	```

	Metadata:
	- model_used: llama-3.3-70b-versatile
	- provider_used: groq
	- tokens: 83

	---

	### 5. JSON Generation PASS

	Task Type: code
	Provider: nvidia
	Model: devstral-2-123b
	Duration: 4.44s

	Prompt:
	```
	Generate a JSON object representing a user profile with name, email, age, and interests (array).
	```

	Response:
	````
	Here's a JSON object representing a user profile with the specified fields:

	```json
	{
	  "userProfile": {
	    "name": "John Doe",
	    "email": "john.doe@example.com",
	    "age": 30,
	    "interests": ["hiking", "photography", "reading", "cooking"]
	  }
	}
	```

	### explanation
	- "name": A string representing the user's full name.
	- "email": A string representing the user's email address.
	- "age": A number representing the user's age.
	- "interests": An array of strings listing the us...
	````

	Metadata:
	- model_used: devstral-2-123b
	- provider_used: nvidia
	- tokens: 187
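
One way to check a response like this automatically is to parse it and confirm that the four requested fields are present with the expected types. The sketch below uses only the standard `json` module; `check_profile` and `EXPECTED_TYPES` are illustrative names, and the report does not say how the harness actually decides PASS or FAIL:

```python
import json

# The JSON object as it appears in the response above.
RAW = """
{
  "userProfile": {
    "name": "John Doe",
    "email": "john.doe@example.com",
    "age": 30,
    "interests": ["hiking", "photography", "reading", "cooking"]
  }
}
"""

# Fields requested by the prompt, with the Python types they should parse to.
EXPECTED_TYPES = {"name": str, "email": str, "age": int, "interests": list}


def check_profile(raw: str) -> dict:
    """Parse the payload and verify the requested fields and their types."""
    profile = json.loads(raw)["userProfile"]
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(profile.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not {expected.__name__}")
    if not all(isinstance(item, str) for item in profile["interests"]):
        raise ValueError("interests must be an array of strings")
    return profile


profile = check_profile(RAW)
print(sorted(profile))  # ['age', 'email', 'interests', 'name']
```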

	---

	### 6. Text Summarization PASS

	Task Type: general
	Provider: groq
	Model: llama-3.3-70b-versatile
	Duration: 0.33s

	Prompt:
	```
	Summarize in one sentence: Artificial intelligence is transforming industries by automating tasks, improving decision-making, and creating new opportunities for innovation.
	```

	Response:
	```
	Artificial intelligence is revolutionizing various industries by automating tasks, enhancing decision-making, and driving innovation, ultimately transforming the way businesses operate and creating new opportunities.
	```

	Metadata:
	- model_used: llama-3.3-70b-versatile
	- provider_used: groq
	- tokens: 96

	---

	### 7. Math Problem PASS

	Task Type: reasoning
	Provider: nvidia
	Model: llama-3.3-70b
	Duration: 2.71s

	Prompt:
	```
	Solve: 2x + 5 = 15. Show your work.
	```

	Response:
	```
	To solve for x, we need to isolate x on one side of the equation.

	2x + 5 = 15

	First, subtract 5 from both sides of the equation:
	2x + 5 - 5 = 15 - 5
	2x = 10

	Next, divide both sides of the equation by 2:
	2x / 2 = 10 / 2
	x = 5

	Therefore, the value of x is 5.
	```

	Metadata:
	- model_used: llama-3.3-70b
	- provider_used: nvidia
	- tokens: 152
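
The answer can be verified by substituting it back into the equation. A small sketch using exact rational arithmetic; `solve_linear` is an illustrative helper, not part of the report:

```python
from fractions import Fraction


def solve_linear(a: int, b: int, c: int) -> Fraction:
    """Solve a*x + b = c for x: subtract b from both sides, then divide by a."""
    if a == 0:
        raise ValueError("a must be non-zero")
    return Fraction(c - b, a)


x = solve_linear(2, 5, 15)
assert 2 * x + 5 == 15  # substitute back into the original equation
print(x)  # 5
```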

	---

	### 8. Creative Writing PASS

	Task Type: general
	Provider: nvidia
	Model: llama-3.3-70b
	Duration: 1.33s

	Prompt:
	```
	Write a haiku about coding at night.
	```

	Response:
	```
	Moonlit code unfurls
	Midnight keyboard symphony
	Glowing screen delight
	```

	Metadata:
	- model_used: llama-3.3-70b
	- provider_used: nvidia
	- tokens: 62

	---

	### 9. Code Debug PASS

	Task Type: code
	Provider: groq
	Model: llama-3.3-70b-versatile
	Duration: 0.76s

	Prompt:
	```
	Find the bug in this code: def add(a, b): return a + b + 1
	```

	Response:
	````
	Bug Identification

	The bug in the given code is that the function `add(a, b)` is supposed to return the sum of two numbers `a` and `b`, but it is currently adding an extra `1` to the result.

	Corrected Code

	```python
	def add(a, b):
	    """Returns the sum of two numbers."""
	    return a + b
	```

	Explanation

	The original code `return a + b + 1` is incrementing the sum by `1`, which is not the expected behavior of an `add` function. The corrected code simply returns the sum of `a` an...
	````

	Metadata:
	- model_used: llama-3.3-70b-versatile
	- provider_used: groq
	- tokens: 219
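
A short regression check makes the difference between the two versions concrete. In this sketch the function from the prompt is renamed `buggy_add` (an illustrative name) so that it can sit next to the corrected `add`:

```python
def buggy_add(a, b):
    # The function from the prompt: adds an extra 1 to every result.
    return a + b + 1


def add(a, b):
    """Returns the sum of two numbers."""
    return a + b


for a, b in [(0, 0), (2, 3), (-1, 1)]:
    assert add(a, b) == a + b            # corrected version matches the true sum
    assert buggy_add(a, b) == a + b + 1  # original version is always off by one
```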

	---

	### 10. Complex Reasoning FAIL

	Task Type: reasoning
	Provider: nvidia
	Model: devstral-2-123b
	Duration: 1.56s

	Prompt:
	```
	If all roses are flowers, and some flowers fade quickly, can we conclude that some roses fade quickly?
	```

	Error:
	```
	[router] All models failed. Last error: [nvidia] NVIDIA API error (500): {"error":{"message":"EngineCore encountered an issue. See stack trace (above) for the root cause.","type":"Internal Server Error","param":null,"code":500}}
	```
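
The error text suggests a router that tries candidate models in turn and reports the last failure once none succeed. The sketch below is an assumption about how such a message could arise, not the scrapeRL router itself: `route`, `AllModelsFailedError`, and the stub callables are all hypothetical, and the report does not show whether a fallback provider was attempted for this test.

```python
from typing import Callable, Sequence, Tuple


class AllModelsFailedError(RuntimeError):
    """Raised when every candidate has failed (hypothetical name)."""


def route(prompt: str, candidates: Sequence[Tuple[str, Callable[[str], str]]]) -> str:
    """Try each (provider, call) pair in order and return the first success."""
    last_error = None
    for provider, call in candidates:
        try:
            return call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            last_error = f"[{provider}] {exc}"
    raise AllModelsFailedError(
        f"[router] All models failed. Last error: {last_error}"
    )


def failing_nvidia(prompt: str) -> str:
    # Stub that simulates a provider-side HTTP 500.
    raise RuntimeError("NVIDIA API error (500)")


def stub_groq(prompt: str) -> str:
    # Stub that simulates a healthy provider.
    return "stub answer"


# With a healthy fallback in the list, the request still succeeds.
print(route("question", [("nvidia", failing_nvidia), ("groq", stub_groq)]))
```

With only the failing stub in the candidate list, `route` raises `AllModelsFailedError` carrying the last provider error, which mirrors the shape of the message logged above.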

	---

	## Provider Performance

	| Provider | Tests | Passed | Failed | Success Rate | Avg Duration |
	|----------|-------|--------|--------|--------------|--------------|
	| groq | 4 | 4 | 0 | 100.0% | 0.70s |
	| nvidia | 6 | 5 | 1 | 83.3% | 3.45s |
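
The per-provider figures can be recomputed from the ten test entries in this report. A minimal sketch, where `RESULTS` and `summarize` are illustrative names and not part of the test harness:

```python
from collections import defaultdict

# (provider, duration in seconds, passed) for tests 1-10, copied from this report.
RESULTS = [
    ("nvidia", 5.41, True),
    ("groq", 0.78, True),
    ("nvidia", 5.25, True),
    ("groq", 0.92, True),
    ("nvidia", 4.44, True),
    ("groq", 0.33, True),
    ("nvidia", 2.71, True),
    ("nvidia", 1.33, True),
    ("groq", 0.76, True),
    ("nvidia", 1.56, False),
]


def summarize(results):
    """Group results by provider and compute counts, pass rate, and mean duration."""
    grouped = defaultdict(list)
    for provider, duration, passed in results:
        grouped[provider].append((duration, passed))

    summary = {}
    for provider, rows in grouped.items():
        tests = len(rows)
        passed = sum(1 for _, ok in rows if ok)
        summary[provider] = {
            "tests": tests,
            "passed": passed,
            "failed": tests - passed,
            "success_rate": round(100.0 * passed / tests, 1),
            "avg_duration": sum(d for d, _ in rows) / tests,  # unrounded mean
        }
    return summary


for provider, stats in sorted(summarize(RESULTS).items()):
    print(provider, stats["tests"], stats["passed"], stats["failed"], stats["success_rate"])
```

The mean durations work out to 2.79s / 4, or about 0.70s, for groq and 20.70s / 6 = 3.45s for nvidia, in line with the table.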

	## Document Flow

	```mermaid
	flowchart TD
	A[document] --> B[key-sections]
	B --> C[implementation]
	B --> D[operations]
	B --> E[validation]
	```

	## Related API Reference

	| Item | Value |
	| --- | --- |
	| api-reference | `api-reference.md` |