Update README.md
@@ -23,9 +23,9 @@ To evaluate both the general reasoning abilities and the domain-specific perform
 | Llama-3-8B (base) | 51.4% | 59.9% | 73.1% |
 | Hobby_Recommendation model | 53.7% | 59.9% | 73.2% |
 | Falcon-7B-Instruct | 40.2% | 57.7% | 67.6% |
-| Mistral-7B-Instruct |
+| Mistral-7B-Instruct | 49.8% | 56.3% | 69.6% |
 
-
+The fine-tuned Hobby_Recommendation model showed improvements over the base Llama-3-8B model on some benchmarks and maintained similar performance on HellaSwag. Both Falcon-7B-Instruct and Mistral-7B-Instruct scored below the base model on all three benchmarks. While general reasoning ability remained stable, the fine-tuned model performed especially well on hobby-related prompts, suggesting that training on personalized synthetic data helped it handle specific recommendation tasks.
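The per-benchmark deltas the new paragraph describes can be sanity-checked with a small illustrative Python snippet. The scores are transcribed from the table above; the three benchmark column names are not visible in this hunk (the summary only identifies the second as HellaSwag), so the lists are kept positional:

```python
# Scores transcribed from the README's benchmark table.
# Benchmark column names aren't shown in this diff hunk, so the
# three values per model are kept positional (column 2 is
# HellaSwag per the summary paragraph).
scores = {
    "Llama-3-8B (base)":          [51.4, 59.9, 73.1],
    "Hobby_Recommendation model": [53.7, 59.9, 73.2],
    "Falcon-7B-Instruct":         [40.2, 57.7, 67.6],
    "Mistral-7B-Instruct":        [49.8, 56.3, 69.6],
}

base = scores["Llama-3-8B (base)"]
for name, vals in scores.items():
    # Difference vs. the base model, rounded to one decimal place.
    deltas = [round(v - b, 1) for v, b in zip(vals, base)]
    print(f"{name}: {deltas}")
```

Printing the deltas confirms the paragraph's claims: the fine-tuned model gains +2.3 and +0.1 points on two benchmarks and is flat (0.0) on HellaSwag, while Falcon-7B-Instruct and Mistral-7B-Instruct are negative on every column.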
## Usage and Intended Uses