Commit 027eb28 by enemydw · verified · 1 Parent(s): dc8e756

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -23,9 +23,9 @@ To evaluate both the general reasoning abilities and the domain-specific perform
  | Llama-3-8B (base) | 51.4% | 59.9% | 73.1% |
  | Hobby_Recommendation model | 53.7% | 59.9% | 73.2% |
  | Falcon-7B-Instruct | 40.2% | 57.7% | 67.6% |
- | Mistral-7B-Instruct | --.-% | --.-% | --.-% |

- **Model Performance Summary**

  ## Usage and Intended Uses
 
  | Llama-3-8B (base) | 51.4% | 59.9% | 73.1% |
  | Hobby_Recommendation model | 53.7% | 59.9% | 73.2% |
  | Falcon-7B-Instruct | 40.2% | 57.7% | 67.6% |
+ | Mistral-7B-Instruct | 49.8% | 56.3% | 69.6% |

+ The fine-tuned Hobby_Recommendation model showed improvements over the base LLaMA-3.1-8B model on some benchmarks and maintained similar performance on HellaSwag. Both Falcon-7B-Instruct and Mistral-7B-Instruct scored lower than the base model on all three benchmarks in the table. While general reasoning ability remained stable, the fine-tuned model performed especially well on hobby-related prompts, suggesting that training on personalized synthetic data helped the model better handle specific recommendation tasks.

  ## Usage and Intended Uses
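
The comparison made in the added summary paragraph can be checked against the table rows with a few lines of Python. This is a minimal sketch and not part of the commit: the column headers fall outside this hunk, so scores are kept as unnamed positional tuples, and `delta_vs_base` is a hypothetical helper introduced only for illustration.

```python
# Scores copied from the table rows in the diff. The benchmark names for the
# three columns are not visible in this hunk, so they are left positional.
scores = {
    "Llama-3-8B (base)": (51.4, 59.9, 73.1),
    "Hobby_Recommendation model": (53.7, 59.9, 73.2),
    "Falcon-7B-Instruct": (40.2, 57.7, 67.6),
    "Mistral-7B-Instruct": (49.8, 56.3, 69.6),
}


def delta_vs_base(table, base="Llama-3-8B (base)"):
    """Return per-column score differences (in percentage points) vs. the base row."""
    base_row = table[base]
    return {
        name: tuple(round(s - b, 1) for s, b in zip(row, base_row))
        for name, row in table.items()
        if name != base
    }


deltas = delta_vs_base(scores)
for name, diff in deltas.items():
    # Print signed differences, e.g. "+2.3" or "-1.6".
    print(f"{name}: " + ", ".join(f"{d:+.1f}" for d in diff))
```

With the numbers as listed, the fine-tuned model differs from the base by +2.3, 0.0, and +0.1 points, while Falcon-7B-Instruct and Mistral-7B-Instruct are below the base in every column.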