enemydw committed
Commit 5eaab69 · verified · 1 Parent(s): 4c2566c

Update README.md

Files changed (1):
  1. README.md +7 -7
README.md CHANGED

@@ -18,12 +18,12 @@ The base LLaMA 3.1-8B model was fine-tuned using the parameter-efficient Low-Ran
 ## Evaluation
 
 To evaluate both the general reasoning abilities and the domain-specific performance of the model, three benchmarks from the lm_eval suite were selected: ARC-Challenge, HellaSwag, and Winogrande. ARC-Challenge tests the model's advanced reasoning and problem-solving skills on complex grade-school science questions. HellaSwag measures the ability to select contextually appropriate continuations, which reflects commonsense reasoning. Winogrande evaluates how well the model understands language and context by testing its ability to determine which person or object a pronoun (like “he,” “she,” or “it”) refers to in challenging sentences. These benchmarks were chosen because they are standard in the field for assessing language model capabilities and provide a meaningful comparison of general language understanding, reasoning, and commonsense knowledge. Additionally, a domain-specific Hobby Test Set was included to measure how well the model generates personalized hobby recommendations.
-| Model Name                 | ARC-Challenge | HellaSwag | Winogrande | Hobby Test Set |
-|----------------------------|:-------------:|:---------:|:----------:|:--------------:|
-| Llama-3-8B (base)          |     51.4%     |   59.9%   |   73.1%    |     62.1%      |
-| Hobby_Recommendation model |     53.7%     |   59.9%   |   73.2%    |     88.5%      |
-| Falcon-7B-Instruct         |     --.-%     |   --.-%   |   --.-%    |     --.-%      |
-| Mistral-7B-Instruct        |     --.-%     |   --.-%   |   --.-%    |     --.-%      |
+| Model Name                 | ARC-Challenge | HellaSwag | Winogrande |
+|----------------------------|:-------------:|:---------:|:----------:|
+| Llama-3-8B (base)          |     51.4%     |   59.9%   |   73.1%    |
+| Hobby_Recommendation model |     53.7%     |   59.9%   |   73.2%    |
+| Falcon-7B-Instruct         |     --.-%     |   --.-%   |   --.-%    |
+| Mistral-7B-Instruct        |     --.-%     |   --.-%   |   --.-%    |
 
 **Model Performance Summary**
 
@@ -59,7 +59,7 @@ Hobby Recommendation:
 ```
 ## Expected Output Format
 
-The expected output from the model is a concise text string that contains only the name of the recommended hobby. The model does not provide reasoning or additional explanation—only the specific hobby recommendation based on the user profile.
+The expected output from the model typically includes the name of a recommended hobby, often followed by a brief explanation or justification. Although the instruction requests only the hobby name, the model may generate a full sentence or paragraph explaining why the hobby fits the user's profile.
 
 ```coding, board games```
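The updated paragraph notes that completions may wrap the hobby name in a full sentence even though the prompt asks for the expected `coding, board games` format alone. A minimal post-processing sketch of one way to score such verbose outputs: match completions against a closed hobby vocabulary. Everything here is illustrative, not part of this repository — the `KNOWN_HOBBIES` list and the `extract_hobbies` helper are hypothetical, and the real test-set label space is not shown in this commit.

```python
import re

# Hypothetical closed vocabulary; the actual Hobby Test Set labels
# are not visible in this diff.
KNOWN_HOBBIES = ["coding", "board games", "hiking", "painting", "photography"]

def extract_hobbies(completion: str) -> list[str]:
    """Return known hobby names mentioned in a completion, in order of
    first appearance, so a verbose answer can still be compared against
    the expected comma-separated format."""
    text = completion.lower()
    found = []
    for hobby in KNOWN_HOBBIES:
        match = re.search(r"\b" + re.escape(hobby) + r"\b", text)
        if match:
            found.append((match.start(), hobby))
    return [hobby for _, hobby in sorted(found)]

verbose = ("I would recommend coding, since your profile mentions puzzles; "
           "board games are also a great social fit.")
print(", ".join(extract_hobbies(verbose)))  # -> coding, board games
```

Normalizing both the model output and the reference labels this way makes the Hobby Test Set accuracy comparison tolerant of the explanatory text the fine-tuned model tends to add.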