Update README.md
README.md
@@ -18,12 +18,12 @@ The base LLaMA 3.1-8B model was fine-tuned using the parameter-efficient Low-Ran
## Evaluation

To evaluate both the general reasoning abilities and the domain-specific performance of the model, three benchmarks from the `lm_eval` suite were selected: ARC-Challenge, HellaSwag, and Winogrande. ARC-Challenge tests advanced reasoning and problem-solving on difficult grade-school science questions. HellaSwag measures the ability to select contextually appropriate sentence continuations, which reflects commonsense reasoning. Winogrande evaluates how well the model understands language and context by asking it to resolve which person or object a pronoun (like “he,” “she,” or “it”) refers to in challenging sentences. These benchmarks are standard for assessing language model capabilities and provide a meaningful comparison of general language understanding, reasoning, and commonsense knowledge. Additionally, a domain-specific Hobby Test Set was included to measure how well the model generates personalized hobby recommendations.

| Model Name                 | ARC-Challenge | HellaSwag | Winogrande |
|----------------------------|:-------------:|:---------:|:----------:|
| Llama-3-8B (base)          | 51.4%         | 59.9%     | 73.1%      |
| Hobby_Recommendation model | 53.7%         | 59.9%     | 73.2%      |
| Falcon-7B-Instruct         | --.-%         | --.-%     | --.-%      |
| Mistral-7B-Instruct        | --.-%         | --.-%     | --.-%      |

**Model Performance Summary**
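Scores like those above can be gathered with the harness's Python entry point. Below is a minimal sketch, assuming the fine-tuned model has been merged and published to a Hugging Face repo; the repo id `your-org/Hobby_Recommendation` is a placeholder, not the actual model path:

```python
# Minimal evaluation sketch using EleutherAI's lm-evaluation-harness
# (the `lm_eval` package). The pretrained path below is a placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=your-org/Hobby_Recommendation",
    tasks=["arc_challenge", "hellaswag", "winogrande"],
    batch_size=8,
)

# Each task reports accuracy-style metrics (e.g. acc / acc_norm).
for task, metrics in results["results"].items():
    print(task, metrics)
```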
@@ -59,7 +59,7 @@ Hobby Recommendation:
## Expected Output Format

The expected output from the model typically includes the name of a recommended hobby, often followed by a brief explanation or justification. Although the instruction requests only the hobby name, the model may generate a full sentence or paragraph explaining why the hobby fits the user's profile.

```coding, board games```
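Since the model often appends a justification after the hobby names, downstream code may want to strip it off. Here is a minimal post-processing sketch; the splitting heuristic (take the first line, cut at sentence punctuation) is an assumption, not part of the project's pipeline:

```python
import re

def extract_hobbies(generation: str) -> list[str]:
    """Pull the leading comma-separated hobby names out of a generation,
    dropping any trailing explanation. The heuristic (first line, cut at
    sentence-ending punctuation) is an assumption, not part of the
    released pipeline."""
    lines = generation.strip().splitlines()
    if not lines:
        return []
    # Cut the first line at the first sentence-ending punctuation mark.
    head = re.split(r"[.!?]", lines[0], maxsplit=1)[0]
    return [h.strip() for h in head.split(",") if h.strip()]

print(extract_hobbies("coding, board games. These fit an analytical, indoor profile."))
# -> ['coding', 'board games']
```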