Update README.md
README.md CHANGED
@@ -18,31 +18,6 @@ dtype: float16
 
 ```
 
-Here's a draft Model Card for your new model, based on the evaluations you have provided.
-
-Model Card: Custom AI Model (514M Parameters)
-
-Model Details
-
-• Architecture: This model is a fine-tuned large language model with 514M parameters, designed to handle a variety of commonsense-reasoning and general-knowledge tasks. It has undergone multiple rounds of evaluation, focused on ARC Challenge, HellaSwag, PIQA, and Winogrande.
-• Model Size: 514M parameters
-
-Use Case and Intended Applications
-
-This model is designed for tasks requiring:
-
-• Commonsense Reasoning: Understanding and predicting everyday physical and linguistic scenarios.
-• Text Comprehension: Completing or interpreting real-world descriptions and ambiguous situations.
-• General Knowledge: Reasoning through questions that require a broad understanding of knowledge domains, such as multiple-choice exams.
-
-Training Data
-
-The model was fine-tuned on several datasets to optimize its performance in the following areas:
-
-• Physical Reasoning: Datasets like PIQA help the model reason about physical situations and solutions.
-• Commonsense and Ambiguous Reasoning: Datasets like HellaSwag and Winogrande help the model make sense of events or situations that require a high degree of commonsense understanding.
-• General Knowledge: The ARC Challenge dataset exposes the model to multiple-choice questions that test general reasoning skills.
-
 Evaluation Results
 
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
@@ -54,38 +29,10 @@ The model was evaluated across a range of tasks. Below are the final evaluation
 | 1.24B | llama 3.2 | 36.75 | 36.18 | 63.70 | 74.54 | 60.54 | 54.34 |
 | 514M | archeon | NA | 32.34 | 47.80 | 74.37 | 62.12 | 54.16 |
 
-
-
-
-
-
-Key Strengths
-
-1. Physical and Commonsense Reasoning: The model consistently performs well on tasks like PIQA and HellaSwag, showing strong abilities in understanding and predicting physical scenarios and commonsense events.
-2. Linguistic Reasoning: The model also performs competitively on Winogrande, which tests linguistic understanding and ambiguity resolution.
-
-Key Weaknesses
-
-1. General Knowledge (ARC Challenge): While the model does reasonably well, it lags behind top models on more challenging general-knowledge questions.
-2. Math Reasoning: GSM8k was excluded from the results above because the model scored poorly on it, indicating a potential area for improvement with further fine-tuning.
-
-Recommendations for Improvement
-
-• Fine-Tuning on Mathematical Reasoning: To improve on GSM8k and other math-heavy tasks, consider fine-tuning on datasets like MathQA or MATH.
-• Enhanced General Knowledge: To further improve on general-knowledge tasks (ARC Challenge), additional fine-tuning on datasets like SQuAD, TriviaQA, or other large knowledge datasets would be beneficial.
-
-Model Usage
-
-This model is well-suited to NLP tasks where commonsense and physical reasoning are required, such as:
-
-• Answering multiple-choice questions (e.g., exam preparation, automated tutoring).
-• Text completion (e.g., completing sequences of events).
-• Commonsense AI applications (e.g., chatbot responses requiring real-world understanding).
-
-Limitations
-
-• Mathematical Reasoning: The model struggles with tasks requiring numerical problem-solving or complex logical reasoning in math.
-• Context-specific Fine-tuning: The model may require additional fine-tuning for specialized tasks outside its current scope (e.g., legal reasoning, scientific document comprehension).
+• ARC Challenge: The model performs decently at answering general-knowledge questions.
+• HellaSwag: The model is strong in commonsense reasoning, performing well at predicting the next sequence of events in a given scenario.
+• PIQA: The model excels at physical reasoning, showing a solid understanding of everyday physical interactions.
+• Winogrande: It also shows competitive performance on linguistic-reasoning tasks.
 
 Ethical Considerations
 
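
The card's Model Usage section lists inference-style applications. As a concrete starting point, here is a minimal inference sketch; it assumes the checkpoint is published as a standard Hugging Face causal LM, and the repo id `your-org/archeon-514m` is a placeholder, not a real model name.

```python
# Minimal inference sketch. Assumes the card's checkpoint is a standard
# Hugging Face causal LM; "your-org/archeon-514m" is a placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/archeon-514m"  # hypothetical id for the 514M archeon model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# A commonsense-style prompt, matching the card's intended use cases.
prompt = "Question: If you drop a glass on a tile floor, what is likely to happen?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Greedy decoding (`do_sample=False`) keeps the example deterministic; sampling settings can be tuned per application.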
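The benchmark scores in the table are the kind reported by EleutherAI's lm-evaluation-harness, which ships tasks named `arc_challenge`, `hellaswag`, `piqa`, and `winogrande`. A hedged sketch of how such numbers are typically reproduced, assuming lm-eval ≥ 0.4 and the same placeholder repo id:

```python
# Hedged sketch: reproducing ARC Challenge / HellaSwag / PIQA / Winogrande
# scores with EleutherAI's lm-evaluation-harness (assumes lm_eval >= 0.4).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=your-org/archeon-514m,dtype=float16",  # placeholder id
    tasks=["arc_challenge", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```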
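
The card flags math reasoning as the main weakness and recommends fine-tuning on math datasets. Below is a minimal causal-LM fine-tuning sketch with the Hugging Face Trainer; the repo id is the same placeholder, and GSM8k's train split (whose `question`/`answer` fields are real) stands in for the MathQA or MATH data the card suggests.

```python
# Hedged fine-tuning sketch targeting the math-reasoning weakness noted above.
# "your-org/archeon-514m" is a placeholder; GSM8k's train split stands in for
# the MathQA / MATH datasets suggested in the card.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_id = "your-org/archeon-514m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:          # causal LMs often ship without one
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

raw = load_dataset("gsm8k", "main", split="train")

def tokenize(batch):
    # Join question and worked answer into a single causal-LM training string.
    texts = [f"Question: {q}\nAnswer: {a}"
             for q, a in zip(batch["question"], batch["answer"])]
    return tokenizer(texts, truncation=True, max_length=1024)

train = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="archeon-514m-gsm8k",
                           per_device_train_batch_size=4,
                           num_train_epochs=1,
                           learning_rate=2e-5),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The hyperparameters here are generic defaults for a model of this size, not tuned values; after training, re-running the evaluation sketch above would show whether the GSM8k gap has narrowed.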