Spestly committed
Commit fe711e1 · verified · 1 Parent(s): 5f9cd29

Update README.md

Files changed (1)
  1. README.md +12 -13
README.md CHANGED
@@ -18,16 +18,16 @@ Athena-4-15B is a 15-billion-parameter multimodal reasoning model designed for h
 
  ## Key capabilities
 
- * Strong textual reasoning (math, logic, chain-of-thought style outputs). ([Hugging Face][1])
- * Multimodal understanding: able to process image+text prompts for captioning and image reasoning via an image-text processor. ([Hugging Face][1])
- * Optimised for instruction-following use cases (SFT on curated instruction data). ([Hugging Face][1])
+ * Strong textual reasoning (math, logic, chain-of-thought style outputs).
+ * Multimodal understanding: able to process image+text prompts for captioning and image reasoning via an image-text processor.
+ * Optimised for instruction-following use cases (SFT on curated instruction data).
 
  ---
 
  ## Highlights / Benchmark notes
 
  * Competitive performance on reasoning and multimodal benchmarks reported by the Apriel team (reported scores, e.g., Artificial Analysis index and IFBench in their model card). ([Hugging Face][1])
- * Targeted to deliver high capability per parameter (aiming for frontier-level reasoning while keeping model size ~15B). ([Hugging Face][1])
+ * Targeted to deliver high capability per parameter (aiming for frontier-level reasoning while keeping model size ~15B).
 
  ---
 
@@ -47,9 +47,8 @@ Athena-4-15B is a 15-billion-parameter multimodal reasoning model designed for h
 
  ## Limitations
 
- * Generates internal chain-of-thought-style reasoning before final answer by design; this can increase token usage and latency. The Apriel upstream notes that the model explicitly produces stepwise reasoning and then a final response. This behaviour may need post-processing or filtering depending on your deployment. ([Hugging Face][1])
- * Potential for hallucinations and confident-sounding but incorrect outputs; always verify critical facts with external sources. ([Hugging Face][1])
- * The model was trained and fine-tuned on curated datasets prioritising reasoning; domain coverage should be validated for specialised domains (medical, legal, etc.). ([Hugging Face][1])
+ * Generates internal chain-of-thought-style reasoning before final answer by design; this can increase token usage and latency. The Apriel upstream notes that the model explicitly produces stepwise reasoning and then a final response. This behaviour may need post-processing or filtering depending on your deployment.
+ * The model was trained and fine-tuned on curated datasets prioritising reasoning; domain coverage should be validated for specialised domains (medical, legal, etc.).
 
  ---
 
@@ -63,15 +62,15 @@ Athena-4-15B is a 15-billion-parameter multimodal reasoning model designed for h
 
  ## Training summary (reference implementation)
 
- * **Mid-training / continual pretraining:** Extensive CPT on reasoning-focused text and multimodal interleaved image-text corpora to strengthen reasoning capabilities. ([Hugging Face][1])
- * **Supervised fine-tuning (SFT):** Fine-tuned on >2M high-quality text samples consisting of mathematical problems, coding tasks, instruction-following data, and conversational examples. No RLHF was applied in the referenced Apriel workflow. ([Hugging Face][1])
- * **Training hardware (reference):** Apriel reports large-scale training hardware usage (e.g., H100 clusters) in their public card; Athena’s training choices may differ but were informed by this regimen. ([Hugging Face][1])
+ * **Mid-training / continual pretraining:** Extensive CPT on reasoning-focused text and multimodal interleaved image-text corpora to strengthen reasoning capabilities.
+ * **Supervised fine-tuning (SFT):** Fine-tuned on >2M high-quality text samples consisting of mathematical problems, coding tasks, instruction-following data, and conversational examples. No RLHF was applied in the referenced Apriel workflow.
+ * **Training hardware (reference):** Apriel reports large-scale training hardware usage (e.g., H100 clusters) in their public card; Athena’s training choices may differ but were informed by this regimen.
 
  ---
 
  ## Evaluation
 
- * Third-party and open-benchmark evaluations were used in the Apriel reference (Artificial Analysis for text benchmarks; VLMEvalKit/OpenCompass for image evaluation). Reported scores indicated strong reasoning performance relative to model size. Use case-specific evaluation is recommended before production deployment. ([Hugging Face][1])
+ * Third-party and open-benchmark evaluations were used in the Apriel reference (Artificial Analysis for text benchmarks; VLMEvalKit/OpenCompass for image evaluation). Reported scores indicated strong reasoning performance relative to model size. Use case-specific evaluation is recommended before production deployment.
 
  ---
 
@@ -102,7 +101,7 @@ pipe(text=messages)
 
  ## License
 
- Use a permissive license consistent with your organisation’s policy. The Apriel reference model uses an MIT license — check and align Athena’s license to your legal requirements before publishing. ([Hugging Face][1])
+ Use a permissive license consistent with your organisation’s policy. The Apriel reference model uses an MIT license — check and align Athena’s license to your legal requirements before publishing.
 
  ---
 
@@ -115,6 +114,6 @@ If you publish results using Athena, include a citation to the design and traini
  ## Implementation notes & recommendations
 
  * **Prompting:** Athena benefits from prompts that ask for stepwise reasoning when the trace is required, but for concise outputs prefer instructing the model to “Answer concisely” or to “Provide only the final answer.”
- * **Latency vs. accuracy:** Expect higher token usage and slightly longer generation time due to explicit internal reasoning; benchmark inference cost and consider temperature/top-k adjustments for production. ([Hugging Face][1])
+ * **Latency vs. accuracy:** Expect higher token usage and slightly longer generation time due to explicit internal reasoning; benchmark inference cost and consider temperature/top-k adjustments for production.
  * **Safety pipeline:** Add toxicity checks, hallucination detection, and a facts-verification layer for external claims before surfacing to end users.
  * **Evaluation:** Run domain-specific benchmarks and human evaluations for calibration prior to public release.
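The prompting and post-processing guidance in the README above can be sketched as follows. This is a minimal illustration, not code from the model card: the `transformers` pipeline call is shown but commented out (it requires downloading the checkpoint), and the `<think>…</think>` delimiter is an assumed reasoning-trace format — check it against the model's actual chat template before relying on it.

```python
import re

# Chat-style messages: an explicit instruction keeps the visible answer
# short even though the model still reasons internally by design.
messages = [
    {"role": "system", "content": "Provide only the final answer."},
    {"role": "user", "content": "What is 17 * 24?"},
]

# Hypothetical inference call (needs the checkpoint; shown for shape only):
# from transformers import pipeline
# pipe = pipeline("image-text-to-text", model="Spestly/Athena-4-15B")
# raw = pipe(text=messages)

def strip_reasoning(raw: str) -> str:
    """Remove an internal reasoning trace and return the final answer.

    Assumes the trace is wrapped in <think>...</think> tags; adjust the
    pattern to whatever delimiters your deployment actually observes.
    """
    return re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()

print(strip_reasoning("<think>17*24 = 340 + 68 = 408</think>408"))  # prints: 408
```

A filter like this is the kind of post-processing the Limitations section refers to: it keeps the stepwise trace out of user-facing output while leaving it available for logging or debugging.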