---
language:
- fi
license: apache-2.0
tags:
- finnish
- gemma
inference: false
pipeline_tag: text-generation
---

## Pretraining

More information coming later.

## Finetuning

More information coming later.

## Evaluation results

### MTBench Finnish

This Ahma-Gemma-3-4B-Instruct-v1.0 model was primarily evaluated using [MTBench Finnish by LumiOpen](https://github.com/LumiOpen/FastChat/tree/main/fastchat/llm_judge).

Single-turn results:

| Benchmark | Ahma 3B base (instruct prompt format) | Ahma 7B Instruct (instruct prompt format) | Ahma-Gemma-3-4B-Instruct-v1.0 |
|:--------------------|:--------------------------------------|:------------------------------------------|:------------------------------|
| Coding | 1.00 | 1.00 | 4.20 |
| Extraction | 1.30 | 3.00 | 7.30 |
| Humanities | 6.20 | 8.00 | 8.90 |
| Math | 3.20 | 2.90 | 6.10 |
| Reasoning | 4.60 | 5.70 | 4.80 |
| Roleplay | 6.50 | 7.20 | 7.70 |
| STEM | 5.95 | 7.30 | 9.90 |
| Writing | 9.00 | 8.80 | 9.20 |
| **Overall Average** | **4.72** | **5.50** | **7.26** |
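
The overall average above is simply the arithmetic mean of the eight category scores. As a quick sanity check, here is a minimal Python sketch (not part of the MTBench evaluation harness itself) reproducing the single-turn average for Ahma-Gemma-3-4B-Instruct-v1.0:

```python
# Single-turn MTBench Finnish category scores for Ahma-Gemma-3-4B-Instruct-v1.0
scores = {
    "Coding": 4.2, "Extraction": 7.3, "Humanities": 8.9, "Math": 6.1,
    "Reasoning": 4.8, "Roleplay": 7.7, "STEM": 9.9, "Writing": 9.2,
}

# Overall average = unweighted mean over the eight categories
overall = round(sum(scores.values()) / len(scores), 2)
print(overall)  # 7.26
```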
Multi-turn results:

| Benchmark | Ahma 3B Instruct (instruct prompt format) | Ahma 7B Instruct (instruct prompt format) | Ahma-Gemma-3-4B-Instruct-v1.0 | Poro 34B Chat | Poro-2-8B-Instruct |
|:--------------------|:------------------------------------------|:------------------------------------------|:------------------------------|:--------------|:-------------------|
| Coding | 1.00 | 1.05 | 4.35 | 3.70 | ? |
| Extraction | 1.15 | 2.65 | 6.55 | 6.37 | ? |
| Humanities | 6.20 | 7.85 | 6.55 | 9.25 | ? |
| Math | 2.70 | 2.40 | 4.80 | 1.20 | ? |
| Reasoning | 3.50 | 4.50 | 4.40 | 4.35 | ? |
| Roleplay | 6.40 | 6.60 | 7.26 | 7.35 | ? |
| STEM | 4.78 | 5.40 | 8.80 | 7.80 | ? |
| Writing | 6.65 | 6.25 | 7.60 | 8.50 | ? |
| **Overall Average** | **4.05** | **4.59** | **6.57** | **6.06** | **6.75** |

As these results show, the Ahma-Gemma-3-4B-Instruct-v1.0 model improves upon our previous model generation. We have already started working on datasets and methods to improve this model further and to scale to larger models.

## Acknowledgements

This project would not have been possible without compute generously provided by Google through the [TPU Research Cloud](https://sites.research.google/trc/).

Thanks also to Datacrunch/Verda for sponsoring some of the compute for finetuning: [Hugging Face organization](https://huggingface.co/datacrunch), [website](https://verda.com/).

## Team Members

- Aapo Tanskanen, [Hugging Face profile](https://huggingface.co/aapot), [LinkedIn profile](https://www.linkedin.com/in/aapotanskanen/)
  - Initial stages of pretraining in our continued-pretraining journey
- Rasmus Toivanen, [Hugging Face profile](https://huggingface.co/RASMUS), [LinkedIn profile](https://www.linkedin.com/in/rasmustoivanen/)
  - Pretraining and post-training this model, gathering datasets, and running evaluations

## Other notable supporters on this journey

- Ari Kouhia, [Hugging Face profile](https://huggingface.co/concur-means-risotto), for helpful comments in our WA group and for helping with synthetic data generation
- Heikki Saxén, [Hugging Face profile](https://huggingface.co/ducklingcodehouse), for helpful comments in our WA group and for finetuning DentalQA models on top of this model
- Mikko Hällfors, [Hugging Face profile](https://huggingface.co/Avokid), for helpful comments in our WA group and for helping with synthetic data generation

Feel free to contact us for more details 🤗