BUT-FIT/csmpt7b

@@ -1,7 +1,10 @@
 ---
 license: apache-2.0
 ---
-### Eval
 Dev eval on CS-HellaSwag (automatically translated HellaSwag benchmark)
 | Model | Model Accuracy |
 |---------------|----------------|
@@ -17,15 +20,17 @@ However, we ran validation over the course of training on CS-Hellaswag, and afte
 The improvement over mistral7b is not significant.
-### How to setup environment
 ```bash
 pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0
 # be sure to install the right flash-attn wheel; we use torch compiled with CUDA 12.1, no C++11 ABI, python 3.9, Linux x86_64 architecture
 pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
-### How to use in transformers
 ```python
 import torch
 import transformers
@@ -34,7 +39,6 @@ from transformers import pipeline
 name = 'BUT-FIT/csmpt7b'
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
-config.attn_config['attn_impl'] = 'flash'
 config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
 model = transformers.AutoModelForCausalLM.from_pretrained(
     name,
@@ -56,30 +60,26 @@ with torch.autocast('cuda', dtype=torch.bfloat16):
              do_sample=True,
              use_cache=True))
-```
-### Our Release Plan
 | Stage | Description | Date |
 |---------------|----------------|----------------|
 | 1       | 'Best' model + training data    | 11.03.2024 |
 | 2       | All checkpoints + training code | |
 | 3       | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation | |
-- Stage 1: 'Best' model + training data.
-- Stage 2: All checkpoints + training code
-- Stage 3: __Benczechmark__ a collection of Czech datasets. **Get in touch if you'd like to know more and contribute!**
 ## Getting in Touch
 For further questions, email `martin.fajcik@vut.cz`.
-## Disclaimer
 This is a probabilistic model, and the authors are not responsible for the model outputs. Use at your own risk.
-## Acknowledgement
 This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT
 ("Sémantický průzkumník textového kulturního dědictví"), grant no. `DH23P03OVV060`, and
 by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID:`90254`).

 ---
 license: apache-2.0
 ---
+# Introduction
+# Eval
 Dev eval on CS-HellaSwag (automatically translated HellaSwag benchmark)
 | Model | Model Accuracy |
 |---------------|----------------|
 The improvement over mistral7b is not significant.
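The README does not say which test backs the significance statement. One common choice for comparing two models scored on the same benchmark items is an exact McNemar test, sketched below under that assumption. The function name `mcnemar_exact` and its inputs (one correctness flag per example and per model) are hypothetical and not part of the released code.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired per-example correctness flags."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same examples")
    # Only examples where exactly one model is right carry information.
    only_a = sum(bool(a) and not b for a, b in zip(correct_a, correct_b))
    only_b = sum(bool(b) and not a for a, b in zip(correct_a, correct_b))
    discordant = only_a + only_b
    if discordant == 0:
        return 1.0
    # Under the null hypothesis each discordant example favours either model
    # with probability 0.5, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(discordant, k) for k in range(min(only_a, only_b) + 1))
    return min(1.0, 2 * tail / 2 ** discordant)
```

A p-value above the chosen threshold (0.05 is a common convention) would be read as "no significant difference" between the two models on that evaluation set.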
+# Usage
+## How to Set Up the Environment
 ```bash
 pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0
 # be sure to install the right flash-attn wheel; we use torch compiled with CUDA 12.1, no C++11 ABI, python 3.9, Linux x86_64 architecture
 pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
+```
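Because the flash-attn wheel name encodes the Python version and platform it was built for, a quick check before installing can save a failed `pip install`. The sketch below parses the wheel name from the command above and compares its Python and platform tags with the running interpreter. The helper names (`parse_wheel`, `local_tags`, `mismatches`) are illustrative, and the platform comparison is a rough approximation rather than pip's full tag matching.

```python
import platform
import re
import sys

# Wheel filename taken from the install command above.
WHEEL = "flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl"

WHEEL_PATTERN = re.compile(
    r"flash_attn-(?P<version>[\d.]+)"
    r"\+cu(?P<cuda>\d+)torch(?P<torch>\d+\.\d+)"
    r"cxx11abi(?P<abi>TRUE|FALSE)"
    r"-(?P<python>cp\d+)-cp\d+-(?P<platform>\w+)\.whl"
)


def parse_wheel(filename):
    """Split a flash-attn release wheel name into its build tags."""
    match = WHEEL_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognised wheel name: {filename}")
    return match.groupdict()


def local_tags():
    """Return the Python and platform tags of the running interpreter."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = f"{platform.system().lower()}_{platform.machine()}"
    return {"python": python_tag, "platform": platform_tag}


def mismatches(filename):
    """Map each disagreeing tag to a (wheel value, local value) pair."""
    wheel = parse_wheel(filename)
    local = local_tags()
    return {key: (wheel[key], local[key]) for key in local if wheel[key] != local[key]}
```

Calling `mismatches(WHEEL)` returns an empty dict when the interpreter matches the wheel. The torch and CUDA tags are parsed but not compared here; once torch is installed they can be checked by hand against `torch.__version__` and `torch.version.cuda`.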
+## Running the Code
 ```python
 import torch
 import transformers
 name = 'BUT-FIT/csmpt7b'
 config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
 config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
 model = transformers.AutoModelForCausalLM.from_pretrained(
     name,
              do_sample=True,
              use_cache=True))
+```
+# Training Data
+We release most of our training data here \[TBD MDocekal.\].
+# Our Release Plan
 | Stage | Description | Date |
 |---------------|----------------|----------------|
 | 1       | 'Best' model + training data    | 11.03.2024 |
 | 2       | All checkpoints + training code | |
 | 3       | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation | |
 ## Getting in Touch
 For further questions, email `martin.fajcik@vut.cz`.
+# Disclaimer
 This is a probabilistic model, and the authors are not responsible for the model outputs. Use at your own risk.
+# Acknowledgement
 This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT
 ("Sémantický průzkumník textového kulturního dědictví"), grant no. `DH23P03OVV060`, and
 by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID:`90254`).