anchen1011 commited on
Commit
68f7a27
Β·
verified Β·
1 Parent(s): 3756ae6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +106 -4
README.md CHANGED
@@ -1,10 +1,112 @@
1
  ---
2
  title: README
3
- emoji: πŸ“‰
4
- colorFrom: purple
5
- colorTo: yellow
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
- # Hi there πŸ‘‹
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
  title: README
3
+ emoji: πŸ’Š
4
+ colorFrom: blue
5
+ colorTo: red
6
  sdk: static
7
  pinned: false
8
  ---
9
 
10
+ # MidReal
11
+
12
+ We follow an extremely simple format to organize and manage our models aand data.
13
+
14
+ # Model
15
+
16
+ ### Format
17
+
18
+ Repo should be named as `midreal/{model_function}_{train_method}_{train_technique}_{base_model}_{date}`
19
+
20
+ Model card should include ([example](https://huggingface.co/midreal/passage_sft_qlora_llama3_70b_0704)):
21
+
22
+ ```
23
+ # Data
24
+
25
+ Dataset trained with
26
+
27
+ # Base
28
+
29
+ Base model trained on
30
+
31
+ # Template
32
+
33
+ Prompt/Response template of the model
34
+
35
+ # System
36
+
37
+ MidReal system version that's compatible with the model
38
+
39
+ # W&B
40
+
41
+ Tracking of the training procedure
42
+
43
+ # PIC
44
+
45
+ Person in charge
46
+ ```
47
+
48
+ ### Usage
49
+
50
+ Models could be uploaded with:
51
+ ```
52
+ upload
53
+ ```
54
+
55
+ and downloaded with:
56
+ ```
57
+ download
58
+ ```
59
+
60
+ and updated with:
61
+ ```
62
+ update
63
+ ```
64
+
65
+ # Dataset
66
+
67
+ ### Format
68
+
69
+ Repo should be named as `midreal/{model_function}_{train_method}_{status}_{date}`
70
+
71
+ Dataset card should include ([example](https://huggingface.co/datasets/midreal/passage_sft_lmflow_0704)):
72
+
73
+ ```
74
+ # Data Schema
75
+
76
+ The schema that the data elements should follow
77
+
78
+ # PIC
79
+
80
+ Person in charge
81
+ ```
82
+
83
+ Other information about the dataset is also welcome.
84
+
85
+ ### Status
86
+
87
+ The production of dataset should somehow follow a pipeline from `raw_data` to `story_data` to `openai` or `lmflow` format.
88
+
89
+ `raw_data` is any data in its original appearance.
90
+
91
+ `story_data` currently follow [data_schema_0718](https://huggingface.co/datasets/midreal/data_schema_0718).
92
+
93
+ `openai` refers to [OpenAI Fine-tuning data format](https://platform.openai.com/docs/guides/fine-tuning/example-format).
94
+
95
+ `lmflow` refers to [LMFlow data format](https://optimalscale.github.io/LMFlow/examples/DATASETS.html#data-format).
96
+
97
+ ### Usage
98
+
99
+ Datasets could be uploaded with:
100
+ ```
101
+ upload
102
+ ```
103
+
104
+ and downloaded with:
105
+ ```
106
+ download
107
+ ```
108
+
109
+ and updated with:
110
+ ```
111
+ update
112
+ ```