Commit 5a9917f by xTimeCrystal (verified) · 1 Parent(s): e47f9d5

Update README.md

Files changed (1)
  1. README.md +8 -55
README.md CHANGED
@@ -11,19 +11,13 @@ pipeline_tag: text-generation
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- This modelcard aims to be a base template for new models. It has been generated using [this raw template](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/templates/modelcard_template.md?plain=1).
-
  ## Model Details
 
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
 
-
-
  - **Developed by:** xTimeCrystal
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
  - **Model type:** RWKV 7 **(NOTE: the decay is computed using -F.softplus instead of -0.606*torch.sigmoid, all LoRAs use Tanh, LoRA weights are stored like nn.Linear)**
  - **Language(s) (NLP):** English
  - **License:** MIT
@@ -36,32 +30,24 @@ This modelcard aims to be a base template for new models. It has been generated
 
  <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
 
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
+ Fast autocomplete model.
 
  ### Out-of-Scope Use
 
  <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
 
- [More Information Needed]
+ Don't use it for anything serious, it lacks any form of intelligence.
 
  ## Bias, Risks, and Limitations
 
  <!-- This section is meant to convey both technical and sociotechnical limitations. -->
 
- [More Information Needed]
+ Limited to ~couple exaFLOPs of compute, don't expect anything coherent beyond a couple sentences.
 
  ### Recommendations
 
  <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
 
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-
  ## How to Get Started with the Model
 
  Use the code below to get started with the model.
@@ -76,43 +62,18 @@ Use the code below to get started with the model.
 
  50B Bytes of custom FineWeb Edu & Open Web Math mixture.
 
- ### Training Procedure
-
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-
- #### Preprocessing [optional]
-
- [More Information Needed]
-
-
  #### Training Hyperparameters
 
  - **Training regime:** bf16 non-mixed precision, used own version of Muon with lr from 5e-3 to 1e-3. <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
 
- #### Speeds, Sizes, Times [optional]
+ #### Speeds, Sizes, Times
 
- Throughput = infinite
-
- [More Information Needed]
+ Throughput = 350 characters/second using unoptimized inference code. Prompt processing is basically instantaneous, so generation is likely bottlenecked by bandwidth and overhead.
 
  ## Evaluation
 
  <!-- This section describes the evaluation protocols and provides the results. -->
 
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
  ### Results
 
  Bits-per-byte: ~1
@@ -120,20 +81,12 @@ HellaSwag Accuracy: 33.4% (removed Wikihow entries)
 
  #### Summary
 
- ## Technical Specifications [optional]
+ ## Technical Specifications
 
  ### Model Architecture and Objective
 
- [More Information Needed]
+ Modded RWKV 7 (see top)
 
  ### Compute Infrastructure
 
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
+ 1 x RTX 4080 for 1 week
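
Note on the **Model type** line in this diff: the card contrasts `-F.softplus` with `-0.606*torch.sigmoid` for the decay but does not give the full expression. The sketch below is one possible reading, not the author's code. It assumes both terms are log-decays of a pre-activation `z`, so that the per-channel decay is `exp(log_decay)`. The function names and the use of plain `math` instead of PyTorch are illustrative assumptions.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decay_softplus(z: float) -> float:
    # ASSUMPTION: log-decay = -softplus(z), mirroring "-F.softplus" in the card.
    return math.exp(-softplus(z))


def decay_sigmoid(z: float) -> float:
    # ASSUMPTION: log-decay = -0.606 * sigmoid(z), mirroring "-0.606*torch.sigmoid".
    return math.exp(-0.606 * sigmoid(z))


if __name__ == "__main__":
    # Compare the two parameterizations over a range of pre-activations.
    for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
        print(f"z={z:6.1f}  softplus decay={decay_softplus(z):.6f}  "
              f"sigmoid decay={decay_sigmoid(z):.6f}")
```

Under this reading, the softplus form can push the decay arbitrarily close to 0 for large `z`, while the sigmoid form keeps it above `exp(-0.606)`, roughly 0.545. Both approach 1 as `z` becomes very negative.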