xiaoyao9184 committed
Commit bce0cec · verified · 1 Parent(s): cb9942b

Add files using upload-large-folder tool

Files changed (5)
  1. README.md +196 -1
  2. config.json +76 -51
  3. manifest.json +1 -1
  4. model.safetensors +2 -2
  5. training_args.bin +3 -0
README.md CHANGED
@@ -1,4 +1,199 @@
 ---
 library_name: transformers
-license: cc-by-nc-sa-4.0
+tags: []
 ---
+
+# Model Card for Model ID
+
+<!-- Provide a quick summary of what the model is/does. -->
+
+
+
+## Model Details
+
+### Model Description
+
+<!-- Provide a longer summary of what this model is. -->
+
+This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
+
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+
+### Model Sources [optional]
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+
+## Uses
+
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+### Direct Use
+
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+[More Information Needed]
+
+### Downstream Use [optional]
+
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+[More Information Needed]
+
+### Out-of-Scope Use
+
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+[More Information Needed]
+
+## Bias, Risks, and Limitations
+
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+[More Information Needed]
+
+### Recommendations
+
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+## How to Get Started with the Model
+
+Use the code below to get started with the model.
+
+[More Information Needed]
+
+## Training Details
+
+### Training Data
+
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+[More Information Needed]
+
+### Training Procedure
+
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+#### Preprocessing [optional]
+
+[More Information Needed]
+
+
+#### Training Hyperparameters
+
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+#### Speeds, Sizes, Times [optional]
+
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+[More Information Needed]
+
+## Evaluation
+
+<!-- This section describes the evaluation protocols and provides the results. -->
+
+### Testing Data, Factors & Metrics
+
+#### Testing Data
+
+<!-- This should link to a Dataset Card if possible. -->
+
+[More Information Needed]
+
+#### Factors
+
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+[More Information Needed]
+
+#### Metrics
+
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+[More Information Needed]
+
+### Results
+
+[More Information Needed]
+
+#### Summary
+
+
+
+## Model Examination [optional]
+
+<!-- Relevant interpretability work for the model goes here -->
+
+[More Information Needed]
+
+## Environmental Impact
+
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+
+## Technical Specifications [optional]
+
+### Model Architecture and Objective
+
+[More Information Needed]
+
+### Compute Infrastructure
+
+[More Information Needed]
+
+#### Hardware
+
+[More Information Needed]
+
+#### Software
+
+[More Information Needed]
+
+## Citation [optional]
+
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+**BibTeX:**
+
+[More Information Needed]
+
+**APA:**
+
+[More Information Needed]
+
+## Glossary [optional]
+
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+[More Information Needed]
+
+## More Information [optional]
+
+[More Information Needed]
+
+## Model Card Authors [optional]
+
+[More Information Needed]
+
+## Model Card Contact
+
+[More Information Needed]
config.json CHANGED
@@ -4,9 +4,12 @@
   ],
   "bbox_embed_size": 64,
   "bbox_size": 1024,
+  "beacon_token_id": 151679,
+  "beacon_token_interval": 64,
   "blank_bbox_token_id": 1025,
   "bos_token_id": {
     "block_without_boxes": 151673,
+    "layout": 151674,
     "ocr_with_boxes": 151671,
     "ocr_without_boxes": 151672
   },
@@ -98,21 +101,26 @@
   "image_embed_encoding_multiplier": 256,
   "image_embed_encoding_size": 1024,
   "image_token_id": 151667,
-  "max_sequence_length": 1024,
+  "max_multi_out": 8,
+  "max_sequence_length": 2560,
   "merge_size": 2,
   "model_type": "surya-multimodal-foundation",
+  "multi_output_distance": 1,
   "num_attention_heads": 16,
+  "num_beacon_tokens": 1,
   "num_hidden_layers": 12,
   "num_key_value_heads": 4,
   "num_register_tokens": 4,
+  "num_styles": 5,
   "pad_token_id": 151668,
   "patch_size": 14,
   "register_token_ids": [
-    151674,
     151675,
     151676,
-    151677
+    151677,
+    151678
   ],
+  "sliding_window": 256,
   "special_ocr_tokens": {
     "all": [
       "</S>",
@@ -124,10 +132,12 @@
       "<OCR-WB>",
       "<OCR-WOB>",
       "<BLOCKS-WOB>",
+      "<LAYOUT>",
       "<REG1>",
       "<REG2>",
       "<REG3>",
       "<REG4>",
+      "<BEACON>",
       "<NO-MATH>",
       "<b>",
       "</b>",
@@ -183,6 +193,24 @@
       "<SCRIPT-KHMER>",
       "<SCRIPT-MONGOLIAN>",
       "<SCRIPT-MATH>",
+      "<page-header>",
+      "<page-footer>",
+      "<footnote>",
+      "<image>",
+      "<figure>",
+      "<text>",
+      "<caption>",
+      "<list-item>",
+      "<section-header>",
+      "<table>",
+      "<table-of-contents>",
+      "<form>",
+      "<equation-block>",
+      "<code-block>",
+      "<complex-block>",
+      "<think1>",
+      "<think2>",
+      "<think3>",
       "<reserved_0>",
       "<reserved_1>",
       "<reserved_2>",
@@ -1118,27 +1146,7 @@
       "<reserved_932>",
       "<reserved_933>",
       "<reserved_934>",
-      "<reserved_935>",
-      "<reserved_936>",
-      "<reserved_937>",
-      "<reserved_938>",
-      "<reserved_939>",
-      "<reserved_940>",
-      "<reserved_941>",
-      "<reserved_942>",
-      "<reserved_943>",
-      "<reserved_944>",
-      "<reserved_945>",
-      "<reserved_946>",
-      "<reserved_947>",
-      "<reserved_948>",
-      "<reserved_949>",
-      "<reserved_950>",
-      "<reserved_951>",
-      "<reserved_952>",
-      "<reserved_953>",
-      "<reserved_954>",
-      "<reserved_955>"
+      "<reserved_935>"
     ],
     "formatting": [
       "<b>",
@@ -1161,6 +1169,23 @@
       "<code>",
       "</code>"
     ],
+    "layout": [
+      "<page-header>",
+      "<page-footer>",
+      "<footnote>",
+      "<image>",
+      "<figure>",
+      "<text>",
+      "<caption>",
+      "<list-item>",
+      "<section-header>",
+      "<table>",
+      "<table-of-contents>",
+      "<form>",
+      "<equation-block>",
+      "<code-block>",
+      "<complex-block>"
+    ],
     "math_external": [
       "<math>",
       "<math display='block'>",
@@ -1169,6 +1194,11 @@
       "<math display=\"inline\">",
       "</math>"
     ],
+    "reasoning": [
+      "<think1>",
+      "<think2>",
+      "<think3>"
+    ],
     "reserved": [
       "<reserved_0>",
       "<reserved_1>",
@@ -2105,27 +2135,7 @@
       "<reserved_932>",
       "<reserved_933>",
       "<reserved_934>",
-      "<reserved_935>",
-      "<reserved_936>",
-      "<reserved_937>",
-      "<reserved_938>",
-      "<reserved_939>",
-      "<reserved_940>",
-      "<reserved_941>",
-      "<reserved_942>",
-      "<reserved_943>",
-      "<reserved_944>",
-      "<reserved_945>",
-      "<reserved_946>",
-      "<reserved_947>",
-      "<reserved_948>",
-      "<reserved_949>",
-      "<reserved_950>",
-      "<reserved_951>",
-      "<reserved_952>",
-      "<reserved_953>",
-      "<reserved_954>",
-      "<reserved_955>"
+      "<reserved_935>"
     ],
     "script": [
       "<SCRIPT-LATIN>",
@@ -2168,14 +2178,23 @@
       "<OCR-WB>",
       "<OCR-WOB>",
       "<BLOCKS-WOB>",
+      "<LAYOUT>",
       "<REG1>",
       "<REG2>",
       "<REG3>",
       "<REG4>",
+      "<BEACON>",
       "<NO-MATH>"
     ]
   },
   "special_token_count": 4,
+  "styles": [
+    "Plain",
+    "Handwriting",
+    "Math",
+    "Chemical",
+    "Code"
+  ],
   "tasks": {
     "block_without_boxes": {
       "img_size": [
@@ -2183,23 +2202,28 @@
         512
       ]
     },
+    "layout": {
+      "img_size": [
+        1024,
+        1024
+      ]
+    },
     "ocr_with_boxes": {
       "img_size": [
         1024,
-        256
+        512
       ]
     },
     "ocr_without_boxes": {
       "img_size": [
         1024,
-        256
+        512
       ]
     }
   },
   "torch_dtype": "bfloat16",
   "transformers_version": "4.50.3",
-  "unmask_image": false,
-  "use_ce_loss": false,
+  "use_cut_cross_entropy": false,
   "vision_encoder": {
     "_attn_implementation_autoset": true,
     "_name_or_path": "",
@@ -2211,7 +2235,7 @@
     "chunk_size_feed_forward": 0,
     "cross_attention_hidden_size": null,
     "decoder_start_token_id": null,
-    "depth": 8,
+    "depth": 12,
     "diversity_penalty": 0.0,
     "do_sample": false,
     "early_stopping": false,
@@ -2223,7 +2247,8 @@
     "forced_eos_token_id": null,
     "fullatt_block_indexes": [
       3,
-      7
+      7,
+      11
     ],
     "hidden_act": "silu",
     "hidden_size": 1280,
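The config.json hunks above amount to a handful of added, removed, and changed top-level keys. As an illustrative sketch (the `diff_config` helper is hypothetical, and only a small slice of the real config values is shown), the same comparison can be done with plain Python dicts:

```python
def diff_config(old: dict, new: dict) -> dict:
    """Return the keys that were added, removed, or changed between two flat config dicts."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {k: (old[k], new[k]) for k in old.keys() & new.keys() if old[k] != new[k]}
    return {"added": added, "removed": removed, "changed": changed}

# A minimal slice of the values visible in the diff above
old_cfg = {"max_sequence_length": 1024, "use_ce_loss": False}
new_cfg = {"max_sequence_length": 2560, "beacon_token_id": 151679,
           "use_cut_cross_entropy": False}

result = diff_config(old_cfg, new_cfg)
print(result["changed"]["max_sequence_length"])  # (1024, 2560)
```

In practice the two dicts would come from `json.load` on the old and new revisions of config.json.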
manifest.json CHANGED
@@ -1 +1 @@
1
- {"files": ["model.safetensors", "added_tokens.json", "tokenizer_config.json", "special_tokens_map.json", "config.json", "README.md", "merges.txt", ".gitattributes", "vocab.json", "preprocessor_config.json"]}
 
1
+ {"files": [".gitattributes", "merges.txt", "README.md", "training_args.bin", "special_tokens_map.json", "added_tokens.json", "tokenizer_config.json", "preprocessor_config.json", "config.json", "vocab.json", "model.safetensors"]}
model.safetensors CHANGED
@@ -1,3 +1,3 @@
1
  version https://git-lfs.github.com/spec/v1
2
- oid sha256:30818e0e67898880036bb6b0738104366e8fe4197dc1f21b96c3c21d6ee2d671
3
- size 1634909470
 
1
  version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b4efa3897c82b31aaae7941cfc91308357a4783f2e2477424bcf537f81496fe8
3
+ size 2357747926
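model.safetensors is tracked with git LFS, so the three lines above are the entire pointer file stored in the repo; the `oid` and `size` describe the real binary. A small sketch (the `parse_lfs_pointer` helper is hypothetical, not part of any library) that pulls those fields out of such a pointer:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a git-LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

# The new pointer contents from the diff above
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b4efa3897c82b31aaae7941cfc91308357a4783f2e2477424bcf537f81496fe8
size 2357747926
"""

info = parse_lfs_pointer(pointer)
print(int(info["size"]) / 1e9)  # roughly 2.36 GB, up from ~1.63 GB
```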
training_args.bin ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:555cdb558a7ac24c8241387d5c72b1657c81cae1b0a50ad4be0d65eb65d3a487
+size 7505
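Once the actual binaries behind these pointers are downloaded, the `sha256:` oid can be checked against the file on disk. A minimal sketch using only the Python standard library (`sha256_of_file` is a hypothetical helper name):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the result against the hex part of the pointer's oid (e.g. `555cdb55...` for training_args.bin) confirms the download is intact.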