Chengfengke committed on
Commit 21d2722 · verified · 1 Parent(s): 614183c

Update README.md

Files changed (1)
  1. README.md +34 -3
README.md CHANGED
@@ -1,3 +1,34 @@
- ---
- license: mit
- ---
+ ---
+ license: apache-2.0
+ base_model:
+ - google-bert/bert-base-chinese
+ ---
+ # Herberta: Pretrained Language Model for Herbal Medicine
+
+ **Herberta** is a pretrained language model for herbal medicine research, built on the `chinese-roberta-wwm-ext-large` model. It has been fine-tuned on domain-specific text from 675 ancient books and 32 Traditional Chinese Medicine (TCM) textbooks, and is designed to support a range of TCM-related NLP tasks.
+
+ ---
+
+ ## Introduction
+
+ This model is optimized for TCM-related tasks, including but not limited to:
+ - Herbal formula encoding
+ - Domain-specific word embedding
+ - Classification, labeling, and sequence prediction tasks in TCM research
+
+ Herberta combines modern pretraining techniques with TCM domain knowledge, which makes it well suited to TCM-related text processing tasks.
+
+ ---
+
+ ## Model Config
+
+ ```json
+ {
+   "hidden_size": 1024,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "torch_dtype": "float32",
+   "vocab_size": 21128
+ }
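
As a rough sanity check on the config values added in this commit, the sketch below estimates the encoder's parameter count from them. It assumes the standard BERT encoder layout (embeddings, transformer layers, pooler). Two values are not part of the config shown and are labeled as assumptions: `intermediate_size` (taken as 4 × `hidden_size`) and `type_vocab_size` (taken as 2). The function name `estimate_bert_params` is hypothetical and used only for illustration.

```python
# Values copied from the "Model Config" block in the diff above.
config = {
    "hidden_size": 1024,
    "max_position_embeddings": 512,
    "num_attention_heads": 16,
    "num_hidden_layers": 24,
    "vocab_size": 21128,
}

# ASSUMPTIONS: these two values do not appear in the config shown in the diff.
ASSUMED_INTERMEDIATE_SIZE = 4 * config["hidden_size"]  # common BERT default
ASSUMED_TYPE_VOCAB_SIZE = 2                            # common BERT default


def estimate_bert_params(cfg, intermediate_size, type_vocab_size):
    """Estimate the parameter count of a standard BERT encoder with pooler."""
    h = cfg["hidden_size"]
    layer_norm = 2 * h  # scale and bias

    # Word, position, and token-type embeddings, followed by a LayerNorm.
    embeddings = (
        cfg["vocab_size"] * h
        + cfg["max_position_embeddings"] * h
        + type_vocab_size * h
        + layer_norm
    )

    # Self-attention: query, key, value, and output projections (weights + biases).
    attention = 4 * (h * h + h) + layer_norm
    # Feed-forward: expand to intermediate_size, project back (weights + biases).
    feed_forward = (
        (h * intermediate_size + intermediate_size)
        + (intermediate_size * h + h)
        + layer_norm
    )
    per_layer = attention + feed_forward

    # Pooler: one dense layer applied to the first token's hidden state.
    pooler = h * h + h

    return embeddings + cfg["num_hidden_layers"] * per_layer + pooler


# Per-head dimension implied by the config.
head_dim = config["hidden_size"] // config["num_attention_heads"]

total = estimate_bert_params(config, ASSUMED_INTERMEDIATE_SIZE, ASSUMED_TYPE_VOCAB_SIZE)
print(f"per-head dimension: {head_dim}")
print(f"estimated parameters: {total:,}")
```

Under these assumptions the estimate comes to roughly 325 million parameters. That is the scale of a 24-layer, 1024-wide BERT-style encoder, and it is consistent with the `chinese-roberta-wwm-ext-large` model named in the README text.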