XiaoEnn committed
Commit 8a67d41 · verified · 1 Parent(s): 7b0be7f

Update README.md

Files changed (1)
  1. README.md +41 -22
README.md CHANGED
@@ -1,12 +1,17 @@
- ---
- tags:
- - pretrain_model
- - transformers
- - TCM
- - herberta
- license: apache-2.0
- inference: true
- ---
  ### Introduction
  Herberta is an experimental pretrained model developed by the Angelpro Team, focused on pretraining for the herbal-medicine domain. Starting from chinese-roberta-wwm-ext-large, we continued pretraining with the MLM task on 675 ancient books and 32 Chinese medicine textbooks. The name herberta splices together "herb" and "RoBERTa". We are committed to contributing to the TCM large-model ecosystem.

  We hope it can be used for:
@@ -14,18 +19,6 @@ We hope it can be used:
  - Word embeddings for Chinese-medicine domain data
  - A wide range of downstream TCM tasks, e.g., classification and labeling tasks
 
- ### model_config
- ```json
- {
- "hidden_size": 1024,
- "max_position_embeddings": 512,
- "model_type": "bert",
- "num_attention_heads": 16,
- "num_hidden_layers": 24,
- "torch_dtype": "float32",
- "vocab_size": 21128
- }
- ```
  ### requirements
  "transformers_version": "4.45.1"
@@ -35,6 +28,32 @@ pip install herberta
  ### Quickstart

  #### Use Huggingface


  #### LocalModel
@@ -72,4 +91,4 @@ If you find our work helpful, feel free to give us a cite.
  institution={Beijing Angopro Technology Co., Ltd.},
  year={2024},
  note={Presented at the 2024 Machine Learning Applications Conference (MLAC)}
- }
 
+ ---
+ tags:
+ - pretrain_model
+ - transformers
+ - TCM
+ - herberta
+ license: apache-2.0
+ inference: true
+ language:
+ - aa
+ base_model:
+ - hfl/chinese-roberta-wwm-ext
+ library_name: transformers
+ ---
  ### Introduction
  Herberta is an experimental pretrained model developed by the Angelpro Team, focused on pretraining for the herbal-medicine domain. Starting from chinese-roberta-wwm-ext-large, we continued pretraining with the MLM task on 675 ancient books and 32 Chinese medicine textbooks. The name herberta splices together "herb" and "RoBERTa". We are committed to contributing to the TCM large-model ecosystem.
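
  To make that procedure concrete, here is a minimal sketch of continued MLM pretraining with the Hugging Face `Trainer`. It is not the team's actual training code: the corpus file `tcm_corpus.txt`, the hyperparameters, and the output directory are illustrative placeholders; only the base checkpoint (`hfl/chinese-roberta-wwm-ext-large`, named above) and the MLM objective come from the model card.

  ```python
  # Sketch only: continued MLM pretraining on domain text.
  from datasets import load_dataset
  from transformers import (
      AutoTokenizer,
      AutoModelForMaskedLM,
      DataCollatorForLanguageModeling,
      Trainer,
      TrainingArguments,
  )

  base = "hfl/chinese-roberta-wwm-ext-large"   # base checkpoint named above
  tokenizer = AutoTokenizer.from_pretrained(base)
  model = AutoModelForMaskedLM.from_pretrained(base)

  # tcm_corpus.txt is a hypothetical plain-text file, one passage per line,
  # standing in for the 675 ancient books and 32 textbooks.
  dataset = load_dataset("text", data_files={"train": "tcm_corpus.txt"})
  tokenized = dataset.map(
      lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
      batched=True,
      remove_columns=["text"],
  )

  # Random 15% masking; whole-word masking and the real schedule are not documented here.
  collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)
  args = TrainingArguments(output_dir="herberta-mlm", per_device_train_batch_size=8, num_train_epochs=1)
  trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"], data_collator=collator)
  trainer.train()
  ```

  Treat this as the shape of the pipeline rather than a recipe for reproducing herberta.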

  We hope it can be used for:

  - Word embeddings for Chinese-medicine domain data
  - A wide range of downstream TCM tasks, e.g., classification and labeling tasks (a sketch follows below)
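
  As a quick illustration of the downstream-task item above, the sketch below puts a linear classification head on top of mean-pooled herberta embeddings. The `TCMClassifier` class, the three-label setup, and the example sentence are hypothetical and not part of the released model.

  ```python
  import torch
  import torch.nn as nn
  from transformers import AutoTokenizer, AutoModel

  class TCMClassifier(nn.Module):
      """Herberta encoder + mean pooling + a linear head (hypothetical 3-way labels)."""
      def __init__(self, model_name="XiaoEnn/herberta", num_labels=3):
          super().__init__()
          self.encoder = AutoModel.from_pretrained(model_name)
          self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

      def forward(self, input_ids, attention_mask):
          hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
          pooled = hidden.mean(dim=1)   # same mean pooling as the Quickstart below
          return self.head(pooled)

  tokenizer = AutoTokenizer.from_pretrained("XiaoEnn/herberta")
  model = TCMClassifier()
  batch = tokenizer(["麻黄汤主治风寒感冒。"], return_tensors="pt", padding=True, truncation=True)
  logits = model(batch["input_ids"], batch["attention_mask"])
  print(logits.shape)   # torch.Size([1, 3])
  ```

  Fine-tuning the encoder end to end or freezing it and training only the head are both reasonable starting points.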

  ### requirements
  "transformers_version": "4.45.1"

  ### Quickstart

  #### Use Huggingface
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModel
+
+ # Hugging Face model repository name
+ model_name = "XiaoEnn/herberta"
+
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModel.from_pretrained(model_name)
+
+ # Input text ("TCM theory is a treasure of China's traditional culture.")
+ text = "中医理论是我国传统文化的瑰宝。"
+
+ # Tokenize and prepare input
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)
+
+ # Get the model's outputs
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # Get the embedding (sentence-level average pooling)
+ sentence_embedding = outputs.last_hidden_state.mean(dim=1)
+
+ print("Embedding shape:", sentence_embedding.shape)
+ print("Embedding vector:", sentence_embedding)
+ ```
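
  Building on the Quickstart above, the mean-pooled embeddings can also be compared directly; the sketch below scores two sentences with cosine similarity. The two example sentences are illustrative and not from the model card.

  ```python
  import torch
  import torch.nn.functional as F
  from transformers import AutoTokenizer, AutoModel

  model_name = "XiaoEnn/herberta"
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModel.from_pretrained(model_name)

  def embed(text):
      # Mean pooling over the last hidden state, as in the Quickstart
      inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
      with torch.no_grad():
          outputs = model(**inputs)
      return outputs.last_hidden_state.mean(dim=1)

  a = embed("黄芪具有补气固表的功效。")   # "Astragalus tonifies qi and consolidates the exterior."
  b = embed("人参大补元气。")             # "Ginseng greatly tonifies original qi."
  print("cosine similarity:", F.cosine_similarity(a, b).item())
  ```
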
  #### LocalModel
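
  The diff does not show this section's body. As a minimal sketch, loading from a local copy typically only swaps the repository name for a directory path; the `./herberta` path below is a hypothetical placeholder.

  ```python
  from transformers import AutoTokenizer, AutoModel

  # Path to a local copy of the model files (config.json, weights, tokenizer files)
  local_path = "./herberta"   # hypothetical directory; adjust to your download location

  tokenizer = AutoTokenizer.from_pretrained(local_path)
  model = AutoModel.from_pretrained(local_path)
  ```
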
 
  institution={Beijing Angopro Technology Co., Ltd.},
  year={2024},
  note={Presented at the 2024 Machine Learning Applications Conference (MLAC)}
+ }