---
license: apache-2.0
base_model:
- google-bert/bert-base-chinese
metrics:
- accuracy
language:
- en
- zh
pipeline_tag: fill-mask
---

# Herbert: A Pretrained BERT Model for Herbal Medicine

**Herberta** is a pretrained model for herbal medicine research, built on the `bert-base-chinese` model. It has been fine-tuned on domain-specific data from 675 ancient books and 32 Traditional Chinese Medicine (TCM) textbooks, and is designed to support a variety of TCM-related NLP tasks.

---

Herberta combines the strengths of modern pretraining techniques and domain knowledge.

"num_hidden_layers": 24,
"torch_dtype": "float32",
"vocab_size": 21128
}
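
The excerpt above shows only the tail of the model's `config.json`; the full configuration can be fetched and checked without downloading the weights. A minimal sketch with `AutoConfig`, assuming the `Chengfengke/herbert` repo id used in the quickstart below:

```python
from transformers import AutoConfig

# Fetch only config.json and print the fields quoted above.
config = AutoConfig.from_pretrained("Chengfengke/herbert")
print(config.num_hidden_layers)  # expected: 24
print(config.torch_dtype)        # expected: torch.float32
print(config.vocab_size)         # expected: 21128
```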

### Requirements

The checkpoint was exported with `"transformers_version": "4.45.1"`, so a `transformers` release at least that recent is recommended. Install the `herberta` package with:

```bash
pip install herberta
```
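
To make the version assumption explicit at runtime, a small guard can be run before loading the model; this is our own sketch, and `packaging` ships as a dependency of `transformers`:

```python
import transformers
from packaging.version import Version

# The checkpoint was exported with transformers 4.45.1; warn when the
# installed release is older than that.
if Version(transformers.__version__) < Version("4.45.1"):
    print(f"transformers {transformers.__version__} predates 4.45.1; "
          "loading this checkpoint may still work but is untested.")
```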

### Quickstart

#### Use Hugging Face

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Replace "Chengfengke/herbert" with the Hugging Face model repository name
model_name = "Chengfengke/herbert"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Input text: "TCM theory is a treasure of our traditional culture."
text = "中医理论是我国传统文化的瑰宝。"

# Tokenize and prepare input
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding="max_length", max_length=128)

# Get the model's outputs without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Get the embedding (sentence-level average pooling)
sentence_embedding = outputs.last_hidden_state.mean(dim=1)

print("Embedding shape:", sentence_embedding.shape)
print("Embedding vector:", sentence_embedding)
```
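
Note that with `padding="max_length"` the plain mean above also averages over `[PAD]` positions. A sketch of attention-mask-aware pooling, reusing `model` and `inputs` from the snippet above (the helper name is ours, not part of the released package):

```python
import torch

def masked_mean_pooling(last_hidden_state, attention_mask):
    # Zero out hidden states at padding positions, then average
    # only over the real tokens.
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    summed = (last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1e-9)
    return summed / counts

with torch.no_grad():
    outputs = model(**inputs)

sentence_embedding = masked_mean_pooling(outputs.last_hidden_state, inputs["attention_mask"])
print("Embedding shape:", sentence_embedding.shape)  # (1, hidden_size)
```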

#### Local model

```python
from transformers import BertTokenizer, BertForMaskedLM

# Load the tokenizer and masked-LM model; point model_name at a local
# checkpoint directory instead of the repo id if you have one.
model_name = "Chengfengke/herbert"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("This is an example text for herbal medicine.", return_tensors="pt")
outputs = model(**inputs)
```
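
Because the card's `pipeline_tag` is `fill-mask`, the same checkpoint can also be exercised end-to-end through the `fill-mask` pipeline; a minimal sketch (the masked sentence is our own toy example):

```python
from transformers import pipeline

# Build a fill-mask pipeline around the masked-LM checkpoint.
fill_mask = pipeline("fill-mask", model="Chengfengke/herbert")

# Predict the masked character in a TCM-flavored sentence:
# "Ginseng greatly tonifies the original [MASK]."
for pred in fill_mask("人参大补元[MASK]。"):
    print(pred["token_str"], round(pred["score"], 4))
```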

## Citation

If you find our work helpful, feel free to cite us.

```bibtex
@misc{herberta-embedding,
  title  = {Herberta: A Pretrained BERT Model for TCM Herbs and Downstream Tasks as Text Embedding Generation},
  url    = {https://github.com/15392778677/herberta},
  author = {Yehan Yang and Xinhan Zheng},
  month  = {December},
  year   = {2024}
}

@article{herbert-technical-report,
  title       = {Herbert: A Pretrained BERT Model for TCM Herbs and Downstream Tasks as Text Embedding Generation},
  author      = {Yehan Yang and Xinhan Zheng},
  institution = {Beijing Angopro Technology Co., Ltd.},
  year        = {2024},
  note        = {Presented at the 2024 Machine Learning Applications Conference (MLAC)}
}
```
|