nielsr HF Staff commited on
Commit
709e0b0
·
verified ·
1 Parent(s): 953cb9e

Update model card with paper and repository links

Browse files

Hi, I'm Niels from the community science team at Hugging Face. I'm opening this PR to improve the documentation for this model.

This PR:
- Adds a link to the associated research paper: [AutoPCR: Automated Phenotype Concept Recognition by Prompting](https://huggingface.co/papers/2507.19315).
- Adds a link to the official GitHub repository for the project.
- Provides a summary of the model's purpose in biomedical phenotype concept recognition (CR).
- Includes the BibTeX citation for the paper.

The metadata has also been updated to include domain-specific tags.

Files changed (1) hide show
  1. README.md +30 -188
README.md CHANGED
@@ -3,208 +3,50 @@ base_model: unsloth/Qwen3-30B-A3B-Instruct
3
  library_name: peft
4
  pipeline_tag: text-generation
5
  tags:
6
- - base_model:adapter:unsloth/Qwen3-30B-A3B-Instruct
7
  - lora
8
  - sft
9
  - transformers
10
- - trl
11
  - unsloth
 
 
12
  ---
13
 
14
- # Model Card for Model ID
15
 
16
- <!-- Provide a quick summary of what the model is/does. -->
17
 
 
 
18
 
 
19
 
20
- ## Model Details
21
 
22
- ### Model Description
23
 
24
- <!-- Provide a longer summary of what this model is. -->
25
 
 
 
 
 
26
 
 
27
 
28
- - **Developed by:** [More Information Needed]
29
- - **Funded by [optional]:** [More Information Needed]
30
- - **Shared by [optional]:** [More Information Needed]
31
- - **Model type:** [More Information Needed]
32
- - **Language(s) (NLP):** [More Information Needed]
33
- - **License:** [More Information Needed]
34
- - **Finetuned from model [optional]:** [More Information Needed]
35
-
36
- ### Model Sources [optional]
37
-
38
- <!-- Provide the basic links for the model. -->
39
-
40
- - **Repository:** [More Information Needed]
41
- - **Paper [optional]:** [More Information Needed]
42
- - **Demo [optional]:** [More Information Needed]
43
-
44
- ## Uses
45
-
46
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
47
-
48
- ### Direct Use
49
-
50
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
51
-
52
- [More Information Needed]
53
-
54
- ### Downstream Use [optional]
55
-
56
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
57
-
58
- [More Information Needed]
59
-
60
- ### Out-of-Scope Use
61
-
62
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
63
-
64
- [More Information Needed]
65
-
66
- ## Bias, Risks, and Limitations
67
-
68
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
69
-
70
- [More Information Needed]
71
-
72
- ### Recommendations
73
-
74
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
75
-
76
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
77
-
78
- ## How to Get Started with the Model
79
-
80
- Use the code below to get started with the model.
81
-
82
- [More Information Needed]
83
-
84
- ## Training Details
85
-
86
- ### Training Data
87
-
88
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
89
-
90
- [More Information Needed]
91
-
92
- ### Training Procedure
93
-
94
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
95
-
96
- #### Preprocessing [optional]
97
-
98
- [More Information Needed]
99
-
100
-
101
- #### Training Hyperparameters
102
-
103
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
104
-
105
- #### Speeds, Sizes, Times [optional]
106
-
107
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
108
-
109
- [More Information Needed]
110
-
111
- ## Evaluation
112
-
113
- <!-- This section describes the evaluation protocols and provides the results. -->
114
-
115
- ### Testing Data, Factors & Metrics
116
-
117
- #### Testing Data
118
-
119
- <!-- This should link to a Dataset Card if possible. -->
120
-
121
- [More Information Needed]
122
-
123
- #### Factors
124
-
125
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
126
-
127
- [More Information Needed]
128
-
129
- #### Metrics
130
-
131
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
132
-
133
- [More Information Needed]
134
-
135
- ### Results
136
-
137
- [More Information Needed]
138
-
139
- #### Summary
140
-
141
-
142
-
143
- ## Model Examination [optional]
144
-
145
- <!-- Relevant interpretability work for the model goes here -->
146
-
147
- [More Information Needed]
148
-
149
- ## Environmental Impact
150
-
151
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
152
-
153
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
154
-
155
- - **Hardware Type:** [More Information Needed]
156
- - **Hours used:** [More Information Needed]
157
- - **Cloud Provider:** [More Information Needed]
158
- - **Compute Region:** [More Information Needed]
159
- - **Carbon Emitted:** [More Information Needed]
160
-
161
- ## Technical Specifications [optional]
162
-
163
- ### Model Architecture and Objective
164
-
165
- [More Information Needed]
166
-
167
- ### Compute Infrastructure
168
-
169
- [More Information Needed]
170
-
171
- #### Hardware
172
-
173
- [More Information Needed]
174
-
175
- #### Software
176
-
177
- [More Information Needed]
178
-
179
- ## Citation [optional]
180
-
181
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
182
 
183
  **BibTeX:**
184
-
185
- [More Information Needed]
186
-
187
- **APA:**
188
-
189
- [More Information Needed]
190
-
191
- ## Glossary [optional]
192
-
193
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
194
-
195
- [More Information Needed]
196
-
197
- ## More Information [optional]
198
-
199
- [More Information Needed]
200
-
201
- ## Model Card Authors [optional]
202
-
203
- [More Information Needed]
204
-
205
- ## Model Card Contact
206
-
207
- [More Information Needed]
208
- ### Framework versions
209
-
210
- - PEFT 0.18.0
 
3
  library_name: peft
4
  pipeline_tag: text-generation
5
  tags:
 
6
  - lora
7
  - sft
8
  - transformers
 
9
  - unsloth
10
+ - biomedical
11
+ - phenotype-recognition
12
  ---
13
 
14
+ # AutoPCR: Automated Phenotype Concept Recognition by Prompting
15
 
16
+ AutoPCR is a prompt-based phenotype concept recognition (CR) method designed to automatically generalize to new ontologies and unseen data without ontology-specific training. This repository contains the fine-tuned entity linker component of the system, which is a LoRA adapter for `unsloth/Qwen3-30B-A3B-Instruct`.
17
 
18
+ - **Repository:** https://github.com/yctao7/AutoPCR
19
+ - **Paper:** [AutoPCR: Automated Phenotype Concept Recognition by Prompting](https://huggingface.co/papers/2507.19315)
20
 
21
+ ## Model Description
22
 
23
+ Phenotype concept recognition (CR) is a fundamental task in biomedical text mining. Existing methods often struggle to generalize across diverse text styles or require extensive ontology-specific training. AutoPCR addresses these limitations by using a prompt-based approach and an optional self-supervised training strategy to achieve robust performance across multiple datasets. This model specifically serves as the entity linker within the pipeline to map extracted phenotype mentions to standard ontologies like HPO and MEDIC.
24
 
25
+ ## Usage
26
 
27
+ For detailed instructions on how to use this model within the AutoPCR framework—including environment setup, dictionary building, indexing, and running evaluation experiments—please refer to the [official GitHub repository](https://github.com/yctao7/AutoPCR).
28
 
29
+ Example command for running HPO evaluation from the source code:
30
+ ```bash
31
+ python HPO_evaluation.py --ontology_dict ../dict/HPO -c BIOC-GS -o ../results/bioc-gs.tsv --only_longest
32
+ ```
33
 
34
+ ## Citation
35
 
36
+ If you find this work useful, please cite:
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
37
 
38
  **BibTeX:**
39
+ ```bibtex
40
+ @misc{tao2025autopcr,
41
+ title={AutoPCR: Automated Phenotype Concept Recognition by Prompting},
42
+ author={Yichao Tao and others},
43
+ year={2025},
44
+ eprint={2507.19315},
45
+ archivePrefix={arXiv},
46
+ primaryClass={cs.CL},
47
+ url={https://arxiv.org/abs/2507.19315},
48
+ }
49
+ ```
50
+
51
+ ## Contact
52
+ Contact: drjieliu@umich.edu