Improve language tag

#2
by lbourdois - opened
Files changed (1)
  1. README.md +133 -121
README.md CHANGED
The previous README's `language:` field listed only `en`; this PR expands it to thirteen language codes (zho, eng, fra, spa, por, deu, ita, rus, jpn, kor, vie, tha, ara). The rest of the file is unchanged. The updated README.md:
---
license: creativeml-openrail-m
datasets:
- avaliev/umls
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-7B-Instruct
pipeline_tag: text-generation
library_name: transformers
tags:
- safetensors
- Unified Medical Language System
- Qwen2.5
- 7B
- Instruct
- Medical
- text-generation-inference
- National Library of Medicine
- umls
---

### Qwen-UMLS-7B-Instruct `[ Unified Medical Language System ]`

The **Qwen-UMLS-7B-Instruct** model is a specialized, instruction-tuned language model for medical and healthcare-related tasks. It is fine-tuned from the **Qwen2.5-7B-Instruct** base model on the **UMLS (Unified Medical Language System)** dataset, making it a useful tool for medical professionals, researchers, and developers building healthcare applications.

| **File Name**                        | **Size**  | **Description**                                    | **Upload Status** |
|--------------------------------------|-----------|----------------------------------------------------|-------------------|
| `.gitattributes`                     | 1.57 kB   | Specifies LFS rules for large-file tracking.       | Uploaded          |
| `README.md`                          | 323 Bytes | Basic project information file.                    | Updated           |
| `added_tokens.json`                  | 657 Bytes | Additional tokens for the tokenizer.               | Uploaded          |
| `config.json`                        | 860 Bytes | Model configuration file.                          | Uploaded          |
| `generation_config.json`             | 281 Bytes | Default generation settings.                       | Uploaded          |
| `merges.txt`                         | 1.82 MB   | Byte-pair encoding merge rules for tokenization.   | Uploaded          |
| `pytorch_model-00001-of-00004.bin`   | 4.88 GB   | First shard of the PyTorch checkpoint.             | Uploaded (LFS)    |
| `pytorch_model-00002-of-00004.bin`   | 4.93 GB   | Second shard of the PyTorch checkpoint.            | Uploaded (LFS)    |
| `pytorch_model-00003-of-00004.bin`   | 4.33 GB   | Third shard of the PyTorch checkpoint.             | Uploaded (LFS)    |
| `pytorch_model-00004-of-00004.bin`   | 1.09 GB   | Fourth shard of the PyTorch checkpoint.            | Uploaded (LFS)    |
| `pytorch_model.bin.index.json`       | 28.1 kB   | Index mapping each weight to its checkpoint shard. | Uploaded          |
| `special_tokens_map.json`            | 644 Bytes | Maps the tokenizer's special tokens.               | Uploaded          |
| `tokenizer.json`                     | 11.4 MB   | Tokenizer definition and configuration.            | Uploaded (LFS)    |
| `tokenizer_config.json`              | 7.73 kB   | Tokenizer configuration file.                      | Uploaded          |
| `vocab.json`                         | 2.78 MB   | Vocabulary file for tokenization.                  | Uploaded          |

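The sharded checkpoint above relies on `pytorch_model.bin.index.json`, which maps every parameter name to the shard file that stores it so loaders can fetch only the shards they need. A minimal sketch of that index's structure (the two weight entries below are illustrative, not copied from the actual file):

```python
# Illustrative miniature of a Hugging Face shard index: "weight_map" pairs each
# parameter name with the shard holding it; "metadata" records the total size.
index = {
    "metadata": {"total_size": 15_231_233_024},
    "weight_map": {
        "model.embed_tokens.weight": "pytorch_model-00001-of-00004.bin",
        "lm_head.weight": "pytorch_model-00004-of-00004.bin",
    },
}

# Which shard holds a given tensor, and which shards exist overall:
shard_for = index["weight_map"]["lm_head.weight"]
all_shards = sorted(set(index["weight_map"].values()))
print(shard_for)   # pytorch_model-00004-of-00004.bin
print(all_shards)
```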
### **Key Features:**

1. **Medical Expertise:**
   - Trained on the UMLS dataset, giving it strong coverage of medical terminology, diagnostics, and treatment plans.

2. **Instruction-Following:**
   - Handles complex queries with clarity and precision, suitable for diagnostic support, patient education, and research.

3. **High Parameter Count:**
   - Leverages 7 billion parameters to deliver detailed, contextually accurate responses.

---

### **Training Details:**

- **Base Model:** [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
- **Dataset:** [avaliev/umls](https://huggingface.co/datasets/avaliev/umls)
  - A comprehensive dataset of medical terminology, concept relationships, and use cases (~99.1k samples).

---
### **Capabilities:**

1. **Clinical Text Analysis:**
   - Interpret medical notes, prescriptions, and research articles.

2. **Question Answering:**
   - Answer medical queries, explain symptoms, and suggest treatments based on user prompts.

3. **Educational Support:**
   - Assist in learning medical terminology and understanding complex concepts.

4. **Healthcare Applications:**
   - Integrate into clinical decision-support systems or patient-care applications.

---
### **Usage Instructions:**

1. **Setup:**
   Install the Hugging Face `transformers` library; the model files are downloaded from the Hub automatically on first load.

2. **Loading the Model:**
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "prithivMLmods/Qwen-UMLS-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
```
3. **Generate Medical Text:**
```python
input_text = "What are the symptoms and treatments for diabetes?"
inputs = tokenizer(input_text, return_tensors="pt")
# do_sample=True is needed for temperature to take effect
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
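Because the base model is instruction-tuned for chat, a chat-formatted prompt generally works better than raw text. The sketch below hand-builds the ChatML-style format that Qwen2.5 chat templates produce, purely for illustration; in real code, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, and note the system/user strings here are assumptions, not from the model card:

```python
# Illustrative single-turn ChatML prompt in the Qwen2.5 style; production code
# should rely on tokenizer.apply_chat_template rather than hand-built strings.
def build_chat_prompt(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"  # generation continues from here
    )

prompt = build_chat_prompt(
    "You are a helpful medical assistant.",
    "What are the symptoms and treatments for diabetes?",
)
print(prompt)
```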

4. **Customizing Outputs:**
   Modify `generation_config.json` to adjust output style:
   - `temperature` for creativity vs. determinism.
   - `max_length` (or `max_new_tokens`) for concise or extended responses.

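The same settings can also be passed per call instead of editing `generation_config.json` on disk; a small sketch (the specific values are illustrative defaults, not recommendations from the model card):

```python
# Generation settings as keyword arguments; each key mirrors a field that
# generation_config.json can hold.
gen_kwargs = {
    "do_sample": True,       # sampling must be on for temperature to apply
    "temperature": 0.3,      # lower = more deterministic, steadier answers
    "max_new_tokens": 256,   # cap on newly generated tokens
    "top_p": 0.9,            # nucleus-sampling cutoff
}
# Usage: outputs = model.generate(**inputs, **gen_kwargs)
print(sorted(gen_kwargs))
```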
---

### **Applications:**

1. **Clinical Support:**
   - Assist healthcare providers with quick, accurate information retrieval.

2. **Patient Education:**
   - Provide patients with understandable explanations of medical conditions.

3. **Medical Research:**
   - Summarize or analyze complex medical research papers.

4. **AI-Driven Diagnostics:**
   - Integrate with diagnostic systems for preliminary assessments.

---