# Model Card for BioGPT-FineTuned-MedicalTextbooks-FP16

## Model Overview

This model is a fine-tuned and quantized version of the microsoft/biogpt model, tailored for medical text understanding. It was fine-tuned on the dmedhi/medical-textbooks dataset from Hugging Face and subsequently quantized to FP16 (half-precision) to reduce memory usage and improve inference speed while largely maintaining accuracy. The model is designed for tasks such as keyword extraction from medical texts and generative tasks in the biomedical domain.
## Model Details

```
Base Model: microsoft/biogpt
Fine-Tuning Dataset: dmedhi/medical-textbooks (15,970 rows)
Quantization: FP16 (half-precision) using PyTorch's .half() method
Model Type: Causal Language Model
Language: English
```
## Intended Use

This model is intended for:

- Keyword Extraction: extracting relevant lines containing specific keywords (e.g., "anatomy") from medical textbooks, along with metadata such as book names.
- Generative Tasks: generating short explanations or summaries in the biomedical domain (e.g., answering questions like "What is anatomy?").
- Research and Education: assisting researchers, students, and educators in exploring medical texts and generating insights.
## Out of Scope

- Real-time clinical decision-making or medical diagnosis (not evaluated for such tasks).
- Non-English text processing (not tested on other languages).
- Tasks requiring high precision in generative output without human oversight.
## Training Details

### Dataset

The model was fine-tuned on the dmedhi/medical-textbooks dataset, which contains excerpts from medical textbooks with two attributes (see the loading sketch below):

- **text:** the content of the excerpt.
- **book:** the name of the book (e.g., "Gray's Anatomy").
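The exact data-loading code for the released model is not published; the following is a minimal sketch of loading and inspecting the dataset with the `datasets` library, assuming it loads with its default configuration:

```
from datasets import load_dataset

# Load the single published split and look at one example.
dataset = load_dataset("dmedhi/medical-textbooks", split="train")

example = dataset[0]
print(example["book"])         # source textbook name
print(example["text"][:200])   # first 200 characters of the excerpt
```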
#### Dataset Splits

- Original split: train (15,970 rows).
- Custom splits: 80% train (12,776 rows), 20% validation (3,194 rows); see the splitting sketch below.
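The splitting code is not included in this card; a minimal sketch using `Dataset.train_test_split` might look as follows (the seed is illustrative, the actual value used for the released model is not stated):

```
from datasets import load_dataset

dataset = load_dataset("dmedhi/medical-textbooks", split="train")

# 80/20 train/validation split.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_dataset = splits["train"]       # ~12,776 rows
validation_dataset = splits["test"]   # ~3,194 rows
```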
### Training Procedure

#### Preprocessing

- Tokenized the text field using the BioGPT tokenizer (microsoft/biogpt).
- Set max_length=512, with truncation and padding.
- Used input_ids as labels for causal language modeling (a tokenization sketch follows this list).
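The preprocessing script itself is not reproduced here; a minimal sketch of the tokenization described above, assuming the `train_dataset` and `validation_dataset` objects from the previous sketch:

```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")

def tokenize(batch):
    # Truncate/pad every excerpt to 512 tokens.
    encodings = tokenizer(batch["text"], max_length=512, truncation=True, padding="max_length")
    # For causal language modeling, the inputs also serve as the labels.
    encodings["labels"] = [ids.copy() for ids in encodings["input_ids"]]
    return encodings

tokenized_train = train_dataset.map(tokenize, batched=True, remove_columns=train_dataset.column_names)
tokenized_validation = validation_dataset.map(tokenize, batched=True, remove_columns=validation_dataset.column_names)
```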
#### Fine-Tuning

- Fine-tuned microsoft/biogpt using Hugging Face's Trainer API (a minimal sketch follows the argument list).

```
Training arguments:
Epochs: 1
Batch size: 4 per device
Learning rate: 2e-5
Mixed precision: FP16 (fp16=True)
Evaluation strategy: steps (every 1,000 steps)
```

- Training loss decreased from 2.8409 to 2.7006 over 3,194 steps.
- Validation loss decreased from 2.7317 to 2.6512.
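The exact training script is not included in this card; a minimal sketch of the setup described above with the Trainer API, using the tokenized splits from the preprocessing sketch (`eval_strategy` is spelled `evaluation_strategy` in older transformers releases):

```
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model = AutoModelForCausalLM.from_pretrained("microsoft/biogpt")

training_args = TrainingArguments(
    output_dir="./biogpt_finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    learning_rate=2e-5,
    fp16=True,                 # mixed-precision training
    eval_strategy="steps",     # evaluate every 1,000 steps
    eval_steps=1000,
    logging_steps=1000,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_validation,
)
trainer.train()
```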
#### Quantization

- Converted the fine-tuned model to FP16 using PyTorch's .half() method, as sketched below.
- Saved as ./biogpt_finetuned/final_model_fp16.
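A minimal sketch of this conversion step; the path of the intermediate full-precision checkpoint is hypothetical, only the FP16 output path comes from this card:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned checkpoint (path is hypothetical), cast weights to half precision, and save.
model = AutoModelForCausalLM.from_pretrained("./biogpt_finetuned/final_model")
model = model.half()
model.save_pretrained("./biogpt_finetuned/final_model_fp16")

# Save the tokenizer alongside the weights so the directory is self-contained.
tokenizer = AutoTokenizer.from_pretrained("microsoft/biogpt")
tokenizer.save_pretrained("./biogpt_finetuned/final_model_fp16")
```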
### Compute Infrastructure

- Hardware: 12 GB NVIDIA GPU
- Environment: Jupyter Notebook on Windows
- Frameworks: PyTorch, Hugging Face Transformers
- Training time: approximately 27 minutes for 1 epoch
## Evaluation

### Metrics

```
Training Loss: decreased from 2.8409 to 2.7006.
Validation Loss: decreased from 2.7317 to 2.6512.
Memory Usage: post-quantization memory usage reported as ~661 MB (FP16), though actual savings may vary due to buffers and non-weight tensors.
```
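A figure of this kind can be estimated directly from the sizes of the loaded parameters and buffers; a minimal sketch (the model path is a placeholder, as in the loading example below):

```
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/biogpt_finetuned/final_model_fp16", torch_dtype=torch.float16
)

# Bytes occupied by parameters and registered buffers, converted to MiB.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
buffer_bytes = sum(b.numel() * b.element_size() for b in model.buffers())
print(f"Approximate model size: {(param_bytes + buffer_bytes) / 1024**2:.0f} MB")
```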
### Qualitative Testing

- **Generative Task:** generated a reasonable response to "What is anatomy?": "What is anatomy? Anatomy is the basis of medicine..."
- **Keyword Extraction:** successfully extracted up to 10 lines containing a given keyword (e.g., "anatomy") with corresponding book names (e.g., "Gray's Anatomy").
## Usage

### Installation

Ensure you have the required libraries installed:

```
pip install transformers torch datasets sacremoses
```
### Loading the Model

Load the quantized FP16 model and tokenizer:

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_path = "path/to/biogpt_finetuned/final_model_fp16"  # Update with your local path or HF repo ID

# torch_dtype=torch.float16 keeps the weights in half precision instead of upcasting to FP32.
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Move the model to GPU if available and switch to inference mode.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()
```
### Example 1: Generative Inference

Generate text with the quantized model:

```
input_text = "What is anatomy?"

# Tokenize the prompt and move the tensors to the same device as the model.
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {k: v.to(device) for k, v in inputs.items()}

# Generate up to 50 tokens (prompt included) without tracking gradients.
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=50)

output_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(output_text)
```
### Example 2: Keyword Extraction

Extract lines containing a keyword from the original dataset, along with the book each line came from:

```
from datasets import load_from_disk

# Load the dataset previously saved to disk with save_to_disk().
original_datasets = load_from_disk('path/to/original_medical_textbooks')

def extract_lines_with_keyword(keyword, dataset_split='train', max_results=10):
    """Return up to max_results lines containing the keyword, with their source books."""
    dataset = original_datasets[dataset_split]
    matching_lines = []
    for entry in dataset:
        text = entry['text']
        book = entry['book']
        for line in text.split('\n'):
            # Case-insensitive substring match.
            if keyword.lower() in line.lower():
                matching_lines.append({'text': line.strip(), 'book': book})
                if len(matching_lines) >= max_results:
                    return matching_lines
    return matching_lines

keyword = "anatomy"
matching_lines = extract_lines_with_keyword(keyword)
for i, match in enumerate(matching_lines, 1):
    print(f"{i}. Text: {match['text']}")
    print(f"   Book: {match['book']}\n")
```
## Limitations

- Quantization Trade-offs: FP16 quantization may lead to minor accuracy degradation, though this has not been extensively evaluated.
- Dataset Bias: fine-tuned only on dmedhi/medical-textbooks, which may not cover all medical domains or topics.
- Generative Quality: generative outputs may require human oversight for correctness.
- Scalability: keyword extraction relies on string matching, not semantic understanding, limiting its ability to capture nuanced relationships.