Pipeline: Text Generation
Libraries: PEFT, Safetensors, GGUF
Language: English
Model: ClinicalThought-AI-8B

Tags: medical-ai, healthcare-ai, clinical-reasoning, chain-of-thought, diagnostic-support, differential-diagnosis, clinical-decision-making, medical-education, reasoning-model, 8b, clinical-ai, medical-diagnosis, healthcare-llm, quantized, fine-tuned, lora, medical-nlp, clinical-support, healthcare-professional, evidence-based-medicine, conversational
ClinicalThought-AI-8B Training Documentation
============================================

Model Training Details
----------------------
Base Model: Granite 3.3 8B Instruct
Fine-tuning Method: LoRA (Low-Rank Adaptation)
Training Infrastructure: Single NVIDIA RTX 6000 Ada Generation GPU
Training Duration: Approximately 75.8 hours
Training Dataset: Custom curated dataset for medical reasoning
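Since the model is distributed as a LoRA adapter over Granite 3.3 8B Instruct, it can be loaded with transformers and peft. The sketch below is illustrative only: the base-model repo ID is assumed, and the adapter repo ID is a placeholder, not a published path.

```python
# Minimal loading sketch (illustrative): Granite 3.3 8B Instruct base plus the LoRA adapter.
# BASE_ID is the assumed base-model repo; ADAPTER_ID is a placeholder for the actual adapter path.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "ibm-granite/granite-3.3-8b-instruct"   # assumed base-model repo ID
ADAPTER_ID = "your-org/clinicalthought-ai-8b"     # placeholder adapter repo ID

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, ADAPTER_ID)

prompt = "A 54-year-old presents with acute chest pain. Outline a differential diagnosis."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```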
Dataset Specifications
----------------------
Total Token Count: 31,929,580
Total Sample Count: 29,500
Average Tokens/Sample: 1,082.36
Max Token Count: 9,803
Min Token Count: 237
Tokens Counted Using: tiktoken (cl100k_base encoding)
Dataset Creation: Created from a combination of public medical reasoning datasets from OpenAI o1 and DeepSeek-R1, along with additional reasoning chains created using Claude Sonnet 4 extended thinking
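The token statistics above were computed with tiktoken's cl100k_base encoding. A minimal sketch of how such counts can be reproduced is shown below; the file path and per-sample "text" field are assumptions, not the actual preprocessing script.

```python
# Sketch of how the per-sample token statistics could be reproduced with tiktoken.
# Assumes a JSONL file with a "text" field per sample; the real dataset layout may differ.
import json
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

counts = []
with open("medical_reasoning_dataset.jsonl") as f:   # hypothetical path
    for line in f:
        sample = json.loads(line)
        counts.append(len(enc.encode(sample["text"])))

print(f"Total samples:         {len(counts):,}")
print(f"Total tokens:          {sum(counts):,}")
print(f"Average tokens/sample: {sum(counts) / len(counts):.2f}")
print(f"Max tokens:            {max(counts):,}")
print(f"Min tokens:            {min(counts):,}")
```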
Training Configuration
----------------------
LoRA Parameters:
- Rank: 32
- Alpha: 64
- Dropout: 0.1
- Target Modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head

Training Hyperparameters:
- Learning Rate: 2e-5
- Batch Size: 1
- Gradient Accumulation: 8
- Effective Batch Size: 8
- Max Sequence Length: 12,000
- Epochs: 8
- Warmup Ratio: 0.05
- Weight Decay: 0.005
- Max Grad Norm: 1.0
- LR Scheduler: Cosine with Restarts
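For reference, the settings above map onto peft and transformers configuration objects roughly as sketched below. This is an assumption about how the run could be configured, not the actual training script; the output path is a placeholder, and the 12,000-token maximum sequence length would be applied at tokenization/packing time rather than in TrainingArguments.

```python
# Sketch mapping the listed hyperparameters onto peft/transformers config objects (illustrative).
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=32,
    lora_alpha=64,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj", "lm_head"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="clinicalthought-ai-8b-lora",   # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,             # effective batch size 8
    num_train_epochs=8,
    warmup_ratio=0.05,
    weight_decay=0.005,
    max_grad_norm=1.0,
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,                                 # mixed precision, per Hardware & Environment
    gradient_checkpointing=True,
)
```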
Hardware & Environment
----------------------
GPU: NVIDIA RTX 6000 Ada Generation (48 GB)
Operating System: Ubuntu
CUDA Version: 11.8
PyTorch Version: 2.7.0
Compute Capability: 8.9
Optimization: FP16, Gradient Checkpointing
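A quick way to confirm a comparable environment is to query PyTorch directly; the snippet below simply prints the values that correspond to the entries above (output will vary by machine).

```python
# Environment check mirroring the values listed above.
import torch

print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Compute capability:", torch.cuda.get_device_capability(0))
    props = torch.cuda.get_device_properties(0)
    print(f"VRAM: {props.total_memory / 1024**3:.0f} GB")
```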
Training Performance
--------------------
Training Runtime: 75.8 hours (272,919 seconds)
Train Samples/Second: 0.865
Train Steps/Second: 0.108
Training Loss (Final): 0.738
Total Training Steps: 29,504
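The step count and throughput figures are mutually consistent with the dataset size and effective batch size; a quick arithmetic check:

```python
# Consistency check for the reported training figures (pure arithmetic, no training code).
import math

samples, epochs, effective_batch = 29_500, 8, 8
runtime_s = 272_919

steps_per_epoch = math.ceil(samples / effective_batch)   # 3,688
total_steps = steps_per_epoch * epochs                    # 29,504, matches "Total Training Steps"
print(total_steps)
print(round(samples * epochs / runtime_s, 3))             # ~0.865 samples/second
print(round(total_steps / runtime_s, 3))                  # ~0.108 steps/second
```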