GrantsLLM
A specialized language model for STEM research grant writing and review
Developed by Evionex | Created by Kedar P. Navsariwala
Model Description
GrantsLLM is a domain-specialized language model fine-tuned on 78 STEM research grant applications to assist researchers in drafting, refining, and reviewing grant proposals. Built on Qwen3-4B, this model has been trained to follow the structure, terminology, and writing style of successful research grants across NIH, NSF, and similar funding mechanisms.
- Developed by: Kedar P. Navsariwala, CTO & Co-Founder at Evionex
- Model type: Causal Language Model (Decoder-only Transformer)
- Language(s): English
- License: CC BY 4.0 (requires attribution)
- Finetuned from: unsloth/Qwen3-4B-GGUF
🎯 Use Cases
What GrantsLLM Can Do
- ✅ Generate complete grant proposals (NIH R03/R01/R21, NSF, etc.)
- ✅ Draft specific sections: Specific Aims, Significance, Innovation, Approach, Research Strategy
- ✅ Improve existing text for clarity, structure, and persuasiveness
- ✅ Provide review feedback on grant coherence and alignment
- ✅ Expand bullet points into full narrative sections
- ✅ Adapt tone to academic/scientific writing standards
Intended Users
- Principal Investigators (PIs) and research scientists
- Postdoctoral researchers and graduate students
- University grant support offices
- Biotech and research startups
- Academic research administrators
Out of Scope
- ❌ Automated funding decisions or grant scoring
- ❌ Legal, regulatory, or IRB compliance review
- ❌ Generating fabricated data or citations
- ❌ Non-STEM grants (humanities, arts, and social science proposals may see reduced output quality)
- ❌ Non-English grant applications
🚀 Quick Start
Installation
pip install transformers torch accelerate
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load model and tokenizer
model_name = "your-username/GrantsLLM" # Replace with actual repo
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype="auto",
device_map="auto"
)
# Generate grant text
prompt = """Write a Specific Aims section for an NIH R03 grant on developing novel CRISPR-based gene editing tools for treating sickle cell disease. Include 2-3 specific aims with clear objectives and expected outcomes."""
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
**inputs,
max_new_tokens=512,
temperature=0.7,
top_p=0.9,
do_sample=True
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Using with Pipeline
from transformers import pipeline
generator = pipeline(
"text-generation",
model="your-username/GrantsLLM",
device_map="auto"
)
prompt = "Draft a Research Significance statement for a computational biology grant on protein folding prediction using deep learning."
output = generator(
prompt,
max_new_tokens=400,
temperature=0.7,
top_p=0.9
)
print(output[0]['generated_text'])
Prompt Templates
For Section Generation:
Write a [Section] for a [Funder] [Mechanism] grant on [Topic].
Requirements: [Specific elements needed]
Word limit: [Number] words
For Review/Feedback:
Review the following [Section] and provide feedback on clarity, structure, and alignment with [Funder] guidelines:
[Paste text here]
Examples:
- "Write Specific Aims for an NIH R01 grant on cancer immunotherapy"
- "Draft Innovation section for NSF CAREER award on quantum computing"
- "Review this Research Strategy for logical flow and hypothesis clarity"
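The section-generation template can be wrapped in a small helper so prompts stay consistent across sections. A minimal sketch (the function name and its defaults are illustrative, not part of the model's API):

```python
def build_section_prompt(section, funder, mechanism, topic, requirements, word_limit):
    """Fill the section-generation template with concrete values."""
    return (
        f"Write a {section} for a {funder} {mechanism} grant on {topic}.\n"
        f"Requirements: {requirements}\n"
        f"Word limit: {word_limit} words"
    )

prompt = build_section_prompt(
    section="Specific Aims",
    funder="NIH",
    mechanism="R03",
    topic="CRISPR-based gene editing for sickle cell disease",
    requirements="2-3 aims with clear objectives and expected outcomes",
    word_limit=500,
)
print(prompt)
```

The resulting string can be passed directly to the tokenizer or pipeline shown in the Quick Start.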
📊 Training Data
Dataset Composition
- Size: 78 research grant applications
- Domains: Biotechnology, Molecular Biology, Computational Biology, Chemistry, Biomedical Sciences
- Formats: NIH (R01, R03, R21), NSF, and similar federal/institutional grant formats
- Sources: Publicly available grant examples, institutional repositories, and NIH RePORTER
- Language: English
Data Processing
Stage 1: Continued Pretraining (CPT)
- Raw grant text extracted and cleaned from PDFs/documents
- Structured into a single-column text format (JSONL/Parquet)
- Preserves section structure and domain terminology
Stage 2: Supervised Fine-Tuning (SFT)
- Chat-style instruction pairs using ChatML template
- Tasks include: section generation, expansion, refinement, review
- Format:
{"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
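A minimal sketch of how a raw instruction/response pair could be serialized into the JSONL format above (the field names follow the example record; the helper function itself is illustrative):

```python
import json

def to_sft_record(user_text, assistant_text):
    """Serialize one instruction/response pair as a single JSONL training line."""
    record = {
        "messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]
    }
    return json.dumps(record)

line = to_sft_record(
    "Expand these bullet points into a Significance paragraph: ...",
    "This project addresses a critical gap in ...",
)
print(line)
```

Each call produces one line of the SFT dataset; chat-template tokens (ChatML markers) are applied later by the tokenizer, not stored in the data.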
🔧 Training Procedure
Training Hyperparameters
- Base Model: unsloth/Qwen3-4B-GGUF (~4B parameters)
- Training Framework: Unsloth + PyTorch
- Hardware: Google Colab (single GPU, T4/V100)
- Fine-tuning Method: LoRA/QLoRA (Parameter-Efficient Fine-Tuning)
- Training Stages:
- Continued Pretraining on grant corpus
- Supervised Instruction Fine-Tuning on QnA pairs
- Optimizer: AdamW
- Learning Rate: Kept low to reduce the risk of catastrophic forgetting of the base model's general capabilities
- Training monitored for: Overfitting, repetition, coherence
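To see why LoRA is parameter-efficient: a rank-r adapter on a d_in x d_out weight matrix adds only r x (d_in + d_out) trainable parameters. A back-of-envelope sketch (the hidden size, rank, layer count, and matrices-per-layer below are hypothetical, not the actual training configuration):

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter pair (A: d_in x r, B: r x d_out)."""
    return rank * (d_in + d_out)

# Illustrative shapes: a 2560-wide projection, rank 16, adapters on
# 7 projection matrices per layer across 36 layers.
per_matrix = lora_params(2560, 2560, 16)
total = per_matrix * 7 * 36
print(f"{total / 1e6:.1f}M trainable vs ~4000M total "
      f"({100 * total / 4e9:.2f}% of the base model)")
```

Under these assumptions only a fraction of a percent of the weights are updated, which is what makes single-GPU fine-tuning of a ~4B model feasible.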
Training Details
Training Type: Parameter-efficient fine-tuning with LoRA adapters (base weights frozen)
Epochs: [Adjusted based on validation performance]
Batch Size: Optimized for a 4B model on a single GPU
Context Length: Inherited from the base model (Qwen3-4B supports a 32K-token native context)
Loss Function: Causal Language Modeling (CLM) loss
Validation Strategy: Qualitative evaluation on held-out grant examples
📈 Performance & Evaluation
Evaluation Methodology
Qualitative Assessment:
- Human expert review of generated grant sections
- Evaluation criteria: coherence, structure, domain accuracy, persuasiveness
- Practical testing on mock NIH/NSF grant prompts
Known Strengths
- ✅ Strong grasp of STEM grant structure (Aims, Significance, Innovation, Approach)
- ✅ Effective expansion of bullet points to narrative
- ✅ Appropriate academic/scientific tone
- ✅ Good understanding of NIH/NSF terminology and conventions
- ✅ Maintains logical flow between sections
Known Limitations
- ⚠️ Hallucination Risk: May generate plausible but incorrect citations, grant numbers, or policies
- ⚠️ Format Bias: Optimized for NIH/NSF; other formats (European, private foundations) may be weaker
- ⚠️ Domain Bias: Best for biotech/life sciences; physics/engineering grants may be less polished
- ⚠️ Repetition: Can produce repetitive text if prompt lacks detail or structure
- ⚠️ Context Limits: Long grants may need to be drafted in sections
- ⚠️ Recency: Training data may not reflect latest funder guidelines (post-2025)
⚠️ Bias, Risks, and Limitations
Bias Sources
Domain Bias: Model is optimized for STEM fields represented in training data (biotech, molecular biology, computational biology). Grants in underrepresented fields may receive lower quality outputs.
Institutional Bias: Writing style may reflect patterns from R1 research universities and well-funded institutions present in training examples.
Funding Mechanism Bias: Strongest performance on NIH R-series and NSF standard grants; less reliable for fellowships, training grants, or international formats.
Historical Bias: May reinforce language patterns from historically funded research areas, potentially disadvantaging emerging or interdisciplinary fields.
Risks
Fabrication: Model may generate convincing but false information including:
- Non-existent citations and references
- Incorrect grant mechanism details
- Fabricated preliminary data or results
- Inaccurate funder policies
Over-reliance: Users may trust outputs without verification, risking submission of flawed proposals.
Privacy: Users may inadvertently input confidential research ideas or unpublished data.
Recommendations
- Always verify: Check all factual claims, citations, and funder guidelines
- Human review required: Never submit AI-generated grants without expert review
- Iterative refinement: Use as drafting assistant, not final author
- Protect IP: Don't input confidential or proprietary information
- Disclose usage: Be transparent with collaborators and (when appropriate) funders about AI assistance
- Update manually: Cross-reference current funder guidelines and requirements
🔐 Ethical Considerations
Responsible Use
- Transparency: Disclose AI assistance to co-authors and collaborators
- Human oversight: Keep domain experts in the loop for all submissions
- Academic integrity: Ensure outputs align with your institution's policies on AI use
- Verification: Validate all scientific claims and citations independently
- Privacy: Avoid inputting sensitive, unpublished, or identifiable information
Funder Policies
As of February 2026, grant-writing AI policies vary by funder:
- NIH: Generally permits AI assistance for writing, but PIs remain responsible for all content
- NSF: Similar stance; emphasizes researcher accountability
- Check specific RFAs for any AI-related restrictions or disclosure requirements
When in doubt: Contact your program officer or sponsored research office.
📜 Licensing & Attribution
License: CC BY 4.0
This model is licensed under Creative Commons Attribution 4.0 International.
You Must:
✅ Give appropriate credit to Evionex and Kedar P. Navsariwala
✅ Provide a link to the license
✅ Indicate if changes were made to the model
✅ Retain attribution in any derivative works or applications
Citation
If you use GrantsLLM in your research or projects, please cite:
@software{grantsllm2026,
author = {Navsariwala, Kedar P.},
title = {GrantsLLM: A Fine-Tuned Language Model for STEM Grant Writing},
year = {2026},
publisher = {Hugging Face},
organization = {Evionex},
howpublished = {\url{https://huggingface.co/your-username/GrantsLLM}},
license = {CC-BY-4.0}
}
Attribution Example
Grant drafting assistance provided by GrantsLLM (Navsariwala, 2026),
developed by Evionex. Available at https://huggingface.co/your-username/GrantsLLM
🛠️ Technical Specifications
Model Architecture
- Architecture: Qwen3 (Decoder-only Transformer)
- Parameters: ~4 billion
- Layers: [Inherited from base model]
- Hidden Size: [Inherited from base model]
- Attention Heads: [Inherited from base model]
- Vocabulary Size: [Inherited from base model]
- Context Window: [Inherited from base model; Qwen3 supports a 32K-token native context]
Software Stack
- Training: Unsloth, PyTorch, Hugging Face Transformers
- Fine-tuning: LoRA/QLoRA with PEFT
- Environment: Google Colab (GPU)
- Export Formats:
- Hugging Face Transformers checkpoint
- GGUF
Hardware Requirements
Inference:
- Minimum: 8GB VRAM (GPU) or 16GB RAM (CPU with quantization)
- Recommended: 16GB+ VRAM for optimal speed
- CPU inference: Possible but slower; consider GGUF quantized versions
Formats for Different Hardware:
- Full precision: 16GB+ VRAM
- GGUF Q4_K_M: 4-8GB VRAM or CPU
- GGUF Q8_0: 8-12GB VRAM
📦 Model Variants
| Variant | Size | Use Case | Hardware |
|---|---|---|---|
| Full precision (FP16) | ~8GB | Maximum quality | 16GB+ VRAM |
| GGUF Q8_0 | ~4.3GB | Balanced quality/speed | 8GB+ VRAM or CPU |
| GGUF Q4_K_M | ~2.5GB | Fast inference | 4GB+ VRAM or CPU |
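The per-variant sizes in the table can be sanity-checked with a simple bytes-per-parameter estimate. The bit-widths below are nominal averages (real GGUF files mix precisions and add metadata, so actual sizes differ somewhat):

```python
def approx_size_gb(n_params, bits_per_param):
    """Approximate on-disk size of a model at a given quantization width."""
    return n_params * bits_per_param / 8 / 1e9

n = 4e9  # ~4B parameters
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{approx_size_gb(n, bits):.1f} GB")
```

The same estimate roughly doubles for activation and KV-cache overhead when sizing VRAM for inference.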
🤝 Acknowledgments
Built With
- Base Model: Qwen3-4B by the Qwen team (via Unsloth)
- Training Framework: Unsloth for efficient fine-tuning
- ML Libraries: PyTorch, Hugging Face Transformers
- Infrastructure: Google Colab
Special Thanks
- Open-source grant examples from NIH RePORTER and NSF Award Search
- Academic institutions sharing grant templates and examples
- Unsloth team for efficient fine-tuning tools
- Hugging Face for model hosting and inference infrastructure
📞 Contact & Support
Developer: Kedar P. Navsariwala
Organization: Evionex
Website: www.evionex.com
Model Repository: KedarPN/GrantsLLM
Issues & Feedback
- Report bugs or issues in the Discussion tab
- Share use cases and success stories
- Request features or improvements
- Contribute to model evaluation
📌 Disclaimer
GrantsLLM is an assistive tool designed to support the grant writing process. It does not:
- Guarantee grant success or funding approval
- Replace domain expertise or scientific judgment
- Ensure compliance with all funder requirements
- Eliminate the need for human review and verification
Always consult official funder guidelines and domain experts before grant submission.
🔄 Version History
v1.0 (February 2026)
- Initial release
- Trained on 78 STEM grant applications
- Base model: Qwen3-4B-GGUF
- Supports NIH and NSF formats
© 2026 Evionex | Licensed under CC BY 4.0
Made with ❤️ for the research community
This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.