---
license: cc-by-4.0
language:
  - en
library_name: transformers
tags:
  - grant-writing
  - research
  - STEM
  - biotech
  - fine-tuned
  - Qwen
  - text-generation
  - academic-writing
  - proposal-writing
base_model:
  - unsloth/Qwen3-4B-GGUF
datasets:
  - custom
pipeline_tag: text-generation
widget:
  - text: >-
      Write a Specific Aims section for an NIH R03 grant on developing
      CRISPR-based therapeutics for rare genetic disorders. Include 2 aims.
    example_title: Generate Specific Aims
  - text: >-
      Draft a Significance and Innovation section for an NSF grant on machine
      learning applications in protein structure prediction.
    example_title: Generate Significance
  - text: >-
      Review the following grant aims and provide feedback: Aim 1: Develop a
      novel CRISPR delivery system. Aim 2: Test efficacy in animal models.
    example_title: Review Grant Section
model-index:
  - name: GrantsLLM
    results: []
---

GrantsLLM

License: CC BY 4.0 · Base model: unsloth/Qwen3-4B-GGUF

A specialized language model for STEM research grant writing and review

Developed by Evionex | Created by Kedar P. Navsariwala


Model Description

GrantsLLM is a domain-specialized language model fine-tuned on 78 STEM research grant applications to assist researchers in drafting, refining, and reviewing grant proposals. Built on Qwen3 4B, this model has been trained to understand the structure, terminology, and writing style of successful research grants across NIH, NSF, and similar funding mechanisms.

  • Developed by: Kedar P. Navsariwala, CTO & Co-Founder at Evionex
  • Model type: Causal Language Model (Decoder-only Transformer)
  • Language(s): English
  • License: CC BY 4.0 (requires attribution)
  • Finetuned from: unsloth/Qwen3-4B-GGUF

🎯 Use Cases

What GrantsLLM Can Do

  • βœ… Generate complete grant proposals (NIH R03/R01/R21, NSF, etc.)
  • βœ… Draft specific sections: Specific Aims, Significance, Innovation, Approach, Research Strategy
  • βœ… Improve existing text for clarity, structure, and persuasiveness
  • βœ… Provide review feedback on grant coherence and alignment
  • βœ… Expand bullet points into full narrative sections
  • βœ… Adapt tone to academic/scientific writing standards

Intended Users

  • Principal Investigators (PIs) and research scientists
  • Postdoctoral researchers and graduate students
  • University grant support offices
  • Biotech and research startups
  • Academic research administrators

Out of Scope

  • ❌ Automated funding decisions or grant scoring
  • ❌ Legal, regulatory, or IRB compliance review
  • ❌ Generating fabricated data or citations
  • ❌ Non-STEM grants (humanities, arts, social sciences may have reduced quality)
  • ❌ Non-English grant applications

πŸš€ Quick Start

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "KedarPN/GrantsLLM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Generate grant text
prompt = """Write a Specific Aims section for an NIH R03 grant on developing novel CRISPR-based gene editing tools for treating sickle cell disease. Include 2-3 specific aims with clear objectives and expected outcomes."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with Pipeline

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="KedarPN/GrantsLLM",
    device_map="auto"
)

prompt = "Draft a Research Significance statement for a computational biology grant on protein folding prediction using deep learning."

output = generator(
    prompt,
    max_new_tokens=400,
    temperature=0.7,
    top_p=0.9
)

print(output[0]['generated_text'])  # pipeline returns a list of dicts

Prompt Templates

For Section Generation:

Write a [Section] for a [Funder] [Mechanism] grant on [Topic].
Requirements: [Specific elements needed]
Word limit: [Number] words

For Review/Feedback:

Review the following [Section] and provide feedback on clarity, structure, and alignment with [Funder] guidelines:

[Paste text here]

Examples:

  • "Write Specific Aims for an NIH R01 grant on cancer immunotherapy"
  • "Draft Innovation section for NSF CAREER award on quantum computing"
  • "Review this Research Strategy for logical flow and hypothesis clarity"
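For scripted use, the section-generation template above can be filled in with a small helper; a minimal sketch (the function name and defaults are illustrative, not part of the model or its tooling):

```python
def build_generation_prompt(section, funder, mechanism, topic,
                            requirements, word_limit):
    """Fill the section-generation prompt template from this README."""
    return (
        f"Write a {section} for a {funder} {mechanism} grant on {topic}.\n"
        f"Requirements: {requirements}\n"
        f"Word limit: {word_limit} words"
    )

prompt = build_generation_prompt(
    section="Specific Aims",
    funder="NIH",
    mechanism="R03",
    topic="CRISPR-based therapeutics for rare genetic disorders",
    requirements="2 aims with clear objectives and expected outcomes",
    word_limit=500,
)
print(prompt)
```

Filling every slot matters in practice: as noted under Known Limitations, underspecified prompts are the main trigger for repetitive output.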

πŸ“Š Training Data

Dataset Composition

  • Size: 78 research grant applications
  • Domains: Biotechnology, Molecular Biology, Computational Biology, Chemistry, Biomedical Sciences
  • Formats: NIH (R01, R03, R21), NSF, and similar federal/institutional grant formats
  • Sources: Publicly available grant examples, institutional repositories, and NIH RePORTER
  • Language: English

Data Processing

Stage 1: Continued Pretraining (CPT)

  • Raw grant text extracted and cleaned from PDFs/documents
  • Structured into single-column text format (JSONL/Parquet)
  • Preserves section structure and domain terminology
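Stage 1 can be sketched as a cleaning step that normalizes extracted PDF text and emits one single-column JSONL record per grant. The specific cleaning rules below (dropping blank lines and bare page numbers, collapsing whitespace) are assumptions for illustration, not the exact pipeline used:

```python
import json
import re

def clean_grant_text(raw: str) -> str:
    """Normalize extracted PDF text: collapse whitespace, drop blank
    lines and bare page numbers (assumed cleaning rules)."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line.strip())
        if not line or re.fullmatch(r"\d+", line):
            continue
        lines.append(line)
    return " ".join(lines)

def to_jsonl_record(raw: str) -> str:
    """One grant -> one single-column {"text": ...} JSONL record for CPT."""
    return json.dumps({"text": clean_grant_text(raw)})

raw = "Specific Aims\n\n12\nAim 1: Develop a novel   delivery system."
print(to_jsonl_record(raw))
```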

Stage 2: Supervised Fine-Tuning (SFT)

  • Chat-style instruction pairs using ChatML template
  • Tasks include: section generation, expansion, refinement, review
  • Format: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
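A Stage 2 training record in the `{"messages": [...]}` format above can be built like this (the example instruction pair is illustrative):

```python
import json

def make_sft_record(user_prompt: str, assistant_reply: str) -> str:
    """One chat-style instruction pair in the SFT messages format."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ]
    })

record = make_sft_record(
    "Expand these bullet points into a Significance paragraph: ...",
    "This project addresses a critical gap in ...",
)
print(record)
```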

πŸ”§ Training Procedure

Training Hyperparameters

  • Base Model: unsloth/Qwen3-4B-GGUF (~4B parameters)
  • Training Framework: Unsloth + PyTorch
  • Hardware: Google Colab (single GPU, T4/V100)
  • Fine-tuning Method: LoRA/QLoRA (Parameter-Efficient Fine-Tuning)
  • Training Stages:
    1. Continued Pretraining on grant corpus
    2. Supervised Instruction Fine-Tuning on QnA pairs
  • Optimizer: AdamW
  • Learning Rate: Kept low to prevent catastrophic forgetting
  • Training monitored for: Overfitting, repetition, coherence

Training Details

Training Type: Parameter-efficient fine-tuning (LoRA adapters)
Epochs: [Adjusted based on validation performance]
Batch Size: Optimized for a 4B model on a single GPU
Context Length: Inherited from the Qwen3-4B base model (32K tokens native)
Loss Function: Causal Language Modeling (CLM) loss
Validation Strategy: Qualitative evaluation on held-out grant examples

πŸ“ˆ Performance & Evaluation

Evaluation Methodology

Qualitative Assessment:

  • Human expert review of generated grant sections
  • Evaluation criteria: coherence, structure, domain accuracy, persuasiveness
  • Practical testing on mock NIH/NSF grant prompts

Known Strengths

  • βœ… Strong grasp of STEM grant structure (Aims, Significance, Innovation, Approach)
  • βœ… Effective expansion of bullet points to narrative
  • βœ… Appropriate academic/scientific tone
  • βœ… Good understanding of NIH/NSF terminology and conventions
  • βœ… Maintains logical flow between sections

Known Limitations

  • ⚠️ Hallucination Risk: May generate plausible but incorrect citations, grant numbers, or policies
  • ⚠️ Format Bias: Optimized for NIH/NSF; other formats (European, private foundations) may be weaker
  • ⚠️ Domain Bias: Best for biotech/life sciences; physics/engineering grants may be less polished
  • ⚠️ Repetition: Can produce repetitive text if prompt lacks detail or structure
  • ⚠️ Context Limits: Long grants may need to be drafted in sections
  • ⚠️ Recency: Training data may not reflect latest funder guidelines (post-2025)

⚠️ Bias, Risks, and Limitations

Bias Sources

Domain Bias: Model is optimized for STEM fields represented in training data (biotech, molecular biology, computational biology). Grants in underrepresented fields may receive lower quality outputs.

Institutional Bias: Writing style may reflect patterns from R1 research universities and well-funded institutions present in training examples.

Funding Mechanism Bias: Strongest performance on NIH R-series and NSF standard grants; less reliable for fellowships, training grants, or international formats.

Historical Bias: May reinforce language patterns from historically funded research areas, potentially disadvantaging emerging or interdisciplinary fields.

Risks

Fabrication: Model may generate convincing but false information including:

  • Non-existent citations and references
  • Incorrect grant mechanism details
  • Fabricated preliminary data or results
  • Inaccurate funder policies

Over-reliance: Users may trust outputs without verification, risking submission of flawed proposals.

Privacy: Users may inadvertently input confidential research ideas or unpublished data.

Recommendations

  1. Always verify: Check all factual claims, citations, and funder guidelines
  2. Human review required: Never submit AI-generated grants without expert review
  3. Iterative refinement: Use as drafting assistant, not final author
  4. Protect IP: Don't input confidential or proprietary information
  5. Disclose usage: Be transparent with collaborators and (when appropriate) funders about AI assistance
  6. Update manually: Cross-reference current funder guidelines and requirements

πŸ” Ethical Considerations

Responsible Use

  • Transparency: Disclose AI assistance to co-authors and collaborators
  • Human oversight: Keep domain experts in the loop for all submissions
  • Academic integrity: Ensure outputs align with your institution's policies on AI use
  • Verification: Validate all scientific claims and citations independently
  • Privacy: Avoid inputting sensitive, unpublished, or identifiable information

Funder Policies

As of February 2026, grant-writing AI policies vary by funder:

  • NIH: Generally permits AI assistance for writing, but PIs remain responsible for all content
  • NSF: Similar stance; emphasizes researcher accountability
  • Check specific RFAs for any AI-related restrictions or disclosure requirements

When in doubt: Contact your program officer or sponsored research office.


πŸ“œ Licensing & Attribution

License: CC BY 4.0

This model is licensed under Creative Commons Attribution 4.0 International.

You Must:

βœ… Give appropriate credit to Evionex and Kedar P. Navsariwala
βœ… Provide a link to the license
βœ… Indicate if changes were made to the model
βœ… Retain attribution in any derivative works or applications

Citation

If you use GrantsLLM in your research or projects, please cite:

@software{grantsllm2026,
  author = {Navsariwala, Kedar P.},
  title = {GrantsLLM: A Fine-Tuned Language Model for STEM Grant Writing},
  year = {2026},
  publisher = {Hugging Face},
  organization = {Evionex},
  howpublished = {\url{https://huggingface.co/KedarPN/GrantsLLM}},
  license = {CC-BY-4.0}
}

Attribution Example

Grant drafting assistance provided by GrantsLLM (Navsariwala, 2026),
developed by Evionex. Available at https://huggingface.co/KedarPN/GrantsLLM

πŸ› οΈ Technical Specifications

Model Architecture

  • Architecture: Qwen3 (Decoder-only Transformer)
  • Parameters: ~4 billion
  • Layers: [Inherited from base model]
  • Hidden Size: [Inherited from base model]
  • Attention Heads: [Inherited from base model]
  • Vocabulary Size: [Inherited from base model]
  • Context Window: Inherited from base model (32K tokens native)

Software Stack

  • Training: Unsloth, PyTorch, Hugging Face Transformers
  • Fine-tuning: LoRA/QLoRA with PEFT
  • Environment: Google Colab (GPU)
  • Export Formats:
    • Hugging Face Transformers checkpoint
    • GGUF

Hardware Requirements

Inference:

  • Minimum: 8GB VRAM (GPU) or 16GB RAM (CPU with quantization)
  • Recommended: 16GB+ VRAM for optimal speed
  • CPU inference: Possible but slower; consider GGUF quantized versions

Formats for Different Hardware:

  • Full precision: 16GB+ VRAM
  • GGUF Q4_K_M: 4-8GB VRAM or CPU
  • GGUF Q8_0: 8-12GB VRAM
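The figures above follow from parameter count times bytes per weight, plus overhead for activations and KV cache; a rough rule-of-thumb calculator (the ~20% overhead factor and the bits-per-weight values for the GGUF quants are approximations, not measured numbers):

```python
def est_model_size_gb(n_params: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Approximate runtime memory: params * bits/8, plus ~20% overhead
    for activations and KV cache (assumed factor)."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

# ~4B parameters at different precisions (GGUF bpw values are approximate)
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{est_model_size_gb(4e9, bits):.1f} GB")
```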

πŸ“¦ Model Variants

| Variant | Size | Use Case | Hardware |
|---|---|---|---|
| Full precision (FP16) | ~8GB | Maximum quality | 16GB+ VRAM |
| GGUF Q8_0 | ~4.3GB | Balanced quality/speed | 8GB+ VRAM or CPU |
| GGUF Q4_K_M | ~2.5GB | Fast inference | 4GB+ VRAM or CPU |

🀝 Acknowledgments

Built With

  • Base Model: Qwen3 4B by the Qwen team
  • Training Framework: Unsloth for efficient fine-tuning
  • ML Libraries: PyTorch, Hugging Face Transformers
  • Infrastructure: Google Colab

Special Thanks

  • Open-source grant examples from NIH RePORTER and NSF Award Search
  • Academic institutions sharing grant templates and examples
  • Unsloth team for efficient fine-tuning tools
  • Hugging Face for model hosting and inference infrastructure

πŸ“ž Contact & Support

Developer: Kedar P. Navsariwala
Organization: Evionex
Website: www.evionex.com
Model Repository: KedarPN/GrantsLLM

Issues & Feedback

  • Report bugs or issues in the Discussion tab
  • Share use cases and success stories
  • Request features or improvements
  • Contribute to model evaluation

πŸ“Œ Disclaimer

GrantsLLM is an assistive tool designed to support the grant writing process. It does not:

  • Guarantee grant success or funding approval
  • Replace domain expertise or scientific judgment
  • Ensure compliance with all funder requirements
  • Eliminate the need for human review and verification

Always consult official funder guidelines and domain experts before grant submission.


πŸ”„ Version History

v1.0 (February 2026)

  • Initial release
  • Trained on 78 STEM grant applications
  • Base model: Qwen3-4B-GGUF
  • Supports NIH and NSF formats

Β© 2026 Evionex | Licensed under CC BY 4.0

Made with ❀️ for the research community


This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.