---
license: cc-by-4.0
language:
  - en
library_name: transformers
tags:
  - grant-writing
  - research
  - STEM
  - biotech
  - fine-tuned
  - Qwen
  - text-generation
  - academic-writing
  - proposal-writing
base_model:
  - unsloth/Qwen3-4B-GGUF
datasets:
  - custom
pipeline_tag: text-generation
widget:
  - text: >-
      Write a Specific Aims section for an NIH R03 grant on developing
      CRISPR-based therapeutics for rare genetic disorders. Include 2 aims.
    example_title: Generate Specific Aims
  - text: >-
      Draft a Significance and Innovation section for an NSF grant on machine
      learning applications in protein structure prediction.
    example_title: Generate Significance
  - text: >-
      Review the following grant aims and provide feedback: Aim 1: Develop a
      novel CRISPR delivery system. Aim 2: Test efficacy in animal models.
    example_title: Review Grant Section
model-index:
  - name: GrantsLLM
    results: []
---

GrantsLLM

License: CC BY 4.0 · Base model: unsloth/Qwen3-4B-GGUF

A specialized language model for STEM research grant writing and review

Developed by Evionex | Created by Kedar P. Navsariwala


Model Description

GrantsLLM is a domain-specialized language model fine-tuned on 78 STEM research grant applications to assist researchers in drafting, refining, and reviewing grant proposals. Built on Qwen3 4B, this model has been trained to understand the structure, terminology, and writing style of successful research grants across NIH, NSF, and similar funding mechanisms.

  • Developed by: Kedar P. Navsariwala, CTO & Co-Founder at Evionex
  • Model type: Causal Language Model (Decoder-only Transformer)
  • Language(s): English
  • License: CC BY 4.0 (requires attribution)
  • Finetuned from: unsloth/Qwen3-4B-GGUF

🎯 Use Cases

What GrantsLLM Can Do

  • βœ… Generate complete grant proposals (NIH R03/R01/R21, NSF, etc.)
  • βœ… Draft specific sections: Specific Aims, Significance, Innovation, Approach, Research Strategy
  • βœ… Improve existing text for clarity, structure, and persuasiveness
  • βœ… Provide review feedback on grant coherence and alignment
  • βœ… Expand bullet points into full narrative sections
  • βœ… Adapt tone to academic/scientific writing standards

Intended Users

  • Principal Investigators (PIs) and research scientists
  • Postdoctoral researchers and graduate students
  • University grant support offices
  • Biotech and research startups
  • Academic research administrators

Out of Scope

  • ❌ Automated funding decisions or grant scoring
  • ❌ Legal, regulatory, or IRB compliance review
  • ❌ Generating fabricated data or citations
  • ❌ Non-STEM grants (humanities, arts, social sciences may have reduced quality)
  • ❌ Non-English grant applications

πŸš€ Quick Start

Installation

pip install transformers torch accelerate

Basic Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "KedarPN/GrantsLLM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# Generate grant text
prompt = """Write a Specific Aims section for an NIH R03 grant on developing novel CRISPR-based gene editing tools for treating sickle cell disease. Include 2-3 specific aims with clear objectives and expected outcomes."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Using with Pipeline

from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="KedarPN/GrantsLLM",
    device_map="auto"
)

prompt = "Draft a Research Significance statement for a computational biology grant on protein folding prediction using deep learning."

output = generator(
    prompt,
    max_new_tokens=400,
    temperature=0.7,
    top_p=0.9
)

print(output[0]['generated_text'])  # pipeline returns a list of dicts

Prompt Templates

For Section Generation:

Write a [Section] for a [Funder] [Mechanism] grant on [Topic].
Requirements: [Specific elements needed]
Word limit: [Number] words

For Review/Feedback:

Review the following [Section] and provide feedback on clarity, structure, and alignment with [Funder] guidelines:

[Paste text here]

Examples:

  • "Write Specific Aims for an NIH R01 grant on cancer immunotherapy"
  • "Draft Innovation section for NSF CAREER award on quantum computing"
  • "Review this Research Strategy for logical flow and hypothesis clarity"
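For scripted use, the section-generation template above can be filled in with a small helper; a minimal sketch (the function name and defaults are illustrative, not part of the model or its tooling):

```python
def build_generation_prompt(section, funder, mechanism, topic,
                            requirements, word_limit):
    """Fill the section-generation prompt template from this README."""
    return (
        f"Write a {section} for a {funder} {mechanism} grant on {topic}.\n"
        f"Requirements: {requirements}\n"
        f"Word limit: {word_limit} words"
    )

prompt = build_generation_prompt(
    section="Specific Aims",
    funder="NIH",
    mechanism="R03",
    topic="CRISPR-based therapeutics for rare genetic disorders",
    requirements="2 aims with clear objectives and expected outcomes",
    word_limit=500,
)
print(prompt)
```

Filling every slot matters in practice: as noted under Known Limitations, underspecified prompts are the main trigger for repetitive output.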

πŸ“Š Training Data

Dataset Composition

  • Size: 78 research grant applications
  • Domains: Biotechnology, Molecular Biology, Computational Biology, Chemistry, Biomedical Sciences
  • Formats: NIH (R01, R03, R21), NSF, and similar federal/institutional grant formats
  • Sources: Publicly available grant examples, institutional repositories, and NIH RePORTER
  • Language: English

Data Processing

Stage 1: Continued Pretraining (CPT)

  • Raw grant text extracted and cleaned from PDFs/documents
  • Structured into single-column text format (JSONL/Parquet)
  • Preserves section structure and domain terminology
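Stage 1 can be sketched as a cleaning step that normalizes extracted PDF text and emits one single-column JSONL record per grant. The specific cleaning rules below (dropping blank lines and bare page numbers, collapsing whitespace) are assumptions for illustration, not the exact pipeline used:

```python
import json
import re

def clean_grant_text(raw: str) -> str:
    """Normalize extracted PDF text: collapse whitespace, drop blank
    lines and bare page numbers (assumed cleaning rules)."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line.strip())
        if not line or re.fullmatch(r"\d+", line):
            continue
        lines.append(line)
    return " ".join(lines)

def to_jsonl_record(raw: str) -> str:
    """One grant -> one single-column {"text": ...} JSONL record for CPT."""
    return json.dumps({"text": clean_grant_text(raw)})

raw = "Specific Aims\n\n12\nAim 1: Develop a novel   delivery system."
print(to_jsonl_record(raw))
```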

Stage 2: Supervised Fine-Tuning (SFT)

  • Chat-style instruction pairs using ChatML template
  • Tasks include: section generation, expansion, refinement, review
  • Format: {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
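A Stage 2 training record in the `{"messages": [...]}` format above can be built like this (the example instruction pair is illustrative):

```python
import json

def make_sft_record(user_prompt: str, assistant_reply: str) -> str:
    """One chat-style instruction pair in the SFT messages format."""
    return json.dumps({
        "messages": [
            {"role": "user", "content": user_prompt},
            {"role": "assistant", "content": assistant_reply},
        ]
    })

record = make_sft_record(
    "Expand these bullet points into a Significance paragraph: ...",
    "This project addresses a critical gap in ...",
)
print(record)
```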

πŸ”§ Training Procedure

Training Hyperparameters

  • Base Model: unsloth/Qwen3-4B-GGUF (~4B parameters)
  • Training Framework: Unsloth + PyTorch
  • Hardware: Google Colab (single GPU, T4/V100)
  • Fine-tuning Method: LoRA/QLoRA (Parameter-Efficient Fine-Tuning)
  • Training Stages:
    1. Continued Pretraining on grant corpus
    2. Supervised Instruction Fine-Tuning on QnA pairs
  • Optimizer: AdamW
  • Learning Rate: Kept low to prevent catastrophic forgetting
  • Training monitored for: Overfitting, repetition, coherence

Training Details

Training Type: Parameter-efficient fine-tuning (LoRA adapters)
Epochs: [Adjusted based on validation performance]
Batch Size: Optimized for a 4B model on a single GPU
Context Length: Inherited from the Qwen3-4B base model (32K tokens native)
Loss Function: Causal Language Modeling (CLM) loss
Validation Strategy: Qualitative evaluation on held-out grant examples

πŸ“ˆ Performance & Evaluation

Evaluation Methodology

Qualitative Assessment:

  • Human expert review of generated grant sections
  • Evaluation criteria: coherence, structure, domain accuracy, persuasiveness
  • Practical testing on mock NIH/NSF grant prompts

Known Strengths

  • βœ… Strong grasp of STEM grant structure (Aims, Significance, Innovation, Approach)
  • βœ… Effective expansion of bullet points to narrative
  • βœ… Appropriate academic/scientific tone
  • βœ… Good understanding of NIH/NSF terminology and conventions
  • βœ… Maintains logical flow between sections

Known Limitations

  • ⚠️ Hallucination Risk: May generate plausible but incorrect citations, grant numbers, or policies
  • ⚠️ Format Bias: Optimized for NIH/NSF; other formats (European, private foundations) may be weaker
  • ⚠️ Domain Bias: Best for biotech/life sciences; physics/engineering grants may be less polished
  • ⚠️ Repetition: Can produce repetitive text if prompt lacks detail or structure
  • ⚠️ Context Limits: Long grants may need to be drafted in sections
  • ⚠️ Recency: Training data may not reflect latest funder guidelines (post-2025)

⚠️ Bias, Risks, and Limitations

Bias Sources

Domain Bias: Model is optimized for STEM fields represented in training data (biotech, molecular biology, computational biology). Grants in underrepresented fields may receive lower quality outputs.

Institutional Bias: Writing style may reflect patterns from R1 research universities and well-funded institutions present in training examples.

Funding Mechanism Bias: Strongest performance on NIH R-series and NSF standard grants; less reliable for fellowships, training grants, or international formats.

Historical Bias: May reinforce language patterns from historically funded research areas, potentially disadvantaging emerging or interdisciplinary fields.

Risks

Fabrication: Model may generate convincing but false information including:

  • Non-existent citations and references
  • Incorrect grant mechanism details
  • Fabricated preliminary data or results
  • Inaccurate funder policies

Over-reliance: Users may trust outputs without verification, risking submission of flawed proposals.

Privacy: Users may inadvertently input confidential research ideas or unpublished data.

Recommendations

  1. Always verify: Check all factual claims, citations, and funder guidelines
  2. Human review required: Never submit AI-generated grants without expert review
  3. Iterative refinement: Use as drafting assistant, not final author
  4. Protect IP: Don't input confidential or proprietary information
  5. Disclose usage: Be transparent with collaborators and (when appropriate) funders about AI assistance
  6. Update manually: Cross-reference current funder guidelines and requirements

πŸ” Ethical Considerations

Responsible Use

  • Transparency: Disclose AI assistance to co-authors and collaborators
  • Human oversight: Keep domain experts in the loop for all submissions
  • Academic integrity: Ensure outputs align with your institution's policies on AI use
  • Verification: Validate all scientific claims and citations independently
  • Privacy: Avoid inputting sensitive, unpublished, or identifiable information

Funder Policies

As of February 2026, grant-writing AI policies vary by funder:

  • NIH: Generally permits AI assistance for writing, but PIs remain responsible for all content
  • NSF: Similar stance; emphasizes researcher accountability
  • Check specific RFAs for any AI-related restrictions or disclosure requirements

When in doubt: Contact your program officer or sponsored research office.


πŸ“œ Licensing & Attribution

License: CC BY 4.0

This model is licensed under Creative Commons Attribution 4.0 International.

You Must:

βœ… Give appropriate credit to Evionex and Kedar P. Navsariwala
βœ… Provide a link to the license
βœ… Indicate if changes were made to the model
βœ… Retain attribution in any derivative works or applications

Citation

If you use GrantsLLM in your research or projects, please cite:

@software{grantsllm2026,
  author = {Navsariwala, Kedar P.},
  title = {GrantsLLM: A Fine-Tuned Language Model for STEM Grant Writing},
  year = {2026},
  publisher = {Hugging Face},
  organization = {Evionex},
  howpublished = {\url{https://huggingface.co/KedarPN/GrantsLLM}},
  license = {CC-BY-4.0}
}

Attribution Example

Grant drafting assistance provided by GrantsLLM (Navsariwala, 2026),
developed by Evionex. Available at https://huggingface.co/KedarPN/GrantsLLM

πŸ› οΈ Technical Specifications

Model Architecture

  • Architecture: Qwen3 (Decoder-only Transformer)
  • Parameters: ~4 billion
  • Layers: [Inherited from base model]
  • Hidden Size: [Inherited from base model]
  • Attention Heads: [Inherited from base model]
  • Vocabulary Size: [Inherited from base model]
  • Context Window: Inherited from base model (32K tokens native)

Software Stack

  • Training: Unsloth, PyTorch, Hugging Face Transformers
  • Fine-tuning: LoRA/QLoRA with PEFT
  • Environment: Google Colab (GPU)
  • Export Formats:
    • Hugging Face Transformers checkpoint
    • GGUF

Hardware Requirements

Inference:

  • Minimum: 8GB VRAM (GPU) or 16GB RAM (CPU with quantization)
  • Recommended: 16GB+ VRAM for optimal speed
  • CPU inference: Possible but slower; consider GGUF quantized versions

Formats for Different Hardware:

  • Full precision: 16GB+ VRAM
  • GGUF Q4_K_M: 4-8GB VRAM or CPU
  • GGUF Q8_0: 8-12GB VRAM
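The figures above follow from parameter count times bytes per weight, plus overhead for activations and KV cache; a rough rule-of-thumb calculator (the ~20% overhead factor and the bits-per-weight values for the GGUF quants are approximations, not measured numbers):

```python
def est_model_size_gb(n_params: float, bits_per_weight: float,
                      overhead: float = 1.2) -> float:
    """Approximate runtime memory: params * bits/8, plus ~20% overhead
    for activations and KV cache (assumed factor)."""
    return n_params * bits_per_weight / 8 / 1e9 * overhead

# ~4B parameters at different precisions (GGUF bpw values are approximate)
for name, bits in [("FP16", 16), ("Q8_0", 8.5), ("Q4_K_M", 4.85)]:
    print(f"{name}: ~{est_model_size_gb(4e9, bits):.1f} GB")
```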

πŸ“¦ Model Variants

| Variant | Size | Use Case | Hardware |
|---|---|---|---|
| Full precision (FP16) | ~8GB | Maximum quality | 16GB+ VRAM |
| GGUF Q8_0 | ~4.3GB | Balanced quality/speed | 8GB+ VRAM or CPU |
| GGUF Q4_K_M | ~2.5GB | Fast inference | 4GB+ VRAM or CPU |

🀝 Acknowledgments

Built With

  • Base Model: Qwen3 4B by the Qwen team
  • Training Framework: Unsloth for efficient fine-tuning
  • ML Libraries: PyTorch, Hugging Face Transformers
  • Infrastructure: Google Colab

Special Thanks

  • Open-source grant examples from NIH RePORTER and NSF Award Search
  • Academic institutions sharing grant templates and examples
  • Unsloth team for efficient fine-tuning tools
  • Hugging Face for model hosting and inference infrastructure

πŸ“ž Contact & Support

Developer: Kedar P. Navsariwala
Organization: Evionex
Website: www.evionex.com
Model Repository: KedarPN/GrantsLLM

Issues & Feedback

  • Report bugs or issues in the Discussion tab
  • Share use cases and success stories
  • Request features or improvements
  • Contribute to model evaluation

πŸ“Œ Disclaimer

GrantsLLM is an assistive tool designed to support the grant writing process. It does not:

  • Guarantee grant success or funding approval
  • Replace domain expertise or scientific judgment
  • Ensure compliance with all funder requirements
  • Eliminate the need for human review and verification

Always consult official funder guidelines and domain experts before grant submission.


πŸ”„ Version History

v1.0 (February 2026)

  • Initial release
  • Trained on 78 STEM grant applications
  • Base model: Qwen3-4B-GGUF
  • Supports NIH and NSF formats

Β© 2026 Evionex | Licensed under CC BY 4.0

Made with ❀️ for the research community


This Qwen3 model was trained 2x faster with Unsloth and Hugging Face's TRL library.