---
license: apache-2.0
tags:
- text-generation-inference
- transformers
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.3
pipeline_tag: text-generation
datasets:
- KurmaAI/AQUA-Test-Dataset
---

<p align="center">
  <img src="./AQUA-7B.png" alt="AQUA-7B" width="600" style="border-radius: 6px;"/>
</p>

# Model Information

**AQUA-7B** is Kurma AI’s flagship 7-billion-parameter large language model, built exclusively for the global aquaculture industry. It is the **first large language model for aquaculture**, fine-tuned to deliver actionable insights for species-specific farming, hatchery operations, water quality control, and disease management.

Trained on **over 3 million** real and synthetic aquaculture conversations (~1B tokens), AQUA-7B brings the power of domain-specific AI to fish farms, fish hatcheries, researchers, and Aqua-Tech innovators worldwide.

Learn more about [Kurma AI](https://kurma.ai/company).

---

# Key Features

- **Production Systems & Species Management**: Covers ponds, tanks, cages, RAS, aquaponics, mariculture, and longlines. Delivers best practices for raising tilapia, catfish, carp, salmon, shrimp, crabs, oysters, trout, sea bass, and more, supporting both smallholder and industrial farms.
- **Genetics, Hatchery, and Early Life Stage Management**: Guides advanced breeding, gene editing, hatchery design, spawning, larval care, nursery systems, live feed, transport, egg incubation, and biosecurity.
- **Nutrition, Feeding, and Growth Optimization**
- **Water Quality, Health, and Disease Management**: Provides actionable protocols for water quality (temperature, oxygen, pH, ammonia, nitrite, salinity), and structured disease management: identification, vaccination, biosecurity, antibiotic use, and outbreak response.
- **Sustainable Aquaculture & Innovation**: Promotes eco-friendly practices in waste management, environmental impact, biodiversity, and climate adaptation, and guides adoption of new technologies such as AI, automation, sensors, water drones, and modern farm management.
- **Business, Post-Harvest Handling, and Food Safety**: Advises on market trends, business planning, regulation, certification, traceability, and insurance. Covers best practices for harvesting, processing, cold chain, grading, packaging, contamination prevention, HACCP, and food safety.
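
As an illustration of the water quality use case listed above, the following minimal sketch formats a set of pond readings into a plain-text question that could be passed to the model. The helper name `build_water_quality_prompt`, the prompt wording, and the sample readings are all hypothetical and are not taken from the AQUA-7B training format.

```python
def build_water_quality_prompt(species: str, readings: dict) -> str:
    """Format water quality readings into a question (hypothetical helper)."""
    # One bullet per parameter, in the order the caller supplied them.
    lines = [f"- {name}: {value}" for name, value in readings.items()]
    return (
        f"I am farming {species}. My latest water quality readings are:\n"
        + "\n".join(lines)
        + "\nAre these values within a safe range, and what should I adjust?"
    )


# Illustrative readings only; replace them with measurements from your own farm.
sample_readings = {
    "temperature (C)": 28,
    "dissolved oxygen (mg/L)": 4.5,
    "pH": 7.8,
    "ammonia (mg/L)": 0.3,
}
prompt = build_water_quality_prompt("tilapia", sample_readings)
print(prompt)
```

The resulting string can be used in place of the test prompt in the Quickstart section.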

---

# Training Data Highlights

- Extension worker–farmer dialogues and field advisory logs  
- FAO, ICAR, NOAA, and peer-reviewed aquaculture research  
- Synthetic Q&A from 5,000+ aquaculture-focused topics  
- Climate-resilient practices, hatchery SOPs, and water quality datasets  
- Carefully curated to support **species-specific culture** methods
- **Scale:** Trained on approximately **3 million real and synthetic Q&A pairs**, totaling around **1 billion tokens** of high-quality, domain-specific data.


---

# Model Specifications

- **Base Model**: Mistral 7B v0.3 (by [Mistral AI](https://mistral.ai/))
- **Training Tokens**: ~1 Billion
- **Released On**: July 4, 2025
- **Data Volume**: 3M+ expert-verified and synthetic instructions   
- **Origin**: Made in America by [Kurma AI](https://kurma.ai/)
- **Training Technique**: Fine-tuned with LoRA-based Supervised Fine-Tuning (SFT).
- **Training Infrastructure**: Trained on a **cluster of 16 NVIDIA H200 GPUs**

Special thanks to [Nebius](https://nebius.com/).
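
To give a sense of why LoRA-based fine-tuning is lightweight, the sketch below counts the trainable parameters LoRA adds to a single weight matrix. It assumes a 4096 × 4096 projection (the hidden size of Mistral 7B) and a rank of 16. The rank is a hypothetical value chosen for illustration, since the actual LoRA configuration used for AQUA-7B is not stated here.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters LoRA adds: A is (rank, d_in) and B is (d_out, rank)."""
    return rank * (d_in + d_out)


HIDDEN_SIZE = 4096   # Mistral 7B hidden size
ASSUMED_RANK = 16    # hypothetical rank, not taken from the model card

full = full_param_count(HIDDEN_SIZE, HIDDEN_SIZE)
lora = lora_param_count(HIDDEN_SIZE, HIDDEN_SIZE, ASSUMED_RANK)
print(f"full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters ({lora / full:.2%} of the full matrix)")
```

Under these assumptions, the adapter for one such matrix trains under 1% of the parameters that full fine-tuning of the same matrix would update.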

---

# Quickstart

Transformers (Google Colab / Jupyter)


- Install dependencies
```python
!pip install transformers accelerate
```

- Log in with your Hugging Face access token
```python
from huggingface_hub import login

login()  # Prompts for your Hugging Face access token
```

- Load the model from Hugging Face
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "KurmaAI/AQUA-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",             # Automatically uses GPU if available
    torch_dtype=torch.float16,     # Use torch.float32 if no GPU
    trust_remote_code=True
)
```
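
The choice between `torch.float16` and `torch.float32` mainly affects memory use. The following back-of-the-envelope sketch estimates the memory needed for the weights alone, assuming a round 7 billion parameters. The helper `weight_memory_gb` is hypothetical, and the estimate ignores activations, the KV cache, and framework overhead, so real usage will be higher.

```python
# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float16": 2, "float32": 4}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes); hypothetical helper."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 7e9  # assumed round figure for a 7B-parameter model

for dtype in ("float16", "float32"):
    print(f"{dtype}: ~{weight_memory_gb(NUM_PARAMS, dtype):.0f} GB for weights")
```

By this estimate, half precision needs roughly 14 GB for the weights and full precision roughly 28 GB, which is worth checking against the available GPU or system memory before loading the model.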

- Test Prompt
```python
prompt = "What are the most common diseases in shrimp farming and how can they be prevented?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
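
For decoder-only models, `generate` returns the prompt tokens followed by the newly generated tokens, so the decoded `response` above begins with the original question. A minimal sketch of decoding only the new tokens is shown here; `decode_new_tokens` is a hypothetical helper name, and the commented usage assumes the `tokenizer`, `inputs`, and `outputs` variables from the test prompt.

```python
def decode_new_tokens(tokenizer, output_ids, prompt_length):
    """Decode only the tokens generated after the prompt (hypothetical helper)."""
    new_ids = output_ids[prompt_length:]  # drop the echoed prompt tokens
    return tokenizer.decode(new_ids, skip_special_tokens=True).strip()


# With the variables from the test prompt above:
# answer = decode_new_tokens(tokenizer, outputs[0], inputs["input_ids"].shape[1])
# print(answer)
```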
---

# 🙏 Acknowledgements
This project was made possible thanks to:
- [Nebius](https://nebius.com/) for providing a compute grant and access to NVIDIA H200 GPU servers, which powered the model training process.
- [Mistral](https://mistral.ai/) for sharing their open-source language models, which made this project possible.
- The Kurma AI research team, including aquaculture experts, machine learning engineers, data annotators, and advisors, who collaborated to curate, verify, and refine the domain-specific dataset used for fine-tuning this model.

---

# ⚠️ Disclaimer, Bias & Limitations

- **Domain Bias**: The model may reflect inherent biases present in the aquaculture data sources and industry practices on which it was trained.
- **Temporal Data Limitation**: Climate and environmental recommendations are based on information available up to 2024. Users should cross-check any climate-related advice against the latest advisories (e.g., IMD or NOAA updates).
- **Potential Hallucinations**: Like all large language models, AQUA-7B may occasionally generate inaccurate or misleading responses ("hallucinations"). **Always validate critical, regulatory, or high-impact decisions with a qualified aquaculture professional.**

# Citation

```bibtex
@article{narisetty2025aqua,
  title={AQUA: A Large Language Model for Aquaculture \& Fisheries},
  author={Narisetty, Praneeth and Kattamanchi, Uday Kumar Reddy and Nimma, Lohit Akshant and Karnati, Sri Ram Kaushik and Kore, Shiva Nagendra Babu and Golamari, Mounika and Nageshreddy, Tejashree},
  journal={arXiv preprint arXiv:2507.20520},
  year={2025},
  doi={10.48550/arXiv.2507.20520}
}
```