Model Card for CropSeek-LLM

CropSeek-LLM is a domain-specific agricultural language model developed by DARJYO and fine-tuned on the DARJYO/sawotiQ29_crop_optimization dataset. It is designed to support agricultural reasoning, advisory systems, and applied AI research in crop science and agritech, covering crop planting, soil conditions, pest control, irrigation, and other agricultural practices.

The model is intended for research, educational, and commercial use under an attribution and citation-required licence.

Model Details

Model Description

CropSeek-LLM is trained and adapted for agricultural contexts including:

  • crop advisory reasoning
  • soil and environmental interpretation
  • agricultural decision support
  • farm-level AI assistance systems
  • domain-specific Q&A and inference

The model is designed for deployment in both offline-first environments and cloud-based agritech systems, to assist farmers, agronomists, and researchers in making informed decisions about crop management.

  • Developed by: persadian, DARJYO
  • Model type: Causal Language Model
  • Training Method: Parameter-efficient fine-tuning using LoRA (Low-Rank Adaptation)
  • Language(s) (NLP): English
  • License: DARJYO License v1.3
  • Base model: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • Fine-tuning approach: Domain-specific adaptation for agricultural reasoning and crop intelligence tasks
  • Training Hardware: Nvidia Tesla T4 GPU

Model Lineage

CropSeek-LLM is built by fine-tuning a distilled DeepSeek R1 Qwen-based architecture using LoRA-based adaptation techniques. The resulting model is optimised for agricultural domain inference and applied agritech reasoning tasks.
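
For readers unfamiliar with LoRA, the standard formulation of the technique is reproduced below. The rank r, the scaling factor α, and the layers that were adapted are not stated in this card, so the symbols are generic rather than the values used for CropSeek-LLM.

```latex
W' = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

The base weights W_0 stay frozen and only A and B are trained, so each adapted matrix contributes r(d + k) trainable parameters instead of d·k.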

The base model retains general reasoning capabilities while CropSeek-LLM introduces domain specialisation in soil, crop, and environmental analysis contexts.

Uses

Intended Use

Permitted use includes:

  • research and academic experimentation
  • fine-tuning for downstream agricultural applications
  • integration into decision-support systems
  • commercial deployment with attribution

Direct Use

CropSeek-LLM can be used directly to answer questions related to crop optimization, such as:

  • Optimal planting seasons for specific crops.
  • Ideal soil conditions for crop growth.
  • Natural pest control methods.
  • Best irrigation practices.
  • Crop rotation strategies.

Downstream Use

CropSeek-LLM can be integrated into agricultural advisory systems, mobile apps, or chatbots to provide real-time recommendations to farmers and agronomists.
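
A minimal sketch of such an integration is shown below. It assumes the host application supplies a `generate` callable that wraps the model; `build_prompt`, `make_advisor`, and `echo_generate` are hypothetical names introduced for illustration, and the prompt layout is an assumption, not a format documented for CropSeek-LLM.

```python
from typing import Callable


def build_prompt(question: str, crop: str = "", region: str = "") -> str:
    """Prefix a question with optional crop and region context."""
    context = []
    if crop:
        context.append(f"Crop: {crop}")
    if region:
        context.append(f"Region: {region}")
    context.append(f"Question: {question.strip()}")
    return "\n".join(context)


def make_advisor(generate: Callable[[str], str]) -> Callable[..., str]:
    """Wrap any text-generation callable behind a small advisory interface."""
    def advise(question: str, crop: str = "", region: str = "") -> str:
        if not question.strip():
            raise ValueError("question must not be empty")
        return generate(build_prompt(question, crop, region)).strip()
    return advise


# Stand-in generator so the sketch runs without downloading the model.
def echo_generate(prompt: str) -> str:
    return f"[model output for]\n{prompt}"


advisor = make_advisor(echo_generate)
reply = advisor("When should I plant?", crop="cabbage", region="Durban")
print(reply)
```

Keeping the model behind a plain callable lets the same wrapper serve a chatbot, a mobile backend, or an offline deployment by swapping only the `generate` function.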

Out-of-Scope Use

  • Medical Advice: This model is not designed to provide medical or health-related advice.
  • Financial Decisions: The model should not be used for financial or investment decisions.
  • Non-Agricultural Use: The model is specifically fine-tuned for crop optimization and may not perform well in unrelated domains.

Bias, Risks, and Limitations

  • Data Bias: The model is trained on a dataset focused on specific crops and regions. It may not generalize well to all crops or geographical areas.
  • Limited Scope: The model is designed for crop optimization and may not provide accurate answers for unrelated topics.
  • Ethical Concerns: The model should not replace professional advice from agronomists or agricultural experts.

Recommendations

Users should:

  • Verify the model's recommendations with local agricultural experts.
  • Be aware of the model's limitations and use it as a supplementary tool, not a replacement for professional advice.
  • Report any biases or inaccuracies to the developers for improvement.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model
model = AutoModelForCausalLM.from_pretrained("persadian/CropSeek-LLM", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("persadian/CropSeek-LLM")

# Example inference
input_text = "What is the best planting season for cabbages in South Coast, Durban?"
# Move the inputs to wherever device_map="auto" placed the model,
# so the example also runs on machines without a CUDA device
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Usage

You can also interact with CropSeek-LLM through DARJYO's Hugging Face Space.

Training Details

Training Data

The model was fine-tuned on a curated dataset of agricultural texts, including:

  • Crop descriptions and classifications.
  • Plant disease symptoms and treatments.
  • Farming techniques and best practices.
  • Regional agricultural guidelines.

Specific dataset used: DARJYO/sawotiQ29_crop_optimization

Training Procedure

Preprocessing

  • The dataset was cleaned and preprocessed to remove irrelevant information and ensure consistency.
  • Text data was tokenized using the tokenizer associated with the base model.
  • Data augmentation techniques, such as synonym replacement and paraphrasing, were applied to improve generalization.
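
Synonym replacement, one of the augmentation techniques named above, can be sketched as follows. The synonym table and the replacement probability are hypothetical; the card does not describe the vocabulary or tooling that was actually used.

```python
import random

# Hypothetical synonym table; the card does not list the actual vocabulary.
SYNONYMS = {
    "planting": ["sowing"],
    "pests": ["insects"],
    "watering": ["irrigation"],
}


def synonym_replace(text, synonyms, prob=0.5, rng=None):
    """Replace each word that has a synonym with probability `prob`."""
    rng = rng or random.Random()
    out = []
    for word in text.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


sample = "Best planting time to avoid pests"
print(synonym_replace(sample, SYNONYMS, prob=1.0, rng=random.Random(0)))
```

This version splits on whitespace only, so words with attached punctuation are left unchanged; a production pipeline would need proper tokenization.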

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Batch size: 16
  • Learning rate: 2e-5
  • Epochs: 3
  • Optimizer: AdamW
  • Weight decay: 0.01
  • Warmup steps: 500
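
For anyone reproducing the run, the values above could be collected as keyword arguments, for example as in the sketch below. The keys follow keyword names accepted by `transformers.TrainingArguments`; treating "Batch size: 16" as a per-device batch size is an assumption. The LoRA settings (rank, alpha, target modules) are not stated in this card and are therefore left out.

```python
# Keys follow keyword names accepted by transformers.TrainingArguments.
# Mapping "Batch size: 16" to a per-device batch size is an assumption.
training_kwargs = {
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
    "weight_decay": 0.01,
    "warmup_steps": 500,
    "fp16": True,            # mixed-precision training regime
    "optim": "adamw_torch",  # AdamW optimizer
}

for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

With Transformers installed, the dictionary can be unpacked as `TrainingArguments(output_dir="out", **training_kwargs)`, where `"out"` is a placeholder directory name.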

Speeds, Sizes, Times

  • Training time: Approximately 10 hours on a T4 GPU.
  • Checkpoint size: 1.5 GB
  • Throughput: 120 samples/second

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on a held-out test set of agricultural queries, including crop identification, disease diagnosis, and farming recommendations.

Dataset: https://huggingface.co/datasets/DARJYO/sawotiQ29_crop_optimization

Factors

Evaluation was disaggregated by:

  • Crop type (cereals, fruits, vegetables).
  • Disease type (fungal, bacterial, viral).
  • Geographic region (tropical, temperate).
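
Disaggregated evaluation of this kind amounts to computing a metric separately for each group. The sketch below shows the idea for accuracy; the records are hypothetical and are not results from CropSeek-LLM.

```python
from collections import defaultdict

# Hypothetical evaluation records: (group label, was the prediction correct?)
records = [
    ("cereals", True),
    ("cereals", False),
    ("fruits", True),
    ("vegetables", True),
    ("vegetables", True),
]


def accuracy_by_group(rows):
    """Return {group: fraction correct} for (group, correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, correct in rows:
        totals[group] += 1
        hits[group] += int(correct)
    return {group: hits[group] / totals[group] for group in totals}


print(accuracy_by_group(records))
```

The same function works for disease type or geographic region by changing the group label attached to each record.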

Metrics

  • Accuracy: 92% on crop identification tasks.
  • Precision/Recall/F1-score: Precision: 0.89, Recall: 0.91, F1-score: 0.90
  • Latency: Average response time of 0.5 seconds on a T4 GPU.
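
The reported F1-score can be checked against the reported precision and recall, since F1 is their harmonic mean. A short sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1 = f1_score(0.89, 0.91)
print(round(f1, 2))  # 0.9
```

The result rounds to 0.90, which agrees with the value listed above.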

Results

  • The model achieved high accuracy on crop identification and disease diagnosis tasks.
  • Performance was slightly lower for region-specific recommendations due to limited training data for certain regions.

Summary

CropSeek-LLM performs well on a wide range of agricultural tasks, making it a useful tool for farmers and agricultural professionals. However, performance may vary for rare crops or region-specific practices.

Model Examination

The model was examined using interpretability tools such as attention visualization and feature importance analysis. Key findings include:

  • The model relies heavily on symptom descriptions for disease diagnosis.
  • Crop-specific keywords play a significant role in crop identification tasks.

Environmental Impact

Estimated carbon emissions for the training run:

  • Hardware Type: T4 GPU
  • Hours used: 10 hours
  • Cloud Provider: Google Colab
  • Compute Region: us-central1
  • Carbon Emitted: Approximately 0.5 kg CO2eq
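
The card does not say how the emissions figure was derived. A common back-of-the-envelope approach multiplies GPU energy use by the carbon intensity of the grid. The sketch below applies it under two assumptions: the GPU draws its full 70 W board power for the whole run, and overheads such as CPU, cooling, and data-centre efficiency are ignored. The grid intensity is not given in the card, so the code only derives the value that the stated 0.5 kg would imply.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy drawn at a constant power level, in kilowatt-hours."""
    return power_watts / 1000.0 * hours


def emissions_kg(energy: float, kg_co2eq_per_kwh: float) -> float:
    """Emissions for a given energy use and grid carbon intensity."""
    return energy * kg_co2eq_per_kwh


T4_BOARD_POWER_WATTS = 70  # assumed constant draw; an upper bound in practice
HOURS = 10                 # training time reported in this card

energy = energy_kwh(T4_BOARD_POWER_WATTS, HOURS)
implied_intensity = 0.5 / energy  # intensity that would yield the stated 0.5 kg

print(f"energy: {energy:.2f} kWh")
print(f"implied intensity: {implied_intensity:.2f} kg CO2eq/kWh")
```

Under these assumptions the run uses about 0.7 kWh, so the stated estimate corresponds to roughly 0.71 kg CO2eq per kWh.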

Technical Specifications

Model Architecture and Objective

  • Base model architecture: deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
  • Objective: Fine-tuned for text generation and classification tasks in the agricultural domain.

Compute Infrastructure

Hardware

  • Training hardware: Google Colab with T4 GPU.

Software

  • Frameworks: PyTorch, Hugging Face Transformers.
  • Libraries: Datasets, Tokenizers, Accelerate.

Attribution Requirement

All use of this model must include clear attribution to the organization DARJYO and to the author/developer Darshani Persadh (~persadian).

Required attribution formats:

  • “Built by Darshani Persadh(~persadian) with DARJYO Technology”
  • “Based on persadian/CropSeek-LLM by DARJYO”
  • “Built with persadian/CropSeek-LLM by DARJYO”
  • “Powered by DARJYO Agri Technology”

Attribution must appear in at least one of:

  • documentation
  • model cards
  • research papers
  • product/system descriptions

Citation Requirement (Mandatory for Research Use)

If this model is used in:

  • academic research
  • benchmarking
  • publications
  • technical reporting
  • evaluations

you must cite:

Citation

@misc{persadian/cropseek-llm,
 author = {Persadh, Darshani R. and {DARJYO}},
 title = {CropSeek-LLM: Agricultural Domain Language Model},
 year = {2025},
 url = { https://huggingface.co/persadian/CropSeek-LLM },
 doi = { 10.57967/hf/5849 },
 publisher = { Hugging Face }
}

APA: Persadh, D. [persadian]. (2025). CropSeek-LLM: Agricultural Domain Language Model. Hugging Face. https://huggingface.co/persadian/CropSeek-LLM

Glossary

  • Mixed precision: Training using both 16-bit and 32-bit floating-point numbers to improve efficiency.
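
The sketch below illustrates why mixed-precision training keeps some values in 32-bit form: a small update can vanish entirely when stored in 16 bits. It uses the standard-library `struct` module to round values to half precision. The update magnitude is chosen to match the learning rate listed in this card and is illustrative only, since real updates also depend on the gradient.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


weight = 1.0
update = 2e-5  # same magnitude as the learning rate listed in this card

print(to_fp16(weight + update) == weight)  # True: the update is lost in fp16
print(weight + update == weight)           # False: a wider format keeps it
```

Near 1.0, adjacent half-precision values are about 0.001 apart, so an update of 2e-5 rounds away. This is why the weights are typically kept in a higher-precision copy while the bulk of the arithmetic runs in 16 bits.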

Limitations

This model may produce outputs that require validation in real-world agricultural contexts. It should not be used as a sole decision-maker for high-risk agricultural, financial, or regulatory decisions.

More Information

This model is a fine-tuned derivative and should not be interpreted as an independently trained foundation model. For more details, visit the CropSeek-LLM space on Hugging Face.

License

This model is released under the DARJYO License v1.3.

See the full license here: https://huggingface.co/persadian/CropSeek-LLM/raw/main/LICENSE

Model Card Authors

  • persadian ~Darshani Persadh

Model Card Contact
