Phi-3 Agricultural Analyst

A domain-specific language model fine-tuned for agricultural analysis tasks. This model provides structured insights on farm operations, investment opportunities, risk assessment, and regional agricultural profiling.

Project Scope

This release focuses on 8 major US agricultural counties as a proof-of-concept:

  • Fresno, CA (Tree nuts, Grapes)
  • Kern, CA (Almonds, Pistachios)
  • Lancaster, PA (Dairy, Corn)
  • Sioux, IA (Corn, Soybeans)
  • Yakima, WA (Apples, Hops)
  • Imperial, CA (Alfalfa, Lettuce)
  • Monterey, CA (Strawberries, Lettuce)
  • Deaf Smith, TX (Cattle, Wheat)

Upcoming releases will include:

  • Full US county coverage (3,000+ counties)
  • Specialized models for crop yield prediction
  • Harvest timing optimization
  • Soil health analysis
  • Weather impact assessment

Performance Benchmarks

Evaluated against base Phi-3 Mini on agricultural analysis tasks:

Content Coverage

Base Model    |████████████████████                    | 50.0%
Fine-Tuned    |█████████████████████████               | 63.3%  (+26.7%)

Structure Quality

Base Model    |██████████████████████                  | 56.7%
Fine-Tuned    |█████████████████████████████████       | 83.3%  (+47.1%)

Inference Speed

Base Model    |████████████████████████████████████████| 101.9s
Fine-Tuned    |████████████████████████                | 62.4s  (38.8% faster)

Summary Table

Metric               Base Model   Fine-Tuned   Improvement
Content Coverage     50.0%        63.3%        +26.7%
Structure Quality    56.7%        83.3%        +47.1%
Avg Inference Time   101.9s       62.4s        38.8% faster
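
The Improvement column reads as a relative change against the base model rather than a percentage-point difference. A minimal sketch of that arithmetic, using the rounded figures from the table (the function names are illustrative and are not part of the model's evaluation code):

```python
def relative_gain(base: float, tuned: float) -> float:
    """Percent increase of `tuned` over `base` (for higher-is-better metrics)."""
    return (tuned - base) / base * 100.0


def time_reduction(base_s: float, tuned_s: float) -> float:
    """Percent reduction in time relative to `base_s` (lower is better)."""
    return (base_s - tuned_s) / base_s * 100.0


# Rounded values copied from the summary table
coverage = relative_gain(50.0, 63.3)
structure = relative_gain(56.7, 83.3)
speedup = time_reduction(101.9, 62.4)

print(f"Content coverage:  +{coverage:.1f}%")
print(f"Structure quality: +{structure:.1f}%")
print(f"Inference time:    {speedup:.1f}% less time")
```

Recomputing from the rounded table values gives roughly 26.6%, 46.9%, and 38.8%. The small gaps from the reported +26.7% and +47.1% would be consistent with the reported figures having been computed from unrounded scores, which is an assumption here.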

Training Loss Curve

Epoch 1: ████████████████████████████████████████ Loss: 1.605 → 0.115
Epoch 2: ████████████                             Loss: 0.115 → 0.020
Epoch 3: ████                                     Loss: 0.020 → 0.019

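Compared with the base model, the fine-tuned model raises content coverage from 50.0% to 63.3% and structure quality from 56.7% to 83.3%, while cutting average inference time from 101.9s to 62.4s. Training loss flattens after the second epoch (0.020 → 0.019 in epoch 3).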

Training Configuration

Parameter             Value
Base Model            microsoft/Phi-3-mini-4k-instruct
Method                LoRA (Low-Rank Adaptation)
LoRA Rank             64
LoRA Alpha            128
Target Modules        q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Training Examples     900
Validation Examples   100
Epochs                3
Batch Size            16
Learning Rate         2e-4
Precision             BF16
Training Time         ~51 minutes
Final Eval Loss       0.0186
Token Accuracy        99.3%
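
A couple of derived quantities follow from the configuration above. The sketch below assumes that the listed batch size is the effective batch size, that the last partial batch of each epoch is kept, and that the standard LoRA scaling of alpha divided by rank applies (no rank-stabilized variant). None of these assumptions is stated in the table, and the variable names are illustrative:

```python
import math

# Values copied from the training configuration
train_examples = 900
batch_size = 16
epochs = 3
lora_rank = 64
lora_alpha = 128

# Standard LoRA scales the low-rank update by alpha / rank
lora_scaling = lora_alpha / lora_rank

# Optimizer steps, assuming the final partial batch is not dropped
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"LoRA scaling factor: {lora_scaling}")
print(f"Steps per epoch:     {steps_per_epoch}")
print(f"Total steps:         {total_steps}")
```

Under these assumptions the run is short (under 200 optimizer steps), which fits the reported training time of about 51 minutes only loosely, since step time depends on sequence length and hardware.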

Hardware: NVIDIA DGX Spark (Blackwell GPU, 128GB RAM)

Use Cases

  • Farm investment analysis
  • Regional agricultural profiling
  • Risk factor identification
  • Technology adoption recommendations
  • Irrigation and water sustainability assessment
  • Crop expansion planning

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

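# Load the tokenizer and the base model the adapter was trained on
model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    attn_implementation="eager",  # eager attention, per the Technical Notes
)
# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "sarathi-balakrishnan/phi3-agricultural-analyst")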

# Create prompt
prompt = """You are an expert agricultural analyst. Analyze the following query using the provided data context.

### Query
What are the key investment opportunities in Fresno County agriculture?

### Data Context
County: Fresno, CA
Operators: 5,847
Average Farm Size: 423 acres
Irrigation Coverage: 89%
Revenue per Acre: $2,340

### Analysis"""

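# Generate: unpack the tokenizer output so the attention mask is passed too
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=500,
    temperature=0.7,
    do_sample=True,
)
# The decoded text contains the prompt followed by the generated analysis
print(tokenizer.decode(outputs[0], skip_special_tokens=True))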
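
The prompt in the example above follows a fixed template: an instruction line, a Query section, a Data Context section of "key: value" lines, and a trailing Analysis header for the model to continue from. A small helper along these lines could build it for other queries. The name build_prompt is hypothetical and is not part of the released model or any library:

```python
def build_prompt(query: str, context: dict[str, str]) -> str:
    """Format a query and its data context using the template from the usage example."""
    context_lines = "\n".join(f"{key}: {value}" for key, value in context.items())
    return (
        "You are an expert agricultural analyst. "
        "Analyze the following query using the provided data context.\n\n"
        f"### Query\n{query}\n\n"
        f"### Data Context\n{context_lines}\n\n"
        "### Analysis"
    )


# Rebuild the Fresno County prompt from the usage example
prompt = build_prompt(
    "What are the key investment opportunities in Fresno County agriculture?",
    {
        "County": "Fresno, CA",
        "Operators": "5,847",
        "Average Farm Size": "423 acres",
        "Irrigation Coverage": "89%",
        "Revenue per Acre": "$2,340",
    },
)
print(prompt)
```

Keeping the section headers identical to the ones used in the example is likely to matter, since the adapter was presumably trained on prompts in this shape.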

Output Format

The model generates structured analysis with:

  • Summary of the agricultural region
  • Key insights based on data patterns
  • Investment/growth opportunities
  • Risk factors and constraints
  • Confidence level
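
One rough way to check whether a response contains the listed components is a keyword scan. The sketch below is only an illustration: the keyword choices, the function name, and the placeholder text are all assumptions, and it is not the procedure behind the Structure Quality benchmark, which this card does not describe.

```python
# Keywords loosely matching the five components listed above (assumed, not official)
EXPECTED_SECTIONS = {
    "summary": ("summary",),
    "insights": ("insight",),
    "opportunities": ("opportunit",),
    "risks": ("risk",),
    "confidence": ("confidence",),
}


def section_coverage(text: str) -> dict[str, bool]:
    """Report which expected components are mentioned anywhere in `text`."""
    lowered = text.lower()
    return {
        name: any(keyword in lowered for keyword in keywords)
        for name, keywords in EXPECTED_SECTIONS.items()
    }


# Placeholder text for illustration only; this is not real model output
sample = "Summary: ...\nKey Insights: ...\nRisk Factors: ..."
found = section_coverage(sample)
fraction = sum(found.values()) / len(found)
print(found)
print(f"Components present: {fraction:.0%}")
```

A substring match like this will also fire on incidental mentions (for example, the word "risk" inside a sentence), so it is better suited to spotting missing components than to confirming a well-formed analysis.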

Limitations

  • Currently covers 8 US counties (expansion planned)
  • Training data is synthetically generated
  • Should not replace professional agricultural consulting
  • Performance may vary for regions outside training distribution

Technical Notes

  • Uses eager attention for Phi-3 compatibility
  • Optimized for BF16 inference on modern GPUs
  • LoRA adapters can be merged for deployment: model.merge_and_unload()
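
The merge mentioned in the last note can be sketched as follows. This assumes the transformers and peft packages are installed and that the machine has enough memory to hold the BF16 weights. The function name and the output directory are hypothetical. Imports are placed inside the function so that defining it does not require the heavy dependencies.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> str:
    """Merge LoRA weights into the base model and save a standalone checkpoint."""
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        attn_implementation="eager",
    )
    # merge_and_unload() folds the adapter into the base weights and
    # returns a plain transformers model without the PEFT wrapper
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside so the directory loads on its own
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir


# Example call (downloads both models, so it is left commented out):
# merge_adapter(
#     "microsoft/Phi-3-mini-4k-instruct",
#     "sarathi-balakrishnan/phi3-agricultural-analyst",
#     "phi3-agricultural-analyst-merged",  # hypothetical output path
# )
```

After merging, the saved directory can be loaded with AutoModelForCausalLM.from_pretrained alone, without peft at inference time.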

License

MIT License - Free for commercial and research use.

Contact

For questions about this model or collaboration on agricultural AI:
