AventIQ-AI/Crop_Recommendation

Safetensors

distilbert

KshitizTayal committed on Jun 2, 2025

Commit 464aca0 (verified)

1 Parent(s): 93c4765

Update README.md

README.md +149 -3

@@ -1,3 +1,149 @@
----
-license: apache-2.0
----

+# DistilBERT Model for Crop Recommendation Based on Environmental Parameters
+This repository contains a fine-tuned DistilBERT model trained for crop recommendation using structured agricultural data. By converting numerical environmental features into text format, the model leverages transformer-based NLP techniques to classify the most suitable crop type.
+## 🌾 Problem Statement
+The goal is to recommend the most suitable crop to cultivate from soil nutrient and weather parameters. Traditional ML models treat this as a tabular classification problem. Here, we explore an alternative: applying an NLP model (DistilBERT) to tabular data serialized as text.
+---
+## 📊 Dataset
+- **Source:** Crop Recommendation Dataset
+- **Features:**
+  - N: Nitrogen content in soil
+  - P: Phosphorus content in soil
+  - K: Potassium content in soil
+  - Temperature: Temperature in °C
+  - Humidity: Humidity in %
+  - pH: pH (acidity) of soil
+  - Rainfall: Rainfall in mm
+- **Target:** Crop label (22 crop types)
+The dataset is preprocessed by concatenating all numeric features into a single space-separated string, making it suitable for transformer-based tokenization.
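The serialization step described above can be sketched as follows. This is a minimal illustration, not the author's preprocessing code: the helper `serialize_row`, the `FEATURE_ORDER` list, and the dictionary keys are hypothetical names, and the field order is assumed from the feature list and the sample input used elsewhere in this README.

```python
# Hypothetical column names; the order is assumed from the feature list (N, P, K, ...).
FEATURE_ORDER = ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]


def serialize_row(row):
    """Join the seven feature values into one space-separated string."""
    return " ".join(str(row[name]) for name in FEATURE_ORDER)


sample = {
    "N": 90,
    "P": 42,
    "K": 43,
    "temperature": 20.879744,
    "humidity": 82.002744,
    "ph": 6.502985,
    "rainfall": 202.935536,
}

print(serialize_row(sample))
# 90 42 43 20.879744 82.002744 6.502985 202.935536
```

Keeping the field order fixed matters here, because the model only sees token positions and has no column names to tell the values apart.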
+---
+## 🧠 Model Details
+- **Architecture:** DistilBERT
+- **Tokenizer:** `DistilBertTokenizerFast`
+- **Model:** `DistilBertForSequenceClassification`
+- **Task Type:** Multi-Class Classification (22 classes)
+---
+## 🔧 Installation
+```bash
+pip install transformers datasets pandas scikit-learn torch
+```
+---
+## Loading the Model
+```python
+from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification
+import torch
+# Load model and tokenizer
+model_path = "path/to/your/saved_model"
+tokenizer = DistilBertTokenizerFast.from_pretrained(model_path)
+model = DistilBertForSequenceClassification.from_pretrained(model_path)
+# Sample input
+sample_text = "90 42 43 20.879744 82.002744 6.502985 202.935536"
+inputs = tokenizer(sample_text, return_tensors="pt")
+# Predict
+with torch.no_grad():
+    outputs = model(**inputs)
+predicted_class = torch.argmax(outputs.logits, dim=1).item()
+print("Predicted class index:", predicted_class)
+```
+---
+## 📈 Performance Metrics
+*Note: These are placeholders; replace them with actual results after evaluation. The values shown are close to the chance level for 22 classes (1/22 ≈ 0.045).*
+- **Accuracy:** 0.0477
+- **Precision:** 0.0023
+- **Recall:** 0.0477
+- **F1 Score:** 0.0043
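As a sanity check on how numbers like these can arise, the sketch below computes accuracy and support-weighted precision, recall, and F1 in plain Python for a classifier that always predicts one class. The 440-sample split with 21 samples of the predicted class is a made-up example, and the source does not say which averaging mode produced the figures above, so this is one possible reading rather than a diagnosis.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # weight each class by its share of the true labels
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical test split: 440 samples over 22 classes, 21 of them in class 0.
y_true = [0] * 21 + [1 + i % 21 for i in range(419)]
# A degenerate classifier that predicts class 0 for every input.
y_pred = [0] * len(y_true)

results = weighted_metrics(y_true, y_pred)
print([round(v, 4) for v in results])
# [0.0477, 0.0023, 0.0477, 0.0043]
```

If a real evaluation shows the same pattern, inspecting the distribution of predicted classes is a reasonable first step.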
+---
+## 🏋️ Fine-Tuning Details
+### 📚 Dataset
+The dataset is sourced from the publicly available **Crop Recommendation Dataset**. It consists of structured features such as:
+- Nitrogen (N)
+- Phosphorus (P)
+- Potassium (K)
+- Temperature (°C)
+- Humidity (%)
+- pH
+- Rainfall (mm)
+All numerical features were converted into a single textual input string to be used with the DistilBERT tokenizer. Labels were factorized into class indices for training.
+The dataset was split using an 80/20 ratio for training and testing.
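The label factorization and 80/20 split can be sketched in plain Python as below. This is an illustrative stand-in, not the training script: `factorize` and `train_test_split_80_20` are hypothetical helpers, the crop names are example labels, and the seed and shuffling strategy are assumptions, since the source does not say how the split was drawn.

```python
import random


def factorize(labels):
    """Map each distinct label to an integer index in order of first appearance."""
    index = {}
    codes = []
    for label in labels:
        if label not in index:
            index[label] = len(index)
        codes.append(index[label])
    return codes, list(index)


def train_test_split_80_20(items, seed=42):
    """Shuffle with a fixed seed, then cut at 80% (seed value is an assumption)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


# Example labels for illustration only.
labels = ["rice", "maize", "rice", "coffee", "maize"]
codes, classes = factorize(labels)
print(codes)    # [0, 1, 0, 2, 1]
print(classes)  # ['rice', 'maize', 'coffee']

train, test = train_test_split_80_20(range(10))
print(len(train), len(test))  # 8 2
```

Saving the `classes` list alongside the model makes it possible to map a predicted class index back to a crop name at inference time.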
+---
+### 🔧 Training Configuration
+- **Epochs:** 3
+- **Batch size:** 8
+- **Learning rate:** 2e-5
+- **Evaluation strategy:** `epoch`
+- **Model Base:** DistilBERT (`distilbert-base-uncased`)
+- **Framework:** Hugging Face Transformers + PyTorch
+---
+## 🔄 Quantization
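+Post-training quantization was applied by casting the model weights to half precision (FP16) with PyTorch’s `half()`.
+Relative to FP32, this halves the storage per parameter and can speed up inference on hardware with native FP16 support. The effect on accuracy is typically small, but it should be verified against the FP32 model.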
+The quantized model can be loaded with:
+```python
+import torch  # needed for torch.float16
+model = DistilBertForSequenceClassification.from_pretrained("quantized_model_fp16", torch_dtype=torch.float16)
+```
+---
+## Repository Structure
+```text
+.
+├── quantized-model/               # Contains the quantized model files
+│   ├── config.json
+│   ├── model.safetensors
+│   ├── tokenizer_config.json
+│   ├── vocab.txt
+│   └── special_tokens_map.json
+└── README.md                      # Model documentation
+```
+---
+## Limitations
+- The model is trained specifically for 22-class crop classification on the seven features of the Crop Recommendation Dataset.
+- FP16 quantization may result in slight numerical instability in edge cases.
+- Performance may degrade on inputs that fall outside the value ranges or distribution of the training data.
+---
+## Contributing
+Feel free to open issues or submit pull requests to improve the model or documentation.