---
library_name: agri-awwer
tags:
- asr
- evaluation
- agriculture
- metrics
arxiv: "2602.03868"
language:
- hi
- te
- or
---
# Agri AWWER — Agriculture-Weighted Word Error Rate Evaluation Toolkit
A lightweight Python toolkit for evaluating Automatic Speech Recognition (ASR) systems in agricultural domains. It provides the **Agriculture-Weighted Word Error Rate (AWWER)** metric alongside the standard metrics: word error rate (WER), character error rate (CER), and match error rate (MER).
AWWER penalises errors on domain-critical agricultural terms more heavily than errors on general vocabulary, giving a more realistic picture of how well an ASR system serves agricultural applications.
## Installation
```bash
# From HuggingFace (recommended)
pip install git+https://huggingface.co/DigiGreen/Agri_AWWER_Toolkit
# For improved WER/CER/MER via jiwer
pip install "agri-awwer[jiwer]"
```
**Zero required dependencies** — the toolkit works out of the box with only the Python standard library. `jiwer` is optional and used automatically when available for standard metrics.
## Quick Start
### AWWER — Domain-Weighted Error Rate
```python
from agri_awwer import calculate_awwer
# Define domain word weights (1-4 scale)
weights = {
    "gehun": 4,   # wheat (core agriculture term)
    "keet": 4,    # pest
    "mitti": 3,   # soil
    "barish": 3,  # rain
    "gaon": 1,    # village (general vocabulary)
}
reference = "gehun mein keet laga hai"
hypothesis = "gaon mein keet laga hai"
awwer = calculate_awwer(reference, hypothesis, weights)
print(f"AWWER: {awwer:.3f}")
# gehun→gaon is a weight-4 error, so AWWER > standard WER
```
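
This README does not spell out the AWWER formula, so the following is only a sketch of one plausible formulation, not the toolkit's implementation. It assumes that each substituted or deleted reference word contributes its weight, that each inserted word contributes the weight of the inserted word, and that the total is divided by the summed weight of the reference words. The function names `sketch_align` and `sketch_weighted_error_rate` are hypothetical.

```python
# Illustrative sketch of a weighted word error rate (assumed formula, see lead-in).

def sketch_align(ref_words, hyp_words):
    """Levenshtein alignment; returns a list of (op, ref_word, hyp_word)."""
    n, m = len(ref_words), len(hyp_words)
    # dp[i][j] = edit distance between ref_words[:i] and hyp_words[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,
                           dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1)
    # Walk back from the bottom-right corner to recover the operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and ref_words[i - 1] == hyp_words[j - 1]
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (0 if same else 1):
            ops.append(("equal" if same else "substitute",
                        ref_words[i - 1], hyp_words[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("delete", ref_words[i - 1], None))
            i -= 1
        else:
            ops.append(("insert", None, hyp_words[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def sketch_weighted_error_rate(reference, hypothesis, weights, default_weight=1):
    """Weighted error mass divided by total reference weight (assumed formula)."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    total = sum(weights.get(w, default_weight) for w in ref_words)
    if total == 0:
        raise ValueError("reference must contain at least one word")
    penalty = 0
    for op, ref_word, hyp_word in sketch_align(ref_words, hyp_words):
        if op in ("substitute", "delete"):
            penalty += weights.get(ref_word, default_weight)
        elif op == "insert":
            penalty += weights.get(hyp_word, default_weight)
    return penalty / total


weights = {"gehun": 4, "keet": 4, "mitti": 3, "barish": 3, "gaon": 1}
score = sketch_weighted_error_rate(
    "gehun mein keet laga hai", "gaon mein keet laga hai", weights)
print(f"{score:.3f}")  # 4 / (4 + 1 + 4 + 1 + 1) = 4 / 11
```

Under these assumptions the single substitution costs 4 out of a total reference weight of 11, compared with 1 error out of 5 words for unweighted WER, which is why the weighted score comes out higher for this pair.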
### Standard Metrics
```python
from agri_awwer import calculate_wer, calculate_cer, calculate_mer
ref = "gehun mein keet laga hai"
hyp = "gaon mein keet laga hai"
print(f"WER: {calculate_wer(ref, hyp):.3f}")
print(f"CER: {calculate_cer(ref, hyp):.3f}")
print(f"MER: {calculate_mer(ref, hyp):.3f}")
```
### Detailed AWWER Breakdown
```python
from agri_awwer import calculate_awwer_components
# Reuses reference, hypothesis, and weights from the AWWER example above
result = calculate_awwer_components(reference, hypothesis, weights)
print(f"AWWER: {result['awwer']:.3f}")
print(f"Substitutions: {result['n_substitutions']}")
print(f"Deletions: {result['n_deletions']}")
print(f"Insertions: {result['n_insertions']}")
print(f"High-weight errors: {result['high_weight_errors']}")
```
### Parse Weights from JSON
```python
import json
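from agri_awwer import calculate_awwer_from_string
# Weights serialised as a JSON list of [word, weight] pairs
weights_json = json.dumps([["gehun", 4], ["keet", 4], ["mitti", 3]])
# Reuses ref and hyp from the Standard Metrics example above
awwer = calculate_awwer_from_string(ref, hyp, weights_json)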
```
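
For readers who want to inspect or validate such a JSON string before passing it on, the sketch below converts the list-of-pairs layout into a plain dictionary. It is an illustration under assumptions, not the toolkit's parser: `sketch_parse_weights` is a hypothetical helper, and the range check simply mirrors the 1-4 scale described in this README.

```python
import json


def sketch_parse_weights(weights_json):
    """Turn a JSON list of [word, weight] pairs into a {word: weight} dict."""
    weights = {}
    for word, weight in json.loads(weights_json):
        weight = int(weight)
        if not 1 <= weight <= 4:
            raise ValueError(f"weight for {word!r} is outside the 1-4 scale: {weight}")
        weights[str(word)] = weight
    return weights


parsed = sketch_parse_weights(json.dumps([["gehun", 4], ["keet", 4], ["mitti", 3]]))
print(parsed)
```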
## Weight Scale
| Weight | Category | Examples |
|--------|----------|----------|
| **4** | Core agriculture terms | Crop names, pests, farming practices |
| **3** | Strongly agriculture-related | Soil types, weather, planting seasons |
| **2** | Indirectly related | Quantities, measurement units, locations |
| **1** | General vocabulary | Default for words not in the lexicon |
## Language Support
Built-in text normalization for:
- **Hindi** (default): candrabindu, visarga, and nukta removal
- **Telugu**: candrabindu and visarga removal
- **Odia**: candrabindu, visarga, nukta, and isshar removal
Pass the `language` parameter to any metric function:
```python
calculate_awwer(ref, hyp, weights, language="telugu")
calculate_wer(ref, hyp, language="odia")
```
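
To make the Hindi normalization concrete, here is a minimal sketch of stripping the three listed signs from Devanagari text with the standard library. It is not the toolkit's normalizer: `sketch_normalize_hindi` is a hypothetical name, and the choice of code points (U+0901 candrabindu, U+0903 visarga, U+093C nukta) plus the whitespace collapsing are assumptions about what such a step would do.

```python
import unicodedata

# Devanagari signs to drop: candrabindu, visarga, nukta (assumed set, see lead-in).
_HINDI_STRIP = {0x0901: None, 0x0903: None, 0x093C: None}


def sketch_normalize_hindi(text):
    """Remove candrabindu, visarga, and nukta, then collapse whitespace."""
    # NFD splits precomposed nukta letters (e.g. U+0958) into base + nukta,
    # so the nukta can be removed by the translation table.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = decomposed.translate(_HINDI_STRIP)
    return " ".join(unicodedata.normalize("NFC", stripped).split())


print(sketch_normalize_hindi("\u0939\u0901\u0938"))  # ha + candrabindu + sa
```

Normalizing reference and hypothesis the same way before scoring keeps spelling variants that differ only in these signs from being counted as word errors.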
## Related Resources
- **Paper**: [Benchmarking Automatic Speech Recognition for Indian Languages in Agricultural Contexts](https://arxiv.org/abs/2602.03868)
- **Dataset**: [Agri STT Benchmarking Dataset](https://huggingface.co/datasets/DigiGreen/Agri_STT_Benchmarking_Dataset) — 10,864 audio-transcript pairs across Hindi, Telugu, and Odia
## Citation
```bibtex
@misc{digigreen2025awwer,
title = {Agri {AWWER}: Agriculture-Weighted Word Error Rate Evaluation Toolkit},
author = {{Digital Green}},
year = {2025},
url = {https://huggingface.co/DigiGreen/Agri_AWWER_Toolkit},
}
```
## License
Apache 2.0