---
title: EmissionFactor Mapper
emoji: 🌿
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: AI-powered transaction classifier for carbon accounting
---
# 🌿 Emission Factor Mapper
**Intelligent AI-powered classification system for sustainability transaction data**
Automatically map your financial transactions (like *"hotel booking for conference"*, *"electric vehicle charging"*, or *"office furniture purchase"*) to standardized emission factor categories (Cat1 = primary category, Cat2 = subcategory) used for accurate CO₂ footprint analysis and ESG reporting.
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![HuggingFace](https://img.shields.io/badge/🤗-HuggingFace-yellow)](https://huggingface.co/yassine123Z/EmissionFactor-mapper2-v2)

---
## 🎯 What Does This Do?
This application solves a critical challenge in **carbon accounting**: automatically categorizing thousands of financial transactions into standardized emission categories. Instead of manually reviewing each purchase, expense, or invoice, the AI model:
- ✅ **Classifies** transactions into 11 primary emission categories
- ✅ **Maps** to 82 detailed subcategories for precise carbon calculations
- ✅ **Provides** confidence scores for quality assurance
- ✅ **Enables** batch processing of CSV files with review capabilities
- ✅ **Tracks** manual corrections for continuous model improvement
- ✅ **Compares** different AI models to optimize accuracy

Perfect for **sustainability teams**, **carbon accountants**, **ESG analysts**, and **finance departments** working on Scope 3 emissions reporting.

---
## 🚀 Demo
🟢 **Try the web UI:**
[https://yassine123z-emissionfactor-mapper2-v2-gradio2ui.hf.space/](https://yassine123z-emissionfactor-mapper2-v2-gradio2ui.hf.space/)
### 📱 Four Powerful Modes:
#### 1️⃣ **Single Transaction** - Quick Classification
Enter any transaction description and get instant predictions:
- **Input**: `"Business class flight from London to New York"`
- **Output**:
  - Cat1: `Mobility (passengers)`
  - Cat2: `Air transport`
  - Confidence: `0.94`
#### 2️⃣ **Batch Review** - Process Hundreds at Once
Upload a CSV file with your transactions and:
- ✨ Get automatic classifications for all rows
- 📊 Review results in an interactive table
- ✏️ Edit predictions directly (dropdown menus included)
- 💾 Download corrected dataset
- 📈 Export training data for model retraining
#### 3️⃣ **Corrections History** - Track & Improve
- 📋 View all manual corrections you've made
- 🕐 Timestamp tracking for audit trails
- 📤 Export correction logs for model fine-tuning
- 📊 Analyze patterns in misclassifications
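
The README does not document the correction-log or training-export format. As a minimal sketch, assuming each correction records the original prediction, the corrected labels and a UTC timestamp, the log could be exported as `text,label` rows for few-shot fine-tuning. The `Correction` class, the `export_training_csv` function and the `Cat1 > Cat2` label format are hypothetical, not the app's actual schema.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Correction:
    """One manual edit made in the Batch Review table (hypothetical schema)."""
    transaction: str
    predicted_cat1: str
    predicted_cat2: str
    corrected_cat1: str
    corrected_cat2: str
    # UTC timestamp filled in when the correction is created, for the audit trail
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat(timespec="seconds")
    )


def export_training_csv(corrections: List[Correction]) -> str:
    """Turn corrections into (text, label) rows for fine-tuning.

    Only the corrected labels are exported; joining Cat1 and Cat2 with " > "
    is an assumed label format.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["text", "label", "timestamp"])
    for c in corrections:
        writer.writerow([c.transaction, f"{c.corrected_cat1} > {c.corrected_cat2}", c.timestamp])
    return buffer.getvalue()


# Illustrative correction: a team dinner first predicted as food, relabelled as catering
log = [
    Correction(
        transaction="Team dinner after client workshop",
        predicted_cat1="Food & Beverages",
        predicted_cat2="Prepared meals",
        corrected_cat1="Purchase of services",
        corrected_cat2="Catering",
        timestamp="2024-01-15T10:30:00+00:00",
    )
]
print(export_training_csv(log))
```

Keeping the predicted labels in the log (even though they are not exported) is what makes the "analyze patterns in misclassifications" step possible later.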
#### 4️⃣ **Model Comparison** - A/B Testing
- 🧪 Compare current model vs. any HuggingFace model
- 📉 Side-by-side predictions with match rates
- 🎯 Evaluate performance before deployment
- 🔬 Test on your own dataset
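
The README does not define how match rates are computed. One plausible definition, sketched here under that assumption, is the share of rows on which two models return the same Cat1, and the share on which they return the same (Cat1, Cat2) pair. `match_rates` is a hypothetical helper and the sample predictions are illustrative.

```python
from typing import List, Tuple

Prediction = Tuple[str, str]  # (Cat1, Cat2)


def match_rates(current: List[Prediction], candidate: List[Prediction]) -> dict:
    """Share of rows on which two models agree, at Cat1 level and on the full pair."""
    if len(current) != len(candidate):
        raise ValueError("both models must be run on the same rows")
    if not current:
        return {"cat1": 0.0, "cat1_and_cat2": 0.0}
    n = len(current)
    cat1_hits = sum(a[0] == b[0] for a, b in zip(current, candidate))
    pair_hits = sum(a == b for a, b in zip(current, candidate))
    return {"cat1": cat1_hits / n, "cat1_and_cat2": pair_hits / n}


current = [
    ("Mobility (passengers)", "Train transport"),
    ("Use of electricity", "Standard"),
    ("Purchase of goods", "Electrical equipment"),
    ("Mobility (passengers)", "Accommodation / Events"),
]
candidate = [
    ("Mobility (passengers)", "Train transport"),
    ("Use of electricity", "Renewables"),
    ("Purchase of goods", "Electrical equipment"),
    ("Purchase of services", "Catering"),
]
print(match_rates(current, candidate))  # {'cat1': 0.75, 'cat1_and_cat2': 0.5}
```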
---
## 🧠 API Usage
### Base URL
```
https://yassine123z-emissionfactor-mapper2-v2-gradio.hf.space
```
---
### 🔌 Endpoint 1: Batch Classification
**POST** `/map_categories`
Classify multiple transactions in a single API call.
**Example JSON:**
```json
{
  "transactions": [
    "Train ticket Paris to Berlin",
    "Office lighting electricity",
    "Laptop purchase for employee"
  ]
}
```
**Response:**
```json
{
  "matches": [
    {
      "input_text": "Train ticket Paris to Berlin",
      "best_Cat1": "Mobility (passengers)",
      "best_Cat2": "Train transport",
      "similarity": 0.96
    },
    {
      "input_text": "Office lighting electricity",
      "best_Cat1": "Use of electricity",
      "best_Cat2": "Standard",
      "similarity": 0.89
    },
    {
      "input_text": "Laptop purchase for employee",
      "best_Cat1": "Purchase of goods",
      "best_Cat2": "Electrical equipment",
      "similarity": 0.92
    }
  ]
}
```
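
A minimal Python client for this endpoint, using only the standard library. It assumes the Space accepts the JSON body shown above and returns the `matches` list shown above; `build_request`, `parse_matches` and `map_categories` are illustrative helper names, and the host is the Space given in this section (Hugging Face Spaces can sleep or be renamed, so it is kept configurable).

```python
import json
import urllib.request
from typing import List

# Space host from the API section of this README; override if the Space moves.
BASE_URL = "https://yassine123z-emissionfactor-mapper2-v2-gradio.hf.space"


def build_request(transactions: List[str], base_url: str = BASE_URL) -> urllib.request.Request:
    """Prepare a POST /map_categories request with the documented JSON body."""
    body = json.dumps({"transactions": transactions}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/map_categories",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_matches(payload: dict) -> List[tuple]:
    """Flatten the documented response into (input_text, Cat1, Cat2, similarity) tuples."""
    return [
        (m["input_text"], m["best_Cat1"], m["best_Cat2"], m["similarity"])
        for m in payload.get("matches", [])
    ]


def map_categories(transactions: List[str], timeout: float = 30.0) -> List[tuple]:
    """Send the request and parse the reply (requires network access)."""
    with urllib.request.urlopen(build_request(transactions), timeout=timeout) as resp:
        return parse_matches(json.load(resp))
```

Calling `map_categories(["Train ticket Paris to Berlin"])` performs the HTTP request. Building and parsing are kept as separate functions so they can be checked without network access.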
## 🗂️ Emission Categories
### 📋 Complete Category Structure
The model classifies into **11 primary categories** and **82 subcategories**:
#### 1. **Purchase of Goods** (10 subcategories)
Sporting goods, Buildings, Office supplies, Water consumption, Household appliances, Electrical equipment, Machinery and equipment, Furniture, Textiles and clothing, Vehicles
#### 2. **Purchase of Materials** (6 subcategories)
Construction materials, Organic materials, Paper and cardboard, Plastics and rubber, Chemicals, Refrigerants and others
#### 3. **Purchase of Services** (14 subcategories)
Equipment rental, Building rental, Furniture rental, Vehicle rental, Information/cultural services, Catering, Health services, Specialized crafts, Admin/consulting, Cleaning, IT services, Logistics, Marketing, Technical services
#### 4. **Food & Beverages** (10 subcategories)
Alcoholic beverages, Non-alcoholic beverages, Condiments, Desserts, Fruits and vegetables, Fats and oils, Prepared meals, Animal products, Cereal products, Dairy products
#### 5. **Heating and Air Conditioning** (2 subcategories)
Heat and steam, Air conditioning and refrigeration
#### 6. **Fuels** (6 subcategories)
Fossil fuels, Mobile fossil fuels, Organic fuels, Gaseous fossil fuels, Liquid fossil fuels, Solid fossil fuels
#### 7. **Mobility (Freight)** (5 subcategories)
Air transport, Ship transport, Truck transport, Combined transport, Train transport
#### 8. **Mobility (Passengers)** (11 subcategories)
Air transport, Coach/Urban bus, Ship transport, Combined transport, E-Bike, Accommodation/Events, Soft mobility, Motorcycle/Scooter, Train transport, Public transport, Car
#### 9. **Process and Fugitive Emissions** (3 subcategories)
Agriculture, Global warming potential, Industrial processes
#### 10. **Waste Treatment** (12 subcategories)
Commercial/industrial, Wastewater, Electrical equipment, Households, Metal, Organic materials, Paper and cardboard, Batteries, Plastics, Fugitive emissions, Textiles, Glass
#### 11. **Use of Electricity** (3 subcategories)
Electricity for electric vehicles, Renewables, Standard

---
## 📂 CSV File Format
### Required Format
Your CSV must contain a column named **`transaction`** (lowercase):
```csv
transaction
Hotel stay in Berlin for 3 nights
Train ticket from Amsterdam to Brussels
Office supplies - pens and notebooks
Electric vehicle charging
Restaurant lunch for team meeting
```
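
Before uploading, it can help to check the file locally. This sketch assumes only what is stated above (a lowercase `transaction` column) and uses Python's `csv` module; `read_transactions` is a hypothetical helper, not part of the app.

```python
import csv
import io
from typing import List


def read_transactions(csv_text: str) -> List[str]:
    """Return the non-empty values of the required lowercase `transaction` column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # fieldnames is None for an empty file; the column name is case-sensitive
    if reader.fieldnames is None or "transaction" not in reader.fieldnames:
        raise ValueError(f"missing 'transaction' column; found {reader.fieldnames}")
    # Skip rows whose transaction cell is empty or missing
    return [row["transaction"].strip() for row in reader if (row["transaction"] or "").strip()]


sample = """transaction
Hotel stay in Berlin for 3 nights
Train ticket from Amsterdam to Brussels
Office supplies - pens and notebooks
"""
print(read_transactions(sample))
```

Descriptions that contain commas must be wrapped in double quotes (standard CSV quoting), otherwise they are split across columns.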
### Processing Results
After processing, you'll get:
```csv
ID,Transaction,Cat1,Cat2,Confidence,Status
1,Hotel stay in Berlin,Mobility (passengers),Accommodation / Events,0.91,✅ OK
2,Train ticket Amsterdam-Brussels,Mobility (passengers),Train transport,0.96,✅ OK
3,Office supplies,Purchase of goods,Office supplies,0.93,✅ OK
```
### Status Indicators
- **✅ OK**: High confidence (> 0.8) - Auto-approved
- **⚠️ Review**: Lower confidence (≤ 0.8) - Needs manual review
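
The status rule can be expressed as a small function that also produces rows in the `ID,Transaction,Cat1,Cat2,Confidence,Status` layout. This sketch assumes the model returns `(Cat1, Cat2, confidence)` per transaction and that a score of exactly 0.8 goes to review, following the strict `>0.8` wording; `status_for`, `review_rows` and `fake_classifier` are hypothetical stand-ins for the app's internals.

```python
from typing import Callable, List, Tuple

# Hypothetical classifier signature: text -> (Cat1, Cat2, confidence)
Classifier = Callable[[str], Tuple[str, str, float]]

OK_THRESHOLD = 0.8  # confidence strictly above this is auto-approved


def status_for(confidence: float, threshold: float = OK_THRESHOLD) -> str:
    """Strictly above the threshold is OK; anything else needs manual review."""
    return "✅ OK" if confidence > threshold else "⚠️ Review"


def review_rows(transactions: List[str], classify: Classifier) -> List[list]:
    """Build one output row per transaction, numbering IDs from 1."""
    rows = []
    for i, text in enumerate(transactions, start=1):
        cat1, cat2, conf = classify(text)
        rows.append([i, text, cat1, cat2, round(conf, 2), status_for(conf)])
    return rows


def fake_classifier(text: str) -> Tuple[str, str, float]:
    """Keyword stand-in for the real model, for illustration only."""
    if "train" in text.lower():
        return ("Mobility (passengers)", "Train transport", 0.96)
    return ("Purchase of goods", "Office supplies", 0.62)


# Second row falls below the threshold and is flagged for review
for row in review_rows(["Train ticket Amsterdam-Brussels", "Misc expense"], fake_classifier):
    print(row)
```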
---
## 🧠 Model Architecture
### Technical Details
**Model**: `yassine123Z/EmissionFactor-mapper2-v2`
- **Type**: SetFit (Sentence Transformer Fine-tuning)
- **Base**: Optimized sentence transformer architecture
- **Training**: Few-shot learning on emission factor data
- **Embeddings**: 384-dimensional semantic vectors
- **Matching**: Cosine similarity scoring
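
The cosine-similarity matching listed above can be sketched as follows, assuming each (Cat1, Cat2) label has a reference embedding and the label most similar to the transaction embedding wins. This is an illustration, not the app's confirmed pipeline: SetFit models usually predict with a trained classification head, and the toy 3-dimensional vectors stand in for the model's 384-dimensional ones. `cosine` and `best_match` are hypothetical helper names.

```python
import math
from typing import Dict, Sequence, Tuple

Label = Tuple[str, str]  # (Cat1, Cat2)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(query: Sequence[float], label_vectors: Dict[Label, Sequence[float]]) -> Tuple[Label, float]:
    """Return the (Cat1, Cat2) label whose vector is most similar to the query, with its score."""
    return max(
        ((label, cosine(query, vec)) for label, vec in label_vectors.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors for two labels; real embeddings would be 384-dimensional
labels = {
    ("Mobility (passengers)", "Train transport"): [0.9, 0.1, 0.0],
    ("Use of electricity", "Standard"): [0.0, 0.2, 0.9],
}
print(best_match([0.8, 0.2, 0.1], labels))
```

In practice the vectors would come from the model's encoder, for example `SentenceTransformer("yassine123Z/EmissionFactor-mapper2-v2").encode(texts)` from the `sentence-transformers` library, assuming the checkpoint loads in that format. `best_match` raises `ValueError` if no labels are given.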
### Performance Metrics
- ⚡ **Speed**: ~50ms per transaction
- 📊 **Throughput**: 100+ transactions/minute
- 🎯 **Accuracy**: 85%+ on test set
- 💾 **Model Size**: ~400MB
- 🔋 **Average Confidence**: 0.87
---
## 📚 Resources
- 🤗 **Model Card**: [yassine123Z/EmissionFactor-mapper2-v2](https://huggingface.co/yassine123Z/EmissionFactor-mapper2-v2)
- 🌐 **Live Demo**: [Web Interface](https://yassine123z-emissionfactor-mapper2-v2-gradio2ui.hf.space/)
- 📖 **SetFit Documentation**: [GitHub](https://github.com/huggingface/setfit)
---
## 📄 License
MIT License - Feel free to use in commercial and open-source projects.

---
## 👨‍💻 Author
**Yassine**
- 🤗 HuggingFace: [@yassine123Z](https://huggingface.co/yassine123Z)
---
<div align="center">
**🌱 Making sustainability data smarter, one transaction at a time**
Built with ❤️ using SetFit, Gradio & FastAPI
</div>