title: EmissionFactor Mapper
emoji: 🌿
colorFrom: purple
colorTo: pink
sdk: docker
pinned: false
license: mit
short_description: AI-powered transaction classifier for carbon accounting

🌿 Emission Factor Mapper

Intelligent AI-powered classification system for sustainability transaction data

Automatically map your financial transactions (like "hotel booking for conference", "electric vehicle charging", or "office furniture purchase") to standardized emission factor categories (Cat1, Cat2) used for accurate CO₂ footprint analysis and ESG reporting.

License: MIT · Python 3.8+ · HuggingFace


🎯 What Does This Do?

This application solves a critical challenge in carbon accounting: automatically categorizing thousands of financial transactions into standardized emission categories. Instead of manually reviewing each purchase, expense, or invoice, the AI model:

  • ✅ Classifies transactions into 11 primary emission categories
  • ✅ Maps to 82 detailed subcategories for precise carbon calculations
  • ✅ Provides confidence scores for quality assurance
  • ✅ Enables batch processing of CSV files with review capabilities
  • ✅ Tracks manual corrections for continuous model improvement
  • ✅ Compares different AI models to optimize accuracy

Perfect for sustainability teams, carbon accountants, ESG analysts, and finance departments working on Scope 3 emissions reporting.


🚀 Demo

🟢 Try the web UI:
https://yassine123z-emissionfactor-mapper2-v2-gradio2ui.hf.space/

📱 Four Powerful Modes:

1️⃣ Single Transaction - Quick Classification

Enter any transaction description and get instant predictions:

  • Input: "Business class flight from London to New York"
  • Output:
    • Cat1: Mobility (passengers)
    • Cat2: Air transport
    • Confidence: 0.94

2️⃣ Batch Review - Process Hundreds at Once

Upload a CSV file with your transactions and:

  • ✨ Get automatic classifications for all rows
  • 📊 Review results in an interactive table
  • ✏️ Edit predictions directly (dropdown menus included)
  • 💾 Download corrected dataset
  • 📈 Export training data for model retraining
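The export format for retraining data is not specified in this README. A minimal sketch, assuming each reviewed row keeps the original transaction text plus the final (possibly hand-edited) Cat1/Cat2; `ReviewedRow`, `to_training_pairs`, and the `"Cat1 > Cat2"` label scheme are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass


@dataclass
class ReviewedRow:
    # Hypothetical layout of one row after batch review.
    transaction: str
    cat1: str
    cat2: str
    edited: bool = False  # True if the user changed the prediction in the table


def to_training_pairs(rows, only_edited=False):
    """Turn reviewed rows into (text, label) pairs for retraining.

    The label joins Cat1 and Cat2 with " > "; this scheme is an assumption,
    not the model's documented label format.
    """
    pairs = []
    for row in rows:
        if only_edited and not row.edited:
            continue  # keep only rows a human actually corrected
        pairs.append((row.transaction, f"{row.cat1} > {row.cat2}"))
    return pairs


rows = [
    ReviewedRow("Hotel stay in Berlin", "Mobility (passengers)", "Accommodation / Events"),
    ReviewedRow("Team lunch", "Food & Beverages", "Prepared meals", edited=True),
]
print(to_training_pairs(rows, only_edited=True))
# [('Team lunch', 'Food & Beverages > Prepared meals')]
```

Exporting only edited rows focuses retraining on the cases the model got wrong; exporting all rows gives a larger but noisier dataset.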

3️⃣ Corrections History - Track & Improve

  • 📋 View all manual corrections you've made
  • 🕐 Timestamp tracking for audit trails
  • 📤 Export correction logs for model fine-tuning
  • 📊 Analyze patterns in misclassifications
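The correction history is described here only at the feature level. A minimal sketch of what such a log could hold, assuming each correction records the transaction, the predicted and corrected Cat1, and a UTC timestamp; `Correction`, `top_confusions`, and `to_rows` are hypothetical names, and the example rows are made up. Counting (predicted, corrected) pairs is one simple way to surface recurring misclassifications:

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Correction:
    # Hypothetical record for one manual correction.
    transaction: str
    predicted_cat1: str
    corrected_cat1: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def top_confusions(log, n=3):
    """Return the n most frequent (predicted, corrected) Cat1 pairs.

    Entries where Cat1 did not change (e.g. only Cat2 was fixed) are ignored.
    """
    pairs = Counter(
        (c.predicted_cat1, c.corrected_cat1)
        for c in log
        if c.predicted_cat1 != c.corrected_cat1
    )
    return pairs.most_common(n)


def to_rows(log):
    """Serialize corrections as dicts with ISO-8601 timestamps for export."""
    return [
        {
            "transaction": c.transaction,
            "predicted_cat1": c.predicted_cat1,
            "corrected_cat1": c.corrected_cat1,
            "timestamp": c.timestamp.isoformat(),
        }
        for c in log
    ]


log = [
    Correction("Taxi to airport", "Purchase of services", "Mobility (passengers)"),
    Correction("Ride-hailing trip", "Purchase of services", "Mobility (passengers)"),
    Correction("Diesel for generator", "Fuels", "Fuels"),
]
print(top_confusions(log))
# [(('Purchase of services', 'Mobility (passengers)'), 2)]
```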

4️⃣ Model Comparison - A/B Testing

  • 🧪 Compare current model vs. any HuggingFace model
  • 📉 Side-by-side predictions with match rates
  • 🎯 Evaluate performance before deployment
  • 🔬 Test on your own dataset
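The README does not define "match rate". One plausible definition, assumed for this sketch, is the share of transactions on which both models return the same label, computed at the Cat1 level and at the combined Cat1+Cat2 level; `match_rates` is a hypothetical helper:

```python
def match_rates(preds_a, preds_b):
    """Compare two models' predictions on the same transactions.

    Each prediction is a (cat1, cat2) tuple. Returns the fraction of rows
    where the models agree on Cat1, and on both Cat1 and Cat2.
    """
    if len(preds_a) != len(preds_b):
        raise ValueError("Both models must be run on the same transactions")
    if not preds_a:
        return {"cat1": 0.0, "cat1_and_cat2": 0.0}
    n = len(preds_a)
    cat1 = sum(a[0] == b[0] for a, b in zip(preds_a, preds_b))
    both = sum(a == b for a, b in zip(preds_a, preds_b))
    return {"cat1": cat1 / n, "cat1_and_cat2": both / n}


current = [("Mobility (passengers)", "Train transport"), ("Use of electricity", "Standard")]
candidate = [("Mobility (passengers)", "Train transport"), ("Use of electricity", "Renewables")]
print(match_rates(current, candidate))
# {'cat1': 1.0, 'cat1_and_cat2': 0.5}
```

Reporting both levels is useful because two models often agree on the broad category while disagreeing on the subcategory.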

🧠 API Usage

Endpoint URL (served from a different Space host than the web UI above):

https://yassine123z-emissionfactor-mapper2-v2-gradio.hf.space/map_categories

🔌 Endpoint 1: Batch Classification

POST /map_categories

Classify multiple transactions in a single API call.

Example request body:

{
  "transactions": [
    "Train ticket Paris to Berlin",
    "Office lighting electricity",
    "Laptop purchase for employee"
  ]
}

Example response:

{
  "matches": [
    {
      "input_text": "Train ticket Paris to Berlin",
      "best_Cat1": "Mobility (passengers)",
      "best_Cat2": "Train transport",
      "similarity": 0.96
    },
    {
      "input_text": "Office lighting electricity",
      "best_Cat1": "Use of electricity",
      "best_Cat2": "Standard",
      "similarity": 0.89
    },
    {
      "input_text": "Laptop purchase for employee",
      "best_Cat1": "Purchase of goods",
      "best_Cat2": "Electrical equipment",
      "similarity": 0.92
    }
  ]
}
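A minimal Python client sketch using only the standard library. It assumes the endpoint accepts a JSON body exactly like the request example above and returns the `matches` structure shown; authentication, rate limits, and error formats are not documented, so none are handled. The helper names are hypothetical:

```python
import json
import urllib.request

# Endpoint URL as given in this README; the Space may need to wake up first.
API_URL = "https://yassine123z-emissionfactor-mapper2-v2-gradio.hf.space/map_categories"


def build_request(transactions):
    """Build a POST request carrying {"transactions": [...]} as JSON."""
    body = json.dumps({"transactions": list(transactions)}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_matches(payload):
    """Flatten the response into (input_text, Cat1, Cat2, similarity) tuples."""
    return [
        (m["input_text"], m["best_Cat1"], m["best_Cat2"], m["similarity"])
        for m in payload["matches"]
    ]


def classify(transactions, timeout=30):
    """Send the request and return parsed matches (performs a network call)."""
    with urllib.request.urlopen(build_request(transactions), timeout=timeout) as resp:
        return parse_matches(json.load(resp))
```

Usage: `classify(["Train ticket Paris to Berlin"])` sends one request and returns a list of tuples. A single transaction is simply a one-element list.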

🗂️ Emission Categories

📋 Complete Category Structure

The model classifies transactions into 11 primary categories and 82 subcategories:

1. Purchase of Goods (10 subcategories)

Sporting goods, Buildings, Office supplies, Water consumption, Household appliances, Electrical equipment, Machinery and equipment, Furniture, Textiles and clothing, Vehicles

2. Purchase of Materials (6 subcategories)

Construction materials, Organic materials, Paper and cardboard, Plastics and rubber, Chemicals, Refrigerants and others

3. Purchase of Services (14 subcategories)

Equipment rental, Building rental, Furniture rental, Vehicle rental, Information/cultural services, Catering, Health services, Specialized crafts, Admin/consulting, Cleaning, IT services, Logistics, Marketing, Technical services

4. Food & Beverages (10 subcategories)

Alcoholic beverages, Non-alcoholic beverages, Condiments, Desserts, Fruits and vegetables, Fats and oils, Prepared meals, Animal products, Cereal products, Dairy products

5. Heating and Air Conditioning (2 subcategories)

Heat and steam, Air conditioning and refrigeration

6. Fuels (6 subcategories)

Fossil fuels, Mobile fossil fuels, Organic fuels, Gaseous fossil fuels, Liquid fossil fuels, Solid fossil fuels

7. Mobility (Freight) (5 subcategories)

Air transport, Ship transport, Truck transport, Combined transport, Train transport

8. Mobility (Passengers) (11 subcategories)

Air transport, Coach/Urban bus, Ship transport, Combined transport, E-Bike, Accommodation/Events, Soft mobility, Motorcycle/Scooter, Train transport, Public transport, Car

9. Process and Fugitive Emissions (3 subcategories)

Agriculture, Global warming potential, Industrial processes

10. Waste Treatment (12 subcategories)

Commercial/industrial, Wastewater, Electrical equipment, Households, Metal, Organic materials, Paper and cardboard, Batteries, Plastics, Fugitive emissions, Textiles, Glass

11. Use of Electricity (3 subcategories)

Electricity for electric vehicles, Renewables, Standard
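The hierarchy above can be kept as a plain lookup table, for example to fill the Cat2 dropdown for a chosen Cat1 during Batch Review or to reject impossible (Cat1, Cat2) pairs. The sketch below transcribes the list above. The case- and spacing-insensitive matching is an assumption added because this README writes labels slightly differently in places (e.g. "Mobility (passengers)" vs "Mobility (Passengers)", "Accommodation / Events" vs "Accommodation/Events"); `is_valid_pair` is a hypothetical helper:

```python
import re

# Cat1 -> Cat2 taxonomy transcribed from the category list in this README.
TAXONOMY = {
    "Purchase of Goods": ["Sporting goods", "Buildings", "Office supplies", "Water consumption",
                          "Household appliances", "Electrical equipment", "Machinery and equipment",
                          "Furniture", "Textiles and clothing", "Vehicles"],
    "Purchase of Materials": ["Construction materials", "Organic materials", "Paper and cardboard",
                              "Plastics and rubber", "Chemicals", "Refrigerants and others"],
    "Purchase of Services": ["Equipment rental", "Building rental", "Furniture rental",
                             "Vehicle rental", "Information/cultural services", "Catering",
                             "Health services", "Specialized crafts", "Admin/consulting",
                             "Cleaning", "IT services", "Logistics", "Marketing",
                             "Technical services"],
    "Food & Beverages": ["Alcoholic beverages", "Non-alcoholic beverages", "Condiments",
                         "Desserts", "Fruits and vegetables", "Fats and oils", "Prepared meals",
                         "Animal products", "Cereal products", "Dairy products"],
    "Heating and Air Conditioning": ["Heat and steam", "Air conditioning and refrigeration"],
    "Fuels": ["Fossil fuels", "Mobile fossil fuels", "Organic fuels", "Gaseous fossil fuels",
              "Liquid fossil fuels", "Solid fossil fuels"],
    "Mobility (Freight)": ["Air transport", "Ship transport", "Truck transport",
                           "Combined transport", "Train transport"],
    "Mobility (Passengers)": ["Air transport", "Coach/Urban bus", "Ship transport",
                              "Combined transport", "E-Bike", "Accommodation/Events",
                              "Soft mobility", "Motorcycle/Scooter", "Train transport",
                              "Public transport", "Car"],
    "Process and Fugitive Emissions": ["Agriculture", "Global warming potential",
                                       "Industrial processes"],
    "Waste Treatment": ["Commercial/industrial", "Wastewater", "Electrical equipment",
                        "Households", "Metal", "Organic materials", "Paper and cardboard",
                        "Batteries", "Plastics", "Fugitive emissions", "Textiles", "Glass"],
    "Use of Electricity": ["Electricity for electric vehicles", "Renewables", "Standard"],
}


def _norm(label):
    """Case-insensitive key that ignores spaces around '/' (assumed normalization)."""
    return re.sub(r"\s*/\s*", "/", label.strip()).casefold()


_INDEX = {_norm(c1): {_norm(c2) for c2 in subs} for c1, subs in TAXONOMY.items()}


def is_valid_pair(cat1, cat2):
    """True if cat2 is a listed subcategory of cat1."""
    return _norm(cat2) in _INDEX.get(_norm(cat1), set())


print(is_valid_pair("Mobility (passengers)", "Accommodation / Events"))  # True
print(is_valid_pair("Fuels", "Train transport"))  # False
```

As transcribed, `TAXONOMY` has 11 keys and its subcategory lists total 82 entries. Some Cat2 names (e.g. "Electrical equipment", "Air transport") appear under several Cat1 values, so a Cat2 label is only meaningful together with its Cat1.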


📂 CSV File Format

Required Format

Your CSV must contain a column named transaction (lowercase):

transaction
Hotel stay in Berlin for 3 nights
Train ticket from Amsterdam to Brussels
Office supplies - pens and notebooks
Electric vehicle charging
Restaurant lunch for team meeting
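Before uploading, a file can be checked locally with Python's standard `csv` module. A minimal sketch; whether the app itself strips whitespace or skips blank rows is not documented, so treat `read_transactions` (a hypothetical helper) as a pre-flight check only:

```python
import csv
import io


def read_transactions(file_obj):
    """Return the non-empty values of the required `transaction` column.

    Raises ValueError if the lowercase `transaction` header is missing.
    """
    reader = csv.DictReader(file_obj)
    if reader.fieldnames is None or "transaction" not in reader.fieldnames:
        raise ValueError("CSV must contain a column named 'transaction'")
    return [
        row["transaction"].strip()
        for row in reader
        if (row["transaction"] or "").strip()  # skip empty cells
    ]


sample = io.StringIO("transaction\nHotel stay in Berlin for 3 nights\nElectric vehicle charging\n")
print(read_transactions(sample))
# ['Hotel stay in Berlin for 3 nights', 'Electric vehicle charging']
```

Descriptions that contain commas must be quoted (e.g. `"Taxi, airport transfer"`), otherwise they are split into extra columns.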

Processing Results

After processing, you'll get a results table like this:

ID,Transaction,Cat1,Cat2,Confidence,Status
1,Hotel stay in Berlin,Mobility (passengers),Accommodation / Events,0.91,✅ OK
2,Train ticket Amsterdam-Brussels,Mobility (passengers),Train transport,0.96,✅ OK
3,Office supplies,Purchase of goods,Office supplies,0.93,✅ OK

Status Indicators

  • ✅ OK: High confidence (> 0.8), auto-approved
  • ⚠️ Review: Lower confidence (≤ 0.8), needs manual review
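The status rule above is a single threshold check. This sketch assumes exactly two statuses and treats a confidence of exactly 0.8 as needing review, since OK requires "> 0.8". `write_results` is a hypothetical helper that reproduces the column layout of the results table; the input rows are made up:

```python
import csv
import io

OK_THRESHOLD = 0.8  # OK only when confidence is strictly greater than this


def status_for(confidence, threshold=OK_THRESHOLD):
    """Return the status label for a confidence score."""
    return "✅ OK" if confidence > threshold else "⚠️ Review"


def write_results(predictions, out):
    """Write rows in the ID,Transaction,Cat1,Cat2,Confidence,Status layout.

    `predictions` is an iterable of (transaction, cat1, cat2, confidence) tuples.
    """
    writer = csv.writer(out)
    writer.writerow(["ID", "Transaction", "Cat1", "Cat2", "Confidence", "Status"])
    for i, (text, cat1, cat2, conf) in enumerate(predictions, start=1):
        writer.writerow([i, text, cat1, cat2, f"{conf:.2f}", status_for(conf)])


buf = io.StringIO()
write_results(
    [
        ("Office supplies", "Purchase of goods", "Office supplies", 0.93),
        ("Consulting invoice", "Purchase of services", "Admin/consulting", 0.55),
    ],
    buf,
)
print(buf.getvalue())
```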

🧠 Model Architecture

Technical Details

Model: yassine123Z/EmissionFactor-mapper2-v2

  • Type: SetFit (Sentence Transformer Fine-tuning)
  • Base: Optimized sentence transformer architecture
  • Training: Few-shot learning on emission factor data
  • Embeddings: 384-dimensional semantic vectors
  • Matching: Cosine similarity scoring
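The README states that matching uses cosine similarity over 384-dimensional embeddings, but not what a transaction is compared against. The sketch below assumes one reference embedding per label and picks the label with the highest cosine similarity. It uses toy 3-dimensional vectors in place of real model embeddings, and `best_match` is a hypothetical helper, not the app's code:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(query_vec, label_vecs):
    """Return (label, score) for the label embedding most similar to the query."""
    return max(
        ((label, cosine(query_vec, vec)) for label, vec in label_vecs.items()),
        key=lambda pair: pair[1],
    )


# Toy vectors standing in for 384-dimensional sentence embeddings.
labels = {
    "Train transport": [0.9, 0.1, 0.0],
    "Air transport": [0.1, 0.9, 0.0],
    "Standard": [0.0, 0.1, 0.9],
}
label, score = best_match([0.8, 0.2, 0.1], labels)
print(label, round(score, 2))  # Train transport 0.98
```

Cosine similarity ignores vector length, so the score reflects direction in embedding space only; a score like the `similarity` field in the API response would come from this kind of comparison if the app works as assumed here.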

Performance Metrics

  • ⚡ Speed: ~50 ms per transaction
  • 📊 Throughput: 100+ transactions/minute
  • 🎯 Accuracy: 85%+ on test set
  • 💾 Model Size: ~400 MB
  • 🔋 Average Confidence: 0.87
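Latency and throughput depend on hardware, batch size, and whether requests go through the hosted Space, so measuring in your own setup is more reliable than any headline figure. A minimal sketch, assuming `predict_fn` is any callable that classifies one transaction (for example a wrapper around an API client or a locally loaded model); `measure_throughput` is a hypothetical helper:

```python
import time


def measure_throughput(predict_fn, texts):
    """Time predict_fn over texts one at a time.

    Returns average milliseconds per transaction and transactions per minute.
    """
    if not texts:
        raise ValueError("need at least one transaction")
    start = time.perf_counter()
    for text in texts:
        predict_fn(text)
    elapsed = time.perf_counter() - start
    per_item_ms = elapsed * 1000 / len(texts)
    per_minute = len(texts) / elapsed * 60 if elapsed > 0 else float("inf")
    return {"ms_per_transaction": per_item_ms, "transactions_per_minute": per_minute}


# Dummy predictor so the sketch runs without the model.
stats = measure_throughput(lambda t: ("Fuels", "Fossil fuels"), ["Diesel refill"] * 100)
print(stats)
```

For reference, 50 ms per transaction processed sequentially works out to 60,000 ms / 50 ms = 1,200 transactions per minute, so the speed and throughput figures above presumably describe different conditions (for example model-only latency versus end-to-end requests).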


📚 Resources


📄 License

MIT License - Feel free to use in commercial and open-source projects.


👨‍💻 Author

Yassine


🌱 Making sustainability data smarter, one transaction at a time

Built with ❤️ using SetFit, Gradio & FastAPI