Meraki Tagger

Meraki Tagger is a multi-label sentence classifier designed to analyze humanitarian text data. It identifies key themes and actionable insights from reports, interviews, and field notes, tagging sentences with relevant humanitarian sectors and indicators (e.g., Food Security, Health, Protection, Advocacy Achievement).

This model is fine-tuned on a domain-adapted version of microsoft/deberta-v3-large.

Model Details

  • Base Model: microsoft/deberta-v3-large
  • Fine-tuning: Multi-label sequence classification on a custom humanitarian dataset.
  • Domain Adaptation: Continued pre-training (masked language modeling, MLM) on a corpus of domain-specific documents (PDFs/DOCX) to better capture sector-specific terminology.
  • Parameters: ~0.4B (FP32, Safetensors)
  • Architecture: Transformer-based encoder with a multi-label classification head.

Usage

Inference API

You can use the Hugging Face Inference API to query this model directly.

import requests

API_URL = "https://api-inference.huggingface.co/models/AaranNihalani/MerakiTagger"
headers = {"Authorization": "Bearer YOUR_HF_TOKEN"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    response.raise_for_status()  # surface HTTP errors (e.g. 503 while the model is loading)
    return response.json()

output = query({
    "inputs": "The refugee camp is facing a severe shortage of clean water.",
})
print(output)

Local Usage (Transformers)

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "AaranNihalani/MerakiTagger"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

text = "We need urgent medical supplies for the clinic."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.sigmoid(logits)  # independent per-label probabilities

# Map class indices back to tag names
id2label = model.config.id2label
# Keep labels above 50% confidence (or use custom per-label thresholds)
for idx, score in enumerate(probs[0]):
    if score > 0.5:
        print(f"{id2label[idx]}: {score:.4f}")
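The decoding loop above can be wrapped in a small helper. A minimal sketch — `decode_tags` is an illustrative name, not part of the model's API, and the probabilities below are dummy values for a three-label head:

```python
def decode_tags(probs, id2label, threshold=0.5):
    """Return (label, score) pairs whose probability exceeds the cutoff."""
    return [
        (id2label[idx], round(float(score), 4))
        for idx, score in enumerate(probs)
        if float(score) > threshold
    ]

# Example with dummy probabilities
id2label = {0: "Food Security", 1: "Health", 2: "Protection"}
probs = [0.91, 0.12, 0.66]
print(decode_tags(probs, id2label))
# [('Food Security', 0.91), ('Protection', 0.66)]
```

Because it only enumerates its input, the helper accepts either a plain list or a 1-D tensor row such as `probs[0]` from the snippet above.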

Thresholding

This model was trained on class-imbalanced data. For best results, use per-label thresholds rather than a global 0.5 cutoff.

A thresholds.json file is included in the model repository containing optimized thresholds for each tag based on validation set F1 maximization.
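A sketch of applying those per-label cutoffs, assuming thresholds.json maps tag names to floats (verify the actual schema in the repository; the file can be fetched with `huggingface_hub.hf_hub_download`, and the tags and values below are invented for illustration):

```python
import json

# Assumed schema: {"<tag name>": <cutoff>, ...}
thresholds = json.loads('{"Food Security": 0.35, "Health": 0.55, "Protection": 0.60}')

def apply_thresholds(probs, id2label, thresholds, default=0.5):
    """Keep labels whose probability clears that label's own cutoff."""
    return [
        id2label[idx]
        for idx, score in enumerate(probs)
        if float(score) > thresholds.get(id2label[idx], default)
    ]

id2label = {0: "Food Security", 1: "Health", 2: "Protection"}
probs = [0.40, 0.58, 0.59]
print(apply_thresholds(probs, id2label, thresholds))
# ['Food Security', 'Health'] — 0.59 misses Protection's 0.60 cutoff
```

Note how a global 0.5 cutoff would have flipped two of these decisions: it would drop Food Security (0.40) and keep Protection (0.59).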

Intended Use

  • Primary Use Case: Automated tagging of humanitarian daily reports, needs assessments, and qualitative survey responses.
  • Target Audience: NGOs, aid workers, and data analysts in the humanitarian sector.

Training Procedure

  1. Domain Adaptation: Masked Language Modeling (MLM) on a collection of sector-specific reports.
  2. Fine-Tuning: Supervised multi-label classification on a labeled dataset of sentences.
  3. Optimization: Trained with class-aware loss weights to handle label imbalance.
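The class-aware weighting in step 3 can be sketched with PyTorch's BCEWithLogitsLoss, whose pos_weight argument up-weights the positive term of rare labels. The label counts here are hypothetical, not the actual training statistics:

```python
import torch
import torch.nn as nn

# Hypothetical label frequencies: positives per label out of n_samples
n_samples = 1000
pos_counts = torch.tensor([400.0, 50.0, 10.0])

# pos_weight = negatives / positives, so rare labels get larger weights
pos_weight = (n_samples - pos_counts) / pos_counts  # [1.5, 19.0, 99.0]

loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.tensor([[2.0, -1.0, 0.5]])   # raw model outputs for one sentence
targets = torch.tensor([[1.0, 0.0, 1.0]])   # multi-hot ground-truth tags
loss = loss_fn(logits, targets)
```

With this scheme, a missed positive on the rarest label costs roughly 99 times as much as one on a balanced label, counteracting the imbalance.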

Limitations

  • The model is specialized for humanitarian English text and may not perform well on general domain text or other languages.
  • As a statistical model, it may occasionally assign spurious tags or miss nuanced context. Human review is recommended for critical decision-making.