MyVillage Project - Intent Router Model

This is a fine-tuned DistilBERT model designed to route user queries within the MyVillage Project (Coding in Color) chatbot ecosystem.

Unlike a standard chatbot that answers everything directly, this model acts as a traffic controller. It reads the user's metadata and conversation history (the last 5 messages) and classifies the user's intent into one of 6 organizational categories. The surrounding system then routes the request to the matching database or API endpoint (e.g., directing invoice questions to the Finance System).

🎯 Intent Categories (Labels)

The model predicts one of the following 6 distinct topics:

Label ID	Label Name	Description	Key Indicators (Examples)
0	`FINANCIAL`	Money, Payments, Invoices	"Where do I upload receipt?", "W9 form", "Reimbursement", "Vendor payment"
1	`CIC_EVENTS`	Coding in Color Events	"Student showcase", "Hackathon", "Robot demo", "Registration deadline"
2	`CIC_ACTIVITIES`	Internal Dev Work	"Slack check-in", "n8n workflow", "Pushing code", "Daily standup", "API error"
3	`ORG_RESOURCES`	General Admin/IT Support	"Lost password", "Employee handbook", "Laptop request", "HR contact"
4	`ORG_EVENTS`	Strategic/Community Events	"Board meeting", "Town hall", "Fundraising gala", "Demographic analysis"
5	`STAFF_GRANTS`	Funding & Proposals	"NSF proposal", "Grant submission", "Budget review", "Logic model", "Impact metrics"
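
The table above can be written down as a lookup in code. The sketch below is a minimal example, assuming the pipeline may return either the category name or the generic `LABEL_<id>` form that `transformers` uses when a checkpoint's config carries no label names. Whether this checkpoint stores the names is not stated here, and `normalize_label` is a hypothetical helper, not part of the model.

```python
# Label IDs and names copied from the table above.
ID2LABEL = {
    0: "FINANCIAL",
    1: "CIC_EVENTS",
    2: "CIC_ACTIVITIES",
    3: "ORG_RESOURCES",
    4: "ORG_EVENTS",
    5: "STAFF_GRANTS",
}
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def normalize_label(raw_label: str) -> str:
    """Return the category name for 'STAFF_GRANTS' or 'LABEL_5' style output."""
    if raw_label in LABEL2ID:
        return raw_label
    prefix = "LABEL_"
    if raw_label.startswith(prefix):
        # Generic form: the suffix is the numeric label ID.
        return ID2LABEL[int(raw_label[len(prefix):])]
    raise ValueError(f"Unknown label: {raw_label!r}")
```

Normalizing the label once, right after inference, keeps the downstream routing code independent of how the checkpoint names its classes.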

📊 Model Performance

Training Results

The model reached 100% accuracy on the validation set by epoch 2. This shows rapid convergence on the synthetic dataset; it is not a measure of performance on real user queries (see the inference test below).

Metric	Score	Note
Validation Accuracy	1.0000	Every validation example was classified correctly.
Validation Loss	0.0553	Low loss; predictions on the validation set are made with high confidence.

Real-World Inference Test

When tested on 30+ unseen edge cases (including trick questions and overlapping concepts), the model achieved:

Inference Accuracy: 90.91%
Known Weakness: The model occasionally confuses logistics requests for ORG_EVENTS (e.g., ordering lunch for a board meeting) with CIC_EVENTS (e.g., ordering pizza for students).
Strength: Excellent distinction between "Dev Work" (CIC_ACTIVITIES) and "IT Support" (ORG_RESOURCES).
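
An evaluation like the one above can be reproduced with a few lines of code. The sketch below is illustrative only: `evaluate` is a hypothetical helper, and the labels in the usage example are made-up placeholders, not the actual test cases. It reports accuracy plus a count of each (expected, predicted) confusion, which is how a weakness such as ORG_EVENTS being mistaken for CIC_EVENTS would show up.

```python
from collections import Counter


def evaluate(expected, predicted):
    """Return (accuracy, confusions) for two equal-length lists of label names."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    # Count every mismatch as an (expected, predicted) pair.
    confusions = Counter(
        (exp, pred) for exp, pred in zip(expected, predicted) if exp != pred
    )
    correct = len(expected) - sum(confusions.values())
    accuracy = correct / len(expected) if expected else 0.0
    return accuracy, confusions


# Placeholder data for illustration; not the real edge-case set.
expected = ["ORG_EVENTS", "CIC_EVENTS", "FINANCIAL", "ORG_EVENTS"]
predicted = ["CIC_EVENTS", "CIC_EVENTS", "FINANCIAL", "ORG_EVENTS"]
accuracy, confusions = evaluate(expected, predicted)
print(f"Accuracy: {accuracy:.2%}")
print(confusions.most_common())
```

For reference, 30 correct out of 33 would round to 90.91%, though the exact counts behind the reported figure are not stated here.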

🚀 How to Use

Crucial: This model expects a specific input format. You must concatenate the user's metadata and query history into a single string.

Input Format: Role: {role} | Name: {name} | ID: {id} | Phone: {phone} | Email: {email} | History: 'msg1', 'msg2', 'msg3', 'msg4', 'msg5'
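
Building this string by hand is error-prone, so a small helper can assemble it. The sketch below is one possible implementation: `build_router_input` and its parameter names are hypothetical, and it assumes the history is listed oldest to newest and that only the most recent 5 messages are kept. How the training data handled shorter histories or messages containing quote characters is not documented here.

```python
from typing import Sequence


def build_router_input(
    role: str,
    name: str,
    user_id: str,
    phone: str,
    email: str,
    history: Sequence[str],
    max_messages: int = 5,
) -> str:
    """Concatenate metadata and recent history into the router's input format."""
    # Keep only the most recent messages, preserving their order.
    recent = list(history)[-max_messages:]
    quoted = ", ".join(f"'{message}'" for message in recent)
    return (
        f"Role: {role} | Name: {name} | ID: {user_id} | Phone: {phone} | "
        f"Email: {email} | History: {quoted}"
    )
```

The returned string can be passed directly to the `text-classification` pipeline.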

Python Example

from transformers import pipeline

# 1. Load Model
router = pipeline("text-classification", model="your-username/myvillage-router-v1")

# 2. Formulate Input (Simulating a Director asking about Grants)
input_text = "Role: Director | Name: Sarah Boss | ID: 0012 | Phone: 555-0000 | Email: s.boss@mvp.org | History: 'Draft the narrative for the NSF proposal.', 'Review the budget section.', 'Did we get the funding?', 'Attach the logic model PDF.', 'When is the submission deadline?'"

# 3. Predict
result = router(input_text)

print(f"Routed To: {result[0]['label']} (Confidence: {result[0]['score']:.4f})")
# Example output: Routed To: STAFF_GRANTS (Confidence: 0.9823)

⚠️ Limitations

Context Window: The model relies heavily on the last 5 messages. If the intent is not clear in that window, accuracy may drop.
Synthetic Bias: The model was trained on synthetic data. While it handles natural language well, it may struggle with highly specific slang or typos not present in the training set.
Role vs. Content: The model is trained to prioritize content over role (e.g., a "Director" asking about "Python code" will be routed to CIC_ACTIVITIES, not STAFF_GRANTS).
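
One way to soften these limitations is to avoid routing when the model is unsure. The sketch below assumes a prediction shaped like one item of the pipeline output (a dict with `label` and `score`). The `route_or_fallback` helper, the 0.6 threshold, and the `NEEDS_CLARIFICATION` fallback name are all hypothetical choices, not part of the model or the MyVillage system.

```python
def route_or_fallback(
    prediction: dict,
    threshold: float = 0.6,
    fallback: str = "NEEDS_CLARIFICATION",
) -> str:
    """Return the predicted label, or a fallback when confidence is too low."""
    if prediction["score"] >= threshold:
        return prediction["label"]
    # Below the threshold: let the caller ask the user a follow-up question.
    return fallback


print(route_or_fallback({"label": "STAFF_GRANTS", "score": 0.98}))
print(route_or_fallback({"label": "ORG_EVENTS", "score": 0.41}))
```

A suitable threshold would need to be tuned on real traffic, since the confidence scores reported above come from synthetic data.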

🛠️ Training Data

The model was trained on 250+ synthetic examples generated to mimic the specific operational workflows of the MyVillage Project. The data includes:

Redundant History Patterns: Users often repeat intents in different ways.
Role Variation: Every intent is paired with every role (e.g., Admins asking Student questions) to prevent role-based overfitting.
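
The role-variation idea can be illustrated with a Cartesian product of roles and intents. This is a sketch of the general technique, not the actual generation script, which is not shown here: the role list, the sample messages, and `pair_roles_with_intents` are hypothetical, and the other metadata fields (name, ID, phone, email) are omitted for brevity.

```python
from itertools import product

# Hypothetical roles and sample messages, for illustration only.
ROLES = ["Director", "Admin", "Student"]
INTENT_MESSAGES = {
    "FINANCIAL": ["Where do I upload receipt?"],
    "STAFF_GRANTS": ["When is the submission deadline?"],
}


def pair_roles_with_intents(roles, intent_messages):
    """Create one example per (role, intent) pair so role never predicts the label."""
    examples = []
    for role, (label, messages) in product(roles, intent_messages.items()):
        history = ", ".join(f"'{message}'" for message in messages)
        examples.append({"text": f"Role: {role} | History: {history}", "label": label})
    return examples


examples = pair_roles_with_intents(ROLES, INTENT_MESSAGES)
print(len(examples))
```

Because every role appears with every label equally often, the role field carries no signal about the label, which pushes the classifier toward the message content.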

Model size: 67M params (Safetensors, tensor type F32)