# RoBERTa-Travel-Scorer (10-Dimensional Regression)

## Overview
This model is a high-precision multi-output regression model fine-tuned from DeBERTa-v3-small. Unlike standard classifiers that assign a text to a single category, it predicts 10 continuous scores (0-10) simultaneously, each representing the "intensity" or "vibe" of a travel destination along a different experiential dimension.
The model serves as the analytical engine for the RouteMaker project, transforming raw travel blog descriptions into structured, quantifiable data for intelligent itinerary building.
## Model Architecture & Training

### The Distillation Process
To achieve "large model" reasoning in a "small model" footprint (suitable for real-time edge processing on a MacBook Air), we employed Knowledge Distillation:
- Teacher Model: Llama 3 (3B) was used to label a custom dataset of thousands of travel blog posts (scraped from Bucketlistly, The Blonde Abroad, etc.).
- Student Model: This DeBERTa-based model was trained to minimize the distance (MSE loss) between its predictions and the Llama-3-generated score vectors, as sketched below.
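For illustration, here is a minimal sketch of what that objective looks like, assuming the Llama-3 teacher scores have already been collected as 10-value target vectors. The setup and names (`distillation_step`, `teacher_scores`) are hypothetical, not the project's actual training code:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
student = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small",
    num_labels=10,              # one regression output per dimension
    problem_type="regression",  # raw logits are treated as scores
)
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def distillation_step(texts, teacher_scores):
    """One step: regress student logits onto the teacher's 10 scores (MSE)."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=128)
    logits = student(**inputs).logits          # shape: (batch, 10)
    loss = F.mse_loss(logits, teacher_scores)  # teacher_scores: (batch, 10) floats
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```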
### The 10 Dimensions (Outputs)
The model outputs a 0-10 score for each of the following:
- Romance: Suitability for couples/honeymoons.
- Family: Child-friendly infrastructure and activities.
- Cost: Budget (0) to Luxury (10) scale.
- Nature: Focus on landscapes, wildlife, and outdoors.
- Adventure: Adrenaline, physical challenge, and exploration.
- Culture: Historical depth, art, and local traditions.
- Food: Culinary quality and diversity.
- Relaxation: Wellness, slow pace, and chill atmosphere.
- Service: Quality of hospitality and amenities.
- Accessibility: Ease of transport and navigation.
## Performance & Data
- Dataset: Custom-scraped travel blog data with heavy deduplication (fuzzy matching at >85% similarity; see the sketch after this list).
- Input Length: Optimized for 128 tokens (ideal for blog snippets and reviews).
- Training Objective: Multi-target Mean Absolute Error (MAE).
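The card doesn't name the deduplication tooling; below is one plausible way to apply the >85% fuzzy-matching rule, using the `rapidfuzz` library as an assumed implementation:

```python
from rapidfuzz import fuzz

def deduplicate(snippets, threshold=85.0):
    """Keep a snippet only if no already-kept snippet is >85% similar.

    Simple O(n^2) pairwise scan -- workable for thousands of blog
    posts, too slow for much larger corpora.
    """
    kept = []
    for text in snippets:
        if all(fuzz.ratio(text, seen) <= threshold for seen in kept):
            kept.append(text)
    return kept
```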
## Usage

Since this is a regression model, load it with `AutoModelForSequenceClassification` and read the raw logits directly as scores rather than applying a softmax:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "noamiman/roberta-finetuned-blog-analysis-10-labels"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "An incredible viewpoint on top of a mountain overlooking the beautiful remote Thung Yang beach. Perfect for sunsets."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# For a regression head, the raw logits *are* the scores -- no softmax needed.
scores = outputs.logits[0].tolist()

labels = ["Romance", "Family", "Cost", "Nature", "Adventure", "Culture", "Food", "Relaxation", "Service", "Accessibility"]
results = dict(zip(labels, [round(s, 2) for s in scores]))
print(results)
```
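Downstream (e.g., for RouteMaker-style itinerary building), the score dictionary reduces naturally to tags. A hypothetical continuation of the example above, not part of the model's API:

```python
# Rank dimensions by score and keep the destination's strongest "vibes".
top_vibes = sorted(results, key=results.get, reverse=True)[:3]
print(top_vibes)  # e.g. ['Nature', 'Relaxation', 'Romance']
```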
## Base Model

Fine-tuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small).