# RoBERTa-Travel-Scorer (10-Dimensional Regression)

## Overview
This model is a high-precision multi-output regression model fine-tuned from DeBERTa-v3-small. Unlike standard classifiers that assign a text to a single category, it predicts 10 continuous scores (0-10) simultaneously, each representing the "intensity" or "vibe" of a travel destination along a different experiential dimension.
The model serves as the analytical engine for the RouteMaker project, transforming raw travel blog descriptions into structured, quantifiable data for intelligent itinerary building.
## Model Architecture & Training

### The Distillation Process
To achieve "large model" reasoning in a "small model" footprint (suitable for real-time edge processing on a MacBook Air), we employed Knowledge Distillation:
- Teacher Model: Llama 3 (3B) was used to label a custom dataset of thousands of travel blog posts (scraped from Bucketlistly, The Blonde Abroad, etc.).
- Student Model: This DeBERTa-based model was trained to minimize the distance (MSE loss) between its predictions and the Llama-3-generated score vectors, as sketched below.
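For illustration, here is a minimal sketch of what that objective looks like, assuming the Llama-3 teacher scores have already been collected as 10-value target vectors. The setup and names (`distillation_step`, `teacher_scores`) are hypothetical, not the project's actual training code:

```python
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-small")
student = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-small",
    num_labels=10,              # one regression output per dimension
    problem_type="regression",  # raw logits are treated as scores
)
optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)

def distillation_step(texts, teacher_scores):
    """One step: regress student logits onto the teacher's 10 scores (MSE)."""
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=128)
    logits = student(**inputs).logits          # shape: (batch, 10)
    loss = F.mse_loss(logits, teacher_scores)  # teacher_scores: (batch, 10) floats
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```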
### The 10 Dimensions (Outputs)
The model outputs a 0-10 score for each of the following:
- Romance: Suitability for couples/honeymoons.
- Family: Child-friendly infrastructure and activities.
- Cost: Budget (0) to Luxury (10) scale.
- Nature: Focus on landscapes, wildlife, and outdoors.
- Adventure: Adrenaline, physical challenge, and exploration.
- Culture: Historical depth, art, and local traditions.
- Food: Culinary quality and diversity.
- Relaxation: Wellness, slow pace, and chill atmosphere.
- Service: Quality of hospitality and amenities.
- Accessibility: Ease of transport and navigation.
## Performance & Data
- Dataset: Custom-scraped travel blog data with heavy deduplication (fuzzy matching at >85% similarity; see the sketch after this list).
- Input Length: Optimized for 128 tokens (ideal for blog snippets and reviews).
- Training Objective: Multi-target Mean Absolute Error (MAE).
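The card doesn't name the deduplication tooling; below is one plausible way to apply the >85% fuzzy-matching rule, using the `rapidfuzz` library as an assumed implementation:

```python
from rapidfuzz import fuzz

def deduplicate(snippets, threshold=85.0):
    """Keep a snippet only if no already-kept snippet is >85% similar.

    Simple O(n^2) pairwise scan -- workable for thousands of blog
    posts, too slow for much larger corpora.
    """
    kept = []
    for text in snippets:
        if all(fuzz.ratio(text, seen) <= threshold for seen in kept):
            kept.append(text)
    return kept
```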
## Usage

Since this is a regression model, load it with `AutoModelForSequenceClassification` and read the raw logits directly as scores rather than applying a softmax:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "noamiman/roberta-finetuned-blog-analysis-10-labels"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "An incredible viewpoint on top of a mountain overlooking the beautiful remote Thung Yang beach. Perfect for sunsets."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = model(**inputs)

# For a regression head, the raw logits *are* the scores -- no softmax needed.
scores = outputs.logits[0].tolist()

labels = ["Romance", "Family", "Cost", "Nature", "Adventure", "Culture", "Food", "Relaxation", "Service", "Accessibility"]
results = dict(zip(labels, [round(s, 2) for s in scores]))
print(results)
```
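Downstream (e.g., for RouteMaker-style itinerary building), the score dictionary reduces naturally to tags. A hypothetical continuation of the example above, not part of the model's API:

```python
# Rank dimensions by score and keep the destination's strongest "vibes".
top_vibes = sorted(results, key=results.get, reverse=True)[:3]
print(top_vibes)  # e.g. ['Nature', 'Relaxation', 'Romance']
```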
## Base Model

Fine-tuned from [microsoft/deberta-v3-small](https://huggingface.co/microsoft/deberta-v3-small).