🌍 RoBERTa-Travel-Scorer (10-Dimensional Regression)

πŸ“Œ Overview

This model is a multi-output regression model fine-tuned from DeBERTa-v3-small. Unlike standard classifiers that assign a text to a single category, this model predicts 10 continuous scores (0-10) simultaneously. Each score represents the "intensity" or "vibe" of a travel destination across a different experiential dimension.

The model serves as the analytical engine for the RouteMaker project, transforming raw travel blog descriptions into structured, quantifiable data for intelligent itinerary building.

🧠 Model Architecture & Training

The Distillation Process

To achieve "large model" reasoning in a "small model" footprint (suitable for real-time edge processing on a MacBook Air), we employed Knowledge Distillation:

  1. Teacher Model: Llama 3 (3B) was used to label a custom dataset of thousands of travel blog posts (scraped from Bucketlistly, The Blonde Abroad, etc.).
  2. Student Model: This DeBERTa-based model was trained to minimize the distance (MSE Loss) between its predictions and the Llama-3 generated distributions.
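
The distillation objective reduces to a plain MSE regression between the student's 10 logits and the teacher's scores. A minimal sketch of one such loss computation (the tensor values here are made up for illustration; the actual training script is not shown in this card):

```python
import torch
import torch.nn as nn

mse = nn.MSELoss()

# Hypothetical teacher (Llama-3) scores for a batch of 4 snippets, 10 dimensions each.
teacher_scores = torch.tensor([[7.5, 3.0, 6.0, 9.0, 8.0, 4.0, 5.0, 7.0, 6.0, 3.0]] * 4)

# Stand-in for the student's logits (in training these come from the DeBERTa head).
student_logits = teacher_scores + 0.5  # pretend the student is off by 0.5 everywhere

loss = mse(student_logits, teacher_scores)  # mean squared error over all 40 values
print(loss.item())  # constant 0.5 error -> MSE of 0.25
```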

The 10 Dimensions (Outputs)

The model outputs a numerical value for each of the following:

  • Romance: Suitability for couples/honeymoons.
  • Family: Child-friendly infrastructure and activities.
  • Cost: Budget (0) to Luxury (10) scale.
  • Nature: Focus on landscapes, wildlife, and outdoors.
  • Adventure: Adrenaline, physical challenge, and exploration.
  • Culture: Historical depth, art, and local traditions.
  • Food: Culinary quality and diversity.
  • Relaxation: Wellness, slow pace, and chill atmosphere.
  • Service: Quality of hospitality and amenities.
  • Accessibility: Ease of transport and navigation.
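
One way downstream code can consume a 10-dimensional score vector (an illustrative sketch, not part of the model itself) is matching a destination's scores against a user preference vector, e.g. via cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length score vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical scores in the dimension order listed above.
destination = [9.0, 2.0, 7.0, 9.5, 5.0, 3.0, 6.0, 8.0, 7.0, 4.0]   # romantic nature escape
preferences = [10.0, 0.0, 5.0, 10.0, 4.0, 2.0, 5.0, 9.0, 6.0, 3.0]  # honeymoon seeker

sim = cosine_similarity(destination, preferences)
print(round(sim, 3))  # close to 1.0 -> strong match
```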

πŸ“Š Performance & Data

  • Dataset: Custom-scraped travel blog data with heavy deduplication (Fuzzy Matching > 85% similarity).
  • Input Length: Optimized for 128 tokens (ideal for blog snippets and reviews).
  • Training Objective: Multi-target Mean Absolute Error (MAE).
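
The >85%-similarity deduplication can be sketched with the standard library's `difflib` (the actual pipeline may use a dedicated fuzzy-matching library; this is an illustrative assumption):

```python
from difflib import SequenceMatcher

def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two snippets as duplicates if their similarity ratio exceeds the threshold."""
    return SequenceMatcher(None, a, b).ratio() > threshold

def deduplicate(snippets: list[str]) -> list[str]:
    """Greedy O(n^2) dedup: keep a snippet only if it isn't a near-copy of one already kept."""
    kept: list[str] = []
    for s in snippets:
        if not any(is_duplicate(s, k) for k in kept):
            kept.append(s)
    return kept

docs = [
    "An incredible viewpoint overlooking the beach.",
    "An incredible viewpoint overlooking the beach!",  # near-duplicate, dropped
    "A bustling night market full of street food.",
]
print(deduplicate(docs))  # keeps 2 of the 3 snippets
```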

πŸš€ Usage

Since this is a regression model, load it with `AutoModelForSequenceClassification` and treat the raw logits as scores: do not apply softmax or argmax.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "noamiman/roberta-finetuned-blog-analysis-10-labels"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "An incredible viewpoint on top of a mountain overlooking the beautiful remote Thung Yang beach. Perfect for sunsets."

# Match the 128-token window the model was trained on.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
with torch.no_grad():
    outputs = model(**inputs)
    # For a regression head, the logits ARE the scores; no softmax/argmax.
    scores = outputs.logits[0].tolist()

labels = ["Romance", "Family", "Cost", "Nature", "Adventure", "Culture", "Food", "Relaxation", "Service", "Accessibility"]
results = dict(zip(labels, [round(s, 2) for s in scores]))
print(results)
```
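
For itinerary building, ranking several already-scored destinations reduces to simple tensor operations. A sketch with hypothetical score vectors (made-up numbers, not real model output), using the same dimension order as the `labels` list above:

```python
import torch

labels = ["Romance", "Family", "Cost", "Nature", "Adventure",
          "Culture", "Food", "Relaxation", "Service", "Accessibility"]

# Hypothetical logits for three destinations (rows), 10 dimensions (columns).
scores = torch.tensor([
    [8.1, 2.0, 7.5, 9.0, 4.2, 3.1, 6.0, 8.8, 7.0, 4.5],  # remote beach
    [3.0, 8.5, 4.0, 5.5, 6.0, 7.2, 7.8, 4.0, 6.5, 8.0],  # city with kids' museums
    [5.0, 4.0, 3.0, 8.5, 9.1, 4.0, 5.5, 3.0, 4.0, 3.5],  # jungle trek
])

# Pick the best destination for a given dimension.
adventure_idx = labels.index("Adventure")
best_row = scores[:, adventure_idx].argmax().item()
print(best_row)  # 2 -> the jungle trek scores highest on Adventure
```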