Ubuntu committed on
Commit
a8fde3d
·
1 Parent(s): 712626c

added new data files

README.md CHANGED
@@ -1,188 +1,124 @@
1
  ---
2
- title: Swiper Match
3
- emoji: 🌍
4
- colorFrom: purple
5
- colorTo: green
6
- sdk: gradio
7
- sdk_version: 5.31.0
8
- app_file: app.py
9
- pinned: false
10
- short_description: Match cars B2B
11
- models:
12
- - mzx/Swiper-Match
13
  ---
14
 
15
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
16
 
17
- # 🚗 Swiper-Match: HuggingFace-Enabled Car Dealer Predictor
18
 
19
- This Gradio app now uses pretrained models from HuggingFace Hub instead of local models, with automatic fallback support.
20
 
21
- ## Features
22
-
23
- - **Smart Dealer Matching**: Uses trained AutoGluon ML models to predict the best dealers for your car preferences
24
- - **Interactive Interface**: Easy-to-use web interface with dropdown selections and number inputs
25
- - **Top-K Predictions**: Get the top 5 recommended dealers with confidence scores
26
- - **Real-time Predictions**: Instant results as you adjust your car specifications
27
- - **No Data Leakage**: Models trained with carefully selected features to avoid bias
28
-
29
- ## How It Works
30
-
31
- The app uses a trained AutoGluon TabularPredictor model that:
32
-
33
- 1. **Takes car specifications** as input (make, model, year, body type, fuel type, etc.)
34
- 2. **Predicts dealer preferences** based on historical car sales and inventory data
35
- 3. **Returns ranked dealers** with confidence scores for each recommendation
36
 
37
  ## 🚀 Quick Start
38
 
39
- ### 1. Install Dependencies
40
-
41
- ```bash
42
- pip install -r requirements.txt
43
- ```
44
-
45
- ### 2. Run the App
46
-
47
- ```bash
48
- python app.py
49
- ```
50
-
51
- The app will automatically:
52
- ✅ Try to load the model from HuggingFace Hub (`mzx/Swiper-Match`)
53
- 🔄 Fall back to local AutoGluon models if HF is unavailable
54
- 📊 Display which model source is being used in the interface
55
-
56
- ## 🤗 HuggingFace Integration
57
-
58
- ### Model Loading Priority:
59
-
60
- 1. **Primary**: HuggingFace Hub model (`mzx/Swiper-Match`)
61
- - Uses custom `AutoGluonSwiperModel` wrapper
62
- - Enhanced feature compatibility
63
- - No local files required
64
-
65
- 2. **Fallback**: Local AutoGluon models
66
- - Searches in `../../../src/experiments/autogluon/models_swiper_hf/`
67
- - Original AutoGluon TabularPredictor interface
68
-
69
- ### Model Features:
70
-
71
- - **No Data Leakage**: Excludes dealer-identifying features
72
- - **GPU Optimized**: Trained with CUDA acceleration
73
- - **Ensemble Methods**: XGBoost, Neural Networks, Random Forest
74
- - **Auto-Stacking**: Combines best models automatically
75
-
76
- ## 📊 Model Information
77
-
78
- The app displays real-time information about:
79
- - ✅ Model loading status
80
- - 🔧 Model type (HuggingFace Hub vs Local)
81
- - 📁 Repository/file location
82
- - 👥 Number of trained dealers
83
- - 🎯 Feature engineering approach
84
-
85
- ## 🔧 Troubleshooting
86
-
87
- ### If HuggingFace loading fails:
88
- - Check internet connection
89
- - Verify `transformers` and `huggingface-hub` are installed
90
- - The app will automatically fall back to local models
91
-
92
- ### If both models fail:
93
- - Ensure AutoGluon is installed: `pip install autogluon.tabular`
94
- - Check that local model files exist in the expected directory
95
- - Review the console logs for detailed error messages
96
-
97
- ## 🚀 Training Your Own Models
98
-
99
- To upload new models to HuggingFace Hub:
100
-
101
- 1. Run the training script:
102
- ```bash
103
- cd ../../../src/experiments/autogluon/
104
- python buyer_prediction_v4_hf.py
105
- ```
106
-
107
- 2. Set your HuggingFace token:
108
- ```bash
109
- export HF_TOKEN="your_token_here"
110
- # or add HF_TOKEN=your_token_here to .env file
111
- ```
112
-
113
- 3. The script will automatically upload to `mzx/Swiper-Match`
114
-
115
- ## 📝 Example Usage
116
-
117
  ```python
118
- # The app automatically handles model loading and prediction
119
- # Just use the Gradio interface or call directly:
120
-
121
- matcher = CarDealerMatcher(hf_repo="mzx/Swiper-Match", use_hf_hub=True)
122
- results = matcher.predict_dealers(
123
- make="Toyota", model="Camry", year=2020,
124
- body_type="Sedan", fuel_type="Petrol", transmission="Automatic",
125
- price=25000, odometer=50000, doors=4, seats=5
126
- )
127
  ```
128
 
129
- ## 🎯 Features
130
 
131
- - **Smart Loading**: Automatic HuggingFace Hub integration
132
- - **Graceful Fallback**: Local model support when HF is unavailable
133
- - **Real-time Status**: Live model information display
134
- - **Enhanced Compatibility**: Comprehensive feature mapping
135
- - **Error Handling**: Clear error messages and recovery
 
136
 
137
- ## 📚 Dependencies
 
 
 
 
138
 
139
- - `gradio>=4.0.0` - Web interface
140
- - `transformers>=4.30.0` - HuggingFace model support
141
- - `huggingface-hub>=0.16.0` - Model downloading
142
- - `autogluon.tabular>=1.0.0` - Local model fallback
143
- - `torch>=2.0.0` - Neural network support
144
- - `pandas`, `numpy`, `scikit-learn` - Data processing
145
 
146
- ## Example Use Cases
 
 
 
 
147
 
148
- - **Car Buyers**: Find dealers most likely to have your dream car
149
- - **Market Research**: Understand dealer-vehicle relationships
150
- - **Inventory Planning**: Predict which dealers to approach for specific vehicles
151
 
152
- ## Technical Details
 
 
 
153
 
154
- The model uses these key features for prediction:
155
- - Vehicle specifications (make, model, year, body type)
156
- - Technical details (fuel type, transmission, engine specs)
157
- - Market factors (price range, mileage, physical attributes)
158
 
159
- **Note**: The model deliberately excludes dealer-identifying features to prevent data leakage and ensure fair predictions based on vehicle characteristics alone.
 
 
 
160
 
161
- ## Development
162
 
163
- ### Project Structure
164
- ```
165
- swiper-match/
166
- ├── src/experiments/autogluon/           # Model training code
166
- ├── huggingface-frontend/Swiper-match/   # Gradio app
167
- │   ├── app.py                           # Main Gradio application
168
- │   ├── requirements.txt                 # Python dependencies
169
- │   └── README.md                        # This file
170
- └── data/                                # Training data
172
- ```
173
 
174
- ### Extending the App
175
 
176
- To add new features:
177
 
178
- 1. **Modify the input form** in `app.py` by adding new Gradio components
179
- 2. **Update the prediction function** to handle new inputs
180
- 3. **Enhance the model** by retraining with additional features
181
- 4. **Improve the UI** by customizing the Gradio Blocks interface
182
 
183
- ## Support
 
 
 
 
184
 
185
- For issues or questions:
186
- 1. Check the model loading logs for detailed error messages
187
- 2. Verify all dependencies are correctly installed
188
- 3. Ensure the trained model files exist in the expected location
 
1
  ---
2
+ license: mit
3
+ tags:
4
+ - autogluon
5
+ - tabular
6
+ - automotive
7
+ - dealer-prediction
8
+ - swiper-match
9
+ - no-data-leakage
10
+ - gpu-optimized
11
+ language:
12
+ - en
13
+ datasets:
14
+ - custom
15
+ metrics:
16
+ - accuracy
17
+ - top-k-accuracy
18
+ library_name: autogluon
19
  ---
20
 
21
+ # 🚗 Swiper-Match: Car Dealer Prediction Model
22
 
23
+ This model predicts which car dealer is most likely to have a specific vehicle based **solely on vehicle characteristics**, ensuring no data leakage from dealer-identifying features.
24
 
25
+ ## 🎯 Model Details
26
 
27
+ - **Framework**: AutoGluon Tabular v1.3+
28
+ - **Training**: GPU-accelerated ensemble with early stopping
29
+ - **Dealers**: 73 different car dealers
30
+ - **Features**: Vehicle characteristics only (no dealer-identifying info)
31
+ - **No Leakage**: Strict exclusion of 29 dealer-identifying features
32
 
33
  ## 🚀 Quick Start
34
35
  ```python
36
+ from transformers import AutoModel
37
+ import pandas as pd
38
+
39
+ # Load model
40
+ model = AutoModel.from_pretrained("mzx/Swiper-Match", trust_remote_code=True)
41
+
42
+ # Prepare input
43
+ vehicle_data = pd.DataFrame({
44
+ 'make': ['Toyota'],
45
+ 'model': ['Camry'],
46
+ 'year': [2020],
47
+ 'vehicle_type': ['Passenger'],
48
+ 'odometer': [50000],
49
+ 'condition': ['Used'],
50
+ 'car_age': [4]
51
+ })
52
+
53
+ # Get top-5 predictions
54
+ results = model.predict_top_k(vehicle_data, k=5)
55
+ print(f"Most likely dealer: {results['top_prediction']}")
56
+ print(f"Confidence: {results['top_confidence']:.2%}")
57
+ print(f"Top 5: {results['top_k_dict']}")
58
  ```
59
 
60
+ ## 📊 Features Used
61
 
62
+ **Vehicle Characteristics**:
63
+ - Make, Model, Year, Variant, Series
64
+ - Body Type, Vehicle Type, Drive Type
65
+ - Engine specs (power, size, cylinders, fuel type)
66
+ - Transmission, Seats, Doors
67
+ - Condition, Odometer reading
68
 
69
+ **Excluded (No Leakage)**:
70
+ - Dealer names, IDs, locations
71
+ - Geographic information
72
+ - Dealer-specific business features
73
+ - URLs and source identifiers
74
 
75
+ ## 🔬 Methodology
 
 
 
 
 
76
 
77
+ 1. **Data Preprocessing**: Removed all 29 dealer-identifying features
78
+ 2. **Balanced Training**: Oversampling ensures all dealers are represented
79
+ 3. **GPU Training**: CUDA-accelerated with ensemble methods
80
+ 4. **Early Stopping**: Prevents overfitting, optimizes training time
81
+ 5. **Auto-Stacking**: AutoGluon combines best models automatically
82
 
83
+ ## 📈 Performance
 
 
84
 
85
+ - **Training Data**: 34a7fad7... hash
86
+ - **GPU Enabled**: True
87
+ - **Models**: Ensemble of XGBoost, Neural Networks, CatBoost, Random Forest
88
+ - **Accuracy**: Top-1 and Top-5 accuracy on vehicle-dealer matching
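Top-k accuracy counts a prediction as correct when the true dealer appears among the k highest-scoring candidates. A minimal sketch of that metric in plain Python, assuming each prediction is a dict of per-dealer scores; `top_k_accuracy` and the dealer names are illustrative, and this is not the model's actual evaluation code:

```python
def top_k_accuracy(probabilities, true_labels, k=5):
    """Fraction of rows whose true label is among the k highest-scoring labels."""
    if not probabilities:
        raise ValueError("no predictions to score")
    hits = 0
    for scores, truth in zip(probabilities, true_labels):
        # Rank candidate labels by descending score and keep the first k.
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += truth in top_k
    return hits / len(probabilities)


# Hypothetical per-dealer scores for three vehicles.
rows = [
    {"dealer_a": 0.6, "dealer_b": 0.3, "dealer_c": 0.1},
    {"dealer_a": 0.2, "dealer_b": 0.5, "dealer_c": 0.3},
    {"dealer_a": 0.1, "dealer_b": 0.2, "dealer_c": 0.7},
]
truth = ["dealer_a", "dealer_c", "dealer_a"]

top1 = top_k_accuracy(rows, truth, k=1)
top2 = top_k_accuracy(rows, truth, k=2)
```

With 73 dealers, Top-5 is the more forgiving figure, since a vehicle that several dealers plausibly stock can still count as a hit.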
89
 
90
+ ## ⚠️ Limitations
 
 
 
91
 
92
+ - Predictions based on historical patterns in training data
93
+ - Performance depends on similarity to training distribution
94
+ - May not generalize to dealers not seen during training
95
+ - Results are for research/demonstration purposes
96
 
97
+ ## 🔧 Technical Implementation
98
 
99
+ - **AutoGluon Backend**: High-performance ensemble learning
100
+ - **HuggingFace Wrapper**: Seamless integration with HF ecosystem
101
+ - **GPU Optimization**: CUDA acceleration for training and inference
102
+ - **Smart Caching**: Efficient model storage and loading
 
 
 
 
 
 
103
 
104
+ ## 📝 Citation
105
 
106
+ ```bibtex
107
+ @misc{swiper-match-2024,
108
+ title={Swiper-Match: GPU-Optimized Car Dealer Prediction},
109
+ author={Swiper-Match Team},
110
+ year={2024},
111
+ publisher={HuggingFace},
112
+ url={https://huggingface.co/mzx/Swiper-Match}
113
+ }
114
+ ```
115
 
116
+ ## 🀝 Usage Guidelines
 
 
 
117
 
118
+ This model is designed for:
119
+ - Research and educational purposes
120
+ - Automotive market analysis
121
+ - Dealer recommendation systems
122
+ - Machine learning demonstrations
123
 
124
+ **Please ensure compliance with applicable data privacy and usage regulations.**
 
 
 
app_new.py CHANGED
@@ -49,10 +49,12 @@ def create_app():
49
 
50
  # Create tabs
51
  with gr.Tabs():
 
 
52
  simple_tab = create_simple_tab(simple_matcher)
53
  detailed_tab = create_detailed_tab(matcher)
54
- traditional_tab = create_traditional_tab(matcher)
55
- simple_search_tab = create_simple_search_tab(matcher)
56
 
57
  # Model Information Footer
58
  gr.Markdown("---")
 
49
 
50
  # Create tabs
51
  with gr.Tabs():
52
+ simple_search_tab = create_simple_search_tab(matcher)
53
+ traditional_tab = create_traditional_tab(matcher)
54
  simple_tab = create_simple_tab(simple_matcher)
55
  detailed_tab = create_detailed_tab(matcher)
56
+
57
+
58
 
59
  # Model Information Footer
60
  gr.Markdown("---")
core/matcher.py CHANGED
@@ -274,11 +274,18 @@ class CarDealerMatcher:
274
  return '4 Door'
275
 
276
  def load_data_files(self):
277
- """Load available CSV data files"""
278
  try:
279
  if os.path.exists(DATA_DIR):
280
  csv_files = glob.glob(os.path.join(DATA_DIR, "*.csv"))
281
- self.data_files = [os.path.basename(f) for f in csv_files]
 
 
 
 
 
 
 
282
  logger.info(f"✅ Found {len(self.data_files)} CSV files: {self.data_files}")
283
  else:
284
  self.data_files = []
 
274
  return '4 Door'
275
 
276
  def load_data_files(self):
277
+ """Load available CSV data files with combined data first"""
278
  try:
279
  if os.path.exists(DATA_DIR):
280
  csv_files = glob.glob(os.path.join(DATA_DIR, "*.csv"))
281
+ file_names = [os.path.basename(f) for f in csv_files]
282
+
283
+ # Sort files to ensure Combined_Car_Listings.csv appears first
284
+ combined_files = [f for f in file_names if 'Combined' in f or 'combined' in f]
285
+ other_files = [f for f in file_names if 'Combined' not in f and 'combined' not in f]
286
+
287
+ # Put combined files first, then sort the rest alphabetically
288
+ self.data_files = sorted(combined_files) + sorted(other_files)
289
  logger.info(f"✅ Found {len(self.data_files)} CSV files: {self.data_files}")
290
  else:
291
  self.data_files = []
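The new ordering in `load_data_files` can be exercised in isolation. A minimal sketch, assuming plain file names as input; `order_data_files` and every sample name other than `Combined_Car_Listings.csv` are hypothetical and not part of `core/matcher.py`:

```python
def order_data_files(file_names):
    """Return file names with 'Combined'/'combined' files first, each group sorted."""
    # Same substring checks as the hunk above: only these two spellings match.
    combined = [f for f in file_names if "Combined" in f or "combined" in f]
    others = [f for f in file_names if "Combined" not in f and "combined" not in f]
    return sorted(combined) + sorted(others)


# Hypothetical directory listing, already reduced to base names.
names = [
    "dealers_b.csv",
    "Combined_Car_Listings.csv",
    "dealers_a.csv",
    "COMBINED_old.csv",
]
ordered = order_data_files(names)
```

Because the check is case-sensitive for just two spellings, a name such as `COMBINED_old.csv` falls into the alphabetical group. Testing `"combined" in f.lower()` instead would catch any casing, if that is the intent.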
ui/tabs/simple_search_tab.py CHANGED
@@ -9,7 +9,7 @@ from core.config import CAR_MAKES, MAKE_MODEL_DATA, DEFAULT_VALUES
9
  def create_simple_search_tab(matcher):
10
  """Create the simple traditional search tab"""
11
 
12
- with gr.Tab("🔍 Traditional Simple Search"):
13
  gr.Markdown("### Quick CSV Data Search")
14
  gr.Markdown("Simple search through car listing CSV files with basic filters")
15
 
 
9
  def create_simple_search_tab(matcher):
10
  """Create the simple traditional search tab"""
11
 
12
+ with gr.Tab("1️⃣ SQL Search Simple"):
13
  gr.Markdown("### Quick CSV Data Search")
14
  gr.Markdown("Simple search through car listing CSV files with basic filters")
15
 
ui/tabs/traditional_tab.py CHANGED
@@ -9,7 +9,7 @@ from core.config import DEFAULT_VALUES
9
  def create_traditional_tab(matcher):
10
  """Create the traditional search tab"""
11
 
12
- with gr.Tab("📊 Traditional Search"):
13
  gr.Markdown("### CSV Data File Search")
14
  gr.Markdown("Search through car listing CSV files and rank dealers by inventory size")
15
 
 
9
  def create_traditional_tab(matcher):
10
  """Create the traditional search tab"""
11
 
12
+ with gr.Tab("2️⃣ SQL Search Detailed"):
13
  gr.Markdown("### CSV Data File Search")
14
  gr.Markdown("Search through car listing CSV files and rank dealers by inventory size")
15