# Fetii AI Assistant

A Streamlit-based analytics dashboard and conversational AI system for analyzing Austin rideshare patterns and trip data.

## Overview

Fetii AI Assistant combines data processing, interactive visualizations, and natural language query processing to provide insights into Austin rideshare operations. The system processes trip data to identify patterns, peak hours, popular locations, and group size distributions, and offers a chat interface for exploring the data.

## Architecture

```mermaid
graph TB
    A[User Interface] --> B[Streamlit Frontend]
    B --> C[Main Application]
    C --> D[Data Processor]
    C --> E[Chatbot Engine]
    C --> F[Visualizations Module]

    D --> G[CSV Data Source]
    D --> H[Sample Data Generator]

    E --> I[Query Parser]
    E --> J[Response Generator]
    E --> K[Location Matcher]

    F --> L[Plotly Charts]
    F --> M[D3.js Network Viz]
    F --> N[Interactive Heatmaps]

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style C fill:#fff3e0
    style D fill:#e8f5e8
    style E fill:#fce4ec
    style F fill:#f1f8e9
```

## System Components

### Core Modules

```mermaid
classDiagram
    class DataProcessor {
        +load_and_process_data()
        +get_quick_insights()
        +get_location_stats()
        +get_time_patterns()
        +query_data()
        -_clean_data()
        -_extract_temporal_features()
        -_extract_location_features()
    }

    class FetiiChatbot {
        +process_query()
        +get_conversation_history()
        +clear_history()
        -_parse_query()
        -_generate_response()
        -_fuzzy_search_location()
    }

    class Visualizations {
        +create_visualizations()
        +create_hourly_chart()
        +create_group_size_chart()
        +create_time_heatmap()
        +create_distance_analysis()
    }

    DataProcessor --> FetiiChatbot : uses
    DataProcessor --> Visualizations : feeds data
    FetiiChatbot --> Visualizations : requests charts
```

## Data Flow

```mermaid
sequenceDiagram
    participant U as User
    participant S as Streamlit UI
    participant C as Chatbot
    participant D as Data Processor
    participant V as Visualizations

    U->>S: Asks question about rideshare data
    S->>C: Forward user query
    C->>C: Parse query intent and parameters
    C->>D: Request relevant data analysis
    D->>D: Process data and calculate insights
    D-->>C: Return analysis results
    C->>C: Generate natural language response
    C-->>S: Return formatted response
    S->>V: Request updated visualizations
    V->>D: Get processed data
    D-->>V: Return visualization data
    V-->>S: Return interactive charts
    S-->>U: Display response and updated charts
```

## Features

### 1. Data Processing Engine
- **CSV Data Loading**: Robust parsing of rideshare trip data
- **Data Cleaning**: Handles missing values, invalid entries, and data standardization
- **Feature Engineering**: Extracts temporal patterns, location categories, and group classifications
- **Real-time Analytics**: Calculates insights on demand for a responsive user experience

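As a rough illustration of the cleaning step, the sketch below skips rows with missing or invalid values and standardizes address text. It uses only the standard library rather than the project's pandas code. The column names come from the Data Schema section; `clean_rows` and the `max_passengers` threshold are assumptions made for this example.

```python
from typing import Iterable


def clean_rows(rows: Iterable[dict], max_passengers: int = 20) -> list[dict]:
    """Return cleaned copies of the rows that pass basic validity checks.

    max_passengers is an illustrative threshold, not a value from the project.
    """
    cleaned = []
    for row in rows:
        address = str(row.get("Pick Up Address") or "").strip()
        raw_count = str(row.get("Total Passengers") or "").strip()
        if not address or not raw_count:
            continue  # missing value: skip the row
        try:
            passengers = int(raw_count)
        except ValueError:
            continue  # invalid entry such as "n/a"
        if not 1 <= passengers <= max_passengers:
            continue  # out-of-range group size
        cleaned.append({
            **row,
            # Standardize: collapse repeated whitespace in the address.
            "Pick Up Address": " ".join(address.split()),
            "Total Passengers": passengers,
        })
    return cleaned
```

Rows that fail a check are dropped rather than repaired, which keeps the sketch short; imputing values would be an equally reasonable policy.
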
### 2. Conversational AI Interface
- **Natural Language Processing**: Understands complex queries about locations, times, and patterns
- **Context-Aware Responses**: Maintains conversation history and provides relevant follow-up suggestions
- **Fuzzy Matching**: Intelligent location search with partial name matching
- **Query Intent Recognition**: Identifies whether users want statistics, comparisons, or general information

### 3. Interactive Visualizations
- **Peak Hour Analysis**: Dynamic bar charts showing trip distribution across hours
- **Group Size Patterns**: Pie charts and breakdowns of passenger group sizes
- **Location Popularity**: Horizontal bar charts of top pickup and dropoff spots
- **Time Heatmaps**: Day-hour heatmaps revealing temporal patterns
- **Network Diagrams**: D3.js-powered flow visualizations showing location connections

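The peak-hour bar chart and the day-hour heatmap both reduce to counting trips per bucket. Below is a minimal sketch of that aggregation, assuming trip timestamps are already parsed into `datetime` objects. `hourly_counts` and `day_hour_matrix` are hypothetical helper names, not functions from the project.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps: list[datetime]) -> list[int]:
    """Number of trips starting in each hour of the day (index 0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


def day_hour_matrix(timestamps: list[datetime]) -> list[list[int]]:
    """7 x 24 grid of trip counts; row 0 is Monday, column 0 is midnight."""
    matrix = [[0] * 24 for _ in range(7)]
    for ts in timestamps:
        matrix[ts.weekday()][ts.hour] += 1
    return matrix
```

The 24-element list maps onto the bars of an hourly chart, and the 7 x 24 grid onto the cells of a heatmap.
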
### 4. Modern UI/UX Design
- **Clean Interface**: Professional design with Inter font family and optimized spacing
- **Responsive Layout**: Adapts to different screen sizes and devices
- **Real-time Updates**: Live data refresh and interactive chart updates
- **Accessibility**: High contrast ratios and semantic markup for screen readers

## Query Types Supported

The chatbot recognizes and responds to several query patterns:

```mermaid
mindmap
  root((Query Types))
    Location Stats
      Specific venue analysis
      Pickup vs dropoff comparison
      Popular destination ranking
    Time Patterns
      Peak hours identification
      Day-of-week trends
      Seasonal variations
    Group Analysis
      Size distribution
      Large group behavior
      Average party metrics
    General Insights
      Trip summaries
      Overall statistics
      Data overview
```

## Technical Implementation

### Query Processing Pipeline

```mermaid
flowchart LR
    A[User Input] --> B[Text Preprocessing]
    B --> C[Pattern Matching]
    C --> D[Parameter Extraction]
    D --> E[Intent Classification]
    E --> F[Data Query]
    F --> G[Response Generation]
    G --> H[Format Output]
    H --> I[Display Result]

    style A fill:#bbdefb
    style E fill:#c8e6c9
    style G fill:#ffcdd2
    style I fill:#f8bbd9
```

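The first stages of the pipeline above (preprocessing, pattern matching, parameter extraction, intent classification) might look roughly like the following regex-based sketch. The patterns, the intent names, and the `parse_query` function are all hypothetical stand-ins; the project's actual patterns are described as living in `config.py`.

```python
import re

# Hypothetical patterns for illustration only; checked in this order.
INTENT_PATTERNS = {
    "time_patterns": re.compile(r"\b(peak|busiest|hours?|when|weekend|weekday)\b"),
    "group_analysis": re.compile(r"\b(groups?|passengers?|party|riders?)\b"),
    "location_stats": re.compile(r"\b(pick\s?ups?|drop\s?offs?|destinations?|where)\b"),
}
# Captures a number followed by a people-like noun, e.g. "6+ riders".
GROUP_SIZE = re.compile(r"\b(\d+)\s*\+?\s*(?:people|passengers|riders)\b")


def parse_query(text: str) -> dict:
    """Classify a question and pull out simple parameters."""
    # Text preprocessing: lowercase and collapse whitespace.
    normalized = " ".join(text.lower().split())

    # Pattern matching and intent classification: first match wins.
    intent = "general"
    for name, pattern in INTENT_PATTERNS.items():
        if pattern.search(normalized):
            intent = name
            break

    # Parameter extraction: an optional minimum group size.
    params = {}
    match = GROUP_SIZE.search(normalized)
    if match:
        params["min_group_size"] = int(match.group(1))
    return {"intent": intent, "params": params}
```

Because the first matching pattern wins, the order of `INTENT_PATTERNS` acts as a priority list; a scoring scheme would be an alternative when queries mix several intents.
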
### Data Processing Workflow

```mermaid
graph TD
    A[Raw CSV Data] --> B[Data Validation]
    B --> C[Missing Value Handling]
    C --> D[Feature Extraction]
    D --> E[Temporal Processing]
    D --> F[Location Processing]
    D --> G[Group Classification]
    E --> H[Time Categories]
    F --> I[Address Parsing]
    G --> J[Size Buckets]
    H --> K[Insights Cache]
    I --> K
    J --> K
    K --> L[API Endpoints]
```

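To make the "Time Categories" and "Size Buckets" nodes concrete, here is a small sketch of how a single trip could be turned into derived features. The hour boundaries, bucket limits, and function names are assumptions chosen for illustration; the README does not state the project's actual cut-offs.

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def time_category(ts: datetime) -> str:
    """Map an hour of day to a coarse label (boundaries are assumed)."""
    if 5 <= ts.hour < 12:
        return "morning"
    if 12 <= ts.hour < 17:
        return "afternoon"
    if 17 <= ts.hour < 22:
        return "evening"
    return "late_night"


def size_bucket(passengers: int) -> str:
    """Map a passenger count to a group-size label (limits are assumed)."""
    if passengers <= 4:
        return "small"
    if passengers <= 8:
        return "medium"
    return "large"


def extract_features(ts: datetime, passengers: int) -> dict:
    """Combine the temporal and group features for one trip."""
    return {
        "hour": ts.hour,
        "day_of_week": DAY_NAMES[ts.weekday()],
        "is_weekend": ts.weekday() >= 5,
        "time_category": time_category(ts),
        "size_bucket": size_bucket(passengers),
    }
```
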
## File Structure

```
fetii-ai/
├── main.py                 # Main Streamlit application
├── data_processor.py       # Core data processing logic
├── chatbot_engine.py       # Natural language processing
├── visualizations.py       # Chart generation and styling
├── config.py               # Configuration and constants
├── utils.py                # Utility functions
├── requirements.txt        # Python dependencies
└── README.md               # This documentation
```

## Key Technologies

- **Streamlit**: Web application framework for rapid prototyping
- **Plotly**: Interactive visualization library with modern styling
- **D3.js**: Advanced network and flow diagram generation
- **Pandas**: Data manipulation and analysis
- **NumPy**: Numerical computing for statistical operations
- **Regular Expressions**: Pattern matching for query parsing

## Installation & Setup

```bash
# Clone the repository
git clone <repository-url>
cd fetii-ai

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run main.py
```

## Configuration Options

The system provides extensive configuration through `config.py`:

- **Color Schemes**: Modern blue-based palette with accessibility considerations
- **Chart Settings**: Consistent styling across all visualizations
- **Query Patterns**: Customizable regex patterns for intent recognition
- **Data Thresholds**: Adjustable limits for analysis and filtering
- **UI Components**: Font families, spacing, and responsive breakpoints

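For orientation, a configuration module covering the categories above could be organized along these lines. Every constant name and value in this sketch is hypothetical (only the Inter font family is mentioned elsewhere in this README); consult the real `config.py` for the actual settings.

```python
# config.py (illustrative sketch; all names and values are hypothetical)
import re

# Color scheme: a blue-based palette.
COLORS = {"primary": "#1f77b4", "accent": "#aec7e8", "text": "#1a1a1a"}

# Chart settings shared by all visualizations.
CHART_SETTINGS = {"height": 400, "font_family": "Inter"}

# Query patterns: raw regex strings, keyed by intent name.
QUERY_PATTERNS = {
    "peak_hours": r"\b(peak|busiest)\b.*\bhours?\b",
}

# Data thresholds used for analysis and filtering.
DATA_THRESHOLDS = {"large_group_min": 6, "top_n_locations": 10}

# Compile the patterns once at import time so lookups stay cheap.
COMPILED_PATTERNS = {
    name: re.compile(pattern, re.IGNORECASE)
    for name, pattern in QUERY_PATTERNS.items()
}
```
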
## Data Schema

Expected CSV format:
```
Trip ID, Booking User ID, Pick Up Latitude, Pick Up Longitude,
Drop Off Latitude, Drop Off Longitude, Pick Up Address,
Drop Off Address, Trip Date and Time, Total Passengers
```

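A file with these columns can be read into typed records with the standard `csv` module, as in the sketch below. The timestamp format is an assumption, since the schema does not specify one, and `load_trips` and `haversine_km` are hypothetical helpers. The haversine function shows one way the coordinate columns could feed a distance analysis.

```python
import csv
import io
import math
from datetime import datetime

# Assumed timestamp layout; the README does not specify one.
TIMESTAMP_FORMAT = "%m/%d/%y %H:%M"


def load_trips(csv_text: str) -> list[dict]:
    """Parse CSV text with the documented header into typed records."""
    # skipinitialspace tolerates a header written as "Trip ID, Booking User ID".
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    trips = []
    for row in reader:
        trips.append({
            "trip_id": row["Trip ID"],
            "pickup": (float(row["Pick Up Latitude"]),
                       float(row["Pick Up Longitude"])),
            "dropoff": (float(row["Drop Off Latitude"]),
                        float(row["Drop Off Longitude"])),
            "pickup_address": row["Pick Up Address"],
            "dropoff_address": row["Drop Off Address"],
            "when": datetime.strptime(row["Trip Date and Time"], TIMESTAMP_FORMAT),
            "passengers": int(row["Total Passengers"]),
        })
    return trips


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two coordinates."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```
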
## Advanced Features

### Fuzzy Location Matching
The system implements intelligent location search that handles:
- Exact name matches
- Partial string matching
- Word-based similarity
- Common abbreviation recognition

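The four strategies listed above can be tried in order, from strictest to loosest. The sketch below is one possible arrangement built on the standard library's `difflib`; it is not the project's `_fuzzy_search_location()` implementation. The abbreviation table, thresholds, and function name are assumptions, and the final character-similarity fallback is an extra step added here to catch typos.

```python
import difflib
from typing import Optional

# Hypothetical abbreviation table for illustration.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "blvd": "boulevard"}


def _normalize(text: str) -> str:
    """Lowercase, drop periods, and expand known abbreviations."""
    words = text.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def fuzzy_search_location(query: str, locations: list[str],
                          cutoff: float = 0.6) -> Optional[str]:
    """Return the best-matching known location, or None if nothing is close."""
    q = _normalize(query)
    if not q:
        return None
    normalized = {_normalize(loc): loc for loc in locations}

    # 1. Exact match (after abbreviation expansion).
    if q in normalized:
        return normalized[q]

    # 2. Partial match: the query is a substring of a location name.
    #    The first location containing the query wins.
    for name, original in normalized.items():
        if q in name:
            return original

    # 3. Word-based similarity: Jaccard overlap of the word sets.
    q_words = set(q.split())
    best, best_score = None, 0.0
    for name, original in normalized.items():
        words = set(name.split())
        score = len(q_words & words) / len(q_words | words)
        if score > best_score:
            best, best_score = original, score
    if best_score >= 0.5:
        return best

    # 4. Character-level similarity as a last resort (catches typos).
    close = difflib.get_close_matches(q, list(normalized), n=1, cutoff=cutoff)
    return normalized[close[0]] if close else None
```
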
### Context-Aware Responses
Chatbot responses adapt based on:
- Previous conversation history
- Query complexity level
- Available data completeness
- User expertise inference

### Performance Optimizations
- Data caching for repeated queries
- Efficient pandas operations
- Lazy loading of visualizations
- Memory-conscious data processing

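Caching repeated queries can be as simple as memoizing the function that answers them. The sketch below uses `functools.lru_cache` over a small in-memory dataset; the `TRIPS` rows and the `trips_to` function are invented for the example and do not reflect how the project stores or queries its data.

```python
from functools import lru_cache

# Hypothetical in-memory rows: (drop-off name, hour of day, passengers).
TRIPS = (
    ("Example Venue A", 22, 8),
    ("Example Venue A", 23, 4),
    ("Example Venue B", 21, 10),
)


@lru_cache(maxsize=128)
def trips_to(location: str, min_group_size: int = 1) -> int:
    """Count trips to a location; repeated calls are served from the cache."""
    return sum(
        1
        for dropoff, _hour, passengers in TRIPS
        if dropoff == location and passengers >= min_group_size
    )
```

The arguments must be hashable for `lru_cache` to work, which is why the filter values are passed as plain strings and integers. In a Streamlit app, the `st.cache_data` decorator plays a similar role for data-loading functions.
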
## Future Enhancements

- Machine learning predictions for trip demand
- Real-time data streaming integration
- Advanced geographic clustering
- Multi-city dataset support
- Export capabilities for reports
- API endpoints for external integration

## Contributing

When contributing to this project:
1. Follow the established code structure and naming conventions
2. Update visualizations to maintain consistent styling
3. Test query patterns thoroughly with various input formats
4. Ensure responsive design principles are maintained
5. Document any new configuration options

## License

This project is designed for analytics and insights generation. Ensure compliance with data privacy regulations when processing real rideshare data.