Yash Sakhale committed on
Commit 329b91e · 0 Parent(s):

Initial commit: Python Dependency Compatibility Board with ML and LLM features

.gitignore ADDED
@@ -0,0 +1,45 @@
1
+ # Python
2
+ __pycache__/
3
+ *.py[cod]
4
+ *$py.class
5
+ *.so
6
+ .Python
7
+ *.egg-info/
8
+ dist/
9
+ build/
10
+
11
+ # Virtual environments
12
+ venv/
13
+ env/
14
+ ENV/
15
+
16
+ # IDE
17
+ .vscode/
18
+ .idea/
19
+ *.swp
20
+ *.swo
21
+
22
+ # OS
23
+ .DS_Store
24
+ Thumbs.db
25
+
26
+ # Training scripts and data (not needed for deployment)
27
+ train_conflict_model.py
28
+ generate_embeddings.py
29
+ Synthetic data.py
30
+ validation_tools.py
31
+ scripts/
32
+ synthetic_requirements_txt/
33
+ synthetic_requirements_dataset.json
34
+
35
+ # Problem3 folder (separate project)
36
+ problem3/
37
+
38
+ # Temporary files
39
+ *.tmp
40
+ *.log
41
+
42
+ # Model files (optional - include if you want to deploy with models)
43
+ # Uncomment the line below if you DON'T want to include trained models
44
+ # models/
45
+
ML_MODELS_README.md ADDED
@@ -0,0 +1,168 @@
1
+ # ML Models Integration Guide
2
+
3
+ This document explains how to train and use the ML models for conflict prediction and package similarity.
4
+
5
+ ## Overview
6
+
7
+ The project includes two ML models:
8
+
9
+ 1. **Conflict Prediction Model**: A Random Forest classifier that predicts whether a set of dependencies will have conflicts
10
+ 2. **Package Embeddings**: Pre-computed semantic embeddings for common Python packages, used for similarity matching
11
+
12
+ ## Training the Models
13
+
14
+ ### Step 1: Install Training Dependencies
15
+
16
+ ```bash
17
+ pip install scikit-learn sentence-transformers numpy
18
+ ```
19
+
20
+ ### Step 2: Train Conflict Prediction Model
21
+
22
+ ```bash
23
+ cd "code to upload"
24
+ python train_conflict_model.py
25
+ ```
26
+
27
+ This will:
28
+ - Load the synthetic dataset (`synthetic_requirements_dataset.json`)
29
+ - Extract features from requirements
30
+ - Train a Random Forest classifier
31
+ - Save the model to `models/conflict_predictor.pkl`
32
+ - Display accuracy and feature importance
33
+
34
+ **Expected Output:**
35
+ - Model size: ~2-5 MB
36
+ - Test accuracy: ~85-95% (depending on dataset)
37
+
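+ For reference, the training step boils down to roughly the following minimal sketch (not the full script; the dataset field names `requirements` and `has_conflict` are assumptions about the synthetic dataset layout):
+
+ ```python
+ import json
+ import pickle
+ from pathlib import Path
+
+ from sklearn.ensemble import RandomForestClassifier
+ from sklearn.model_selection import train_test_split
+
+ from ml_models import ConflictPredictor  # reuse the same feature extraction as inference
+
+ records = json.loads(Path("synthetic_requirements_dataset.json").read_text())
+ extractor = ConflictPredictor()  # the .pkl may not exist yet; only extract_features is used
+
+ X = [extractor.extract_features(rec["requirements"]) for rec in records]
+ y = [int(rec["has_conflict"]) for rec in records]
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
+ model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
+ print(f"Test accuracy: {model.score(X_test, y_test):.2%}")
+
+ Path("models").mkdir(exist_ok=True)
+ with open("models/conflict_predictor.pkl", "wb") as f:
+     pickle.dump(model, f)
+ ```
+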
38
+ ### Step 3: Generate Package Embeddings
39
+
40
+ ```bash
41
+ python generate_embeddings.py
42
+ ```
43
+
44
+ This will:
45
+ - Load a sentence transformer model
46
+ - Generate embeddings for common Python packages
47
+ - Save embeddings to `models/package_embeddings.json`
48
+ - Save model info to `models/embedding_info.json`
49
+
50
+ **Expected Output:**
51
+ - Embeddings file: ~5-10 MB
52
+ - Embedding dimension: 384
53
+ - Number of packages: ~100+
54
+
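+ Conceptually, the embedding step reduces to the sketch below (the package list shown is a small placeholder; the real script covers the full set):
+
+ ```python
+ import json
+ from pathlib import Path
+
+ from sentence_transformers import SentenceTransformer
+
+ packages = ["numpy", "pandas", "scipy", "torch", "fastapi", "pydantic"]  # placeholder subset
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # 384-dim vectors
+ vectors = model.encode(packages)
+
+ Path("models").mkdir(exist_ok=True)
+ Path("models/package_embeddings.json").write_text(
+     json.dumps({pkg: vec.tolist() for pkg, vec in zip(packages, vectors)})
+ )
+ Path("models/embedding_info.json").write_text(
+     json.dumps({"model": "sentence-transformers/all-MiniLM-L6-v2", "dimension": 384})
+ )
+ ```
+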
55
+ ## Model Files Structure
56
+
57
+ After training, you should have:
58
+
59
+ ```
60
+ code to upload/
61
+ ├── models/
62
+ │ ├── conflict_predictor.pkl # Classification model
63
+ │ ├── package_embeddings.json # Pre-computed embeddings
64
+ │ └── embedding_info.json # Model metadata
65
+ ```
66
+
67
+ ## Integration in Main App
68
+
69
+ The models are automatically loaded when available:
70
+
71
+ 1. **Conflict Prediction**: Runs before detailed analysis to provide early warnings
72
+ 2. **Package Similarity**: Enhances spell-checking with semantic matching
73
+
74
+ ### Features
75
+
76
+ - **Graceful Fallback**: If models aren't available, the app works with rule-based methods
77
+ - **Lazy Loading**: Models load only when needed
78
+ - **Error Handling**: ML failures don't break the app
79
+
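+ In `app.py` this fallback is a simple import guard:
+
+ ```python
+ try:
+     from ml_models import ConflictPredictor, PackageEmbeddings
+     ML_AVAILABLE = True
+ except ImportError:
+     ML_AVAILABLE = False  # rule-based checks and fuzzy matching are used instead
+ ```
+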
80
+ ## Usage in Code
81
+
82
+ ### Conflict Prediction
83
+
84
+ ```python
85
+ from ml_models import ConflictPredictor
86
+
87
+ predictor = ConflictPredictor()
88
+ has_conflict, confidence = predictor.predict(requirements_text)
89
+
90
+ if has_conflict:
91
+ print(f"Conflict predicted with {confidence:.1%} confidence")
92
+ ```
93
+
94
+ ### Package Similarity
95
+
96
+ ```python
97
+ from ml_models import PackageEmbeddings
98
+
99
+ embeddings = PackageEmbeddings()
100
+ similar = embeddings.find_similar("numpyy", top_k=3)
101
+ # Returns: [('numpy', 0.95), ('scipy', 0.72), ...]
102
+
103
+ best_match = embeddings.get_best_match("pandaz")
104
+ # Returns: 'pandas'
105
+ ```
106
+
107
+ ## Hugging Face Spaces Deployment
108
+
109
+ ### Option 1: Include Models in Repo
110
+
111
+ 1. Train models locally
112
+ 2. Commit model files to the repo
113
+ 3. Models load automatically on Spaces
114
+
115
+ **Pros**: Simple, no external dependencies
116
+ **Cons**: Larger repo size (~10-15 MB)
117
+
118
+ ### Option 2: Upload to Hugging Face Hub
119
+
120
+ 1. Train models locally
121
+ 2. Upload to Hugging Face Hub:
122
+ ```python
123
+ from huggingface_hub import upload_file
124
+ upload_file(path_or_fileobj="models/conflict_predictor.pkl", path_in_repo="conflict_predictor.pkl", repo_id="your-username/conflict-predictor")
125
+ ```
126
+ 3. Load from Hub in app:
127
+ ```python
128
+ from huggingface_hub import hf_hub_download
129
+ model_path = hf_hub_download(repo_id="your-username/conflict-predictor", filename="conflict_predictor.pkl")
130
+ ```
131
+
132
+ **Pros**: Smaller repo, version control for models
133
+ **Cons**: Requires internet connection at startup
134
+
135
+ ## Performance
136
+
137
+ - **Conflict Prediction**: <10ms per prediction
138
+ - **Embedding Lookup**: <1ms (pre-computed) or ~50ms (on-the-fly)
139
+ - **Model Loading**: ~1-2 seconds at startup
140
+
141
+ ## Troubleshooting
142
+
143
+ ### Models Not Loading
144
+
145
+ - Check that `models/` directory exists
146
+ - Verify model files are present
147
+ - Check file permissions
148
+
149
+ ### Low Prediction Accuracy
150
+
151
+ - Retrain with more data
152
+ - Adjust feature engineering
153
+ - Try different model parameters
154
+
155
+ ### Embeddings Not Working
156
+
157
+ - Ensure `sentence-transformers` is installed
158
+ - Check internet connection (for first-time model download)
159
+ - Verify embeddings file format
160
+
161
+ ## Future Improvements
162
+
163
+ - [ ] Train on larger, real-world dataset
164
+ - [ ] Add version-specific embeddings
165
+ - [ ] Implement online learning
166
+ - [ ] Add confidence intervals
167
+ - [ ] Support for custom model paths
168
+
README.md ADDED
@@ -0,0 +1,140 @@
1
+ ---
2
+ title: Python Dependency Compatibility Board
3
+ emoji: 🐍
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: gradio
7
+ sdk_version: 4.44.1
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
+ ---
12
+
13
+ # 🐍 Python Dependency Compatibility Board
14
+
15
+ A powerful tool to analyze and resolve Python package dependencies. Check for version conflicts, compatibility issues, and generate clean `requirements.txt` files.
16
+
17
+ ## ✨ Features
18
+
19
+ - **Multiple Input Methods**: Library list, requirements.txt paste, or file upload
20
+ - **Conflict Detection**: Automatically detects version conflicts and compatibility issues
21
+ - **🤖 AI-Powered Explanations**: Uses LLM to generate intelligent, natural language explanations for conflicts (with fallback to rule-based)
22
+ - **Dependency Resolution**: Uses pip's resolver to find compatible versions
23
+ - **Environment Aware**: Configure Python version, device (CPU/GPU), and OS
24
+ - **Analysis Modes**: Quick (top-level) or Deep (with transitive dependencies)
25
+ - **Resolution Strategies**: Latest compatible, stable/pinned, keep existing, or minimal changes
26
+ - **Spell Checking**: Auto-corrects common spelling mistakes in package names
27
+ - **Validation Utilities**: Benchmark against the bundled synthetic dataset and generate perturbed requirements for stress testing
28
+
29
+ ## 🚀 How to Use
30
+
31
+ ### Input Your Dependencies
32
+
33
+ You can provide dependencies in three ways:
34
+
35
+ 1. **Library List**: Enter package names one per line
36
+ ```
37
+ pandas
38
+ torch
39
+ langchain
40
+ fastapi
41
+ ```
42
+
43
+ 2. **Requirements Text**: Paste your existing requirements.txt
44
+ ```
45
+ pandas==2.0.3
46
+ torch>=2.0.0
47
+ langchain==0.1.0
48
+ ```
49
+
50
+ 3. **File Upload**: Upload a requirements.txt file directly
51
+
52
+ ### Configure Environment
53
+
54
+ - **Python Version**: Select your target Python version (3.8-3.12)
55
+ - **Device**: CPU only, NVIDIA GPU (CUDA), Apple Silicon (MPS), or Custom
56
+ - **Operating System**: Any, Linux, Windows, or macOS
57
+
58
+ ### Analysis & Resolution
59
+
60
+ 1. Choose **Analysis Mode**:
61
+ - **Quick**: Fast analysis of top-level dependencies
62
+ - **Deep**: Complete dependency tree with transitive dependencies
63
+
64
+ 2. Select **Resolution Strategy**:
65
+ - **latest_compatible**: Resolve to latest compatible versions
66
+ - **stable/pinned**: Prefer stable, pinned versions
67
+ - **keep_existing_pins**: Preserve your existing version pins
68
+ - **minimal_changes**: Make minimal changes to resolve conflicts
69
+
70
+ 3. Click **"Analyze & Resolve Dependencies"**
71
+
72
+ 4. Review the results and download your resolved `requirements.txt`
73
+
74
+ ## 🔍 What It Detects
75
+
76
+ The tool automatically detects:
77
+
78
+ - **Duplicate Packages**: Same package specified multiple times with conflicting versions
79
+ - **PyTorch Compatibility**: Ensures pytorch-lightning>=2.0 works with torch>=2.0
80
+ - **FastAPI/Pydantic**: Checks version compatibility (e.g., fastapi 0.78.x requires pydantic v1)
81
+ - **TensorFlow/Keras**: Validates TensorFlow/Keras version pairs
82
+ - **Version Conflicts**: Identifies incompatible version specifications
83
+
84
+ ## 🤖 AI Explanations
85
+
86
+ When enabled, the tool uses LLM reasoning to provide:
87
+ - **Clear Explanations**: Natural language descriptions of what the conflict is
88
+ - **Why It Happens**: Technical reasons behind the conflict
89
+ - **How to Fix**: Actionable solutions with specific version recommendations
90
+
91
+ The LLM explanations use Hugging Face Inference API (free tier) and automatically fall back to rule-based explanations if the API is unavailable.
92
+
93
+ ## 📋 Example
94
+
95
+ **Input:**
96
+ ```
97
+ torch==1.8.0
98
+ pytorch-lightning==2.2.0
99
+ pandas==2.0.3
100
+ ```
101
+
102
+ **Output:**
103
+ ```
104
+ ⚠️ Compatibility Issues Found:
105
+ - pytorch-lightning>=2.0 requires torch>=2.0, but torch<2.0 is specified
106
+
107
+ Resolved requirements.txt:
108
+ torch==2.1.0
109
+ pytorch-lightning==2.2.0
110
+ pandas==2.0.3
111
+ ...
112
+ ```
113
+
114
+ ## 🛠️ Technical Details
115
+
116
+ - Built with [Gradio](https://gradio.app/)
117
+ - Uses `packaging` library for version parsing
118
+ - Leverages pip's dependency resolver
119
+ - Supports PEP 508 requirement specifications
120
+
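+ Each input line is parsed as a PEP 508 requirement via `packaging`, for example:
+
+ ```python
+ from packaging.requirements import Requirement
+
+ req = Requirement('torch>=2.0.0; python_version >= "3.8"')
+ print(req.name, req.specifier, req.marker)  # torch >=2.0.0 python_version >= "3.8"
+ ```
+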
121
+ ## 📝 Notes
122
+
123
+ - Full dependency resolution requires pip >= 22.2
124
+ - Deep mode may take longer for large dependency sets
125
+ - The tool works best with packages available on PyPI
126
+ - Platform-specific dependencies (e.g., CUDA) are detected but resolution may vary
127
+ - Run `python validation_tools.py` to benchmark the built-in compatibility checks against synthetic cases.
128
+ - Use `python scripts/perturb_requirements.py --help` to generate noisy/invalid requirements for robustness testing.
129
+
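+ For reference, full resolution shells out to pip's JSON report mode (pip >= 22.2), roughly:
+
+ ```
+ pip install --dry-run --report - -r requirements.txt
+ ```
+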
130
+ ## 🤝 Contributing
131
+
132
+ Feel free to test the tool and report any issues! This tool is designed to help developers manage Python dependencies more effectively.
133
+
134
+ ## 📄 License
135
+
136
+ MIT License - feel free to use and modify as needed.
137
+
138
+ ---
139
+
140
+ **Made with ❤️ for the Python community**
app.py ADDED
@@ -0,0 +1,1030 @@
1
+ """
2
+ Python Dependency Compatibility Board
3
+ A tool to parse, analyze, and resolve Python package dependencies.
4
+ """
5
+
6
+ import re
7
+ import json
8
+ import tempfile
9
+ import subprocess
10
+ from pathlib import Path
11
+ from typing import List, Dict, Tuple, Optional, Set
12
+ from difflib import get_close_matches
13
+ import requests
14
+ from packaging.requirements import Requirement
15
+ from packaging.specifiers import SpecifierSet
16
+ from packaging.version import Version
17
+
18
+ # Import ML models (with graceful fallback)
19
+ try:
20
+ from ml_models import ConflictPredictor, PackageEmbeddings
21
+ ML_AVAILABLE = True
22
+ except ImportError:
23
+ ML_AVAILABLE = False
24
+ print("Warning: ML models not available. Some features will be disabled.")
25
+
26
+
27
+ class DependencyParser:
28
+ """Parse requirements.txt and library lists into structured dependencies."""
29
+
30
+ @staticmethod
31
+ def parse_requirements_text(text: str) -> List[Dict]:
32
+ """Parse requirements.txt content into structured format."""
33
+ dependencies = []
34
+ seen_packages = {}
35
+
36
+ for line in text.strip().split('\n'):
37
+ line = line.strip()
38
+ if not line or line.startswith('#'):
39
+ continue
40
+
41
+ # Remove comments
42
+ if '#' in line:
43
+ line = line[:line.index('#')].strip()
44
+
45
+ try:
46
+ req = Requirement(line)
47
+ package_name = req.name.lower()
48
+
49
+ # Handle duplicate packages
50
+ if package_name in seen_packages:
51
+ # Merge or warn about duplicates
52
+ existing = seen_packages[package_name]
53
+ if existing['specifier'] != str(req.specifier):
54
+ dependencies.append({
55
+ 'package': package_name,
56
+ 'specifier': str(req.specifier) if req.specifier else '',
57
+ 'extras': list(req.extras) if req.extras else [],
58
+ 'marker': str(req.marker) if req.marker else '',
59
+ 'original': line,
60
+ 'conflict': f"Duplicate: {existing['original']} vs {line}"
61
+ })
62
+ continue
63
+
64
+ dep = {
65
+ 'package': package_name,
66
+ 'specifier': str(req.specifier) if req.specifier else '',
67
+ 'extras': list(req.extras) if req.extras else [],
68
+ 'marker': str(req.marker) if req.marker else '',
69
+ 'original': line,
70
+ 'conflict': None
71
+ }
72
+ dependencies.append(dep)
73
+ seen_packages[package_name] = dep
74
+ except Exception as e:
75
+ # Handle malformed lines
76
+ dependencies.append({
77
+ 'package': line.split('==')[0].split('>=')[0].split('<=')[0].split('[')[0].strip(),
78
+ 'specifier': '',
79
+ 'extras': [],
80
+ 'marker': '',
81
+ 'original': line,
82
+ 'conflict': f"Parse error: {str(e)}"
83
+ })
84
+
85
+ return dependencies
86
+
87
+ @staticmethod
88
+ def parse_library_list(text: str) -> List[Dict]:
89
+ """Parse a simple list of library names."""
90
+ dependencies = []
91
+ for line in text.strip().split('\n'):
92
+ line = line.strip()
93
+ if not line or line.startswith('#'):
94
+ continue
95
+
96
+ # Extract package name (remove version specifiers if present)
97
+ package_name = re.split(r'[<>=!]', line)[0].strip()
98
+ package_name = re.split(r'\[', package_name)[0].strip()
99
+
100
+ if package_name:
101
+ dependencies.append({
102
+ 'package': package_name.lower(),
103
+ 'specifier': '',
104
+ 'extras': [],
105
+ 'marker': '',
106
+ 'original': package_name,
107
+ 'conflict': None
108
+ })
109
+
110
+ return dependencies
111
+
112
+
113
+ class DependencyResolver:
114
+ """Resolve dependencies and check compatibility."""
115
+
116
+ def __init__(self, python_version: str = "3.10", platform: str = "any", device: str = "cpu"):
117
+ self.python_version = python_version
118
+ self.platform = platform
119
+ self.device = device
120
+
121
+ def build_dependency_graph(self, dependencies: List[Dict], deep_mode: bool = False) -> Dict:
122
+ """Build dependency graph (simplified - in production would query PyPI)."""
123
+ graph = {
124
+ 'nodes': {},
125
+ 'edges': [],
126
+ 'conflicts': []
127
+ }
128
+
129
+ for dep in dependencies:
130
+ package = dep['package']
131
+ graph['nodes'][package] = {
132
+ 'specifier': dep['specifier'],
133
+ 'extras': dep['extras'],
134
+ 'marker': dep['marker'],
135
+ 'conflict': dep.get('conflict')
136
+ }
137
+
138
+ if dep.get('conflict'):
139
+ graph['conflicts'].append({
140
+ 'package': package,
141
+ 'reason': dep['conflict']
142
+ })
143
+
144
+ # In deep mode, would fetch transitive dependencies from PyPI
145
+ # For now, we'll use a simplified approach
146
+
147
+ return graph
148
+
149
+ def check_compatibility(self, graph: Dict) -> Tuple[bool, List[str]]:
150
+ """Check version compatibility across the graph."""
151
+ issues = []
152
+
153
+ # Check for duplicate package conflicts
154
+ for conflict in graph['conflicts']:
155
+ issues.append(f"Conflict in {conflict['package']}: {conflict['reason']}")
156
+
157
+ # Check known compatibility issues
158
+ nodes = graph['nodes']
159
+
160
+ # PyTorch Lightning + PyTorch compatibility
161
+ if 'pytorch-lightning' in nodes and 'torch' in nodes:
162
+ pl_spec = nodes['pytorch-lightning']['specifier']
163
+ torch_spec = nodes['torch']['specifier']
164
+
165
+ # Simplified check - in production would parse versions properly
166
+ if '==2.' in pl_spec or '>=2.' in pl_spec:
167
+ if '==1.' in torch_spec or '<2.' in torch_spec:
168
+ issues.append("pytorch-lightning>=2.0 requires torch>=2.0, but torch<2.0 is specified")
169
+
170
+ # FastAPI + Pydantic compatibility
171
+ if 'fastapi' in nodes and 'pydantic' in nodes:
172
+ fastapi_spec = nodes['fastapi']['specifier']
173
+ pydantic_spec = nodes['pydantic']['specifier']
174
+
175
+ if '==0.78' in fastapi_spec or '==0.7' in fastapi_spec:
176
+ if '==2.' in pydantic_spec or '>=2.' in pydantic_spec:
177
+ issues.append("fastapi==0.78.x requires pydantic v1, but pydantic v2 is specified")
178
+
179
+ # TensorFlow + Keras compatibility
180
+ if 'tensorflow' in nodes and 'keras' in nodes:
181
+ tf_spec = nodes['tensorflow']['specifier']
182
+ keras_spec = nodes['keras']['specifier']
183
+
184
+ if '==1.' in tf_spec:
185
+ if '==3.' in keras_spec or '>=3.' in keras_spec:
186
+ issues.append("keras>=3.0 requires TensorFlow 2.x, but TensorFlow 1.x is specified")
187
+
188
+ return len(issues) == 0, issues
189
+
190
+ def resolve_dependencies(
191
+ self,
192
+ dependencies: List[Dict],
193
+ strategy: str = "latest_compatible"
194
+ ) -> Tuple[str, List[str]]:
195
+ """Resolve dependencies using specified strategy."""
196
+ # Remove duplicates and conflicts
197
+ seen_packages = {}
198
+ clean_dependencies = []
199
+
200
+ for dep in dependencies:
201
+ if dep.get('conflict'):
202
+ continue
203
+
204
+ package = dep['package']
205
+ if package in seen_packages:
206
+ # Keep the one with more specific version if available
207
+ existing = seen_packages[package]
208
+ if dep['specifier'] and not existing['specifier']:
209
+ clean_dependencies.remove(existing)
210
+ clean_dependencies.append(dep)
211
+ seen_packages[package] = dep
212
+ continue
213
+
214
+ clean_dependencies.append(dep)
215
+ seen_packages[package] = dep
216
+
217
+ # Create a temporary requirements file
218
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
219
+ req_lines = []
220
+ for dep in clean_dependencies:
221
+ req_lines.append(dep['original'])
222
+ f.write('\n'.join(req_lines))
223
+ temp_req_file = f.name
224
+
225
+ warnings = []
226
+
227
+ try:
228
+ # Try using pip's resolver with --dry-run and --report (pip 22.2+)
229
+ result = subprocess.run(
230
+ ['pip', 'install', '--dry-run', '--report', '-', '-r', temp_req_file],
231
+ capture_output=True,
232
+ text=True,
233
+ timeout=60
234
+ )
235
+
236
+ if result.returncode == 0 and result.stdout.strip():
237
+ # Parse the JSON report
238
+ try:
239
+ report = json.loads(result.stdout)
240
+ resolved = []
241
+ for package in report.get('install', []):
242
+ name = package.get('metadata', {}).get('name', '')
243
+ version = package.get('metadata', {}).get('version', '')
244
+ if name and version:
245
+ resolved.append(f"{name}=={version}")
246
+
247
+ if resolved:
248
+ return '\n'.join(sorted(resolved)), warnings
249
+ except json.JSONDecodeError:
250
+ warnings.append("Could not parse pip resolution report. Using original requirements.")
251
+ except Exception as e:
252
+ warnings.append(f"Error parsing resolution: {str(e)}")
253
+
254
+ # Fallback: try pip-compile if available
255
+ try:
256
+ result = subprocess.run(
257
+ ['pip-compile', '--dry-run', '--output-file', '-', temp_req_file],
258
+ capture_output=True,
259
+ text=True,
260
+ timeout=60
261
+ )
262
+ if result.returncode == 0:
263
+ return result.stdout.strip(), warnings
264
+ except FileNotFoundError:
265
+ pass
266
+ except Exception:
267
+ pass
268
+
269
+ # Final fallback: return cleaned original requirements
270
+ resolved_lines = []
271
+ for dep in clean_dependencies:
272
+ line = dep['original']
273
+ # Apply strategy-based modifications
274
+ if strategy == "stable/pinned" and not dep['specifier']:
275
+ # In a real implementation, would query PyPI for latest stable
276
+ line = f"{dep['package']} # Version not specified"
277
+ elif strategy == "keep_existing_pins":
278
+ # Keep as-is
279
+ pass
280
+ resolved_lines.append(line)
281
+
282
+ if not warnings:
283
+ warnings.append("Using original requirements. For full resolution, ensure pip>=22.2 is installed.")
284
+
285
+ return '\n'.join(resolved_lines), warnings
286
+
287
+ except subprocess.TimeoutExpired:
288
+ warnings.append("Resolution timed out. Showing original requirements.")
289
+ return '\n'.join([d['original'] for d in clean_dependencies]), warnings
290
+ except Exception as e:
291
+ warnings.append(f"Resolution error: {str(e)}")
292
+ return '\n'.join([d['original'] for d in clean_dependencies]), warnings
293
+ finally:
294
+ Path(temp_req_file).unlink(missing_ok=True)
295
+
296
+
297
+ class CatalogValidator:
298
+ """Validate package names against a simple ground-truth catalog."""
299
+
300
+ def __init__(self, catalog_path: Path = Path("data/package_name_catalog.json"), use_ml: bool = True):
301
+ self.catalog_path = catalog_path
302
+ self.valid_packages: Set[str] = set()
303
+ self.invalid_packages: Set[str] = set()
304
+ self.use_ml = use_ml and ML_AVAILABLE
305
+ self.embeddings = None
306
+
307
+ self._load_catalog()
308
+
309
+ # Load embeddings if available
310
+ if self.use_ml:
311
+ try:
312
+ self.embeddings = PackageEmbeddings()
313
+ except Exception as e:
314
+ print(f"Warning: Could not load embeddings: {e}")
315
+ self.use_ml = False
316
+
317
+ def _load_catalog(self) -> None:
318
+ if not self.catalog_path.exists():
319
+ return
320
+ try:
321
+ data = json.loads(self.catalog_path.read_text())
322
+ self.valid_packages = {p.lower() for p in data.get("valid_packages", [])}
323
+ self.invalid_packages = {p.lower() for p in data.get("invalid_packages", [])}
324
+ except Exception as exc:
325
+ # Keep going even if catalog is malformed
326
+ print(f"Warning: could not read catalog {self.catalog_path}: {exc}")
327
+
328
+ def suggest_correction(self, package_name: str, cutoff: float = 0.6) -> Optional[str]:
329
+ """Suggest a corrected package name using fuzzy matching and embeddings."""
330
+ if not self.valid_packages:
331
+ return None
332
+
333
+ package_lower = package_name.lower()
334
+
335
+ # If it's already valid, no correction needed
336
+ if package_lower in self.valid_packages:
337
+ return None
338
+
339
+ # Try ML-based embedding similarity first (more accurate)
340
+ if self.use_ml and self.embeddings:
341
+ try:
342
+ best_match = self.embeddings.get_best_match(package_name, threshold=0.7)
343
+ if best_match and best_match in self.valid_packages:
344
+ return best_match
345
+ except Exception:
346
+ pass
347
+
348
+ # Fallback to fuzzy matching
349
+ matches = get_close_matches(
350
+ package_lower,
351
+ list(self.valid_packages),
352
+ n=1,
353
+ cutoff=cutoff
354
+ )
355
+
356
+ if matches:
357
+ return matches[0]
358
+ return None
359
+
360
+ def check_and_correct_packages(self, dependencies: List[Dict], auto_correct: bool = True) -> Tuple[List[Dict], List[str]]:
361
+ """Check packages and optionally correct spelling mistakes.
362
+
363
+ Returns:
364
+ Tuple of (corrected_dependencies, warnings)
365
+ """
366
+ corrected_deps = []
367
+ warnings: List[str] = []
368
+ seen: Set[str] = set()
369
+ max_warnings = 15
370
+
371
+ for dep in dependencies:
372
+ package = dep["package"]
373
+ package_lower = package.lower()
374
+
375
+ if package_lower in seen:
376
+ corrected_deps.append(dep)
377
+ continue
378
+ seen.add(package_lower)
379
+
380
+ # Check if it's explicitly invalid
381
+ if self.invalid_packages and package_lower in self.invalid_packages:
382
+ warnings.append(f"Package '{package}' is flagged as invalid in the catalog.")
383
+ if len(warnings) >= max_warnings:
384
+ corrected_deps.append(dep)
385
+ continue
386
+
387
+ # Try to suggest a correction
388
+ suggestion = self.suggest_correction(package)
389
+ if suggestion:
390
+ if auto_correct:
391
+ corrected_dep = dep.copy()
392
+ corrected_dep['package'] = suggestion
393
+ corrected_dep['original'] = corrected_dep['original'].replace(package, suggestion, 1)
394
+ corrected_deps.append(corrected_dep)
395
+ warnings.append(f" → Auto-corrected to '{suggestion}'")
396
+ else:
397
+ warnings.append(f" → Did you mean '{suggestion}'?")
+ corrected_deps.append(dep)  # keep the original entry when only suggesting
398
+ else:
399
+ corrected_deps.append(dep)
400
+ continue
401
+
402
+ # Check if it's not in valid catalog and suggest correction
403
+ if self.valid_packages and package_lower not in self.valid_packages:
404
+ suggestion = self.suggest_correction(package)
405
+ if suggestion:
406
+ if auto_correct:
407
+ corrected_dep = dep.copy()
408
+ corrected_dep['package'] = suggestion
409
+ corrected_dep['original'] = corrected_dep['original'].replace(package, suggestion, 1)
410
+ corrected_deps.append(corrected_dep)
411
+ warnings.append(f"Package '{package}' not found. Auto-corrected to '{suggestion}'")
412
+ else:
413
+ warnings.append(f"Package '{package}' not found. Did you mean '{suggestion}'?")
414
+ if len(warnings) >= max_warnings:
415
+ break
416
+ else:
417
+ warnings.append(
418
+ f"Package '{package}' is not in the curated valid catalog. Check for typos or private packages."
419
+ )
420
+ corrected_deps.append(dep)
421
+ if len(warnings) >= max_warnings:
422
+ break
423
+ else:
424
+ # Package is valid, keep as-is
425
+ corrected_deps.append(dep)
426
+
427
+ if len(warnings) >= max_warnings:
428
+ warnings.append("Additional potential catalog issues omitted for brevity.")
429
+
430
+ return corrected_deps, warnings
431
+
432
+ def check_packages(self, dependencies: List[Dict]) -> List[str]:
433
+ """Return warnings for packages that look suspicious or explicitly invalid."""
434
+ _, warnings = self.check_and_correct_packages(dependencies, auto_correct=False)
435
+ return warnings
436
+
437
+
438
+ class ExplanationEngine:
439
+ """Generate intelligent explanations for dependency conflicts using LLM."""
440
+
441
+ def __init__(self, use_llm: bool = True):
442
+ """
443
+ Initialize explanation engine.
444
+
445
+ Args:
446
+ use_llm: If True, uses Hugging Face Inference API (free tier)
447
+ If False, uses rule-based explanations only
448
+ """
449
+ self.use_llm = use_llm
450
+ # Using Hugging Face Inference API (free tier)
451
+ self.api_url = "https://api-inference.huggingface.co/models/gpt2"
452
+ self.headers = {"Content-Type": "application/json"}
453
+
454
+ def generate_explanation(self, conflict: Dict, dependencies: List[Dict]) -> Dict:
455
+ """
456
+ Generate a detailed explanation for a conflict.
457
+
458
+ Args:
459
+ conflict: Conflict dictionary with type, packages, message, etc.
460
+ dependencies: Full list of dependencies for context
461
+
462
+ Returns:
463
+ Dictionary with explanation, why_it_happens, how_to_fix
464
+ """
465
+ # Build context about the conflict
466
+ conflict_type = conflict.get('type', 'unknown')
467
+ packages = conflict.get('packages', [conflict.get('package', 'unknown')])
468
+ message = conflict.get('message', '')
469
+ details = conflict.get('details', {})
470
+
471
+ # Create prompt for LLM
472
+ prompt = self._create_prompt(conflict, dependencies)
473
+
474
+ # Get LLM explanation
475
+ explanation_text = self._call_llm(prompt) if self.use_llm else self._fallback_explanation(prompt)
476
+
477
+ # Parse and structure the explanation
478
+ return {
479
+ 'summary': message,
480
+ 'explanation': explanation_text,
481
+ 'why_it_happens': self._extract_why(explanation_text, conflict),
482
+ 'how_to_fix': self._extract_fix(explanation_text, conflict),
483
+ 'packages_involved': packages,
484
+ 'severity': conflict.get('severity', 'medium')
485
+ }
486
+
487
+ def _create_prompt(self, conflict: Dict, dependencies: List[Dict]) -> str:
488
+ """Create a prompt for the LLM."""
489
+ conflict_type = conflict.get('type', 'unknown')
490
+ packages = conflict.get('packages', [conflict.get('package', 'unknown')])
491
+ message = conflict.get('message', '')
492
+ details = conflict.get('details', {})
493
+
494
+ # Get relevant dependency info
495
+ relevant_deps = [d for d in dependencies if d['package'] in packages]
496
+
497
+ prompt = f"""You are a Python dependency expert. Explain this dependency conflict clearly:
498
+
499
+ Conflict: {message}
500
+ Type: {conflict_type}
501
+ Packages involved: {', '.join(packages)}
502
+
503
+ Dependency details:
504
+ """
505
+ for dep in relevant_deps:
506
+ prompt += f"- {dep['package']}: {dep['specifier'] or 'no version specified'}\n"
507
+
508
+ if details:
509
+ prompt += f"\nVersion constraints: {json.dumps(details)}\n"
510
+
511
+ prompt += """
512
+ Provide a clear, concise explanation that:
513
+ 1. Explains what the conflict is in simple terms
514
+ 2. Explains why this conflict happens (technical reason)
515
+ 3. Suggests how to fix it (specific version recommendations)
516
+
517
+ Keep it under 150 words and use plain language.
518
+ """
519
+ return prompt
520
+
521
+ def _call_llm(self, prompt: str) -> str:
522
+ """
523
+ Call LLM API to generate explanation.
524
+ Falls back to rule-based explanation if API fails.
525
+ """
526
+ try:
527
+ # Try Hugging Face Inference API (free tier)
528
+ payload = {
529
+ "inputs": prompt,
530
+ "parameters": {
531
+ "max_new_tokens": 200,
532
+ "temperature": 0.7,
533
+ "return_full_text": False
534
+ }
535
+ }
536
+
537
+ response = requests.post(
538
+ self.api_url,
539
+ headers=self.headers,
540
+ json=payload,
541
+ timeout=10
542
+ )
543
+
544
+ if response.status_code == 200:
545
+ result = response.json()
546
+ if isinstance(result, list) and len(result) > 0:
547
+ generated_text = result[0].get('generated_text', '')
548
+ if generated_text:
549
+ return generated_text.strip()
550
+
551
+ # If API fails, fall back to rule-based
552
+ return self._fallback_explanation(prompt)
553
+
554
+ except Exception as e:
555
+ # Fall back to rule-based explanation
556
+ return self._fallback_explanation(prompt)
557
+
558
+ def _fallback_explanation(self, prompt: str) -> str:
559
+ """Generate rule-based explanation when LLM is unavailable."""
560
+ # Extract key info from prompt
561
+ if "pytorch-lightning" in prompt.lower() and "torch" in prompt.lower():
562
+ return """PyTorch Lightning 2.0+ requires PyTorch 2.0 or higher because it uses new PyTorch APIs and features that don't exist in version 1.x. The conflict happens because you're trying to use a newer version of PyTorch Lightning with an older version of PyTorch. To fix this, either upgrade PyTorch to 2.0+ or downgrade PyTorch Lightning to 1.x."""
563
+
564
+ elif "fastapi" in prompt.lower() and "pydantic" in prompt.lower():
565
+ return """FastAPI 0.78.x was built for Pydantic v1, which has a different API than Pydantic v2. The conflict occurs because Pydantic v2 introduced breaking changes that FastAPI 0.78 doesn't support. To fix this, either upgrade FastAPI to 0.99+ (which supports Pydantic v2) or downgrade Pydantic to v1.x."""
566
+
567
+ elif "tensorflow" in prompt.lower() and "keras" in prompt.lower():
568
+ return """Keras 3.0+ requires TensorFlow 2.x because it was redesigned to work with TensorFlow 2's eager execution and new features. TensorFlow 1.x uses a different execution model that Keras 3.0 doesn't support. To fix this, upgrade TensorFlow to 2.x or downgrade Keras to 2.x."""
569
+
570
+ elif "duplicate" in prompt.lower():
571
+ return """You have the same package specified multiple times with different versions. This creates ambiguity about which version should be installed. To fix this, remove duplicate entries and keep only one version specification per package."""
572
+
573
+ else:
574
+ return """This dependency conflict occurs due to incompatible version requirements between packages. Review the version constraints and ensure all packages are compatible with each other. Consider updating to compatible versions or using a dependency resolver."""
575
+
576
+ def _extract_why(self, explanation: str, conflict: Dict) -> str:
577
+ """Extract the 'why it happens' part from explanation."""
578
+ # Simple extraction - look for sentences explaining the reason
579
+ sentences = explanation.split('.')
580
+ why_sentences = [s.strip() for s in sentences if any(word in s.lower() for word in ['because', 'due to', 'requires', 'needs', 'since'])]
581
+ return '. '.join(why_sentences[:2]) + '.' if why_sentences else "Version constraints are incompatible."
582
+
583
+ def _extract_fix(self, explanation: str, conflict: Dict) -> str:
584
+ """Extract the 'how to fix' part from explanation."""
585
+ # Simple extraction - look for fix suggestions
586
+ sentences = explanation.split('.')
587
+ fix_sentences = [s.strip() for s in sentences if any(word in s.lower() for word in ['upgrade', 'downgrade', 'fix', 'change', 'update', 'remove'])]
588
+ return '. '.join(fix_sentences[:2]) + '.' if fix_sentences else "Adjust version constraints to compatible versions."
589
+
590
+
591
+ def process_dependencies(
592
+ library_list: str,
593
+ requirements_text: str,
594
+ uploaded_file,
595
+ python_version: str,
596
+ device: str,
597
+ os_type: str,
598
+ mode: str,
599
+ resolution_strategy: str,
600
+ use_llm_explanations: bool = True,
601
+ use_ml_prediction: bool = True,
602
+ use_ml_spellcheck: bool = True,
603
+ show_ml_details: bool = False
604
+ ) -> Tuple[str, str, str]:
605
+ """Main processing function for Gradio interface."""
606
+
607
+ # Collect dependencies from all sources
608
+ all_dependencies = []
609
+
610
+ # Parse library list
611
+ if library_list:
612
+ parser = DependencyParser()
613
+ deps = parser.parse_library_list(library_list)
614
+ all_dependencies.extend(deps)
615
+
616
+ # Parse requirements text
617
+ if requirements_text:
618
+ parser = DependencyParser()
619
+ deps = parser.parse_requirements_text(requirements_text)
620
+ all_dependencies.extend(deps)
621
+
622
+ # Parse uploaded file
623
+ if uploaded_file:
624
+ try:
625
+ # Handle both string paths and file objects (Gradio 6.x compatibility)
626
+ if isinstance(uploaded_file, str):
627
+ file_path = uploaded_file
628
+ else:
629
+ # If it's a file object, get the path
630
+ file_path = uploaded_file.name if hasattr(uploaded_file, 'name') else str(uploaded_file)
631
+
632
+ with open(file_path, 'r') as f:
633
+ content = f.read()
634
+ parser = DependencyParser()
635
+ deps = parser.parse_requirements_text(content)
636
+ all_dependencies.extend(deps)
637
+ except Exception as e:
638
+ return f"Error reading file: {str(e)}", "", ""
639
+
640
+ if not all_dependencies:
641
+ return "Please provide at least one input: library list, requirements text, or uploaded file.", "", ""
642
+
643
+ catalog_validator = CatalogValidator(use_ml=use_ml_spellcheck and ML_AVAILABLE)
644
+ # Auto-correct spelling mistakes in package names
645
+ all_dependencies, catalog_warnings = catalog_validator.check_and_correct_packages(all_dependencies, auto_correct=True)
646
+
647
+ # ML-based conflict prediction (pre-analysis)
648
+ ml_conflict_prediction = None
649
+ ml_confidence = 0.0
650
+ ml_details = ""
651
+ if use_ml_prediction and ML_AVAILABLE:
652
+ try:
653
+ predictor = ConflictPredictor()
654
+ requirements_text_for_ml = '\n'.join([d['original'] for d in all_dependencies])
655
+ has_conflict, confidence = predictor.predict(requirements_text_for_ml)
656
+ ml_conflict_prediction = has_conflict
657
+ ml_confidence = confidence
658
+
659
+ # Build ML details output
660
+ ml_details = f"""
661
+ ### ML Model Details
662
+
663
+ **Conflict Prediction Model:**
664
+ - Prediction: {"Conflict Detected" if has_conflict else "No Conflict"}
665
+ - Confidence: {confidence:.2%}
666
+ - Model Type: Random Forest Classifier
667
+ - Features Analyzed: Package presence, version specificity, conflict patterns
668
+
669
+ """
670
+ if show_ml_details:
671
+ # Get feature importance or additional details
672
+ ml_details += f"""
673
+ **Raw Prediction:**
674
+ - Has Conflict: {has_conflict}
675
+ - Confidence Score: {confidence:.4f}
676
+ - Probability Distribution: Conflict={confidence:.2%}, No Conflict={1-confidence:.2%}
677
+
678
+ """
679
+
680
+ if has_conflict and confidence > 0.7:
681
+ catalog_warnings.append(
682
+ f"ML Prediction: High probability ({confidence:.1%}) of conflicts detected"
683
+ )
684
+ except Exception as e:
685
+ print(f"ML prediction error: {e}")
686
+ ml_details = f"ML Prediction Error: {str(e)}"
687
+ elif use_ml_prediction and not ML_AVAILABLE:
688
+ ml_details = "ML models not available. Train models using `train_conflict_model.py` to enable this feature."
689
+
690
+ # Build dependency graph
691
+ resolver = DependencyResolver(python_version=python_version, platform=os_type, device=device)
692
+ deep_mode = (mode == "Deep (with transitive dependencies)")
693
+ graph = resolver.build_dependency_graph(all_dependencies, deep_mode=deep_mode)
694
+
695
+ # Check compatibility
696
+ is_compatible, issues = resolver.check_compatibility(graph)
697
+
698
+ # Convert string issues to structured format for LLM explanations
699
+ structured_issues = []
700
+ for issue in issues:
701
+ if isinstance(issue, str):
702
+ # Parse the issue string to extract package names and type
703
+ issue_dict = {
704
+ 'type': 'version_incompatibility',
705
+ 'message': issue,
706
+ 'severity': 'high',
707
+ 'details': {}
708
+ }
709
+
710
+ # Extract package names from known patterns
711
+ packages = []
712
+ issue_lower = issue.lower()
713
+
714
+ # Check for specific known conflicts
715
+ if 'pytorch-lightning' in issue_lower and 'torch' in issue_lower:
716
+ packages = ['pytorch-lightning', 'torch']
717
+ issue_dict['type'] = 'version_incompatibility'
718
+ # Extract version details
719
+ for dep in all_dependencies:
720
+ if dep['package'] in packages:
721
+ issue_dict['details'][dep['package']] = dep.get('specifier', '')
722
+ elif 'fastapi' in issue_lower and 'pydantic' in issue_lower:
723
+ packages = ['fastapi', 'pydantic']
724
+ issue_dict['type'] = 'version_incompatibility'
725
+ for dep in all_dependencies:
726
+ if dep['package'] in packages:
727
+ issue_dict['details'][dep['package']] = dep.get('specifier', '')
728
+ elif 'tensorflow' in issue_lower and 'keras' in issue_lower:
729
+ packages = ['tensorflow', 'keras']
730
+ issue_dict['type'] = 'version_incompatibility'
731
+ for dep in all_dependencies:
732
+ if dep['package'] in packages:
733
+ issue_dict['details'][dep['package']] = dep.get('specifier', '')
734
+ elif 'conflict in' in issue_lower:
735
+ # Duplicate package conflict
736
+ pkg = issue.split('Conflict in')[1].split(':')[0].strip()
737
+ packages = [pkg]
738
+ issue_dict['type'] = 'duplicate'
739
+ issue_dict['package'] = pkg
740
+ else:
741
+ # Generic: try to find packages mentioned in the issue
742
+ for dep in all_dependencies:
743
+ if dep['package'] in issue_lower:
744
+ packages.append(dep['package'])
745
+
746
+ if packages:
747
+ issue_dict['packages'] = packages
748
+ else:
749
+ issue_dict['package'] = 'unknown'
750
+ issue_dict['packages'] = []
751
+
752
+ structured_issues.append(issue_dict)
753
+ else:
754
+ structured_issues.append(issue)
755
+
756
+ # Generate LLM explanations if enabled
757
+ explanations = []
758
+ if use_llm_explanations and structured_issues:
759
+ explanation_engine = ExplanationEngine(use_llm=use_llm_explanations)
760
+ for issue in structured_issues:
761
+ try:
762
+ explanation = explanation_engine.generate_explanation(issue, all_dependencies)
763
+ explanations.append(explanation)
764
+ except Exception as e:
765
+ # If explanation generation fails, just use the issue message
766
+ explanations.append({
767
+ 'summary': issue.get('message', str(issue)),
768
+ 'explanation': issue.get('message', str(issue)),
769
+ 'why_it_happens': 'Unable to generate explanation.',
770
+ 'how_to_fix': 'Review version constraints.',
771
+ 'packages_involved': issue.get('packages', []),
772
+ 'severity': issue.get('severity', 'medium')
773
+ })
774
+
775
+ # Resolve dependencies
776
+ resolved_text, resolver_warnings = resolver.resolve_dependencies(all_dependencies, resolution_strategy)
777
+ warnings = catalog_warnings + resolver_warnings
778
+
779
+ # Build output message
780
+ output_parts = []
781
+ output_parts.append("## Dependency Analysis Results\n\n")
782
+
783
+ # Show ML prediction if available
784
+ if ML_AVAILABLE and ml_conflict_prediction is not None:
785
+ if ml_conflict_prediction:
786
+ output_parts.append(f"### ML Prediction: Potential Conflicts Detected (Confidence: {ml_confidence:.1%})\n\n")
787
+ else:
788
+ output_parts.append(f"### ML Prediction: Low Conflict Risk (Confidence: {ml_confidence:.1%})\n\n")
789
+
790
+ if issues:
791
+ output_parts.append("### Compatibility Issues Found:\n")
792
+ if explanations:
793
+ # Show detailed LLM explanations
794
+ for i, (issue, explanation) in enumerate(zip(issues, explanations), 1):
795
+ output_parts.append(f"#### Issue #{i}: {explanation['summary']}\n\n")
796
+ output_parts.append(f"**Explanation:**\n{explanation['explanation']}\n\n")
797
+ output_parts.append(f"**Why this happens:**\n{explanation['why_it_happens']}\n\n")
798
+ output_parts.append(f"**How to fix:**\n{explanation['how_to_fix']}\n\n")
799
+ output_parts.append("---\n\n")
800
+ else:
801
+ # Fallback to simple list
802
+ for issue in issues:
803
+ output_parts.append(f"- {issue}\n")
804
+ output_parts.append("\n")
805
+
806
+ # Separate corrections from other warnings
807
+ corrections = [w for w in warnings if "Auto-corrected" in w or "→" in w]
808
+ other_warnings = [w for w in warnings if w not in corrections]
809
+
810
+ if corrections:
811
+ output_parts.append("### Spelling Corrections:\n")
812
+ for correction in corrections:
813
+ output_parts.append(f"- {correction}\n")
814
+ output_parts.append("\n")
815
+
816
+ if other_warnings:
817
+ output_parts.append("### Warnings:\n")
818
+ for warning in other_warnings:
819
+ output_parts.append(f"- {warning}\n")
820
+ output_parts.append("\n")
821
+
822
+ if is_compatible and not issues:
823
+ output_parts.append("### No compatibility issues detected!\n\n")
824
+
825
+ output_parts.append(f"### Resolved Requirements ({len(all_dependencies)} packages):\n")
826
+ output_parts.append("```\n")
827
+ output_parts.append(resolved_text)
828
+ output_parts.append("\n```\n")
829
+
830
+ # Add ML details if requested
831
+ if show_ml_details and ml_details:
832
+ output_parts.append(ml_details)
833
+
834
+ return ''.join(output_parts), resolved_text, ml_details
835
+
836
+
837
+ # Gradio Interface
838
+ def create_interface():
839
+ """Create and return the Gradio interface."""
840
+ import gradio as gr
841
+
842
+ with gr.Blocks(title="Python Dependency Compatibility Board") as app:
843
+ gr.Markdown(f"""
844
+ # Python Dependency Compatibility Board
845
+
846
+ Analyze and resolve Python package dependencies with **AI-powered explanations** and **ML-based conflict prediction**.
847
+
848
+ ## Key Features
849
+
850
+ | Feature | Status | Description |
851
+ |---------|--------|-------------|
852
+ | **LLM Reasoning** | Active | AI-powered natural language explanations for conflicts |
853
+ | **ML Conflict Prediction** | {"Available" if ML_AVAILABLE else "Not Loaded"} | Machine learning model predicts conflicts before analysis |
854
+ | **Embedding-Based Spell Check** | {"Available" if ML_AVAILABLE else "Not Loaded"} | Semantic similarity matching for package names |
855
+ | **Auto-Correction** | Active | Automatically fixes spelling mistakes in package names |
856
+ | **Dependency Resolution** | Active | Resolves conflicts using pip's resolver |
857
+
858
+ """)
859
+
860
+ with gr.Row():
861
+ with gr.Column(scale=1):
862
+ gr.Markdown("### Input Methods")
863
+
864
+ library_input = gr.Textbox(
865
+ label="Library Names (one per line)",
866
+ placeholder="pandas\ntorch\nlangchain\nfastapi",
867
+ lines=5,
868
+ info="Enter package names, one per line"
869
+ )
870
+
871
+ requirements_input = gr.Textbox(
872
+ label="Requirements.txt Content",
873
+ placeholder="pandas==2.0.3\ntorch>=2.0.0\nlangchain==0.1.0",
874
+ lines=10,
875
+ info="Paste your requirements.txt content here"
876
+ )
877
+
878
+ file_upload = gr.File(
879
+ label="Upload requirements.txt",
880
+ file_types=[".txt"]
881
+ )
882
+
883
+ with gr.Column(scale=1):
884
+ gr.Markdown("### Environment Settings")
885
+
886
+ python_version = gr.Dropdown(
887
+ choices=["3.8", "3.9", "3.10", "3.11", "3.12"],
888
+ value="3.10",
889
+ label="Python Version",
890
+ info="Target Python version"
891
+ )
892
+
893
+ device = gr.Dropdown(
894
+ choices=["CPU only", "NVIDIA GPU (CUDA)", "Apple Silicon (MPS)", "Custom / other"],
895
+ value="CPU only",
896
+ label="Device",
897
+ info="Target device/platform"
898
+ )
899
+
900
+ os_type = gr.Dropdown(
901
+ choices=["Any / generic", "Linux (x86_64)", "Windows (x86_64)", "MacOS (Intel)", "MacOS (Apple Silicon)"],
902
+ value="Any / generic",
903
+ label="Operating System",
904
+ info="Target operating system"
905
+ )
906
+
907
+ mode = gr.Radio(
908
+ choices=["Quick (top-level only)", "Deep (with transitive dependencies)"],
909
+ value="Quick (top-level only)",
910
+ label="Analysis Mode",
911
+ info="Quick mode is faster, Deep mode includes all dependencies"
912
+ )
913
+
914
+ resolution_strategy = gr.Dropdown(
915
+ choices=["latest_compatible", "stable/pinned", "keep_existing_pins", "minimal_changes"],
916
+ value="latest_compatible",
917
+ label="Resolution Strategy",
918
+ info="How to resolve version conflicts"
919
+ )
920
+
921
+ gr.Markdown("---")
922
+ gr.Markdown("### AI & ML Features")
923
+
924
+ use_llm = gr.Checkbox(
925
+ label="**LLM Reasoning** - AI Explanations",
926
+ value=True,
927
+ info="Generate intelligent, natural language explanations for conflicts using LLM"
928
+ )
929
+
930
+ use_ml_prediction = gr.Checkbox(
931
+ label="**ML Conflict Prediction**",
932
+ value=True,
933
+ info=f"{'Model available - Predicts conflicts before detailed analysis' if ML_AVAILABLE else 'Model not loaded - Train models to enable'}"
934
+ )
935
+
936
+ use_ml_spellcheck = gr.Checkbox(
937
+ label="**ML Spell Check** (Embedding-based)",
938
+ value=True,
939
+ info=f"{'Model available - Uses semantic similarity for better corrections' if ML_AVAILABLE else 'Model not loaded - Train models to enable'}"
940
+ )
941
+
942
+ show_ml_details = gr.Checkbox(
943
+ label="Show ML Model Details",
944
+ value=False,
945
+ info="Display raw ML predictions and confidence scores"
946
+ )
947
+
948
+ process_btn = gr.Button("Analyze & Resolve Dependencies", variant="primary", size="lg")
949
+
950
+ with gr.Row():
951
+ output_display = gr.Markdown(
952
+ label="Analysis Results",
953
+ value="Results will appear here after processing..."
954
+ )
955
+
956
+ with gr.Row():
957
+ with gr.Column(scale=2):
958
+ resolved_output = gr.Textbox(
959
+ label="Resolved requirements.txt",
960
+ lines=15,
961
+ info="Copy this content to use as your requirements.txt file"
962
+ )
963
+
964
+ download_btn = gr.File(
965
+ label="Download requirements.txt",
966
+ value=None,
967
+ visible=True
968
+ )
969
+
970
+ with gr.Column(scale=1):
971
+ ml_output = gr.Markdown(
972
+ label="ML Model Output",
973
+ value="ML predictions will appear here when enabled...",
974
+ visible=True
975
+ )
976
+
977
+ def process_and_download(*args):
978
+ # Extract all arguments
979
+ result_text, resolved_text, ml_details = process_dependencies(*args)
980
+
981
+ # Create a temporary file for download
982
+ temp_file = None
983
+ if resolved_text and resolved_text.strip():
984
+ try:
985
+ with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
986
+ f.write(resolved_text)
987
+ temp_file = f.name
988
+ except Exception as e:
989
+ print(f"Error creating download file: {e}")
990
+
991
+ # Format ML output
992
+ ml_output_text = ml_details if ml_details else "ML features disabled or models not available."
993
+
994
+ return result_text, resolved_text, temp_file if temp_file else None, ml_output_text
995
+
996
+ process_btn.click(
997
+ fn=process_and_download,
998
+ inputs=[library_input, requirements_input, file_upload, python_version, device, os_type, mode, resolution_strategy, use_llm, use_ml_prediction, use_ml_spellcheck, show_ml_details],
999
+ outputs=[output_display, resolved_output, download_btn, ml_output]
1000
+ )
1001
+
1002
+ gr.Markdown("""
1003
+ ---
1004
+ ### How to Use
1005
+
1006
+ 1. **Input your dependencies** using any of the three methods (or combine them)
1007
+ 2. **Configure your environment** (Python version, device, OS)
1008
+ 3. **Choose analysis mode**: Quick for fast results, Deep for complete dependency tree
1009
+ 4. **Select resolution strategy**: How to handle version conflicts
1010
+ 5. **Click "Analyze & Resolve Dependencies"**
1011
+ 6. **Review the results** and download the resolved requirements.txt
1012
+
1013
+ ### Features
1014
+
1015
+ - Parse multiple input formats
1016
+ - Detect version conflicts
1017
+ - Check compatibility across dependency graph
1018
+ - Resolve dependencies using pip
1019
+ - Generate clean, pip-compatible requirements.txt
1020
+ - Environment-aware (Python version, platform, device)
1021
+ """)
1022
+
1023
+ return app
1024
+
1025
+
1026
+ if __name__ == "__main__":
1027
+ app = create_interface()
1028
+ # For Hugging Face Spaces, use default launch settings
1029
+ # For local development, you can customize
1030
+ app.launch()
data/ground_truth/gt_1 copy.txt ADDED
@@ -0,0 +1,6 @@
1
+ torch
2
+ torchvision
3
+ torchvision.transforms as transforms
4
+ torch.utils.data import DataLoader
5
+ numpy as np
6
+ scipy import stats
data/ground_truth/gt_1.txt ADDED
@@ -0,0 +1,6 @@
1
+ torch
2
+ torchvision
3
+ torchvision.transforms as transforms
4
+ torch.utils.data import DataLoader
5
+ numpy
6
+ scipy
data/package_name_catalog.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "valid_packages": [
3
+ "numpy",
4
+ "pandas",
5
+ "scipy",
6
+ "scikit-learn",
7
+ "pydantic",
8
+ "fastapi",
9
+ "torch",
10
+ "pytorch-lightning",
11
+ "tensorflow",
12
+ "keras",
13
+ "pillow",
14
+ "requests",
15
+ "httpx",
16
+ "langchain",
17
+ "openai",
18
+ "chromadb",
19
+ "uvicorn",
20
+ "starlette",
21
+ "sqlalchemy",
22
+ "alembic",
23
+ "redis"
24
+ ],
25
+ "invalid_packages": [
26
+ "numpyy",
27
+ "pandaz",
28
+ "scipy-pro",
29
+ "fastapi-pro",
30
+ "torchx",
31
+ "pytorch-brightning",
32
+ "tensorflower",
33
+ "kerras",
34
+ "pillow2",
35
+ "requests3",
36
+ "httxx",
37
+ "langchainz",
38
+ "opena1",
39
+ "chromad",
40
+ "uvicornx",
41
+ "starlite",
42
+ "sqalachemy",
43
+ "alembico",
44
+ "redis-plus",
45
+ "fakerlib"
46
+ ]
47
+ }
ml_models.py ADDED
@@ -0,0 +1,217 @@
1
+ """
2
+ ML Model Loader and Utilities
3
+ Handles loading and using the conflict prediction model and package embeddings.
4
+ """
5
+
6
+ import json
7
+ import pickle
8
+ from pathlib import Path
9
+ from typing import Dict, List, Tuple, Optional
10
+ import numpy as np
11
+ from packaging.requirements import Requirement
12
+
13
+
14
+ class ConflictPredictor:
15
+ """Load and use the conflict prediction model."""
16
+
17
+ def __init__(self, model_path: Optional[Path] = None):
18
+ """Initialize the conflict predictor."""
19
+ if model_path is None:
20
+ model_path = Path(__file__).parent / "models" / "conflict_predictor.pkl"
21
+
22
+ self.model = None
23
+ self.model_path = model_path
24
+
25
+ if model_path.exists():
26
+ try:
27
+ with open(model_path, 'rb') as f:
28
+ self.model = pickle.load(f)
29
+ print(f"✅ Loaded conflict prediction model from {model_path}")
30
+ except Exception as e:
31
+ print(f"⚠️ Could not load conflict prediction model: {e}")
32
+ else:
33
+ print(f"⚠️ Conflict prediction model not found at {model_path}")
34
+
35
+ def extract_features(self, requirements_text: str) -> np.ndarray:
36
+ """Extract features from requirements text (same as training)."""
37
+ features = []
38
+
39
+ packages = {}
40
+ lines = requirements_text.strip().split('\n')
41
+ num_packages = 0
42
+ has_pins = 0
43
+ version_specificity = []
44
+
45
+ for line in lines:
46
+ line = line.strip()
47
+ if not line or line.startswith('#'):
48
+ continue
49
+
50
+ try:
51
+ req = Requirement(line)
52
+ pkg_name = req.name.lower()
53
+ specifier = str(req.specifier) if req.specifier else ''
54
+
55
+ if pkg_name in packages:
56
+ features.append(1) # has_duplicate flag
57
+ else:
58
+ packages[pkg_name] = specifier
59
+ num_packages += 1
60
+
61
+ if specifier:
62
+ has_pins += 1
63
+ if '==' in specifier:
64
+ version_specificity.append(3)
65
+ elif '>=' in specifier or '<=' in specifier:
66
+ version_specificity.append(2)
67
+ else:
68
+ version_specificity.append(1)
69
+ else:
70
+ version_specificity.append(0)
71
+ except Exception:
72
+ pass
73
+
74
+ feature_vec = []
75
+ feature_vec.append(min(num_packages / 20.0, 1.0))
76
+ feature_vec.append(has_pins / max(num_packages, 1))
77
+ feature_vec.append(np.mean(version_specificity) / 3.0 if version_specificity else 0)
78
+ feature_vec.append(1 if len(packages) < num_packages else 0)
79
+
80
+ common_packages = [
81
+ 'torch', 'pytorch-lightning', 'tensorflow', 'keras', 'fastapi', 'pydantic',
82
+ 'numpy', 'pandas', 'scipy', 'scikit-learn', 'matplotlib', 'seaborn',
83
+ 'requests', 'httpx', 'sqlalchemy', 'alembic', 'uvicorn', 'starlette',
84
+ 'langchain', 'openai', 'chromadb', 'redis', 'celery', 'gunicorn',
85
+ 'pillow', 'opencv-python', 'beautifulsoup4', 'scrapy', 'plotly', 'jax'
86
+ ]
87
+
88
+ for pkg in common_packages:
89
+ feature_vec.append(1 if pkg in packages else 0)
90
+
91
+ has_torch = 'torch' in packages
92
+ has_pl = 'pytorch-lightning' in packages
93
+ has_tf = 'tensorflow' in packages
94
+ has_keras = 'keras' in packages
95
+ has_fastapi = 'fastapi' in packages
96
+ has_pydantic = 'pydantic' in packages
97
+
98
+ feature_vec.append(1 if (has_torch and has_pl) else 0)
99
+ feature_vec.append(1 if (has_tf and has_keras) else 0)
100
+ feature_vec.append(1 if (has_fastapi and has_pydantic) else 0)
101
+
102
+ return np.array(feature_vec)
103
+
104
+ def predict(self, requirements_text: str) -> Tuple[bool, float]:
105
+ """
106
+ Predict if requirements have conflicts.
107
+
108
+ Returns:
109
+ (has_conflict, confidence_score)
110
+ """
111
+ if self.model is None:
112
+ return False, 0.0
113
+
114
+ try:
115
+ features = self.extract_features(requirements_text)
116
+ features = features.reshape(1, -1)
117
+
118
+ prediction = self.model.predict(features)[0]
119
+ probability = self.model.predict_proba(features)[0]
120
+
121
+ has_conflict = bool(prediction)
122
+ confidence = float(probability[1] if has_conflict else probability[0])
123
+
124
+ return has_conflict, confidence
125
+ except Exception as e:
126
+ print(f"Error in conflict prediction: {e}")
127
+ return False, 0.0
128
+
129
+
130
+ class PackageEmbeddings:
131
+ """Load and use package embeddings for similarity matching."""
132
+
133
+ def __init__(self, embeddings_path: Optional[Path] = None):
134
+ """Initialize package embeddings."""
135
+ if embeddings_path is None:
136
+ embeddings_path = Path(__file__).parent / "models" / "package_embeddings.json"
137
+
138
+ self.embeddings = {}
139
+ self.embeddings_path = embeddings_path
140
+ self.model = None
141
+
142
+ if embeddings_path.exists():
143
+ try:
144
+ with open(embeddings_path, 'r') as f:
145
+ self.embeddings = json.load(f)
146
+ print(f"✅ Loaded {len(self.embeddings)} package embeddings from {embeddings_path}")
147
+ except Exception as e:
148
+ print(f"⚠️ Could not load embeddings: {e}")
149
+ else:
150
+ print(f"⚠️ Embeddings not found at {embeddings_path}")
151
+
152
+ def _load_model(self):
153
+ """Lazy load the sentence transformer model."""
154
+ if self.model is None:
155
+ try:
156
+ from sentence_transformers import SentenceTransformer
157
+ self.model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
158
+ except ImportError:
159
+ print("⚠️ sentence-transformers not available, embedding similarity disabled")
160
+ return None
161
+ return self.model
162
+
163
+ def get_embedding(self, package_name: str) -> Optional[np.ndarray]:
164
+ """Get embedding for a package (from cache or compute on-the-fly)."""
165
+ package_lower = package_name.lower()
166
+
167
+ # Check cache first
168
+ if package_lower in self.embeddings:
169
+ return np.array(self.embeddings[package_lower])
170
+
171
+ # Compute on-the-fly if model available
172
+ model = self._load_model()
173
+ if model is not None:
174
+ embedding = model.encode([package_name])[0]
175
+ # Cache it
176
+ self.embeddings[package_lower] = embedding.tolist()
177
+ return embedding
178
+
179
+ return None
180
+
181
+ def find_similar(self, package_name: str, top_k: int = 5, threshold: float = 0.6) -> List[Tuple[str, float]]:
182
+ """
183
+ Find similar packages using cosine similarity.
184
+
185
+ Returns:
186
+ List of (package_name, similarity_score) tuples
187
+ """
188
+ query_emb = self.get_embedding(package_name)
189
+ if query_emb is None:
190
+ return []
191
+
192
+ similarities = []
193
+
194
+ for pkg, emb in self.embeddings.items():
195
+ if pkg == package_name.lower():
196
+ continue
197
+
198
+ emb_array = np.array(emb)
199
+ # Cosine similarity
200
+ similarity = np.dot(query_emb, emb_array) / (
201
+ np.linalg.norm(query_emb) * np.linalg.norm(emb_array)
202
+ )
203
+
204
+ if similarity >= threshold:
205
+ similarities.append((pkg, float(similarity)))
206
+
207
+ # Sort by similarity and return top_k
208
+ similarities.sort(key=lambda x: x[1], reverse=True)
209
+ return similarities[:top_k]
210
+
211
+ def get_best_match(self, package_name: str, threshold: float = 0.7) -> Optional[str]:
212
+ """Get the best matching package name."""
213
+ similar = self.find_similar(package_name, top_k=1, threshold=threshold)
214
+ if similar:
215
+ return similar[0][0]
216
+ return None
217
+
requirements.txt ADDED
@@ -0,0 +1,8 @@
1
+ gradio>=4.44.1
2
+ packaging>=23.0
3
+ pip>=23.0
4
+ requests>=2.31.0
5
+ scikit-learn>=1.3.0
6
+ sentence-transformers>=2.2.0
7
+ numpy>=1.24.0
8
+