---
title: Graduation Project-v1.2
emoji: π
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
---

# AI-Powered Graduation Project Recommendation System

## Overview

This project implements an AI-powered recommendation and semantic similarity platform for graduation projects. It combines:

* Natural Language Processing (NLP)
* Semantic Search
* Vector Embeddings
* Hybrid Ranking Systems
* Large Language Models (LLMs)

The system helps students:

* Discover unique graduation project ideas
* Avoid duplicate projects
* Analyze originality
* Generate intelligent project features
* Receive context-aware recommendations through an AI chatbot

---

# System Pipeline

## 1. Data Preprocessing

* Text normalization
* Duplicate removal
* Smart content merging
* Technical keyword extraction
* Feature engineering
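
As a rough illustration of the first two steps (text normalization and duplicate removal), here is a minimal standard-library sketch. The record fields `title` and `description` and both function names are assumptions made for this example; the project's actual preprocessing code may differ.

```python
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC folding, lowercase, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def remove_duplicates(projects: list[dict]) -> list[dict]:
    """Keep the first project seen for each normalized (title, description) pair."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for project in projects:
        key = (normalize_text(project["title"]),
               normalize_text(project["description"]))
        if key not in seen:
            seen.add(key)
            unique.append(project)
    return unique


# Hypothetical records: the first two differ only in case and spacing.
projects = [
    {"title": "Smart Library", "description": "Book  recommendation platform"},
    {"title": "smart library ", "description": "book recommendation platform"},
    {"title": "Traffic Monitor", "description": "Vehicle counting with cameras"},
]
cleaned = remove_duplicates(projects)
```

Normalizing before comparing means records that differ only in case or spacing count as the same project.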

## 2. Feature Extraction

* KeyBERT-based keyword extraction
* Automatic technical term detection
* Semantic feature generation

## 3. Embedding Generation

* SentenceTransformer embeddings
* Normalized vector representations
* Semantic encoding of projects
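
Normalization matters because, once vectors are scaled to unit length, their dot product equals their cosine similarity. The sketch below shows this with plain Python lists. It stands in for the real embedding model, and the two-dimensional vectors are made up for illustration.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def dot(a: list[float], b: list[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy "embeddings" (real sentence embeddings have hundreds of dimensions).
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

# For unit vectors the dot product is the cosine similarity.
similarity = dot(a, b)
```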

## 4. Semantic Retrieval

* FAISS vector indexing
* Nearest-neighbor semantic search
* Fast project similarity lookup
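
Conceptually, nearest-neighbor search over normalized vectors returns the `k` stored vectors with the highest inner product against the query. The brute-force sketch below does this in pure Python so that it runs without FAISS. It is not the project's code, and which FAISS index type the project uses is not stated here.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def search(query, corpus, k):
    """Return the k (position, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, normalize(doc))), position)
        for position, doc in enumerate(corpus)
    )
    return [(position, score) for score, position in heapq.nlargest(k, scored)]


# Toy corpus of three 2-D vectors; positions play the role of project IDs.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
results = search([0.9, 0.1], corpus, k=2)
```

A vector index serves the same purpose as this loop, but is built to stay fast as the number of stored projects grows.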

## 5. Hybrid Ranking

The final ranking combines:

* Semantic similarity
* Feature similarity
* Coverage ratio
* Confidence estimation
* Originality analysis
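
One common way to combine such signals is a weighted sum followed by a re-sort. The sketch below uses three of the signals listed above. The weights, the `Candidate` class, and the function names are hypothetical and chosen only for illustration; the project's actual scoring formula is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    title: str
    semantic_similarity: float  # each signal is assumed to lie in [0, 1]
    feature_similarity: float
    coverage_ratio: float


# Hypothetical weights; they sum to 1 so the score also stays in [0, 1].
WEIGHTS = {
    "semantic_similarity": 0.6,
    "feature_similarity": 0.3,
    "coverage_ratio": 0.1,
}


def hybrid_score(candidate: Candidate) -> float:
    """Weighted sum of the candidate's individual signals."""
    return sum(weight * getattr(candidate, name) for name, weight in WEIGHTS.items())


def rerank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from highest to lowest hybrid score."""
    return sorted(candidates, key=hybrid_score, reverse=True)


candidates = [
    Candidate("Library chatbot", 0.80, 0.20, 0.10),
    Candidate("Book recommender", 0.70, 0.90, 0.80),
]
ranked = rerank(candidates)
```

In this example the second candidate ranks first despite a lower semantic similarity, because its feature overlap is much stronger.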

## 6. AI Recommendation Engine

* Context-aware project generation
* Feature recommendation
* Novelty checking
* Conversational chatbot assistance

---

# AI & NLP Technologies Used

## Machine Learning & NLP

* SentenceTransformers
* KeyBERT
* Scikit-learn
* SciPy
* FAISS

## LLM Integration

* Google Gemini API
* Ollama
* Mistral

## Backend & Infrastructure

* FastAPI
* Pandas
* NumPy
* Python

---

# Project Architecture

```text
User Query
    ↓
Intent Classification
    ↓
Context Builder
    ↓
Feature Extraction
    ↓
Embedding Generation
    ↓
FAISS Semantic Search
    ↓
Hybrid Ranking Engine
    ↓
Originality & Duplicate Analysis
    ↓
AI Recommendation Response
```

---

# Similarity Engine Workflow

```text
Raw Dataset
    ↓
Preprocessing
    ↓
Feature Extraction
    ↓
Sentence Embeddings
    ↓
FAISS Indexing
    ↓
Semantic Retrieval
    ↓
Feature Similarity Matching
    ↓
Hybrid Re-ranking
    ↓
Final Recommendation
```

---

# Features

## AI Chatbot

* Context-aware conversations
* Intent classification
* Domain-specific recommendations
* Memory-aware responses
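
Memory-aware responses generally require keeping recent conversation turns and passing them to the model as context. A bounded buffer is one simple way to do that. The sketch below is an assumption about the general approach; the `ConversationMemory` class and its methods are hypothetical names, not the project's API.

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent turns so the prompt stays bounded."""

    def __init__(self, max_turns: int = 3):
        # deque with maxlen silently drops the oldest turn when full.
        self._turns = deque(maxlen=max_turns)

    def add(self, role: str, text: str) -> None:
        self._turns.append((role, text))

    def as_context(self) -> str:
        """Render the remembered turns as a plain-text block for a prompt."""
        return "\n".join(f"{role}: {text}" for role, text in self._turns)


memory = ConversationMemory(max_turns=2)
memory.add("user", "I like computer vision")
memory.add("assistant", "Consider a traffic monitoring project")
memory.add("user", "Something for libraries instead")
context = memory.as_context()
```

With `max_turns=2`, the first user message has already been dropped by the time the context is built.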

## Semantic Similarity Search

* Embedding-based retrieval
* Semantic duplicate detection
* Vector search with FAISS

## Hybrid Recommendation System

* Multi-stage ranking pipeline
* Feature-level semantic comparison
* Adaptive scoring strategy
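
To give a feel for feature-level comparison, the sketch below measures plain set overlap between two feature lists. This is a deliberately simpler stand-in: the project describes a semantic comparison, whereas exact string overlap only matches identical keywords. Function names and feature sets are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared features divided by all distinct features across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage_ratio(query_features: set[str], project_features: set[str]) -> float:
    """Fraction of the query's features that the project also has."""
    if not query_features:
        return 0.0
    return len(query_features & project_features) / len(query_features)


# Hypothetical keyword sets for a query and one stored project.
query = {"recommendation", "library", "chatbot"}
project = {"recommendation", "library", "barcode", "mobile"}

overlap = jaccard(query, project)
coverage = coverage_ratio(query, project)
```

Jaccard is symmetric, while coverage is measured from the query's side only, so a large project that contains every requested feature still scores full coverage.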

## Originality Detection

* Duplicate risk analysis
* Originality scoring
* Similarity confidence estimation
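
A simple way to turn retrieval results into an originality signal is to look at the single most similar existing project. The sketch below assumes that definition. The thresholds `0.85` and `0.60` and the risk labels are illustrative values, not the project's actual settings.

```python
def originality_score(similarities: list[float]) -> float:
    """1 minus the highest similarity; an empty result counts as fully original."""
    if not similarities:
        return 1.0
    return 1.0 - max(similarities)


def duplicate_risk(similarities: list[float],
                   high: float = 0.85,
                   medium: float = 0.60) -> str:
    """Map the highest similarity to a coarse risk label (hypothetical cutoffs)."""
    top = max(similarities, default=0.0)
    if top >= high:
        return "high"
    if top >= medium:
        return "medium"
    return "low"


# Similarity scores of a new idea against its nearest existing projects.
near_duplicate = [0.91, 0.40]
fresh_idea = [0.30, 0.50]
```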

## Intelligent Feature Generation

* AI-generated project features
* Novelty-aware generation
* Domain-aware recommendations

---

# Evaluation

The system includes:

* Self-retrieval evaluation
* Real-query testing
* Hybrid ranking validation
* Confidence scoring
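
Self-retrieval evaluation usually means querying the index with each stored item and checking whether that same item comes back first. The sketch below assumes that definition and uses toy vectors. The function name is hypothetical, and the project notebook may measure this differently.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def self_retrieval_accuracy(vectors):
    """Fraction of items whose top-1 match, when used as the query, is themselves."""
    if not vectors:
        return 0.0
    hits = 0
    for i, query in enumerate(vectors):
        # max() returns the first best match, so an earlier identical vector wins.
        best = max(range(len(vectors)), key=lambda j: dot(query, vectors[j]))
        hits += best == i
    return hits / len(vectors)


# Unit vectors; the third entry duplicates the first on purpose.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
accuracy = self_retrieval_accuracy(vectors)
```

The duplicated vector is scored as a miss, which is why this metric also helps reveal duplicates left in the dataset.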

### Evaluation Metrics

* Semantic Similarity Score
* Hybrid Score
* Originality Score
* Confidence Score
* Duplicate Risk Classification

---

# Project Structure

```text
GRADUATION_PROJECT/
│
├── api/                          # FastAPI backend
│
├── Data/
│   ├── raw/                      # Original dataset
│   └── processed/                # Cleaned dataset
│
├── models/                       # FAISS index & metadata
│
├── Notebooks/
│   └── TEST.ipynb                # Training & evaluation notebook
│
├── src/
│   ├── recommendation_engine/    # Chatbot & recommendation logic
│   └── similarity_model/         # Semantic search engine
│
├── requirements.txt
├── README.md
└── .gitignore
```

---

# Recommendation Engine Modules

## recommendation_engine/

Contains:

* Chatbot engine
* Intent classification
* Prompt building
* Idea generation
* Feature generation
* Memory management
* Novelty checking
* Response formatting

---

# Similarity Model Modules

## similarity_model/

Contains:

* Semantic search
* Embedding engine
* Hybrid ranker
* Feature similarity engine
* Preprocessing pipeline
* Evaluation framework

---

# Installation

## 1. Clone Repository

```bash
git clone https://github.com/YOUR_USERNAME/YOUR_REPOSITORY.git
cd YOUR_REPOSITORY
```

---

## 2. Create Virtual Environment

### Windows

```bash
python -m venv .venv
.venv\Scripts\activate
```

### Linux / macOS

```bash
python3 -m venv .venv
source .venv/bin/activate
```

---

## 3. Install Dependencies

```bash
pip install -r requirements.txt
```

---

# Environment Variables

Create a `.env` file in the project root:

```env
GEMINI_API_KEY=your_api_key_here
```
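
Python does not read `.env` files on its own, so the key has to reach the process environment somehow, for example through a loader such as python-dotenv or through the container's environment settings. Which mechanism this project uses is not stated here. The sketch below covers only the lookup and fails early with a clear message; `load_gemini_api_key` is a hypothetical helper name.

```python
import os


def load_gemini_api_key(env=None) -> str:
    """Return GEMINI_API_KEY from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; add it to .env or export it in the shell"
        )
    return key


# Passing a plain dict keeps the example independent of the real environment.
example_key = load_gemini_api_key({"GEMINI_API_KEY": " your_api_key_here "})
```

Checking the key once at startup reports a missing configuration immediately, instead of at the first chatbot request.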

---

# Running the Project

## Run FastAPI Server

```bash
uvicorn api.main:app --reload
```

---

## Run Notebook

```bash
jupyter notebook
```

Then open:

```text
Notebooks/TEST.ipynb
```

---

# Example Query

## Input

```text
AI-based smart library recommendation platform
```

## Output

* Similar graduation projects
* Semantic similarity scores
* Originality analysis
* Duplicate risk estimation
* Recommended features

---

# Future Improvements

* Full RAG integration
* Multi-agent orchestration
* GPU acceleration
* Advanced evaluation metrics
* Real-time deployment
* Database persistence
* Frontend dashboard

---

# Research Areas Covered

* Natural Language Processing (NLP)
* Semantic Search
* Recommendation Systems
* Vector Databases
* Conversational AI
* Information Retrieval
* Hybrid Ranking Systems
* Large Language Models (LLMs)

---

# Author

Yossef Assem

---

# License

This project is for educational and research purposes.