Spaces:
Sleeping
Sleeping
done
Browse files- course_titles.csv +72 -0
- docx.txt +2 -0
- embeddings_data.json +0 -0
- main.py +61 -0
- requirements.txt +8 -0
- transf.py +33 -0
- webScp.py +60 -0
- work.py +54 -0
course_titles.csv
ADDED
|
@@ -0,0 +1,72 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
title
|
| 2 |
+
Coding a ChatGPT-style Language Model from Scratch in PyTorch
|
| 3 |
+
Mastering Multilingual GenAI Open-Weights for Indic Languages
|
| 4 |
+
Learning Autonomous Driving Behaviors with LLMs & RL
|
| 5 |
+
GenAI Applied to Quantitative Finance: For Control Implementation
|
| 6 |
+
"Navigating LLM Tradeoffs: Techniques for Speed, Cost, Scale & Accuracy"
|
| 7 |
+
Creating Problem-Solving Agents using GenAI for Action Composition
|
| 8 |
+
Improving Real World RAG Systems: Key Challenges & Practical Solutions
|
| 9 |
+
Framework to Choose the Right LLM for your Business
|
| 10 |
+
Building Smarter LLMs with Mamba and State Space Model
|
| 11 |
+
Generative AI - A Way of Life - Free Course
|
| 12 |
+
Building LLM Applications using Prompt Engineering - Free Course
|
| 13 |
+
Building Your First Computer Vision Model - Free Course
|
| 14 |
+
Bagging and Boosting ML Algorithms - Free Course
|
| 15 |
+
MidJourney: From Inspiration to Implementation - Free Course
|
| 16 |
+
Understanding Linear Regression - Free Course
|
| 17 |
+
The Working of Neural Networks - Free Course
|
| 18 |
+
The A to Z of Unsupervised ML - Free Course
|
| 19 |
+
Building Your first RAG System using LlamaIndex - Free Course
|
| 20 |
+
Data Preprocessing on a Real-World Problem Statement - Free Course
|
| 21 |
+
Exploring Stability.AI - Free Course
|
| 22 |
+
Building a Text Classification Model with Natural Language Processing - Free Course
|
| 23 |
+
Getting Started with Large Language Models
|
| 24 |
+
Introduction to Generative AI
|
| 25 |
+
Nano Course: Dreambooth-Stable Diffusion for Custom Images
|
| 26 |
+
A Comprehensive Learning Path for Deep Learning in 2023
|
| 27 |
+
A Comprehensive Learning Path to Become a Data Scientist in 2024
|
| 28 |
+
Nano Course: Building Large Language Models for Code
|
| 29 |
+
Certified AI & ML BlackBelt+ Program
|
| 30 |
+
Machine Learning Summer Training
|
| 31 |
+
AI Ethics by Fractal
|
| 32 |
+
A Comprehensive Learning Path to Become a Data Engineer in 2022
|
| 33 |
+
Certified Business Analytics Program
|
| 34 |
+
Certified Machine Learning Master's Program (MLMP)
|
| 35 |
+
Certified Natural Language Processing Master’s Program
|
| 36 |
+
Certified Computer Vision Master's Program
|
| 37 |
+
Applied Machine Learning - Beginner to Professional
|
| 38 |
+
Ace Data Science Interviews
|
| 39 |
+
Writing Powerful Data Science Articles
|
| 40 |
+
Machine Learning Certification Course for Beginners
|
| 41 |
+
Data Science Career Conclave
|
| 42 |
+
Top Data Science Projects for Analysts and Data Scientists
|
| 43 |
+
Getting Started with Git and GitHub for Data Science Professionals
|
| 44 |
+
Machine Learning Starter Program
|
| 45 |
+
"Data Science Hacks, Tips and Tricks"
|
| 46 |
+
Introduction to Business Analytics
|
| 47 |
+
Introduction to PyTorch for Deep Learning
|
| 48 |
+
Introductory Data Science for Business Managers
|
| 49 |
+
Introduction to Natural Language Processing
|
| 50 |
+
Getting started with Decision Trees
|
| 51 |
+
Introduction to Python
|
| 52 |
+
Loan Prediction Practice Problem (Using Python)
|
| 53 |
+
Big Mart Sales Prediction Using R
|
| 54 |
+
Twitter Sentiment Analysis
|
| 55 |
+
Pandas for Data Analysis in Python
|
| 56 |
+
Support Vector Machine (SVM) in Python and R
|
| 57 |
+
Evaluation Metrics for Machine Learning Models
|
| 58 |
+
Fundamentals of Regression Analysis
|
| 59 |
+
Getting Started with scikit-learn (sklearn) for Machine Learning
|
| 60 |
+
Convolutional Neural Networks (CNN) from Scratch
|
| 61 |
+
Dimensionality Reduction for Machine Learning
|
| 62 |
+
K-Nearest Neighbors (KNN) Algorithm in Python and R
|
| 63 |
+
Ensemble Learning and Ensemble Learning Techniques
|
| 64 |
+
Linear Programming for Data Science Professionals
|
| 65 |
+
Naive Bayes from Scratch
|
| 66 |
+
Learn Swift for Data Science
|
| 67 |
+
Introduction to Web Scraping using Python
|
| 68 |
+
Tableau for Beginners
|
| 69 |
+
Getting Started with Neural Networks
|
| 70 |
+
Introduction to AI & ML
|
| 71 |
+
Winning Data Science Hackathons - Learn from Elite Data Scientists
|
| 72 |
+
Hypothesis Testing for Data Science and Analytics
|
docx.txt
ADDED
|
@@ -0,0 +1,2 @@
|
|
|
|
|
|
|
|
|
|
| 1 |
+
1. webscapping is done using bs4
|
| 2 |
+
2.
|
embeddings_data.json
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
main.py
ADDED
|
@@ -0,0 +1,61 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
import streamlit as st
import json
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Pre-trained sentence encoder, shared by course and query embedding.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load the precomputed course records (title + embedding) from JSON.
with open('embeddings_data.json', 'r') as file:
    courses_data = json.load(file)

# Split the records into two parallel lists: one of embedding vectors,
# one of course titles, aligned by index.
course_embeddings = []
course_titles = []
for record in courses_data:
    course_embeddings.append(np.array(record['embedding']))
    course_titles.append(record['title'])
# Function to get query embedding
def get_query_embedding(query):
    """Encode a free-text query with the shared sentence-transformer model."""
    encoded = model.encode(query, convert_to_tensor=True)
    return encoded
| 21 |
+
|
# Function to add relevance factors
def add_relevance_factors(similarities, indices, keyword="deep learning",
                          weight=0.2, titles=None):
    """Re-rank candidate course indices, boosting titles matching *keyword*.

    Args:
        similarities: 1-D sequence of similarity scores, indexed by course.
        indices: iterable of candidate course indices to re-rank.
        keyword: lowercase substring whose presence in a title earns a boost
            (default "deep learning", preserving the original behavior).
        weight: additive boost applied on a keyword match (default 0.2).
        titles: list of course titles; defaults to the module-level
            ``course_titles`` so existing two-argument calls keep working.

    Returns:
        The candidate indices sorted by boosted score, highest first.
    """
    if titles is None:
        titles = course_titles  # fall back to the module-level title list
    enhanced_scores = []
    for idx in indices:
        # Binary curriculum match: 1 if the keyword appears in the title.
        curriculum_match = 1 if keyword in titles[idx].lower() else 0
        enhanced_scores.append((idx, similarities[idx] + weight * curriculum_match))

    # Highest boosted score first; ties keep candidate order (stable sort).
    enhanced_scores.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in enhanced_scores]
# Streamlit UI for input
st.title('Course Recommendation System')
st.write('Enter a search query to find relevant courses')

# Input box for the search query
user_query = st.text_input('Search Query')

if user_query:
    # Embed the query in the same vector space as the courses.
    query_embedding = get_query_embedding(user_query)

    # Similarity of the query against every course embedding.
    scores = cosine_similarity([np.array(query_embedding)], course_embeddings)

    # Indices of the five highest-scoring courses, best first.
    top_k = 5
    candidates = scores[0].argsort()[-top_k:][::-1]

    # Re-rank the candidates with the curriculum-keyword boost (optional).
    ranked = add_relevance_factors(scores[0], candidates)

    # Render the recommendations.
    st.write("### Top Course Recommendations")
    for idx in ranked:
        st.write(f"**Title**: {course_titles[idx]}")
        st.write(f"**Cosine Similarity**: {scores[0][idx]:.4f}")
        st.write("-" * 50)
requirements.txt
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
transformers
|
| 2 |
+
tf-keras
|
| 3 |
+
numpy
|
| 4 |
+
scikit-learn
|
| 5 |
+
pandas
|
| 6 |
+
tensorflow
|
| 7 |
+
streamlit
|
| 8 |
+
sentence-transformers
|
transf.py
ADDED
|
@@ -0,0 +1,33 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
"""Build sentence embeddings for the scraped course titles and save to JSON."""
import json

import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

# Load the CSV of scraped course titles (one 'title' column expected).
df = pd.read_csv('course_titles.csv')
sentences = df['title'].tolist()

# Load the pre-trained sentence-transformer model.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Convert all titles to embeddings in one batch.
embeddings = model.encode(sentences, convert_to_tensor=True)

# Pair each title with its embedding. Using `sentence` directly avoids a
# second, redundant lookup through df['title'][i].
data = [
    {
        'title': sentence,
        'embedding': embeddings[i].tolist(),  # tensor -> JSON-serializable list
    }
    for i, sentence in enumerate(sentences)
]

# Save the records so the recommender scripts can load them without
# re-encoding the corpus.
with open('embeddings_data.json', 'w') as json_file:
    json.dump(data, json_file, indent=4)
webScp.py
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import requests
|
| 2 |
+
from bs4 import BeautifulSoup
|
| 3 |
+
import csv
|
| 4 |
+
|
## function for csv
def save_to_csv(course_titles, filename='course_titles.csv'):
    """Write the scraped course titles to *filename*, one title per row.

    A 'title' header row is written first so readers that address the
    column by name (e.g. pandas ``df['title']``) can load the file.

    Args:
        course_titles: iterable of course-title strings.
        filename: destination CSV path.
    """
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        csv_w = csv.writer(csvfile)
        csv_w.writerow(['title'])  # header row for downstream column access
        for title in course_titles:
            csv_w.writerow([title])
    # Bug fix: the original f-string had no placeholder and printed a
    # literal instead of the actual destination path.
    print(f"Data saved to {filename}.")
| 15 |
+
|
| 16 |
+
|
| 17 |
+
|
## function for web scraping
def scrape_courses(base_url, max_pages=8):
    """Scrape course titles from paginated listing pages, then save to CSV.

    Requests pages ``base_url1`` .. ``base_url{max_pages}`` in order,
    stopping early at the first non-200 response, and collects the text
    of every <h3> inside the 'products__list' container on each page.
    """
    # Accumulator for every course title found across all pages.
    all_course_titles = []

    for page in range(1, max_pages + 1):
        # Build the URL of the current listing page.
        page_url = f"{base_url}{page}"
        print(f"Scraping page {page}: {page_url}")

        # Fetch the page.
        response = requests.get(page_url)

        # A non-200 status means we've run out of pages (or hit an error).
        if response.status_code != 200:
            print(f"Page {page} does not exist or cannot be accessed.")
            break  # stop at the first missing/unreachable page

        soup = BeautifulSoup(response.text, 'html.parser')

        # Course cards live inside the 'products__list' container.
        products_list = soup.find(class_='products__list')

        if products_list:
            # Each course title is rendered as an <h3> heading.
            for heading in products_list.find_all('h3'):
                text = heading.get_text(strip=True)
                print(f"Course Title: {text}")
                all_course_titles.append(text)
        else:
            print(f"No 'products__list' container found on page {page}.")

    # Persist everything that was collected.
    save_to_csv(all_course_titles)
| 56 |
+
|
| 57 |
+
|
## function calling
if __name__ == "__main__":
    # Run the scraper only when executed as a script, not when imported.
    base_url = 'https://courses.analyticsvidhya.com/collections?page='
    scrape_courses(base_url)
work.py
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
import json
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

# Pre-trained sentence encoder, shared by course and query embedding.
model = SentenceTransformer('all-MiniLM-L6-v2')

# Load the precomputed course records (title + embedding) from JSON.
with open('embeddings_data.json', 'r') as file:
    courses_data = json.load(file)

# Split the records into two parallel lists: one of embedding vectors,
# one of course titles, aligned by index.
course_embeddings = []
course_titles = []
for record in courses_data:
    course_embeddings.append(np.array(record['embedding']))
    course_titles.append(record['title'])
| 16 |
+
|
# Function to get query embedding
def get_query_embedding(query):
    """Encode a free-text query with the shared sentence-transformer model."""
    encoded = model.encode(query, convert_to_tensor=True)
    return encoded
| 20 |
+
|
# Function to add relevance factors
def add_relevance_factors(similarities, indices, keyword="deep learning",
                          weight=0.2, titles=None):
    """Re-rank candidate course indices, boosting titles matching *keyword*.

    Args:
        similarities: 1-D sequence of similarity scores, indexed by course.
        indices: iterable of candidate course indices to re-rank.
        keyword: lowercase substring whose presence in a title earns a boost
            (default "deep learning", preserving the original behavior).
        weight: additive boost applied on a keyword match (default 0.2).
        titles: list of course titles; defaults to the module-level
            ``course_titles`` so existing two-argument calls keep working.

    Returns:
        The candidate indices sorted by boosted score, highest first.
    """
    if titles is None:
        titles = course_titles  # fall back to the module-level title list
    enhanced_scores = []
    for idx in indices:
        # Binary curriculum match: 1 if the keyword appears in the title.
        curriculum_match = 1 if keyword in titles[idx].lower() else 0
        enhanced_scores.append((idx, similarities[idx] + weight * curriculum_match))

    # Highest boosted score first; ties keep candidate order (stable sort).
    enhanced_scores.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in enhanced_scores]
| 33 |
+
|
# Example user query
user_query = "machine learning courses with deep learning"

# Embed the query in the same vector space as the courses.
query_embedding = get_query_embedding(user_query)

# Similarity of the query against every stored course embedding.
scores = cosine_similarity([np.array(query_embedding)], course_embeddings)

# Indices of the five highest-scoring courses, best first.
top_k = 5
candidates = scores[0].argsort()[-top_k:][::-1]

# Re-rank the candidates with the curriculum-keyword boost (optional).
ranked = add_relevance_factors(scores[0], candidates)

# Display the top results
for idx in ranked:
    print(f"Title: {course_titles[idx]}")
    print(f"Cosine Similarity: {scores[0][idx]:.4f}")
    print("-" * 50)