Space: husseinelsaadi/Codingo


husseinelsaadi committed on May 7, 2025

Commit

76030db

0 Parent(s):

Initial commit: Project structure for Codingo AI recruitment System

Files changed (3)

.gitignore +38 -0
readme.md +273 -0
requirements.txt +6 -0

.gitignore ADDED

	@@ -0,0 +1,38 @@

+# Python
+__pycache__/
+*.py[cod]
+*$py.class
+*.so
+.Python
+env/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+*.egg-info/
+.installed.cfg
+*.egg
+# Virtual Environment
+venv/
+ENV/
+# PyCharm
+.idea/
+# Training data & models
+data/raw_cvs/
+backend/model/*.pkl
+# Jupyter Notebook
+.ipynb_checkpoints
+# OS specific
+.DS_Store

readme.md ADDED

	@@ -0,0 +1,273 @@

+# Codingo - AI Powered Smart Recruitment System
+This repository contains the implementation of Codingo, an AI-powered online recruitment platform designed to automate and enhance the hiring process through a virtual HR assistant named LUNA.
+## Project Overview
+Codingo addresses the challenges of traditional recruitment processes by offering:
+- Automated CV screening and skill-based shortlisting
+- AI-led interviews through the virtual assistant LUNA
+- Real-time cheating detection during assessments
+- Gamified practice tools for candidates
+- Secure administration interface for hiring managers
+## Getting Started
+This guide outlines the development process, starting with local model training before moving to AWS deployment.
+### Prerequisites
+- Python 3.8+
+- pip (Python package manager)
+- Git
+### Development Process
+We'll implement the project in phases:
+#### Phase 1: Local Training and Feature Extraction (Current Phase)
+This initial phase focuses on building and training the model locally before AWS deployment.
+### Project Structure
+```
+Codingo/
+├── backend/                     # Flask API backend
+│   ├── app.py                   # Flask server
+│   ├── predict.py               # Predict using trained model
+│   ├── train_model.py           # Model training script
+│   ├── model/                   # Trained model artifacts
+│   │   └── cv_classifier.pkl
+│   ├── utils/
+│   │   ├── text_extractor.py    # PDF/DOCX to text
+│   │   └── preprocessor.py      # Cleaning, tokenizing
+│
+├── data/
+│   ├── training.csv             # Your training dataset
+│   └── raw_cvs/                 # CV files (PDF/DOCX/txt)
+│
+├── notebooks/
+│   └── eda.ipynb                # Data exploration & feature work
+│
+├── requirements.txt             # Python dependencies
+└── readme.md                    # Project overview
+```
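The tree lists `backend/utils/preprocessor.py` for cleaning and tokenizing, but this commit does not show its contents. One way it might look, as a minimal standard-library sketch: the function names `clean_text` and `tokenize` and the specific cleaning rules are assumptions for illustration, not the authors' implementation.

```python
import re

def clean_text(text):
    """Lowercase, drop URLs and e-mail addresses, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"\S+@\S+", " ", text)                 # remove e-mail addresses
    # Keep characters that appear in skill names such as c++, c#, node.js
    text = re.sub(r"[^a-z0-9+#.\s-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Split cleaned text into tokens, trimming stray dots and dashes at token edges."""
    tokens = (tok.strip(".-") for tok in clean_text(text).split())
    return [tok for tok in tokens if tok]

print(tokenize("Contact: jane@example.com  Skills: Python, C++, Node.js!"))
```

Because the pipeline in Step 2 feeds raw text to `TfidfVectorizer`, a helper like this would only matter if it is applied consistently at both training and prediction time.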
+## Step-by-Step Implementation Guide
+### Step 1: Create Training Dataset
+Start by manually collecting ~50-100 CV-like text samples with position labels.
+**File:** `data/training.csv`
+Example format:
+```
+text,position
+"Experienced in Python, Flask, AWS",Backend Developer
+"Built dashboards with React and TypeScript",Frontend Developer
+"ML projects using pandas, scikit-learn",Data Scientist
+```
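Before training, it can help to confirm that the file parses and to see how many samples each label has. A minimal sketch using the standard `csv` module; the helper name `label_counts` is hypothetical, and it assumes the `text,position` header shown above.

```python
import csv
import io
from collections import Counter

def label_counts(csv_file):
    """Count samples per position label, rejecting rows with a missing field."""
    counts = Counter()
    reader = csv.DictReader(csv_file)
    for row in reader:
        text = (row.get("text") or "").strip()
        position = (row.get("position") or "").strip()
        if not text or not position:
            raise ValueError(
                f"Row ending at line {reader.line_num} is missing text or position"
            )
        counts[position] += 1
    return counts

# Demonstration on the three sample rows from the example format
sample = io.StringIO(
    'text,position\n'
    '"Experienced in Python, Flask, AWS",Backend Developer\n'
    '"Built dashboards with React and TypeScript",Frontend Developer\n'
    '"ML projects using pandas, scikit-learn",Data Scientist\n'
)
print(label_counts(sample))
```

For the real dataset, open it with `open('data/training.csv', newline='', encoding='utf-8')` and pass the file object in. Labels with only one or two samples are worth topping up before training.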
+### Step 2: Train Model
+Implement a classifier using scikit-learn to predict job roles from CV text.
+**File:** `backend/train_model.py`
+```python
+import os
+import joblib
+import pandas as pd
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.linear_model import LogisticRegression
+from sklearn.pipeline import Pipeline
+# Load training data (run this script from the repository root)
+df = pd.read_csv('data/training.csv')
+# Define model pipeline: TF-IDF over unigrams and bigrams, then logistic regression
+model = Pipeline([
+    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
+    ('classifier', LogisticRegression(max_iter=1000))
+])
+# Train model
+model.fit(df['text'], df['position'])
+# Save model, creating the output directory if it does not exist yet
+os.makedirs('backend/model', exist_ok=True)
+joblib.dump(model, 'backend/model/cv_classifier.pkl')
+print("Model trained and saved successfully!")
+```
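The training script fits on every row, so it gives no estimate of how well the classifier generalizes. A minimal sketch of a cross-validated check with the same pipeline follows; the inline samples are made up for illustration (in practice the texts and labels would come from `data/training.csv`), and `cv=2` is chosen only because the toy set is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Made-up illustrative samples, four per label
texts = [
    "Python Flask REST API on AWS", "Django backend with PostgreSQL",
    "Built microservices in Python and Docker", "Flask API deployed to AWS Lambda",
    "React dashboards with TypeScript", "Responsive UI in HTML CSS and React",
    "Frontend components using Vue and TypeScript", "Built React single page app",
    "Machine learning with pandas and scikit-learn", "Trained models using pandas and NumPy",
    "Data analysis and ML pipelines in scikit-learn", "Statistical modelling with pandas",
]
labels = ["Backend Developer"] * 4 + ["Frontend Developer"] * 4 + ["Data Scientist"] * 4

model = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ('classifier', LogisticRegression(max_iter=1000)),
])

# For classifiers, an integer cv uses stratified folds, so each fold keeps all labels
scores = cross_val_score(model, texts, labels, cv=2)
print(f"Accuracy per fold: {scores}")
print(f"Mean accuracy: {scores.mean():.2f}")
```

With only 50-100 samples the per-fold numbers will be noisy, so they are better read as a sanity check than as a benchmark.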
+### Step 3: Test Prediction Locally
+Create a script to verify your model works correctly.
+**File:** `backend/predict.py`
+```python
+import joblib
+import sys
+def predict_role(cv_text):
+    # Load the trained model
+    model = joblib.load('backend/model/cv_classifier.pkl')
+    # Make prediction
+    prediction = model.predict([cv_text])[0]
+    confidence = max(model.predict_proba([cv_text])[0]) * 100
+    return {
+        'predicted_position': prediction,
+        'confidence': f"{confidence:.2f}%"
+    }
+if __name__ == "__main__":
+    if len(sys.argv) > 1:
+        # Get CV text from command line argument
+        cv_text = sys.argv[1]
+    else:
+        # Example CV text
+        cv_text = "Experienced Python developer with 5 years of experience in Flask and AWS."
+    result = predict_role(cv_text)
+    print(f"Predicted Position: {result['predicted_position']}")
+    print(f"Confidence: {result['confidence']}")
+```
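`predict_role` reports only the single most probable label. Since `predict_proba` returns one probability per entry of `model.classes_`, listing a few runner-up positions is a small extension. The helper below is a hypothetical sketch (the name `top_positions` is not part of the project); it takes plain sequences so it can be tried without a trained model, and the probabilities in the demonstration are made up.

```python
def top_positions(classes, probabilities, k=3):
    """Return the k most probable (position, percent) pairs, best first."""
    ranked = sorted(zip(classes, probabilities), key=lambda pair: pair[1], reverse=True)
    return [(position, round(float(prob) * 100, 2)) for position, prob in ranked[:k]]

# Demonstration with made-up probabilities
classes = ["Backend Developer", "Data Scientist", "Frontend Developer"]
probabilities = [0.62, 0.27, 0.11]
print(top_positions(classes, probabilities, k=2))
```

With the trained pipeline, the call would look like `top_positions(model.classes_, model.predict_proba([cv_text])[0])`.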
+### Step 4: Add Text Extraction Utility
+Create utilities to extract text from PDF and DOCX files.
+**File:** `backend/utils/text_extractor.py`
+```python
+import fitz  # PyMuPDF
+import docx
+import os
+def extract_text_from_pdf(path):
+    """Extract text from PDF file."""
+    doc = fitz.open(path)
+    text = ""
+    for page in doc:
+        text += page.get_text()
+    return text.strip()
+def extract_text_from_docx(path):
+    """Extract text from DOCX file."""
+    doc = docx.Document(path)
+    text = "\n".join([paragraph.text for paragraph in doc.paragraphs])
+    return text.strip()
+def extract_text(file_path):
+    """Extract text from a PDF, DOCX, or plain-text file."""
+    extension = os.path.splitext(file_path)[1].lower()
+    if extension == '.pdf':
+        return extract_text_from_pdf(file_path)
+    elif extension == '.docx':
+        # python-docx reads only .docx; legacy .doc files must be converted first
+        return extract_text_from_docx(file_path)
+    elif extension == '.txt':
+        with open(file_path, 'r', encoding='utf-8') as f:
+            return f.read().strip()
+    else:
+        raise ValueError(f"Unsupported file extension: {extension}")
+```
+### Step 5: Add Flask API (Simple)
+Create a basic Flask API to accept CV uploads and return predictions.
+**File:** `backend/app.py`
+```python
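+from flask import Flask, request, jsonify
+from werkzeug.utils import secure_filename
+from utils.text_extractor import extract_text
+import joblib
+import os
+# Resolve paths from this file so the app works from any working directory
+BASE_DIR = os.path.dirname(os.path.abspath(__file__))
+UPLOAD_DIR = os.path.join(BASE_DIR, "..", "data", "raw_cvs")
+app = Flask(__name__)
+# Ensure the upload directory exists, then load the trained model once at startup
+os.makedirs(UPLOAD_DIR, exist_ok=True)
+model = joblib.load(os.path.join(BASE_DIR, "model", "cv_classifier.pkl"))
+@app.route("/predict", methods=["POST"])
+def predict():
+    if 'file' not in request.files:
+        return jsonify({"error": "No file provided"}), 400
+    file = request.files["file"]
+    # Sanitize the client-supplied name to block path traversal
+    filename = secure_filename(file.filename or "")
+    if not filename:
+        return jsonify({"error": "Invalid file name"}), 400
+    file_path = os.path.join(UPLOAD_DIR, filename)
+    file.save(file_path)
+    try:
+        text = extract_text(file_path)
+        prediction = model.predict([text])[0]
+        confidence = max(model.predict_proba([text])[0]) * 100
+        return jsonify({
+            "predicted_position": prediction,
+            "confidence": f"{confidence:.2f}%"
+        })
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500
+if __name__ == "__main__":
+    app.run(debug=True)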
+```
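The endpoint writes every upload to disk before the extractor gets a chance to reject an unsupported type. A common pattern is to check the extension first and answer with a 400 instead. A minimal sketch follows: `ALLOWED_EXTENSIONS` and `allowed_file` are hypothetical names, and the set lists the formats the extractor can actually read (python-docx does not open legacy `.doc` files).

```python
import os

# Assumption: formats the text extractor can read
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".txt"}

def allowed_file(filename):
    """Return True when the file name carries a supported extension."""
    extension = os.path.splitext(filename or "")[1].lower()
    return extension in ALLOWED_EXTENSIONS

print(allowed_file("cv.PDF"), allowed_file("cv.exe"))
```

Inside `predict()`, a guard such as `if not allowed_file(file.filename): return jsonify({"error": "Unsupported file type"}), 400` would go before the call to `file.save`.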
+### Step 6: Install Dependencies
+**File:** `requirements.txt`
+```
+flask
+scikit-learn
+pandas
+joblib
+PyMuPDF
+python-docx
+```
+Install these before running the scripts in the steps above: `pip install -r requirements.txt`
+## Next Steps
+After completing Phase 1, we'll move to:
+1. **Phase 2: Enhanced Model & NLP Features**
+   - Implement BERT or DistilBERT for improved semantic understanding
+   - Add skill extraction from CVs
+   - Develop job-CV matching scoring
+2. **Phase 3: Web Interface & Chatbot**
+   - Develop user interface for admin and candidates
+   - Implement LUNA virtual assistant using LangChain
+   - Add interview scheduling functionality
+3. **Phase 4: Video Interview & Proctoring**
+   - Add video interview capabilities
+   - Implement cheating detection using computer vision
+   - Develop automated scoring system
+4. **Phase 5: AWS Deployment**
+   - Set up AWS infrastructure using Terraform
+   - Deploy application to EC2/Lambda
+   - Configure S3 for file storage
+## Authors
+- Hussein El Saadi
+- Nour Ali Shaito
+## Supervisor
+- Dr. Ali Ezzedine
+## License
+This project is licensed under the MIT License - see the LICENSE file for details.

requirements.txt ADDED

	@@ -0,0 +1,6 @@

+flask
+scikit-learn
+pandas
+joblib
+PyMuPDF
+python-docx