Moncey10 committed on
Commit 5ec9328 · 1 Parent(s): b370ada

Enhance README with project details and setup instructions

Added detailed project description, features, tech stack, installation instructions, and example API response to README.

Files changed (1)
  1. README.md +83 -0
README.md CHANGED
@@ -3,3 +3,86 @@ title: Homework Validation System
  sdk: docker
  app_port: 7860
  ---
+ # Homework Validation System (FastAPI)
+
+ A backend API that validates student homework by extracting text from teacher and student files, comparing answers, and generating remarks using rule-based logic and optional AI.
+
+ ---
+
+ ## Features
+
+ - Upload teacher and student homework files
+ - OCR support for images and scanned PDFs
+ - Text extraction from PDF and DOCX
+ - Similarity matching using TF-IDF + cosine similarity
+ - Optional AI-generated remarks (OpenAI / Gemini)
+ - FastAPI Swagger documentation
+
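The "TF-IDF + cosine similarity" feature listed above can be illustrated with a minimal, dependency-free sketch. This is not the project's `src/similarity.py` (which is not shown in this commit and presumably uses scikit-learn); the function names `tokenize`, `tfidf_vectors`, `cosine_similarity`, and `match_percentage` are hypothetical, and the tokenizer is a simplification. The IDF term uses the smoothed form `ln((1 + n) / (1 + df)) + 1`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using smoothed IDF."""
    token_lists = [tokenize(doc) for doc in docs]
    n_docs = len(token_lists)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_percentage(teacher_text, student_text):
    """Hypothetical helper: similarity of the two texts as a 0-100 integer."""
    teacher_vec, student_vec = tfidf_vectors([teacher_text, student_text])
    return round(100 * cosine_similarity(teacher_vec, student_vec))
```

Under these assumptions, identical texts score 100 and texts with no shared tokens score 0; how the real service maps a score to a status such as "Needs Review" is not stated in the README.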
+ ---
+
+ ## Tech Stack
+
+ - FastAPI
+ - Python
+ - pytesseract
+ - Pillow
+ - pypdf / pdf2image
+ - python-docx
+ - scikit-learn
+ - OpenAI / Gemini (optional)
+
+ ---
+
+ ## Project Structure
+
+ ---
+ homework_validation_system/
+ │
+ ├── app.py
+ ├── requirements.txt
+ ├── artifacts/
+ ├── uploads/
+ ├── src/
+ │   ├── extractors.py
+ │   ├── similarity.py
+ │   ├── llm_client.py
+ │   └── utils.py
+ └── README.md
+ ## Installation
+
+ ### 1. Create Virtual Environment
+ python -m venv myenv
+
+ ### 2. Install Requirements
+ pip install -r requirements.txt
+ ## OCR Setup (Required)
+
+ ### Install Tesseract OCR
+
+ This project uses **Tesseract OCR** for extracting text from images and scanned PDFs.
+
+ #### Windows
+ 1. Download and install Tesseract OCR.
+ 2. Default installation path:
+ 3. Add this path in your code:
+
+ pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
+
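Hard-coding the Windows path as in the line above breaks on other machines. One possible way to make the setting portable is sketched below; it is an illustration, not code from this repository. The helper names `resolve_tesseract_cmd` and `configure_pytesseract` and the `TESSERACT_CMD` environment variable are assumptions introduced here; only the `pytesseract.pytesseract.tesseract_cmd` attribute and the Windows path come from the README.

```python
import os
import shutil

# Path copied from the README line above; adjust if Tesseract is installed elsewhere.
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def resolve_tesseract_cmd(env=None, which=shutil.which, exists=os.path.isfile):
    """Return a path to the Tesseract executable, or None if none is found.

    Lookup order: TESSERACT_CMD environment variable (hypothetical name),
    then the system PATH, then the Windows default location.
    """
    env = os.environ if env is None else env
    override = env.get("TESSERACT_CMD")
    if override:
        return override
    found = which("tesseract")
    if found:
        return found
    if exists(WINDOWS_DEFAULT):
        return WINDOWS_DEFAULT
    return None


def configure_pytesseract():
    """Point pytesseract at the resolved binary; return the path used, or None."""
    cmd = resolve_tesseract_cmd()
    if cmd is None:
        return None
    try:
        import pytesseract  # imported lazily so this module loads without it
    except ImportError:
        return None
    pytesseract.pytesseract.tesseract_cmd = cmd
    return cmd
```

Calling `configure_pytesseract()` once at application startup would keep the Windows default working while letting Linux or Docker deployments rely on the PATH lookup.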
+ ### Run API
+ uvicorn app:app --reload --host 0.0.0.0 --port 8000
+
+ ### Swagger UI:
+
+ http://localhost:8000/docs
+
+ ### Example API Response
+ {
+ "student_id": 1,
+ "homework_id": 10,
+ "status": "Needs Review",
+ "match_percentage": 72,
+ "teacher_extracted_text": "...",
+ "student_extracted_text": "...",
+ "ai_generated_remark": "Good attempt but missing key points.",
+ "llm_used": true
+ }
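A client consuming the example response above might validate it before use. The sketch below is one way to do that with the standard library only; the `ValidationResult` class and `parse_result` function are hypothetical names, and the field types are inferred from the single example shown rather than from a published schema.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ValidationResult:
    """Fields mirror the example response; types are inferred, not documented."""
    student_id: int
    homework_id: int
    status: str
    match_percentage: float
    teacher_extracted_text: str
    student_extracted_text: str
    ai_generated_remark: str
    llm_used: bool


def parse_result(payload):
    """Parse a JSON response body, failing loudly if an expected field is absent."""
    data = json.loads(payload)
    names = [f.name for f in fields(ValidationResult)]
    missing = [name for name in names if name not in data]
    if missing:
        raise ValueError("response is missing fields: " + ", ".join(missing))
    # Unknown extra keys are ignored so the client tolerates additive API changes.
    return ValidationResult(**{name: data[name] for name in names})


# The example response from the README, used here as sample input.
EXAMPLE = """
{
  "student_id": 1,
  "homework_id": 10,
  "status": "Needs Review",
  "match_percentage": 72,
  "teacher_extracted_text": "...",
  "student_extracted_text": "...",
  "ai_generated_remark": "Good attempt but missing key points.",
  "llm_used": true
}
"""

result = parse_result(EXAMPLE)
```

Ignoring unknown keys while rejecting missing ones is a design choice made for this sketch; a stricter client could reject both.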