---
title: Homework Validation System
emoji: 📚
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# Homework Validation System (FastAPI)

A backend API that validates student homework by extracting text from teacher and student files, comparing answers, and generating remarks using rule-based logic and optional AI.

---

## Features

- Upload teacher and student homework files
- OCR support for images and scanned PDFs
- Text extraction from PDF and DOCX
- Similarity matching using TF-IDF + cosine similarity
- Optional AI-generated remarks (OpenAI / Gemini)
- FastAPI Swagger documentation
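
The similarity-matching feature can be illustrated with a small, dependency-free sketch. It is not the project's code: the project lists scikit-learn, whose `TfidfVectorizer` and `cosine_similarity` would normally do this work, and the hand-rolled version below only shows how a TF-IDF cosine score could become a match percentage. The function names `tokenize`, `tfidf_vectors`, and `match_percentage` are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF so terms shared by every document keep a non-zero weight.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_percentage(teacher_text: str, student_text: str) -> int:
    """Return the teacher/student similarity as a whole-number percentage."""
    teacher_vec, student_vec = tfidf_vectors([teacher_text, student_text])
    return round(100 * cosine_similarity(teacher_vec, student_vec))
```

Identical answers score 100, answers with no words in common score 0, and an empty student answer is treated as 0 rather than raising a division error.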

---

## Tech Stack

- FastAPI
- Python
- pytesseract
- Pillow
- pypdf / pdf2image
- python-docx
- scikit-learn
- OpenAI / Gemini (optional)

---

## Project Structure

    homework_validation_system/
    │
    ├── app.py
    ├── requirements.txt
    ├── artifacts/
    ├── uploads/
    ├── src/
    │   ├── extractors.py
    │   ├── similarity.py
    │   ├── llm_client.py
    │   └── utils.py
    └── README.md

## Installation

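### 1. Create Virtual Environment

    python -m venv myenv

Activate it with `myenv\Scripts\activate` on Windows or `source myenv/bin/activate` on Linux/macOS.

### 2. Install Requirements

    pip install -r requirements.txt
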
## OCR Setup (Required)

### Install Tesseract OCR

This project uses **Tesseract OCR** for extracting text from images and scanned PDFs.

#### Windows
1. Download and install Tesseract OCR.
2. Default installation path: 
3. Add this path in your code:

       pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
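
Hard-coding the Windows path can break the same code on Linux, macOS, or inside the Docker image. One possible way around that, sketched below, is to prefer a `tesseract` binary already on `PATH` and use the Windows location only as a fallback. The helper names `find_tesseract` and `configure_pytesseract` are hypothetical and not part of this project or of pytesseract. Only the `pytesseract.pytesseract.tesseract_cmd` attribute comes from the step above.

```python
import shutil
from pathlib import Path
from typing import Optional

# Path taken from the Windows step above; adjust it if Tesseract lives elsewhere.
WINDOWS_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback: str = WINDOWS_TESSERACT) -> Optional[str]:
    """Return a usable Tesseract executable path, or None if none is found."""
    # Prefer whatever is already on PATH (typical on Linux, macOS, and Docker).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise use the explicit location, but only if it exists on this machine.
    if Path(fallback).is_file():
        return fallback
    return None


def configure_pytesseract() -> Optional[str]:
    """Point pytesseract at the discovered executable and return its path."""
    path = find_tesseract()
    if path is not None:
        # Imported lazily so the lookup above works without pytesseract installed.
        import pytesseract
        pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at startup, before any OCR call, keeps the path handling in one place. A return value of `None` signals that Tesseract still needs to be installed.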

## Run API

    uvicorn app:app --reload --host 0.0.0.0 --port 8000

### Swagger UI

http://localhost:8000/docs

## Example API Response

    {
      "student_id": 1,
      "homework_id": 10,
      "status": "Needs Review",
      "match_percentage": 72,
      "teacher_extracted_text": "...",
      "student_extracted_text": "...",
      "ai_generated_remark": "Good attempt but missing key points.",
      "llm_used": true
    }
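
A client consuming this response might load it into a typed object rather than a raw dictionary. The sketch below assumes the field names and types shown in the example response. `ValidationResult` and `parse_result` are illustrative names, not part of the API, and the real service may return fields or null values that this example does not cover.

```python
import json
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Typed view of the example response; field types are inferred from it."""
    student_id: int
    homework_id: int
    status: str
    match_percentage: int
    teacher_extracted_text: str
    student_extracted_text: str
    ai_generated_remark: str
    llm_used: bool


def parse_result(raw: str) -> ValidationResult:
    """Decode a JSON body; raises TypeError if a field is missing or unexpected."""
    return ValidationResult(**json.loads(raw))


sample = """
{
  "student_id": 1,
  "homework_id": 10,
  "status": "Needs Review",
  "match_percentage": 72,
  "teacher_extracted_text": "...",
  "student_extracted_text": "...",
  "ai_generated_remark": "Good attempt but missing key points.",
  "llm_used": true
}
"""

result = parse_result(sample)
print(f"{result.status}: {result.match_percentage}% match")
# prints "Needs Review: 72% match"
```

Because the dataclass constructor rejects unknown or missing keys, a change in the response shape surfaces as an immediate error instead of a silent `None` later on.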