Commit · ce2c75c
Parent(s): 6fd5cfa
update readme
README.md
CHANGED
@@ -0,0 +1,69 @@
# OCR Web Application

## Project Overview
This is a **web-based Optical Character Recognition (OCR) application** built with Streamlit. It supports English and Hindi: users upload an image, and the app extracts its text using GOT OCR 2.0 for English or EasyOCR for Hindi.

## How the Application Works
1. Choose Language: Select either English or Hindi, following the instructions in the sidebar.
2. Upload Image: Use the file uploader to select an image in JPG, JPEG, or PNG format.
3. Text Extraction: The app extracts English text with the GOT OCR 2.0 model and Hindi text with EasyOCR.
4. Keyword Search: After extraction, search the extracted text for specific keywords. Matching keywords are highlighted, and any keywords not found are listed in a warning message.
5. Reset: If needed, reset the session and upload a new image to start over.
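
Steps 3 and 4 above can be sketched in plain Python as follows. This is a minimal illustration, not the code in `app.py`: the names `extract_text`, `search_keywords`, and the `backends` mapping are hypothetical, the OCR backends are passed in as callables rather than calling GOT OCR 2.0 or EasyOCR directly, and wrapping matches in `**...**` is an assumed highlighting style.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type: a backend takes an image object and returns its text.
OcrBackend = Callable[[object], str]


def extract_text(image: object, language: str,
                 backends: Dict[str, OcrBackend]) -> str:
    """Route the image to the OCR backend registered for the chosen language."""
    if language not in backends:
        raise ValueError(f"Unsupported language: {language!r}")
    return backends[language](image)


def search_keywords(text: str,
                    keywords: Iterable[str]) -> Tuple[str, List[str]]:
    """Return (text with matches wrapped in **...**, keywords not found).

    Matching is case-insensitive. All found keywords are highlighted in a
    single pass so that one replacement cannot disturb another.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    found = [k for k in cleaned
             if re.search(re.escape(k), text, re.IGNORECASE)]
    missing = [k for k in cleaned if k not in found]
    if not found:
        return text, missing
    # Longer keywords first, so "total sum" wins over "total" when both match.
    alternatives = sorted(found, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives),
                         re.IGNORECASE)
    highlighted = pattern.sub(lambda m: f"**{m.group(0)}**", text)
    return highlighted, missing


if __name__ == "__main__":
    # Stand-in backends; a real app would wrap the actual OCR calls here.
    demo_backends = {"English": lambda image: "Invoice total: 42 USD"}
    text = extract_text(None, "English", demo_backends)
    highlighted, missing = search_keywords(text, ["total", "usd", "tax"])
    print(highlighted)
    print("Missing:", missing)
```

Passing the backends in as callables keeps the routing logic independent of the OCR libraries, which also makes it easy to exercise without loading a model.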

## Installation and Setup

### Prerequisites:
- **Python 3.8 or higher**
- Required libraries listed in `requirements.txt`

### Installation Steps:
1. **Clone the repository:**
   ```bash
   git clone https://github.com/Trisandhyadevi/OCR.git
   ```

2. **Navigate to the project directory:**
   ```bash
   cd OCR
   ```

3. **Install the required dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

4. **Run the application:**
   ```bash
   streamlit run app.py
   ```

## Description

This web application converts images to text using the GOT OCR 2.0 model for English (Hindi is handled by EasyOCR). Key features of the GOT OCR 2.0 model are summarized below.

## GOT OCR 2.0 Model Overview

The GOT OCR 2.0 model is an end-to-end OCR model designed for accurate text extraction from images. Key features include:

- **Multi-task Learning**: The model handles more than plain-text OCR; the upstream project also describes formatted output for content such as formulas and tables.
- **End-to-End Pipeline**: It processes whole images end to end, identifying and extracting text without additional preprocessing steps.

Note: The model does not currently support all languages; languages not covered by the pre-trained model require fine-tuning. For more information on fine-tuning, see the [GOT OCR 2.0 Fine-tuning Guide](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/?tab=readme-ov-file#fine-tune).

For more technical details about the model architecture and usage, see the [GOT OCR 2.0 Model Documentation](https://github.com/Ucas-HaoranWei/GOT-OCR2.0/?tab=readme-ov-file#general-ocr-theory-towards-ocr-20-via-a-unified-end-to-end-model).

## Deployment
The application can be deployed to a cloud platform such as Hugging Face Spaces.

## Folder Structure
```bash
.
├── app.py               # Main application file
├── requirements.txt     # Python dependencies
└── README.md            # Project documentation
```

## Dependencies
1. Streamlit: Web framework used to build the interactive interface.
2. Transformers: Loads the GOT OCR 2.0 model.
3. EasyOCR: Extracts Hindi text.
4. Torchvision: Handles image transformations.
5. Pillow: Image processing library.
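
As a quick sanity check after installation, a short script along these lines can report which of the listed libraries are not importable. It is only a sketch: the helper name `missing_modules` is hypothetical, and the import names are assumed from the packages above (Pillow is imported as `PIL`); the actual `requirements.txt` may list further packages.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names assumed for the packages listed above (Pillow imports as PIL).
REQUIRED_MODULES = ["streamlit", "transformers", "easyocr", "torchvision", "PIL"]


def missing_modules(modules: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Using `find_spec` only looks the modules up; it does not import them, so the check stays fast even for heavy libraries.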
app.py
CHANGED
@@ -75,6 +75,15 @@ if 'reset' not in st.session_state:
 
 if 'language' not in st.session_state:
     st.session_state.language = False
+
+with st.sidebar:
+    st.header("Instructions")
+    st.write("1. Choose a language (English or Hindi).")
+    st.write("2. Upload an image in JPG, PNG, or JPEG format.")
+    st.write("3. The app will extract text from the image using OCR.")
+    st.write("4. Enter keywords to search within the extracted text.")
+    st.write("5. If needed, click 'Reset' to upload a new image.")
+
 
 st.header("Optical Character Recognition ")
 col1, col2 = st.columns(2)