Update README.md
README.md
CHANGED
@@ -1,11 +1,125 @@
---
title: WallTD V.1
-emoji:
-colorFrom:
-colorTo:
+emoji: 💻
+colorFrom: purple
+colorTo: purple
sdk: docker
pinned: false
license: afl-3.0
---

# WallD-v.1

A FastAPI-based application for ***document summarization***, ***image interpretation***, ***data visualization***, and ***text translation*** using state-of-the-art machine learning models from Hugging Face.

## Overview

This project provides a web API that allows users to:

* **Summarize** documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
* **Interpret** images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
* **Generate Visualizations** from Excel data using AI-generated plotting code.
* **Translate** text from documents into various languages.
* **Answer Questions** about the content of documents and images.

The application uses `BART` for summarization, `Kosmos-2` for image interpretation, `StarCoder` for code generation, and `M2M100` for translation. It also includes a question-answering capability. All of these are powered by Hugging Face's `Inference` API and `Transformers` library.

## Features

* **Document Summarization:** Extracts key points from large documents.
* **Image Interpretation:** Describes image content, including any visible text.
* **Data Visualization:** Generates Python plotting code for Excel data using `pandas`, `matplotlib`, and `seaborn`.
* **Text Translation:** Translates document text into supported languages.
* **Question Answering:** Answers user questions about document content or image details.
* **File Management:** Uploads files, processes them, and provides downloadable results.

## Requirements

The app requires `Python 3.9.11` (download it from the [Python 3.9.11 release page](https://www.python.org/downloads/release/python-3911/)).
All dependencies are listed in `requirements.txt`.

## Installation

1. **Clone the Repository:**

   ```
   git clone https://github.com/yourusername/docsumm-vision-api.git
   cd docsumm-vision-api
   ```
2. **Install Dependencies:**

   ```
   pip install -r requirements.txt
   ```
3. **Set Environment Variables:**

   * Create a `.env` file or set the `HF_TOKEN` environment variable with your Hugging Face API token:
     * **On Linux:** `export HF_TOKEN="your-huggingface-api-token"`
     * **On Windows:** `set HF_TOKEN=your-huggingface-api-token`
4. **Run the Application:**
   `uvicorn main:app --reload`
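
Because the application depends on `HF_TOKEN` being set, it can help to fail early with a clear message when the variable is missing. The following is a minimal sketch of such a check; `get_hf_token` is a hypothetical helper introduced here for illustration and is not taken from this repository.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token stored in HF_TOKEN.

    `env` defaults to the process environment; passing a plain dict
    makes the function easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        # Hypothetical error message; the real app may handle this differently.
        raise RuntimeError("HF_TOKEN is not set; export it before starting the app")
    return token
```

Calling `get_hf_token()` once at startup would surface a missing token before the first request reaches a model.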

## Usage

### Endpoints

**1. Document Summarization & Image Interpretation (`/docsum_imginter`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
  * `task`: `"summarize"` (for documents) or `"interpret"` (for images)
* **Response:**
  * For documents: A summarized file download
  * For images: JSON with a `caption` field (e.g., `{"caption": "A tiger in a forest"}`)
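
Since the `task` value depends on the kind of file being uploaded, a client may want to derive it from the file extension. The sketch below uses the extension lists given above; `choose_task` and the two constant names are hypothetical and exist only for this example.

```python
from pathlib import Path

# Extensions copied from the endpoint description above.
DOCUMENT_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def choose_task(filename: str) -> str:
    """Return "summarize" for documents and "interpret" for images."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix in DOCUMENT_EXTENSIONS:
        return "summarize"
    if suffix in IMAGE_EXTENSIONS:
        return "interpret"
    raise ValueError(f"unsupported file type: {filename!r}")
```

Rejecting unknown extensions on the client side avoids uploading a file the endpoint is not documented to accept.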

**2. Data Visualization (`/generate-visualization`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload an Excel file (XLSX)
  * `task`: Description of the desired plot (e.g., "A bar chart of sales by region")
* **Response:** The generated Python code and a PNG image of the resulting plot.

**3. Text Translation (`/translate`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
  * `task`: Target language (e.g., French)
* **Response:** A translated file download

**4. Question Answering (`/ask`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
  * `task`: A question about the file content
* **Response:** JSON with an answer field
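
The four POST endpoints above all take the same two form fields, `file` and `task`, so one client helper can serve all of them. The following sketch builds such a request with only the Python standard library. `build_upload_request` is a hypothetical name, and the `application/octet-stream` content type for the file part is an assumption; the server may expect something more specific.

```python
import urllib.request
import uuid


def build_upload_request(url: str, filename: str, content: bytes, task: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the `file` and `task` fields."""
    boundary = uuid.uuid4().hex  # random separator between form parts
    task_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="task"\r\n'
        "\r\n"
        f"{task}\r\n"
    ).encode("utf-8")
    # Note: a filename containing double quotes would need escaping first.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = task_part + file_header + content + closing
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request
```

With the server running locally, passing the result of `build_upload_request("http://localhost:8000/ask", "notes.txt", data, "What is this file about?")` to `urllib.request.urlopen` would send the question; the same helper applies to `/translate` and the other POST routes by changing the URL and `task`.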

**5. List Processed Files (`/processed_files`):**

* **Method:** GET
* **Response:** JSON list of processed file names

**6. Download Processed File (`/download/{filename}`):**

* **Method:** GET
* **Response:** File download
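
Because the file name is part of the URL path, names containing spaces or other reserved characters need percent-encoding before the request is made. A small sketch, assuming the server is reachable at a base URL such as `http://localhost:8000`; `download_url` is a hypothetical helper.

```python
from urllib.parse import quote


def download_url(base_url: str, filename: str) -> str:
    """Return the /download/{filename} URL with the file name percent-encoded."""
    # safe="" also encodes "/" so the name cannot add extra path segments.
    encoded = quote(filename, safe="")
    return base_url.rstrip("/") + "/download/" + encoded
```

The names to pass in would typically come from the JSON list returned by `/processed_files`.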

### Frontend

* Access the basic frontend at `http://localhost:8000/` (serves `frontend/index.html`).

## Notes

* **API Token:** You must have a valid Hugging Face API token (`HF_TOKEN`) to use the `InferenceClient`.
* **File Cleanup:** Processed files are stored in the `processed/` directory; temporary uploads are stored in `updates/` and deleted after image interpretation.
* **Limitations:**
  * Visualization supports only Excel files.
  * Summarization supports only files written in English.
  * Image interpretation can only be applied to images that contain no text.
  * Translation supports the following languages: *French*, *English*, *Spanish*, *German*, *Arabic*, *Chinese (Mandarin Chinese)*, *Japanese*, *Russian*.
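
The `/translate` endpoint takes the target language as a name (for example, French), while `M2M100` identifies languages by short codes. The sketch below shows one way a caller could validate a language name against the supported list and look up its `M2M100` code. Whether the application performs the lookup this way is an assumption; `LANGUAGE_CODES` and `language_code` are hypothetical names.

```python
# Supported languages from the list above, mapped to M2M100 language codes.
LANGUAGE_CODES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "japanese": "ja",
    "russian": "ru",
}


def language_code(name: str) -> str:
    """Return the code for a supported language name, ignoring case and padding."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"unsupported language {name!r}; expected one of: {supported}")
    return LANGUAGE_CODES[key]
```

Checking the name before uploading gives the user an immediate error for languages outside the supported set.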

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference