---
title: WallTD V.1
emoji: 💻
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: afl-3.0
---
# WallD-v.1

A FastAPI-based application for ***document summarization***, ***image interpretation***, ***data visualization***, and ***text translation*** using state-of-the-art machine learning models from Hugging Face.

## Overview

This project provides a web API that allows users to:

* **Summarize** documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
* **Interpret** images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
* **Generate Visualizations** from Excel data using AI-generated plotting code.
* **Translate** text from documents into various languages.
* **Answer Questions** about the content of documents and images.

The application uses `BART` for summarization, `Kosmos-2` for image interpretation, `StarCoder` for code generation, and `M2M100` for translation, and it can also answer questions about uploaded content. All of this is powered by the Hugging Face Inference API (`InferenceClient`) and the `transformers` library.

## Features

* **Document Summarization:** Extracts key points from large documents.
* **Image Interpretation:** Describes image content, including any visible text.
* **Data Visualization:** Generates Python plotting code for Excel data using `pandas`, `matplotlib`, and `seaborn`.
* **Text Translation:** Translates document text into supported languages.
* **Question Answering:** Answers user questions about document content or image details.
* **File Management:** Handles file uploads, processes the files, and provides downloadable results.

## Requirements

The app requires Python 3.9.11 ([download Python 3.9.11](https://www.python.org/downloads/release/python-3911/)).
All dependencies are listed in `requirements.txt`.

## Installation

1. **Clone the Repository:**

   ```
   git clone https://github.com/yourusername/docsumm-vision-api.git
   cd docsumm-vision-api
   ```
2. **Install Dependencies:**

   ```
   pip install -r requirements.txt
   ```
3. **Set Environment Variables:**

   * Create a `.env` file or set the `HF_TOKEN` environment variable to your Hugging Face API token:
     * **On Linux/macOS:** `export HF_TOKEN="your-huggingface-api-token"`
     * **On Windows (Command Prompt):** `set HF_TOKEN=your-huggingface-api-token`
4. **Run the Application:**

   ```
   uvicorn main:app --reload
   ```
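
Before starting the server, it can help to confirm that the token is actually visible to Python. The sketch below uses a hypothetical helper, `get_hf_token`, which is not part of this repository; it reads only the process environment and does not parse a `.env` file.

```python
import os


def get_hf_token() -> str:
    """Return the Hugging Face token from the HF_TOKEN environment variable.

    Hypothetical helper for illustration; the app's own token handling is not shown here.
    """
    token = os.environ.get("HF_TOKEN", "").strip()
    if not token:
        # Fail early with an actionable message instead of a later authentication error.
        raise RuntimeError("HF_TOKEN is not set; export it before running uvicorn.")
    return token
```

Running `python -c "import os; print(bool(os.environ.get('HF_TOKEN')))"` in the same shell gives the same check without any helper.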

## Usage

### Endpoints

**1. Document Summarization & Image Interpretation (`/docsum_imginter`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
  * `task`: `"summarize"` (for documents) or `"interpret"` (for images)
* **Response:**
  * For documents: A summarized file download
  * For images: JSON with a `caption` field (e.g., `{"caption": "A tiger in a forest"}`)
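
A client only needs to send a `multipart/form-data` POST with the `file` and `task` fields listed above. The following is a minimal stdlib-only sketch under that assumption; `build_form_request` is a hypothetical name, and only the two field names come from this README.

```python
import mimetypes
import urllib.request
import uuid


def build_form_request(url: str, filename: str, content: bytes, task: str) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the `file` and `task` form fields.

    Hypothetical client helper: only the field names come from the README.
    """
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"

    # Text part: the task ("summarize", "interpret", a language, or a question).
    task_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="task"\r\n\r\n'
        f"{task}\r\n"
    ).encode("utf-8")
    # File part header; the raw file bytes follow it unchanged.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        url,
        data=task_part + file_header + content + closing,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it while the server is running, pass the request to `urllib.request.urlopen(req)`. The `/generate-visualization`, `/translate`, and `/ask` endpoints take the same two fields, so only the URL and the `task` value change (for example, `task="French"` for `/translate`).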

**2. Data Visualization (`/generate-visualization`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload an Excel file (XLSX)
  * `task`: Description of the desired plot (e.g., "A bar chart of sales by region")
* **Response:** The generated Python code and a PNG image of the resulting plot.

**3. Text Translation (`/translate`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
  * `task`: Target language (e.g., French)
* **Response:** A translated file download

**4. Question Answering (`/ask`):**

* **Method:** POST
* **Form Data:**
  * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
  * `task`: A question about the file content
* **Response:** JSON with an `answer` field

**5. List Processed Files (`/processed_files`):**

* **Method:** GET
* **Response:** JSON list of processed file names

**6. Download Processed File (`/download/{filename}`):**

* **Method:** GET
* **Response:** File download
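
The two GET endpoints can be combined to fetch every processed result. This is a hedged sketch: the helper names are hypothetical, `localhost:8000` is taken from the Frontend section, and it assumes `/processed_files` returns a bare JSON array of file names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the server runs locally on port 8000, as in the Frontend section.
BASE_URL = "http://localhost:8000"


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Return the /download/{filename} URL, percent-encoding spaces and similar characters."""
    return f"{base_url.rstrip('/')}/download/{urllib.parse.quote(filename)}"


def list_processed_files(base_url: str = BASE_URL) -> list:
    """GET /processed_files; assumes the body is a bare JSON array of names."""
    with urllib.request.urlopen(f"{base_url.rstrip('/')}/processed_files") as resp:
        return json.load(resp)


def download(filename: str, dest: str, base_url: str = BASE_URL) -> None:
    """Save one processed file to the local path `dest`."""
    with urllib.request.urlopen(download_url(filename, base_url)) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

A loop such as `for name in list_processed_files(): download(name, name)` would then mirror the server's `processed/` directory locally.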

### Frontend

* Access the basic frontend at `http://localhost:8000/` (serves `frontend/index.html`).

  ![1743848842014](image/README/1743848842014.png)

## Notes

* **API Token:** You must have a valid Hugging Face API token (`HF_TOKEN`) to use the InferenceClient.
* **File Cleanup:** Processed files are stored in the `processed/` directory; temporary uploads are in `updates/` and deleted after image interpretation.
* **Limitations:**
  * Visualization supports only Excel files.
  * Summarization supports only English-language documents.
  * Image interpretation works only on images that contain no text.
  * Translation supports the following languages: *French*, *English*, *Spanish*, *German*, *Arabic*, *Chinese (Mandarin Chinese)*, *Japanese*, and *Russian*.
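
Each supported translation language corresponds to an ISO 639-1 code that M2M100 accepts. As a hedged sketch, the `task` value sent to `/translate` could be validated and mapped like this before calling the model. The mapping and `target_lang_code` are assumptions, not the project's code; in `transformers`, such a code is typically passed to `M2M100Tokenizer.get_lang_id` to force the output language.

```python
# Hypothetical mapping from the README's supported target languages to the
# ISO 639-1 codes used by M2M100; the app's actual lookup is not documented.
M2M100_CODES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "mandarin chinese": "zh",
    "chinese (mandarin chinese)": "zh",
    "japanese": "ja",
    "russian": "ru",
}


def target_lang_code(task: str) -> str:
    """Map the /translate `task` value to an M2M100 language code."""
    key = " ".join(task.lower().split())  # case- and whitespace-insensitive lookup
    if key not in M2M100_CODES:
        raise ValueError(f"Unsupported target language: {task!r}")
    return M2M100_CODES[key]
```

Rejecting unknown names up front yields a clear error rather than a failed or mis-targeted translation.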

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference