
preview code

raw

history blame contribute delete

4.65 kB

---
title: WallTD V.1
emoji: 💻
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: afl-3.0
---
# WallTD-v.1

A FastAPI application for *document summarization, image interpretation, data visualization, text translation, and question answering*, built on pretrained machine learning models from Hugging Face.

## Overview

This project provides a web API that lets users:

* Summarize documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
* Interpret images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
* Generate visualizations from Excel data using AI-generated plotting code.
* Translate the text of documents into several supported languages.
* Answer questions about the content of documents and images.
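
The overview ties each file type to a task. As a minimal sketch, assuming the server accepts exactly the extensions listed here and the `task` values `"summarize"` and `"interpret"` documented for `/docsum_imginter` under Usage, an upload could be checked before any model is called. The function and constant names are hypothetical, not taken from the app's code.

```python
from pathlib import Path

# Hypothetical constants: extensions copied from the README's overview.
DOCUMENT_EXTS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def validate_docsum_request(filename: str, task: str) -> str:
    """Return "document" or "image" for an accepted (file, task) pair; raise ValueError otherwise.

    Hypothetical check: "summarize" applies to documents, "interpret" to images.
    """
    ext = Path(filename).suffix.lower()  # case-insensitive, so "report.PDF" is accepted
    if task == "summarize" and ext in DOCUMENT_EXTS:
        return "document"
    if task == "interpret" and ext in IMAGE_EXTS:
        return "image"
    raise ValueError(f"task {task!r} is not supported for {ext or 'files without an extension'}")
```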

The application uses `BART` for summarization, `Kosmos-2` for image interpretation, `StarCoder` for code generation, and `M2M100` for translation, plus a question-answering model, all accessed through Hugging Face's Inference API and the `transformers` library.
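
StarCoder returns free text, so the plotting code has to be pulled out of its completion before it can be run. The README does not say how the app does this; the hypothetical helper below is a sketch that assumes the model wraps its answer in a Markdown code fence and falls back to the whole completion when it does not.

```python
import re

FENCE = "`" * 3  # three backticks: the marker that opens and closes a Markdown code block
# Opening fence with an optional python/py tag, then the code up to the next fence.
CODE_BLOCK_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_plot_code(completion: str) -> str:
    """Return the first fenced code block in a model completion, or the whole completion if none.

    Hypothetical helper; the app's real post-processing of StarCoder output is not documented.
    """
    match = CODE_BLOCK_RE.search(completion)
    code = match.group(1) if match else completion
    return code.strip()
```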

## Features

* Document Summarization: Extracts the key points from large documents.
* Image Interpretation: Describes image content, including any visible text.
* Data Visualization: Generates Python plotting code for Excel data using `pandas`, `matplotlib`, and `seaborn`.
* Text Translation: Translates document text into the supported languages.
* Question Answering: Answers user questions about document content or image details.
* File Management: Accepts uploaded files, processes them, and makes the results available for download.
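
The README does not describe how large documents are fed to BART, whose checkpoints (for example `facebook/bart-large-cnn`) accept at most 1024 input tokens. One common approach, shown here as an assumption rather than the app's method, is to split the extracted text into overlapping word windows, summarize each window, and join the partial summaries. The 700-word default is a rough heuristic that leaves headroom below 1024 tokens for English text; `chunk_words` is a hypothetical name.

```python
def chunk_words(text: str, max_words: int = 700, overlap: int = 50) -> list[str]:
    """Split text into overlapping word windows small enough for a BART-style summarizer.

    Hypothetical helper; the README does not say how long documents are split.
    """
    if max_words <= overlap:
        raise ValueError("max_words must be larger than overlap")
    words = text.split()
    step = max_words - overlap  # each window repeats the last `overlap` words of the previous one
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):  # this window already reaches the end of the text
            break
    return chunks
```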

## Requirements

The app requires Python 3.9.11 (download it from the [Python 3.9.11 release page](https://www.python.org/downloads/release/python-3911/)).
All dependencies are listed in `requirements.txt`.

## Installation

1. Clone the repository:

   ```
   git clone https://github.com/yourusername/docsumm-vision-api.git
   cd docsumm-vision-api
   ```

2. Install the dependencies:

   ```
   pip install -r requirements.txt
   ```

3. Set the `HF_TOKEN` environment variable to your Hugging Face API token, or put it in a `.env` file:

   * Linux/macOS: `export HF_TOKEN="your-huggingface-api-token"`
   * Windows (Command Prompt): `set HF_TOKEN=your-huggingface-api-token`

4. Run the application:

   `uvicorn main:app --reload`
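
The token lookup described in step 3 can be sketched with the standard library alone. This is an assumption about how the app reads the token: it checks the `HF_TOKEN` environment variable first and then a simple `KEY=value` line in `.env`. The real app may use a package such as `python-dotenv` instead; `load_hf_token` is a hypothetical name.

```python
import os
from pathlib import Path


def load_hf_token(env_file: str = ".env") -> str:
    """Return the Hugging Face token from HF_TOKEN, falling back to a .env file.

    Hypothetical helper: the README only says to export HF_TOKEN or use a .env file.
    """
    token = os.environ.get("HF_TOKEN")
    if token:
        return token
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines, comments, and anything that is not KEY=value.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "HF_TOKEN":
                return value.strip().strip('"').strip("'")  # allow quoted values
    raise RuntimeError("HF_TOKEN is not set; export it or add it to .env")
```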

## Usage

### Endpoints

1. Document Summarization & Image Interpretation (`/docsum_imginter`):

   * Method: POST
   * Form data:
     * `file`: the file to process (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
     * `task`: `"summarize"` (for documents) or `"interpret"` (for images)
   * Response:
     * For documents: the summarized file as a download
     * For images: JSON with a `caption` field (e.g., `{"caption": "A tiger in a forest"}`)

2. Data Visualization (`/generate-visualization`):

   * Method: POST
   * Form data:
     * `file`: an Excel file (XLSX)
     * `task`: a description of the desired plot (e.g., "A bar chart of sales by region")
   * Response: the generated Python plotting code and a PNG image of the resulting plot

3. Text Translation (`/translate`):

   * Method: POST
   * Form data:
     * `file`: the file to translate (DOCX, XLSX, PPTX, PDF, TXT)
     * `task`: the target language (e.g., French)
   * Response: the translated file as a download

4. Question Answering (`/ask`):

   * Method: POST
   * Form data:
     * `file`: the file to ask about (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
     * `task`: a question about the file's content
   * Response: JSON with an answer field

5. List Processed Files (`/processed_files`):

   * Method: GET
   * Response: JSON list of processed file names

6. Download Processed File (`/download/{filename}`):

   * Method: GET
   * Response: the requested processed file as a download
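
All POST endpoints above take the same two form fields, `file` and `task`. The sketch below builds such a multipart/form-data request with the standard library only, assuming the server runs locally on uvicorn's default port 8000; `build_form_request` is a hypothetical helper, not part of the project.

```python
import urllib.request
import uuid
from pathlib import Path


def build_form_request(url: str, file_path: str, task: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying the `task` and `file` form fields.

    Hypothetical client helper for endpoints such as /ask or /translate.
    """
    boundary = uuid.uuid4().hex  # random separator that will not occur in the form text
    path = Path(file_path)
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="task"\r\n\r\n'
        f"{task}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")  # closing boundary ends the body
    return urllib.request.Request(
        url,
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

For example, `urllib.request.urlopen(build_form_request("http://localhost:8000/ask", "report.pdf", "What is the main conclusion?"))` would send a question to `/ask`. Tools such as `curl -F` or the `files=` argument of the `requests` library produce the same kind of body.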
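
`/processed_files` and `/download/{filename}` both work on the `processed/` directory mentioned under Notes. A hedged sketch of the file-system side follows; it assumes processed files sit directly in that directory and rejects any filename that would resolve outside it (for example `../main.py`). Both function names are hypothetical.

```python
from pathlib import Path

PROCESSED_DIR = Path("processed")  # directory named in the README's Notes


def list_processed_files(directory: Path = PROCESSED_DIR) -> list[str]:
    """Names of regular files in the processed directory, sorted for stable output."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def resolve_download(filename: str, directory: Path = PROCESSED_DIR) -> Path:
    """Map a /download/{filename} value to a file directly inside `directory`.

    Hypothetical guard: anything that escapes the directory, or is not a regular file, is rejected.
    """
    base = directory.resolve()
    candidate = (base / filename).resolve()
    if candidate.parent != base or not candidate.is_file():
        raise FileNotFoundError(filename)
    return candidate
```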

### Frontend

* A basic frontend is served at `http://localhost:8000/` (from `frontend/index.html`).

![1743848842014](image/README/1743848842014.png)

## Notes

* API token: A valid Hugging Face API token (`HF_TOKEN`) is required to use the `InferenceClient`.
* File cleanup: Processed files are stored in the `processed/` directory; temporary uploads are stored in `updates/` and deleted after image interpretation.
* Limitations:
  * Visualization supports only Excel files.
  * Summarization supports only documents written in English.
  * Image interpretation can only be applied to images that contain no text.
  * Translation supports the following languages: French, English, Spanish, German, Arabic, Chinese (Mandarin), Japanese, and Russian.
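
`M2M100` identifies languages by ISO 639-1 codes rather than English names, so a free-text `task` such as `French` has to be mapped to a code before translation. This is a sketch assuming the app accepts the eight names listed above, case-insensitively; `target_language_code` and the mapping table are hypothetical.

```python
# Hypothetical table: the README's eight languages mapped to the ISO 639-1 codes M2M100 uses.
M2M100_CODES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "mandarin chinese": "zh",
    "japanese": "ja",
    "russian": "ru",
}


def target_language_code(task: str) -> str:
    """Map the free-text `task` field (e.g. "French") to an M2M100 language code."""
    key = " ".join(task.lower().split())  # normalize case and collapse extra whitespace
    try:
        return M2M100_CODES[key]
    except KeyError:
        supported = ", ".join(sorted({name.title() for name in M2M100_CODES}))
        raise ValueError(f"unsupported target language {task!r}; supported: {supported}") from None
```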
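
The cleanup rule for temporary uploads can be sketched as a save, process, delete sequence in which the delete runs even if the model call fails. The sketch assumes the `updates/` directory named above; the random prefix on each saved file is an added assumption so that concurrent uploads with the same name do not collide, and `process_temp_upload` is hypothetical.

```python
import shutil
import uuid
from pathlib import Path
from typing import BinaryIO, Callable


def process_temp_upload(
    upload: BinaryIO,
    filename: str,
    handler: Callable[[Path], str],
    upload_dir: str = "updates",  # temporary upload directory named in the README
) -> str:
    """Save an upload, pass its path to `handler`, and always delete the saved file.

    Hypothetical helper illustrating the cleanup rule, not the app's own code.
    """
    directory = Path(upload_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Keep only the base name and add a random prefix so parallel uploads cannot collide.
    path = directory / f"{uuid.uuid4().hex}_{Path(filename).name}"
    with path.open("wb") as out:
        shutil.copyfileobj(upload, out)
    try:
        return handler(path)  # e.g. the image-interpretation call
    finally:
        path.unlink(missing_ok=True)  # runs on success and on error
```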

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference