Feriel080

Update README.md

700265c verified 10 months ago

title: WallTD V.1
emoji: 💻
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: afl-3.0

WallTD-v.1

A FastAPI-based application for document summarization, image interpretation, data visualization, and text translation using state-of-the-art machine learning models from Hugging Face.

Overview

This project provides a web API that allows users to:

Summarize documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
Interpret images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
Generate Visualizations from Excel data using AI-generated plotting code.
Translate text from documents into various languages.
Answer Questions about the content of documents and images.

The application leverages models like BART for summarization, Kosmos-2 for image interpretation, StarCoder for code generation, M2M100 for translation, and includes a question-answering capability, all powered by Hugging Face's Inference API and Transformers library.

Features

Document Summarization: Extracts key points from large documents.
Image Interpretation: Describes image content, including any visible text.
Data Visualization: Generates Python plotting code for Excel data using pandas, matplotlib, and seaborn.
Text Translation: Translates document text into supported languages.
Question Answering: Answers user questions about document content or image details.
File Management: Uploads files, processes them, and provides downloadable results.

Requirements

The app requires Python 3.9.11, which can be downloaded from the official Python website. All dependencies are listed in requirements.txt.

Installation

Clone the Repository:

git clone https://github.com/yourusername/docsumm-vision-api.git
cd docsumm-vision-api

Install Dependencies:
```
pip install -r requirements.txt
```
Set Environment Variables:
- Create a .env file or set the HF_TOKEN environment variable with your Hugging Face API token:
  - On Linux: export HF_TOKEN="your-huggingface-api-token"
  - On Windows: set HF_TOKEN=your-huggingface-api-token

Run the Application:
- uvicorn main:app --reload
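
Because every model call depends on HF_TOKEN, it can help to check for the token at startup instead of failing on the first request. The following is a minimal sketch; `get_hf_token` is a hypothetical helper, and this README does not show how main.py actually loads the token.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise if it is missing.

    Hypothetical helper for illustration; not taken from the app's source.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or add it to your .env file"
        )
    return token
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.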

Usage

Endpoints

1. Document Summarization & Image Interpretation (/docsum_imginter):

Method: POST
Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
- task: "summarize" (for documents) or "interpret" (for images)
Response:
- For documents: A summarized file download
- For images: JSON with a caption field (e.g., {"caption": "A tiger in a forest"})
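
A client for this endpoint might look like the sketch below. It assumes the server runs at http://localhost:8000 and that the third-party `requests` package is installed; `pick_task` and `submit` are hypothetical helper names, and the form field names follow the description above.

```python
from pathlib import Path

DOCUMENT_EXTS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def pick_task(filename):
    """Choose the task value from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "summarize"
    if ext in IMAGE_EXTS:
        return "interpret"
    raise ValueError(f"Unsupported file type: {ext or filename}")


def submit(path, base_url="http://localhost:8000"):
    """POST a file to /docsum_imginter (base_url is an assumption)."""
    import requests  # third-party: pip install requests

    task = pick_task(path)
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{base_url}/docsum_imginter",
            files={"file": (Path(path).name, fh)},
            data={"task": task},
            timeout=300,
        )
    resp.raise_for_status()
    if task == "interpret":
        return resp.json()["caption"]  # images: JSON with a caption field
    return resp.content  # documents: bytes of the summarized file
```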

2. Data Visualization (/generate-visualization):

Method: POST
Form Data:
- file: Upload an Excel file (XLSX)
- task: Description of the desired plot (e.g., "A bar chart of sales by region")
Response: The generated Python plotting code and a PNG image of the resulting plot.
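
The request can be validated on the client before uploading. This sketch assumes a local server and the third-party `requests` package; `visualization_form` and `request_visualization` are hypothetical names. Since the README does not specify how the code and image are packaged in the response, the sketch returns the raw content type and body without parsing them.

```python
from pathlib import Path


def visualization_form(path, description):
    """Validate inputs and build the form data for /generate-visualization."""
    if Path(path).suffix.lower() != ".xlsx":
        raise ValueError("Visualization accepts only .xlsx files")
    description = description.strip()
    if not description:
        raise ValueError("Describe the desired plot, e.g. a bar chart of sales by region")
    return {"task": description}


def request_visualization(path, description, base_url="http://localhost:8000"):
    """POST the Excel file and plot description (base_url is an assumption)."""
    import requests  # third-party: pip install requests

    data = visualization_form(path, description)
    with open(path, "rb") as fh:
        resp = requests.post(
            f"{base_url}/generate-visualization",
            files={"file": (Path(path).name, fh)},
            data=data,
            timeout=300,
        )
    resp.raise_for_status()
    # Response layout is not documented here, so return it unparsed.
    return resp.headers.get("Content-Type", ""), resp.content
```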

3. Text Translation (/translate):

Method: POST
Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
- task: Target language (e.g., French)
Response: A translated file download
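
Since only the languages listed under Limitations are supported, a client can normalize and check the target language before sending it. The sketch below is illustrative: `normalize_language` is a hypothetical helper, and the two-letter codes are the language codes M2M100 uses for these languages. How the app itself maps names to codes is not shown in this README.

```python
# Supported language names mapped to their M2M100 language codes.
SUPPORTED_LANGUAGES = {
    "French": "fr",
    "English": "en",
    "Spanish": "es",
    "German": "de",
    "Arabic": "ar",
    "Chinese": "zh",
    "Japanese": "ja",
    "Russian": "ru",
}


def normalize_language(name):
    """Return the canonical language name, or raise if it is unsupported."""
    cleaned = name.strip().title()
    if cleaned not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {name!r}; choose one of: {supported}")
    return cleaned
```

The normalized name is what would be sent as the task field, for example `{"task": normalize_language("french")}`.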

4. Question Answering (/ask):

Method: POST
Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
- task: A question about the file content
Response: JSON with an answer field
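
If adding a third-party HTTP client is undesirable, the multipart form can be built with the standard library alone. The following is a sketch under assumptions: the server is at http://localhost:8000, `encode_multipart` and `ask` are hypothetical helpers, and file names are assumed not to contain double quotes.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(fields, file_field, filename, content):
    """Encode text fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        f"Content-Type: {ctype}".encode(),
        b"",
        content,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def ask(path, question, base_url="http://localhost:8000"):
    """POST a file and a question to /ask and return the answer field."""
    body, content_type = encode_multipart(
        {"task": question}, "file", Path(path).name, Path(path).read_bytes()
    )
    req = urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.loads(resp.read().decode("utf-8"))["answer"]
```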

5. List Processed Files (/processed_files):

Method: GET
Response: JSON list of processed file names

6. Download Processed File (/download/{filename}):

Method: GET
Response: File download
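
The two GET endpoints can be combined to list and fetch results. This is a minimal standard-library sketch that assumes a local server; `download_url`, `list_processed`, and `download` are hypothetical names, and the exact shape of the JSON list is not specified here, so it is returned as decoded.

```python
import json
import urllib.request
from urllib.parse import quote


def download_url(filename, base_url="http://localhost:8000"):
    """Build the /download/{filename} URL, percent-encoding the file name."""
    return f"{base_url}/download/{quote(filename, safe='')}"


def list_processed(base_url="http://localhost:8000"):
    """Return the decoded JSON from /processed_files."""
    with urllib.request.urlopen(f"{base_url}/processed_files", timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def download(filename, dest, base_url="http://localhost:8000"):
    """Save a processed file to the local path dest."""
    url = download_url(filename, base_url)
    with urllib.request.urlopen(url, timeout=300) as resp, open(dest, "wb") as out:
        out.write(resp.read())
```

Percent-encoding matters for file names containing spaces or other reserved characters, which would otherwise produce an invalid URL.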

Frontend

Access the basic frontend at http://localhost:8000/ (serves frontend/index.html).

Notes

API Token: You must have a valid Hugging Face API token (HF_TOKEN) to use the InferenceClient.
File Cleanup: Processed files are stored in the processed/ directory; temporary uploads are in updates/ and deleted after image interpretation.
Limitations:
- Visualization supports only Excel files.
- Summarization supports only files written in English.
- Image interpretation can only be applied to images that contain no text.
- Translation supports the following languages: French, English, Spanish, German, Arabic, Chinese (Mandarin), Japanese, and Russian.

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference