title: WallTD V.1
emoji: 💻
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: afl-3.0
WallD-v.1
A FastAPI-based application for document summarization, image interpretation, data visualization, and text translation using state-of-the-art machine learning models from Hugging Face.
Overview
This project provides a web API that allows users to:
- Summarize documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
- Interpret images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
- Generate Visualizations from Excel data using AI-generated plotting code.
- Translate text from documents into various languages.
- Answer Questions about the content of documents and images.
The application leverages models like BART for summarization, Kosmos-2 for image interpretation, StarCoder for code generation, M2M100 for translation, and includes a question-answering capability, all powered by Hugging Face's Inference API and Transformers library.
Features
- Document Summarization: Extracts key points from large documents.
- Image Interpretation: Describes image content, including any visible text.
- Data Visualization: Generates Python plotting code for Excel data using pandas, matplotlib, and seaborn.
- Text Translation: Translates document text into supported languages.
- Question Answering: Answers user questions about document content or image details.
- File Management: Uploads files, processes them, and provides downloadable results.
Requirements
The app requires Python 3.9.11, which can be downloaded from the official Python website.
All dependencies are listed in requirements.txt.
Installation
Clone the Repository:
git clone https://github.com/yourusername/docsumm-vision-api.git
cd docsumm-vision-api
Install Dependencies:
pip install -r requirements.txt
Set Environment Variables:
- Create a .env file or set the HF_TOKEN environment variable with your Hugging Face API token:
  - On Linux: export HF_TOKEN="your-huggingface-api-token"
  - On Windows: set HF_TOKEN=your-huggingface-api-token
Run the Application:
uvicorn main:app --reload
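The token lookup described in the environment-variable step can be sketched as follows. This is a minimal sketch rather than the application's actual code: the helper name `load_hf_token` and the simple KEY=VALUE parsing of the .env file are assumptions made for illustration, and a real project might rely on a package such as python-dotenv instead.

```python
import os
from pathlib import Path
from typing import Optional


def load_hf_token(env_path: str = ".env") -> Optional[str]:
    """Return HF_TOKEN from the environment, falling back to a .env file.

    Hypothetical helper: the real application may load its token differently.
    """
    # An exported environment variable takes precedence.
    token = os.environ.get("HF_TOKEN")
    if token:
        return token

    path = Path(env_path)
    if not path.is_file():
        return None

    # Very small .env parser: KEY=VALUE lines, '#' comments, optional quotes.
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "HF_TOKEN":
            return value.strip().strip('"').strip("'") or None
    return None
```

Calling `load_hf_token()` before starting the server is one way to fail early with a clear message when the token is missing.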
Usage
Endpoints
1. Document Summarization & Image Interpretation (/docsum_imginter):
- Method: POST
- Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
- task: "summarize" (for documents) or "interpret" (for images)
- Response:
- For documents: A summarized file download
- For images: JSON with a caption field (e.g., {"caption": "A tiger in a forest"})
2. Data Visualization (/generate-visualization):
- Method: POST
- Form Data:
- file: Upload an Excel file (XLSX)
- task: Description of the desired plot (e.g., "A bar chart of sales by region")
- Response: The generated Python code and a PNG image of the resulting plot.
3. Text Translation (/translate):
- Method: POST
- Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
- task: Target language (e.g., French)
- Response: A translated file download
4. Question Answering (/ask):
- Method: POST
- Form Data:
- file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
- task: A question about the file content
- Response: JSON with an answer field
5. List Processed Files (/processed_files):
- Method: GET
- Response: JSON list of processed file names
6. Download Processed File (/download/{filename}):
- Method: GET
- Response: File download
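The endpoints above can be exercised from a small Python client. The sketch below is illustrative only: `BASE_URL` assumes uvicorn's default host and port, the helper names (`pick_task`, `download_url`, `post_file`) are hypothetical, and the HTTP call relies on the third-party `requests` package, which is not necessarily listed in requirements.txt.

```python
from pathlib import Path
from urllib.parse import quote

BASE_URL = "http://127.0.0.1:8000"  # assumption: uvicorn's default host and port

# File types accepted by /docsum_imginter, as listed in this README.
DOCUMENT_EXTS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def pick_task(filename: str) -> str:
    """Choose the task value for /docsum_imginter from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "summarize"
    if ext in IMAGE_EXTS:
        return "interpret"
    raise ValueError(f"Unsupported file type: {filename!r}")


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the /download/{filename} URL, percent-encoding the file name."""
    return f"{base_url}/download/{quote(filename)}"


def post_file(endpoint: str, file_path: str, task: str, base_url: str = BASE_URL):
    """POST a file plus a task string as multipart form data.

    Hypothetical helper; returns the requests.Response so the caller can
    read JSON or save the returned file.
    """
    import requests  # third-party; imported lazily so the helpers above work without it

    with open(file_path, "rb") as fh:
        response = requests.post(
            f"{base_url}{endpoint}",
            files={"file": (Path(file_path).name, fh)},
            data={"task": task},
            timeout=300,  # model inference can be slow
        )
    response.raise_for_status()
    return response
```

With the server running, `post_file("/docsum_imginter", "tiger.jpg", pick_task("tiger.jpg")).json()` would be expected to return the caption JSON, and `post_file("/ask", "report.pdf", "What is the main conclusion?")` would send a question about a document.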
Frontend
Notes
- API Token: You must have a valid Hugging Face API token (HF_TOKEN) to use the InferenceClient.
- File Cleanup: Processed files are stored in the processed/ directory; temporary uploads are in updates/ and deleted after image interpretation.
- Limitations:
- Visualization supports only Excel files.
- Summarization supports only files written in English.
- Image interpretation can only be applied to images that contain no text.
- Translation supports the following languages: French, English, Spanish, German, Arabic, Chinese (Mandarin Chinese), Japanese, Russian
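Because translation is limited to the eight languages listed above, a client or server can validate the requested language before calling the model. The following is a sketch under stated assumptions: the mapping uses the two-letter language codes that M2M100 expects, but the names `M2M100_CODES` and `language_code` are hypothetical, and the application may resolve language names in a different way.

```python
# Language names from this README mapped to M2M100's two-letter codes.
M2M100_CODES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "japanese": "ja",
    "russian": "ru",
}


def language_code(name: str) -> str:
    """Return the M2M100 code for a supported language name (case-insensitive)."""
    key = name.strip().lower()
    if key not in M2M100_CODES:
        supported = ", ".join(sorted(n.title() for n in M2M100_CODES))
        raise ValueError(f"Unsupported language {name!r}; choose one of: {supported}")
    return M2M100_CODES[key]
```

Rejecting an unsupported language up front avoids uploading and parsing a document only to fail at the translation step.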
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
