metadata
title: WallTD V.1
emoji: 💻
colorFrom: purple
colorTo: purple
sdk: docker
pinned: false
license: afl-3.0

WallTD-v.1

A FastAPI-based application for document summarization, image interpretation, data visualization, and text translation using state-of-the-art machine learning models from Hugging Face.

Overview

This project provides a web API that allows users to:

  • Summarize documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
  • Interpret images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
  • Generate visualizations from Excel data using AI-generated plotting code.
  • Translate text from documents into various languages.
  • Answer questions about the content of documents and images.

The application uses BART for summarization, Kosmos-2 for image interpretation, StarCoder for code generation, and M2M100 for translation, and it also provides question answering. All models are accessed through Hugging Face's Inference API and the Transformers library.

Features

  • Document Summarization: Extracts key points from large documents.
  • Image Interpretation: Describes image content, including any visible text.
  • Data Visualization: Generates Python plotting code for Excel data using pandas, matplotlib, and seaborn.
  • Text Translation: Translates document text into supported languages.
  • Question Answering: Answers user questions about document content or image details.
  • File Management: Uploads files, processes them, and provides downloadable results.

Requirements

The app requires Python 3.9.11, which can be downloaded from the official Python website. All dependencies are listed in requirements.txt.

Installation

  1. Clone the Repository:

    git clone https://github.com/yourusername/docsumm-vision-api.git
    cd docsumm-vision-api
    
  2. Install Dependencies:

    pip install -r requirements.txt
    
  3. Set Environment Variables:

    • Create a .env file or set the HF_TOKEN environment variable with your Hugging Face API token.
    • On Linux:

        export HF_TOKEN="your-huggingface-api-token"

    • On Windows:

        set HF_TOKEN=your-huggingface-api-token

  4. Run the Application:

    uvicorn main:app --reload
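Because step 3 requires HF_TOKEN to be set before the server can reach Hugging Face, a start-up check along the following lines could fail fast when the variable is missing. This is a minimal sketch, not the project's actual code: the helper name `get_hf_token` is hypothetical, and the real application may read the token differently.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, or raise a clear error if it is unset.

    Hypothetical helper for illustration only.
    """
    # Default to the process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; export it or add it to your .env file."
        )
    return token
```

Calling `get_hf_token()` once at start-up surfaces a missing token immediately rather than on the first request.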

Usage

Endpoints

1. Document Summarization & Image Interpretation (/docsum_imginter):

  • Method: POST
  • Form Data:
    • file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
    • task: "summarize" (for documents) or "interpret" (for images)
  • Response:
    • For documents: A summarized file download
    • For images: JSON with a caption field (e.g., {"caption": "A tiger in a forest"})
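Since the `task` value depends on whether the upload is a document or an image, a client might pick it from the file extension. The sketch below only encodes the extension lists given above; the function name `task_for_file` is hypothetical and is not part of the API.

```python
from pathlib import Path

# Extensions taken from the endpoint description above.
DOCUMENT_EXTENSIONS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def task_for_file(filename: str) -> str:
    """Return 'summarize' for documents and 'interpret' for images."""
    suffix = Path(filename).suffix.lower()
    if suffix in DOCUMENT_EXTENSIONS:
        return "summarize"
    if suffix in IMAGE_EXTENSIONS:
        return "interpret"
    raise ValueError(f"Unsupported file type: {suffix or filename!r}")
```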

2. Data Visualization (/generate-visualization):

  • Method: POST
  • Form Data:
    • file: Upload an Excel file (XLSX)
    • task: Description of the desired plot (e.g., "A bar chart of sales by region")
  • Response: The generated Python code and a PNG image of the resulting plot.

3. Text Translation (/translate):

  • Method: POST
  • Form Data:
    • file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
    • task: Target language (e.g., French)
  • Response: A translated file download

4. Question Answering (/ask):

  • Method: POST
  • Form Data:
    • file: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
    • task: A question about the file content
  • Response: JSON with an answer field
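As an illustration of the form-data layout described above, the following sketch builds a multipart POST for `/ask` using only the Python standard library. It assumes the server runs at `http://localhost:8000` and that the response JSON has the shape `{"answer": ...}`; the function names `build_ask_request` and `ask` are hypothetical.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_ask_request(path: str, question: str,
                      base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a multipart/form-data POST with the 'task' and 'file' fields."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex
    # Text field 'task' first, then the header of the 'file' part.
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="task"\r\n\r\n'
        f"{question}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = head + file_path.read_bytes() + tail
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def ask(path: str, question: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the 'answer' field (assumed response shape)."""
    with urllib.request.urlopen(build_ask_request(path, question, base_url)) as resp:
        return json.load(resp)["answer"]
```

The same request shape should apply to the other POST endpoints by changing the path and the `task` value, though that has not been checked against the server code.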

5. List Processed Files (/processed_files):

  • Method: GET
  • Response: JSON list of processed file names

6. Download Processed File (/download/{filename}):

  • Method: GET
  • Response: File download
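The two GET endpoints can be combined to fetch results programmatically. The sketch below assumes a local server at `http://localhost:8000` and that `/processed_files` returns JSON that `json.load` can parse; the exact response shape is not specified here, so the return value is passed through unchanged. All function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "http://localhost:8000"  # assumed local uvicorn address


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the /download/{filename} URL, percent-encoding the file name."""
    return f"{base_url}/download/{quote(filename, safe='')}"


def list_processed_files(base_url: str = BASE_URL):
    """Return the parsed JSON from /processed_files (shape not verified)."""
    with urllib.request.urlopen(f"{base_url}/processed_files") as resp:
        return json.load(resp)


def download(filename: str, dest: str, base_url: str = BASE_URL) -> None:
    """Save a processed file to the local path 'dest'."""
    with urllib.request.urlopen(download_url(filename, base_url)) as resp, \
            open(dest, "wb") as out:
        out.write(resp.read())
```

Percent-encoding the name keeps file names with spaces or other special characters from producing an invalid URL.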

Frontend

  • Access the basic frontend at http://localhost:8000/ (serves frontend/index.html).


Notes

  • API Token: You must have a valid Hugging Face API token (HF_TOKEN) to use the InferenceClient.
  • File Cleanup: Processed files are stored in the processed/ directory; temporary uploads are in updates/ and deleted after image interpretation.
  • Limitations:
    • Visualization supports only Excel files.
    • Summarization supports only files written in English.
    • Image interpretation can only be applied to images with no text on them.
    • Translation supports the following languages: French, English, Spanish, German, Arabic, Chinese (Mandarin Chinese), Japanese, Russian
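Because translation is limited to the languages listed above, a client can validate the target language before uploading. The sketch below maps each listed language to its ISO 639-1 code, which is also the code M2M100 uses for these languages. Whether the `/translate` endpoint expects the name or the code is an assumption to verify; the earlier example uses the name ("French"). The function name `language_code` is hypothetical.

```python
# Languages from the limitations list, mapped to ISO 639-1 codes.
SUPPORTED_LANGUAGES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "mandarin chinese": "zh",  # alias for the same entry
    "japanese": "ja",
    "russian": "ru",
}


def language_code(name: str) -> str:
    """Return the language code for a supported language name."""
    key = name.strip().lower()
    try:
        return SUPPORTED_LANGUAGES[key]
    except KeyError:
        raise ValueError(f"Unsupported target language: {name!r}") from None
```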

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference