| title: WallTD V.1 | |
| emoji: 💻 | |
| colorFrom: purple | |
| colorTo: purple | |
| sdk: docker | |
| pinned: false | |
| license: afl-3.0 | |
| # WallD-v.1 | |
| A FastAPI-based application for ***document summarization***, ***image interpretation***, ***data visualization***, and ***text translation*** using state-of-the-art machine learning models from Hugging Face. | |
| ## Overview | |
| This project provides a web API that allows users to: | |
| * **Summarize** documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries. | |
| * **Interpret** images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions. | |
| * **Generate Visualizations** from Excel data using AI-generated plotting code. | |
| * **Translate** text from documents into various languages. | |
| * **Answer Questions** about the content of documents and images. | |
| The application leverages models like `BART` for summarization, `Kosmos-2` for image interpretation, `StarCoder` for code generation, `M2M100` for translation, and includes a question-answering capability, all powered by Hugging Face's `Inference` API and `Transformers` library. | |
| ## Features | |
| * **Document Summarization:** Extracts key points from large documents. | |
| * **Image Interpretation:** Describes image content, including any visible text. | |
| * **Data Visualization:** Generates Python plotting code for Excel data using `pandas`, `matplotlib`, and `seaborn`. | |
| * **Text Translation:** Translates document text into supported languages. | |
| * **Question Answering:** Answers user questions about document content or image details. | |
| * **File Management:** Uploads files, processes them, and provides downloadable results. | |
| ## Requirements | |
| The app requires Python 3.9.11 ([download it from python.org](https://www.python.org/downloads/release/python-3911/)). | |
| All dependencies are listed in `requirements.txt`. | |
| ## Installation | |
| 1. **Clone the Repository:** | |
| ``` | |
| git clone https://github.com/yourusername/docsumm-vision-api.git | |
| cd docsumm-vision-api | |
| ``` | |
| 2. **Install Dependencies:** | |
| ``` | |
| pip install -r requirements.txt | |
| ``` | |
| 3. **Set Environment Variables:** | |
| * Set the `HF_TOKEN` environment variable to your Hugging Face API token, either in a `.env` file or in your shell: | |
| **On Linux/macOS:** `export HF_TOKEN="your-huggingface-api-token"` | |
| **On Windows (Command Prompt):** `set HF_TOKEN=your-huggingface-api-token` | |
| 4. **Run the Application:** | |
| `uvicorn main:app --reload` | |
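Before launching the server, it can help to confirm that the token is actually visible to Python. The check below is a minimal sketch, not the project's own startup code; `get_hf_token` is a hypothetical helper name. Note that `os.environ` does not read a `.env` file by itself, so if you rely on one, something has to load it first (for example the `python-dotenv` package). Whether the app does this is not stated in this README.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token, or raise a clear error if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set. Export it in your shell or load it from "
            "your .env file before starting the app."
        )
    return token
```

Calling `get_hf_token()` in the same shell where you plan to run `uvicorn` should return your token; a `RuntimeError` means the variable was not exported in that session.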
| ## Usage | |
| ### Endpoints | |
| **1. Document Summarization & Image Interpretation (`/docsum_imginter`):** | |
| * **Method:** POST | |
| * **Form Data:** | |
| * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP) | |
| * `task`: `"summarize"` (for documents) or `"interpret"` (for images) | |
| * **Response:** | |
| * For documents: A summarized file download | |
| * For images: JSON with a `caption` field (e.g., `{"caption": "A tiger in a forest"}`) | |
| **2. Data Visualization (`/generate-visualization`):** | |
| * **Method:** POST | |
| * **Form Data:** | |
| * `file`: Upload an Excel file (XLSX) | |
| * `task`: Description of the desired plot (e.g., "A bar chart of sales by region") | |
| * **Response:** The generated Python code and a PNG image of the resulting plot. | |
| **3. Text Translation (`/translate`):** | |
| * **Method:** POST | |
| * **Form Data:** | |
| * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT) | |
| * `task`: Target language (e.g., French) | |
| * **Response:** A translated file download | |
| **4. Question Answering (`/ask`):** | |
| * **Method:** POST | |
| * **Form Data:** | |
| * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP) | |
| * `task`: A question about the file content | |
| * **Response:** JSON with an answer field | |
| **5. List Processed Files (`/processed_files`):** | |
| * **Method:** GET | |
| * **Response:** JSON list of processed file names | |
| **6. Download Processed File (`/download/{filename}`):** | |
| * **Method:** GET | |
| * **Response:** File download | |
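The POST endpoints above all take the same two form fields, `file` and `task`, so one small client can cover them. The sketch below uses only the Python standard library and assumes the server runs at `http://localhost:8000` (the default for the `uvicorn` command). `default_task`, `encode_multipart`, and `post_file` are hypothetical helper names, not part of the project. A third-party HTTP library would make this shorter; the standard library is used here only to keep the example dependency-free.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:8000"  # assumption: default uvicorn host and port

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}
DOC_EXTS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}


def default_task(filename):
    """Pick the `task` value for /docsum_imginter from the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "interpret"
    if ext in DOC_EXTS:
        return "summarize"
    raise ValueError(f"Unsupported file type: {ext or filename}")


def encode_multipart(filename, data, task):
    """Build a multipart/form-data body carrying the `task` and `file` fields."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    task_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="task"\r\n\r\n'
        f"{task}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = task_part + file_header + data + closing
    return body, f"multipart/form-data; boundary={boundary}"


def post_file(endpoint, path, task):
    """POST a local file to an endpoint and return the raw response bytes."""
    path = Path(path)
    body, content_type = encode_multipart(path.name, path.read_bytes(), task)
    request = urllib.request.Request(
        BASE_URL + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For example, `post_file("/docsum_imginter", "report.pdf", default_task("report.pdf"))` would upload a hypothetical `report.pdf` for summarization, and `post_file("/translate", "report.pdf", "French")` would request a French translation. The returned bytes are either a file download or JSON, depending on the endpoint.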
| ### Frontend | |
| * Access the basic frontend at `http://localhost:8000/` (serves `frontend/index.html`). | |
|  | |
| ## Notes | |
| * **API Token:** You must have a valid Hugging Face API token (`HF_TOKEN`) to use the InferenceClient. | |
| * **File Cleanup:** Processed files are stored in the `processed/` directory. Temporary uploads are stored in `updates/` and deleted after image interpretation. | |
| * **Limitations:** | |
| * Visualization supports only Excel files. | |
| * Summarization supports only documents written in English. | |
| * Image interpretation works only on images that contain no text. | |
| * Translation supports the following languages: *French*, *English*, *Spanish*, *German*, *Arabic*, *Chinese (Mandarin Chinese)*, *Japanese*, *Russian* | |
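Because translation is limited to the eight languages listed above, a client can validate the target language before uploading anything. The sketch below maps each language name to its two-letter ISO 639-1 code, which is the form M2M100 uses for language identifiers. `LANGUAGE_CODES` and `language_code` are hypothetical names, and whether the `/translate` endpoint expects the name or the code in `task` is an assumption to verify against the app (the example earlier in this README passes the name, "French").

```python
# Supported translation languages mapped to ISO 639-1 codes.
LANGUAGE_CODES = {
    "french": "fr",
    "english": "en",
    "spanish": "es",
    "german": "de",
    "arabic": "ar",
    "chinese": "zh",
    "japanese": "ja",
    "russian": "ru",
}


def language_code(name):
    """Return the code for a supported language name, case-insensitively."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        supported = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"Unsupported language {name!r}; supported: {supported}")
    return LANGUAGE_CODES[key]
```

Rejecting an unsupported language on the client side avoids uploading a large document only to have the request fail.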
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |