Feriel080 committed on
Commit 700265c · verified · 1 Parent(s): f8fc09b

Update README.md

Files changed (1)
  1. README.md +118 -4
README.md CHANGED
@@ -1,11 +1,125 @@
1
  ---
2
  title: WallTD V.1
3
- emoji: 📊
4
- colorFrom: green
5
- colorTo: yellow
6
  sdk: docker
7
  pinned: false
8
  license: afl-3.0
9
  ---
 
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
  title: WallTD V.1
3
+ emoji: 💻
4
+ colorFrom: purple
5
+ colorTo: purple
6
  sdk: docker
7
  pinned: false
8
  license: afl-3.0
9
  ---
10
+ # WallTD V.1
11
 
12
+ A FastAPI-based application for ***document summarization***, ***image interpretation***, ***data visualization***, and ***text translation*** using state-of-the-art machine learning models from Hugging Face.
13
+
14
+ ## Overview
15
+
16
+ This project provides a web API that allows users to:
17
+
18
+ * **Summarize** documents (DOCX, XLSX, PPTX, PDF, TXT) into concise, factual summaries.
19
+ * **Interpret** images (PNG, JPG, JPEG, WEBP) by generating detailed descriptions.
20
+ * **Generate Visualizations** from Excel data using AI-generated plotting code.
21
+ * **Translate** text from documents into various languages.
22
+ * **Answer Questions** about the content of documents and images.
23
+
24
+ The application uses models such as `BART` for summarization, `Kosmos-2` for image interpretation, `StarCoder` for code generation, and `M2M100` for translation, and it includes a question-answering capability. All of these run through Hugging Face's `Inference` API and the `Transformers` library.
25
+
26
+ ## Features
27
+
28
+ * **Document Summarization:** Extracts key points from large documents.
29
+ * **Image Interpretation:** Describes image content, including any visible text.
30
+ * **Data Visualization:** Generates Python plotting code for Excel data using `pandas`, `matplotlib`, and `seaborn`.
31
+ * **Text Translation:** Translates document text into supported languages.
32
+ * **Question Answering:** Answers user questions about document content or image details.
33
+ * **File Management:** Uploads files, processes them, and provides downloadable results.
34
+
35
+ ## Requirements
36
+
37
+ The app requires Python 3.9.11 ([download it from python.org](https://www.python.org/downloads/release/python-3911/)).
38
+ All dependencies are listed in `requirements.txt`.
39
+
40
+ ## Installation
41
+
42
+ 1. **Clone the Repository:**
43
+
44
+ ```
45
+ git clone https://github.com/yourusername/docsumm-vision-api.git
46
+ cd docsumm-vision-api
47
+ ```
48
+ 2. **Install Dependencies:**
49
+
50
+ ```
51
+ pip install -r requirements.txt
52
+ ```
53
+ 3. **Set Environment Variables:**
54
+
55
+ * Create a `.env` file or set the `HF_TOKEN` environment variable to your Hugging Face API token:
56
+   * **On Linux/macOS:** `export HF_TOKEN="your-huggingface-api-token"`
57
+   * **On Windows (Command Prompt):** `set HF_TOKEN=your-huggingface-api-token`
58
+ 4. **Run the Application:**
59
+ `uvicorn main:app --reload`
60
+
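As a rough illustration of step 3, the application presumably reads `HF_TOKEN` from the environment at startup. The sketch below shows one way such a check could look. `load_hf_token` is a hypothetical helper, not a function from this repository, and it only verifies that the variable is set, not that the token is valid.

```python
import os
from typing import Mapping, Optional


def load_hf_token(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the Hugging Face token, or raise if HF_TOKEN is missing or blank.

    Hypothetical helper for illustration; the real app may load the token differently.
    """
    # Fall back to the process environment when no mapping is passed in.
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it or add it to your .env file")
    return token
```

Failing fast like this surfaces a missing token when the server starts rather than on the first model call. Note that a `.env` file is not read automatically; a loader such as `python-dotenv` would have to populate the environment first.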
61
+ ## Usage
62
+
63
+ ### Endpoints
64
+
65
+ **1. Document Summarization & Image Interpretation (`/docsum_imginter`):**
66
+
67
+ * **Method:** POST
68
+ * **Form Data:**
69
+   * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
70
+   * `task`: `"summarize"` (for documents) or `"interpret"` (for images)
71
+ * **Response:**
72
+   * For documents: A downloadable summary file
73
+   * For images: JSON with a `caption` field (e.g., `{"caption": "A tiger in a forest"}`)
74
+
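A client for this endpoint might look like the sketch below. The form field names (`file`, `task`), the task values, and the extension lists come from the description above; `task_for` and `submit` are hypothetical helper names, the base URL assumes a local `uvicorn` server on port 8000, and `submit` assumes the third-party `requests` package is installed.

```python
from pathlib import Path

# Extensions taken from the endpoint description above.
DOCUMENT_EXTS = {".docx", ".xlsx", ".pptx", ".pdf", ".txt"}
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}


def task_for(filename: str) -> str:
    """Pick the `task` form value from the file extension (hypothetical helper)."""
    ext = Path(filename).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "summarize"
    if ext in IMAGE_EXTS:
        return "interpret"
    raise ValueError(f"unsupported file type: {ext or filename}")


def submit(path: str, base_url: str = "http://localhost:8000"):
    """POST the file as multipart form data; assumes `requests` is installed."""
    import requests  # imported lazily so task_for() works without it

    with open(path, "rb") as fh:
        return requests.post(
            f"{base_url}/docsum_imginter",
            files={"file": fh},
            data={"task": task_for(path)},
        )
```

For an image, `submit("tiger.png").json()` would then be expected to contain the `caption` field; for a document, the response body should be the summary file itself.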
75
+ **2. Data Visualization (`/generate-visualization`):**
76
+
77
+ * **Method:** POST
78
+ * **Form Data:**
79
+   * `file`: Upload an Excel file (XLSX)
80
+   * `task`: Description of the desired plot (e.g., "A bar chart of sales by region")
81
+ * **Response:** The generated Python code and a PNG image of the resulting plot.
82
+
83
+ **3. Text Translation (`/translate`):**
84
+
85
+ * **Method:** POST
86
+ * **Form Data:**
87
+   * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT)
88
+   * `task`: Target language (e.g., French)
89
+ * **Response:** A downloadable file containing the translated text
90
+
91
+ **4. Question Answering (`/ask`):**
92
+
93
+ * **Method:** POST
94
+ * **Form Data:**
95
+   * `file`: Upload a file (DOCX, XLSX, PPTX, PDF, TXT, PNG, JPG, JPEG, WEBP)
96
+   * `task`: A question about the file content
97
+ * **Response:** JSON with an answer field
98
+
99
+ **5. List Processed Files (`/processed_files`):**
100
+
101
+ * **Method:** GET
102
+ * **Response:** JSON list of processed file names
103
+
104
+ **6. Download Processed File (`/download/{filename}`):**
105
+
106
+ * **Method:** GET
107
+ * **Response:** File download
108
+
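The two GET endpoints above can be combined to fetch every result in one pass. This is a minimal sketch using only the standard library. It assumes the server runs locally on port 8000 and that `/processed_files` returns a plain JSON array of file names; the exact response shape is not confirmed by this README, and `download_url` and `fetch_all` are hypothetical names.

```python
import json
import os
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local dev server


def download_url(filename: str, base_url: str = BASE_URL) -> str:
    """Build the /download/{filename} URL, percent-encoding the file name."""
    return f"{base_url}/download/{quote(filename, safe='')}"


def fetch_all(base_url: str = BASE_URL) -> None:
    """Download every processed file into the current directory."""
    with urlopen(f"{base_url}/processed_files") as resp:
        names = json.load(resp)  # assumed to be a JSON array of strings
    for name in names:
        # Keep only the base name so a crafted entry cannot escape the directory.
        target = os.path.basename(name)
        with urlopen(download_url(name, base_url)) as resp, open(target, "wb") as out:
            out.write(resp.read())
```

Percent-encoding matters because processed file names may contain spaces or other characters that are not valid in a URL path.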
109
+ ### Frontend
110
+
111
+ * Access the basic frontend at `http://localhost:8000/` (serves `frontend/index.html`).
112
+
113
+ ![1743848842014](image/README/1743848842014.png)
114
+
115
+ ## Notes
116
+
117
+ * **API Token:** You must have a valid Hugging Face API token (`HF_TOKEN`) to use the InferenceClient.
118
+ * **File Cleanup:** Processed files are stored in the `processed/` directory; temporary uploads are in `updates/` and deleted after image interpretation.
119
+ * **Limitations:**
120
+   * Visualization supports only Excel files.
121
+   * Summarization supports only files written in English.
122
+   * Image interpretation works only on images that contain no text.
123
+   * Translation supports the following languages: *French*, *English*, *Spanish*, *German*, *Arabic*, *Chinese (Mandarin)*, *Japanese*, *Russian*.
124
+
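Because `/translate` takes the target language as free text in `task`, a client may want to check it against the list above before uploading. The sketch below is one possible client-side check. `normalize_language` is a hypothetical helper, and the exact spellings the server accepts (for example "Chinese" versus "Mandarin Chinese") are an assumption.

```python
# Languages listed in the limitations above.
SUPPORTED_LANGUAGES = (
    "French", "English", "Spanish", "German",
    "Arabic", "Chinese", "Japanese", "Russian",
)


def normalize_language(name: str) -> str:
    """Return the canonical language name, or raise ValueError if unsupported."""
    wanted = name.strip().casefold()
    for lang in SUPPORTED_LANGUAGES:
        if lang.casefold() == wanted:
            return lang
    raise ValueError(f"unsupported target language: {name!r}")
```

Rejecting an unsupported language locally avoids uploading a large document only to have the request fail.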
125
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference