File size: 6,853 Bytes
a2c0a02
 
 
 
 
 
 
 
 
 
 
ffdedc7
 
 
 
3e3bf83
ffdedc7
 
 
 
 
 
 
 
 
 
 
 
 
3e3bf83
 
ffdedc7
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
---
title: Genipapo Parser - Web Version
emoji: 🧉
colorFrom: green
colorTo: blue
sdk: docker
pinned: false
license: apache-2.0
short_description: Parser/API para processamento de arquivos CoNLL-U na web.
---

# Genipapo Web

**Genipapo Web** is a lightweight web-based interface for the **Genipapo Parser**, enabling users to validate and process `.conllu` files directly in their browser. This repository simplifies the deployment of the Genipapo Parser's web version using Docker.

---

## Purpose

This project provides an accessible interface for the **Genipapo Parser**, allowing users to:

- **Validate and Parse** `.conllu` files directly in their web browser.
- **Easily Deploy** the parser via Docker, without requiring a complex local setup.
- **Build a local API version** of the parser, allowing local requisitions in a faster manner.

For details on the **Genipapo Parser** itself, visit the main repository:

[Genipapo Parser GitHub Repository](https://github.com/bryankhelven/genipapo)

---

## Features

1. **Web-Based Interface**:
   - Upload `.conllu` files for validation and parsing.
   - Download parsed files with updated dependency relations.
   - View warnings and errors for `.conllu` file validation.

2. **Dockerized Deployment**:
   - Simplified setup with a single Docker command.
   - No local installation of dependencies required.

3. **Reference to Genipapo Parser**:
   - Built on the Genipapo Parser, a multigenre dependency parser for Brazilian Portuguese.

---

## Prerequisites

- **Docker**: Ensure Docker is installed on your system. [Download Docker](https://www.docker.com/products/docker-desktop)
- **Python 3.7+** (only needed to prepare resources before building the Docker image)

---

## Installation and Setup

### 1. Clone the Repository

```bash
git clone https://github.com/bryankhelven/genipapo_web.git
cd genipapo_web
```

### 2. Download Resources

Run the following script to download the necessary resources and models:

```bash
python download_resources.py
```

This will place the resources and model files in their respective folders:
- `stanza_resources/`
- `models/`

### 3. Build the Docker Image

Build the Docker image using the following command:

```bash
docker build -t genipapo-web .
```

### 4. Run the Docker Container

Run the container and expose the application on port `8000`:

```bash
docker run -it -p 8000:8000 genipapo-web
```

### 5. Access the Application

Open your browser and navigate to:

```text
http://localhost:8000/
```

---

## API Usage

### Endpoints

- **POST /api/process** - Process a `.conllu` file.
- **POST /api/process/json** - Process raw `.conllu` content in JSON format.

### 1. Process a File

Use the `/api/process` endpoint to upload a `.conllu` file.

#### Parameters:

- **response_format** (optional): Set to `json` to return processed content as JSON. Defaults to `file`.

#### Example: Returning a File

When `response_format` is set to `file`, the processed content is returned as a downloadable `.conllu` file.

```bash
curl -X POST -H "Content-Type: multipart/form-data" \
-F "file=@example.conllu" \
"http://localhost:8000/api/process?response_format=file" \
--output processed_example.conllu
```

#### Example: Returning JSON

When `response_format` is set to `json`, the processed content is returned in JSON format.

```bash
curl -X POST -H "Content-Type: multipart/form-data" \
-F "file=@example.conllu" \
"http://localhost:8000/api/process?response_format=json"
```

Example JSON Response:

```json
{
    "status": "success",
    "warnings": [],
    "processed_content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capit\u00e3o Am\u00e9rica tamb\u00e9m bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n2\tCapit\u00e3o\tCapit\u00e3o\tPROPN\t_\t_\t5\tnsubj\t_\t_\n3\tAm\u00e9rica\tAm\u00e9rica\tPROPN\t_\t_\t2\tflat:name\t_\t_\n4\ttamb\u00e9m\ttamb\u00e9m\tADV\t_\t_\t5\tadvmod\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t7\tdet\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t5\tobj\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t5\tpunct\t_\tSpaceAfter=No\n"
}
```

### 2. Process Raw Content

Use the `/api/process/json` endpoint to send raw CoNLL-U content as JSON.

#### Example:

```bash
curl -X POST -H "Content-Type: application/json" \
-d '{"content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capit\u00e3o Am\u00e9rica tamb\u00e9m bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t_\t_\t_\t_\n2\tCapit\u00e3o\tCapit\u00e3o\tPROPN\t_\t_\t_\t_\t_\t_\n3\tAm\u00e9rica\tAm\u00e9rica\tPROPN\t_\t_\t_\t_\t_\t_\n4\ttamb\u00e9m\ttamb\u00e9m\tADV\t_\t_\t_\t_\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t_\t_\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t_\t_\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t_\t_\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t_\t_\t_\tSpaceAfter=No"}' \
"http://localhost:8000/api/process/json"
```

Example JSON Response:

```json
{
    "status": "success",
    "warnings": [],
    "processed_content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capit\u00e3o Am\u00e9rica tamb\u00e9m bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n2\tCapit\u00e3o\tCapit\u00e3o\tPROPN\t_\t_\t5\tnsubj\t_\t_\n3\tAm\u00e9rica\tAm\u00e9rica\tPROPN\t_\t_\t2\tflat:name\t_\t_\n4\ttamb\u00e9m\ttamb\u00e9m\tADV\t_\t_\t5\tadvmod\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t7\tdet\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t5\tobj\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t5\tpunct\t_\tSpaceAfter=No\n"
}
```

---

## Acknowledgments

- This work was carried out at the [Center for Artificial Intelligence of the University of São Paulo (C4AI)](http://c4ai.inova.usp.br/), supported by the São Paulo Research Foundation (FAPESP grant #2019/07665-4) and the IBM Corporation.
- The project was supported by the Ministry of Science, Technology and Innovation, with resources of Law N. 8.248, of October 23, 1991, within the scope of PPI-SOFTEX, coordinated by Softex and published as Residence in TIC 13, DOU 01245.010222/2022-44.
- **Genipapo** was developed using the [Stanza library](https://stanfordnlp.github.io/stanza/), courtesy of the Stanford NLP Group.

---

## Contact

For inquiries, suggestions, or bug reports, reach out to:

- **Email**: [bryankhelven@ieee.org](mailto:bryankhelven@ieee.org)
- **Main Parser Repository**: [Genipapo Parser](https://github.com/bryankhelven/genipapo)