Genipapo API Guide
This guide explains how to use the Genipapo Parser API to process Brazilian Portuguese text in CoNLL-U format.
All the examples provided in this guide were extracted from the Porttinari Base corpus, part of the Poetisa project.
Endpoints
- POST /api/process - Process a .conllu file.
- POST /api/process/json - Process raw .conllu content sent in JSON format.
1. Process a File
Use the /api/process endpoint to upload a .conllu file as the multipart form field file. The endpoint accepts the following query parameter:
- response_format (optional): Set to json to return the processed content as JSON. Defaults to file.
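Before uploading, it can help to confirm that every token line really has the ten tab-separated CoNLL-U columns, since text copied from a web page often has its tabs turned into spaces. The shell function below is a minimal sketch of such a check; check_columns is a hypothetical name and the check is not part of the Genipapo API.

```shell
# Hypothetical pre-upload check (not part of the Genipapo API): report every
# line that is neither a comment nor blank and does not have exactly
# 10 tab-separated columns. Returns non-zero if any such line is found.
check_columns() {
  awk -F'\t' '
    /^#/ || /^$/ { next }   # comments and sentence separators
    NF != 10 {
      printf "%s:%d: expected 10 columns, found %d\n", FILENAME, FNR, NF
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$@"
}
```

Usage: check_columns example.conllu && curl ... so that the upload only runs when the file looks well-formed.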
1.1 Example: Returning a File
When response_format is set to file (the default), the processed content is returned as a downloadable .conllu file. Use curl's --output option to choose the name of the saved file.
curl -X POST -H "Content-Type: multipart/form-data" \
-F "file=@example.conllu" \
"https://genipapo-parser.azurewebsites.net/api/process?response_format=file" \
--output processed_example.conllu
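The same request can be repeated for a whole directory. The sketch below loops over the .conllu files in a directory and saves each result as processed_<name>.conllu. process_dir, output_name and the GENIPAPO_URL variable are hypothetical names introduced for this example. It omits the explicit Content-Type header, since curl sets multipart/form-data itself when -F is used, and adds --fail so that HTTP errors are reported instead of being written to the output file.

```shell
# Hypothetical batch helper for POST /api/process (response_format=file).
# Override the base URL with GENIPAPO_URL, e.g. for a local instance.
GENIPAPO_URL="${GENIPAPO_URL:-https://genipapo-parser.azurewebsites.net}"

# dir/a.conllu -> dir/processed_a.conllu
output_name() {
  printf '%s/processed_%s\n' "$(dirname -- "$1")" "$(basename -- "$1")"
}

# Upload every .conllu file in a directory and save each processed result.
process_dir() {
  local f
  for f in "$1"/*.conllu; do
    [ -e "$f" ] || continue                                    # glob matched nothing
    case "$(basename -- "$f")" in processed_*) continue ;; esac  # skip earlier outputs
    # -F sets the multipart/form-data Content-Type; --fail turns HTTP errors
    # into a non-zero exit status instead of saving the error body.
    if curl --fail --silent --show-error -X POST \
         -F "file=@${f}" \
         "${GENIPAPO_URL}/api/process?response_format=file" \
         --output "$(output_name "$f")"; then
      echo "processed: $f"
    else
      echo "failed: $f" >&2
    fi
  done
}
```

Usage: after sourcing the functions, process_dir corpus uploads corpus/*.conllu one file at a time.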
1.2 Example: Returning JSON
When response_format is set to json, the processed CoNLL-U text is returned as a string in the processed_content field of a JSON object, alongside status and warnings.
curl -X POST -H "Content-Type: multipart/form-data" \
-F "file=@example.conllu" \
"https://genipapo-parser.azurewebsites.net/api/process?response_format=json"
Example JSON Response:
{
"status": "success",
"warnings": [],
"processed_content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capitão América também bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n2\tCapitão\tCapitão\tPROPN\t_\t_\t5\tnsubj\t_\t_\n3\tAmérica\tAmérica\tPROPN\t_\t_\t2\tflat:name\t_\t_\n4\ttambém\ttambém\tADV\t_\t_\t5\tadvmod\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t7\tdet\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t5\tobj\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t5\tpunct\t_\tSpaceAfter=No\n"
}
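When scripting against the JSON response, the CoNLL-U text has to be pulled out of processed_content. The function below is a minimal sketch that uses python3's standard json module (jq would work just as well); extract_conllu is a hypothetical name, and because this guide only shows successful responses, treating any status other than "success" as a failure is an assumption.

```shell
# Hypothetical helper: read a Genipapo JSON response on stdin, print any
# warnings to stderr, fail unless status is "success", and write
# processed_content (the CoNLL-U text) to stdout as UTF-8.
extract_conllu() {
  python3 -c '
import json, sys
resp = json.load(sys.stdin.buffer)
for warning in resp.get("warnings", []):
    print("warning:", warning, file=sys.stderr)
if resp.get("status") != "success":
    sys.exit("request failed: " + json.dumps(resp))
sys.stdout.buffer.write(resp["processed_content"].encode("utf-8"))
'
}
```

Usage: pipe the curl command above into it, for example curl ... | extract_conllu > processed_example.conllu.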
2. Process Raw Content
Use the /api/process/json endpoint to send raw CoNLL-U content in a JSON body, with the text in the content field. Because the body is JSON, line breaks and tabs inside the CoNLL-U text must be escaped as \n and \t.
curl -X POST -H "Content-Type: application/json" \
-d '{"content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capitão América também bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t_\t_\t_\t_\n2\tCapitão\tCapitão\tPROPN\t_\t_\t_\t_\t_\t_\n3\tAmérica\tAmérica\tPROPN\t_\t_\t_\t_\t_\t_\n4\ttambém\ttambém\tADV\t_\t_\t_\t_\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t_\t_\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t_\t_\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t_\t_\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t_\t_\t_\tSpaceAfter=No"}' \
"https://genipapo-parser.azurewebsites.net/api/process/json"
Example JSON Response:
{
"status": "success",
"warnings": [],
"processed_content": "# sent_id = FOLHA_DOC000123_SENT016\n# text = O Capitão América também bajulou o tucano.\n1\tO\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t2\tdet\t_\t_\n2\tCapitão\tCapitão\tPROPN\t_\t_\t5\tnsubj\t_\t_\n3\tAmérica\tAmérica\tPROPN\t_\t_\t2\tflat:name\t_\t_\n4\ttambém\ttambém\tADV\t_\t_\t5\tadvmod\t_\t_\n5\tbajulou\tbajular\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin\t0\troot\t_\t_\n6\to\to\tDET\t_\tDefinite=Def|Gender=Masc|Number=Sing|PronType=Art\t7\tdet\t_\t_\n7\ttucano\ttucano\tNOUN\t_\tGender=Masc|Number=Sing\t5\tobj\t_\tSpaceAfter=No\n8\t.\t.\tPUNCT\t_\t_\t5\tpunct\t_\tSpaceAfter=No\n"
}
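Writing the escaped string by hand, as in the request above, is error-prone for whole files. A minimal sketch, assuming python3 is available, lets json.dumps do the escaping and pipes the result into curl; conllu_to_json, post_conllu and GENIPAPO_URL are hypothetical names chosen for this example. --data-binary @- is used so curl posts the generated body exactly as produced.

```shell
# Hypothetical helpers for POST /api/process/json.
# Wrap a UTF-8 CoNLL-U file in {"content": ...}; json.dumps escapes tabs and
# newlines as \t and \n (and non-ASCII letters as \uXXXX), producing valid JSON.
conllu_to_json() {
  python3 -c 'import json, sys; print(json.dumps({"content": sys.stdin.buffer.read().decode("utf-8")}))' < "$1"
}

# Send a file to /api/process/json and print the JSON response.
post_conllu() {
  conllu_to_json "$1" | curl --fail --silent --show-error -X POST \
    -H "Content-Type: application/json" \
    --data-binary @- \
    "${GENIPAPO_URL:-https://genipapo-parser.azurewebsites.net}/api/process/json"
}
```

Usage: post_conllu example.conllu sends the file and prints a response shaped like the one above.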
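Example with Input and Output
The parser fills in the HEAD (column 7) and DEPREL (column 8) fields, which are _ in the input. In a real CoNLL-U file the ten columns are separated by tabs; here they are shown separated by spaces.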
Original Input
# sent_id = FOLHA_DOC000123_SENT016
# text = O Capitão América também bajulou o tucano.
1 O o DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art _ _ _ _
2 Capitão Capitão PROPN _ _ _ _ _ _
3 América América PROPN _ _ _ _ _ _
4 também também ADV _ _ _ _ _ _
5 bajulou bajular VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin _ _ _ _
6 o o DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art _ _ _ _
7 tucano tucano NOUN _ Gender=Masc|Number=Sing _ _ _ SpaceAfter=No
8 . . PUNCT _ _ _ _ _ SpaceAfter=No
Processed Output
# sent_id = FOLHA_DOC000123_SENT016
# text = O Capitão América também bajulou o tucano.
1 O o DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 2 det _ _
2 Capitão Capitão PROPN _ _ 5 nsubj _ _
3 América América PROPN _ _ 2 flat:name _ _
4 também também ADV _ _ 5 advmod _ _
5 bajulou bajular VERB _ Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin 0 root _ _
6 o o DET _ Definite=Def|Gender=Masc|Number=Sing|PronType=Art 7 det _ _
7 tucano tucano NOUN _ Gender=Masc|Number=Sing 5 obj _ SpaceAfter=No
8 . . PUNCT _ _ 5 punct _ SpaceAfter=No
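To read the dependency arcs in the processed output more easily, the awk sketch below prints each word with its relation and head word. show_arcs is a hypothetical name, and it expects real tab-separated CoNLL-U, so it will not work on the space-separated rendering above.

```shell
# Hypothetical helper: print "word --deprel--> head word" for each basic word
# in tab-separated CoNLL-U read from stdin or from the files given.
show_arcs() {
  awk -F'\t' '
    function flush(   i) {
      for (i = 1; i <= n; i++)
        print word[i] " --" rel[i] "--> " (head[i] == "0" ? "ROOT" : form[head[i]])
      n = 0
      split("", form)                  # empty the per-sentence lookup table
    }
    # Basic word lines only: skips comments, ranges like 1-2 and empty nodes like 1.1
    $1 ~ /^[0-9]+$/ { n++; word[n] = $2; head[n] = $7; rel[n] = $8; form[$1] = $2; next }
    /^$/ { flush() }                   # a blank line ends the sentence
    END { flush() }
  ' "$@"
}
```

For the processed output above, the first two lines it prints are O --det--> Capitão and Capitão --nsubj--> bajulou.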
Contact
For further assistance, please contact us.