Haonan Liu committed on
Commit ·
248ece2
1
Parent(s): 2e7eced
update app and add doc
Browse files
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
title: GPTagger
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: red
|
| 5 |
colorTo: pink
|
| 6 |
sdk: gradio
|
|
@@ -11,3 +11,80 @@ license: gpl-3.0
|
|
| 11 |
---
|
| 12 |
|
| 13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 1 |
---
|
| 2 |
title: GPTagger
|
| 3 |
+
emoji: 🏷️
|
| 4 |
colorFrom: red
|
| 5 |
colorTo: pink
|
| 6 |
sdk: gradio
|
|
|
|
| 11 |
---
|
| 12 |
|
| 13 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 14 |
+
|
| 15 |
+
# [GPTagger](https://github.com/hnliu-git/GPTagger) :label:
|
| 16 |
+
|
| 17 |
+
GPT Tagger is a text tagger built on the GPT model. It extracts tags from a given text by prompting GPT. However, using GPT as a text tagger is not trivial: GPT tends to return text that does not appear in the input, is fabricated, or has been reformatted. To mitigate this, GPT Tagger checks that the generated tags are derived from the input text, while still allowing GPT to process the extracted tags to some extent.
|
| 18 |
+
|
| 19 |
+
Below is an example of how GPT may respond incorrectly.
|
| 20 |
+
|
| 21 |
+
```md
|
| 22 |
+
Text: "I earn $1000 this week!"
|
| 23 |
+
Prompt: "Extract how much he/she earns"
|
| 24 |
+
|
| 25 |
+
# Non-existent text
|
| 26 |
+
GPT: "one thousand dollar"
|
| 27 |
+
# Made-up text
|
| 28 |
+
GPT: "$999999"
|
| 29 |
+
# Processed text
|
| 30 |
+
GPT: "$1,000"
|
| 31 |
+
```
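The three incorrect responses above share one property: none of them is a verbatim substring of the input text. A minimal sketch of that check follows. The helper `is_verbatim` is a hypothetical name used for illustration and is not part of the GPTagger API.

```python
def is_verbatim(tag: str, text: str) -> bool:
    """Return True only if the tag appears character-for-character in the text."""
    # An empty tag is trivially "in" any string, so reject it explicitly.
    return bool(tag) and tag in text


text = "I earn $1000 this week!"

# The three incorrect responses from the example, plus the verbatim answer.
candidates = ["one thousand dollar", "$999999", "$1,000", "$1000"]

# Keep only candidates that can be located in the original text.
accepted = [tag for tag in candidates if is_verbatim(tag, text)]
print(accepted)
```

A plain substring test is deliberately strict. It rejects the reformatted "$1,000" as well, which is why a real tagger may want a looser matching rule for processed tags.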
|
| 32 |
+
|
| 33 |
+
## Introduction
|
| 34 |
+
|
| 35 |
+
These incorrect responses show why a reliable tag extraction tool like GPT Tagger is needed. GPT Tagger works in three main steps:
|
| 36 |
+
1. 🕵️‍♂️ Extraction: GPT Tagger sniffs out all possible tags by following your instructions to GPT.
|
| 37 |
+
2. Indexing: It spots the exact locations of these tags within the text.
|
| 38 |
+
3. ✅ Validation: GPT Tagger's validator checks whether the extracted tags pass the rule-based and ML-based checks.
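The indexing step can be pictured as locating every occurrence of an extracted tag in the source text. The sketch below is an assumption about how such a step could work, not GPTagger's actual implementation; `find_spans` is a hypothetical helper.

```python
import re


def find_spans(tag: str, text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for every occurrence of tag in text."""
    if not tag:
        return []
    # re.escape makes the tag match literally, even if it contains
    # regex metacharacters such as "$" or ".".
    return [match.span() for match in re.finditer(re.escape(tag), text)]


text = "Add 2 eggs, then whisk the eggs with sugar."
spans = find_spans("eggs", text)

# Each span can be sliced back out of the text to recover the tag.
for start, end in spans:
    print(start, end, text[start:end])
```

Tags that return no spans cannot be located in the text at all, so they are natural candidates to discard before validation.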
|
| 39 |
+
|
| 40 |
+
See the example above for how we extract ingredients from a yummy recipe text.
|
| 41 |
+
|
| 42 |
+
## Features ✨
|
| 43 |
+
|
| 44 |
+
### Scale up GPT annotators and switch between GPT-3.5 and GPT-4 easily
|
| 45 |
+
- Want higher precision? Try using GPT-4!
|
| 46 |
+
- Want higher recall? Scale up the number of GPT annotators!
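One way to see why more annotators can raise recall: if each GPT call returns a slightly different set of tags, their union covers more of the true tags than any single call. The sketch below assumes the results of several calls are merged by a simple order-preserving union. `merge_annotations` is a hypothetical helper and the sample runs are invented for illustration.

```python
def merge_annotations(runs: list[list[str]]) -> list[str]:
    """Union the tags from several annotator runs, keeping first-seen order."""
    seen = set()
    merged = []
    for run in runs:
        for tag in run:
            if tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged


# Three hypothetical annotator runs over the same recipe text.
runs = [
    ["flour", "sugar"],
    ["sugar", "eggs"],
    ["flour", "butter"],
]
merged = merge_annotations(runs)
print(merged)
```

The union also accumulates every run's mistakes, so it pairs naturally with a validation step afterwards.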
|
| 47 |
+
|
| 48 |
+
### Instead of crafting a perfect prompt, use validators to filter out bad extractions
|
| 49 |
+
- Simple validator: Length, Regex...
|
| 50 |
+
- ML validator: GPT validator (Consider it like a chain of GPTs!)
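The simple validators can be thought of as small predicates applied to each extracted tag. The sketch below is illustrative only. The names `max_length`, `matches`, and `passes_all` are hypothetical, and whether GPTagger applies its regex as a search or a full match is an assumption here. The limit of 128 and the pattern `\d` mirror the `tag_max_len` and `tag_regex` values in the README snippet.

```python
import re
from typing import Callable

# A validator takes a tag and says whether it is acceptable.
Validator = Callable[[str], bool]


def max_length(limit: int) -> Validator:
    """Accept tags no longer than `limit` characters."""
    return lambda tag: len(tag) <= limit


def matches(pattern: str) -> Validator:
    """Accept tags in which `pattern` is found somewhere (search, not full match)."""
    compiled = re.compile(pattern)
    return lambda tag: compiled.search(tag) is not None


def passes_all(tag: str, validators: list[Validator]) -> bool:
    """A tag is kept only if every validator accepts it."""
    return all(check(tag) for check in validators)


validators = [max_length(128), matches(r"\d")]
tags = ["12 March 2021", "sometime next year", "3" * 200]

# Drops the tag without a digit and the tag that is too long.
kept = [tag for tag in tags if passes_all(tag, validators)]
print(kept)
```

A GPT-based validator would fit the same `Validator` shape: a function that sends the tag to a model and returns its yes/no verdict.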
|
| 51 |
+
|
| 52 |
+
## How to Use
|
| 53 |
+
|
| 54 |
+
### Setup
|
| 55 |
+
|
| 56 |
+
```shell
|
| 57 |
+
make install
|
| 58 |
+
export OPENAI_API_KEY=<your-key>
|
| 59 |
+
```
|
| 60 |
+
|
| 61 |
+
### Pre-defined NER pipeline
|
| 62 |
+
|
| 63 |
+
The easiest way to dive into GPT Tagger is through the Gradio web demo! Fire it up with a single command:
|
| 64 |
+
```shell
|
| 65 |
+
poetry run python GPTagger/app.py
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
If you prefer having the power of GPT Tagger at your fingertips in Python, check out this snippet:
|
| 69 |
+
|
| 70 |
+
```python
|
| 71 |
+
from pathlib import Path
|
| 72 |
+
from GPTagger import *
|
| 73 |
+
|
| 74 |
+
cfg = NerConfig(
|
| 75 |
+
tag_name='date',
|
| 76 |
+
tag_regex=r"\d",
|
| 77 |
+
tag_max_len=128,
|
| 78 |
+
)
|
| 79 |
+
prompt = PromptTemplate.from_template(Path('<path-to-prompt>').read_text())
|
| 80 |
+
pipeline = NerPipeline.from_config(cfg)
|
| 81 |
+
|
| 82 |
+
doc = Path('<path-to-doc>').read_text()
|
| 83 |
+
tags = pipeline(doc, prompt)
|
| 84 |
+
```
|
| 85 |
+
|
| 86 |
+
### Build Custom Pipelines
|
| 87 |
+
|
| 88 |
+
We believe that the possibilities of using GPT as a text tagger are endless! We invite you to contribute your own custom pipelines. Together, we'll unlock the true potential of GPT Tagger and make text tagging a better experience.
|
| 89 |
+
|
| 90 |
+
Leave a star if you find GPTagger useful for your product or company!
|
app.py
CHANGED
|
@@ -1,3 +1,4 @@
|
|
|
|
|
| 1 |
import gradio as gr
|
| 2 |
|
| 3 |
from GPTagger import *
|
|
@@ -14,20 +15,20 @@ TEXT:
|
|
| 14 |
|
| 15 |
def ner(
|
| 16 |
model: str,
|
| 17 |
-
|
| 18 |
tag_name: str,
|
| 19 |
tag_max_len: int,
|
| 20 |
text: str,
|
| 21 |
prompt: str,
|
|
|
|
| 22 |
):
|
| 23 |
-
|
|
|
|
| 24 |
tag_name=tag_name,
|
|
|
|
| 25 |
model=model,
|
| 26 |
-
|
| 27 |
-
tag_max_len=tag_max_len,
|
| 28 |
)
|
| 29 |
-
|
| 30 |
-
ner_pipeline = NerPipeline.from_config(cfg)
|
| 31 |
template = PromptTemplate.from_template(prompt)
|
| 32 |
|
| 33 |
extractions = ner_pipeline(text, template, "")
|
|
@@ -42,26 +43,36 @@ def ner(
|
|
| 42 |
return {"text": text, "entities": output}
|
| 43 |
|
| 44 |
|
| 45 |
-
with gr.Blocks(
|
| 46 |
with gr.Row():
|
| 47 |
-
tag_name = gr.Textbox(label="
|
| 48 |
tag_max_len = gr.Slider(
|
| 49 |
-
minimum=10, maximum=1000, step=10, label="
|
| 50 |
)
|
| 51 |
with gr.Row():
|
| 52 |
model = gr.Dropdown(
|
| 53 |
["gpt-3.5-turbo-0613", "gpt-4-0613"],
|
| 54 |
-
label="
|
| 55 |
value="gpt-3.5-turbo-0613",
|
| 56 |
)
|
| 57 |
nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
|
| 58 |
with gr.Row():
|
| 59 |
prompt = gr.TextArea(
|
| 60 |
placeholder="Enter your prompt here...",
|
| 61 |
-
label="prompt",
|
| 62 |
value=default_prompt,
|
| 63 |
)
|
| 64 |
-
text = gr.TextArea(placeholder="Enter your text here...", label="
|
| 65 |
btn = gr.Button("Submit")
|
| 66 |
output = gr.HighlightedText()
|
| 67 |
btn.click(
|
|
@@ -73,6 +84,7 @@ with gr.Blocks(theme=gr.themes.Default(text_size=gr.themes.sizes.text_lg)) as de
|
|
| 73 |
tag_max_len,
|
| 74 |
text,
|
| 75 |
prompt,
|
|
|
|
| 76 |
],
|
| 77 |
outputs=output,
|
| 78 |
)
|
|
|
|
| 1 |
+
import os
|
| 2 |
import gradio as gr
|
| 3 |
|
| 4 |
from GPTagger import *
|
|
|
|
| 15 |
|
| 16 |
def ner(
|
| 17 |
model: str,
|
| 18 |
+
nr_calls: int,
|
| 19 |
tag_name: str,
|
| 20 |
tag_max_len: int,
|
| 21 |
text: str,
|
| 22 |
prompt: str,
|
| 23 |
+
key: str,
|
| 24 |
):
|
| 25 |
+
os.environ['OPENAI_API_KEY'] = key
|
| 26 |
+
ner_pipeline = NerPipeline(
|
| 27 |
tag_name=tag_name,
|
| 28 |
+
nr_calls=nr_calls,
|
| 29 |
model=model,
|
| 30 |
+
tag_max_len=tag_max_len
|
|
|
|
| 31 |
)
|
|
|
|
|
|
|
| 32 |
template = PromptTemplate.from_template(prompt)
|
| 33 |
|
| 34 |
extractions = ner_pipeline(text, template, "")
|
|
|
|
| 43 |
return {"text": text, "entities": output}
|
| 44 |
|
| 45 |
|
| 46 |
+
with gr.Blocks() as demo:
|
| 47 |
+
gr.Markdown(
|
| 48 |
+
"""
|
| 49 |
+
# GPTagger 🏷️
|
| 50 |
+
|
| 51 |
+
[GPTagger](https://github.com/hnliu-git/GPTagger) is a powerful text tagger that makes use of the GPT model. This tool allows you to extract tags from a given text by leveraging the capabilities of GPT.
|
| 52 |
+
Simply specify the tag you want to extract in the prompt, and the matches will be highlighted in the output.
|
| 53 |
+
"""
|
| 54 |
+
)
|
| 55 |
+
with gr.Row():
|
| 56 |
+
key = gr.Textbox(label='OpenAI API Key:')
|
| 57 |
with gr.Row():
|
| 58 |
+
tag_name = gr.Textbox(label="Tag Name:", placeholder='Enter the tag you want to extract')
|
| 59 |
tag_max_len = gr.Slider(
|
| 60 |
+
minimum=10, maximum=1000, step=10, label="Max length of a tag", value=50
|
| 61 |
)
|
| 62 |
with gr.Row():
|
| 63 |
model = gr.Dropdown(
|
| 64 |
["gpt-3.5-turbo-0613", "gpt-4-0613"],
|
| 65 |
+
label="Model Name:",
|
| 66 |
value="gpt-3.5-turbo-0613",
|
| 67 |
)
|
| 68 |
nr_call = gr.Number(label="nr_of_calls", minimum=1, value=1, precision=0)
|
| 69 |
with gr.Row():
|
| 70 |
prompt = gr.TextArea(
|
| 71 |
placeholder="Enter your prompt here...",
|
| 72 |
+
label="Prompt: (Please include the default prompt at the end)",
|
| 73 |
value=default_prompt,
|
| 74 |
)
|
| 75 |
+
text = gr.TextArea(placeholder="Enter your text here...", label="Text")
|
| 76 |
btn = gr.Button("Submit")
|
| 77 |
output = gr.HighlightedText()
|
| 78 |
btn.click(
|
|
|
|
| 84 |
tag_max_len,
|
| 85 |
text,
|
| 86 |
prompt,
|
| 87 |
+
key
|
| 88 |
],
|
| 89 |
outputs=output,
|
| 90 |
)
|