Juan Paulo Pérez-Tejada committed on
Commit 7a95605 · unverified · 1 Parent(s): f681f38

Add extract pdf content function (#3)

Files changed (4)
  1. README.md +7 -1
  2. app.py +9 -2
  3. requirements.txt +1 -1
  4. wk_flow_requirements.txt +1 -0
README.md CHANGED
@@ -14,9 +14,15 @@ An Intelligent Assistant that explains you the content of a PDF file

  ## Deployment

- Deploy in HF with Streamlit
+ Deploy in HF with Streamlit-
+
+ ## Local
+
+ Run streamlit run app.py

  ## Stack

  - Streamlit
  - HuggingFace
+ - Tika: For extracting pdf text
+ - Java Runtime
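
The README change above adds a Java Runtime to the stack next to Tika. A minimal sketch of a pre-flight check follows, on the assumption that the `tika` package needs a `java` executable on PATH to start its local Tika server. The helper names `java_runtime_path` and `describe_java_runtime` are hypothetical and not part of this commit.

```python
import shutil
from typing import Callable, Optional


def java_runtime_path(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the `java` executable found on PATH, or None.

    `which` is injectable so the lookup can be exercised without Java installed.
    """
    return which("java")


def describe_java_runtime(path: Optional[str]) -> str:
    """Build a short status line for the Java requirement listed in the README."""
    if path is None:
        return "Java Runtime not found on PATH; Tika may fail to start."
    return f"Java Runtime found at {path}"


if __name__ == "__main__":
    print(describe_java_runtime(java_runtime_path()))
```

Running the script before `streamlit run app.py` would surface a missing runtime early instead of at the first extraction attempt.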
app.py CHANGED
@@ -1,5 +1,12 @@
  """ A simple example of Streamlit. """
  import streamlit as st
+ from tika import parser

- x = st.slider("Select a value")
- st.write(x, "squared is", x * x)
+ pdf = st.file_uploader("Upload a file", type="pdf")
+
+ if st.button("Extract text"):
+     if pdf is not None:
+         extracted_text = parser.from_file(pdf)
+         st.write(extracted_text["content"])
+     else:
+         st.write("Please upload a file of type: pdf")
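
The new `app.py` code writes `extracted_text["content"]` straight to the page. A possible guard is sketched below, assuming the parser result is a dict whose `"content"` value can be `None` or whitespace-only when no text is recoverable (for example, a scanned PDF). `text_from_parsed` and `EMPTY_MESSAGE` are hypothetical names, not part of this commit.

```python
from typing import Any, Mapping, Optional

# Hypothetical fallback wording shown when nothing could be extracted.
EMPTY_MESSAGE = "No text could be extracted from this PDF."


def text_from_parsed(parsed: Optional[Mapping[str, Any]]) -> str:
    """Return stripped text from a Tika-style result dict, or a fallback message."""
    content = (parsed or {}).get("content")
    # Treat a missing key, None, non-string values, and blank strings alike.
    if not isinstance(content, str) or not content.strip():
        return EMPTY_MESSAGE
    return content.strip()
```

In the committed code this could replace the direct lookup, for instance `st.write(text_from_parsed(extracted_text))`, so an empty extraction shows a message rather than a blank area.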
requirements.txt CHANGED
@@ -1,6 +1,6 @@
  openai
  langchain
- pdfminer
+ tika
  chromadb
  sentence_transformers
  streamlit
wk_flow_requirements.txt CHANGED
@@ -1,2 +1,3 @@
  streamlit
+ tika
  pylint