afortuny committed on
Commit
fc85650
·
1 Parent(s): 4973591
Files changed (1)
  1. README.md +84 -2
README.md CHANGED
@@ -1,5 +1,5 @@
1
  ---
2
- title: CitingLLM
3
  emoji: 🏢
4
  colorFrom: pink
5
  colorTo: purple
@@ -11,4 +11,86 @@ license: mit
11
  short_description: Helps you match bunches of text with your reference notes
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: Research Notes Matcher
3
  emoji: 🏢
4
  colorFrom: pink
5
  colorTo: purple
 
11
  short_description: Helps you match bunches of text with your reference notes
12
  ---
13
 
14
+
15
+ # Research Notes Matcher
16
+
17
+ ## Overview
18
+
19
+ The **Research Notes Matcher** is a web application that finds the research notes most relevant to a given text input. Users upload a CSV file containing research notes, and the application uses TF-IDF text similarity to retrieve the five notes that most closely match their input.
20
+
21
+ ## Features
22
+
23
+ - **File Upload:** Users can upload a CSV file containing their research notes.
24
+ - **Text Input:** Users can enter free text describing their query or topic of interest.
25
+ - **Top 5 Matching Entries:** The application outputs the five most relevant notes, along with their sources and sections, based on text similarity.
26
+
27
+ ## Requirements
28
+
29
+ To run this application, you need the following Python libraries:
30
+
31
+ - `gradio`: For creating the web interface.
32
+ - `pandas`: For data manipulation and handling CSV files.
33
+ - `scikit-learn`: For text processing and calculating similarity.
34
+
35
+ You can install these libraries using pip:
36
+
37
+ ```bash
38
+ pip install gradio pandas scikit-learn
39
+ ```
40
+
41
+ ## Input File Format
42
+
43
+ The input file must be a CSV file containing the following columns:
44
+
45
+ - **Source**: A string identifying the source of the research note (e.g., author names or book title).
46
+ - **Section**: A string representing the section or chapter title related to the note.
47
+ - **Notes**: A string containing the actual content of the research note.
48
+
49
+ ### Example of CSV Structure
50
+
51
+ ```plaintext
52
+ Source,Section,Notes
53
+ "Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
54
+ "Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
55
+ ```
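As a rough illustration of this format, the sketch below reads such a CSV with `pandas` and checks for the three expected columns. It is not the application's actual code: the helper name `load_notes` and the error message are assumptions made for this example, and the in-memory sample stands in for an uploaded file.

```python
import io

import pandas as pd

# Column names taken from the format described above.
REQUIRED_COLUMNS = ["Source", "Section", "Notes"]


def load_notes(csv_source):
    """Read a notes CSV and verify it has the expected columns (hypothetical helper)."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    # Replace missing values with empty strings so later text processing does not fail.
    df[REQUIRED_COLUMNS] = df[REQUIRED_COLUMNS].fillna("")
    return df


# In-memory sample standing in for an uploaded file; the second note is left empty.
sample = io.StringIO(
    "Source,Section,Notes\n"
    '"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."\n'
    '"Another Author, Another Book","Chapter 2: Background",\n'
)
notes_df = load_notes(sample)
print(notes_df)
```

Quoting fields that contain commas, as in the `Source` values above, keeps each row at exactly three columns.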
56
+
57
+ ## How to Use
58
+
59
+ 1. **Upload the CSV file**: Click on the upload button and select a CSV file containing your research notes.
60
+ 2. **Enter your text**: In the provided text box, type the content or query related to the research topic you are interested in.
61
+ 3. **Submit**: Click the "Submit" button to process your input.
62
+ 4. **View Results**: The application will display the top five matching entries based on cosine similarity, formatted for easy reading.
63
+
64
+ ### Output Format
65
+
66
+ The output lists the top five matching notes. Each entry includes:
67
+
68
+ - **Notes**: The content of the matching note.
69
+ - **Source**: The source from which the note is taken.
70
+ - **Section**: The section or chapter related to the note.
71
+
72
+ Each entry is followed by a separator line, as shown below:
73
+
74
+ ```
75
+ **Notes:** This is the content of the matching note...
76
+ **Source:** Author Name, Book Title
77
+ **Section:** Chapter 1: Introduction
78
+ -------------------------------------
79
+ ```
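One way to produce entries in this layout is sketched below in plain Python. The function names `format_entry` and `format_results` are hypothetical and chosen only for this example; the application's own formatting code may differ.

```python
# Separator line copied from the example output above.
SEPARATOR = "-------------------------------------"


def format_entry(notes, source, section):
    """Format one matching note in the Notes / Source / Section layout."""
    return (
        f"**Notes:** {notes}\n"
        f"**Source:** {source}\n"
        f"**Section:** {section}\n"
        f"{SEPARATOR}"
    )


def format_results(entries):
    """Join formatted entries; each entry is a dict with the three CSV columns."""
    return "\n".join(
        format_entry(e["Notes"], e["Source"], e["Section"]) for e in entries
    )


example = [
    {
        "Notes": "This is the content of the matching note...",
        "Source": "Author Name, Book Title",
        "Section": "Chapter 1: Introduction",
    }
]
print(format_results(example))
```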
80
+
81
+ ## Technical Explanation
82
+
83
+ The application works by:
84
+
85
+ 1. **Uploading and Reading the CSV File**: The user uploads a CSV file, which is read into a DataFrame using `pandas`.
86
+ 2. **Data Validation**: The application checks that the necessary columns ('Source', 'Section', 'Notes') are present and handles any missing values by replacing them with empty strings.
87
+ 3. **Text Processing**: The notes and sections are combined into a single text column, which is then vectorized using `TfidfVectorizer` to create a matrix of TF-IDF features.
88
+ 4. **Cosine Similarity Calculation**: The application calculates the cosine similarity between the TF-IDF vector of the user’s input and the vector of each note, then selects the five notes with the highest scores.
89
+ 5. **Formatting the Output**: The results are formatted as described in the Output Format section, with each entry showing its notes, source, and section.
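The text-processing and similarity steps above can be sketched as follows. This is a minimal illustration under assumptions, not the application's actual code: the function name `top_matches`, the `Similarity` column, the default `TfidfVectorizer` settings, and fitting the vectorizer on the notes alone are all choices made for this example.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_matches(df, query, k=5):
    """Return up to k rows of df ranked by TF-IDF cosine similarity to query."""
    # Combine section and notes into a single text per row, treating NaN as empty.
    combined = (df["Section"].fillna("") + " " + df["Notes"].fillna("")).tolist()

    # Assumption: fit on the notes only, then project the query into that space.
    vectorizer = TfidfVectorizer()
    note_vectors = vectorizer.fit_transform(combined)
    query_vector = vectorizer.transform([query])

    # One similarity score per note, sorted from highest to lowest.
    scores = cosine_similarity(query_vector, note_vectors).ravel()
    top_idx = scores.argsort()[::-1][:k]

    result = df.iloc[top_idx].copy()
    result["Similarity"] = scores[top_idx]
    return result


# Made-up sample data for illustration only.
notes = pd.DataFrame(
    {
        "Source": ["Author A", "Author B", "Author C"],
        "Section": ["Optimization", "Deep Learning", "History"],
        "Notes": [
            "Gradient descent minimizes a loss function",
            "Neural networks learn layered representations",
            "Medieval trade routes across Europe",
        ],
    }
)
matches = top_matches(notes, "how do neural networks learn")
print(matches[["Source", "Section", "Similarity"]])
```

When the CSV has fewer than five rows, the slice simply returns every row, ranked by score.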
90
+
91
+ ## Conclusion
92
+
93
+ The Research Notes Matcher helps you quickly find the research notes most relevant to a piece of text. It reduces the effort of sifting through a large collection of notes, making it easier to find insights and connections in research literature.
94
+
95
+ ---
96
+