---
title: Research Notes Matcher
emoji: 🏢
colorFrom: pink
colorTo: purple
license: mit
short_description: Helps you match passages of text to your reference notes
---

# Research Notes Matcher

## Overview

The **Research Notes Matcher** is a web application that helps users find research notes relevant to a text input. Users upload a CSV file of research notes and enter a query. The application then ranks the notes by text similarity (TF-IDF vectors compared with cosine similarity) and returns the five closest matches.

## Features

- **File Upload:** Users can upload a CSV file containing their research notes.
- **Text Input:** Users can enter free text describing their query or topic of interest.
- **Top 5 Matching Entries:** The application returns the five most relevant notes, along with their sources and sections, ranked by text similarity.

## Requirements

To run this application, you need the following Python libraries:

- `gradio`: For creating the web interface.
- `pandas`: For data manipulation and handling CSV files.
- `scikit-learn`: For text processing and calculating similarity.

You can install these libraries using pip:

```bash
pip install gradio pandas scikit-learn
```

## Input File Format

The input file must be a CSV file containing the following columns:

- **Source**: A string identifying the source of the research note (e.g., author names or book title).
- **Section**: A string giving the section or chapter title related to the note.
- **Notes**: A string containing the actual content of the research note.

### Example of CSV Structure

```plaintext
Source,Section,Notes
"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
"Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
```
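
To illustrate how such a file can be read and checked, here is a minimal sketch using `pandas`. The helper name `load_notes` and the choice to raise `ValueError` when a column is missing are assumptions for illustration, not the app's actual code. Values that contain commas, such as `Author Name, Book Title`, must be wrapped in double quotes, as in the example above.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["Source", "Section", "Notes"]


def load_notes(csv_source) -> pd.DataFrame:
    """Read a notes CSV and normalise it (hypothetical helper).

    `csv_source` can be a file path or any file-like object accepted by pandas.
    """
    df = pd.read_csv(csv_source)

    # Fail early if any of the expected columns is missing.
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")

    # Replace empty cells (read as NaN) with empty strings so text operations never see NaN.
    df[REQUIRED_COLUMNS] = df[REQUIRED_COLUMNS].fillna("").astype(str)
    return df


example_csv = '''Source,Section,Notes
"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
"Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
'''

notes = load_notes(io.StringIO(example_csv))
print(notes["Source"].tolist())  # ['Author Name, Book Title', 'Another Author, Another Book']
```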

## How to Use

1. **Upload the CSV file**: Click the upload button and select a CSV file containing your research notes.
2. **Enter your text**: In the text box, type the content or query related to the research topic you are interested in.
3. **Submit**: Click the "Submit" button to process your input.
4. **View Results**: The application displays the top five matching entries, ranked by cosine similarity and formatted for easy reading.

### Output Format

The output consists of the top five matching notes, each showing:

- **Notes**: The content of the matching note.
- **Source**: The source from which the note is taken.
- **Section**: The section or chapter related to the note.

Entries are separated by a line for clarity, as shown below:

```
**Notes:** This is the content of the matching note...
**Source:** Author Name, Book Title
**Section:** Chapter 1: Introduction
-------------------------------------
```
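
The layout above could be produced by a helper along these lines. It is a sketch that assumes the matches arrive as a `pandas` DataFrame with the `Source`, `Section`, and `Notes` columns; the name `format_matches` is hypothetical.

```python
import pandas as pd

# Separator line copied from the output layout above.
SEPARATOR = "-------------------------------------"


def format_matches(matches: pd.DataFrame) -> str:
    """Render matches in the Notes / Source / Section layout (hypothetical helper)."""
    lines = []
    for _, row in matches.iterrows():
        lines.append(f"**Notes:** {row['Notes']}")
        lines.append(f"**Source:** {row['Source']}")
        lines.append(f"**Section:** {row['Section']}")
        lines.append(SEPARATOR)  # closes each entry
    return "\n".join(lines)


example = pd.DataFrame(
    {
        "Source": ["Author Name, Book Title"],
        "Section": ["Chapter 1: Introduction"],
        "Notes": ["This is the content of the matching note..."],
    }
)
print(format_matches(example))
```

If the string is rendered as Markdown rather than shown as plain text, keep in mind that a line of dashes directly under a line of text is parsed as a heading underline; adding a blank line before `SEPARATOR` avoids that.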

## Technical Explanation

The application works as follows:

1. **Uploading and Reading the CSV File**: The user uploads a CSV file, which is read into a DataFrame using `pandas`.
2. **Data Validation**: The application checks that the required columns (`Source`, `Section`, `Notes`) are present and replaces any missing values with empty strings.
3. **Text Processing**: The notes and sections are combined into a single text column, which is then vectorized with `TfidfVectorizer` to produce a matrix of TF-IDF features.
4. **Cosine Similarity Calculation**: The application computes the cosine similarity between the vectorized input and each vectorized note, and selects the five notes with the highest scores.
5. **Formatting the Output**: The results are formatted as described in the Output Format section, so the top matching entries are easy to read.
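
The steps above can be sketched with `pandas` and scikit-learn as follows. This is an illustrative reconstruction rather than the app's source: the helper name `top_matches`, the added `Score` column, joining `Section` and `Notes` with a space, and fitting the vectorizer on the notes alone are all assumptions.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_matches(df: pd.DataFrame, query: str, top_n: int = 5) -> pd.DataFrame:
    """Return the `top_n` notes most similar to `query` (hypothetical helper).

    Step 1 (reading the CSV into `df`) is assumed to have happened already.
    """
    df = df.copy()
    cols = ["Source", "Section", "Notes"]
    # Step 2: treat missing values as empty strings.
    df[cols] = df[cols].fillna("")

    # Step 3: combine section titles and note text, then build TF-IDF vectors.
    combined = (df["Section"].astype(str) + " " + df["Notes"].astype(str)).tolist()
    vectorizer = TfidfVectorizer()
    note_vectors = vectorizer.fit_transform(combined)

    # Step 4: project the query into the same vocabulary and compare with every note.
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, note_vectors).ravel()

    # Indices of the highest scores, best first (at most top_n rows).
    best = scores.argsort()[::-1][:top_n]
    result = df.iloc[best].copy()
    result["Score"] = scores[best]
    return result


notes = pd.DataFrame(
    {
        "Source": ["Author A", "Author B", "Author C"],
        "Section": ["Intro", "Methods", "Results"],
        "Notes": [
            "Neural networks learn representations from data.",
            "We sampled soil moisture at ten field sites.",
            "Deep neural networks outperformed the baseline.",
        ],
    }
)
print(top_matches(notes, "neural networks", top_n=2)[["Source", "Score"]])
```

Because the vocabulary is learned from the notes, query words that appear in no note are ignored. A query with no words in common with any note scores 0 against every note, so the resulting order carries no information.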

## Conclusion

The Research Notes Matcher offers a quick way to find research notes relevant to a given input. By ranking an entire notes file against a query, it reduces the manual work of sifting through large amounts of information and makes it easier to spot insights and connections across research literature.

---