Update README.md
README.md
CHANGED
|
@@ -11,86 +11,58 @@ license: mit
|
|
| 11 |
short_description: Assist you to match bunches of text with your reference note
|
| 12 |
---
|
| 13 |
|
| 14 |
-
|
| 15 |
# Research Notes Matcher
|
| 16 |
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
The **Research Notes Matcher** is a web application designed to help users find relevant research notes based on a given text input. By leveraging text similarity algorithms, this application allows users to upload a CSV file containing research notes and retrieve the top five notes that most closely match their input.
|
| 20 |
|
| 21 |
## Features
|
| 22 |
|
| 23 |
-
- **
|
| 24 |
-
- **Text Input
|
| 25 |
-
- **
|
| 26 |
|
| 27 |
## Requirements
|
| 28 |
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
- `gradio`: For creating the web interface.
|
| 32 |
-
- `pandas`: For data manipulation and handling CSV files.
|
| 33 |
-
- `scikit-learn`: For text processing and calculating similarity.
|
| 34 |
-
|
| 35 |
-
You can install these libraries using pip:
|
| 36 |
|
| 37 |
```bash
|
| 38 |
-
pip install gradio pandas scikit-learn
|
| 39 |
```
|
| 40 |
|
| 41 |
-
##
|
| 42 |
-
|
| 43 |
-
The input file must be a CSV file containing the following columns:
|
| 44 |
-
|
| 45 |
-
- **Source**: A string representing the source of the research note (e.g., author names, book title, etc.).
|
| 46 |
-
- **Section**: A string representing the section or chapter title related to the note.
|
| 47 |
-
- **Notes**: A string containing the actual content of the research note.
|
| 48 |
|
| 49 |
-
|
| 50 |
|
| 51 |
-
|
| 52 |
-
Source,Section,Notes
|
| 53 |
-
"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
|
| 54 |
-
"Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
|
| 55 |
-
```
|
| 56 |
-
|
| 57 |
-
## How to Use
|
| 58 |
-
|
| 59 |
-
1. **Upload the CSV file**: Click on the upload button and select a CSV file containing your research notes.
|
| 60 |
-
2. **Enter your text**: In the provided text box, type the content or query related to the research topic you are interested in.
|
| 61 |
-
3. **Submit**: Click the "Submit" button to process your input.
|
| 62 |
-
4. **View Results**: The application will display the top five matching entries based on cosine similarity, formatted for easy reading.
|
| 63 |
|
| 64 |
-
|
| 65 |
|
| 66 |
-
|
| 67 |
|
| 68 |
-
|
| 69 |
-
- **Source**: The source from which the note is taken.
|
| 70 |
-
- **Section**: The section or chapter related to the note.
|
| 71 |
|
| 72 |
-
|
| 73 |
|
| 74 |
-
```
|
| 75 |
-
|
| 76 |
-
**Source:** Author Name, Book Title
|
| 77 |
-
**Section:** Chapter 1: Introduction
|
| 78 |
-
-------------------------------------
|
| 79 |
```
|
| 80 |
|
| 81 |
-
|
| 82 |
|
| 83 |
-
|
| 84 |
|
| 85 |
-
|
| 86 |
-
2. **Data Validation**: The application checks that the necessary columns ('Source', 'Section', 'Notes') are present and handles any missing values by replacing them with empty strings.
|
| 87 |
-
3. **Text Processing**: The notes and sections are combined into a single text column, which is then vectorized using `TfidfVectorizer` to create a matrix of TF-IDF features.
|
| 88 |
-
4. **Cosine Similarity Calculation**: The application calculates the cosine similarity between the user’s input and the notes using the vectorized representations. This identifies the five notes most similar to the input text.
|
| 89 |
-
5. **Formatting the Output**: The results are formatted in a user-friendly manner, making it easy for the user to read and understand the top matching entries.
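The validation, text-processing, and similarity steps listed above can be sketched in plain Python. This is a minimal illustration, not the application's code: the steps describe pandas and scikit-learn's `TfidfVectorizer`, whereas this sketch reimplements a smoothed TF-IDF weighting with the standard library only. The helper names (`combine`, `weigh`, `top_matches`) and the sample rows are hypothetical.

```python
import math
import re
from collections import Counter


def combine(row):
    # Join section and note text; missing values become empty strings.
    return f"{row.get('Section') or ''} {row.get('Notes') or ''}"


def tokenize(text):
    # Lowercase and keep word tokens of two or more characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def weigh(tokens, idf):
    # Build an L2-normalised term -> weight mapping, ignoring unknown terms.
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def tfidf_vectors(docs):
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [weigh(toks, idf) for toks in tokenized], idf


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_matches(rows, query, k=5):
    # Return (row index, score) pairs for the k most similar rows.
    vectors, idf = tfidf_vectors([combine(r) for r in rows])
    q = weigh(tokenize(query), idf)
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(i, score) for score, i in scored[:k]]


rows = [
    {"Source": "A", "Section": "Intro", "Notes": "cats chase mice"},
    {"Source": "B", "Section": "Background", "Notes": "stock markets fell sharply"},
    {"Source": "C", "Section": None, "Notes": "mice fear cats"},
]
for index, score in top_matches(rows, "cats and mice"):
    print(rows[index]["Source"], round(score, 3))
```

Rows that share no terms with the query score zero, so they fall to the end of the ranking rather than being dropped.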
|
| 90 |
|
| 91 |
-
##
|
| 92 |
|
| 93 |
-
|
| 94 |
|
| 95 |
-
---
|
| 96 |
|
| 11 |
short_description: Assist you to match bunches of text with your reference note
|
| 12 |
---
|
| 13 |
|
| 14 |
# Research Notes Matcher
|
| 15 |
|
| 16 |
+
This application allows you to find the top 5 matching research notes based on your input text. The tool uses a pre-trained language model from Hugging Face's Sentence Transformers to compute semantic similarity between the notes and the user input.
|
| 17 |
|
| 18 |
## Features
|
| 19 |
|
| 20 |
+
- **Upload CSV**: Upload a CSV file containing research notes.
|
| 21 |
+
- **Text Input**: Enter your text to find the most relevant notes.
|
| 22 |
+
- **Semantic Matching**: The application uses a Sentence Transformer to match on meaning rather than shared keywords, which can surface relevant notes that keyword-based methods such as TF-IDF miss.
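The semantic matching feature can be sketched roughly as follows. This is an illustration under stated assumptions, not the application's code: the README does not say which model is used, so `all-MiniLM-L6-v2` is only a placeholder, and the function names (`embed`, `rank_notes`) are hypothetical. The ranking is written in plain Python so it can be tried on toy vectors without downloading a model.

```python
import math


def embed(texts, model_name="all-MiniLM-L6-v2"):
    # Assumption: the model name is illustrative; the README does not name one.
    # Imported lazily so the ranking helpers work without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(texts).tolist()


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_notes(query_vec, note_vecs, k=5):
    # Return the indices of the k notes most similar to the query.
    scores = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(note_vecs)]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


# Toy two-dimensional vectors stand in for real sentence embeddings.
toy_notes = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
print(rank_notes([1.0, 0.0], toy_notes, k=2))
```

With real data, `embed` would be called once on the combined note texts and once on the user input, and the resulting vectors passed to `rank_notes`.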
|
| 23 |
|
| 24 |
## Requirements
|
| 25 |
|
| 26 |
+
Make sure to install the following packages:
|
| 27 |
|
| 28 |
```bash
|
| 29 |
+
pip install gradio pandas sentence-transformers scikit-learn
|
| 30 |
```
|
| 31 |
|
| 32 |
+
## Usage
|
| 33 |
|
| 34 |
+
1. Run the application.
|
| 35 |
+
2. Upload a CSV file with the columns **Source**, **Section**, and **Notes**.
|
| 36 |
+
3. Type your content in the provided textbox.
|
| 37 |
+
4. Click the submit button to see the top 5 matching entries.
|
| 38 |
|
| 39 |
+
## Sample CSV Format
|
| 40 |
|
| 41 |
+
Your CSV file should have the following columns:
|
| 42 |
|
| 43 |
+
| Source | Section | Notes |
|
| 44 |
+
|---------|----------|--------|
|
| 45 |
+
| Source1 | Section1 | Note1 |
|
| 46 |
+
| Source2 | Section2 | Note2 |
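A file in this format could be checked before upload with a short script like the one below. It is a sketch only: it uses the standard-library `csv` module rather than pandas, the function name `load_notes` is hypothetical, and replacing missing values with empty strings follows the earlier revision of this README; whether the current application does the same is an assumption.

```python
import csv
import io

REQUIRED_COLUMNS = ("Source", "Section", "Notes")


def load_notes(csv_text):
    # Parse CSV text and fail early if a required column is absent.
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing required columns: {', '.join(missing)}")
    # Keep only the required columns; missing cells become empty strings.
    return [{c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS} for row in reader]


sample = (
    "Source,Section,Notes\n"
    '"Author Name, Book Title","Chapter 1: Introduction","First note"\n'
    "Source2,Section2\n"
)
for note in load_notes(sample):
    print(note)
```

Quoting a field, as in the first sample row, lets a value contain commas without being split into extra columns.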
|
| 47 |
|
| 48 |
+
## Launching the Application
|
| 49 |
|
| 50 |
+
To run the application, execute the following command in your terminal:
|
| 51 |
|
| 52 |
+
```bash
|
| 53 |
+
python app.py
|
| 54 |
```
|
| 55 |
|
| 56 |
+
Replace `app.py` with the name of your Python file if it's different.
|
| 57 |
|
| 58 |
+
## License
|
| 59 |
|
| 60 |
+
This project is licensed under the MIT License.
|
| 61 |
|
| 62 |
+
## Acknowledgements
|
| 63 |
|
| 64 |
+
- [Gradio](https://gradio.app/) for creating the user interface.
|
| 65 |
+
- [Hugging Face](https://huggingface.co/sentence-transformers) for providing the Sentence Transformers library.
|
| 66 |
|
| 67 |
|
| 68 |
+
----
|