fortuala committed
Commit 8a0115a · verified · 1 Parent(s): f4dc45e

Update README.md

Files changed (1)
  1. README.md +28 -56
README.md CHANGED
@@ -11,86 +11,58 @@ license: mit
  short_description: Assist you to match bunches of text with your reference note
  ---

-
  # Research Notes Matcher

- ## Overview
-
- The **Research Notes Matcher** is a web application designed to help users find relevant research notes based on a given text input. By leveraging text similarity algorithms, this application allows users to upload a CSV file containing research notes and retrieve the top five notes that most closely match their input.

  ## Features

- - **File Upload:** Users can upload a CSV file containing their research notes.
- - **Text Input:** Users can enter a free text that describes their query or topic of interest.
- - **Top 5 Matching Entries:** The application outputs the five most relevant notes, along with their sources and sections, based on text similarity.

  ## Requirements

- To run this application, you need the following Python libraries:
-
- - `gradio`: For creating the web interface.
- - `pandas`: For data manipulation and handling CSV files.
- - `scikit-learn`: For text processing and calculating similarity.
-
- You can install these libraries using pip:

  ```bash
- pip install gradio pandas scikit-learn
  ```

- ## Input File Format
-
- The input file must be a CSV file containing the following columns:
-
- - **Source**: A string representing the source of the research note (e.g., author names, book title, etc.).
- - **Section**: A string representing the section or chapter title related to the note.
- - **Notes**: A string containing the actual content of the research note.

- ### Example of CSV Structure

- ```plaintext
- Source,Section,Notes
- "Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
- "Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
- ```
-
- ## How to Use
-
- 1. **Upload the CSV file**: Click on the upload button and select a CSV file containing your research notes.
- 2. **Enter your text**: In the provided text box, type the content or query related to the research topic you are interested in.
- 3. **Submit**: Click the "Submit" button to process your input.
- 4. **View Results**: The application will display the top five matching entries based on cosine similarity, formatted for easy reading.

- ### Output Format

- The output will consist of the top five matching notes presented in a readable format, which includes:

- - **Notes**: The content of the matching note.
- - **Source**: The source from which the note is taken.
- - **Section**: The section or chapter related to the note.

- Each entry will be separated by a line for clarity, as shown below:

- ```
- **Notes:** This is the content of the matching note...
- **Source:** Author Name, Book Title
- **Section:** Chapter 1: Introduction
- -------------------------------------
  ```

- ## Technical Explanation

- The application works by:

- 1. **Uploading and Reading the CSV File**: The user uploads a CSV file, which is read into a DataFrame using `pandas`.
- 2. **Data Validation**: The application checks that the necessary columns ('Source', 'Section', 'Notes') are present and handles any missing values by replacing them with empty strings.
- 3. **Text Processing**: The notes and sections are combined into a single text column, which is then vectorized using `TfidfVectorizer` to create a matrix of TF-IDF features.
- 4. **Cosine Similarity Calculation**: The application calculates the cosine similarity between the user’s input and the notes using the vectorized representations. This identifies the five notes most similar to the input text.
- 5. **Formatting the Output**: The results are formatted in a user-friendly manner, making it easy for the user to read and understand the top matching entries.

- ## Conclusion

- The Research Notes Matcher is a powerful tool for quickly finding relevant research notes based on a user's input. It simplifies the process of sifting through large amounts of information, making it easier to find insights and connections in research literature.

- ---
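
The Technical Explanation removed by this commit describes a five-step TF-IDF pipeline (read the CSV, validate columns, combine text, compute cosine similarity, format the output). The steps can be sketched roughly as follows, assuming `pandas` and `scikit-learn` as listed in the old Requirements. The function names `top_matches` and `format_matches`, the sample rows, and the order in which Notes and Section are concatenated are illustrative assumptions and are not taken from the application's source.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

REQUIRED_COLUMNS = ["Source", "Section", "Notes"]
SEPARATOR = "-" * 37  # separator line between entries (length is an assumption)


def top_matches(csv_file, query, k=5):
    """Return the k rows whose combined text is most similar to `query`."""
    # Step 1: read the uploaded CSV into a DataFrame.
    df = pd.read_csv(csv_file)

    # Step 2: check required columns and replace missing values with "".
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    df[REQUIRED_COLUMNS] = df[REQUIRED_COLUMNS].fillna("").astype(str)

    # Step 3: combine notes and sections, then build TF-IDF features.
    combined = df["Notes"] + " " + df["Section"]
    vectorizer = TfidfVectorizer()
    note_vectors = vectorizer.fit_transform(combined)
    query_vector = vectorizer.transform([query])

    # Step 4: cosine similarity between the query and every note.
    scores = cosine_similarity(query_vector, note_vectors).ravel()
    top = scores.argsort()[::-1][:k]
    return df.iloc[top].assign(score=scores[top])


def format_matches(matches):
    """Step 5: render the matches in the Notes/Source/Section layout."""
    blocks = [
        f"**Notes:** {row.Notes}\n**Source:** {row.Source}\n"
        f"**Section:** {row.Section}\n{SEPARATOR}"
        for row in matches.itertuples()
    ]
    return "\n".join(blocks)


# Hypothetical sample data; the third row has an empty Notes field.
sample_csv = io.StringIO(
    "Source,Section,Notes\n"
    '"Author A, Book A","Chapter 1: Introduction",'
    '"Neural networks learn representations from data"\n'
    '"Author B, Book B","Chapter 2: Background",'
    '"Medieval trade routes across the Mediterranean"\n'
    '"Author C, Book C","Chapter 3: Methods",\n'
)
matches = top_matches(sample_csv, "how neural networks learn", k=2)
print(format_matches(matches))
```

Because TF-IDF only matches shared vocabulary, a query that paraphrases a note without reusing its words scores zero, which is the limitation the new README's "Semantic Matching" feature refers to.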
  short_description: Assist you to match bunches of text with your reference note
  ---

  # Research Notes Matcher

+ This application allows you to find the top 5 matching research notes based on your input text. The tool uses a pre-trained language model from Hugging Face's Sentence Transformers to compute semantic similarity between the notes and the user input.

  ## Features

+ - **Upload CSV**: Upload a CSV file containing research notes.
+ - **Text Input**: Enter your text to find the most relevant notes.
+ - **Semantic Matching**: The application uses a Sentence Transformer to provide more meaningful matches compared to traditional methods.

  ## Requirements

+ Make sure to install the following packages:

  ```bash
+ pip install gradio pandas sentence-transformers scikit-learn
  ```

+ ## Usage

+ 1. Run the application.
+ 2. Upload a CSV file with the columns **Source**, **Section**, and **Notes**.
+ 3. Type your content in the provided textbox.
+ 4. Click the submit button to see the top 5 matching entries.

+ ## Sample CSV Format

+ Your CSV file should have the following columns:

+ | Source | Section | Notes |
+ |---------|----------|--------|
+ | Source1 | Section1 | Note1 |
+ | Source2 | Section2 | Note2 |

+ ## Launching the Application

+ To run the application, execute the following command in your terminal:

+ ```bash
+ python app.py
  ```

+ Replace `app.py` with the name of your Python file if it's different.

+ ## License

+ This project is licensed under the MIT License.

+ ## Acknowledgements

+ - [Gradio](https://gradio.app/) for creating the user interface.
+ - [Hugging Face](https://huggingface.co/sentence-transformers) for providing the Sentence Transformers.


+ ----
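
The updated README says matching now relies on a Sentence Transformer, but it neither names the model nor shows the ranking step. The sketch below illustrates one way such a step could look. Everything in it is an assumption: the model name `all-MiniLM-L6-v2`, the helper names `load_encoder` and `top_semantic_matches`, and the choice to pass the encoder in as a parameter so that the ranking logic can be exercised without downloading a model.

```python
import numpy as np


def load_encoder(model_name="all-MiniLM-L6-v2"):
    """Return a function that maps a list of strings to an embedding matrix.

    The model name is an assumption; the README does not say which
    pre-trained model the application loads.
    """
    # Imported lazily so the ranking helper below works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts, convert_to_numpy=True)


def top_semantic_matches(notes, query, encode, k=5):
    """Rank `notes` by cosine similarity to `query`.

    `encode` is any callable turning a list of strings into one vector
    per string. Returns up to k (note_index, score) pairs, best first.
    """
    note_vecs = np.asarray(encode(notes), dtype=float)
    query_vec = np.asarray(encode([query]), dtype=float)[0]

    # Cosine similarity: dot product divided by the product of the norms.
    norms = np.linalg.norm(note_vecs, axis=1) * np.linalg.norm(query_vec)
    scores = (note_vecs @ query_vec) / np.maximum(norms, 1e-12)

    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]
```

With the package installed, a call such as `top_semantic_matches(df["Notes"].tolist(), user_text, encode=load_encoder())` would return the indices of the five closest notes, which can then be looked up in the DataFrame to display their Source and Section.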