Spaces: fortuala/CitingLLM

afortuny committed on Oct 16, 2024

Commit fc85650 (1 parent: 4973591)

readme

Files changed (1)

README.md +84 -2

README.md CHANGED

@@ -1,5 +1,5 @@
 ---
-title: CitingLLM
 emoji: 🏢
 colorFrom: pink
 colorTo: purple
@@ -11,4 +11,86 @@ license: mit
 short_description: Helps you match passages of text with your reference notes
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Research Notes Matcher
 emoji: 🏢
 colorFrom: pink
 colorTo: purple
 short_description: Helps you match passages of text with your reference notes
 ---
+# Research Notes Matcher
+## Overview
+The **Research Notes Matcher** is a web application that helps users find the research notes most relevant to a given text. Users upload a CSV file of research notes and enter a passage or query. The application then ranks the notes by TF-IDF cosine similarity and returns the five closest matches.
+## Features
+- **File Upload:** Users can upload a CSV file containing their research notes.
+- **Text Input:** Users can enter free text describing their query or topic of interest.
+- **Top 5 Matching Entries:** The application returns the five most similar notes, ranked by cosine similarity, together with their sources and sections.
+## Requirements
+To run this application, you need the following Python libraries:
+- `gradio`: For creating the web interface.
+- `pandas`: For data manipulation and handling CSV files.
+- `scikit-learn`: For TF-IDF vectorization and cosine similarity.
+You can install these libraries using pip:
+```bash
+pip install gradio pandas scikit-learn
+```
+## Input File Format
+The input file must be a CSV file containing the following columns:
+- **Source**: A string identifying where the note comes from (e.g., author names and book title).
+- **Section**: A string representing the section or chapter title related to the note.
+- **Notes**: A string containing the actual content of the research note.
+### Example of CSV Structure
+```plaintext
+Source,Section,Notes
+"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."
+"Another Author, Another Book","Chapter 2: Background","This note discusses background information..."
+```
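A minimal sketch of how a file in this format could be read and checked with pandas. It mirrors the column check and empty-string fill described under Technical Explanation. The helper name `load_notes` and its error message are illustrative assumptions, not the application's actual code.

```python
import io

import pandas as pd

# Column names the README requires in the uploaded CSV.
REQUIRED_COLUMNS = ["Source", "Section", "Notes"]


def load_notes(csv_file) -> pd.DataFrame:
    """Read a notes CSV and check that it has the expected columns.

    `load_notes` is a hypothetical helper name; `csv_file` can be a path
    or any file-like object accepted by pandas.read_csv.
    """
    df = pd.read_csv(csv_file)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")
    # Replace missing cells with empty strings so later text operations work.
    df[REQUIRED_COLUMNS] = df[REQUIRED_COLUMNS].fillna("").astype(str)
    return df


# The example CSV from above, held in memory instead of in a file.
sample = io.StringIO(
    "Source,Section,Notes\n"
    '"Author Name, Book Title","Chapter 1: Introduction","This is the content of the first note..."\n'
    '"Another Author, Another Book","Chapter 2: Background","This note discusses background information..."\n'
)
notes = load_notes(sample)
print(notes[["Source", "Section"]])
```

The quoted fields in the example matter. Without the quotes, the comma inside `Author Name, Book Title` would split the value into two columns.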
+## How to Use
+1. **Upload the CSV file**: Click the upload button and select a CSV file containing your research notes.
+2. **Enter your text**: In the text box, type a passage or query related to the research topic you are interested in.
+3. **Submit**: Click the "Submit" button to process your input.
+4. **View Results**: The application displays the five notes with the highest cosine similarity to your input, formatted for easy reading.
+### Output Format
+The output consists of the top five matching notes, each presented with:
+- **Notes**: The content of the matching note.
+- **Source**: The source from which the note is taken.
+- **Section**: The section or chapter related to the note.
+Entries are separated by a line of dashes, as shown below:
+```
+**Notes:** This is the content of the matching note...
+**Source:** Author Name, Book Title
+**Section:** Chapter 1: Introduction
+-------------------------------------
+```
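A small string-formatting helper can produce the layout above. This is a sketch: the name `format_results`, the list-of-dicts input, and the separator length are assumptions, not details taken from the application.

```python
# Dash rule drawn between entries; its exact length is cosmetic.
SEPARATOR = "-" * 37


def format_results(matches: list[dict]) -> str:
    """Render matching notes in the Notes/Source/Section layout.

    `format_results` is a hypothetical name; each match is assumed to be a
    dict with "Notes", "Source", and "Section" keys.
    """
    blocks = []
    for match in matches:
        blocks.append(
            f"**Notes:** {match['Notes']}\n"
            f"**Source:** {match['Source']}\n"
            f"**Section:** {match['Section']}\n"
            f"{SEPARATOR}"
        )
    return "\n".join(blocks)


example = [
    {
        "Notes": "This is the content of the matching note...",
        "Source": "Author Name, Book Title",
        "Section": "Chapter 1: Introduction",
    }
]
print(format_results(example))
```

If this string is rendered as Markdown rather than shown verbatim, two things happen. The three lines of each entry merge into one paragraph, and a dash line placed directly under text is read as a setext heading underline. Adding blank lines (or explicit Markdown line breaks) avoids both effects.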
+## Technical Explanation
+The application works by:
+1. **Uploading and Reading the CSV File**: The user uploads a CSV file, which is read into a DataFrame using `pandas`.
+2. **Data Validation**: The application checks that the necessary columns ('Source', 'Section', 'Notes') are present and handles any missing values by replacing them with empty strings.
+3. **Text Processing**: The notes and sections are combined into a single text column, which is then vectorized using `TfidfVectorizer` to create a matrix of TF-IDF features.
+4. **Cosine Similarity Calculation**: Using the vectorized representations, the application computes the cosine similarity between the user's input and every note, and selects the five notes with the highest scores.
+5. **Formatting the Output**: The matching notes are formatted as described under Output Format, listing each note's content, source, and section.
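Steps 3 and 4 can be sketched with scikit-learn's `TfidfVectorizer` and `cosine_similarity`. Several details here are assumptions: the helper name `top_matches`, the added `Score` column, and the choice to fit the vectorizer on the notes alone before transforming the query. The README does not say exactly how the application handles these.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def top_matches(notes: pd.DataFrame, query: str, k: int = 5) -> pd.DataFrame:
    """Return the k notes most similar to `query` by TF-IDF cosine similarity.

    `top_matches` is a hypothetical helper; `notes` is assumed to already have
    string "Source", "Section", and "Notes" columns with no missing values.
    """
    # Step 3: combine section and note text into one document per row.
    documents = notes["Section"] + " " + notes["Notes"]
    # Assumption: fit the vocabulary on the notes only, then project the query.
    vectorizer = TfidfVectorizer()
    note_matrix = vectorizer.fit_transform(documents)
    query_vector = vectorizer.transform([query])
    # Step 4: one cosine score per note (shape (1, n) flattened to (n,)).
    scores = cosine_similarity(query_vector, note_matrix).ravel()
    # Highest scores first; a stable sort keeps file order among ties.
    top = np.argsort(-scores, kind="stable")[:k]
    return notes.iloc[top].assign(Score=scores[top]).reset_index(drop=True)


notes = pd.DataFrame(
    {
        "Source": ["A", "B", "C"],
        "Section": ["Intro", "Methods", "Results"],
        "Notes": [
            "Neural networks learn representations from data.",
            "We surveyed farmers about crop rotation.",
            "Soil quality improved after rotation.",
        ],
    }
)
print(top_matches(notes, "crop rotation survey of farmers"))
```

In this sample, "survey" does not match "surveyed" because the default tokenizer does no stemming. Only "crop", "rotation", and "farmers" contribute to the scores. With fewer than five notes, all of them are returned in ranked order.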
+## Conclusion
+The Research Notes Matcher offers a quick way to find the research notes most relevant to a given passage. Users no longer need to sift through a large collection by hand, because the application surfaces the closest matches directly. This makes it easier to find insights and connections in the research literature.
+---