Commit d7dede5 by HipFil98 · verified · 1 parent: a90a20a

Update README.md

Files changed (1)
  1. README.md +44 -100
README.md CHANGED
@@ -1,6 +1,15 @@
1
  ---
2
- pinned: true
3
  ---
 
4
  # ELAN-Bot 🤖
5
 
6
  A virtual assistant designed to help users with the ELAN annotation software. The bot can answer questions about ELAN usage and modify EAF (ELAN Annotation Format) files based on user instructions.
@@ -13,113 +22,48 @@ A virtual assistant designed to help users with the ELAN annotation software. Th
13
  - **Vector Search**: Uses semantic search to find relevant information from documentation
14
  - **Powered by Llama 3.3 70B**: Advanced language model for accurate responses
15
 
16
- ## Project Structure
17
-
18
- ```
19
- elan-bot/
20
- ├── app.py # Main application entry point
21
- ├── requirements.txt # Python dependencies
22
- ├── README.md # Project documentation
23
- ├── .env.example # Environment variables example
24
- ├── config/
25
- │   └── settings.py # Configuration settings
26
- ├── prompts/
27
- │   ├── __init__.py
28
- │   ├── system_prompts.py # System prompts
29
- │   ├── user_prompts.py # User prompts
30
- │   └── assistant_prompts.py # Assistant prompts
31
- ├── services/
32
- │   ├── __init__.py
33
- │   ├── vector_search.py # Vector search functionality
34
- │   ├── llm_service.py # LLM interaction service
35
- │   └── elan_assistant.py # Main assistant coordinator
36
- ├── utils/
37
- │   ├── __init__.py
38
- │   └── text_processing.py # Text processing utilities
39
- ├── ui/
40
- │   ├── __init__.py
41
- │   └── gradio_interface.py # Gradio interface components
42
- └── data/
43
-     └── qdrant_data/ # Vector database storage
44
- ```
45
-
46
- ## Installation
47
-
48
- 1. Clone the repository:
49
- ```bash
50
- git clone <repository-url>
51
- cd elan-bot
52
- ```
53
-
54
- 2. Create virtual environment (recommended):
55
- ```bash
56
- python -m venv venv
57
- source venv/bin/activate # On Windows: venv\Scripts\activate
58
- ```
59
-
60
- 3. Install dependencies:
61
- ```bash
62
- pip install -r requirements.txt
63
- ```
64
-
65
- 4. Set up environment variables:
66
- ```bash
67
- cp .env.example .env
68
- # Edit .env file with your Hugging Face token
69
- ```
70
-
71
- 5. Ensure you have the Qdrant vector database set up with ELAN documentation in the `data/qdrant_data` directory.
72
-
73
  ## Usage
74
 
75
- Run the application:
76
- ```bash
77
- python app.py
78
- ```
79
-
80
- The Gradio interface will launch and you can:
81
-
82
- - Ask questions about ELAN: "How can I add a new tier in ELAN?"
83
- - Modify EAF files: Paste your EAF content with instructions at the beginning
84
-
85
- ## Configuration
86
-
87
- Modify `config/settings.py` to adjust:
88
- - Model settings (encoder, LLM, tokenizer)
89
- - Vector database configuration
90
- - Text processing parameters
91
- - UI settings
92
-
93
- ## Components
94
 
95
- ### Services
96
- - **VectorSearchService**: Handles semantic search through ELAN documentation using sentence transformers and Qdrant
97
- - **LLMService**: Manages interactions with the Llama 3.3 70B model for generating responses and processing XML
98
- - **ElanAssistant**: Main coordinator that routes requests between question answering and XML modification workflows
99
 
100
- ### Utils
101
- - **TextProcessor**: Utilities for splitting large EAF files into manageable chunks and recombining results
102
 
103
- ### UI
104
- - **GradioInterface**: Handles the Gradio chat interface setup and configuration
105
 
106
- ### Configuration
107
- - **settings.py**: Centralized configuration for all application parameters
108
- - **prompts/**: Organized prompt templates separated by type (system, user, assistant)
109
-
110
- ## Development
111
 
112
- The project follows a clean architecture pattern with separation of concerns:
113
 
114
- - `config/`: Application configuration
115
- - `prompts/`: All prompt templates organized by type
116
- - `services/`: Core business logic and external service integrations
117
- - `utils/`: Utility functions and helpers
118
- - `ui/`: User interface components
119
- - `data/`: Data storage (vector database)
120
 
121
- Each module is self-contained with clear interfaces and minimal dependencies.
122
 
123
- ## License
124
 
125
- [Add your license information here]
 
1
  ---
2
+ title: ELAN-Bot
3
+ emoji: 🤖
4
+ colorFrom: blue
5
+ colorTo: green
6
+ sdk: gradio
7
+ sdk_version: 4.44.0
8
+ app_file: app.py
9
+ pinned: false
10
+ license: mit
11
  ---
12
+
13
  # ELAN-Bot 🤖
14
 
15
  A virtual assistant designed to help users with the ELAN annotation software. The bot can answer questions about ELAN usage and modify EAF (ELAN Annotation Format) files based on user instructions.
 
22
  - **Vector Search**: Uses semantic search to find relevant information from documentation
23
  - **Powered by Llama 3.3 70B**: Advanced language model for accurate responses
24
 
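As a rough illustration of what the **Vector Search** feature does, the sketch below ranks passages by cosine similarity to a query. It is a toy: the bag-of-words `embed` function stands in for a SentenceTransformers encoder, the in-memory list stands in for Qdrant, and the function names and passages are hypothetical placeholders, not taken from the repository.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, passages: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k passages ranked by similarity to the query."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:top_k]


# Hypothetical placeholder passages, not real ELAN documentation.
docs = [
    "placeholder passage about how to add a new tier",
    "placeholder passage about exporting annotations as text",
    "placeholder passage about searching inside annotations",
]
best = search("how do I add a new tier", docs, top_k=1)
```

A real encoder would also match paraphrases that share no words with the passage, which this word-overlap toy cannot do.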
25
  ## Usage
26
 
27
+ Simply interact with the chat interface:
28
 
29
+ - **Ask questions**: "How can I add a new tier in ELAN?"
30
+ - **Modify EAF files**: Paste your EAF content, preceded by a line of instructions, for example:
31
+ ```
32
+ instructions: change the participant name from Eleonora to Gianni
33
+
34
+ <?xml version="1.0" encoding="UTF-8"?>
35
+ <ANNOTATION_DOCUMENT...>
36
+ ```
37
 
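The message format above suggests a simple way to tell a plain question from an EAF modification request. The following is a minimal sketch under that assumption; `split_request` is a hypothetical helper, and the app's actual routing logic is not shown in this diff.

```python
def split_request(message: str) -> tuple[str, str]:
    """Split a chat message into (instructions, eaf_xml).

    Returns ("", "") when the message lacks an "instructions:" prefix
    or an XML body, i.e. when it looks like a plain question.
    """
    text = message.strip()
    prefix = "instructions:"
    if not text.lower().startswith(prefix):
        return "", ""
    # Look for the XML declaration first, then fall back to the root element.
    xml_start = text.find("<?xml")
    if xml_start == -1:
        xml_start = text.find("<ANNOTATION_DOCUMENT")
    if xml_start == -1:
        return "", ""
    instructions = text[len(prefix):xml_start].strip()
    return instructions, text[xml_start:]
```

A caller could route to the XML workflow when the second element is non-empty and to question answering otherwise.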
38
+ ## Examples
 
39
 
40
+ Try these sample questions:
41
+ - "How can I add a new tier in ELAN?"
42
+ - "¿Cómo puedo exportar anotaciones en formato txt?"
43
+ - "Come posso cercare all'interno delle annotazioni?"
44
 
45
+ ## Configuration
46
 
47
+ The app requires an `HF_TOKEN` environment variable, set in the Hugging Face Spaces settings, to access the Llama model.
48
 
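One way the token might be read at startup is sketched below. `load_hf_token` is a hypothetical helper and the error message is illustrative; how `app.py` or `config/settings.py` actually reads the variable is not shown here.

```python
import os
from collections.abc import Mapping


def load_hf_token(env: Mapping[str, str] = os.environ) -> str:
    """Return the HF_TOKEN value, failing early with a clear message if unset."""
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it as a secret in the Space settings."
        )
    return token
```

The returned value could then be handed to whatever inference client the app uses, for example through the `token` argument of `huggingface_hub.InferenceClient`, if that is the client in use.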
49
+ ## Technical Details
50
 
51
+ - **Backend**: Python with Gradio interface
52
+ - **Vector Search**: Qdrant + SentenceTransformers
53
+ - **LLM**: Meta Llama 3.3 70B Instruct via Hugging Face Inference API
54
+ - **Text Processing**: tiktoken for efficient chunking
55
 
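To make the token-based chunking mentioned under **Text Processing** concrete, here is a minimal sketch that splits text into pieces of at most `max_tokens` tokens. The function name and the whitespace tokenizer are assumptions made so the example runs without extra dependencies; with tiktoken, the `encode` and `decode` methods of an encoding object (for instance the one returned by `tiktoken.get_encoding("cl100k_base")`) could be passed in instead. Which encoding and chunk size the project uses is not stated in this diff.

```python
from typing import Callable, Sequence


def chunk_by_tokens(
    text: str,
    max_tokens: int,
    encode: Callable[[str], Sequence],
    decode: Callable[[Sequence], str],
) -> list[str]:
    """Split text into consecutive chunks of at most max_tokens tokens each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = encode(text)
    return [
        decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


# Whitespace tokenizer used only as a dependency-free stand-in.
def ws_encode(text: str) -> list[str]:
    return text.split()


def ws_decode(tokens: Sequence) -> str:
    return " ".join(tokens)


chunks = chunk_by_tokens("a b c d e", 2, ws_encode, ws_decode)
```

Note that cutting purely by token count can split an XML element in the middle, so a chunker for EAF files would likely need to respect element boundaries as well.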
56
+ ## Project Structure
57
 
58
+ ```
59
+ elan-bot/
60
+ ├── app.py # Main application entry point
61
+ ├── requirements.txt # Python dependencies
62
+ ├── config/
63
+ │   └── settings.py # Configuration settings
64
+ ├── prompts/ # Organized prompt templates
65
+ ├── services/ # Core business logic
66
+ ├── utils/ # Utility functions
67
+ ├── ui/ # Gradio interface components
68
+ └── data/ # Vector database storage
69
+ ```