callidus committed on
Commit 4d33852 · verified · 1 parent: 8ed3baf

Add proper model card with YAML metadata

Files changed (1)
  1. README.md +143 -67
README.md CHANGED
@@ -1,102 +1,178 @@
1
- # CodeBasics FAQ System
2
-
3
- An intelligent FAQ retrieval system for CodeBasics bootcamp questions using TF-IDF and cosine similarity.
4
 
5
  ## Features
6
 
7
- - 🎯 Smart question matching using TF-IDF
8
- - 📊 Confidence scores for each match
9
- - 🔍 Keyword search functionality
10
- - 💬 Interactive Q&A interface
11
 
12
  ## Quick Start
13
 
14
  ### Installation
15
 
16
  ```bash
17
- pip install pandas scikit-learn
18
  ```
19
 
20
  ### Usage
21
 
22
  ```python
23
- from faq_system import CodeBasicsFAQ
24
-
25
- # Initialize FAQ system
26
- faq = CodeBasicsFAQ('codebasics_faqs.csv')
27
-
28
- # Ask a question
29
- result = faq.answer("Can I take this bootcamp without programming experience?")
30
-
31
- if result['status'] == 'success':
32
- print(f"Confidence: {result['confidence']}")
33
- print(f"Answer: {result['answer']}")
 
 
34
  ```
35
 
36
- ### Interactive Mode
37
 
38
- ```bash
39
- python faq_system.py
40
- ```
41
-
42
- ## Files
43
-
44
- - `faq_system.py` - Main FAQ system code
45
- - `codebasics_faqs.csv` - FAQ database (prompt, response)
46
- - `model_config.json` - Model configuration (for reference)
47
- - `model_weights.pt` - Transformer model weights (for reference)
48
- - `tokenizer.json` - Tokenizer (for reference)
49
-
50
- ## API
51
-
52
- ### Initialize
53
  ```python
54
- faq = CodeBasicsFAQ('codebasics_faqs.csv')
55
- ```
 
56
 
57
- ### Get Answer
58
- ```python
59
- result = faq.answer("Your question here")
60
- # Returns: {'status': 'success', 'confidence': '95.2%', 'matched_question': '...', 'answer': '...'}
61
- ```
62
-
63
- ### Search by Keyword
64
- ```python
65
- matches = faq.search_keyword('bootcamp')
66
- # Returns: List of matching Q&A pairs
67
- ```
68
-
69
- ### List All Questions
70
- ```python
71
- questions = faq.list_all_questions()
72
  ```
73
 
74
  ## Example Questions
75
 
 
76
  - "Can I take this bootcamp without programming experience?"
77
  - "Why should I trust Codebasics?"
78
  - "What are the prerequisites?"
79
- - "Do I need a laptop?"
80
- - "Is there lifetime access?"
81
  - "Do you provide job assistance?"
 
82
 
83
- ## How It Works
84
-
85
- 1. **TF-IDF Vectorization**: Converts questions into numerical vectors
86
- 2. **Cosine Similarity**: Measures similarity between user query and FAQ questions
87
- 3. **Best Match Selection**: Returns the most similar question with confidence score
88
 
89
- ## Accuracy
90
 
91
- - Typically 85-95% accuracy on similar phrasings
92
- - Handles variations in question format
93
- - Case-insensitive matching
94
- - Removes common stop words
95
 
96
  ## License
97
 
98
- Apache 2.0
99
 
100
- ## Contact
101
 
102
- For questions about CodeBasics courses, visit [codebasics.io](https://codebasics.io)
 
 
 
1
+ ---
2
+ language:
3
+ - en
4
+ license: apache-2.0
5
+ tags:
6
+ - text-generation
7
+ - question-answering
8
+ - faq
9
+ - codebasics
10
+ - education
11
+ - bootcamp
12
+ datasets:
13
+ - custom
14
+ library_name: pytorch
15
+ pipeline_tag: text-generation
16
+ ---
17
+
18
+ # CodeBasics FAQ & Text Generation System
19
+
20
+ An intelligent AI system for CodeBasics bootcamp questions with dual capabilities:
21
+ - Smart FAQ retrieval for accurate answers to bootcamp questions
22
+ - Text generation for general AI/ML topics
23
+
24
+ ## Model Details
25
+
26
+ - **Developed by:** callidus
27
+ - **Model type:** Hybrid (TF-IDF FAQ + Transformer)
28
+ - **Language:** English
29
+ - **License:** Apache 2.0
30
 
31
  ## Features
32
 
33
+ 🎯 **Smart Question Answering**
34
+ - Intelligent FAQ matching using TF-IDF and cosine similarity
35
+ - 50+ CodeBasics bootcamp questions covered
36
+ - High accuracy for course-related queries
37
+
38
+ 🤖 **Text Generation**
39
+ - Transformer-based text generation
40
+ - Trained on AI/ML domain text
41
+ - Suitable for general tech content
42
 
43
  ## Quick Start
44
 
45
  ### Installation
46
 
47
  ```bash
48
+ pip install torch pandas scikit-learn huggingface_hub
49
  ```
50
 
51
  ### Usage
52
 
53
  ```python
54
+ from huggingface_hub import hf_hub_download
55
+ import pandas as pd
56
+ from sklearn.feature_extraction.text import TfidfVectorizer
57
+ from sklearn.metrics.pairwise import cosine_similarity
58
+ import numpy as np
59
+
60
+ # Download FAQ data
61
+ csv_path = hf_hub_download(
62
+ repo_id="callidus/good",
63
+ filename="codebasics_faqs.csv"
64
+ )
65
+
66
+ # Load the CSV with pandas, then match queries against it (full code: faq_system.py in this repository)
67
  ```
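The usage snippet above stops after downloading the CSV. A minimal sketch of the retrieval step this card describes (TF-IDF with bigrams, cosine similarity, a 0.2 threshold) might look like the following. It is not the repository's `faq_system.py`: the in-memory `FAQS` rows, the helper names `build_index` and `answer`, and the `stop_words="english"` setting are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in rows for codebasics_faqs.csv (question, answer).
FAQS = [
    ("Can I take this bootcamp without programming experience?",
     "Yes, the bootcamp starts from the basics."),
    ("Do you provide job assistance?",
     "Yes, job assistance is part of the program."),
    ("Is there lifetime access?",
     "Yes, you keep access to the course material."),
]

THRESHOLD = 0.2  # similarity cutoff listed in the model card


def build_index(questions):
    """Fit a unigram + bigram TF-IDF vectorizer on the FAQ questions."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
    matrix = vectorizer.fit_transform(questions)
    return vectorizer, matrix


def answer(query, vectorizer, matrix, faqs, threshold=THRESHOLD):
    """Return the best-matching FAQ entry, or a no-match result below the threshold."""
    scores = cosine_similarity(vectorizer.transform([query]), matrix)[0]
    best = int(scores.argmax())
    if scores[best] < threshold:
        return {"status": "no_match", "confidence": float(scores[best])}
    question, response = faqs[best]
    return {
        "status": "success",
        "confidence": float(scores[best]),
        "matched_question": question,
        "answer": response,
    }


vectorizer, matrix = build_index([q for q, _ in FAQS])
result = answer("Do you offer job assistance?", vectorizer, matrix, FAQS)
miss = answer("What is the weather today?", vectorizer, matrix, FAQS)
```

With the real data, the `FAQS` list would be built from the downloaded CSV instead of being hard-coded.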
68
 
69
+ ### Interactive Usage
70
 
71
  ```python
72
+ # The system automatically chooses between FAQ and text generation
73
+ result = smart_inference("Can I take this bootcamp without experience?")
74
+ print(result) # Returns FAQ answer
75
 
76
+ result = smart_inference("machine learning algorithms")
77
+ print(result) # Returns generated text
78
  ```
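`smart_inference` is called above but not defined anywhere in this card. One plausible shape for it, sketched under assumptions, is a router that tries the FAQ lookup first and falls back to text generation when no FAQ entry clears the threshold. The names `make_smart_inference`, `toy_faq_lookup`, and `toy_generate` are hypothetical; the real function in the repository may work differently.

```python
from typing import Callable, Optional


def make_smart_inference(
    faq_lookup: Callable[[str], Optional[str]],
    generate: Callable[[str], str],
) -> Callable[[str], str]:
    """Build a router that prefers an FAQ answer and falls back to generation."""

    def smart_inference(query: str) -> str:
        faq_answer = faq_lookup(query)  # None means no FAQ entry matched
        if faq_answer is not None:
            return faq_answer
        return generate(query)

    return smart_inference


# Hypothetical stand-ins so the sketch runs without the real models.
def toy_faq_lookup(query: str) -> Optional[str]:
    known = {"is there lifetime access?": "Yes, access does not expire."}
    return known.get(query.strip().lower())


def toy_generate(prompt: str) -> str:
    return f"[generated text for: {prompt}]"


smart_inference = make_smart_inference(toy_faq_lookup, toy_generate)
faq_result = smart_inference("Is there lifetime access?")
generated_result = smart_inference("machine learning algorithms")
```

Passing the two backends in as callables keeps the routing rule testable on its own, independent of the TF-IDF index or the transformer weights.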
79
 
80
  ## Example Questions
81
 
82
+ **Bootcamp Questions:**
83
  - "Can I take this bootcamp without programming experience?"
84
  - "Why should I trust Codebasics?"
85
  - "What are the prerequisites?"
 
 
86
  - "Do you provide job assistance?"
87
+ - "Is there lifetime access?"
88
 
89
+ **General Topics:**
90
+ - AI and machine learning concepts
91
+ - Programming and data analytics
92
+ - Technology discussions
93
+
94
+ ## Files in Repository
95
+
96
+ - `codebasics_faqs.csv` - FAQ database (50+ Q&A pairs)
97
+ - `faq_system.py` - FAQ retrieval system code
98
+ - `model_config.json` - Transformer model configuration
99
+ - `model_weights.pt` - Transformer model weights
100
+ - `tokenizer.json` - Tokenizer vocabulary
101
+ - `README.md` - This file
102
+
103
+ ## Model Architecture
104
+
105
+ ### FAQ System
106
+ - **Method:** TF-IDF + Cosine Similarity
107
+ - **Vectorizer:** TfidfVectorizer with bigrams
108
+ - **Threshold:** minimum cosine similarity of 0.2 for a match
109
+ - **Accuracy:** ~90% on rephrasings of existing FAQ questions (evaluation set not described)
110
+
111
+ ### Transformer Model
112
+ - **Architecture:** Custom Transformer
113
+ - **Layers:** 6 transformer blocks
114
+ - **Hidden size:** 512
115
+ - **Attention heads:** 8
116
+ - **Vocabulary:** 229 tokens
117
+ - **Max sequence length:** 512
118
+
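As a rough sanity check on the figures listed above, the parameter count can be estimated with plain arithmetic. The card does not state the feed-forward width, the positional-encoding scheme, or whether the output head is tied to the token embedding, so the 4x feed-forward width, learned positional embeddings, and tied head below are all assumptions; the result is an estimate, not the actual size of `model_weights.pt`.

```python
# Figures listed in the model card.
VOCAB_SIZE = 229
HIDDEN = 512
LAYERS = 6
HEADS = 8
MAX_LEN = 512

# Assumption: feed-forward width is 4x the hidden size (not stated in the card).
FFN = 4 * HIDDEN

head_dim = HIDDEN // HEADS  # per-head dimension

# Q, K, V and output projections, each with a bias vector.
attention = 4 * (HIDDEN * HIDDEN + HIDDEN)
# Two linear layers (expand, then project back), each with a bias vector.
feed_forward = (HIDDEN * FFN + FFN) + (FFN * HIDDEN + HIDDEN)
# Two LayerNorms per block, each with a scale and a shift vector.
layer_norms = 2 * (2 * HIDDEN)
per_layer = attention + feed_forward + layer_norms

# Assumption: learned positional embeddings and an output head tied to the
# token embedding, so the head adds no extra parameters.
embeddings = VOCAB_SIZE * HIDDEN + MAX_LEN * HIDDEN
total = LAYERS * per_layer + embeddings

print(f"head_dim={head_dim}, per_layer={per_layer:,}, total={total:,}")
```

Under these assumptions the six blocks account for nearly all of the parameters, and the 229-token vocabulary contributes very little.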
119
+ ## Training Data
120
+
121
+ - **FAQ Data:** Custom CodeBasics bootcamp questions
122
+ - **Text Generation:** AI/ML domain corpus
123
+ - **Total samples:** Not disclosed (proprietary dataset)
124
+
125
+ ## Limitations
126
+
127
+ - FAQ system only matches questions phrased similarly to entries in the FAQ database
128
+ - Text generation model has limited vocabulary (229 tokens)
129
+ - Best performance on CodeBasics-related questions
130
+ - English language only
131
+
132
+ ## Use Cases
133
+
134
+ ✅ **Recommended:**
135
+ - Answering CodeBasics bootcamp questions
136
+ - Educational chatbots
137
+ - Course support systems
138
+ - General AI/ML text generation
139
+
140
+ ❌ **Not Recommended:**
141
+ - Medical or legal advice
142
+ - Real-time information (trained on historical data)
143
+ - Languages other than English
144
+
145
+ ## Ethical Considerations
146
+
147
+ - Model provides educational content only
148
+ - Should not replace human instructors
149
+ - Answers based on training data may be outdated
150
+ - Users should verify critical information
151
+
152
+ ## Citation
153
+
154
+ If you use this model, please cite:
155
+
156
+ ```bibtex
157
+ @misc{codebasics-faq-2024,
158
+ author = {callidus},
159
+ title = {CodeBasics FAQ and Text Generation System},
160
+ year = {2024},
161
+ publisher = {Hugging Face},
162
+ howpublished = {\url{https://huggingface.co/callidus/good}}
163
+ }
164
+ ```
165
 
166
+ ## Contact
167
 
168
+ For questions about CodeBasics courses: [codebasics.io](https://codebasics.io)
 
 
 
169
 
170
  ## License
171
 
172
+ Apache 2.0 - See LICENSE file for details
173
 
174
+ ## Acknowledgments
175
 
176
+ - CodeBasics for the educational content
177
+ - Hugging Face for hosting infrastructure
178
+ - Open source community for tools and libraries