Commit da6e1f7 by admin-4minds (verified)
Parent: 6e7145d

Upload folder using huggingface_hub

Files changed (4):
  1. README.md +166 -0
  2. config.json +69 -0
  3. label_encoder.joblib +3 -0
  4. model.joblib +3 -0
README.md ADDED
---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Query Dependence Classifier

A Random Forest classifier that predicts whether a second query depends on the context of a first query in a conversational AI system.

## Model Description

- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** `dependent` and `independent`

## Intended Use

The model is intended for conversational AI systems that need to decide whether a follow-up question requires context from a previous query.

**Examples:**
- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**

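One way a conversational system might act on the prediction is to forward the previous query only when the follow-up is classified as dependent. The sketch below is a hypothetical illustration: `build_query_context`, the prompt layout, and the stand-in `fake_predict` are assumptions, and the only thing taken from this card is the shape of the `result` dictionary (a `prediction` key holding `dependent` or `independent`).

```python
from typing import Callable, Dict


def build_query_context(previous: str, current: str,
                        predict: Callable[[str, str], Dict]) -> str:
    """Return the text to send downstream for `current` (hypothetical helper)."""
    result = predict(previous, current)
    if result["prediction"] == "dependent":
        # The follow-up needs earlier context, so carry the previous query along.
        return f"Previous question: {previous}\nFollow-up question: {current}"
    # An independent query can be answered on its own.
    return current


# Stand-in predictor for illustration only; in practice pass classifier.predict.
def fake_predict(q1: str, q2: str) -> Dict:
    return {"prediction": "dependent" if "examples" in q2.lower() else "independent"}


print(build_query_context("What is machine learning?", "Can you give me examples?", fake_predict))
```

Passing the predictor in as a callable keeps the routing logic testable without downloading the model.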
## Model Performance

- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** out-of-bag (OOB) scoring enabled

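Out-of-bag scoring is not cross-validation. In scikit-learn, each tree in a bagged forest is fit on a bootstrap sample, and `oob_score_` scores every training row using only the trees whose sample excluded it. The sketch below shows this mechanism on synthetic data from `make_classification`, which stands in for the unpublished training set. The hyperparameters mirror this card's, with two exceptions: `n_estimators` is reduced for speed, and `oob_score=True` is an assumption, because the card's hyperparameter listing does not include it.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with the same width as the card's 45-feature vector.
X, y = make_classification(n_samples=600, n_features=45, n_informative=10, random_state=42)

forest = RandomForestClassifier(
    n_estimators=200,        # the card uses 500; reduced to keep the sketch fast
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features="sqrt",
    class_weight="balanced",
    oob_score=True,          # assumed; needs bootstrap=True, which is the default
    random_state=42,
)
forest.fit(X, y)

# Accuracy on rows scored only by trees that did not see them during fitting.
oob_accuracy = forest.oob_score_
print(f"OOB accuracy: {oob_accuracy:.3f}")
```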
## Feature Engineering

The model uses 45 engineered features (config.json lists all of them by name), including:

### Lexical Features
- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming

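The lexical group can be illustrated with plain-Python versions of several feature names from config.json (`common_words`, `jaccard_similarity`, `word_overlap_ratio`, `bigram_overlap`, `trigram_overlap`). The card does not define these features, so the tokenizer and formulas below are assumptions. Jaccard similarity is |A ∩ B| / |A ∪ B| over distinct words, and the n-gram scores apply the same ratio to sets of n-grams.

```python
import re


def tokenize(text: str) -> list:
    # Assumed tokenizer: lowercase, keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def ngram_set(tokens: list, n: int) -> set:
    # All contiguous n-token windows; empty when the query is shorter than n.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_features(q1: str, q2: str) -> dict:
    t1, t2 = tokenize(q1), tokenize(q2)
    s1, s2 = set(t1), set(t2)
    return {
        "common_words": len(s1 & s2),
        "jaccard_similarity": jaccard(s1, s2),
        # Assumed definition: share of q2's distinct words that also occur in q1.
        "word_overlap_ratio": len(s1 & s2) / len(s2) if s2 else 0.0,
        "bigram_overlap": jaccard(ngram_set(t1, 2), ngram_set(t2, 2)),
        "trigram_overlap": jaccard(ngram_set(t1, 3), ngram_set(t2, 3)),
    }


print(lexical_features("What is machine learning?", "What is deep learning?"))
```

Under these definitions, every score is 0 for the card's own dependent example ("What is machine learning?" / "Can you give me examples?"). Lexical overlap alone therefore cannot flag that pair, which is why the linguistic cues in the next group matter.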
### Linguistic Features
- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency-phrase detection

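The linguistic cues can be sketched as simple counts over the follow-up query, using feature names from config.json (`pronoun_count`, `reference_count`, `early_pronoun_count`, `dependency_phrase_count`, `has_dependency_phrase`, and so on). Several parts are assumptions because the card does not publish them: the word lists, the three-word "early" window, and the decision to score only the second query.

```python
import re

# Hypothetical word lists; the card does not publish the lists the model uses.
PRONOUNS = {"it", "its", "they", "them", "their", "this", "that", "these", "those"}
REFERENCES = {"above", "previous", "earlier", "former", "latter", "same", "mentioned"}
DEPENDENCY_PHRASES = ("what about", "how about", "tell me more", "for example", "such as")


def linguistic_features(q2: str, early_window: int = 3) -> dict:
    """Count context-reference cues in the follow-up query q2."""
    tokens = re.findall(r"[a-z']+", q2.lower())
    early = tokens[:early_window]  # assumed: cues near the start are tracked separately
    text = " ".join(tokens)
    # Whole-phrase matches only, so "what about" does not match inside other words.
    phrase_hits = sum(
        1 for phrase in DEPENDENCY_PHRASES
        if re.search(rf"\b{re.escape(phrase)}\b", text)
    )
    return {
        "pronoun_count": sum(t in PRONOUNS for t in tokens),
        "reference_count": sum(t in REFERENCES for t in tokens),
        "early_pronoun_count": sum(t in PRONOUNS for t in early),
        "early_reference_count": sum(t in REFERENCES for t in early),
        "dependency_phrase_count": phrase_hits,
        "has_dependency_phrase": int(phrase_hits > 0),
    }


print(linguistic_features("What about its price?"))
```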
### Structural Features
- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns

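The structural group reduces to counting words, punctuation, and capitals. The sketch below uses feature names from config.json, but the definitions are assumptions. Words are whitespace-separated tokens, and `length_diff` is q2 minus q1. Syllables are estimated with a crude vowel-group heuristic rather than a dictionary.

```python
import re


def count_syllables(word: str) -> int:
    # Crude heuristic: each run of vowels (including y) counts as one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def syllable_density(words: list) -> float:
    # Average syllables per word; 0.0 for an empty query.
    return sum(count_syllables(w) for w in words) / len(words) if words else 0.0


def structural_features(q1: str, q2: str) -> dict:
    w1, w2 = q1.split(), q2.split()
    return {
        "q1_length": len(w1),
        "q2_length": len(w2),
        "length_diff": len(w2) - len(w1),  # sign convention assumed
        "length_ratio": len(w2) / len(w1) if w1 else 0.0,
        "q1_comma_count": q1.count(","),
        "q2_comma_count": q2.count(","),
        "q1_exclamation": int("!" in q1),
        "q2_exclamation": int("!" in q2),
        "q1_syllable_density": syllable_density(w1),
        "q2_syllable_density": syllable_density(w2),
        # Assumed: a "caps word" is any word whose first character is uppercase.
        "q1_caps_words": sum(w[:1].isupper() for w in w1),
        "q2_caps_words": sum(w[:1].isupper() for w in w2),
    }


print(structural_features("What is AI?", "What's the weather today?"))
```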
## Usage

```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# NOTE: DependencyClassifier is not defined in this snippet and is not among the
# files added in this commit; import it from the authors' training code first.

REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components (joblib unpickles the files: load only trusted sources)
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, "r") as f:
    config = json.load(f)

# Initialize the classifier and attach the downloaded components
classifier = DependencyClassifier()
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config["feature_names"]

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?",
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```
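Because `DependencyClassifier` is not part of this upload, you can also call the raw estimator directly if you compute the 45 features yourself. This sketch rests on several assumptions:

- `model.joblib` holds a fitted scikit-learn `RandomForestClassifier` trained on integer codes from `label_encoder.joblib`, which is assumed to be a `LabelEncoder`.
- Features are fed in the order of `config["feature_names"]`.
- The output is shaped like the `result` dictionary used above.

The helper name and that output shape are hypothetical; they are not the authors' API. If the model was fitted on a pandas DataFrame, scikit-learn may warn about missing feature names when it receives a plain list.

```python
def predict_from_features(model, label_encoder, feature_names, features):
    """Classify one query pair from a precomputed feature dict (hypothetical helper)."""
    # Order values exactly as in config.json; a missing feature raises KeyError
    # instead of silently shifting every column after it.
    row = [[features[name] for name in feature_names]]
    proba = model.predict_proba(row)[0]
    # model.classes_ holds the encoded labels; decode them back to strings.
    labels = label_encoder.inverse_transform(model.classes_)
    probabilities = {str(label): float(p) for label, p in zip(labels, proba)}
    prediction = max(probabilities, key=probabilities.get)
    return {
        "prediction": prediction,
        "confidence": probabilities[prediction],
        "probabilities": probabilities,
    }


# Usage, with `features` produced by your own extractor for all 45 names:
# result = predict_from_features(model, label_encoder, config["feature_names"], features)
```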

## Alternative Loading Method

```python
# Load directly using the class method provided by DependencyClassifier
# (part of the authors' code, which must be imported separately)
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")
```

## Training Data Format

Training data is expected to have these columns:
- `query1`: first query/question
- `query2`: second (follow-up) query/question
- `label`: 'independent' or 'dependent'

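Before training, you might validate an input file against this format. The sketch assumes CSV input with case-insensitive labels; the card does not say which file format the training code reads, and `load_training_rows` is a hypothetical helper. One fact about scikit-learn applies here: its `LabelEncoder` assigns codes in sorted order, so `dependent` becomes 0 and `independent` becomes 1. That matches the `label_classes` order in config.json.

```python
import csv
import io

REQUIRED_COLUMNS = {"query1", "query2", "label"}
VALID_LABELS = {"dependent", "independent"}


def load_training_rows(fileobj) -> list:
    """Read (query1, query2, label) tuples from CSV, rejecting bad columns or labels."""
    reader = csv.DictReader(fileobj)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        label = row["label"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {row['label']!r}")
        rows.append((row["query1"], row["query2"], label))
    return rows


SAMPLE = (
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = load_training_rows(io.StringIO(SAMPLE))
print(rows)
```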
## Model Architecture

```python
from sklearn.ensemble import RandomForestClassifier

# Hyperparameters as recorded under "model_params" in config.json
model = RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features="sqrt",
    class_weight="balanced",
    random_state=42,
)
```

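Because config.json also stores these hyperparameters under `model_params`, you can confirm that a downloaded `model.joblib` was built with them. `get_params()` is the standard scikit-learn estimator method; the helper name `param_mismatches` is hypothetical.

```python
def param_mismatches(estimator, expected: dict) -> dict:
    """Map each mismatched parameter to (expected, actual); an empty dict means all match."""
    actual = estimator.get_params()
    return {
        name: (value, actual.get(name))
        for name, value in expected.items()
        if actual.get(name) != value
    }


# Usage, with `model` and `config` loaded as in the Usage section:
# assert not param_mismatches(model, config["model_params"])
```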
## Limitations

- Designed for English-language queries
- Performance may vary on very short queries (fewer than 3 words)
- Requires the NLTK stopwords corpus for optimal performance
- Best suited to conversational question-answering scenarios

## Technical Details

- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based, so load files only from trusted sources)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)

## Citation

```bibtex
@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please contact the admin-4minds team.
config.json ADDED
{
  "model_type": "RandomForestClassifier",
  "library": "sklearn",
  "task": "text-classification",
  "subtask": "query-dependency-detection",
  "feature_names": [
    "q1_length",
    "q2_length",
    "length_diff",
    "length_ratio",
    "q1_char_length",
    "q2_char_length",
    "char_length_ratio",
    "common_words",
    "jaccard_similarity",
    "word_overlap_ratio",
    "stem_overlap",
    "bigram_overlap",
    "trigram_overlap",
    "pronoun_count",
    "reference_count",
    "connective_count",
    "early_pronoun_count",
    "early_reference_count",
    "early_connective_count",
    "dependency_phrase_count",
    "has_dependency_phrase",
    "semantic_similarity",
    "entity_overlap",
    "q1_exclamation",
    "q2_exclamation",
    "q1_comma_count",
    "q2_comma_count",
    "q1_avg_word_length",
    "q2_avg_word_length",
    "complexity_diff",
    "q1_syllable_density",
    "q2_syllable_density",
    "continuation_markers",
    "contrast_markers",
    "causation_markers",
    "exemplification_markers",
    "elaboration_markers",
    "repeated_words_q2",
    "max_word_repetition",
    "q1_caps_words",
    "q2_caps_words",
    "spatial_references",
    "temporal_references",
    "comparative_references",
    "quantitative_references"
  ],
  "label_classes": [
    "dependent",
    "independent"
  ],
  "num_features": 45,
  "model_params": {
    "n_estimators": 500,
    "max_depth": 15,
    "min_samples_split": 7,
    "min_samples_leaf": 3,
    "max_features": "sqrt",
    "random_state": 42,
    "class_weight": "balanced"
  },
  "created_at": "2025-07-25T18:08:02.989967",
  "version": "1.0.0"
}
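A consumer of this config might sanity-check it before building feature vectors. The hypothetical `check_config` helper below checks three things:

- `num_features` equals the length of `feature_names`.
- The feature names are unique.
- `label_classes` is in sorted order, which is the order scikit-learn's `LabelEncoder` uses for `classes_`.

Treating `label_classes` as `LabelEncoder` output is an assumption.

```python
import json


def check_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means no problems found."""
    problems = []
    names = config.get("feature_names", [])
    if len(names) != config.get("num_features"):
        problems.append(
            f"num_features={config.get('num_features')} but {len(names)} feature_names"
        )
    if len(set(names)) != len(names):
        problems.append("feature_names contains duplicates")
    labels = config.get("label_classes", [])
    # Assumption: label_classes mirrors LabelEncoder.classes_, which is sorted.
    if labels != sorted(labels):
        problems.append("label_classes is not in sorted order")
    return problems


sample = json.loads(
    '{"feature_names": ["q1_length", "q2_length"], "num_features": 2,'
    ' "label_classes": ["dependent", "independent"]}'
)
print(check_config(sample))  # → []
```

The same call works on the full file, for example `check_config(json.load(open(config_path)))` after the download step in the Usage section.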
label_encoder.joblib ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:cbd1fdf15974b88c06e16e9ce0f0393d2b6c2a0ce2fde186873a995196e9b0bd
size 498
model.joblib ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:972ae33af6194b34c8be0551bd5526b20801bad8bd5827fc9e1d40c24411ef2a
size 4446838
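Each `.joblib` entry in this commit is a Git LFS pointer: `oid` is the SHA-256 of the real file and `size` is its length in bytes. Because joblib files are unpickled when loaded, you might compare a downloaded file against its pointer before calling `joblib.load`. The sketch below uses only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the pointer text is copied from this commit.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return {"algorithm": algorithm, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer_text: str) -> bool:
    """True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    expected = parse_lfs_pointer(pointer_text)
    if expected["algorithm"] != "sha256":
        raise ValueError(f"unsupported oid algorithm: {expected['algorithm']}")
    file_path = Path(path)
    if file_path.stat().st_size != expected["size"]:
        return False
    digest = hashlib.sha256()
    with file_path.open("rb") as fh:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected["digest"]


MODEL_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:972ae33af6194b34c8be0551bd5526b20801bad8bd5827fc9e1d40c24411ef2a
size 4446838
"""

# Usage, with the path returned by hf_hub_download in the README's Usage section:
# assert matches_pointer(model_path, MODEL_POINTER)
```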