---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Query Dependence Classifier

A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

## Model Description

- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** Independent vs Dependent queries

## Intended Use

This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.

**Examples:**
- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**

## Model Performance

- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** Out-of-bag (OOB) scoring enabled

## Feature Engineering

The model uses 45 engineered features, including:

### Lexical Features
- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming
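
The overlap features above can be illustrated with a minimal sketch. This is not the model's actual feature code: the function names, the tokenizer, and the feature keys (`word_jaccard`, `bigram_jaccard`, `trigram_jaccard`) are assumptions made for illustration, and stemming is left out.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets; 0.0 if both are empty."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_features(query1, query2):
    """Hypothetical feature names; the real model's 45 features may differ."""
    t1, t2 = tokenize(query1), tokenize(query2)
    return {
        "word_jaccard": jaccard(t1, t2),
        "bigram_jaccard": jaccard(ngrams(t1, 2), ngrams(t2, 2)),
        "trigram_jaccard": jaccard(ngrams(t1, 3), ngrams(t2, 3)),
    }


features = lexical_features("What is machine learning?", "Is machine learning hard?")
print(features)
```

High overlap by itself does not settle dependence: a follow-up such as "Can you give me examples?" shares no words with its first query, which is presumably why the model combines these scores with the linguistic and structural features below.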

### Linguistic Features
- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrase detection
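
As a rough sketch of how such cues could be extracted from the second query, the example below counts referring pronouns and checks for a leading discourse marker and a few follow-up phrases. The word lists and feature names are hypothetical; the lists used by the trained model are not documented here.

```python
import re

# Hypothetical word lists for illustration only.
REFERENCE_PRONOUNS = {"it", "its", "they", "them", "their", "this", "that", "these", "those"}
DISCOURSE_MARKERS = {"also", "and", "but", "so", "then", "however"}
DEPENDENCY_PHRASES = ("give me examples", "tell me more", "what about", "how about")


def linguistic_features(query2):
    """Extract simple reference and discourse cues from the follow-up query."""
    text = query2.lower()
    tokens = re.findall(r"[a-z']+", text)
    return {
        # Pronouns that usually point back to something said earlier
        "pronoun_count": sum(tok in REFERENCE_PRONOUNS for tok in tokens),
        # A leading connective often signals a continuation
        "starts_with_marker": bool(tokens) and tokens[0] in DISCOURSE_MARKERS,
        # Fixed phrases that rarely make sense without prior context
        "has_dependency_phrase": any(phrase in text for phrase in DEPENDENCY_PHRASES),
    }


dependent = linguistic_features("And what about its limitations?")
independent = linguistic_features("What's the weather today?")
print(dependent)
print(independent)
```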

### Structural Features
- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns
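
The structural measures can be sketched in a few lines. The feature names are assumptions, and the syllable counter is a crude vowel-group heuristic chosen for brevity, not necessarily the method used to train the model.

```python
import re


def estimate_syllables(word):
    """Approximate syllables as the number of vowel groups (at least 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def structural_features(query1, query2):
    """Hypothetical structural features comparing the two queries."""
    words1, words2 = query1.split(), query2.split()
    return {
        "length_diff": len(words2) - len(words1),
        # max(..., 1) guards against division by zero for empty queries
        "length_ratio": len(words2) / max(len(words1), 1),
        "q2_ends_with_question_mark": query2.rstrip().endswith("?"),
        "q2_capitalized_words": sum(w[0].isupper() for w in words2),
        "q2_syllable_density": sum(estimate_syllables(w) for w in words2) / max(len(words2), 1),
    }


features = structural_features("What is AI?", "What's the weather today?")
print(features)
```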

## Usage

```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib
# The NLTK stopwords corpus is also needed (see Limitations):
# python -m nltk.downloader stopwords

import json

import joblib
from huggingface_hub import hf_hub_download

# NOTE: DependencyClassifier is not defined in this snippet. It must be
# imported or defined (from the code that accompanies this model) before
# the "Initialize classifier" step below.

REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Initialize classifier and attach the loaded components
classifier = DependencyClassifier()
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config["feature_names"]

# Make predictions: first query, then the follow-up query
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?",
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```

## Alternative Loading Method

```python
# Load directly using the class method
# (DependencyClassifier must already be imported or defined)
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference: first query, then the follow-up query
result = classifier.predict("Query 1", "Query 2")
```

## Training Data Format

The model expects training data with columns:
- `query1`: First query/question  
- `query2`: Second query/question
- `label`: 'independent' or 'dependent'
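
A small check of this layout might look like the sketch below, which reads CSV text with the standard library and rejects missing columns or unexpected labels. The CSV storage format, the helper name `load_training_rows`, and the sample rows (taken from the examples above) are assumptions for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}


def load_training_rows(file_obj):
    """Read rows from a CSV file object and validate columns and labels."""
    reader = csv.DictReader(file_obj)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    # Data starts on line 2 because line 1 is the header
    for line_no, row in enumerate(reader, start=2):
        label = (row["label"] or "").strip()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {label!r}")
        rows.append({"query1": row["query1"], "query2": row["query2"], "label": label})
    return rows


sample = io.StringIO(
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = load_training_rows(sample)
print(len(rows), rows[0]["label"])
```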

## Model Architecture

```python
RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
```

## Limitations

- Designed for English language queries
- Performance may vary on very short queries (< 3 words)
- Requires NLTK stopwords corpus for optimal performance
- Best suited for conversational question-answering scenarios

## Technical Details

- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based, so load model files only from sources you trust)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)

## Citation

```bibtex
@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please contact the admin-4minds team.