---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Query Dependence Classifier

A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

## Model Description

- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** Independent vs Dependent queries

## Intended Use

This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.

**Examples:**
- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**

## Model Performance

- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** Out-of-bag (OOB) scoring enabled (an internal estimate computed from the bootstrap samples, not k-fold cross-validation)
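As a rough illustration of out-of-bag scoring, the sketch below fits a small forest on synthetic data and reads the OOB accuracy. The data from `make_classification` is only a stand-in for the 45 engineered features, and the reduced `n_estimators` is an assumption made to keep the example fast; it does not reproduce this model's training run or its scores.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in with 45 columns (NOT the model's real training data).
X, y = make_classification(
    n_samples=400, n_features=45, n_informative=10, random_state=42
)

forest = RandomForestClassifier(
    n_estimators=100,   # smaller than the card's 500, only to keep the sketch quick
    oob_score=True,     # score each sample using trees that did not see it
    random_state=42,
)
forest.fit(X, y)

# Accuracy estimated from samples left out of each tree's bootstrap draw.
oob_accuracy = forest.oob_score_
print(f"OOB accuracy: {oob_accuracy:.3f}")
```

Because each tree is trained on a bootstrap sample, roughly a third of the rows are unseen by any given tree, which is what makes this estimate possible without a separate hold-out split.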

## Feature Engineering

The model uses 45 engineered features, grouped as follows:

### Lexical Features
- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming
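The lexical features listed above could be computed along these lines. This is a minimal sketch, not the model's actual feature extractor: the function names, the tokenizer, and the feature keys are all hypothetical, and stemming is omitted.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-token tuples found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def jaccard(items_a, items_b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    set_a, set_b = set(items_a), set(items_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def lexical_features(query1, query2):
    """Hypothetical lexical feature dictionary for a query pair."""
    t1, t2 = tokenize(query1), tokenize(query2)
    return {
        "word_overlap": len(set(t1) & set(t2)),
        "jaccard": jaccard(t1, t2),
        "bigram_jaccard": jaccard(ngrams(t1, 2), ngrams(t2, 2)),
        "trigram_jaccard": jaccard(ngrams(t1, 3), ngrams(t2, 3)),
    }


lexical = lexical_features("What is machine learning?", "What is deep learning?")
print(lexical)
```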

### Linguistic Features  
- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrases detection
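A few of these linguistic signals can be approximated with simple word lists. The sketch below is illustrative only: the word lists, the function name, and the feature keys are assumptions, and the real model may use different or richer patterns.

```python
import re

# Hypothetical word lists; the model's actual lists are not documented here.
REFERENCE_PRONOUNS = {"it", "its", "they", "them", "their",
                      "this", "that", "these", "those"}
DISCOURSE_MARKERS = {"and", "also", "but", "so", "then", "however"}
QUESTION_WORDS = {"what", "why", "how", "when", "where", "who", "which"}


def linguistic_features(query):
    """Count reference pronouns and detect the question type of one query."""
    tokens = re.findall(r"[a-z]+", query.lower())
    first = tokens[0] if tokens else ""
    return {
        # Pronouns that usually point back to earlier context.
        "pronoun_count": sum(token in REFERENCE_PRONOUNS for token in tokens),
        # A leading connective often signals a continuation of the dialogue.
        "starts_with_discourse_marker": first in DISCOURSE_MARKERS,
        # First question word found anywhere in the query, else "other".
        "question_type": next(
            (token for token in tokens if token in QUESTION_WORDS), "other"
        ),
    }


follow_up = linguistic_features("And how does it work?")
standalone = linguistic_features("What's the weather today?")
print(follow_up, standalone)
```

A follow-up such as "And how does it work?" triggers both the pronoun and the discourse-marker signal, while a standalone question triggers neither.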

### Structural Features
- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns
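The structural features can be sketched in the same spirit. The names below are hypothetical, and the syllable counter is a crude vowel-group heuristic chosen for brevity; the model's own complexity measure may be computed differently.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: number of consecutive-vowel groups."""
    return len(re.findall(r"[aeiouy]+", word.lower()))


def structural_features(query1, query2):
    """Hypothetical structural feature dictionary for a query pair."""
    words1, words2 = query1.split(), query2.split()
    return {
        "length_ratio": len(words2) / max(len(words1), 1),
        "length_diff": len(words2) - len(words1),
        "q2_ends_with_question_mark": query2.rstrip().endswith("?"),
        "q2_capitalized_words": sum(word[0].isupper() for word in words2),
        # Average estimated syllables per word in the second query.
        "q2_syllable_density": (
            sum(count_syllables(word) for word in words2) / max(len(words2), 1)
        ),
    }


structural = structural_features(
    "What is machine learning?", "Can you give me examples?"
)
print(structural)
```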

## Usage

```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# Download model files from the Hub (cached locally after the first call)
REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)

with open(config_path, 'r') as f:
    config = json.load(f)

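# Initialize classifier.
# NOTE: DependencyClassifier is not provided by scikit-learn or huggingface_hub.
# It is this project's own wrapper class and must be defined or imported from
# the model's source code before this line; otherwise Python raises NameError.
classifier = DependencyClassifier()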
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?", 
    "Can you give me some examples?"
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```

## Alternative Loading Method

```python
# Load directly using the class method.
# DependencyClassifier must already be defined or imported from this model's source code.
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")
```

## Training Data Format

The model expects training data with columns:
- `query1`: First query/question  
- `query2`: Second query/question
- `label`: 'independent' or 'dependent'
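A small validation pass can catch missing columns or unexpected labels before training. The sketch below assumes the data is stored as CSV, which the card does not state; the helper name and the inline sample rows (taken from the examples in this card) are for illustration only.

```python
import csv
import io

REQUIRED_COLUMNS = {"query1", "query2", "label"}
VALID_LABELS = {"independent", "dependent"}

# Inline sample standing in for a real training file (CSV is an assumption).
SAMPLE_CSV = """query1,query2,label
What is machine learning?,Can you give me examples?,dependent
What is AI?,What's the weather today?,independent
"""


def load_training_rows(handle):
    """Read rows from a CSV handle, checking columns and label values."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        if row["label"] not in VALID_LABELS:
            raise ValueError(f"Line {line_number}: bad label {row['label']!r}")
        rows.append(row)
    return rows


rows = load_training_rows(io.StringIO(SAMPLE_CSV))
print(f"Loaded {len(rows)} labelled query pairs")
```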

## Model Architecture

```python
RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
```
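With `class_weight='balanced'`, scikit-learn weights each class by `n_samples / (n_classes * class_count)`, so the rarer class counts for more during training. The sketch below reproduces that formula in plain Python; the 75/25 label split is a hypothetical example, not the actual distribution of this model's training data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weights following scikit-learn's 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


# Hypothetical imbalanced label set: 75 independent pairs, 25 dependent pairs.
labels = ["independent"] * 75 + ["dependent"] * 25
weights = balanced_class_weights(labels)
print(weights)
```

In this example the minority class receives three times the weight of the majority class, which offsets its lower frequency.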

## Limitations

- Designed for English language queries
- Performance may vary on very short queries (< 3 words)
- Requires NLTK stopwords corpus for optimal performance
- Best suited for conversational question-answering scenarios
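Since the stopwords corpus may be missing on a fresh machine, a loader with a fallback keeps feature extraction from failing outright. This is a sketch under assumptions: the helper name and the small fallback list are hypothetical, and the model's own code may handle a missing corpus differently. The corpus itself can be fetched once with `nltk.download("stopwords")`.

```python
# Tiny fallback list (hypothetical); far smaller than NLTK's English list.
FALLBACK_STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or"}


def load_stopwords():
    """Return NLTK's English stopwords, or a small fallback set if unavailable."""
    try:
        from nltk.corpus import stopwords
        return set(stopwords.words("english"))
    except (ImportError, LookupError):
        # ImportError: NLTK is not installed.
        # LookupError: NLTK is installed but the corpus has not been downloaded.
        return set(FALLBACK_STOPWORDS)


stop_words = load_stopwords()
content_words = [
    word for word in "what is the weather today".split()
    if word not in stop_words
]
print(content_words)
```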

## Technical Details

- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based; loading a file can execute arbitrary code, so load model files only from sources you trust)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)

## Citation

```bibtex
@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please contact the admin-4minds team.