---
library_name: sklearn
tags:
- text-classification
- dependency-detection
- random-forest
- nlp
- query-dependency
- conversational-ai
pipeline_tag: text-classification
metrics:
- accuracy
- f1
- precision
- recall
---

# Query Dependence Classifier

A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.

## Model Description

- **Model Type:** Random Forest Classifier (scikit-learn)
- **Task:** Binary text classification for query dependency detection
- **Features:** 45 engineered linguistic features
- **Classes:** Independent vs. Dependent queries

## Intended Use

This model is designed for conversational AI systems that need to decide whether a follow-up question requires context from a previous query.

**Examples:**

- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**

## Model Performance

- **Training Features:** 45 engineered features
- **Model Architecture:** Random Forest with 500 estimators
- **Validation:** Out-of-bag (OOB) scoring enabled

## Feature Engineering

The model uses 45 engineered features in three groups:

### Lexical Features

- Word overlap and Jaccard similarity
- N-gram overlap (bigrams, trigrams)
- Semantic similarity with stemming

### Linguistic Features

- Pronoun and reference patterns
- Question type classification
- Discourse markers and connectives
- Dependency phrase detection

### Structural Features

- Length ratios and differences
- Punctuation patterns
- Complexity measures (syllable density)
- Capitalization patterns

## Usage

```python
# Install dependencies
# pip install scikit-learn pandas nltk huggingface-hub joblib

from huggingface_hub import hf_hub_download
import joblib
import json

# NOTE: DependencyClassifier is not defined or imported in this snippet.
# It must be made available from the project's own code before running.

REPO_ID = "admin-4minds/QUERY-DEPENDENCE-MODEL"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.joblib")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.joblib")
config_path = hf_hub_download(repo_id=REPO_ID, filename="config.json")

# Load model components
model = joblib.load(model_path)
label_encoder = joblib.load(encoder_path)
with open(config_path, 'r') as f:
    config = json.load(f)

# Initialize classifier
classifier = DependencyClassifier()
classifier.model = model
classifier.label_encoder = label_encoder
classifier.feature_names = config['feature_names']

# Make predictions
result = classifier.predict(
    "What is artificial intelligence?",
    "Can you give me some examples?"
)

print(f"Prediction: {result['prediction']}")
print(f"Confidence: {result['confidence']:.3f}")
print(f"Probabilities: {result['probabilities']}")
```

## Alternative Loading Method

```python
# Load directly using class method
classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")

# Use for inference
result = classifier.predict("Query 1", "Query 2")
```

## Training Data Format

The model expects training data with the following columns:

- `query1`: First query/question
- `query2`: Second query/question
- `label`: 'independent' or 'dependent'

## Model Architecture

```python
from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=7,
    min_samples_leaf=3,
    max_features='sqrt',
    class_weight='balanced',
    random_state=42
)
```

## Limitations

- Designed for English-language queries
- Performance may vary on very short queries (< 3 words)
- Requires the NLTK stopwords corpus for optimal performance
- Best suited for conversational question-answering scenarios

## Technical Details

- **Framework:** scikit-learn
- **Storage Format:** joblib (pickle-based, so load model files only from sources you trust)
- **Configuration:** JSON metadata
- **Reproducibility:** Fixed random seed (42)

## Citation

```bibtex
@misc{query_dependence_classifier_2025,
  title={Query Dependence Classifier},
  author={Admin-4minds},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
}
```

## License

This model is released under the MIT License.

## Contact

For questions or issues, please contact the admin-4minds team.
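
## Appendix: Lexical Feature Sketch

To make the lexical features listed under Feature Engineering more concrete, the following is a minimal sketch of word-level Jaccard similarity and bigram overlap between two queries. The helper names (`tokenize`, `jaccard_similarity`, `ngrams`, `ngram_overlap`), the regex tokenization, and the choice to normalize overlap by the second query's n-grams are all assumptions made for illustration. They are not taken from the model's actual feature extractor, which may compute these values differently.

```python
import re


def tokenize(text):
    """Lowercase a query and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def jaccard_similarity(query1, query2):
    """Size of the shared word set divided by the size of the combined word set."""
    a, b = set(tokenize(query1)), set(tokenize(query2))
    if not a and not b:
        return 0.0  # avoid division by zero when both queries are empty
    return len(a & b) / len(a | b)


def ngrams(tokens, n):
    """Set of contiguous n-token tuples in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def ngram_overlap(query1, query2, n=2):
    """Fraction of query2's n-grams that also occur in query1 (assumed normalization)."""
    g1 = ngrams(tokenize(query1), n)
    g2 = ngrams(tokenize(query2), n)
    if not g2:
        return 0.0  # query2 is too short to form any n-gram
    return len(g1 & g2) / len(g2)


q1 = "What is machine learning?"
q2 = "Is machine learning hard?"
features = {
    "jaccard": jaccard_similarity(q1, q2),
    "bigram_overlap": ngram_overlap(q1, q2, n=2),
}
print(features)
```

Values like these would form a few entries of the 45-dimensional feature vector passed to the Random Forest. High lexical overlap alone does not imply dependence, which is presumably why the model also uses pronoun, discourse-marker, and structural features.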