Upload folder using huggingface_hub

Files changed (4)

README.md +166 -0
config.json +69 -0
label_encoder.joblib +3 -0
model.joblib +3 -0

README.md ADDED

	@@ -0,0 +1,166 @@

+---
+library_name: sklearn
+tags:
+- text-classification
+- dependency-detection
+- random-forest
+- nlp
+- query-dependency
+- conversational-ai
+pipeline_tag: text-classification
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+---
+# Query Dependence Classifier
+A Random Forest model that determines whether a second query depends on the context of a first query in conversational AI systems.
+## Model Description
+- **Model Type:** Random Forest Classifier (scikit-learn)
+- **Task:** Binary text classification for query dependency detection
+- **Features:** 45 engineered linguistic features
+- **Classes:** Independent vs Dependent queries
+## Intended Use
+This model is designed for conversational AI systems to determine if a follow-up question requires context from a previous query.
+**Examples:**
+- Query 1: "What is machine learning?" Query 2: "Can you give me examples?" → **Dependent**
+- Query 1: "What is AI?" Query 2: "What's the weather today?" → **Independent**
+## Model Performance
+- **Training Features:** 45 engineered features
+- **Model Architecture:** Random Forest with 500 estimators
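+- **Validation:** Out-of-bag (OOB) scoring enabled; this estimates generalization from the samples each tree did not see during bootstrapping and is not k-fold cross-validation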
+## Feature Engineering
+The model uses 45 engineered features, including:
+### Lexical Features
+- Word overlap and Jaccard similarity
+- N-gram overlap (bigrams, trigrams)
+- Semantic similarity with stemming
+### Linguistic Features
+- Pronoun and reference patterns
+- Question type classification
+- Discourse markers and connectives
+- Dependency phrases detection
+### Structural Features
+- Length ratios and differences
+- Punctuation patterns
+- Complexity measures (syllable density)
+- Capitalization patterns
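The feature-extraction code is not part of this upload, so the following is only a minimal sketch of how a few of the lexical features listed above could be computed. The feature names come from `config.json`; the tokenizer, the helper names (`tokenize`, `ngrams`, `lexical_features`) and the exact definitions (for example, an absolute rather than signed `length_diff`) are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def lexical_features(query1, query2):
    """Compute a handful of overlap features for a query pair (illustrative definitions)."""
    t1, t2 = tokenize(query1), tokenize(query2)
    s1, s2 = set(t1), set(t2)
    common = s1 & s2
    union = s1 | s2
    return {
        "q1_length": len(t1),
        "q2_length": len(t2),
        "length_diff": abs(len(t1) - len(t2)),
        "common_words": len(common),
        # Jaccard similarity: shared vocabulary divided by combined vocabulary.
        "jaccard_similarity": len(common) / len(union) if union else 0.0,
        "bigram_overlap": len(ngrams(t1, 2) & ngrams(t2, 2)),
    }


features = lexical_features("What is machine learning?", "Is machine learning hard?")
print(features)
```

The real model may weight, normalize, or define these features differently; the sketch only shows the kind of pairwise signal the feature names suggest.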
+## Usage
+```python
+# Install dependencies
+# pip install scikit-learn pandas nltk huggingface-hub joblib
+from huggingface_hub import hf_hub_download
+import joblib
+import json
+# Download model files
+model_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="model.joblib")
+encoder_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="label_encoder.joblib")
+config_path = hf_hub_download(repo_id="admin-4minds/QUERY-DEPENDENCE-MODEL", filename="config.json")
+# Load model components
+model = joblib.load(model_path)
+label_encoder = joblib.load(encoder_path)
+with open(config_path, 'r') as f:
+    config = json.load(f)
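+# Initialize classifier
+# NOTE: DependencyClassifier is not defined in this snippet and is not among the
+# files in this upload; import it from the code that trained the model first.
+classifier = DependencyClassifier()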
+classifier.model = model
+classifier.label_encoder = label_encoder
+classifier.feature_names = config['feature_names']
+# Make predictions
+result = classifier.predict(
+    "What is artificial intelligence?",
+    "Can you give me some examples?"
+)
+print(f"Prediction: {result['prediction']}")
+print(f"Confidence: {result['confidence']:.3f}")
+print(f"Probabilities: {result['probabilities']}")
+```
+## Alternative Loading Method
+```python
+# Load directly using the class method (requires the same DependencyClassifier class)
+classifier = DependencyClassifier.load_from_huggingface_hub("admin-4minds/QUERY-DEPENDENCE-MODEL")
+# Use for inference
+result = classifier.predict("Query 1", "Query 2")
+```
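Both snippets above rely on a `DependencyClassifier` class that this upload does not include. As a rough sketch of the interface they imply (attributes `model`, `label_encoder`, `feature_names`, and a `predict` that returns `prediction`, `confidence`, and `probabilities`), a hypothetical stand-in could look like the following. The class name `DependencyClassifierSketch`, the injected `extract_features` callable, and the stub objects are all assumptions; it also assumes that the columns of `predict_proba` follow the order of `label_encoder.classes_`.

```python
class DependencyClassifierSketch:
    """Hypothetical stand-in; NOT the DependencyClassifier used to train the model."""

    def __init__(self, extract_features):
        # extract_features(query1, query2) -> dict of feature name to value (assumed).
        self.extract_features = extract_features
        self.model = None
        self.label_encoder = None
        self.feature_names = []

    def predict(self, query1, query2):
        features = self.extract_features(query1, query2)
        # Order the values exactly as listed in feature_names before calling the model.
        row = [[float(features[name]) for name in self.feature_names]]
        probs = self.model.predict_proba(row)[0]
        classes = list(self.label_encoder.classes_)
        best = max(range(len(classes)), key=lambda i: probs[i])
        return {
            "prediction": classes[best],
            "confidence": float(probs[best]),
            "probabilities": {c: float(p) for c, p in zip(classes, probs)},
        }


class _StubModel:
    """Stub with a fixed probability output, used only to exercise the sketch."""

    def predict_proba(self, rows):
        return [[0.8, 0.2] for _ in rows]


class _StubEncoder:
    """Stub exposing classes_ in the same order as config.json's label_classes."""

    classes_ = ["dependent", "independent"]


clf = DependencyClassifierSketch(
    lambda q1, q2: {"q1_length": len(q1.split()), "q2_length": len(q2.split())}
)
clf.model = _StubModel()
clf.label_encoder = _StubEncoder()
clf.feature_names = ["q1_length", "q2_length"]
result = clf.predict("What is AI?", "Can you give me examples?")
print(result["prediction"], result["confidence"])
```

With the real artifacts, `model` and `label_encoder` would be the objects loaded from `model.joblib` and `label_encoder.joblib`, and the extractor would have to produce all 45 features named in `config.json`.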
+## Training Data Format
+The model expects training data with columns:
+- `query1`: First query/question
+- `query2`: Second query/question
+- `label`: 'independent' or 'dependent'
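As a small illustration of the column layout above, the sketch below reads a CSV and checks that each row has the three columns and a recognized label. The helper name `load_training_rows` is hypothetical, and lowercasing the label before comparison is an assumption; the training pipeline itself is not part of this upload.

```python
import csv
import io

REQUIRED_COLUMNS = ("query1", "query2", "label")
ALLOWED_LABELS = {"independent", "dependent"}


def load_training_rows(csv_file):
    """Read (query1, query2, label) tuples, raising ValueError on malformed input."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        label = (row["label"] or "").strip().lower()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"line {line_no}: unexpected label {row['label']!r}")
        rows.append((row["query1"], row["query2"], label))
    return rows


sample = io.StringIO(
    "query1,query2,label\n"
    "What is machine learning?,Can you give me examples?,dependent\n"
    "What is AI?,What's the weather today?,independent\n"
)
rows = load_training_rows(sample)
print(rows)
```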
+## Model Architecture
+```python
+RandomForestClassifier(
+    n_estimators=500,
+    max_depth=15,
+    min_samples_split=7,
+    min_samples_leaf=3,
+    max_features='sqrt',
+    class_weight='balanced',
+    random_state=42
+)
+```
+## Limitations
+- Designed for English language queries
+- Performance may vary on very short queries (< 3 words)
+- Requires NLTK stopwords corpus for optimal performance
+- Best suited for conversational question-answering scenarios
+## Technical Details
+- **Framework:** scikit-learn
+- **Storage Format:** joblib (pickle-based; loading a file can execute arbitrary code, so only load files from sources you trust)
+- **Configuration:** JSON metadata
+- **Reproducibility:** Fixed random seed (42)
+## Citation
+```bibtex
+@misc{query_dependence_classifier_2025,
+  title={Query Dependence Classifier},
+  author={Admin-4minds},
+  year={2025},
+  publisher={Hugging Face},
+  url={https://huggingface.co/admin-4minds/QUERY-DEPENDENCE-MODEL}
+}
+```
+## License
+This model is released under the MIT License.
+## Contact
+For questions or issues, please contact the admin-4minds team.

config.json ADDED

	@@ -0,0 +1,69 @@

+{
+  "model_type": "RandomForestClassifier",
+  "library": "sklearn",
+  "task": "text-classification",
+  "subtask": "query-dependency-detection",
+  "feature_names": [
+    "q1_length",
+    "q2_length",
+    "length_diff",
+    "length_ratio",
+    "q1_char_length",
+    "q2_char_length",
+    "char_length_ratio",
+    "common_words",
+    "jaccard_similarity",
+    "word_overlap_ratio",
+    "stem_overlap",
+    "bigram_overlap",
+    "trigram_overlap",
+    "pronoun_count",
+    "reference_count",
+    "connective_count",
+    "early_pronoun_count",
+    "early_reference_count",
+    "early_connective_count",
+    "dependency_phrase_count",
+    "has_dependency_phrase",
+    "semantic_similarity",
+    "entity_overlap",
+    "q1_exclamation",
+    "q2_exclamation",
+    "q1_comma_count",
+    "q2_comma_count",
+    "q1_avg_word_length",
+    "q2_avg_word_length",
+    "complexity_diff",
+    "q1_syllable_density",
+    "q2_syllable_density",
+    "continuation_markers",
+    "contrast_markers",
+    "causation_markers",
+    "exemplification_markers",
+    "elaboration_markers",
+    "repeated_words_q2",
+    "max_word_repetition",
+    "q1_caps_words",
+    "q2_caps_words",
+    "spatial_references",
+    "temporal_references",
+    "comparative_references",
+    "quantitative_references"
+  ],
+  "label_classes": [
+    "dependent",
+    "independent"
+  ],
+  "num_features": 45,
+  "model_params": {
+    "n_estimators": 500,
+    "max_depth": 15,
+    "min_samples_split": 7,
+    "min_samples_leaf": 3,
+    "max_features": "sqrt",
+    "random_state": 42,
+    "class_weight": "balanced"
+  },
+  "created_at": "2025-07-25T18:08:02.989967",
+  "version": "1.0.0"
+}
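Because a scikit-learn model consumes features by position, the order of `feature_names` in this config matters at inference time. The following sketch, under the assumption that features are first computed as a name-to-value dict, checks the config for internal consistency and builds a vector in the configured order. The helper names `load_feature_order` and `to_vector` are hypothetical, and the three-feature config is a shortened example, not the real file.

```python
import json


def load_feature_order(config_text):
    """Return feature_names after checking them against num_features."""
    config = json.loads(config_text)
    names = config["feature_names"]
    if len(names) != config["num_features"]:
        raise ValueError(
            f"num_features is {config['num_features']} but {len(names)} names are listed"
        )
    if len(set(names)) != len(names):
        raise ValueError("feature_names contains duplicates")
    return names


def to_vector(features, names):
    """Arrange a name-to-value dict into a list ordered like feature_names."""
    missing = [n for n in names if n not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[n]) for n in names]


# Shortened stand-in for config.json, for illustration only.
example_config = json.dumps(
    {"feature_names": ["q1_length", "q2_length", "length_diff"], "num_features": 3}
)
names = load_feature_order(example_config)
vector = to_vector({"length_diff": 1, "q1_length": 4, "q2_length": 5}, names)
print(vector)
```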

label_encoder.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:cbd1fdf15974b88c06e16e9ce0f0393d2b6c2a0ce2fde186873a995196e9b0bd
+size 498

model.joblib ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:972ae33af6194b34c8be0551bd5526b20801bad8bd5827fc9e1d40c24411ef2a
+size 4446838
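The two `.joblib` entries above are Git LFS pointer files: each records the SHA-256 digest (`oid`) and byte size of the real artifact. Since joblib files should only be loaded when trusted, one reasonable precaution is to compare a downloaded file against its pointer before loading it. The sketch below shows that check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the payload is a made-up byte string rather than the real model.

```python
import hashlib


def parse_lfs_pointer(pointer_text):
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in pointer_text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    if pointer["algo"] != "sha256":
        raise ValueError(f"unsupported hash algorithm: {pointer['algo']}")
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Made-up payload and a pointer generated for it, for illustration only.
payload = b"example bytes"
pointer_text = (
    "version https://git-lfs.github.com/spec/v1\n"
    f"oid sha256:{hashlib.sha256(payload).hexdigest()}\n"
    f"size {len(payload)}\n"
)
pointer = parse_lfs_pointer(pointer_text)
print(matches_pointer(payload, pointer))
```

For the real artifacts, `data` would be the bytes of the file returned by `hf_hub_download`, and `pointer_text` the three pointer lines shown in this diff.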