---
datasets:
- Xerv-AI/netuark-posts-6000
---

# NetuArk Posts Classifier (Ensemble Architecture)

This model is a novel ensemble classifier that categorizes technology-related social media posts by their news source. It is trained to classify posts from the following sources:

- ArsTechnica
- FT
- GuardianTech
- HackerNews
- Slashdot
- TechCrunch
- TheVerge

## Model Details

- **Architecture:** Voting Classifier (Multinomial Naive Bayes + Logistic Regression)
- **Vectorization:** TF-IDF (n-gram range 1-3)
- **Accuracy:** 99.81% on the NetuArk-6000 dataset.
- **Classes:** HackerNews, TechCrunch, TheVerge, FT, GuardianTech, Slashdot, ArsTechnica.

## Training Data

Trained on the [Xerv-AI/netuark-posts-6000](https://huggingface.co/datasets/Xerv-AI/netuark-posts-6000) dataset.

## Usage

The pickled model references a custom function, `advanced_clean`, which the unpickler must be able to find on `__main__` before `joblib.load` is called. The snippet below registers a pass-through stub under that name and then downloads and loads the model.

```python
import traceback

import __main__
import joblib
from huggingface_hub import hf_hub_download


# The pickled model references a custom `advanced_clean` function that must
# exist at load time. This pass-through stub satisfies the unpickler; it may
# not reproduce whatever cleaning was applied during training.
def advanced_clean(text):
    return text


# Register the function on __main__ so joblib can resolve it during loading
# (needed when this code does not itself run as the __main__ module).
__main__.advanced_clean = advanced_clean

# Repository and filename on the Hugging Face Hub
repo_id = "Phase-Technologies/netuark-classifier-ensemble"
filename = "netuark_ensemble_classifier.joblib"

try:
    # Download the model file (cached locally after the first call)
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)

    # Load the fitted model
    model = joblib.load(file_path)

    # Pass a list of posts, even when classifying a single one
    prediction = model.predict(
        ["📰 Perplexity's 'Personal Computer' Lets AI Agents Access Your Local Files #slashdot"]
    )
    print(f"Prediction: {prediction}")
except Exception as e:
    print(f"An error occurred: {e}")
    traceback.print_exc()
```
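The Model Details describe a TF-IDF vectorizer (n-grams 1-3) feeding a voting ensemble of Multinomial Naive Bayes and Logistic Regression. For illustration only, a pipeline of that shape can be assembled in scikit-learn roughly as follows. This is a sketch under stated assumptions, not the published training code: soft voting, the hyperparameters, the `build_ensemble` helper, and the toy posts are all hypothetical, and the custom `advanced_clean` step is omitted.

```python
# A minimal sketch, NOT the published model: TF-IDF (1-3 grams) feeding a
# voting ensemble of Multinomial Naive Bayes and Logistic Regression.
# Assumptions: soft voting, default-ish hyperparameters, toy training posts.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline


def build_ensemble() -> Pipeline:
    """Hypothetical helper: TF-IDF + (MultinomialNB, LogisticRegression) voting."""
    ensemble = VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft",  # assumption: average the two models' class probabilities
    )
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),  # unigrams to trigrams
        ("clf", ensemble),
    ])


# Hypothetical toy posts; the real model was trained on the NetuArk dataset.
texts = [
    "Show HN: a tiny Rust web server #hackernews",
    "Ask HN: how do you back up your homelab? #hackernews",
    "Startup raises $20M Series A for AI agents #techcrunch",
    "Fintech startup announces seed round #techcrunch",
]
labels = ["HackerNews", "HackerNews", "TechCrunch", "TechCrunch"]

model = build_ensemble()
model.fit(texts, labels)

prediction = model.predict(["Show HN: a new database written in Go #hackernews"])
print(prediction)
```

On the design choice: with `voting="hard"`, `VotingClassifier` takes a majority vote over the predicted labels and does not expose `predict_proba`, while soft voting averages the predicted probabilities of the two models. Which mode the published model uses is not stated in this card.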