CIS5190FinalProj/RandomForest

Dada80 committed on Dec 16, 2024

Commit 1451ade (verified) · 1 Parent(s): 6f0d730

Update README.md

Files changed (1)

README.md +45 -0

	@@ -33,3 +33,48 @@ This model classifies news headlines as either NBC or Fox News.
 - Accuracy Score
+### Model Evaluation
+```python
+import pandas as pd
+import joblib
+from huggingface_hub import hf_hub_download
+from sklearn.feature_extraction.text import TfidfVectorizer
+from sklearn.metrics import classification_report
+# Mount to drive
+from google.colab import drive
+drive.mount('/content/drive')
+# Load test set
+test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv", encoding="Windows-1252")
+# Log in to Hugging Face; paste your own access token at the prompt (do not hard-code it)
+!huggingface-cli login
+# Download the model file; hf_hub_download returns the local path to the cached file
+model_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="best_rf_model.pkl")
+# Download the vectorizer file
+vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="tfidf_vectorizer.pkl")
+# Load the trained model from its local path
+pipeline = joblib.load(model_path)
+# Load the fitted TF-IDF vectorizer from its local path
+tfidf_vectorizer = joblib.load(vectorizer_path)
+# Extract the headlines from the test set
+X_test = test_df['title']
+# Transform the headlines into TF-IDF features with the fitted vectorizer
+X_test_transformed = tfidf_vectorizer.transform(X_test)
+# Make predictions with the loaded model
+y_pred = pipeline.predict(X_test_transformed)
+# Extract the 'label' column as the target
+y_test = test_df['label']
+# Print classification report
+print(classification_report(y_test, y_pred))