	---
	# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
	# Doc / guide: https://huggingface.co/docs/hub/model-cards
	{}
	---

	# Model Card for CIS5190FinalProj/RandomForest


	## Model Details
	This model classifies news headlines by outlet, predicting whether each headline comes from NBC News or Fox News.

	### Model Description

	<!-- Provide a longer summary of what this model is. -->



	- Developed by: Jack Bader, Kaiyuan Wang, Pairan Xu
	- Task: Binary classification (NBC News vs. Fox News)
	- Preprocessing: TF-IDF vectorization applied to the text data
	  - `stop_words = "english"`
	  - `max_features = 1000`
	- Model type: Random Forest
	- Framework: Scikit-learn

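The preprocessing and model settings listed above can be sketched in scikit-learn roughly as follows. This is a minimal illustration, not the training script used for this model: the toy headlines, the label strings `"NBC"` and `"FoxNews"`, and the default forest hyperparameters are all assumptions, since the card does not state them.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy headlines and label strings, for illustration only.
headlines = [
    "Senate passes budget bill after long debate",
    "Storm brings heavy rain to the coast",
    "Markets rally as inflation cools",
    "Local team wins championship game",
]
labels = ["NBC", "FoxNews", "NBC", "FoxNews"]

# Vectorizer settings taken from this card: English stop words removed,
# vocabulary capped at 1000 features.
vectorizer = TfidfVectorizer(stop_words="english", max_features=1000)
X = vectorizer.fit_transform(headlines)

# Assumption: default hyperparameters, because the tuned values are not listed.
# random_state is fixed only to make this sketch reproducible.
clf = RandomForestClassifier(random_state=0)
clf.fit(X, labels)

# New headlines must go through the same fitted vectorizer before prediction.
new_features = vectorizer.transform(["Senate debate over budget continues"])
predictions = clf.predict(new_features)
print(predictions)
```

The same fitted vectorizer has to be reused at prediction time, which is why the repository ships it alongside the model file.
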
	#### Metrics

	<!-- These are the evaluation metrics being used, ideally with a description of why. -->

	- Accuracy score: the fraction of test headlines assigned to the correct outlet
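
As a small worked example of the metric, accuracy is the number of correct predictions divided by the total number of predictions. The label strings and values below are hypothetical and are not results from this model.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true and predicted labels, for illustration only.
y_true = ["NBC", "FoxNews", "NBC", "FoxNews", "NBC"]
y_pred = ["NBC", "FoxNews", "FoxNews", "FoxNews", "NBC"]

# 4 of the 5 predictions match the true label, so accuracy is 4 / 5.
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy:.2f}")  # prints "Accuracy: 0.80"
```
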

	### Model Evaluation
	```python
	import joblib
	import pandas as pd
	from google.colab import drive
	from huggingface_hub import hf_hub_download
	from sklearn.metrics import classification_report

	# Mount Google Drive (this script assumes a Google Colab notebook)
	drive.mount('/content/drive')

	# Load the test set
	test_df = pd.read_csv("/content/drive/MyDrive/test_data_random_subset.csv", encoding="Windows-1252")

	# Log in with a Hugging Face token (notebook shell command, not plain Python)
	!huggingface-cli login

	# Download the model and vectorizer files from the Hub
	model_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="best_rf_model.pkl")
	vectorizer_path = hf_hub_download(repo_id="CIS5190FinalProj/RandomForest", filename="tfidf_vectorizer.pkl")

	# Load the trained random forest and the fitted TF-IDF vectorizer
	model = joblib.load(model_path)
	tfidf_vectorizer = joblib.load(vectorizer_path)

	# Extract the headlines and the true labels from the test set
	X_test = test_df['title']
	y_test = test_df['label']

	# Transform the headlines into TF-IDF features
	X_test_transformed = tfidf_vectorizer.transform(X_test)

	# Predict the outlet for each headline
	y_pred = model.predict(X_test_transformed)

	# Print the classification report
	print(classification_report(y_test, y_pred))
	```