Update README.md
README.md
CHANGED
@@ -22,15 +22,42 @@ It achieves the following results on the evaluation set:
## Model description

User
I'm writing a model card for a machine learning model that I uploaded online. I wrote a rough description, but can you fix my explanation, make it as correct and clear as possible, and spread the descriptions out under the three titles mentioned below:

## Model description

I needed a topic classifier, fine-tuned on subreddit descriptions, to determine whether a subreddit has to do with specific technology companies/stocks.

## Intended uses & limitations

More information needed
A binary topic classifier for my project.

## Training and evaluation data

I labeled about 1,000 descriptions of subreddits that I extracted using the Reddit API. My project concerns the "Magnificent Seven" companies (Apple, Microsoft, Alphabet, Amazon, Nvidia, Tesla, and Meta), so I extracted subreddits based on keyword searches for those companies. The searches returned enough subreddits that were not technology related (technology finance, stocks, crypto) in the sense my project needs that labeling the results yielded a relatively even distribution of subreddits labeled 1 (technology related) and 0 (not technology related). I therefore felt confident that 1,000 data points were enough to fine-tune the model for my purposes. Because I am going to have a large sample size when evaluating data from a spread of subreddits, I believe the error will get evened out by the large number of samples I'm using.

ChatGPT
Model Description

I developed a topic classifier with the specific purpose of discerning whether a given subreddit is associated with particular technology companies or stocks. The model focuses on identifying subreddits related to the "Magnificent Seven Companies," namely Apple, Microsoft, Alphabet, Amazon, Nvidia, Tesla, and Meta.

## Intended uses & limitations

The primary function of this model is to serve as a binary topic classifier for a project centered on technology companies and stocks. More detailed information about specific use cases and limitations is still needed to judge its applicability and constraints.

## Training and evaluation data

To train the model, I curated a dataset comprising approximately 1,000 subreddit descriptions obtained through the Reddit API. The extraction process involved keyword searches related to the "Magnificent Seven Companies." Subsequently, I manually labeled these subreddits, assigning a label of 1 for those directly related to technology and 0 for those unrelated.

Given the diverse nature of the extracted subreddits, which included technology, finance, stocks, and crypto, the labeled data aimed to achieve a balanced distribution of relevant and non-relevant subreddits for the project's objectives. The resulting dataset provided a foundation for fine-tuning the model, and with 1,000 data points, I am confident in its efficacy for the intended purpose.

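
The balance described above can be checked directly once the labels exist. The following is only a minimal sketch, assuming the labeled data is held as a list of `(description, label)` pairs; the sample rows and the helper name `label_balance` are hypothetical and do not come from the original project.

```python
from collections import Counter


def label_balance(rows):
    """Return the fraction of examples carrying each label."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical labeled rows (illustrative only, not the real dataset):
# 1 = technology related, 0 = not technology related.
rows = [
    ("Discussion of NVDA earnings and GPU supply", 1),
    ("Apple orchard growers sharing harvest tips", 0),
    ("Tesla stock news and analysis", 1),
    ("Amazon rainforest conservation", 0),
]

balance = label_balance(rows)
print(balance)
```

A result close to an even split between the two labels would support the "relatively even distribution" described in the card; a strongly skewed result would suggest collecting more examples of the rarer class.
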
Considering the anticipated large sample size during the evaluation phase across a variety of subreddits, any potential errors in classification are expected to be mitigated by the sheer volume of data points. This approach is designed to ensure the model's robustness and generalizability across a broad spectrum of subreddit descriptions.

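
The expectation that errors even out applies to random sampling noise, which shrinks as the sample grows. A systematic misclassification rate does not shrink the same way, but it can be corrected for if the classifier's true-positive and false-positive rates are known. The sketch below is illustrative only: the rates and proportions are assumed values, not measurements from this model, and the function names are hypothetical.

```python
import math


def standard_error(p, n):
    """Standard error of a proportion p estimated from n samples."""
    return math.sqrt(p * (1 - p) / n)


def expected_observed_rate(true_rate, tpr, fpr):
    """Share of items the classifier is expected to label 1."""
    return true_rate * tpr + (1 - true_rate) * fpr


def corrected_rate(observed_rate, tpr, fpr):
    """Invert the formula above to estimate the true share of label 1."""
    return (observed_rate - fpr) / (tpr - fpr)


# Sampling noise shrinks with sample size.
se_small = standard_error(0.5, 100)
se_large = standard_error(0.5, 10_000)

# Assumed rates (illustrative): 30% truly technology related,
# 90% true-positive rate, 10% false-positive rate.
observed = expected_observed_rate(0.30, tpr=0.90, fpr=0.10)
recovered = corrected_rate(observed, tpr=0.90, fpr=0.10)

print(se_small, se_large, observed, recovered)
```

Under these assumed rates, the observed share differs from the true share no matter how many subreddits are scored, which is why estimating the two error rates on a held-out labeled set is a useful complement to a large sample.
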
## Training procedure