gulnuravci committed on
Commit
e15de18
·
verified ·
1 Parent(s): 392aa0f

Update README.md

Files changed (1)
  1. README.md +29 -2
README.md CHANGED
@@ -22,15 +22,42 @@ It achieves the following results on the evaluation set:
22
 
23
  ## Model description
24
 
25
- More information needed
26
 
27
  ## Intended uses & limitations
28
 
29
  More information needed
30
 
31
  ## Training and evaluation data
32
 
33
- More information needed
34
 
35
  ## Training procedure
36
 
 
22
 
23
  ## Model description
24
 
25
+ User
26
+ I'm writing a model card for a machine learning model that I uploaded online. I wrote a rough description, but can you fix my explanation, make it as correct and clear as possible, and spread the descriptions out under the three titles mentioned below: ## Model description
27
+
28
+ I needed a topic classifier, fine-tuned on subreddit descriptions, to determine whether a subreddit relates to specific technology companies or stocks.
29
 
30
  ## Intended uses & limitations
31
 
32
  More information needed
33
+ A binary topic classifier for my project.
34
+ ## Training and evaluation data
35
+
36
+ I labeled about 1,000 descriptions of subreddits that I extracted using the Reddit API.
37
+ My project concerns the "Magnificent Seven" companies (Apple, Microsoft, Alphabet, Amazon, Nvidia, Tesla, and Meta), so
38
+ I extracted subreddits based on keyword searches for those companies. The results included enough subreddits that were not
39
+ technology related (technology, finance, stock, crypto) in the sense that I needed for my project that labeling
40
+ the subreddits from these company keyword searches yielded a relatively even distribution of subreddits labeled
41
+ as 1 (technology related) and 0 (not technology related). I therefore felt confident that 1,000 data points were
42
+ enough to fine-tune the model for my purposes. Because I am going to have a large sample size when evaluating data from
43
+ a spread of subreddits, I believe the error will be evened out by the large number of samples I'm using.
44
+
45
+ ChatGPT
46
+ Model Description
47
+
48
+ I developed a topic classifier with the specific purpose of discerning whether a given subreddit is associated with particular technology companies or stocks. The model focuses on identifying subreddits related to the "Magnificent Seven Companies," namely Apple, Microsoft, Alphabet, Amazon, Nvidia, Tesla, and Meta.
49
+
50
+ ## Intended uses & limitations
51
+
52
+ The primary function of this model is to serve as a binary topic classifier for a project centered around technology companies and stocks. However, it's important to note that more detailed information about the specific use cases and limitations is needed for a comprehensive understanding of its applicability and potential constraints.
53
 
54
  ## Training and evaluation data
55
 
56
+ To train the model, I curated a dataset comprising approximately 1,000 subreddit descriptions obtained through the Reddit API. The extraction process involved keyword searches related to the "Magnificent Seven Companies." Subsequently, I manually labeled these subreddits, assigning a label of 1 for those directly related to technology and 0 for those unrelated.
57
+
58
+ The keyword searches returned a diverse set of subreddits, and only those about technology, finance, stocks, or crypto counted as relevant. Labeling them therefore produced a roughly balanced distribution of relevant (1) and non-relevant (0) subreddits. I consider the roughly 1,000 labeled data points sufficient for fine-tuning the model for the intended purpose.
59
+
60
+ The model will be applied to a large sample drawn from a variety of subreddits, so individual classification errors are expected to even out in aggregate across the many data points. This is an expectation based on the sample size, not a measured result.
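
The label balance described above can be checked with a few lines of Python. The following is a minimal sketch, not the code used for this model: the `rows` list is hypothetical stand-in data (the real labeled descriptions are not included here), the helper names `label_balance` and `stratified_split` are illustrative, and the 20% evaluation fraction is an assumption, since the card does not state how the data was split.

```python
import random
from collections import Counter


def label_balance(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


def stratified_split(rows, eval_fraction=0.2, seed=0):
    """Split (text, label) rows so each label keeps about the same share in both parts."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)

    train, evaluation = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = int(round(len(group) * eval_fraction))
        evaluation.extend(group[:cut])
        train.extend(group[cut:])
    return train, evaluation


# Hypothetical stand-in for the ~1,000 labeled subreddit descriptions
# (1 = technology related, 0 = not technology related).
rows = [(f"subreddit description {i}", i % 2) for i in range(1000)]

train, evaluation = stratified_split(rows)
print(label_balance([label for _, label in rows]))
print(len(train), len(evaluation))
```

Splitting per label keeps the evaluation set as balanced as the full dataset, which makes accuracy on it easier to interpret when the classes are close to even.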
61
 
62
  ## Training procedure
63