SayedShaun committed on
Commit bbb02ee · verified · 1 Parent(s): 2f7f8ea

Update README.md

Files changed (1)
  1. README.md +15 -15
README.md CHANGED
@@ -2,42 +2,42 @@
  language: en
  license: apache-2.0
  tags:
  - url-classification
  - text-classification
- - distilbert
- - web-mining
  - nlp
  - seo
- - crawler
  datasets:
  - ruggsea/infini-news-corpus
- metrics:
- - accuracy
  pipeline_tag: text-classification
  ---

- # 🌐 URL Content vs Section Classifier (DistilBERT)

- This model classifies a **web URL** into one of two structural categories:

- - **content** → A specific article, blog post, or news story page
  - **section** → A category page, listing page, or homepage/navigation page

- It is designed for **web crawling, content extraction, and large-scale URL filtering**.

  ---

- # 🚀 Model Overview

- This model is a fine-tuned version of:

  👉 **:contentReference[oaicite:0]{index=0}**

- It learns patterns in URL structure rather than natural language sentences.

  ---

- ## 🧠 Problem Type

- ### Input
- A single URL string:
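Both versions of the README describe the same binary decision over a raw URL string: **content** or **section**. For crawling and URL filtering, that decision is typically used to route URLs. Below is a minimal sketch of such routing; it is not taken from the repository. It assumes the classifier returns the literal label strings `content` and `section`, though the model config may expose different `id2label` names. The `route_urls` helper and the toy classifier are hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Assumed label strings, taken from the README's category names.
# The real model config may use different names (e.g. LABEL_0 / LABEL_1).
CONTENT = "content"
SECTION = "section"


def route_urls(
    urls: Iterable[str], classify: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Hypothetical helper: split URLs into content pages (to extract)
    and section pages (to expand further in a crawl).

    `classify` is any callable that maps a raw URL string to a label string.
    """
    routed: Dict[str, List[str]] = {CONTENT: [], SECTION: []}
    for url in urls:
        label = classify(url)
        if label not in routed:
            # Fail loudly rather than silently dropping URLs with unknown labels.
            raise ValueError(f"unexpected label {label!r} for {url!r}")
        routed[label].append(url)
    return routed


if __name__ == "__main__":
    # Toy stand-in classifier for illustration only (NOT the fine-tuned model):
    # it treats URLs ending in "/" as section pages.
    def toy_classify(url: str) -> str:
        return SECTION if url.endswith("/") else CONTENT

    urls = [
        "https://example.com/news/",
        "https://example.com/news/2024/05/some-story",
    ]
    print(route_urls(urls, toy_classify))
```

To use the actual model, you could wrap a Hugging Face `transformers` text-classification pipeline, e.g. `clf = pipeline("text-classification", model="<model-id>")` and `route_urls(urls, lambda u: clf(u)[0]["label"])`. Here `<model-id>` is a placeholder, because the repository ID does not appear in this diff. Passing the classifier as a callable keeps the routing logic testable without downloading any weights.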
 
  language: en
  license: apache-2.0
  tags:
+ - distilbert
+ - fine-tuned
  - url-classification
  - text-classification
  - nlp
+ - web-mining
  - seo
  datasets:
  - ruggsea/infini-news-corpus
  pipeline_tag: text-classification
  ---

+ # 🌐 URL Content vs Section Classifier (Fine-tuned DistilBERT)

+ This model is a **fine-tuned version of DistilBERT** designed to classify web URLs into two categories:

+ - **content** → A specific page such as an article, blog post, or news story
  - **section** → A category page, listing page, or homepage/navigation page

+ It is optimized for **web crawling, URL filtering, and content extraction pipelines**.

  ---

+ # 🧠 Model Description

+ This model is fine-tuned from:

  👉 **:contentReference[oaicite:0]{index=0}**

+ DistilBERT is a compressed version of BERT that retains strong language understanding capabilities while being lightweight and fast.
+
+ In this project, it is adapted to learn **structural patterns in URLs** rather than natural language sentences.

  ---

+ # 🚀 Task Definition

+ ## Input
+ A raw URL string: