SayedShaun committed on
Commit
2f7f8ea
·
verified ·
1 Parent(s): 45a137b

Create README.md

Files changed (1)
  1. README.md +43 -0
README.md ADDED
@@ -0,0 +1,43 @@
+ ---
+ language: en
+ license: apache-2.0
+ tags:
+ - url-classification
+ - text-classification
+ - distilbert
+ - web-mining
+ - nlp
+ - seo
+ - crawler
+ datasets:
+ - ruggsea/infini-news-corpus
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ ---
+
+ # 🌐 URL Content vs Section Classifier (DistilBERT)
+
+ This model classifies a **web URL** into one of two structural categories:
+
+ - **content** → A specific article, blog post, or news story page
+ - **section** → A category page, listing page, or homepage/navigation page
+
+ It is designed for **web crawling, content extraction, and large-scale URL filtering**.
+
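As a rough illustration of the URL-filtering use case described above, the sketch below keeps only the URLs labeled `content`. It rests on assumptions: `toy_classifier` is a hypothetical stand-in heuristic, not the model's logic, and `keep_content_urls` is an illustrative helper name. In practice the `classify` argument would presumably wrap a call into the fine-tuned model (for instance through a text-classification pipeline); the model's repository ID is not given in this section, so it is left out here.

```python
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def toy_classifier(url: str) -> str:
    """Hypothetical stand-in for the model (NOT the model's actual logic).

    Returns "content" when the last path segment contains a hyphen
    (a crude proxy for an article slug), otherwise "section".
    """
    path = urlparse(url).path.strip("/")
    if not path:
        return "section"  # bare homepage
    last_segment = path.split("/")[-1]
    return "content" if "-" in last_segment else "section"


def keep_content_urls(
    urls: Iterable[str], classify: Callable[[str], str]
) -> List[str]:
    """Keep only the URLs that the given classifier labels "content"."""
    return [url for url in urls if classify(url) == "content"]


if __name__ == "__main__":
    crawl_frontier = [
        "https://example.com/",
        "https://example.com/world/",
        "https://example.com/world/2024/05/some-article-title",
    ]
    for url in keep_content_urls(crawl_frontier, toy_classifier):
        print(url)
```

Passing the classifier in as a callable keeps the filtering step independent of how the labels are produced, so the heuristic can be swapped for real model predictions without changing the crawler-side code.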
+ ---
+
+ # 🚀 Model Overview
+
+ This model is a fine-tuned version of:
+
+ 👉 **DistilBERT**
+
+ It learns patterns in URL structure rather than natural language sentences.
+
+ ---
+
+ ## 🧠 Problem Type
+
+ ### Input
+ A single URL string: