tomaarsen/span-marker-bert-base-orgs

@@ -1,6 +1,7 @@
 ---
 language:
 - en
 library_name: span-marker
 tags:
 - span-marker
@@ -15,55 +16,55 @@ metrics:
 - recall
 - f1
 widget:
-- text: Hallacas are also commonly consumed in eastern Cuba parts of Colombia, Ecuador,
-    Aruba, and Curaçao.
-- text: The co-production of Yvon Michel's GYM and Jean Bédard's Interbox promotions
-    and televised via HBO, has trumped a proposed HBO -televised rematch between Jean
-    Pascal and RING and WBC 175-pound champion Chad Dawson that was slated for the
-    same date at Bell Centre in Montreal.
-- text: The synoptic conditions see a low over southern Norway, bringing warm south
-    and southwesterly flows of air up from the inner continental areas of Russia and
-    Belarus.
-- text: The RCIS recommended amongst other things that the Australian Security Intelligence
-    Organisation (ASIO) areas of investigation be widened to include terrorism.
-- text: The large network had multiple campuses in Minnesota, Wisconsin, and South
-    Dakota.
 pipeline_tag: token-classification
 co2_eq_emissions:
-  emissions: 532.6472478623315
   source: codecarbon
   training_type: fine-tuning
   on_cloud: false
   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
   ram_total_size: 31.777088165283203
-  hours_used: 3.696
   hardware_used: 1 x NVIDIA GeForce RTX 3090
 base_model: bert-base-cased
 model-index:
-- name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
   results:
   - task:
       type: token-classification
       name: Named Entity Recognition
     dataset:
-      name: FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
       type: tomaarsen/ner-orgs
       split: test
     metrics:
     - type: f1
-      value: 0.8311343653918766
       name: F1
     - type: precision
-      value: 0.8334090564894745
       name: Precision
     - type: recall
-      value: 0.8288720574945131
       name: Recall
 ---
-# SpanMarker with bert-base-cased on FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD
-This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD](https://huggingface.co/datasets/tomaarsen/ner-orgs) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder.
 ## Model Details
@@ -72,9 +73,9 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
 - **Encoder:** [bert-base-cased](https://huggingface.co/bert-base-cased)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
-- **Training Dataset:** [FewNERD, CoNLL2003, OntoNotes v5, and MultiNERD](https://huggingface.co/datasets/tomaarsen/ner-orgs)
 - **Language:** en
-<!-- - **License:** Unknown -->
 ### Model Sources
@@ -84,15 +85,15 @@ This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained
 ### Model Labels
 | Label | Examples                                     |
 |:------|:---------------------------------------------|
-| ORG   | "IAEA", "Church 's Chicken", "Texas Chicken" |
 ## Evaluation
 ### Metrics
 | Label   | Precision | Recall | F1     |
 |:--------|:----------|:-------|:-------|
-| **all** | 0.8334    | 0.8289 | 0.8311 |
-| ORG     | 0.8334    | 0.8289 | 0.8311 |
 ## Uses
@@ -104,7 +105,7 @@ from span_marker import SpanMarkerModel
 # Download from the 🤗 Hub
 model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")
 # Run inference
-entities = model.predict("The large network had multiple campuses in Minnesota, Wisconsin, and South Dakota.")
 ```
 ### Downstream Use
@@ -155,8 +156,8 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
 ### Training Set Metrics
 | Training set          | Min | Median  | Max |
 |:----------------------|:----|:--------|:----|
-| Sentence length       | 1   | 22.1911 | 267 |
-| Entities per sentence | 0   | 0.8144  | 39  |
 ### Training Hyperparameters
 - learning_rate: 5e-05
@@ -169,22 +170,17 @@ trainer.save_model("tomaarsen/span-marker-bert-base-orgs-finetuned")
 - num_epochs: 3
 ### Training Results
-| Epoch  | Step  | Validation Loss |
-|:------:|:-----:|:---------------:|
-| 0.3273 | 3000  | 0.0052          |
-| 0.6546 | 6000  | 0.0047          |
-| 0.9819 | 9000  | 0.0045          |
-| 1.3092 | 12000 | 0.0047          |
-| 1.6365 | 15000 | 0.0045          |
-| 1.9638 | 18000 | 0.0046          |
-| 2.2911 | 21000 | 0.0054          |
-| 2.6184 | 24000 | 0.0053          |
-| 2.9457 | 27000 | 0.0052          |
 ### Environmental Impact
 Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
-- **Carbon Emitted**: 0.533 kg of CO2
-- **Hours Used**: 3.696 hours
 ### Training Hardware
 - **On Cloud**: No
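F1 is the harmonic mean of precision and recall. That means each version of the card should have an F1 that follows from its own precision and recall: both the values removed above and the ones added in the updated card below. The Metrics tables should also equal the `model-index` values rounded to four decimals. The sketch below recomputes this in plain Python from values copied out of this diff. It is an illustrative consistency check, not SpanMarker's evaluation code, and `f1_score` is a hypothetical helper name.

```python
import math

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1); 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the model-index metadata in this diff.
cards = {
    "old (FewNERD, CoNLL2003, OntoNotes v5, MultiNERD)": (
        0.8334090564894745, 0.8288720574945131, 0.8311343653918766),
    "new (FewNERD, CoNLL2003, OntoNotes v5)": (
        0.7958325880879986, 0.793561619404316, 0.7946954813359528),
}

for name, (p, r, reported_f1) in cards.items():
    f1 = f1_score(p, r)
    # The Metrics tables in the card show these values rounded to four decimals.
    print(f"{name}: P={p:.4f} R={r:.4f} F1={f1:.4f} (reported {reported_f1:.4f})")
    assert math.isclose(f1, reported_f1, abs_tol=1e-6)
```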

 ---
 language:
 - en
+license: cc-by-sa-4.0
 library_name: span-marker
 tags:
 - span-marker
 - recall
 - f1
 widget:
+- text: Today in Zhongnanhai, General Secretary of the Communist Party of China, President
+    of the country and honorary President of China's Red Cross, Zemin Jiang met with
+    representatives of the 6th National Member Congress of China's Red Cross, and
+    expressed warm greetings to the 20 million hardworking members on behalf of the
+    Central Committee of the Chinese Communist Party and State Council.
+- text: On April 20, 2017, MGM Television Studios, headed by Mark Burnett formed a
+    partnership with McLane and Buss to produce and distribute new content across
+    a number of media platforms.
+- text: 'Postponed: East Fife v Clydebank, St Johnstone v'
+- text: Prime contractor was Hughes Aircraft Company Electronics Division which developed
+    the Tiamat with the assistance of the NACA.
+- text: After graduating from Auburn University with a degree in Engineering in 1985,
+    he went on to play inside linebacker for the Pittsburgh Steelers for four seasons.
 pipeline_tag: token-classification
 co2_eq_emissions:
+  emissions: 248.1008753496152
   source: codecarbon
   training_type: fine-tuning
   on_cloud: false
   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
   ram_total_size: 31.777088165283203
+  hours_used: 1.766
   hardware_used: 1 x NVIDIA GeForce RTX 3090
 base_model: bert-base-cased
 model-index:
+- name: SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5
   results:
   - task:
       type: token-classification
       name: Named Entity Recognition
     dataset:
+      name: FewNERD, CoNLL2003, and OntoNotes v5
       type: tomaarsen/ner-orgs
       split: test
     metrics:
     - type: f1
+      value: 0.7946954813359528
       name: F1
     - type: precision
+      value: 0.7958325880879986
       name: Precision
     - type: recall
+      value: 0.793561619404316
       name: Recall
 ---
+# SpanMarker with bert-base-cased on FewNERD, CoNLL2003, and OntoNotes v5
+This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs) dataset that can be used for Named Entity Recognition. This SpanMarker model uses [bert-base-cased](https://huggingface.co/bert-base-cased) as the underlying encoder.
 ## Model Details
 - **Encoder:** [bert-base-cased](https://huggingface.co/bert-base-cased)
 - **Maximum Sequence Length:** 256 tokens
 - **Maximum Entity Length:** 8 words
+- **Training Dataset:** [FewNERD, CoNLL2003, and OntoNotes v5](https://huggingface.co/datasets/tomaarsen/ner-orgs)
 - **Language:** en
+- **License:** cc-by-sa-4.0
 ### Model Sources
 ### Model Labels
 | Label | Examples                                     |
 |:------|:---------------------------------------------|
+| ORG   | "Texas Chicken", "IAEA", "Church 's Chicken" |
 ## Evaluation
 ### Metrics
 | Label   | Precision | Recall | F1     |
 |:--------|:----------|:-------|:-------|
+| **all** | 0.7958    | 0.7936 | 0.7947 |
+| ORG     | 0.7958    | 0.7936 | 0.7947 |
 ## Uses
 # Download from the 🤗 Hub
 model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-orgs")
 # Run inference
+entities = model.predict("Postponed: East Fife v Clydebank, St Johnstone v")
 ```
 ### Downstream Use
 ### Training Set Metrics
 | Training set          | Min | Median  | Max |
 |:----------------------|:----|:--------|:----|
+| Sentence length       | 1   | 23.5706 | 263 |
+| Entities per sentence | 0   | 0.7865  | 39  |
 ### Training Hyperparameters
 - learning_rate: 5e-05
 - num_epochs: 3
 ### Training Results
+| Epoch  | Step  | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
+|:------:|:-----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
+| 0.7131 | 3000  | 0.0061          | 0.7978               | 0.7830            | 0.7904        | 0.9764              |
+| 1.4262 | 6000  | 0.0059          | 0.8170               | 0.7843            | 0.8004        | 0.9774              |
+| 2.1393 | 9000  | 0.0061          | 0.8221               | 0.7938            | 0.8077        | 0.9772              |
+| 2.8524 | 12000 | 0.0062          | 0.8211               | 0.8003            | 0.8106        | 0.9780              |
 ### Environmental Impact
 Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+- **Carbon Emitted**: 0.248 kg of CO2
+- **Hours Used**: 1.766 hours
 ### Training Hardware
 - **On Cloud**: No
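The `emissions` value under `co2_eq_emissions` is given in grams of CO2-equivalent, following the Hugging Face model card metadata convention. The Environmental Impact bullets report the same figure in kilograms, rounded to three decimals. Below is a minimal sketch of that conversion for both versions of the card, plus a rough average rate per training hour. The `runs` dict and the `grams_to_kg` helper are illustrative names only. They are not part of CodeCarbon's API.

```python
# Values copied from this diff: co2_eq_emissions.emissions (grams CO2-eq)
# and the "Hours Used" bullets.
runs = {
    "old card": {"emissions_g": 532.6472478623315, "hours": 3.696},
    "new card": {"emissions_g": 248.1008753496152, "hours": 1.766},
}

def grams_to_kg(grams: float, ndigits: int = 3) -> float:
    """Convert grams to kilograms, rounded like the Environmental Impact bullets."""
    return round(grams / 1000, ndigits)

for name, run in runs.items():
    kg = grams_to_kg(run["emissions_g"])
    # Average grams of CO2-eq emitted per hour of training.
    rate = run["emissions_g"] / run["hours"]
    print(f"{name}: {kg} kg CO2 over {run['hours']} h (~{rate:.0f} g/h)")
```

Both runs used the same machine (one RTX 3090 and an i7-13700K). The lower emissions of the new card therefore track its shorter training time: 1.766 h against 3.696 h.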

pytorch_model.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f64ee8bee4e465b21fba71e70d47d4bb19ba4eef09d7565dc544b41248ae8e58
 size 433332917

 version https://git-lfs.github.com/spec/v1
+oid sha256:55ca4260a3118b42791a244aa1d7981a524aa53b6033730ec8a6f1fba949ee04
 size 433332917
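The `pytorch_model.bin` change is a Git LFS pointer update. Git stores only three things in place of the file: the spec version, the SHA-256 of the file contents (`oid`), and the size in bytes. Here the size stays at 433332917 bytes while the `oid` changes. That is consistent with the same bert-base-cased architecture being retrained to produce new weights. The sketch below builds such a pointer with `hashlib`. `lfs_pointer` and `lfs_pointer_for_file` are hypothetical helper names, and the code illustrates the pointer format rather than reproducing the `git lfs` implementation.

```python
import hashlib
from pathlib import Path

LFS_SPEC = "https://git-lfs.github.com/spec/v1"

def lfs_pointer(data: bytes) -> str:
    """Build the Git LFS pointer text that git stores in place of `data`."""
    oid = hashlib.sha256(data).hexdigest()
    return f"version {LFS_SPEC}\noid sha256:{oid}\nsize {len(data)}\n"

def lfs_pointer_for_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Same pointer, but hash a large file (e.g. a checkpoint) in chunks."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return f"version {LFS_SPEC}\noid sha256:{digest.hexdigest()}\nsize {size}\n"
```

Suppose you have a local copy of the checkpoint (the path is assumed), such as `lfs_pointer_for_file(Path("pytorch_model.bin"))`. If the file matches the new commit, this should reproduce the `oid` and `size` lines above. Chunked hashing keeps memory use flat for a file of roughly 433 MB.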