Akshan Krithick committed on Nov 23, 2025

Commit

88e4669

verified

1 Parent(s): 1e86a7d

v2: 5-epoch RoBERTa-large + LoRA on LEDGAR (acc=0.869, macro F1=0.790)

This view is limited to 50 files because it contains too many changes.

Files changed (50)

README.md +128 -176
adapter_config.json +7 -8
adapter_model.safetensors +2 -2
checkpoint-1875/README.md +202 -0
checkpoint-1875/adapter_config.json +36 -0
checkpoint-1875/adapter_model.safetensors +3 -0
checkpoint-1875/merges.txt +0 -0
checkpoint-1875/optimizer.pt +3 -0
checkpoint-1875/rng_state.pth +3 -0
checkpoint-1875/scheduler.pt +3 -0
checkpoint-1875/special_tokens_map.json +15 -0
checkpoint-1875/tokenizer.json +0 -0
checkpoint-1875/tokenizer_config.json +57 -0
checkpoint-1875/trainer_state.json +169 -0
checkpoint-1875/training_args.bin +3 -0
checkpoint-1875/vocab.json +0 -0
checkpoint-3750/README.md +202 -0
checkpoint-3750/adapter_config.json +36 -0
checkpoint-3750/adapter_model.safetensors +3 -0
checkpoint-3750/merges.txt +0 -0
checkpoint-3750/optimizer.pt +3 -0
checkpoint-3750/rng_state.pth +3 -0
checkpoint-3750/scheduler.pt +3 -0
checkpoint-3750/special_tokens_map.json +15 -0
checkpoint-3750/tokenizer.json +0 -0
checkpoint-3750/tokenizer_config.json +57 -0
checkpoint-3750/trainer_state.json +312 -0
checkpoint-3750/training_args.bin +3 -0
checkpoint-3750/vocab.json +0 -0
checkpoint-5625/README.md +202 -0
checkpoint-5625/adapter_config.json +36 -0
checkpoint-5625/adapter_model.safetensors +3 -0
checkpoint-5625/merges.txt +0 -0
checkpoint-5625/optimizer.pt +3 -0
checkpoint-5625/rng_state.pth +3 -0
checkpoint-5625/scheduler.pt +3 -0
checkpoint-5625/special_tokens_map.json +15 -0
checkpoint-5625/tokenizer.json +0 -0
checkpoint-5625/tokenizer_config.json +57 -0
checkpoint-5625/trainer_state.json +455 -0
checkpoint-5625/training_args.bin +3 -0
checkpoint-5625/vocab.json +0 -0
checkpoint-7500/README.md +202 -0
checkpoint-7500/adapter_config.json +36 -0
checkpoint-7500/adapter_model.safetensors +3 -0
checkpoint-7500/merges.txt +0 -0
checkpoint-7500/optimizer.pt +3 -0
checkpoint-7500/rng_state.pth +3 -0
checkpoint-7500/scheduler.pt +3 -0
checkpoint-7500/special_tokens_map.json +15 -0

README.md CHANGED

@@ -1,250 +1,202 @@
 ---
-license: mit
-datasets:
-- coastalcph/lex_glue
-- coastalchp/ledgar
-metrics:
-- accuracy
-- f1
-tags:
-- legal
-- contracts
-- clause-classification
-- governance
-- robustness
-- lora
-- PEFT
-- roberta-large
-task_categories:
-- sequence-classification
-model_name: termsconditioned-roberta-large-ledgar-lora
-library_name: transformers
-pipeline_tag: text-classification
-language:
-- en
-base_model:
-- FacebookAI/roberta-large
-model-index:
-- name: termsconditioned-roberta-large-ledgar-lora
-  results:
-  - task:
-      type: text-classification
-      name: Contract clause classification
-    dataset:
-      name: LEDGAR (LexGLUE)
-      type: coastalcph/lex_glue
-      config: ledgar
-      split: validation
-    metrics:
-    - name: Accuracy
-      type: accuracy
-      value: 0.815
-    - name: Macro F1
-      type: f1
-      value: 0.742
 ---
-# TermsConditioned – RoBERTa-large LEDGAR + LoRA
-A RoBERTa-large encoder, fine-tuned with LoRA on the LEDGAR subset of LexGLUE to classify contract paragraphs into 100 clause families, with an explicit *risk bucket* and slice-level governance analysis.
-This repo only contains the **adapter weights + tokenizer**, not the full base model.
-To use it, you must load `roberta-large` from Hugging Face and then apply these LoRA adapters.
-You cannot ` AutoModelForSequenceClassification.from_pretrained("snickerszz/…") `
----
-## 1. What this model does
-- Input: a **single contract paragraph** (e.g., ToS, MSA, clickwrap clause).
-- Output: one of **100 LEDGAR clause families** (e.g., `Arbitration`, `Governing Laws`, `Indemnity`, `Limitation Of Liability`, `Amendments`, etc.).
-- Special focus on a **risk bucket** of families where “false green-lights” are costly:
-  - `Arbitration`
-  - `Waiver Of Jury Trials`
-  - `Waivers`
-  - `Jurisdictions`, `Submission To Jurisdiction`, `Consent To Jurisdiction`, `Governing Laws`
-  - `Modifications`, `Amendments`
-  - `Limitation Of Liability`, `Remedies`, `Indemnity`, `Indemnifications`
-The model is intended as the **classification core** for a governance-style triage system:
-> “Don’t miss risky clauses; if unsure, abstain and send to human review.”
----
-## 2. Intended use
-### 2.1. Primary use case
-This model is designed to be part of a **Terms & Conditions / contract intake triage tool** that:
-1. Splits a document into paragraphs.
-2. Runs this classifier on each paragraph.
-3. Applies a **policy** over the probabilities:
-   - high-confidence risky clause → *“Flag as risky”*
-   - high-confidence non-risky clause → *“Green-light”*
-   - low-confidence → *“Needs review” (abstain)*
-### 2.2. Non-goals
-- Not legal advice.
-- Not guaranteed fair / non-biased for every jurisdiction or contract type.
-- Not designed to replace full contract review or negotiation tools.
----
-## 3. Training data
-- **Dataset:** `coastalcph/lex_glue` (LEDGAR split)
-- **Train / Validation / Test:**
-  - train: 60,000 paragraphs
-  - validation: 10,000 paragraphs
-  - test: 10,000 paragraphs
-- **Labels:** 100 clause families as defined in LEDGAR.
-Each example is a *single paragraph* of a contract, labeled with exactly one family.
----
-## 4. Model architecture & fine-tuning
-### 4.1 Base model
-- `roberta-large` from Hugging Face (`transformers`).
-### 4.2 LoRA setup
-We apply LoRA to a subset of the encoder:
-- **Target modules:** `query`, `key`, `value`, `dense`
-- **LoRA config:**
-  - `r = 16`
-  - `lora_alpha = 32`
-  - `lora_dropout = 0.05`
-- **Frozen:** All other base model weights.
-- **Saved extra modules:** `classifier` head kept and saved along with adapters.
-### 4.3 Optimization & training
-- **Objective:** weighted cross-entropy with **class weights** to counter label imbalance.
-- **Label smoothing:** ε = 0.1
-- **Optimizer:** AdamW (8-bit or standard), weight decay 0.1
-- **Scheduler:** cosine LR with warmup
-- **Batch size (effective):** 32 (per-device × grad_accumulation)
-- **Epochs:** 5
-- **Max seq length:** 384 tokens
-- **Hardware:** single GPU (tested on A100 via Colab)
-Reproducibility knobs:
-- Fixed random seed (42) for Python / NumPy / PyTorch.
-- Deterministic behavior is not fully guaranteed but training is stable.
----
-## 5. Evaluation
-All numbers below are on the **validation split** (10,000 paragraphs) with the LoRA adapters applied.
-### 5.1 Standard metrics
-- **Accuracy:** ~0.815
-- **Macro F1:** ~0.742
-This is a multi-class setting with 100 labels and notable class imbalance.
-### 5.2 Calibration
-On top of logits, we apply **temperature scaling**:
-- Search over a grid of temperatures.
-- Best temperature on validation: **T\* ≈ 0.8**
-- Expected Calibration Error (ECE) before / after scaling:
-  - `ECE_raw ≈ 0.115`
-  - `ECE_cal ≈ 0.022`
-These calibrated probabilities are what we use for **governance policies** (false-green caps, abstain band, etc.).
----
-## 6. Inference: using the model
-### Load base + adapters
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-from peft import PeftModel
-BASE = "roberta-large"
-ADAPTER_REPO = "snickerszz/termsconditioned-roberta-large-ledgar-lora"
-tokenizer = AutoTokenizer.from_pretrained(BASE)
-base_model = AutoModelForSequenceClassification.from_pretrained(
-    BASE,
-    num_labels=100,
-)
-model = PeftModel.from_pretrained(base_model, ADAPTER_REPO)
-model.eval()
-```
----
-You can test the model on synthetic or real ToS paragraphs (for example, arbitration clauses, limitation of liability caps, or indemnity language)
-```python
-# Must run the above cell first
-# This cell is a sample use case for the model
-text = "Any dispute arising out of or relating to this Agreement shall be finally settled by binding arbitration..."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384)
-with torch.no_grad():
-    outputs = model(**inputs)
-    probs = outputs.logits.softmax(dim=-1)[0]
-topk = torch.topk(probs, k=5)
-for idx, score in zip(topk.indices.tolist(), topk.values.tolist()):
-    print(idx, float(score))
-```
-# 7. Limitations and warnings
-- Domain
-The model is trained on LEDGAR (public contract clauses). Behavior on consumer terms of service, privacy policies, employment agreements, or narrow industry contracts may differ. You should re-check performance on your own corpus.
-- Single label per paragraph
-The dataset assumes one dominant clause family per paragraph. Real-world paragraphs can mix multiple concerns (for example, arbitration plus waiver of class actions). Treat the prediction as the "primary" family, not an exhaustive tagging of everything risky in the text.
-- Language
-Training data is English-only; performance on other languages is not characterized.
-- Legal risk
-This model is for triage, research, and prototyping. It is not legal advice. Any production use should keep a human in the loop and document the residual error rates, especially for the risky bucket.
----
-# 8. How to cite or reference
-If you use this model in a writeup, you can describe it as:
-A RoBERTa-large encoder fine-tuned with LoRA on the LEDGAR subset of LexGLUE for 100-way contract clause classification.
----
-# 9. Files in this repo
-- adapter_model.safetensors – LoRA adapter weights for the classifier head and selected encoder modules
-- adapter_config.json – PEFT / LoRA configuration
-- config.json – model configuration (num_labels, id2label, label2id, etc.)
-- tokenizer.json, vocab.json, merges.txt, tokenizer_config.json – tokenizer assets compatible with roberta-large
-- special_tokens_map.json – tokenizer special token mapping
-- README.md
-The base roberta-large weights are not duplicated here; at inference time they are loaded from the main Hugging Face model hub.

 ---
+base_model: roberta-large
+library_name: peft
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

adapter_config.json CHANGED

@@ -1,9 +1,6 @@
 {
   "alpha_pattern": {},
-  "auto_mapping": {
-    "base_model_class": "RobertaForSequenceClassification",
-    "parent_library": "transformers.models.roberta.modeling_roberta"
-  },
   "base_model_name_or_path": "roberta-large",
   "bias": "none",
   "fan_in_fan_out": false,
@@ -18,18 +15,20 @@
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": [
-    "classifier"
   ],
   "peft_type": "LORA",
   "r": 16,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "output.dense",
     "value",
     "intermediate.dense",
-    "query",
-    "key"
   ],
   "task_type": "SEQ_CLS",
   "use_dora": false,

 {
   "alpha_pattern": {},
+  "auto_mapping": null,
   "base_model_name_or_path": "roberta-large",
   "bias": "none",
   "fan_in_fan_out": false,
   "megatron_config": null,
   "megatron_core": "megatron.core",
   "modules_to_save": [
+    "classifier",
+    "classifier",
+    "score"
   ],
   "peft_type": "LORA",
   "r": 16,
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
+    "key",
     "value",
+    "output.dense",
     "intermediate.dense",
+    "query"
   ],
   "task_type": "SEQ_CLS",
   "use_dora": false,

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7156f28bcdf34dac6461166a1c3328e900c16777d50caa690b39b9ee22a966ac
-size 30658032

 version https://git-lfs.github.com/spec/v1
+oid sha256:96bf21eabe22965d9dd84dd5189e85dbc71bf074115f1921c13ed90671e5d0d6
+size 32962328

checkpoint-1875/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: roberta-large
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

checkpoint-1875/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "roberta-large",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "key",
+    "value",
+    "output.dense",
+    "intermediate.dense",
+    "query"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
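
The `modules_to_save` list in this config names `classifier` twice. The sketch below shows one way to normalize such a list while keeping first-seen order. It assumes the config is handled as a plain dict loaded with `json`; the helper names `dedupe_preserving_order` and `normalize_adapter_config` are hypothetical and not part of PEFT, and the embedded JSON is only an excerpt of the fields shown above. It makes no claim about whether PEFT itself is affected by the duplicate entry.

```python
import json

# Excerpt of the adapter_config.json fields shown above (not the full file).
CONFIG_TEXT = """
{
  "modules_to_save": ["classifier", "classifier", "score"],
  "target_modules": ["key", "value", "output.dense", "intermediate.dense", "query"],
  "r": 16,
  "lora_alpha": 32
}
"""


def dedupe_preserving_order(items):
    """Return the items with repeats dropped, keeping first-seen order."""
    return list(dict.fromkeys(items))


def normalize_adapter_config(config):
    """Hypothetical helper: copy the config and dedupe its module lists."""
    cleaned = dict(config)
    for key in ("modules_to_save", "target_modules"):
        if cleaned.get(key) is not None:
            cleaned[key] = dedupe_preserving_order(cleaned[key])
    return cleaned


config = normalize_adapter_config(json.loads(CONFIG_TEXT))
print(config["modules_to_save"])
print(config["target_modules"])
```

Using `dict.fromkeys` keeps the original ordering, so a diff of the cleaned file against the committed one would show only the removed duplicate.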

checkpoint-1875/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6af536be16c08d8eac68d126ca6e0716353bbe04ea2f5faf02c2c0050a27f066
+size 32962328

checkpoint-1875/merges.txt ADDED

The diff for this file is too large to render.

checkpoint-1875/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:21f1c924a35666af5a2243f793ad74a6b0bc018511b6dc5e78ccbdaf3695f67f
+size 66085050

checkpoint-1875/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:261a780904fe177e1e36fdaf4ea0b21c232cc175f147030bd514711a079b4bd2
+size 14244

checkpoint-1875/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:743f1f6fe0e37725433bda0b56e65a0c808acf8ff21fefeb923db931b8bc7f26
+size 1064

checkpoint-1875/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

checkpoint-1875/tokenizer.json ADDED

The diff for this file is too large to render.

checkpoint-1875/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50264": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "RobertaTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}

checkpoint-1875/trainer_state.json ADDED

	@@ -0,0 +1,169 @@

+{
+  "best_metric": 0.7341778006943732,
+  "best_model_checkpoint": "./tc_roberta_ledgar_lora_v2/checkpoint-1875",
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 1875,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 10.646000862121582,
+      "learning_rate": 1.7761989342806394e-05,
+      "loss": 4.5715,
+      "step": 100
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 6.6032819747924805,
+      "learning_rate": 3.552397868561279e-05,
+      "loss": 4.1558,
+      "step": 200
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 5.015078544616699,
+      "learning_rate": 5.3285968028419185e-05,
+      "loss": 2.8017,
+      "step": 300
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 7.0364670753479,
+      "learning_rate": 7.104795737122558e-05,
+      "loss": 1.6959,
+      "step": 400
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 4.8350419998168945,
+      "learning_rate": 8.880994671403198e-05,
+      "loss": 1.1395,
+      "step": 500
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 6.103910446166992,
+      "learning_rate": 9.999565001331225e-05,
+      "loss": 0.9846,
+      "step": 600
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 6.555258274078369,
+      "learning_rate": 9.994037264113944e-05,
+      "loss": 0.8946,
+      "step": 700
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 6.398642539978027,
+      "learning_rate": 9.982162701557139e-05,
+      "loss": 0.7935,
+      "step": 800
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 5.122348785400391,
+      "learning_rate": 9.963956404812623e-05,
+      "loss": 0.793,
+      "step": 900
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 5.0598368644714355,
+      "learning_rate": 9.939441511910694e-05,
+      "loss": 0.7565,
+      "step": 1000
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 3.654729127883911,
+      "learning_rate": 9.908649178354454e-05,
+      "loss": 0.751,
+      "step": 1100
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 2.905059337615967,
+      "learning_rate": 9.871618537524881e-05,
+      "loss": 0.6975,
+      "step": 1200
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 4.096899032592773,
+      "learning_rate": 9.828396650946974e-05,
+      "loss": 0.689,
+      "step": 1300
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 5.100860595703125,
+      "learning_rate": 9.779038448480173e-05,
+      "loss": 0.6782,
+      "step": 1400
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 5.342113018035889,
+      "learning_rate": 9.723606658509063e-05,
+      "loss": 0.6509,
+      "step": 1500
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 3.6632707118988037,
+      "learning_rate": 9.662171728223081e-05,
+      "loss": 0.6425,
+      "step": 1600
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 3.0369760990142822,
+      "learning_rate": 9.594811734086548e-05,
+      "loss": 0.6456,
+      "step": 1700
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 4.97806453704834,
+      "learning_rate": 9.521612282612803e-05,
+      "loss": 0.6601,
+      "step": 1800
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.8317,
+      "eval_loss": 0.5902224183082581,
+      "eval_macro_f1": 0.7341778006943732,
+      "eval_runtime": 437.5217,
+      "eval_samples_per_second": 22.856,
+      "eval_steps_per_second": 0.715,
+      "step": 1875
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 3.936710002364006e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
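
The `learning_rate` values in `log_history` rise linearly and then decay slowly, which is consistent with linear warmup followed by cosine decay. The sketch below recomputes a few logged values under that assumption. The peak rate (`1e-4`) and warmup length (`563` steps) are inferred from the log, not stated anywhere in this commit, and `lr_at` is a hypothetical helper written for illustration rather than the trainer's own code.

```python
import math

PEAK_LR = 1e-4        # assumption: inferred from the logged maximum
WARMUP_STEPS = 563    # assumption: inferred from the logged rate at step 100
MAX_STEPS = 9375      # "max_steps" in trainer_state.json


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, max_steps=MAX_STEPS):
    """Linear warmup to peak_lr, then half-cosine decay to zero at max_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values copied from the log_history entries above.
LOGGED = {
    100: 1.7761989342806394e-05,
    500: 8.880994671403198e-05,
    600: 9.999565001331225e-05,
    1800: 9.521612282612803e-05,
}

for step, logged in LOGGED.items():
    print(f"step {step}: recomputed {lr_at(step):.6e}, logged {logged:.6e}")
```

If the recomputed and logged columns agree, the inferred warmup length and peak rate are a plausible reading of the run's settings; a mismatch would mean the schedule differs from the one assumed here.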

checkpoint-1875/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18be2ea0eb3d9400c09b51e632353826ed47ff8ae70a076cad1cc0cbac400c7d
+size 5176

checkpoint-1875/vocab.json ADDED

The diff for this file is too large to render.

checkpoint-3750/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: roberta-large
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

checkpoint-3750/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "roberta-large",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "key",
+    "value",
+    "output.dense",
+    "intermediate.dense",
+    "query"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-3750/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:83492157040d80752bf4b71703faa808b11f0c5ee86bde7fbfebc94810402c9a
+size 32962328
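As a rough cross-check of the adapter configuration above, the following sketch estimates how many parameters the adapter file should hold. It assumes the standard `roberta-large` dimensions (24 layers, hidden size 1024, feed-forward size 4096), a 100-class classification head (the LEDGAR label count in LexGLUE, which this diff does not state), fp32 storage, and suffix matching of `target_modules`, under which `output.dense` matches both the attention output and the feed-forward output projection. None of these come from the commit itself, and the function names are illustrative.

```python
# Sketch: estimate the size of the LoRA adapter described by adapter_config.json.
# Assumed (not in the diff): roberta-large shapes, 100 labels, fp32 weights.

HIDDEN = 1024        # roberta-large hidden size
FFN = 4096           # roberta-large intermediate size
LAYERS = 24          # roberta-large encoder layers
NUM_LABELS = 100     # assumption: LEDGAR label count
RANK = 16            # "r" in adapter_config.json
LORA_ALPHA = 32      # "lora_alpha" in adapter_config.json


def lora_params(in_features: int, out_features: int, rank: int = RANK) -> int:
    """Parameters in one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


def adapter_param_count() -> int:
    """LoRA parameters across all encoder layers plus the saved classifier head."""
    per_layer = (
        3 * lora_params(HIDDEN, HIDDEN)   # query, key, value
        + lora_params(HIDDEN, HIDDEN)     # attention output.dense
        + lora_params(HIDDEN, FFN)        # intermediate.dense
        + lora_params(FFN, HIDDEN)        # feed-forward output.dense
    )
    # modules_to_save keeps a full copy of the classification head:
    # dense (HIDDEN x HIDDEN + bias) and out_proj (HIDDEN x NUM_LABELS + bias).
    head = (HIDDEN * HIDDEN + HIDDEN) + (HIDDEN * NUM_LABELS + NUM_LABELS)
    return LAYERS * per_layer + head


scaling = LORA_ALPHA / RANK            # LoRA update multiplier (use_rslora is false)
total = adapter_param_count()
approx_bytes = 4 * total               # fp32: 4 bytes per parameter

print(f"scaling={scaling}, params={total:,}, approx bytes={approx_bytes:,}")
```

Under these assumptions the estimate comes to roughly 32.9 MB, close to the 32962328 bytes recorded in the LFS pointer above. The small remainder would be consistent with the safetensors header, though that is not verified here.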

checkpoint-3750/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-3750/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:48814f04bd35d08a46d3eb6668995ed587e41ba28ecfd071724c82ad9d55f5ac
+size 66085050

checkpoint-3750/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2b4f9108da3b8ebefbc583fe65375f9e90dfe2f92cc450a1a89addb86dc1a9ef
+size 14244

checkpoint-3750/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e990c7f435260a3f7a7073eb871776183412cea47e6adc89d423ea41302f008e
+size 1064

checkpoint-3750/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

checkpoint-3750/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-3750/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50264": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "RobertaTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}

checkpoint-3750/trainer_state.json ADDED

	@@ -0,0 +1,312 @@

+{
+  "best_metric": 0.7566410403530152,
+  "best_model_checkpoint": "./tc_roberta_ledgar_lora_v2/checkpoint-3750",
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 3750,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 10.646000862121582,
+      "learning_rate": 1.7761989342806394e-05,
+      "loss": 4.5715,
+      "step": 100
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 6.6032819747924805,
+      "learning_rate": 3.552397868561279e-05,
+      "loss": 4.1558,
+      "step": 200
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 5.015078544616699,
+      "learning_rate": 5.3285968028419185e-05,
+      "loss": 2.8017,
+      "step": 300
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 7.0364670753479,
+      "learning_rate": 7.104795737122558e-05,
+      "loss": 1.6959,
+      "step": 400
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 4.8350419998168945,
+      "learning_rate": 8.880994671403198e-05,
+      "loss": 1.1395,
+      "step": 500
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 6.103910446166992,
+      "learning_rate": 9.999565001331225e-05,
+      "loss": 0.9846,
+      "step": 600
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 6.555258274078369,
+      "learning_rate": 9.994037264113944e-05,
+      "loss": 0.8946,
+      "step": 700
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 6.398642539978027,
+      "learning_rate": 9.982162701557139e-05,
+      "loss": 0.7935,
+      "step": 800
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 5.122348785400391,
+      "learning_rate": 9.963956404812623e-05,
+      "loss": 0.793,
+      "step": 900
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 5.0598368644714355,
+      "learning_rate": 9.939441511910694e-05,
+      "loss": 0.7565,
+      "step": 1000
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 3.654729127883911,
+      "learning_rate": 9.908649178354454e-05,
+      "loss": 0.751,
+      "step": 1100
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 2.905059337615967,
+      "learning_rate": 9.871618537524881e-05,
+      "loss": 0.6975,
+      "step": 1200
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 4.096899032592773,
+      "learning_rate": 9.828396650946974e-05,
+      "loss": 0.689,
+      "step": 1300
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 5.100860595703125,
+      "learning_rate": 9.779038448480173e-05,
+      "loss": 0.6782,
+      "step": 1400
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 5.342113018035889,
+      "learning_rate": 9.723606658509063e-05,
+      "loss": 0.6509,
+      "step": 1500
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 3.6632707118988037,
+      "learning_rate": 9.662171728223081e-05,
+      "loss": 0.6425,
+      "step": 1600
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 3.0369760990142822,
+      "learning_rate": 9.594811734086548e-05,
+      "loss": 0.6456,
+      "step": 1700
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 4.97806453704834,
+      "learning_rate": 9.521612282612803e-05,
+      "loss": 0.6601,
+      "step": 1800
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.8317,
+      "eval_loss": 0.5902224183082581,
+      "eval_macro_f1": 0.7341778006943732,
+      "eval_runtime": 437.5217,
+      "eval_samples_per_second": 22.856,
+      "eval_steps_per_second": 0.715,
+      "step": 1875
+    },
+    {
+      "epoch": 1.0133333333333334,
+      "grad_norm": 4.172771453857422,
+      "learning_rate": 9.442666401568534e-05,
+      "loss": 0.6399,
+      "step": 1900
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 5.650057792663574,
+      "learning_rate": 9.358074421746598e-05,
+      "loss": 0.6001,
+      "step": 2000
+    },
+    {
+      "epoch": 1.12,
+      "grad_norm": 3.94273042678833,
+      "learning_rate": 9.267943849457557e-05,
+      "loss": 0.576,
+      "step": 2100
+    },
+    {
+      "epoch": 1.1733333333333333,
+      "grad_norm": 3.331911325454712,
+      "learning_rate": 9.172389229901974e-05,
+      "loss": 0.5939,
+      "step": 2200
+    },
+    {
+      "epoch": 1.2266666666666666,
+      "grad_norm": 3.3338027000427246,
+      "learning_rate": 9.071532001597156e-05,
+      "loss": 0.5522,
+      "step": 2300
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 3.442885160446167,
+      "learning_rate": 8.965500342043274e-05,
+      "loss": 0.5344,
+      "step": 2400
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 3.6115713119506836,
+      "learning_rate": 8.854429004825062e-05,
+      "loss": 0.5329,
+      "step": 2500
+    },
+    {
+      "epoch": 1.3866666666666667,
+      "grad_norm": 2.2793359756469727,
+      "learning_rate": 8.738459148356101e-05,
+      "loss": 0.5508,
+      "step": 2600
+    },
+    {
+      "epoch": 1.44,
+      "grad_norm": 4.802005767822266,
+      "learning_rate": 8.617738156483314e-05,
+      "loss": 0.552,
+      "step": 2700
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 5.260004997253418,
+      "learning_rate": 8.492419451179685e-05,
+      "loss": 0.5146,
+      "step": 2800
+    },
+    {
+      "epoch": 1.5466666666666666,
+      "grad_norm": 1.7511740922927856,
+      "learning_rate": 8.36266229756325e-05,
+      "loss": 0.541,
+      "step": 2900
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 4.835184097290039,
+      "learning_rate": 8.228631601490133e-05,
+      "loss": 0.5497,
+      "step": 3000
+    },
+    {
+      "epoch": 1.6533333333333333,
+      "grad_norm": 3.9777235984802246,
+      "learning_rate": 8.090497699978887e-05,
+      "loss": 0.5387,
+      "step": 3100
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 4.145409107208252,
+      "learning_rate": 7.948436144732472e-05,
+      "loss": 0.5404,
+      "step": 3200
+    },
+    {
+      "epoch": 1.76,
+      "grad_norm": 5.38320255279541,
+      "learning_rate": 7.802627479032992e-05,
+      "loss": 0.5341,
+      "step": 3300
+    },
+    {
+      "epoch": 1.8133333333333335,
+      "grad_norm": 4.600132942199707,
+      "learning_rate": 7.65325700829273e-05,
+      "loss": 0.5253,
+      "step": 3400
+    },
+    {
+      "epoch": 1.8666666666666667,
+      "grad_norm": 3.0599677562713623,
+      "learning_rate": 7.500514564553084e-05,
+      "loss": 0.5523,
+      "step": 3500
+    },
+    {
+      "epoch": 1.92,
+      "grad_norm": 5.470056056976318,
+      "learning_rate": 7.344594265230701e-05,
+      "loss": 0.5216,
+      "step": 3600
+    },
+    {
+      "epoch": 1.9733333333333334,
+      "grad_norm": 3.4144089221954346,
+      "learning_rate": 7.185694266417408e-05,
+      "loss": 0.4931,
+      "step": 3700
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.851,
+      "eval_loss": 0.5207030773162842,
+      "eval_macro_f1": 0.7566410403530152,
+      "eval_runtime": 437.557,
+      "eval_samples_per_second": 22.854,
+      "eval_steps_per_second": 0.715,
+      "step": 3750
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 7.879117263417754e+16,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
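The `learning_rate` values in the `log_history` above appear consistent with a linear warmup followed by cosine decay to zero. The sketch below is a reconstruction from the logged numbers, not the author's training script: the peak rate of 1e-4 and the 563 warmup steps are inferred (step 100 logs almost exactly 100/563 of the peak), and `scheduled_lr` is an illustrative name.

```python
import math

PEAK_LR = 1e-4        # inferred from the logged peak, not stated in the diff
WARMUP_STEPS = 563    # inferred: step 100 logs about 100/563 of the peak
MAX_STEPS = 9375      # "max_steps" in trainer_state.json


def scheduled_lr(step: int) -> float:
    """Linear warmup followed by a half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few "learning_rate" entries copied from log_history.
logged = {
    100: 1.7761989342806394e-05,
    500: 8.880994671403198e-05,
    600: 9.999565001331225e-05,
    3700: 7.185694266417408e-05,
}
for step, expected in logged.items():
    print(step, scheduled_lr(step), expected)
```

If the reconstruction holds, the run was still at roughly 72% of its peak learning rate when this checkpoint was written at step 3750, which fits the continued improvement in `eval_macro_f1` between epochs 1 and 2.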

checkpoint-3750/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18be2ea0eb3d9400c09b51e632353826ed47ff8ae70a076cad1cc0cbac400c7d
+size 5176

checkpoint-3750/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-5625/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: roberta-large
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

checkpoint-5625/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "roberta-large",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "key",
+    "value",
+    "output.dense",
+    "intermediate.dense",
+    "query"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}

checkpoint-5625/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a840783475f155164e7dbd2287891d584641035ebe5a1c70beeb28d0f7462c8f
+size 32962328

checkpoint-5625/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-5625/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f209f09a6f2e426cd543bada5bac3c84b139e7bef63368658d9bac77ed5c6ca9
+size 66085050

checkpoint-5625/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a96eb1a09489a51eb58a808c0e28ec1ae469be5e2990e2e2da674aaa28669cd2
+size 14244

checkpoint-5625/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7c50f94388c01b54f1a342a2e9718481912b1bb121c4c40ba9de8ab6d65121cd
+size 1064

checkpoint-5625/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}

checkpoint-5625/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-5625/tokenizer_config.json ADDED

	@@ -0,0 +1,57 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "50264": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "errors": "replace",
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "tokenizer_class": "RobertaTokenizer",
+  "trim_offsets": true,
+  "unk_token": "<unk>"
+}

checkpoint-5625/trainer_state.json ADDED

	@@ -0,0 +1,455 @@

+{
+  "best_metric": 0.7745073192895859,
+  "best_model_checkpoint": "./tc_roberta_ledgar_lora_v2/checkpoint-5625",
+  "epoch": 3.0,
+  "eval_steps": 500,
+  "global_step": 5625,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.05333333333333334,
+      "grad_norm": 10.646000862121582,
+      "learning_rate": 1.7761989342806394e-05,
+      "loss": 4.5715,
+      "step": 100
+    },
+    {
+      "epoch": 0.10666666666666667,
+      "grad_norm": 6.6032819747924805,
+      "learning_rate": 3.552397868561279e-05,
+      "loss": 4.1558,
+      "step": 200
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 5.015078544616699,
+      "learning_rate": 5.3285968028419185e-05,
+      "loss": 2.8017,
+      "step": 300
+    },
+    {
+      "epoch": 0.21333333333333335,
+      "grad_norm": 7.0364670753479,
+      "learning_rate": 7.104795737122558e-05,
+      "loss": 1.6959,
+      "step": 400
+    },
+    {
+      "epoch": 0.26666666666666666,
+      "grad_norm": 4.8350419998168945,
+      "learning_rate": 8.880994671403198e-05,
+      "loss": 1.1395,
+      "step": 500
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 6.103910446166992,
+      "learning_rate": 9.999565001331225e-05,
+      "loss": 0.9846,
+      "step": 600
+    },
+    {
+      "epoch": 0.37333333333333335,
+      "grad_norm": 6.555258274078369,
+      "learning_rate": 9.994037264113944e-05,
+      "loss": 0.8946,
+      "step": 700
+    },
+    {
+      "epoch": 0.4266666666666667,
+      "grad_norm": 6.398642539978027,
+      "learning_rate": 9.982162701557139e-05,
+      "loss": 0.7935,
+      "step": 800
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 5.122348785400391,
+      "learning_rate": 9.963956404812623e-05,
+      "loss": 0.793,
+      "step": 900
+    },
+    {
+      "epoch": 0.5333333333333333,
+      "grad_norm": 5.0598368644714355,
+      "learning_rate": 9.939441511910694e-05,
+      "loss": 0.7565,
+      "step": 1000
+    },
+    {
+      "epoch": 0.5866666666666667,
+      "grad_norm": 3.654729127883911,
+      "learning_rate": 9.908649178354454e-05,
+      "loss": 0.751,
+      "step": 1100
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 2.905059337615967,
+      "learning_rate": 9.871618537524881e-05,
+      "loss": 0.6975,
+      "step": 1200
+    },
+    {
+      "epoch": 0.6933333333333334,
+      "grad_norm": 4.096899032592773,
+      "learning_rate": 9.828396650946974e-05,
+      "loss": 0.689,
+      "step": 1300
+    },
+    {
+      "epoch": 0.7466666666666667,
+      "grad_norm": 5.100860595703125,
+      "learning_rate": 9.779038448480173e-05,
+      "loss": 0.6782,
+      "step": 1400
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 5.342113018035889,
+      "learning_rate": 9.723606658509063e-05,
+      "loss": 0.6509,
+      "step": 1500
+    },
+    {
+      "epoch": 0.8533333333333334,
+      "grad_norm": 3.6632707118988037,
+      "learning_rate": 9.662171728223081e-05,
+      "loss": 0.6425,
+      "step": 1600
+    },
+    {
+      "epoch": 0.9066666666666666,
+      "grad_norm": 3.0369760990142822,
+      "learning_rate": 9.594811734086548e-05,
+      "loss": 0.6456,
+      "step": 1700
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 4.97806453704834,
+      "learning_rate": 9.521612282612803e-05,
+      "loss": 0.6601,
+      "step": 1800
+    },
+    {
+      "epoch": 1.0,
+      "eval_accuracy": 0.8317,
+      "eval_loss": 0.5902224183082581,
+      "eval_macro_f1": 0.7341778006943732,
+      "eval_runtime": 437.5217,
+      "eval_samples_per_second": 22.856,
+      "eval_steps_per_second": 0.715,
+      "step": 1875
+    },
+    {
+      "epoch": 1.0133333333333334,
+      "grad_norm": 4.172771453857422,
+      "learning_rate": 9.442666401568534e-05,
+      "loss": 0.6399,
+      "step": 1900
+    },
+    {
+      "epoch": 1.0666666666666667,
+      "grad_norm": 5.650057792663574,
+      "learning_rate": 9.358074421746598e-05,
+      "loss": 0.6001,
+      "step": 2000
+    },
+    {
+      "epoch": 1.12,
+      "grad_norm": 3.94273042678833,
+      "learning_rate": 9.267943849457557e-05,
+      "loss": 0.576,
+      "step": 2100
+    },
+    {
+      "epoch": 1.1733333333333333,
+      "grad_norm": 3.331911325454712,
+      "learning_rate": 9.172389229901974e-05,
+      "loss": 0.5939,
+      "step": 2200
+    },
+    {
+      "epoch": 1.2266666666666666,
+      "grad_norm": 3.3338027000427246,
+      "learning_rate": 9.071532001597156e-05,
+      "loss": 0.5522,
+      "step": 2300
+    },
+    {
+      "epoch": 1.28,
+      "grad_norm": 3.442885160446167,
+      "learning_rate": 8.965500342043274e-05,
+      "loss": 0.5344,
+      "step": 2400
+    },
+    {
+      "epoch": 1.3333333333333333,
+      "grad_norm": 3.6115713119506836,
+      "learning_rate": 8.854429004825062e-05,
+      "loss": 0.5329,
+      "step": 2500
+    },
+    {
+      "epoch": 1.3866666666666667,
+      "grad_norm": 2.2793359756469727,
+      "learning_rate": 8.738459148356101e-05,
+      "loss": 0.5508,
+      "step": 2600
+    },
+    {
+      "epoch": 1.44,
+      "grad_norm": 4.802005767822266,
+      "learning_rate": 8.617738156483314e-05,
+      "loss": 0.552,
+      "step": 2700
+    },
+    {
+      "epoch": 1.4933333333333334,
+      "grad_norm": 5.260004997253418,
+      "learning_rate": 8.492419451179685e-05,
+      "loss": 0.5146,
+      "step": 2800
+    },
+    {
+      "epoch": 1.5466666666666666,
+      "grad_norm": 1.7511740922927856,
+      "learning_rate": 8.36266229756325e-05,
+      "loss": 0.541,
+      "step": 2900
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 4.835184097290039,
+      "learning_rate": 8.228631601490133e-05,
+      "loss": 0.5497,
+      "step": 3000
+    },
+    {
+      "epoch": 1.6533333333333333,
+      "grad_norm": 3.9777235984802246,
+      "learning_rate": 8.090497699978887e-05,
+      "loss": 0.5387,
+      "step": 3100
+    },
+    {
+      "epoch": 1.7066666666666666,
+      "grad_norm": 4.145409107208252,
+      "learning_rate": 7.948436144732472e-05,
+      "loss": 0.5404,
+      "step": 3200
+    },
+    {
+      "epoch": 1.76,
+      "grad_norm": 5.38320255279541,
+      "learning_rate": 7.802627479032992e-05,
+      "loss": 0.5341,
+      "step": 3300
+    },
+    {
+      "epoch": 1.8133333333333335,
+      "grad_norm": 4.600132942199707,
+      "learning_rate": 7.65325700829273e-05,
+      "loss": 0.5253,
+      "step": 3400
+    },
+    {
+      "epoch": 1.8666666666666667,
+      "grad_norm": 3.0599677562713623,
+      "learning_rate": 7.500514564553084e-05,
+      "loss": 0.5523,
+      "step": 3500
+    },
+    {
+      "epoch": 1.92,
+      "grad_norm": 5.470056056976318,
+      "learning_rate": 7.344594265230701e-05,
+      "loss": 0.5216,
+      "step": 3600
+    },
+    {
+      "epoch": 1.9733333333333334,
+      "grad_norm": 3.4144089221954346,
+      "learning_rate": 7.185694266417408e-05,
+      "loss": 0.4931,
+      "step": 3700
+    },
+    {
+      "epoch": 2.0,
+      "eval_accuracy": 0.851,
+      "eval_loss": 0.5207030773162842,
+      "eval_macro_f1": 0.7566410403530152,
+      "eval_runtime": 437.557,
+      "eval_samples_per_second": 22.854,
+      "eval_steps_per_second": 0.715,
+      "step": 3750
+    },
+    {
+      "epoch": 2.026666666666667,
+      "grad_norm": 3.020866870880127,
+      "learning_rate": 7.024016511047464e-05,
+      "loss": 0.4712,
+      "step": 3800
+    },
+    {
+      "epoch": 2.08,
+      "grad_norm": 6.192739486694336,
+      "learning_rate": 6.859766472252193e-05,
+      "loss": 0.4314,
+      "step": 3900
+    },
+    {
+      "epoch": 2.1333333333333333,
+      "grad_norm": 3.5494303703308105,
+      "learning_rate": 6.693152892228168e-05,
+      "loss": 0.4736,
+      "step": 4000
+    },
+    {
+      "epoch": 2.1866666666666665,
+      "grad_norm": 3.1296679973602295,
+      "learning_rate": 6.524387516950768e-05,
+      "loss": 0.4395,
+      "step": 4100
+    },
+    {
+      "epoch": 2.24,
+      "grad_norm": 3.564858913421631,
+      "learning_rate": 6.353684827070339e-05,
+      "loss": 0.4291,
+      "step": 4200
+    },
+    {
+      "epoch": 2.2933333333333334,
+      "grad_norm": 2.8575546741485596,
+      "learning_rate": 6.181261765332872e-05,
+      "loss": 0.4209,
+      "step": 4300
+    },
+    {
+      "epoch": 2.3466666666666667,
+      "grad_norm": 5.188271999359131,
+      "learning_rate": 6.007337460871666e-05,
+      "loss": 0.4664,
+      "step": 4400
+    },
+    {
+      "epoch": 2.4,
+      "grad_norm": 2.8347561359405518,
+      "learning_rate": 5.832132950720357e-05,
+      "loss": 0.4596,
+      "step": 4500
+    },
+    {
+      "epoch": 2.453333333333333,
+      "grad_norm": 3.6415882110595703,
+      "learning_rate": 5.6558708989012196e-05,
+      "loss": 0.4261,
+      "step": 4600
+    },
+    {
+      "epoch": 2.506666666666667,
+      "grad_norm": 3.578502893447876,
+      "learning_rate": 5.4787753134457775e-05,
+      "loss": 0.428,
+      "step": 4700
+    },
+    {
+      "epoch": 2.56,
+      "grad_norm": 4.507837295532227,
+      "learning_rate": 5.301071261707322e-05,
+      "loss": 0.4474,
+      "step": 4800
+    },
+    {
+      "epoch": 2.6133333333333333,
+      "grad_norm": 5.312231540679932,
+      "learning_rate": 5.1229845843271774e-05,
+      "loss": 0.4236,
+      "step": 4900
+    },
+    {
+      "epoch": 2.6666666666666665,
+      "grad_norm": 3.6730709075927734,
+      "learning_rate": 4.944741608218189e-05,
+      "loss": 0.46,
+      "step": 5000
+    },
+    {
+      "epoch": 2.7199999999999998,
+      "grad_norm": 3.846958637237549,
+      "learning_rate": 4.7665688589302337e-05,
+      "loss": 0.4552,
+      "step": 5100
+    },
+    {
+      "epoch": 2.7733333333333334,
+      "grad_norm": 3.6105239391326904,
+      "learning_rate": 4.5886927727632815e-05,
+      "loss": 0.4256,
+      "step": 5200
+    },
+    {
+      "epoch": 2.8266666666666667,
+      "grad_norm": 3.2457873821258545,
+      "learning_rate": 4.4113394089938806e-05,
+      "loss": 0.4394,
+      "step": 5300
+    },
+    {
+      "epoch": 2.88,
+      "grad_norm": 3.628878593444824,
+      "learning_rate": 4.234734162580795e-05,
+      "loss": 0.4428,
+      "step": 5400
+    },
+    {
+      "epoch": 2.9333333333333336,
+      "grad_norm": 3.633319854736328,
+      "learning_rate": 4.059101477714921e-05,
+      "loss": 0.4462,
+      "step": 5500
+    },
+    {
+      "epoch": 2.986666666666667,
+      "grad_norm": 3.172325849533081,
+      "learning_rate": 3.88466456257749e-05,
+      "loss": 0.4115,
+      "step": 5600
+    },
+    {
+      "epoch": 3.0,
+      "eval_accuracy": 0.8584,
+      "eval_loss": 0.4905270040035248,
+      "eval_macro_f1": 0.7745073192895859,
+      "eval_runtime": 437.609,
+      "eval_samples_per_second": 22.851,
+      "eval_steps_per_second": 0.715,
+      "step": 5625
+    }
+  ],
+  "logging_steps": 100,
+  "max_steps": 9375,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 5,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 1.1810704520573338e+17,
+  "train_batch_size": 16,
+  "trial_name": null,
+  "trial_params": null
+}
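
The `trainer_state.json` shown above can be summarized with a few lines of Python. This is a minimal sketch and not part of the commit. It assumes the usual Hugging Face `Trainer` layout, in which the logged entries sit under a top-level `log_history` key (that key falls outside the excerpt shown here). It inlines a handful of entries copied from the diff instead of reading the file, and the helper names are hypothetical.

```python
import json

# A few entries copied from the diff above. The "log_history" key is an
# assumption based on the usual Trainer layout; it is not visible in the excerpt.
STATE_EXCERPT = """
{
  "log_history": [
    {"epoch": 1.76, "loss": 0.5341, "step": 3300},
    {"epoch": 2.0, "eval_accuracy": 0.851, "eval_loss": 0.5207030773162842,
     "eval_macro_f1": 0.7566410403530152, "step": 3750},
    {"epoch": 2.986666666666667, "loss": 0.4115, "step": 5600},
    {"epoch": 3.0, "eval_accuracy": 0.8584, "eval_loss": 0.4905270040035248,
     "eval_macro_f1": 0.7745073192895859, "step": 5625}
  ],
  "max_steps": 9375,
  "num_train_epochs": 5,
  "train_batch_size": 16
}
"""


def eval_records(state):
    """Return only the evaluation entries (training entries carry "loss" instead)."""
    return [entry for entry in state["log_history"] if "eval_loss" in entry]


def best_eval(state, metric="eval_macro_f1"):
    """Pick the evaluation entry with the highest value of `metric`."""
    return max(eval_records(state), key=lambda entry: entry[metric])


def steps_per_epoch(state):
    """Optimizer steps per epoch implied by max_steps and num_train_epochs."""
    return state["max_steps"] // state["num_train_epochs"]


state = json.loads(STATE_EXCERPT)
best = best_eval(state)
print(f"best so far: step {best['step']}, macro F1 {best['eval_macro_f1']:.4f}")
print(f"steps per epoch: {steps_per_epoch(state)}")
```

Dividing `max_steps` by `num_train_epochs` gives 1875 steps per epoch, which agrees with the logged pairs (for example, epoch 1.76 at step 3300) and with the checkpoint directory names.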

checkpoint-5625/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:18be2ea0eb3d9400c09b51e632353826ed47ff8ae70a076cad1cc0cbac400c7d
+size 5176
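
The three `+` lines above are a Git LFS pointer, not the binary itself: each line is a key followed by a single space and a value. As a rough illustration, the sketch below parses the pointer text copied from this diff. The function name `parse_lfs_pointer` is hypothetical, and the code does no validation beyond splitting the fields.

```python
# Pointer text copied from the diff above (without the leading "+" markers).
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:18be2ea0eb3d9400c09b51e632353826ed47ff8ae70a076cad1cc0cbac400c7d
size 5176
"""


def parse_lfs_pointer(text):
    """Split a Git LFS pointer into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")  # key, then one space, then value
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real object in bytes
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["hash_algo"], pointer["size"])
```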

checkpoint-5625/vocab.json ADDED

The diff for this file is too large to render. See raw diff

checkpoint-7500/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: roberta-large
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

checkpoint-7500/adapter_config.json ADDED

	@@ -0,0 +1,36 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "roberta-large",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "key",
+    "value",
+    "output.dense",
+    "intermediate.dense",
+    "query"
+  ],
+  "task_type": "SEQ_CLS",
+  "use_dora": false,
+  "use_rslora": false
+}
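
To make the `target_modules`, `r`, and `lora_alpha` fields above more concrete, here is a small stdlib-only sketch. It relies on two assumptions that are not stated in the diff: that PEFT matches a list-valued `target_modules` by exact name or by dotted suffix, and that the module paths listed below (written out by hand for one encoder layer of a RoBERTa sequence classifier) correspond to the real model. The helper `is_targeted` is hypothetical.

```python
# Values copied from adapter_config.json above.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["key", "value", "output.dense", "intermediate.dense", "query"]

# Hand-written module paths for one encoder layer plus the classification head.
# These are assumptions about the model's naming, not taken from the diff.
CANDIDATE_MODULES = [
    "roberta.encoder.layer.0.attention.self.query",
    "roberta.encoder.layer.0.attention.self.key",
    "roberta.encoder.layer.0.attention.self.value",
    "roberta.encoder.layer.0.attention.output.dense",
    "roberta.encoder.layer.0.intermediate.dense",
    "roberta.encoder.layer.0.output.dense",
    "classifier.dense",
    "classifier.out_proj",
]


def is_targeted(module_name, targets):
    """Assumed matching rule: exact name, or the name ends with '.' + target."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Standard LoRA scaling applied to the low-rank update (use_rslora is false).
scaling = LORA_ALPHA / LORA_R

matched = [name for name in CANDIDATE_MODULES if is_targeted(name, TARGET_MODULES)]
for name in matched:
    print("LoRA applied to:", name)
print("scaling factor:", scaling)
```

Under this reading, `output.dense` covers both the attention output projection and the feed-forward output projection, while the classifier head is left to `modules_to_save`, which stores it in full instead of as a low-rank update.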

checkpoint-7500/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:5fc2e774718d55d509855a365d7d94c4bed6d440d662af251fd888b94eb57536
+size 32962328
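
The reported size of roughly 33 MB can be sanity-checked against the adapter configuration. The back-of-the-envelope sketch below rests on several assumptions that the diff does not state: the RoBERTa-large dimensions (hidden size 1024, feed-forward size 4096, 24 layers), 100 output labels for LEDGAR, a classification head made of a `dense` layer and an `out_proj` layer with biases, float32 storage, and a file that contains only the LoRA matrices plus the saved head.

```python
# Assumed model dimensions and label count (not stated in the diff).
HIDDEN = 1024
INTERMEDIATE = 4096
NUM_LAYERS = 24
NUM_LABELS = 100
BYTES_PER_PARAM = 4  # float32, assumed

# Values copied from the diff.
LORA_R = 16
REPORTED_FILE_SIZE = 32962328  # bytes, from the LFS pointer above


def lora_params(in_features, out_features, r=LORA_R):
    """A LoRA pair adds A (r x in_features) and B (out_features x r)."""
    return r * (in_features + out_features)


per_layer = (
    3 * lora_params(HIDDEN, HIDDEN)        # query, key, value
    + lora_params(HIDDEN, HIDDEN)          # attention output.dense
    + lora_params(HIDDEN, INTERMEDIATE)    # intermediate.dense
    + lora_params(INTERMEDIATE, HIDDEN)    # feed-forward output.dense
)
lora_total = NUM_LAYERS * per_layer

# Classification head saved in full via modules_to_save (weights plus biases).
head_total = (HIDDEN * HIDDEN + HIDDEN) + (HIDDEN * NUM_LABELS + NUM_LABELS)

total_params = lora_total + head_total
payload_bytes = total_params * BYTES_PER_PARAM
overhead_bytes = REPORTED_FILE_SIZE - payload_bytes

print(f"estimated parameters: {total_params:,}")
print(f"estimated payload:    {payload_bytes:,} bytes")
print(f"left for file header: {overhead_bytes:,} bytes")
```

Under these assumptions the estimate lands within about 0.2% of the reported size, with the remainder plausibly taken up by the safetensors header. This is a consistency check only and does not confirm the file's actual contents.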

checkpoint-7500/merges.txt ADDED

The diff for this file is too large to render. See raw diff

checkpoint-7500/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:7417605e76ae019351fa1e3fdd3263753f06e376b82f0a502d46a3313fee1bb4
+size 66085050

checkpoint-7500/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ce2afd1de648f54b1eca8e74924789c771766a676748743853034ceaa3304949
+size 14244

checkpoint-7500/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4b964f7caaad143747db848e702280ab628449e59e42f9ec880b68c45ccc30fe
+size 1064

checkpoint-7500/special_tokens_map.json ADDED

	@@ -0,0 +1,15 @@

+{
+  "bos_token": "<s>",
+  "cls_token": "<s>",
+  "eos_token": "</s>",
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "unk_token": "<unk>"
+}
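
The map above reuses `<s>` for both `bos_token` and `cls_token`, and `</s>` for both `eos_token` and `sep_token`. The sketch below illustrates how those tokens frame an input in the RoBERTa convention (`<s> A </s>` for a single sequence, `<s> A </s></s> B </s>` for a pair). It is a toy example: the helper names and the word-level tokens are hypothetical, whereas the real tokenizer would produce byte-level BPE pieces.

```python
# Token strings copied from special_tokens_map.json above.
SPECIAL_TOKENS = {
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "sep_token": "</s>",
    "pad_token": "<pad>",
    "unk_token": "<unk>",
}


def frame(tokens_a, tokens_b=None, special=SPECIAL_TOKENS):
    """Wrap one or two token lists in RoBERTa-style special tokens."""
    framed = [special["cls_token"]] + list(tokens_a) + [special["sep_token"]]
    if tokens_b is not None:
        # Pairs are joined by a doubled separator and closed with a final one.
        framed += [special["sep_token"]] + list(tokens_b) + [special["sep_token"]]
    return framed


def pad_to(tokens, length, special=SPECIAL_TOKENS):
    """Right-pad a framed sequence to a fixed length with the pad token."""
    return list(tokens) + [special["pad_token"]] * max(0, length - len(tokens))


single = frame(["Governing", "Law"])  # hypothetical word-level tokens
pair = frame(["Governing"], ["Law"])
print(pad_to(single, 6))
print(pair)
```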