jjee2 committed on Apr 14

Commit 5072c3d (verified)

1 Parent(s): a305724

Add andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora

Files changed (22)

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/.gitattributes +35 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/README.md +202 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/adapter_config.json +30 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/adapter_model.safetensors +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/README.md +202 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/adapter_config.json +30 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/adapter_model.safetensors +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/optimizer.pt +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/rng_state.pth +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/scheduler.pt +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/trainer_state.json +625 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/training_args.bin +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/README.md +202 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/adapter_config.json +30 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/adapter_model.safetensors +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/optimizer.pt +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/rng_state.pth +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/scheduler.pt +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/trainer_state.json +370 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/training_args.bin +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/runs/Dec21_10-36-12_cyrus-the-great/events.out.tfevents.1734777373.cyrus-the-great.830823.0 +3 -0
andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/training_configs.yml +57 -0

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
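
The patterns above decide which files in this commit are stored as Git LFS pointers (the `+3 -0` entries in the file list) rather than as plain text. The following is a minimal sketch of that routing, assuming Python's `fnmatch` is a close enough stand-in for gitattributes matching on simple base-name globs; real gitattributes matching has extra rules (for example `saved_model/**/*`) that this does not cover. The names `LFS_PATTERNS` and `tracked_by_lfs` are illustrative and only a subset of the patterns is listed.

```python
from fnmatch import fnmatch

# Subset of the patterns from the .gitattributes diff above.
LFS_PATTERNS = ["*.bin", "*.pt", "*.pth", "*.safetensors", "*tfevents*"]


def tracked_by_lfs(path: str) -> bool:
    """Approximate check: match the file's base name against each pattern."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


# File names taken from the "Files changed" list of this commit.
for path in [
    "adapter_model.safetensors",
    "checkpoint-115/optimizer.pt",
    "checkpoint-115/rng_state.pth",
    "checkpoint-115/training_args.bin",
    "adapter_config.json",
    "training_configs.yml",
]:
    print(f"{path}: {'LFS pointer' if tracked_by_lfs(path) else 'plain text'}")
```

Under this approximation the JSON and YAML files stay as regular text diffs, which matches the larger line counts shown for them in the file list.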

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: meta-llama/Llama-3.1-8B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
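
To make the config above concrete, here is a rough sketch of what `r`, `lora_alpha`, and `target_modules` imply for the adapter's scaling factor and parameter count. The projection shapes and layer count are assumptions about Llama-3.1-8B (hidden size 4096, 1024-wide key/value projections, 32 layers); they are not stated anywhere in this commit. The scaling shown is the standard `lora_alpha / r`, which applies here because `use_rslora` is false.

```python
# Values read from adapter_config.json above.
R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["v_proj", "q_proj", "k_proj"]

# ASSUMPTION: Llama-3.1-8B projection shapes as (in_features, out_features)
# and layer count; these do not appear in the commit itself.
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
}
ASSUMED_NUM_LAYERS = 32


def lora_param_count(r, shapes, modules, num_layers):
    """Each adapted linear layer adds A (r x in) and B (out x r)."""
    per_layer = sum(r * (shapes[m][0] + shapes[m][1]) for m in modules)
    return per_layer * num_layers


scaling = LORA_ALPHA / R
params = lora_param_count(R, ASSUMED_SHAPES, TARGET_MODULES, ASSUMED_NUM_LAYERS)

print(f"scaling = {scaling}")
print(f"LoRA parameters = {params:,}")
print(f"size at 4 bytes per parameter = {params * 4:,} bytes")
```

Under these assumptions the estimate is 37,748,736 bytes, slightly below the 37,774,528 bytes reported for `adapter_model.safetensors` in this commit. That gap would be consistent with 32-bit weights plus a safetensors header, though the commit does not state the storage dtype.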

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e6c9144091eac8c846e7657c1d75a3b8b42e8f5b9267eb8c24bad6616ffb02c9
+size 37774528
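
The three lines above are a Git LFS pointer, not the weights themselves: a spec version, the SHA-256 object ID of the real file, and its size in bytes. A minimal sketch of reading such a pointer follows; the helper name `parse_lfs_pointer` and the returned dictionary layout are illustrative choices, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer file into its version, hash, and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer content copied from the adapter_model.safetensors diff above.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:e6c9144091eac8c846e7657c1d75a3b8b42e8f5b9267eb8c24bad6616ffb02c9
size 37774528
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

After downloading the real file, its SHA-256 digest and byte length can be compared against `info["digest"]` and `info["size"]` to confirm the download is intact.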

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: meta-llama/Llama-3.1-8B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b974d0b8f83858d24a48d340d42fc98e34c41acb75006df93ec3e34c12776e5c
+size 37774528

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8885734f0a99ce3139abfe912b6eca07ec7829a85ae269cc5898c2cb584c270e
+size 75659194

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e46fbebd5283a63bf601bb3552b1a4433ba62ce59c77f648dcde8943b7c8012a
+size 14244

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0ace79a87daba1524d7711a9cb3d2781690219d692d452f2ceb67233aa8946a4
+size 1064
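
The `trainer_state.json` diff that follows interleaves training-loss entries with evaluation entries in `log_history`, and records `best_metric` and `best_model_checkpoint` at the top. A minimal sketch of recovering the best evaluation step from such a file is shown here, assuming (as the logged values suggest) that the tracked metric is `eval_loss` and lower is better. The helper name `best_eval_entry` is illustrative, and the sample holds only four entries copied from the log.

```python
import json


def best_eval_entry(trainer_state: dict) -> dict:
    """Return the log_history entry with the lowest eval_loss."""
    evals = [e for e in trainer_state["log_history"] if "eval_loss" in e]
    return min(evals, key=lambda e: e["eval_loss"])


# A small excerpt of the trainer_state.json in this commit (not the full log).
sample = json.loads("""
{
  "best_metric": 0.07156345248222351,
  "log_history": [
    {"epoch": 6.095238095238095, "eval_loss": 0.07975321263074875, "step": 60},
    {"epoch": 6.501587301587302, "loss": 0.0368, "step": 64},
    {"epoch": 6.603174603174603, "eval_loss": 0.07156345248222351, "step": 65},
    {"epoch": 7.111111111111111, "eval_loss": 0.07338189333677292, "step": 70}
  ]
}
""")

best = best_eval_entry(sample)
print(best["step"], best["eval_loss"])
```

On this excerpt the lowest `eval_loss` falls at step 65, which agrees with the `best_metric` value and the `checkpoint-65` path recorded in the file.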

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/trainer_state.json ADDED

	@@ -0,0 +1,625 @@

+{
+  "best_metric": 0.07156345248222351,
+  "best_model_checkpoint": "./saved_models/LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65",
+  "epoch": 11.682539682539682,
+  "eval_steps": 5,
+  "global_step": 115,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.20317460317460317,
+      "grad_norm": 7.208749294281006,
+      "learning_rate": 5.9999999999999995e-05,
+      "loss": 0.3688,
+      "step": 2
+    },
+    {
+      "epoch": 0.40634920634920635,
+      "grad_norm": 4.599120140075684,
+      "learning_rate": 0.00011999999999999999,
+      "loss": 0.2897,
+      "step": 4
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "eval_loss": 0.1339753419160843,
+      "eval_runtime": 10.3523,
+      "eval_samples_per_second": 8.694,
+      "eval_steps_per_second": 8.694,
+      "step": 5
+    },
+    {
+      "epoch": 0.6095238095238096,
+      "grad_norm": 1.3026432991027832,
+      "learning_rate": 0.00017999999999999998,
+      "loss": 0.1523,
+      "step": 6
+    },
+    {
+      "epoch": 0.8126984126984127,
+      "grad_norm": 1.4605127573013306,
+      "learning_rate": 0.00023999999999999998,
+      "loss": 0.0937,
+      "step": 8
+    },
+    {
+      "epoch": 1.0158730158730158,
+      "grad_norm": 1.1947855949401855,
+      "learning_rate": 0.0003,
+      "loss": 0.1145,
+      "step": 10
+    },
+    {
+      "epoch": 1.0158730158730158,
+      "eval_loss": 0.1226421669125557,
+      "eval_runtime": 10.4454,
+      "eval_samples_per_second": 8.616,
+      "eval_steps_per_second": 8.616,
+      "step": 10
+    },
+    {
+      "epoch": 1.2190476190476192,
+      "grad_norm": 1.8900386095046997,
+      "learning_rate": 0.00029981054349090264,
+      "loss": 0.1028,
+      "step": 12
+    },
+    {
+      "epoch": 1.4222222222222223,
+      "grad_norm": 0.3694791793823242,
+      "learning_rate": 0.000299242652547195,
+      "loss": 0.0924,
+      "step": 14
+    },
+    {
+      "epoch": 1.5238095238095237,
+      "eval_loss": 0.09840905666351318,
+      "eval_runtime": 10.4922,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 15
+    },
+    {
+      "epoch": 1.6253968253968254,
+      "grad_norm": 0.9269917011260986,
+      "learning_rate": 0.0002982977617106871,
+      "loss": 0.1039,
+      "step": 16
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 0.37482526898384094,
+      "learning_rate": 0.000296978257857637,
+      "loss": 0.0763,
+      "step": 18
+    },
+    {
+      "epoch": 2.0317460317460316,
+      "grad_norm": 0.5262147784233093,
+      "learning_rate": 0.00029528747416929463,
+      "loss": 0.1047,
+      "step": 20
+    },
+    {
+      "epoch": 2.0317460317460316,
+      "eval_loss": 0.08831651508808136,
+      "eval_runtime": 10.4923,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 20
+    },
+    {
+      "epoch": 2.234920634920635,
+      "grad_norm": 0.5189316868782043,
+      "learning_rate": 0.0002932296817119964,
+      "loss": 0.0895,
+      "step": 22
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 0.5379623174667358,
+      "learning_rate": 0.0002908100786480811,
+      "loss": 0.1086,
+      "step": 24
+    },
+    {
+      "epoch": 2.5396825396825395,
+      "eval_loss": 0.10184311121702194,
+      "eval_runtime": 10.5079,
+      "eval_samples_per_second": 8.565,
+      "eval_steps_per_second": 8.565,
+      "step": 25
+    },
+    {
+      "epoch": 2.641269841269841,
+      "grad_norm": 0.4908989369869232,
+      "learning_rate": 0.00028803477710488055,
+      "loss": 0.0791,
+      "step": 26
+    },
+    {
+      "epoch": 2.8444444444444446,
+      "grad_norm": 1.1072629690170288,
+      "learning_rate": 0.00028491078773495564,
+      "loss": 0.0737,
+      "step": 28
+    },
+    {
+      "epoch": 3.0476190476190474,
+      "grad_norm": 0.27767929434776306,
+      "learning_rate": 0.0002814460020065795,
+      "loss": 0.0701,
+      "step": 30
+    },
+    {
+      "epoch": 3.0476190476190474,
+      "eval_loss": 0.0876610055565834,
+      "eval_runtime": 10.4961,
+      "eval_samples_per_second": 8.575,
+      "eval_steps_per_second": 8.575,
+      "step": 30
+    },
+    {
+      "epoch": 3.250793650793651,
+      "grad_norm": 0.7897810935974121,
+      "learning_rate": 0.00027764917226920377,
+      "loss": 0.0871,
+      "step": 32
+    },
+    {
+      "epoch": 3.453968253968254,
+      "grad_norm": 0.40734362602233887,
+      "learning_rate": 0.0002735298896442641,
+      "loss": 0.0872,
+      "step": 34
+    },
+    {
+      "epoch": 3.5555555555555554,
+      "eval_loss": 0.08736297488212585,
+      "eval_runtime": 10.487,
+      "eval_samples_per_second": 8.582,
+      "eval_steps_per_second": 8.582,
+      "step": 35
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 0.4120597839355469,
+      "learning_rate": 0.0002690985597971753,
+      "loss": 0.0723,
+      "step": 36
+    },
+    {
+      "epoch": 3.8603174603174604,
+      "grad_norm": 0.40842726826667786,
+      "learning_rate": 0.0002643663766517172,
+      "loss": 0.0797,
+      "step": 38
+    },
+    {
+      "epoch": 4.063492063492063,
+      "grad_norm": 1.359442949295044,
+      "learning_rate": 0.0002593452941132117,
+      "loss": 0.0954,
+      "step": 40
+    },
+    {
+      "epoch": 4.063492063492063,
+      "eval_loss": 0.08213387429714203,
+      "eval_runtime": 10.4786,
+      "eval_samples_per_second": 8.589,
+      "eval_steps_per_second": 8.589,
+      "step": 40
+    },
+    {
+      "epoch": 4.266666666666667,
+      "grad_norm": 0.5902419686317444,
+      "learning_rate": 0.0002540479958719207,
+      "loss": 0.0782,
+      "step": 42
+    },
+    {
+      "epoch": 4.46984126984127,
+      "grad_norm": 0.6726852059364319,
+      "learning_rate": 0.00024848786336294346,
+      "loss": 0.0542,
+      "step": 44
+    },
+    {
+      "epoch": 4.571428571428571,
+      "eval_loss": 0.08238276094198227,
+      "eval_runtime": 10.4811,
+      "eval_samples_per_second": 8.587,
+      "eval_steps_per_second": 8.587,
+      "step": 45
+    },
+    {
+      "epoch": 4.673015873015873,
+      "grad_norm": 0.4726838171482086,
+      "learning_rate": 0.00024267894196355015,
+      "loss": 0.0862,
+      "step": 46
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 0.45040690898895264,
+      "learning_rate": 0.0002366359055133401,
+      "loss": 0.0938,
+      "step": 48
+    },
+    {
+      "epoch": 5.079365079365079,
+      "grad_norm": 0.5370205044746399,
+      "learning_rate": 0.00023037401924684946,
+      "loss": 0.0735,
+      "step": 50
+    },
+    {
+      "epoch": 5.079365079365079,
+      "eval_loss": 0.09416916221380234,
+      "eval_runtime": 10.4765,
+      "eval_samples_per_second": 8.591,
+      "eval_steps_per_second": 8.591,
+      "step": 50
+    },
+    {
+      "epoch": 5.282539682539682,
+      "grad_norm": 0.3069893717765808,
+      "learning_rate": 0.00022390910123224373,
+      "loss": 0.0629,
+      "step": 52
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.5242905616760254,
+      "learning_rate": 0.00021725748241350486,
+      "loss": 0.069,
+      "step": 54
+    },
+    {
+      "epoch": 5.587301587301587,
+      "eval_loss": 0.08052200078964233,
+      "eval_runtime": 10.492,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 55
+    },
+    {
+      "epoch": 5.688888888888889,
+      "grad_norm": 0.9684988260269165,
+      "learning_rate": 0.0002104359653570494,
+      "loss": 0.0802,
+      "step": 56
+    },
+    {
+      "epoch": 5.8920634920634924,
+      "grad_norm": 0.8387410044670105,
+      "learning_rate": 0.00020346178180698758,
+      "loss": 0.07,
+      "step": 58
+    },
+    {
+      "epoch": 6.095238095238095,
+      "grad_norm": 0.6904064416885376,
+      "learning_rate": 0.0001963525491562421,
+      "loss": 0.0594,
+      "step": 60
+    },
+    {
+      "epoch": 6.095238095238095,
+      "eval_loss": 0.07975321263074875,
+      "eval_runtime": 10.4752,
+      "eval_samples_per_second": 8.592,
+      "eval_steps_per_second": 8.592,
+      "step": 60
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 0.5374441742897034,
+      "learning_rate": 0.00018912622594348454,
+      "loss": 0.0753,
+      "step": 62
+    },
+    {
+      "epoch": 6.501587301587302,
+      "grad_norm": 0.6191171407699585,
+      "learning_rate": 0.0001818010664883082,
+      "loss": 0.0368,
+      "step": 64
+    },
+    {
+      "epoch": 6.603174603174603,
+      "eval_loss": 0.07156345248222351,
+      "eval_runtime": 10.4706,
+      "eval_samples_per_second": 8.596,
+      "eval_steps_per_second": 8.596,
+      "step": 65
+    },
+    {
+      "epoch": 6.704761904761905,
+      "grad_norm": 0.30844610929489136,
+      "learning_rate": 0.00017439557477923255,
+      "loss": 0.0457,
+      "step": 66
+    },
+    {
+      "epoch": 6.907936507936508,
+      "grad_norm": 0.43306764960289,
+      "learning_rate": 0.00016692845773102222,
+      "loss": 0.0669,
+      "step": 68
+    },
+    {
+      "epoch": 7.111111111111111,
+      "grad_norm": 0.5746404528617859,
+      "learning_rate": 0.000159418577929397,
+      "loss": 0.054,
+      "step": 70
+    },
+    {
+      "epoch": 7.111111111111111,
+      "eval_loss": 0.07338189333677292,
+      "eval_runtime": 10.4809,
+      "eval_samples_per_second": 8.587,
+      "eval_steps_per_second": 8.587,
+      "step": 70
+    },
+    {
+      "epoch": 7.314285714285714,
+      "grad_norm": 0.323098361492157,
+      "learning_rate": 0.00015188490598250288,
+      "loss": 0.0311,
+      "step": 72
+    },
+    {
+      "epoch": 7.517460317460317,
+      "grad_norm": 0.44577065110206604,
+      "learning_rate": 0.0001443464725995098,
+      "loss": 0.0531,
+      "step": 74
+    },
+    {
+      "epoch": 7.619047619047619,
+      "eval_loss": 0.08279137313365936,
+      "eval_runtime": 10.4694,
+      "eval_samples_per_second": 8.597,
+      "eval_steps_per_second": 8.597,
+      "step": 75
+    },
+    {
+      "epoch": 7.720634920634921,
+      "grad_norm": 0.8153820037841797,
+      "learning_rate": 0.00013682232051738852,
+      "loss": 0.055,
+      "step": 76
+    },
+    {
+      "epoch": 7.923809523809524,
+      "grad_norm": 0.9654045701026917,
+      "learning_rate": 0.00012933145639730428,
+      "loss": 0.0459,
+      "step": 78
+    },
+    {
+      "epoch": 8.126984126984127,
+      "grad_norm": 0.902458131313324,
+      "learning_rate": 0.00012189280281214126,
+      "loss": 0.0432,
+      "step": 80
+    },
+    {
+      "epoch": 8.126984126984127,
+      "eval_loss": 0.1108090803027153,
+      "eval_runtime": 10.4735,
+      "eval_samples_per_second": 8.593,
+      "eval_steps_per_second": 8.593,
+      "step": 80
+    },
+    {
+      "epoch": 8.33015873015873,
+      "grad_norm": 1.053289771080017,
+      "learning_rate": 0.00011452515044644132,
+      "loss": 0.0349,
+      "step": 82
+    },
+    {
+      "epoch": 8.533333333333333,
+      "grad_norm": 0.5674147605895996,
+      "learning_rate": 0.00010724711062950358,
+      "loss": 0.0382,
+      "step": 84
+    },
+    {
+      "epoch": 8.634920634920634,
+      "eval_loss": 0.0874641165137291,
+      "eval_runtime": 10.4738,
+      "eval_samples_per_second": 8.593,
+      "eval_steps_per_second": 8.593,
+      "step": 85
+    },
+    {
+      "epoch": 8.736507936507937,
+      "grad_norm": 0.8463736772537231,
+      "learning_rate": 0.00010007706832155201,
+      "loss": 0.023,
+      "step": 86
+    },
+    {
+      "epoch": 8.93968253968254,
+      "grad_norm": 0.8869884014129639,
+      "learning_rate": 9.303313567172985e-05,
+      "loss": 0.0575,
+      "step": 88
+    },
+    {
+      "epoch": 9.142857142857142,
+      "grad_norm": 1.5545995235443115,
+      "learning_rate": 8.613310626523909e-05,
+      "loss": 0.0549,
+      "step": 90
+    },
+    {
+      "epoch": 9.142857142857142,
+      "eval_loss": 0.1318901926279068,
+      "eval_runtime": 10.4742,
+      "eval_samples_per_second": 8.593,
+      "eval_steps_per_second": 8.593,
+      "step": 90
+    },
+    {
+      "epoch": 9.346031746031747,
+      "grad_norm": 0.5731764435768127,
+      "learning_rate": 7.939441017520011e-05,
+      "loss": 0.0233,
+      "step": 92
+    },
+    {
+      "epoch": 9.549206349206349,
+      "grad_norm": 1.0911638736724854,
+      "learning_rate": 7.283406993277401e-05,
+      "loss": 0.0316,
+      "step": 94
+    },
+    {
+      "epoch": 9.65079365079365,
+      "eval_loss": 0.11357051134109497,
+      "eval_runtime": 10.4661,
+      "eval_samples_per_second": 8.599,
+      "eval_steps_per_second": 8.599,
+      "step": 95
+    },
+    {
+      "epoch": 9.752380952380953,
+      "grad_norm": 1.1953704357147217,
+      "learning_rate": 6.646865752677186e-05,
+      "loss": 0.0268,
+      "step": 96
+    },
+    {
+      "epoch": 9.955555555555556,
+      "grad_norm": 0.7195191979408264,
+      "learning_rate": 6.031425254137222e-05,
+      "loss": 0.0232,
+      "step": 98
+    },
+    {
+      "epoch": 10.158730158730158,
+      "grad_norm": 0.2522680461406708,
+      "learning_rate": 5.4386401537696536e-05,
+      "loss": 0.0231,
+      "step": 100
+    },
+    {
+      "epoch": 10.158730158730158,
+      "eval_loss": 0.08241196721792221,
+      "eval_runtime": 10.4831,
+      "eval_samples_per_second": 8.585,
+      "eval_steps_per_second": 8.585,
+      "step": 100
+    },
+    {
+      "epoch": 10.361904761904762,
+      "grad_norm": 0.596468448638916,
+      "learning_rate": 4.8700078781846326e-05,
+      "loss": 0.0299,
+      "step": 102
+    },
+    {
+      "epoch": 10.565079365079365,
+      "grad_norm": 0.8602021336555481,
+      "learning_rate": 4.3269648418607194e-05,
+      "loss": 0.0214,
+      "step": 104
+    },
+    {
+      "epoch": 10.666666666666666,
+      "eval_loss": 0.11145415902137756,
+      "eval_runtime": 10.4699,
+      "eval_samples_per_second": 8.596,
+      "eval_steps_per_second": 8.596,
+      "step": 105
+    },
+    {
+      "epoch": 10.768253968253969,
+      "grad_norm": 0.7947582006454468,
+      "learning_rate": 3.810882818637268e-05,
+      "loss": 0.0365,
+      "step": 106
+    },
+    {
+      "epoch": 10.971428571428572,
+      "grad_norm": 1.117112636566162,
+      "learning_rate": 3.32306547649465e-05,
+      "loss": 0.0131,
+      "step": 108
+    },
+    {
+      "epoch": 11.174603174603174,
+      "grad_norm": 0.6238898634910583,
+      "learning_rate": 2.8647450843757897e-05,
+      "loss": 0.0199,
+      "step": 110
+    },
+    {
+      "epoch": 11.174603174603174,
+      "eval_loss": 0.13976602256298065,
+      "eval_runtime": 10.51,
+      "eval_samples_per_second": 8.563,
+      "eval_steps_per_second": 8.563,
+      "step": 110
+    },
+    {
+      "epoch": 11.377777777777778,
+      "grad_norm": 0.3857702612876892,
+      "learning_rate": 2.437079399367875e-05,
+      "loss": 0.0144,
+      "step": 112
+    },
+    {
+      "epoch": 11.58095238095238,
+      "grad_norm": 1.5110450983047485,
+      "learning_rate": 2.041148742107471e-05,
+      "loss": 0.0372,
+      "step": 114
+    },
+    {
+      "epoch": 11.682539682539682,
+      "eval_loss": 0.11623632907867432,
+      "eval_runtime": 10.4951,
+      "eval_samples_per_second": 8.575,
+      "eval_steps_per_second": 8.575,
+      "step": 115
+    }
+  ],
+  "logging_steps": 2,
+  "max_steps": 135,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 15,
+  "save_steps": 5,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 10,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 8.421050769034445e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
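The `should_training_stop: true` flag in this checkpoint-115 state can be read against the early-stopping settings (`early_stopping_patience: 10`, `eval_steps: 5`). The sketch below is a simplified patience counter, not the library's callback itself; it replays the eval losses logged in this diff, starting from the step-65 value that the checkpoint-65 `trainer_state.json` reports as `best_metric`. The function name and the dictionary are illustrative.

```python
# Eval losses copied from the log_history entries in this diff (step -> eval_loss).
# Step 65 is the value reported as best_metric in the checkpoint-65 state.
EVAL_LOSSES = {
    65: 0.07156345248222351,
    70: 0.07338189333677292,
    75: 0.08279137313365936,
    80: 0.1108090803027153,
    85: 0.0874641165137291,
    90: 0.1318901926279068,
    95: 0.11357051134109497,
    100: 0.08241196721792221,
    105: 0.11145415902137756,
    110: 0.13976602256298065,
    115: 0.11623632907867432,
}


def simulate_early_stopping(eval_losses, patience, threshold=0.0):
    """Return the step at which the patience counter reaches `patience`, else None.

    Simplified assumption: lower loss is better, and an evaluation only counts
    as an improvement if it beats the best loss by more than `threshold`.
    """
    best = None
    counter = 0
    for step in sorted(eval_losses):
        loss = eval_losses[step]
        if best is None or (loss < best and abs(loss - best) > threshold):
            best = loss
            counter = 0
        else:
            counter += 1
        if counter >= patience:
            return step
    return None


stop_step = simulate_early_stopping(EVAL_LOSSES, patience=10)
print(stop_step)
```

Under these assumptions the ten evaluations after step 65 never improve on 0.0716, so the counter reaches the patience limit at step 115, which is where this checkpoint was written.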

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-115/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44336846e461e90cba2ddce4a840d239ded25ff1e71223de9f019653cc19296c
+size 5304

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: meta-llama/Llama-3.1-8B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.12.0

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "v_proj",
+    "q_proj",
+    "k_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
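As a rough sanity check on this adapter config, the number of trainable LoRA parameters can be estimated from `r` and `target_modules`. The sketch below assumes the standard Llama 3.1 8B shapes (hidden size 4096, 32 layers, 8 key/value heads of dimension 128), which are not stated anywhere in this diff, and assumes the adapter is stored in float32. The helper name is illustrative.

```python
# Values taken from adapter_config.json above.
LORA_R = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj"]

# ASSUMPTION: Llama 3.1 8B layer shapes (not part of this diff).
HIDDEN_SIZE = 4096
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
KV_DIM = NUM_KV_HEADS * HEAD_DIM  # grouped-query attention: k/v project to 1024

OUT_FEATURES = {"q_proj": HIDDEN_SIZE, "k_proj": KV_DIM, "v_proj": KV_DIM}


def lora_param_count(r, modules, in_features, out_features, num_layers):
    """Each adapted linear layer adds A (r x in) and B (out x r) matrices."""
    per_layer = sum(r * (in_features + out_features[name]) for name in modules)
    return per_layer * num_layers


n_params = lora_param_count(LORA_R, TARGET_MODULES, HIDDEN_SIZE, OUT_FEATURES, NUM_LAYERS)
scaling = LORA_ALPHA / LORA_R   # factor applied to the low-rank update
fp32_bytes = n_params * 4       # ASSUMPTION: weights saved as float32

print(n_params, scaling, fp32_bytes)
```

Under these assumptions the estimate is about 9.4M parameters, or roughly 37.7 MB in float32, which is close to the 37774528-byte `adapter_model.safetensors` pointer in this diff; the small remainder would plausibly be file header metadata.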

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e6c9144091eac8c846e7657c1d75a3b8b42e8f5b9267eb8c24bad6616ffb02c9
+size 37774528

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:099b4d495473b7bac31f1c89253d2eb42b1dd40cabc15e658eb099bc09983e48
+size 75659194

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bf97a7ce5aed942ba1a1f49657839f3f8c1d0c0a0a50b5f2414ef1de799e744f
+size 14244

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a81145950bcb4f63a1bb64e73b8215484699f329cdfc0e19b858c9767ce35513
+size 1064

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/trainer_state.json ADDED

	@@ -0,0 +1,370 @@

+{
+  "best_metric": 0.07156345248222351,
+  "best_model_checkpoint": "./saved_models/LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65",
+  "epoch": 6.603174603174603,
+  "eval_steps": 5,
+  "global_step": 65,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.20317460317460317,
+      "grad_norm": 7.208749294281006,
+      "learning_rate": 5.9999999999999995e-05,
+      "loss": 0.3688,
+      "step": 2
+    },
+    {
+      "epoch": 0.40634920634920635,
+      "grad_norm": 4.599120140075684,
+      "learning_rate": 0.00011999999999999999,
+      "loss": 0.2897,
+      "step": 4
+    },
+    {
+      "epoch": 0.5079365079365079,
+      "eval_loss": 0.1339753419160843,
+      "eval_runtime": 10.3523,
+      "eval_samples_per_second": 8.694,
+      "eval_steps_per_second": 8.694,
+      "step": 5
+    },
+    {
+      "epoch": 0.6095238095238096,
+      "grad_norm": 1.3026432991027832,
+      "learning_rate": 0.00017999999999999998,
+      "loss": 0.1523,
+      "step": 6
+    },
+    {
+      "epoch": 0.8126984126984127,
+      "grad_norm": 1.4605127573013306,
+      "learning_rate": 0.00023999999999999998,
+      "loss": 0.0937,
+      "step": 8
+    },
+    {
+      "epoch": 1.0158730158730158,
+      "grad_norm": 1.1947855949401855,
+      "learning_rate": 0.0003,
+      "loss": 0.1145,
+      "step": 10
+    },
+    {
+      "epoch": 1.0158730158730158,
+      "eval_loss": 0.1226421669125557,
+      "eval_runtime": 10.4454,
+      "eval_samples_per_second": 8.616,
+      "eval_steps_per_second": 8.616,
+      "step": 10
+    },
+    {
+      "epoch": 1.2190476190476192,
+      "grad_norm": 1.8900386095046997,
+      "learning_rate": 0.00029981054349090264,
+      "loss": 0.1028,
+      "step": 12
+    },
+    {
+      "epoch": 1.4222222222222223,
+      "grad_norm": 0.3694791793823242,
+      "learning_rate": 0.000299242652547195,
+      "loss": 0.0924,
+      "step": 14
+    },
+    {
+      "epoch": 1.5238095238095237,
+      "eval_loss": 0.09840905666351318,
+      "eval_runtime": 10.4922,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 15
+    },
+    {
+      "epoch": 1.6253968253968254,
+      "grad_norm": 0.9269917011260986,
+      "learning_rate": 0.0002982977617106871,
+      "loss": 0.1039,
+      "step": 16
+    },
+    {
+      "epoch": 1.8285714285714287,
+      "grad_norm": 0.37482526898384094,
+      "learning_rate": 0.000296978257857637,
+      "loss": 0.0763,
+      "step": 18
+    },
+    {
+      "epoch": 2.0317460317460316,
+      "grad_norm": 0.5262147784233093,
+      "learning_rate": 0.00029528747416929463,
+      "loss": 0.1047,
+      "step": 20
+    },
+    {
+      "epoch": 2.0317460317460316,
+      "eval_loss": 0.08831651508808136,
+      "eval_runtime": 10.4923,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 20
+    },
+    {
+      "epoch": 2.234920634920635,
+      "grad_norm": 0.5189316868782043,
+      "learning_rate": 0.0002932296817119964,
+      "loss": 0.0895,
+      "step": 22
+    },
+    {
+      "epoch": 2.4380952380952383,
+      "grad_norm": 0.5379623174667358,
+      "learning_rate": 0.0002908100786480811,
+      "loss": 0.1086,
+      "step": 24
+    },
+    {
+      "epoch": 2.5396825396825395,
+      "eval_loss": 0.10184311121702194,
+      "eval_runtime": 10.5079,
+      "eval_samples_per_second": 8.565,
+      "eval_steps_per_second": 8.565,
+      "step": 25
+    },
+    {
+      "epoch": 2.641269841269841,
+      "grad_norm": 0.4908989369869232,
+      "learning_rate": 0.00028803477710488055,
+      "loss": 0.0791,
+      "step": 26
+    },
+    {
+      "epoch": 2.8444444444444446,
+      "grad_norm": 1.1072629690170288,
+      "learning_rate": 0.00028491078773495564,
+      "loss": 0.0737,
+      "step": 28
+    },
+    {
+      "epoch": 3.0476190476190474,
+      "grad_norm": 0.27767929434776306,
+      "learning_rate": 0.0002814460020065795,
+      "loss": 0.0701,
+      "step": 30
+    },
+    {
+      "epoch": 3.0476190476190474,
+      "eval_loss": 0.0876610055565834,
+      "eval_runtime": 10.4961,
+      "eval_samples_per_second": 8.575,
+      "eval_steps_per_second": 8.575,
+      "step": 30
+    },
+    {
+      "epoch": 3.250793650793651,
+      "grad_norm": 0.7897810935974121,
+      "learning_rate": 0.00027764917226920377,
+      "loss": 0.0871,
+      "step": 32
+    },
+    {
+      "epoch": 3.453968253968254,
+      "grad_norm": 0.40734362602233887,
+      "learning_rate": 0.0002735298896442641,
+      "loss": 0.0872,
+      "step": 34
+    },
+    {
+      "epoch": 3.5555555555555554,
+      "eval_loss": 0.08736297488212585,
+      "eval_runtime": 10.487,
+      "eval_samples_per_second": 8.582,
+      "eval_steps_per_second": 8.582,
+      "step": 35
+    },
+    {
+      "epoch": 3.657142857142857,
+      "grad_norm": 0.4120597839355469,
+      "learning_rate": 0.0002690985597971753,
+      "loss": 0.0723,
+      "step": 36
+    },
+    {
+      "epoch": 3.8603174603174604,
+      "grad_norm": 0.40842726826667786,
+      "learning_rate": 0.0002643663766517172,
+      "loss": 0.0797,
+      "step": 38
+    },
+    {
+      "epoch": 4.063492063492063,
+      "grad_norm": 1.359442949295044,
+      "learning_rate": 0.0002593452941132117,
+      "loss": 0.0954,
+      "step": 40
+    },
+    {
+      "epoch": 4.063492063492063,
+      "eval_loss": 0.08213387429714203,
+      "eval_runtime": 10.4786,
+      "eval_samples_per_second": 8.589,
+      "eval_steps_per_second": 8.589,
+      "step": 40
+    },
+    {
+      "epoch": 4.266666666666667,
+      "grad_norm": 0.5902419686317444,
+      "learning_rate": 0.0002540479958719207,
+      "loss": 0.0782,
+      "step": 42
+    },
+    {
+      "epoch": 4.46984126984127,
+      "grad_norm": 0.6726852059364319,
+      "learning_rate": 0.00024848786336294346,
+      "loss": 0.0542,
+      "step": 44
+    },
+    {
+      "epoch": 4.571428571428571,
+      "eval_loss": 0.08238276094198227,
+      "eval_runtime": 10.4811,
+      "eval_samples_per_second": 8.587,
+      "eval_steps_per_second": 8.587,
+      "step": 45
+    },
+    {
+      "epoch": 4.673015873015873,
+      "grad_norm": 0.4726838171482086,
+      "learning_rate": 0.00024267894196355015,
+      "loss": 0.0862,
+      "step": 46
+    },
+    {
+      "epoch": 4.876190476190477,
+      "grad_norm": 0.45040690898895264,
+      "learning_rate": 0.0002366359055133401,
+      "loss": 0.0938,
+      "step": 48
+    },
+    {
+      "epoch": 5.079365079365079,
+      "grad_norm": 0.5370205044746399,
+      "learning_rate": 0.00023037401924684946,
+      "loss": 0.0735,
+      "step": 50
+    },
+    {
+      "epoch": 5.079365079365079,
+      "eval_loss": 0.09416916221380234,
+      "eval_runtime": 10.4765,
+      "eval_samples_per_second": 8.591,
+      "eval_steps_per_second": 8.591,
+      "step": 50
+    },
+    {
+      "epoch": 5.282539682539682,
+      "grad_norm": 0.3069893717765808,
+      "learning_rate": 0.00022390910123224373,
+      "loss": 0.0629,
+      "step": 52
+    },
+    {
+      "epoch": 5.485714285714286,
+      "grad_norm": 0.5242905616760254,
+      "learning_rate": 0.00021725748241350486,
+      "loss": 0.069,
+      "step": 54
+    },
+    {
+      "epoch": 5.587301587301587,
+      "eval_loss": 0.08052200078964233,
+      "eval_runtime": 10.492,
+      "eval_samples_per_second": 8.578,
+      "eval_steps_per_second": 8.578,
+      "step": 55
+    },
+    {
+      "epoch": 5.688888888888889,
+      "grad_norm": 0.9684988260269165,
+      "learning_rate": 0.0002104359653570494,
+      "loss": 0.0802,
+      "step": 56
+    },
+    {
+      "epoch": 5.8920634920634924,
+      "grad_norm": 0.8387410044670105,
+      "learning_rate": 0.00020346178180698758,
+      "loss": 0.07,
+      "step": 58
+    },
+    {
+      "epoch": 6.095238095238095,
+      "grad_norm": 0.6904064416885376,
+      "learning_rate": 0.0001963525491562421,
+      "loss": 0.0594,
+      "step": 60
+    },
+    {
+      "epoch": 6.095238095238095,
+      "eval_loss": 0.07975321263074875,
+      "eval_runtime": 10.4752,
+      "eval_samples_per_second": 8.592,
+      "eval_steps_per_second": 8.592,
+      "step": 60
+    },
+    {
+      "epoch": 6.298412698412698,
+      "grad_norm": 0.5374441742897034,
+      "learning_rate": 0.00018912622594348454,
+      "loss": 0.0753,
+      "step": 62
+    },
+    {
+      "epoch": 6.501587301587302,
+      "grad_norm": 0.6191171407699585,
+      "learning_rate": 0.0001818010664883082,
+      "loss": 0.0368,
+      "step": 64
+    },
+    {
+      "epoch": 6.603174603174603,
+      "eval_loss": 0.07156345248222351,
+      "eval_runtime": 10.4706,
+      "eval_samples_per_second": 8.596,
+      "eval_steps_per_second": 8.596,
+      "step": 65
+    }
+  ],
+  "logging_steps": 2,
+  "max_steps": 135,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 15,
+  "save_steps": 5,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 10,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 4.756807468469453e+16,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
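The `learning_rate` values in this `log_history` rise linearly to 3e-4 over the first 10 steps and then decay. The scheduler type is not stated in this diff; the sketch below assumes linear warmup followed by a half-cosine decay to zero at `max_steps`, and that assumption reproduces the logged values closely. The function name `lr_at` is illustrative.

```python
import math

PEAK_LR = 3.0e-4    # learning_rate in training_configs.yml
WARMUP_STEPS = 10   # warmup_steps in training_configs.yml
MAX_STEPS = 135     # max_steps in trainer_state.json


def lr_at(step):
    """ASSUMED schedule: linear warmup, then cosine decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare against a few (step, learning_rate) pairs copied from log_history.
LOGGED = {
    2: 5.9999999999999995e-05,
    10: 0.0003,
    12: 0.00029981054349090264,
    60: 0.0001963525491562421,
    64: 0.0001818010664883082,
}
for step, logged in LOGGED.items():
    print(step, lr_at(step), logged)
```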

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/checkpoint-65/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:44336846e461e90cba2ddce4a840d239ded25ff1e71223de9f019653cc19296c
+size 5304

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/runs/Dec21_10-36-12_cyrus-the-great/events.out.tfevents.1734777373.cyrus-the-great.830823.0 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a26b97b45d1e5f9cdee38e8583585cdbda0742f9b837622ff510c5acfe9d2457
+size 23652

andrewzamai__LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora/training_configs.yml ADDED

	@@ -0,0 +1,57 @@

+base_model: meta-llama/Llama-3.1-8B-Instruct
+#base_model: ProbeMedicalYonseiMAILab/medllama3-v20
+# NB: the HF model may require logging in with an account that has been granted access
+# dataset (instruction, input, output) columns
+prompt_template_name: diff_diagnosis_direct_pred
+system_message: You are an expert neurologist assistant specialized in neurodegenerative diseases.
+data_path: ./data/raw/CNADFTD/SFT/CNADFTD_ADNI2+NIFD_10_fold_CV_AN_gathered_equally_represented_0234/train.jsonl
+val_data_path: ./data/raw/CNADFTD/SFT/CNADFTD_ADNI2+NIFD_10_fold_CV_AN_gathered_equally_represented_0234/validation.jsonl
+select_train_portion: -1
+val_set_size: -1  # if -1 use all validation data
+output_dir: ./saved_models/LLaMA_3_8B_CNADFTD_3_classes_equally_represented_0234_lora
+early_stopping_patience: 10
+#training hyperparams
+batch_size: 32
+micro_batch_size: 1
+num_epochs: 15
+learning_rate: 3.0e-4
+cutoff_len: 4096
+warmup_steps: 10
+eval_steps: 5
+logging_steps: 2
+max_grad_norm: 1.0
+#lora hyperparams
+use_lora: True
+lora_alpha: 32
+lora_dropout: 0.1
+lora_r: 16
+lora_target_modules:
+  #- o_proj
+  #- down_proj
+  #- up_proj
+  #- gate_proj
+- q_proj
+- v_proj
+- k_proj
+# llm hyperparams
+# NTP loss only on Response
+train_on_inputs: False
+group_by_length: True
+#quant params
+load_8bit: False
+load_4bit: False
+#general param
+save_total_limit: 2
+use_flash_attention: True
+shuffle: True
+gradient_checkpointing: False
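The relationship between `batch_size`, `micro_batch_size`, `num_epochs`, and the `max_steps: 135` recorded in `trainer_state.json` can be sketched as follows. Two things here are assumptions rather than facts from this diff: that the training script derives gradient accumulation as `batch_size // micro_batch_size` on a single device, and that the training set has 315 examples, which is inferred from the logged epoch value 0.20317460317460317 at step 2 (equal to 2 * 32 / 315).

```python
import math

# From training_configs.yml.
BATCH_SIZE = 32
MICRO_BATCH_SIZE = 1
NUM_EPOCHS = 15

# ASSUMPTION: inferred from the logged epoch fraction, not stated in the diff.
NUM_TRAIN_SAMPLES = 315

# ASSUMPTION: effective batch is reached via gradient accumulation on one device.
grad_accum_steps = BATCH_SIZE // MICRO_BATCH_SIZE

# Micro-batches per epoch, then whole optimizer updates per epoch.
micro_batches_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / MICRO_BATCH_SIZE)
updates_per_epoch = max(micro_batches_per_epoch // grad_accum_steps, 1)

max_steps = NUM_EPOCHS * updates_per_epoch
epoch_at_step_65 = 65 * BATCH_SIZE / NUM_TRAIN_SAMPLES

print(grad_accum_steps, updates_per_epoch, max_steps, epoch_at_step_65)
```

With these assumptions the arithmetic gives 9 optimizer updates per epoch and 135 total steps, matching `max_steps`, and the epoch value at step 65 agrees with the 6.603174603174603 logged for the best checkpoint.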