---
base_model:
- unsloth/Meta-Llama-3.1-70B-Instruct
library_name: peft
datasets:
- ARM-Development/11k_Tabular
language:
- en
---
## Model Card for `sciencebase-metadata-llama3-70b` *(v1.0)*
### Model Details
| Field | Value |
|-------|-------|
| **Developed by** | Quan Quy, Travis Ping, Tudor Garbulet, Chirag Shah, Austin Aguilar |
| **Contact** | quyqm@ornl.gov • pingts@ornl.gov • garbuletvt@ornl.gov • shahch@ornl.gov • aguilaral@ornl.gov |
| **Funded by** | U.S. Geological Survey (USGS) & Oak Ridge National Laboratory – ARM Data Center |
| **Model type** | Autoregressive LLM, instruction-tuned for *structured → metadata* generation |
| **Base model** | `meta-llama/Llama-3.1-70B-Instruct` |
| **Languages** | English |
| **Finetuned from** | `unsloth/Meta-Llama-3.1-70B-Instruct` |
### Model Description
Fine-tuned on ≈9,000 ScienceBase “data → metadata” pairs to automate the creation of FGDC/ISO-style metadata records for scientific datasets.
### Model Sources
| Resource | Link |
|----------|------|
| **Repository** | <https://huggingface.co/ARM-Development/Llama-3.3-70B-tabular-1.0> |
| **Demo** | <https://colab.research.google.com/drive/1saCEFhkBYDhQWkdTwnwiE_-AiWmD6p0f#scrollTo=WeniLP-Ah1QL> |
---
## Uses
### Direct Use
Generate schema-compliant metadata text from a JSON/CSV representation of a ScienceBase item.
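The exact prompt template used during fine-tuning is not published in this card; as an illustration only, a minimal helper that flattens a ScienceBase-style JSON item into an instruction prompt might look like the sketch below (the instruction wording and section markers are assumptions, not the trained format):

```python
import json

def build_metadata_prompt(item: dict) -> str:
    """Flatten a ScienceBase-style item (dict) into an instruction prompt.

    The instruction wording and "### Item"/"### Metadata" markers are
    illustrative assumptions, not the exact template used in training.
    """
    body = json.dumps(item, indent=2, sort_keys=True)
    return (
        "Below is a ScienceBase item. Generate a schema-compliant "
        "FGDC/ISO-style metadata record for it.\n\n"
        f"### Item\n{body}\n\n### Metadata\n"
    )

prompt = build_metadata_prompt({"title": "Streamflow gage data", "format": "CSV"})
```

The resulting string would then be passed to the model's tokenizer and `generate` call in the usual `transformers` + `peft` fashion.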
### Downstream Use
Integrate as a micro-service in data-repository pipelines.
### Out-of-Scope
Open-ended content generation or any application outside metadata curation.
---
## Bias, Risks, and Limitations
* Domain-specific bias toward ScienceBase field names.
* Possible hallucination of fields when prompts are underspecified.
---
## Training Details
### Training Data
* ~9,000 ScienceBase records with curated metadata.
### Training Procedure
| Hyper-parameter | Value |
|-----------------|-------|
| Max sequence length | 20,000 |
| Precision | fp16 / bf16 (auto) |
| Quantisation | 4-bit QLoRA (`load_in_4bit=True`) |
| LoRA rank / α | 16 / 16 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Optimiser | `adamw_8bit` |
| LR / schedule | 2 × 10⁻⁴, linear |
| Epochs | 1 |
| Effective batch | 4 (1 GPU × grad-acc 4) |
| Trainer | `trl` SFTTrainer + `peft` 0.15.2 |
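The hyper-parameters above can be reproduced approximately with Unsloth's QLoRA helpers. The following is a configuration sketch under the values stated in the table, not the exact training script (the dataset-loading step is omitted, and `dataset` is a placeholder):

```python
# Sketch of the QLoRA setup implied by the table above — not the exact
# training script. Requires a GPU able to hold 4-bit 70B weights.
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-70B-Instruct",
    max_seq_length=20_000,
    load_in_4bit=True,               # 4-bit QLoRA quantisation
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16, lora_alpha=16,             # LoRA rank / alpha from the table
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,           # ~9k ScienceBase pairs, loaded elsewhere
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,   # effective batch 4
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        num_train_epochs=1,
        optim="adamw_8bit",
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        output_dir="outputs",
    ),
)
trainer.train()
```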
### Hardware & Runtime
| Field | Value |
|-------|-------|
| GPU | 1 × NVIDIA A100 80 GB |
| Total training time | ~120 hours |
| Cloud/HPC provider | ARM Cumulus HPC |
### Software Stack
| Package | Version |
|---------|---------|
| Python | 3.12.9 |
| PyTorch | 2.6.0 + CUDA 12.4 |
| Transformers | 4.51.3 |
| Accelerate | 1.6.0 |
| PEFT | 0.15.2 |
| Unsloth | 2025.3.19 |
| BitsAndBytes | 0.45.5 |
| TRL | 0.15.2 |
| Xformers | 0.0.29.post3 |
| Datasets | 3.5.0 |
| … | … |
---
## Evaluation
*Evaluation still in progress.*
---
## Technical Specifications
### Architecture & Objective
QLoRA-tuned `Llama-3.1-70B-Instruct`; causal-LM objective with structured-to-text instruction prompts.
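A common detail of this objective in instruction SFT (assumed here, since the card does not state it) is completion-only loss: prompt tokens are masked to the standard cross-entropy ignore index `-100`, so the model is trained only on the metadata it emits. A minimal sketch:

```python
IGNORE_INDEX = -100  # standard PyTorch cross-entropy ignore index

def mask_prompt_labels(input_ids: list[int], prompt_len: int) -> list[int]:
    """Return causal-LM labels where loss falls only on the completion:
    the first `prompt_len` (prompt) tokens are masked out.

    Completion-only masking is a common SFT choice; whether this model
    used it is an assumption, not stated in the card.
    """
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

labels = mask_prompt_labels([11, 12, 13, 14, 15], prompt_len=3)
```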
---
## Model Card Authors
Quan Quy, Travis Ping, Tudor Garbulet, Chirag Shah, Austin Aguilar
---