Javad Taghia committed
Commit fefe61a · Parent(s): 40fefce
some updates on the env

Files changed:
- README.md +8 -2
- train_tulu.py +9 -0
README.md
CHANGED

@@ -27,12 +27,13 @@ Minimal setup to finetune a laptop-friendly Tulu checkpoint with QLoRA and track
 1) Create the env (Conda)
 ```bash
 conda env create -f environment.yml
-conda activate
+conda activate deeai
 ```
 2) Add secrets (keep `.env` out of git)
 ```bash
 cp .env.example .env
 # Edit .env with your WANDB_API_KEY / project / entity
+# Optionally set BASE_MODEL_CACHE to choose where HF downloads models
 ```
 3) Verify packages (optional if you prefer pip)
 ```bash
@@ -59,8 +60,13 @@ Key flags:
 - Ensure `WANDB_API_KEY`, `WANDB_PROJECT`, and (optionally) `WANDB_ENTITY` are set in `.env`.
 - Each run captures hyperparameters and metrics; check the W&B UI for live loss curves and checkpoints.
 
+## Model cache location
+- Base model weights download to the Hugging Face cache. You can point downloads to an external directory by setting `BASE_MODEL_CACHE` in `.env` (e.g., `/Volumes/JTQ-s/______GITLAB____/downloaded_base_models`); the script maps this to `HF_HOME`/`TRANSFORMERS_CACHE` before loading models.
+- If `BASE_MODEL_CACHE` is not set, the default HF cache is used (typically `~/.cache/huggingface/hub`).
+
 ## Output
-- Finetuned adapters + tokenizer are written to `outputs/tulu-lora` (configurable via `--output_dir`).
+- Finetuned adapters + tokenizer are written to `outputs/tulu-lora` (configurable via `--output_dir`).
+- `outputs/` is tracked via Git LFS (`.gitattributes`), so weights can be committed and pushed to the Hub. Run `git lfs install` once, then `git add outputs/...` before committing.
 
 ## Troubleshooting
 - OOM? Reduce `max_seq_length`, increase `gradient_accumulation_steps`, or switch to a smaller dataset.
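The cache-selection rule the README describes (`BASE_MODEL_CACHE` wins if set, otherwise the stock Hugging Face default under the home directory) can be sketched as follows; `effective_hf_cache` is a hypothetical helper for illustration only, not part of this repo:

```python
import os

def effective_hf_cache(env):
    """Resolve where base-model downloads land, per the README:
    BASE_MODEL_CACHE takes priority; otherwise fall back to the
    default Hugging Face hub cache under the user's home directory."""
    cache = env.get("BASE_MODEL_CACHE")
    if cache:
        return cache
    return os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")

# Pointing BASE_MODEL_CACHE at an external volume redirects downloads there
external = effective_hf_cache({"BASE_MODEL_CACHE": "/mnt/models"})

# With nothing set, the default ~/.cache/huggingface/hub location applies
default = effective_hf_cache({})
```

Passing the environment in as a dict keeps the sketch side-effect free; the real script reads `os.getenv` directly.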
train_tulu.py
CHANGED

@@ -128,8 +128,17 @@ def parse_args() -> ScriptConfig:
     return ScriptConfig(**vars(args))
 
 
+def configure_cache_from_env():
+    """Allow user to redirect HF cache via BASE_MODEL_CACHE env."""
+    cache_dir = os.getenv("BASE_MODEL_CACHE")
+    if cache_dir:
+        os.environ.setdefault("HF_HOME", cache_dir)
+        os.environ.setdefault("TRANSFORMERS_CACHE", cache_dir)
+
+
 def main():
     load_dotenv()
+    configure_cache_from_env()
     cfg = parse_args()
 
     init_wandb(cfg)
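Because the new helper uses `os.environ.setdefault`, an `HF_HOME` the user has already exported is left untouched. A standalone sketch of that behavior, parameterized over a dict so it can be exercised without mutating the real process environment (the `environ` argument is added here for illustration; the actual script uses `os.environ` directly):

```python
import os

def configure_cache_from_env(environ=None):
    """Mirror of the new helper in train_tulu.py, with an injectable
    mapping so the setdefault behavior can be demonstrated safely."""
    env = os.environ if environ is None else environ
    cache_dir = env.get("BASE_MODEL_CACHE")
    if cache_dir:
        # setdefault: only fill these in if the user has not set them already
        env.setdefault("HF_HOME", cache_dir)
        env.setdefault("TRANSFORMERS_CACHE", cache_dir)

# BASE_MODEL_CACHE set, HF_HOME unset -> both cache vars follow it
env = {"BASE_MODEL_CACHE": "/mnt/models"}
configure_cache_from_env(env)

# An explicitly exported HF_HOME wins over BASE_MODEL_CACHE
env2 = {"BASE_MODEL_CACHE": "/mnt/models", "HF_HOME": "/custom"}
configure_cache_from_env(env2)
```

The design choice matters on shared machines: a user who already manages their own `HF_HOME` keeps it, while everyone else is redirected to the external model directory.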