Javad Taghia committed on
Commit fefe61a · 1 Parent(s): 40fefce

some updates on the env

Files changed (2):
  1. README.md +8 -2
  2. train_tulu.py +9 -0
README.md CHANGED
@@ -27,12 +27,13 @@ Minimal setup to finetune a laptop-friendly Tulu checkpoint with QLoRA and track
 1) Create the env (Conda)
 ```bash
 conda env create -f environment.yml
-conda activate tulu-train
+conda activate deeai
 ```
 2) Add secrets (keep `.env` out of git)
 ```bash
 cp .env.example .env
 # Edit .env with your WANDB_API_KEY / project / entity
+# Optionally set BASE_MODEL_CACHE to choose where HF downloads models
 ```
 3) Verify packages (optional if you prefer pip)
 ```bash
@@ -59,8 +60,13 @@ Key flags:
 - Ensure `WANDB_API_KEY`, `WANDB_PROJECT`, and (optionally) `WANDB_ENTITY` are set in `.env`.
 - Each run captures hyperparameters and metrics; check the W&B UI for live loss curves and checkpoints.
 
+## Model cache location
+- Base model weights download to the Hugging Face cache. You can point downloads to an external directory by setting `BASE_MODEL_CACHE` in `.env` (e.g., `/Volumes/JTQ-s/______GITLAB____/downloaded_base_models`); the script maps this to `HF_HOME`/`TRANSFORMERS_CACHE` before loading models.
+- If `BASE_MODEL_CACHE` is not set, the default HF cache is used (typically `~/.cache/huggingface/hub`).
+
 ## Output
-- Finetuned adapters + tokenizer are written to `outputs/tulu-lora` (configurable via `--output_dir`). Push this to the Hub with `huggingface-cli upload` if desired.
+- Finetuned adapters + tokenizer are written to `outputs/tulu-lora` (configurable via `--output_dir`).
+- `outputs/` is tracked via Git LFS (`.gitattributes`), so weights can be committed and pushed to the Hub. Run `git lfs install` once, then `git add outputs/...` before committing.
 
 ## Troubleshooting
 - OOM? Reduce `max_seq_length`, increase `gradient_accumulation_steps`, or switch to a smaller dataset.
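Putting the README's secrets step and the new cache option together, a minimal `.env` consistent with this diff might look like the following sketch — every value is an illustrative placeholder, not taken from the repo:

```bash
# .env (keep out of git)
WANDB_API_KEY=your-wandb-api-key
WANDB_PROJECT=your-project
# Optional: which W&B entity/team to log runs under
WANDB_ENTITY=your-entity
# Optional: redirect Hugging Face model downloads to an external drive
BASE_MODEL_CACHE=/path/to/downloaded_base_models
```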
train_tulu.py CHANGED
@@ -128,8 +128,17 @@ def parse_args() -> ScriptConfig:
     return ScriptConfig(**vars(args))
 
 
+def configure_cache_from_env():
+    """Allow user to redirect HF cache via BASE_MODEL_CACHE env."""
+    cache_dir = os.getenv("BASE_MODEL_CACHE")
+    if cache_dir:
+        os.environ.setdefault("HF_HOME", cache_dir)
+        os.environ.setdefault("TRANSFORMERS_CACHE", cache_dir)
+
+
 def main():
     load_dotenv()
+    configure_cache_from_env()
     cfg = parse_args()
 
     init_wandb(cfg)
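The `configure_cache_from_env` helper added in this diff can be exercised on its own. The sketch below copies the same logic into a standalone snippet to show a consequence of using `os.environ.setdefault`: an `HF_HOME` the user has already exported takes precedence over `BASE_MODEL_CACHE` (the paths here are illustrative):

```python
import os


def configure_cache_from_env():
    """Same logic as the helper this commit adds to train_tulu.py:
    redirect the Hugging Face cache when BASE_MODEL_CACHE is set,
    without clobbering values the user already exported."""
    cache_dir = os.getenv("BASE_MODEL_CACHE")
    if cache_dir:
        os.environ.setdefault("HF_HOME", cache_dir)
        os.environ.setdefault("TRANSFORMERS_CACHE", cache_dir)


# Case 1: only BASE_MODEL_CACHE is set -> both cache vars point at it.
for var in ("HF_HOME", "TRANSFORMERS_CACHE"):
    os.environ.pop(var, None)
os.environ["BASE_MODEL_CACHE"] = "/tmp/hf-models"
configure_cache_from_env()
print(os.environ["HF_HOME"])  # -> /tmp/hf-models

# Case 2: HF_HOME was already exported -> setdefault leaves it untouched.
os.environ["HF_HOME"] = "/custom/cache"
configure_cache_from_env()
print(os.environ["HF_HOME"])  # -> /custom/cache
```

Because `load_dotenv()` runs before `configure_cache_from_env()` in `main()`, a `BASE_MODEL_CACHE` set in `.env` is visible to the helper by the time it runs.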