Update README.md
README.md
@@ -37,41 +37,6 @@ This version is best when you want broad embeddings capturing item relationships
 
 ---
 
-## How to enable each version in code
-
-The behavior is controlled by the masking probabilities used in the collator:
-
-- `TRAIN_RANDOM_MLM_PROB`
-- `EVAL_RANDOM_MLM_PROB`
-
-And by whether you “force-mask last event items” (enabled in the forecasting collator logic).
-
-### Recommended settings
-
-#### Forecasting-only (Version A)
-- Train: `TRAIN_RANDOM_MLM_PROB = 0.0` (no random MLM noise)
-- Eval: `EVAL_RANDOM_MLM_PROB = 0.0`
-- Force-masking last event `ITEM_*` stays **ON**
-
-This focuses learning and evaluation on last-event item prediction.
-
-#### Forecasting + regularization (Version A + random noise)
-- Train: `TRAIN_RANDOM_MLM_PROB = 0.15`
-- Eval: `EVAL_RANDOM_MLM_PROB = 0.0`
-- Force-masking last event `ITEM_*` stays **ON**
-
-This is the default “forecasting twist” setup: train with extra random MLM, evaluate cleanly on forecasting.
-
-#### Random MLM across full sequence (Version B)
-- Train: `TRAIN_RANDOM_MLM_PROB = 0.15`
-- Eval: `EVAL_RANDOM_MLM_PROB = 0.15` (or any non-zero)
-- (Optional) disable force-masking last-event items if you want *pure* standard MLM
-
-> Note: In the current `ForecastingCollator`, force-masking last-event items is always applied.
-> If you want **pure random MLM** (no forecasting), add a flag like `force_last_event=False` and skip the `prob[force_mask] = 1.0` step.
-
----
-
 ## What the forecasting masking means (in practice)
 
 A packed firm sequence looks like:
@@ -109,7 +74,6 @@ For forecasting, **Item@K** is the main metric because it directly measures how
 - **Acc@5:** `0.6651`
 - **Acc@10:** `0.6944`
 
-> “HARD” refers to the stricter evaluation setting used in our validation protocol (forecasting-focused metrics on masked targets).
 
 ---
 
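The deleted section described the masking controls only in prose. As a rough illustration of how they combine, here is a minimal pure-Python sketch. It assumes each token position already carries a boolean saying whether it is an `ITEM_*` token of the last event; `MaskingConfig`, `PRESETS`, `mask_probabilities`, and `sample_mask` are hypothetical names, and `force_last_event` is the hypothetical flag that the removed note proposes. None of this is the actual `ForecastingCollator` code.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class MaskingConfig:
    """Hypothetical bundle of the masking knobs named in the README."""
    train_random_mlm_prob: float  # plays the role of TRAIN_RANDOM_MLM_PROB
    eval_random_mlm_prob: float   # plays the role of EVAL_RANDOM_MLM_PROB
    force_last_event: bool = True  # hypothetical flag from the README note


# The three settings from the removed "Recommended settings" list.
PRESETS: Dict[str, MaskingConfig] = {
    "forecasting_only": MaskingConfig(0.0, 0.0),          # Version A
    "forecasting_plus_noise": MaskingConfig(0.15, 0.0),   # Version A + random noise
    "random_mlm_pure": MaskingConfig(0.15, 0.15, False),  # Version B, force-masking off
}


def mask_probabilities(is_last_event_item: List[bool],
                       config: MaskingConfig,
                       training: bool) -> List[float]:
    """Return a per-position masking probability.

    is_last_event_item[i] is True when position i is an ITEM_* token
    belonging to the last event of the packed sequence (assumed input).
    """
    base = config.train_random_mlm_prob if training else config.eval_random_mlm_prob
    prob = [base] * len(is_last_event_item)
    if config.force_last_event:
        # List version of the tensor step `prob[force_mask] = 1.0`.
        for i, forced in enumerate(is_last_event_item):
            if forced:
                prob[i] = 1.0
    return prob


def sample_mask(prob: List[float], rng: random.Random) -> List[bool]:
    """Draw a Bernoulli mask; p = 1.0 always masks, p = 0.0 never does."""
    return [rng.random() < p for p in prob]


if __name__ == "__main__":
    is_last = [False, False, False, True, True]
    cfg = PRESETS["forecasting_plus_noise"]
    print(mask_probabilities(is_last, cfg, training=True))   # [0.15, 0.15, 0.15, 1.0, 1.0]
    print(mask_probabilities(is_last, cfg, training=False))  # [0.0, 0.0, 0.0, 1.0, 1.0]
```

With the "forecasting + regularization" preset, training masks each non-forced position with probability 0.15 while last-event items are always masked; at eval time only the forced positions are masked, which is what makes the evaluation a clean forecasting measurement.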