Update
README.md
CHANGED
@@ -1,3 +1,97 @@
---
license: gpl-3.0
language:
- en
library_name: onnx
tags:
- audio
- music
- music-recommendation
- clap
- onnx
- on-device
- android
- mobile
pipeline_tag: feature-extraction
---

# LatentJam Models

ONNX model weights for [LatentJam](https://github.com/Nikita-sud/latentjam), a privacy-first Android music player that recommends what to play next entirely on-device. These models live here because [`clap_audio.onnx`](./clap_audio.onnx) is 116 MB and exceeds GitHub's 100 MB per-file cap.

The Android app downloads these files at build time via [`scripts/download-models.sh`](https://github.com/Nikita-sud/latentjam/blob/main/scripts/download-models.sh) and bundles them into `app/src/main/assets/ml/`. Inference at runtime uses [ONNX Runtime](https://onnxruntime.ai/) with the [Qualcomm QNN execution provider](https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html) for Hexagon NPU offload on Snapdragon devices, falling back to CPU on everything else.
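
As a rough illustration of the fallback described above, the provider choice reduces to an ordered preference list. The sketch below is written in Python for brevity and is not the app's code (the app uses ONNX Runtime's Android bindings); `choose_providers` is a hypothetical helper. The two strings are ONNX Runtime's registered provider names. In the Python package, the list of available providers would come from `onnxruntime.get_available_providers()`, and the result would be passed as `providers=` to `onnxruntime.InferenceSession`.

```python
# Hypothetical helper: illustrates "QNN if present, otherwise CPU".
QNN = "QNNExecutionProvider"
CPU = "CPUExecutionProvider"


def choose_providers(available):
    """Return execution providers in priority order.

    QNN comes first when the runtime build offers it; CPU is always
    listed last so unsupported operators or devices still run.
    """
    providers = []
    if QNN in available:
        providers.append(QNN)
    providers.append(CPU)
    return providers


# A runtime build that includes the QNN provider (Snapdragon case).
print(choose_providers([QNN, CPU]))
# Any other device: CPU only.
print(choose_providers([CPU]))
```

Keeping CPU at the end of the list even when QNN is available means a model that cannot be offloaded still has somewhere to run.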

## Files

| File | Size | Role |
|---|---|---|
| [`clap_audio.onnx`](./clap_audio.onnx) | 116 MB | Audio encoder derived from CLAP. Consumes a 15 s mono PCM chunk at 48 kHz and produces one 512-d L2-normalized embedding per track. Runs once per track during library indexing; the embedding is then cached in the app's Room database. |
| [`predictor_state.onnx`](./predictor_state.onnx) | 32 MB | Transformer-style state encoder. Reads a sequence of recent listening events (skip / listen-through / replay, weighted by recency) and produces a user-state vector. |
| [`predictor_scorer_n100.onnx`](./predictor_scorer_n100.onnx) | 5 MB | Top-100 candidate scorer. Given the predictor state and 100 candidate embeddings (chosen by approximate-nearest-neighbor retrieval against the user state), scores each candidate. The highest-scoring candidate becomes the next track in smart-shuffle mode. |
| [`embedding_version.txt`](./embedding_version.txt) | 69 B | Bumps when the encoder changes. The app re-extracts all embeddings on mismatch. |
| [`predictor_version.txt`](./predictor_version.txt) | 20 B | Bumps when the predictor changes. The app drops the predictor cache on mismatch. |
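
To make the encoder's input and output contract from the table concrete, here is a minimal Python sketch. Only the numbers (15 s, 48 kHz, mono, 512-d, L2-normalized) come from this README. The helper names, the stereo downmix by averaging, and the centre-crop / zero-pad policy are assumptions for illustration; the app's native decoder may select the window differently.

```python
import math

SAMPLE_RATE_HZ = 48_000
WINDOW_SECONDS = 15
# 15 s * 48,000 samples/s = 720,000 samples per encoder input.
WINDOW_SAMPLES = SAMPLE_RATE_HZ * WINDOW_SECONDS
EMBEDDING_DIM = 512


def downmix_to_mono(left, right):
    """Average two channels sample by sample (assumed downmix policy)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def fit_to_window(samples, length=WINDOW_SAMPLES):
    """Centre-crop long input and zero-pad short input (assumed policy)."""
    if len(samples) >= length:
        start = (len(samples) - length) // 2
        return samples[start:start + length]
    return list(samples) + [0.0] * (length - len(samples))


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


print(WINDOW_SAMPLES)
```

Because the cached embeddings are unit length, a plain dot product between two of them equals their cosine similarity.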

## Intended use

- Powering the **smart-shuffle** feature in the LatentJam Android app: cycling the shuffle button to `SMART` picks the next track using these models.
- Experimenting with on-device music recommendation on mobile. The encoder and predictor are deliberately small: the entire pipeline (audio decode → encoder → state encoder → scorer) runs end-to-end in under a second on a Snapdragon 8 Gen 3 with the Hexagon NPU enabled.

These models are **not** intended for:
- Server-side recommendation (use a bigger CLAP variant and a proper retrieval index)
- Music classification or tagging
- Generating audio

## Pipeline overview

```
Library indexing (one-time, in background)
                                ┌──────────────────────────────────────┐
mp3 / flac / opus / m4a / ogg ──┤ native C++ decoder (in LatentJam)    │
                                │                  ↓                   │
                                │ 15 s mono PCM at 48 kHz              │
                                │                  ↓                   │
                                │ clap_audio.onnx (this repo)          │
                                │                  ↓                   │
                                │ 512-d embedding, L2-normalized       │
                                │                  ↓                   │
                                │ Room (on-device cache)               │
                                └──────────────────────────────────────┘

Smart-shuffle inference (on demand)
                                ┌──────────────────────────────────────┐
listening history (Room)      ──┤ predictor_state.onnx (this repo)     │
                                │                  ↓                   │
                                │ user-state vector                    │
                                │                  ↓                   │
                                │ ANN retrieval over cached embeddings │
                                │                  ↓                   │
                                │ 100 candidate tracks                 │
                                │                  ↓                   │
                                │ predictor_scorer_n100.onnx           │
                                │                  ↓                   │
                                │ next track                           │
                                └──────────────────────────────────────┘
```
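
The smart-shuffle half of the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the app's implementation: exact brute-force search stands in for the ANN index, the user-state vector is assumed to be comparable to the track embeddings by dot product, and `score_fn` is a hypothetical stand-in for running `predictor_scorer_n100.onnx`. How the app handles libraries with fewer than 100 tracks is not covered here.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve_candidates(user_state, embeddings, k=100):
    """Brute-force stand-in for ANN: ids of the k tracks closest to user_state."""
    ranked = sorted(
        embeddings,
        key=lambda track_id: dot(user_state, embeddings[track_id]),
        reverse=True,
    )
    return ranked[:k]


def pick_next_track(user_state, embeddings, score_fn, k=100):
    """Retrieve k candidates, score them, and return the best track id."""
    candidates = retrieve_candidates(user_state, embeddings, k)
    if not candidates:
        return None
    scores = score_fn(user_state, [embeddings[t] for t in candidates])
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


def toy_scorer(user_state, candidate_embeddings):
    """Placeholder scorer: reuses similarity where the real model would run."""
    return [dot(user_state, e) for e in candidate_embeddings]


# Toy 2-d library; real embeddings are 512-d.
library = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.6, 0.8]}
print(pick_next_track([0.0, 1.0], library, toy_scorer, k=2))  # prints "b"
```

Splitting retrieval from scoring keeps the expensive model call bounded: the scorer only ever sees a fixed-size shortlist, however large the library is.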

## Privacy

- All inference is on-device. No audio, embeddings, or listening history is ever transmitted anywhere.
- The LatentJam Android app does not request the `INTERNET` permission for the recommender. The only network access is the build-time download from this repo onto the developer's machine.

## Limitations

- Smart mode requires that an embedding has been computed for every track. The first time you index a large library this takes a while, because the encoder runs in the background only when the device is **charging + idle** (via WorkManager) to avoid thermal throttling and battery drain.
- The encoder is CLAP-derived but distilled to fit on-device. Genre/mood discrimination is good for popular Western genres and weaker for genres that CLAP's training data underrepresented.
- The predictor was trained on a closed user-history dataset and may not generalize perfectly to your taste right away. On-device fine-tuning is planned but not yet shipped (see [`ml/retrain/RetrainWorker.kt`](https://github.com/Nikita-sud/latentjam/blob/main/app/src/main/java/io/github/nikitasud/latentjam/ml/retrain/RetrainWorker.kt) in the app repo, currently a stub).
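
The first limitation, combined with the re-extraction rule tied to `embedding_version.txt`, amounts to a bookkeeping check before smart mode can be used. The Python sketch below is one hypothetical way to express it. It assumes each cached embedding is stored alongside the version string it was computed with, which this README does not state; the function names and the version strings in the example are invented for illustration.

```python
def tracks_needing_embedding(track_ids, cached_versions, current_version):
    """Return tracks with no cached embedding or one from another encoder version.

    cached_versions maps track id -> version string stored with the embedding
    (an assumed storage layout, not confirmed by the app).
    """
    return [t for t in track_ids if cached_versions.get(t) != current_version]


def smart_mode_ready(track_ids, cached_versions, current_version):
    """Smart mode is usable only when every track has a current embedding."""
    return not tracks_needing_embedding(track_ids, cached_versions, current_version)


# Made-up example: track 2 is stale, track 3 was never indexed.
cache = {1: "v2", 2: "v1"}
print(tracks_needing_embedding([1, 2, 3], cache, "v2"))  # prints [2, 3]
print(smart_mode_ready([1, 2, 3], cache, "v2"))          # prints False
```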

## License

GPL-3.0-or-later, matching the [LatentJam Android app](https://github.com/Nikita-sud/latentjam).

The CLAP audio encoder is derived from [LAION's CLAP](https://github.com/LAION-AI/CLAP) (CC0/MIT) and was quantized and exported to ONNX for on-device use. The state encoder and scorer were trained from scratch for this project.

## Links

- **Android app**: https://github.com/Nikita-sud/latentjam
- **Architecture notes**: https://github.com/Nikita-sud/latentjam/blob/main/ARCHITECTURE_NOTES.md
- **Fork notice & attribution**: https://github.com/Nikita-sud/latentjam/blob/main/FORK_NOTICE.md