Upload tensor-compressed CodeRankEmbed model


Files changed (9)

README.md +92 -0
factorization_info.json +782 -0
load_compressed_model.py +225 -0
modules.json +14 -0
pytorch_model.bin +3 -0
special_tokens_map.json +37 -0
tokenizer.json +0 -0
tokenizer_config.json +64 -0
vocab.txt +0 -0

README.md ADDED

	@@ -0,0 +1,92 @@

+---
+license: mit
+tags:
+- tensor-compression
+- code-embeddings
+- factorized
+- tltorch
+base_model: nomic-ai/CodeRankEmbed
+---
+# CodeRankEmbed-compressed
+This is a compressed version of [nomic-ai/CodeRankEmbed](https://huggingface.co/nomic-ai/CodeRankEmbed) in which the encoder's linear layers have been replaced with low-rank tensor factorizations.
+## Compression Details
+- **Compression method**: Tensor factorization using TensorLy-Torch (`tltorch`)
+- **Factorization type**: CP (`cp`)
+- **Rank**: 4 for every factorized layer
+- **Number of factorized layers**: 60 (5 linear layers in each of the 12 encoder blocks)
+- **Original model size**: 136.73M parameters
+- **Compressed model size**: 23.62M parameters
+- **Compression ratio**: 5.79x (82.7% reduction)
+## Usage
+To use this compressed model, install the required dependencies and load it with the custom loading script (`load_compressed_model.py`):
+```bash
+pip install torch tensorly tltorch sentence-transformers
+```
+### Loading the model
+```python
+import torch
+import json
+from sentence_transformers import SentenceTransformer
+import tensorly as tl
+from tltorch.factorized_layers import FactorizedLinear, FactorizedEmbedding
+# Set TensorLy backend
+tl.set_backend("pytorch")
+# Load the model structure
+model = SentenceTransformer("nomic-ai/CodeRankEmbed", trust_remote_code=True)
+# Load factorization info
+with open("factorization_info.json", "r") as f:
+    factorized_info = json.load(f)
+# Reconstruct factorized layers (see load_compressed_model.py for full implementation)
+# ... reconstruction code ...
+# Load compressed weights
+checkpoint = torch.load("pytorch_model.bin", map_location="cpu")
+model.load_state_dict(checkpoint["state_dict"], strict=False)
+# Use the model
+embeddings = model.encode(["def hello_world():\n    print('Hello, World!')"])
+```
+## Model Files
+- `pytorch_model.bin`: Compressed model weights
+- `factorization_info.json`: Metadata about factorized layers
+- `tokenizer.json`, `vocab.txt`: Tokenizer files
+- `modules.json`: SentenceTransformer modules configuration
+## Performance
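+The compressed model trades some embedding quality for a much smaller parameter count:
+- Similar embedding quality (average cosine similarity > 0.9 with the original model's embeddings)
+- 5.79x fewer parameters (136.73M down to 23.62M)
+- Faster loading; inference speed on CPU depends on how the factorized layers are evaluated (the loading script uses `implementation='reconstructed'`)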
+## Citation
+If you use this compressed model, please cite the original CodeRankEmbed model:
+```bibtex
+@misc{nomic2024coderankembed,
+  title={CodeRankEmbed},
+  author={Nomic AI},
+  year={2024},
+  url={https://huggingface.co/nomic-ai/CodeRankEmbed}
+}
+```
+## License
+This compressed model inherits the license of the original model (declared as MIT in this model card's metadata). Please check the original model's license for usage terms.
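
Reviewer note: the similarity figure quoted in the Performance section could be checked along these lines. This is a minimal sketch, not the script used to produce that number. `mean_cosine_similarity` is a hypothetical helper, and the toy vectors stand in for the arrays that `model.encode(...)` would return for the original and the compressed model on the same inputs.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_cosine_similarity(original, compressed):
    """Average row-wise cosine similarity between two lists of embeddings."""
    if len(original) != len(compressed):
        raise ValueError("both models must embed the same inputs")
    sims = [cosine_similarity(u, v) for u, v in zip(original, compressed)]
    return sum(sims) / len(sims)

# Toy stand-ins for real embeddings (hypothetical values, for illustration only)
original = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
compressed = [[0.9, 0.1, 0.0], [0.0, 1.8, 0.2]]
print(round(mean_cosine_similarity(original, compressed), 3))
```

Averaging per-input similarities keeps the metric independent of embedding norm, which matters if the compressed model's outputs are scaled differently.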

factorization_info.json ADDED

	@@ -0,0 +1,782 @@

+{
+  "0.auto_model.encoder.layers.0.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.0.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.0.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.0.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.0.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.1.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.1.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.1.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.1.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.1.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.2.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.2.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.2.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.2.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.2.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.3.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.3.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.3.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.3.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.3.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.4.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.4.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.4.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.4.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.4.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.5.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.5.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.5.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.5.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.5.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.6.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.6.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.6.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.6.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.6.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.7.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.7.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.7.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.7.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.7.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.8.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.8.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.8.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.8.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.8.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.9.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.9.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.9.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.9.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.9.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.10.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.10.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.10.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.10.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.10.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  },
+  "0.auto_model.encoder.layers.11.attn.Wqkv": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 2304,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      2304,
+      768
+    ],
+    "tensorized_shape": "((9, 16, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.11.attn.out_proj": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      768
+    ],
+    "tensorized_shape": "((4, 12, 16), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.11.mlp.fc11": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.11.mlp.fc12": {
+    "type": "FactorizedLinear",
+    "in_features": 768,
+    "out_features": 3072,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      3072,
+      768
+    ],
+    "tensorized_shape": "((8, 16, 24), (4, 12, 16))"
+  },
+  "0.auto_model.encoder.layers.11.mlp.fc2": {
+    "type": "FactorizedLinear",
+    "in_features": 3072,
+    "out_features": 768,
+    "bias": true,
+    "rank": 4,
+    "factorization": "cp",
+    "weight_shape": [
+      768,
+      3072
+    ],
+    "tensorized_shape": "((4, 12, 16), (12, 16, 16))"
+  }
+}
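
Reviewer note: to see how much each entry above saves, the parameter count of a rank-R CP factorization can be estimated from `tensorized_shape`, with one factor matrix of size (mode dimension x R) per mode. The sketch below assumes that layout and ignores the bias and any per-component weight vector the library may store alongside the factors. `cp_factor_params` and `dense_params` are hypothetical helpers, not part of tltorch.

```python
import ast
from math import prod

def cp_factor_params(tensorized_shape: str, rank: int) -> int:
    """Parameters in the CP factor matrices: rank * (sum of all mode sizes)."""
    out_modes, in_modes = ast.literal_eval(tensorized_shape)
    return rank * (sum(out_modes) + sum(in_modes))

def dense_params(tensorized_shape: str) -> int:
    """Parameters in the equivalent dense weight matrix."""
    out_modes, in_modes = ast.literal_eval(tensorized_shape)
    return prod(out_modes) * prod(in_modes)

# First entry of factorization_info.json (attn.Wqkv of encoder layer 0)
entry = {
    "rank": 4,
    "weight_shape": [2304, 768],
    "tensorized_shape": "((9, 16, 16), (4, 12, 16))",
}
factored = cp_factor_params(entry["tensorized_shape"], entry["rank"])
dense = dense_params(entry["tensorized_shape"])
# The tensorized modes must multiply back to the original weight shape
assert dense == entry["weight_shape"][0] * entry["weight_shape"][1]
print(factored, dense)  # 292 1769472
```

The same check applies to the other three shapes in the file: each tuple of modes multiplies back to 768, 2304, or 3072.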

load_compressed_model.py ADDED

	@@ -0,0 +1,225 @@

+#!/usr/bin/env python3
+"""
+Load and use compressed models saved by compress_model.py
+"""
+import os
+import json
+import torch
+import torch.nn as nn
+from sentence_transformers import SentenceTransformer
+import tensorly as tl
+from tltorch.factorized_layers import FactorizedLinear, FactorizedEmbedding
+# Set TensorLy backend to PyTorch
+tl.set_backend("pytorch")
+def reconstruct_factorized_layer(layer_info, state_dict_prefix):
+    """Reconstruct a factorized layer from saved metadata.
+    state_dict_prefix is the layer's dotted path; it is used only in error messages.
+    """
+    layer_type = layer_info["type"]
+    # Use defaults if factorization/rank not specified
+    factorization = layer_info.get("factorization", "cp")  # default to CP factorization
+    rank = layer_info.get("rank", 4)  # default rank of 4
+    if layer_type == "FactorizedLinear":
+        in_features = layer_info.get("in_features")
+        out_features = layer_info.get("out_features")
+        if in_features is None or out_features is None:
+            raise ValueError(f"Missing in_features or out_features for FactorizedLinear layer {state_dict_prefix}")
+        # Create a placeholder linear layer; the real weights come from the state dict loaded afterwards
+        linear = nn.Linear(in_features, out_features, bias=layer_info.get("bias", True))
+        # Convert to factorized using the from_linear method
+        # (layer_info["tensorized_shape"] is not passed here)
+        layer = FactorizedLinear.from_linear(
+            linear,
+            rank=rank,
+            factorization=factorization.upper(),  # The method expects uppercase
+            implementation='reconstructed'
+        )
+    elif layer_type == "FactorizedEmbedding":
+        num_embeddings = layer_info.get("num_embeddings")
+        embedding_dim = layer_info.get("embedding_dim")
+        if num_embeddings is None or embedding_dim is None:
+            raise ValueError(f"Missing num_embeddings or embedding_dim for FactorizedEmbedding layer {state_dict_prefix}")
+        # Create a placeholder embedding layer; the real weights come from the state dict loaded afterwards
+        embedding = nn.Embedding(
+            num_embeddings=num_embeddings,
+            embedding_dim=embedding_dim,
+            padding_idx=layer_info.get("padding_idx", None),
+            max_norm=layer_info.get("max_norm", None),
+            norm_type=layer_info.get("norm_type", 2.0),
+            scale_grad_by_freq=layer_info.get("scale_grad_by_freq", False),
+            sparse=layer_info.get("sparse", False)
+        )
+        # Convert to factorized using the from_embedding method
+        layer = FactorizedEmbedding.from_embedding(
+            embedding,
+            rank=rank,
+            factorization=factorization
+        )
+    else:
+        raise ValueError(f"Unknown factorized layer type: {layer_type}")
+    return layer
+def set_module_by_path(model, path, new_module):
+    """Set a module in the model by its dotted path."""
+    parts = path.split('.')
+    parent = model
+    # Navigate to the parent module
+    for part in parts[:-1]:
+        parent = getattr(parent, part)
+    # Set the new module
+    setattr(parent, parts[-1], new_module)
+def load_compressed_model(load_dir: str, device="cpu"):
+    """Load the compressed artifacts (state dict and factorization metadata) from load_dir; returns a dict of components, not a model."""
+    # Load factorization info
+    factorization_info_path = os.path.join(load_dir, "factorization_info.json")
+    if not os.path.exists(factorization_info_path):
+        raise FileNotFoundError(f"No factorization_info.json found in {load_dir}")
+    with open(factorization_info_path, "r") as f:
+        factorized_info = json.load(f)
+    # Load the saved checkpoint
+    checkpoint_path = os.path.join(load_dir, "pytorch_model.bin")
+    if not os.path.exists(checkpoint_path):
+        # Try alternative path
+        checkpoint_path = os.path.join(load_dir, "model_state.pt")
+        if not os.path.exists(checkpoint_path):
+            raise FileNotFoundError(f"No model checkpoint found in {load_dir}")
+    checkpoint = torch.load(checkpoint_path, map_location=device)
+    # Extract info from checkpoint
+    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
+        state_dict = checkpoint["state_dict"]
+        is_sentence_encoder = checkpoint.get("is_sentence_encoder", False)
+        model_name = checkpoint.get("model_name", "unknown")
+    else:
+        # Assume it's just the state dict
+        state_dict = checkpoint
+        is_sentence_encoder = False
+        model_name = "unknown"
+    print(f"Loading compressed model (sentence_encoder={is_sentence_encoder})")
+    # The model structure is not rebuilt here; the caller needs the original architecture
+    if is_sentence_encoder:
+        print("Note: Loading sentence encoders requires the original model architecture.")
+        print("The compressed weights will be loaded, but the model structure needs to be reconstructed manually.")
+    else:
+        print("Note: Loading compressed models requires knowing the original model architecture.")
+    # Return the loaded components for manual reconstruction
+    return {
+        "state_dict": state_dict,
+        "factorized_info": factorized_info,
+        "is_sentence_encoder": is_sentence_encoder,
+        "model_name": model_name,
+    }
+def load_compressed_sentence_transformer(original_model_name: str, compressed_dir: str, device="cpu"):
+    """
+    Load a compressed SentenceTransformer model.
+    Args:
+        original_model_name: Name of the original model (e.g., "nomic-ai/CodeRankEmbed")
+        compressed_dir: Directory containing the compressed model
+        device: Device to load the model on
+    Returns:
+        Compressed SentenceTransformer model
+    """
+    # Load the original model structure
+    model = SentenceTransformer(original_model_name, device=device, trust_remote_code=True)
+    # Load compression artifacts
+    artifacts = load_compressed_model(compressed_dir, device)
+    if not artifacts.get("is_sentence_encoder"):
+        raise ValueError("The compressed model is not a sentence encoder")
+    # Load the compressed state dict
+    state_dict = artifacts["state_dict"]
+    factorized_info = artifacts["factorized_info"]
+    # Reconstruct factorized layers
+    for layer_path, layer_info in factorized_info.items():
+        # Create the factorized layer
+        factorized_layer = reconstruct_factorized_layer(layer_info, layer_path)
+        # Set it in the model
+        set_module_by_path(model, layer_path, factorized_layer)
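+    # Load the state dict; strict=False tolerates key mismatches, so report them instead of hiding them
+    result = model.load_state_dict(state_dict, strict=False)
+    if result.missing_keys or result.unexpected_keys:
+        print(f"Warning: {len(result.missing_keys)} missing and {len(result.unexpected_keys)} unexpected keys in state dict")
+    return model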
+def example_usage():
+    """Example of how to use the compressed model loader."""
+    compressed_dir = "coderank_compressed"
+    original_model = "nomic-ai/CodeRankEmbed"
+    print(f"Loading compressed model from {compressed_dir}")
+    try:
+        # For sentence transformers
+        model = load_compressed_sentence_transformer(
+            original_model_name=original_model,
+            compressed_dir=compressed_dir,
+            device="cpu"
+        )
+        # Test the model
+        sentences = ["def hello_world():\n    print('Hello, World!')", "System.out.println('Hello, World!');"]
+        embeddings = model.encode(sentences)
+        print("✔ Successfully loaded compressed model")
+        print(f"  Embedding shape: {embeddings.shape}")
+    except Exception as e:
+        print(f"⚠ Error loading compressed model: {e}")
+        print("\nTo manually load the compressed model:")
+        print("1. Load the factorization_info.json to see the compressed layer structure")
+        print("2. Reconstruct the model with factorized layers based on the metadata")
+        print("3. Load the state dict from pytorch_model.bin")
+if __name__ == "__main__":
+    example_usage()
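
Reviewer note: because `load_state_dict(..., strict=False)` skips keys that do not match instead of raising, a factorized layer whose parameters were never loaded would go unnoticed and keep its random initialization. A minimal sketch of a post-load check follows. `unloaded_factorized_layers` is a hypothetical helper, the key names in the demo are invented for illustration, and the sketch assumes the caller keeps the value returned by `load_state_dict`, whose `missing_keys` attribute lists the parameter names absent from the checkpoint.

```python
def unloaded_factorized_layers(missing_keys, factorized_paths):
    """Return the factorized layer paths with at least one parameter missing from the checkpoint."""
    affected = []
    for path in factorized_paths:
        # Match on "path." so that "layers.1" does not also match "layers.10"
        prefix = path + "."
        if any(key.startswith(prefix) for key in missing_keys):
            affected.append(path)
    return affected

# Hypothetical key names, for illustration only
missing = ["0.auto_model.encoder.layers.0.attn.Wqkv.weight.factors.0"]
paths = [
    "0.auto_model.encoder.layers.0.attn.Wqkv",
    "0.auto_model.encoder.layers.0.attn.out_proj",
]
print(unloaded_factorized_layers(missing, paths))
```

In the loader, this could be called with the `missing_keys` of the `load_state_dict` result and `factorized_info.keys()`, raising an error when the returned list is non-empty.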

modules.json ADDED

	@@ -0,0 +1,14 @@

+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "",
+    "type": "sentence_transformers.models.Pooling.Pooling"
+  }
+]

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eed41647611d292536296319849913bf499397733c99f800a4c136b9e141b900
+size 94683034

special_tokens_map.json ADDED

	@@ -0,0 +1,37 @@

+{
+  "cls_token": {
+    "content": "[CLS]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "[MASK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "[PAD]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "[SEP]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

tokenizer_config.json ADDED

	@@ -0,0 +1,64 @@

+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "[PAD]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "101": {
+      "content": "[CLS]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "102": {
+      "content": "[SEP]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "103": {
+      "content": "[MASK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "[CLS]",
+  "do_lower_case": true,
+  "extra_special_tokens": {},
+  "mask_token": "[MASK]",
+  "max_length": 512,
+  "model_max_length": 8192,
+  "pad_to_multiple_of": null,
+  "pad_token": "[PAD]",
+  "pad_token_type_id": 0,
+  "padding_side": "right",
+  "sep_token": "[SEP]",
+  "stride": 0,
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "BertTokenizer",
+  "truncation_side": "right",
+  "truncation_strategy": "longest_first",
+  "unk_token": "[UNK]"
+}

vocab.txt ADDED

The diff for this file is too large to render. See raw diff