Upload 6 files

Files changed (6)

README.md CHANGED

@@ -1,3 +1,16 @@
----
-license: apache-2.0
----

+# CPython Embeddings (Safetensors)
+This folder contains semantic embeddings generated from CPython
+documentation and readable source files.
+## Contents
+- cpython_embeddings.safetensors : Vector embeddings
+- vocab.txt : Extracted vocabulary
+- tokenizer.json : Tokenizer config
+- config.json : Model configuration
+- metadata.json : Project metadata
+## Notes
+- This is NOT a Python interpreter
+- This is NOT a trained LLM
+- Intended for semantic search & analysis
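
The README names semantic search as the intended use. A minimal sketch of what that could look like follows, assuming each stored embedding is a row vector and that a query has already been embedded with the same model. The commit does not show either step, so `cosine_similarity`, `top_k`, and the toy data are hypothetical illustrations, not the author's code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query, rows, k=3):
    """Return up to k (row_index, score) pairs, best match first."""
    scored = [(i, cosine_similarity(query, row)) for i, row in enumerate(rows)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional stand-ins for the real rows (assumed 384-dimensional
# per config.json); the values here are made up for illustration.
toy_rows = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
results = top_k([1.0, 0.0, 0.0], toy_rows, k=2)
```

Pure Python keeps the sketch dependency-free. For the full matrix, a vectorized library such as NumPy would be the more usual choice.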

config.json ADDED

+{
+  "model_type": "embedding",
+  "source": "cpython",
+  "embedding_dim": 384,
+  "num_embeddings": 2813,
+  "framework": "sentence-transformers",
+  "description": "CPython documentation and code text embeddings",
+  "license": "PSF"
+}

cpython_embeddings.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:969faaec7588a43d68d97c56d1102cfccf8473b81cc00adb9fbb56607dcb1886
+size 4320856
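
The LFS pointer size can be cross-checked against config.json. Assuming the file holds a single float32 tensor of shape 2813 x 384 (the dtype and tensor layout are not stated in this commit), the payload would be 4,320,768 bytes. That leaves 88 bytes for the 8-byte length prefix and JSON header that a safetensors file begins with. The sketch below does that arithmetic and reads the header from a small synthetic file. The tensor name `embeddings` and both helper functions are hypothetical.

```python
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def expected_payload_bytes(num_embeddings, embedding_dim, bytes_per_value=4):
    """Raw tensor size, assuming a dense matrix of fixed-width values."""
    return num_embeddings * embedding_dim * bytes_per_value


# Values from config.json and the LFS pointer in this commit.
payload = expected_payload_bytes(2813, 384)  # float32 is an assumption
overhead = 4320856 - payload                 # bytes left for prefix + header

# Synthetic 2 x 3 float32 file, used only to exercise the header reader.
demo = {"embeddings": {"dtype": "F32", "shape": [2, 3], "data_offsets": [0, 24]}}
demo_bytes = json.dumps(demo).encode("utf-8")
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.safetensors"
    demo_path.write_bytes(
        struct.pack("<Q", len(demo_bytes)) + demo_bytes + bytes(24)
    )
    demo_header = read_safetensors_header(demo_path)
```

Running `read_safetensors_header` on the downloaded file would show the actual tensor names, dtypes, and shapes, which is the direct way to confirm or refute the float32 assumption.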

metadata.json ADDED

+{
+  "created_by": "Ananthu Sajeev",
+  "project": "Venomoussaversai",
+  "dataset": "CPython",
+  "files_used": [
+    "py",
+    "txt",
+    "rst"
+  ],
+  "purpose": "Semantic embeddings, not execution",
+  "warning": "Not an executable Python interpreter"
+}

tokenizer.json ADDED

+{
+  "type": "word",
+  "unk_token": "[UNK]",
+  "pad_token": "[PAD]",
+  "vocab_size": 50000,
+  "vocab_file": "vocab.txt"
+}
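
The tokenizer config declares a word-level scheme with `[UNK]` and `[PAD]` tokens and a `vocab.txt` lookup, but vocab.txt is not part of this commit and its layout is not shown. As a rough sketch of how such a config is commonly consumed, the code below assumes one token per line with the line index as the id, and plain whitespace splitting. `build_vocab`, `encode`, and the toy vocabulary are hypothetical.

```python
def build_vocab(tokens):
    """Map each token to its position, as if read line by line from a file."""
    return {token: index for index, token in enumerate(tokens)}


def encode(text, vocab, max_len, unk_token="[UNK]", pad_token="[PAD]"):
    """Split on whitespace, map unknown words to unk, then pad or truncate."""
    unk_id = vocab[unk_token]
    ids = [vocab.get(word, unk_id) for word in text.split()][:max_len]
    ids += [vocab[pad_token]] * (max_len - len(ids))
    return ids


# Toy vocabulary; the real file is declared to have up to 50000 entries.
toy_vocab = build_vocab(["[PAD]", "[UNK]", "def", "return", "import"])
encoded = encode("def foo return", toy_vocab, max_len=5)
```

Here `foo` is not in the toy vocabulary, so it maps to the `[UNK]` id, and the sequence is padded out to `max_len` with the `[PAD]` id.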

tokenizer.model ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:07eced375cec144d27c900241f3e339478dec958f92fddbc551f295c992038a3
+size 231508