| title: Normalized ConceptNet Explorer | |
| emoji: ⚡ | |
| colorFrom: green | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: 5.49.1 | |
| app_file: app.py | |
| pinned: true | |
| license: cc-by-sa-4.0 | |
| tags: | |
| - conceptnet | |
| - knowledge-graph | |
| - sqlite | |
| - normalized | |
| - gradio | |
| - fast-queries | |
| # ⚡ Normalized ConceptNet Explorer (V7) | |
| This application is a high-performance explorer for a normalized, filtered, and optimized version of the ConceptNet 5.5 knowledge graph. | |
| It is designed to be **extremely fast**, returning query results in milliseconds instead of minutes, because it queries a 1.78 GB optimized SQLite database with integer-based joins rather than the 23.6 GB un-normalized file. | |
| ## Features | |
| This app provides a full suite of tools to explore the normalized database: | |
| - **⚡ Semantic Profile**: Explore the relations of any word in real time. A profile now takes about 4 fast SQL queries instead of 24+ slow ones. | |
| - **⚡ Query Builder**: Build custom queries (start node, relation, end node) that are executed with fast, integer-based joins. | |
| - **⚡ Raw SQL**: Execute SQL queries directly against the new, normalized database schema (browsable under **Schema**, below). | |
| - **⚡ Schema**: Browse the new, efficient database schema, including all tables, indexes, and row counts. | |
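
To illustrate what an integer-based join looks like in practice, here is a minimal sketch against an in-memory toy database. Only the `edge_norm` table name comes from this README; the `node` and `relation` layouts, every column name, the index, and the `edges_from` helper are assumptions made for illustration, so check the Schema view for the real definitions.

```python
import sqlite3


def build_toy_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical normalized layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE node (id INTEGER PRIMARY KEY, label TEXT, language TEXT);
        CREATE TABLE relation (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE edge_norm (
            start_id INTEGER, rel_id INTEGER, end_id INTEGER, weight REAL
        );
        -- An index on the integer key keeps lookups by start node cheap.
        CREATE INDEX idx_edge_start ON edge_norm (start_id);

        INSERT INTO node VALUES (1, 'dog', 'en'), (2, 'animal', 'en'), (3, 'hund', 'de');
        INSERT INTO relation VALUES (1, 'IsA'), (2, 'Synonym');
        INSERT INTO edge_norm VALUES (1, 1, 2, 2.0), (1, 2, 3, 1.0);
        """
    )
    return conn


def edges_from(conn: sqlite3.Connection, label: str, language: str) -> list:
    """Return (relation, end label, end language, weight) for one start word."""
    sql = """
        SELECT r.name, t.label, t.language, e.weight
        FROM node AS s
        JOIN edge_norm AS e ON e.start_id = s.id   -- integer join
        JOIN relation  AS r ON r.id = e.rel_id     -- integer join
        JOIN node      AS t ON t.id = e.end_id     -- integer join
        WHERE s.label = ? AND s.language = ?
        ORDER BY e.weight DESC
    """
    return conn.execute(sql, (label, language)).fetchall()


conn = build_toy_db()
rows = edges_from(conn, "dog", "en")
for row in rows:
    print(row)
```

The text lookup happens once, on the start node; every subsequent join compares integers, which is the property the Query Builder relies on.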
| ## How It Works: The Normalized Database | |
| This app's speed and correctness come from the new database it queries: [cstr/conceptnet-normalized-multi](https://huggingface.co/datasets/cstr/conceptnet-normalized-multi). | |
| This database was created by a V7 normalization script that fixed critical issues found in the original data: | |
| 1. **Normalization (Speed & Size)**: The original 23.6 GB `edge` table (34M rows) was bloated with text URIs. The new 1.78 GB `edge_norm` table replaces these with compact integer foreign keys. | |
| 2. **Data Correctness (V7 Fix)**: The original `node` table (28M rows) was used as the source of truth. We migrated all 28M nodes together with their authoritative `language` values. | |
| 3. **Preserves Cross-Language Links**: The 34M edges were filtered to keep any edge where at least one node (start or end) was in our 11 target languages (`en`, `de`, `fr`, `it`, `es`, `ar`, `fa`, `grc`, `he`, `la`, `hbo`). This is critical, as it correctly preserves cross-language connections (e.g., `犬 (ja) -> hund (de)`), which were broken in previous attempts. | |
| The result is a clean, fast, and data-correct database that contains all relevant connections for our target languages. | |
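
The normalization step described above can be sketched roughly as follows. This is not the V7 script itself: the stand-in source tables, all column names, and the `node_norm` and `relation_norm` tables are assumptions for illustration; only `edge`, `node`, `edge_norm`, and the `language` column are named in this README.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Stand-ins for the original text-keyed tables (column names are assumed).
    CREATE TABLE node (uri TEXT PRIMARY KEY, language TEXT);
    CREATE TABLE edge (start_uri TEXT, rel_uri TEXT, end_uri TEXT, weight REAL);
    INSERT INTO node VALUES
        ('/c/en/dog', 'en'), ('/c/en/animal', 'en'), ('/c/de/hund', 'de');
    INSERT INTO edge VALUES
        ('/c/en/dog', '/r/IsA', '/c/en/animal', 2.0),
        ('/c/en/dog', '/r/Synonym', '/c/de/hund', 1.0);

    -- Hypothetical normalized tables: each URI is stored once, keyed by integer.
    CREATE TABLE node_norm (id INTEGER PRIMARY KEY, uri TEXT UNIQUE, language TEXT);
    CREATE TABLE relation_norm (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
    CREATE TABLE edge_norm (
        start_id INTEGER REFERENCES node_norm (id),
        rel_id   INTEGER REFERENCES relation_norm (id),
        end_id   INTEGER REFERENCES node_norm (id),
        weight   REAL
    );

    -- The node table is the source of truth for URIs and languages.
    INSERT INTO node_norm (uri, language) SELECT uri, language FROM node;
    INSERT INTO relation_norm (uri) SELECT DISTINCT rel_uri FROM edge;

    -- Replace each text URI in an edge with the matching integer key.
    INSERT INTO edge_norm
    SELECT s.id, r.id, t.id, e.weight
    FROM edge AS e
    JOIN node_norm     AS s ON s.uri = e.start_uri
    JOIN relation_norm AS r ON r.uri = e.rel_uri
    JOIN node_norm     AS t ON t.uri = e.end_uri;
    """
)


def denormalize(conn: sqlite3.Connection) -> list:
    """Join the integer keys back to URIs to check nothing was lost."""
    sql = """
        SELECT s.uri, r.uri, t.uri, e.weight
        FROM edge_norm AS e
        JOIN node_norm     AS s ON s.id = e.start_id
        JOIN relation_norm AS r ON r.id = e.rel_id
        JOIN node_norm     AS t ON t.id = e.end_id
    """
    return sorted(conn.execute(sql).fetchall())


original = sorted(conn.execute("SELECT * FROM edge").fetchall())
restored = denormalize(conn)
```

Comparing `restored` with `original` is a cheap round-trip check that the integer keys still resolve to the same edges.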
| ## Supported Languages | |
| This normalized version includes edges for 11 languages: | |
| - English (en) | |
| - German (de) | |
| - French (fr) | |
| - Italian (it) | |
| - Spanish (es) | |
| - Arabic (ar) | |
| - Persian (fa) | |
| - Ancient Greek (grc) | |
| - Hebrew (he) | |
| - Latin (la) | |
| - Biblical Hebrew (hbo) | |
| Cross-language connections from other languages to these target languages are preserved. | |
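
The "at least one endpoint in a target language" rule can be expressed as a small predicate. The sketch below is illustrative only: the `keep_edge` helper and the tuple layout of the sample edges are hypothetical, while the language codes are the 11 listed above.

```python
# The 11 target language codes listed in this README.
TARGET_LANGUAGES = frozenset(
    {"en", "de", "fr", "it", "es", "ar", "fa", "grc", "he", "la", "hbo"}
)


def keep_edge(start_lang: str, end_lang: str) -> bool:
    """Keep an edge if its start OR its end node is in a target language."""
    return start_lang in TARGET_LANGUAGES or end_lang in TARGET_LANGUAGES


# Sample edges as (start label, start language, end label, end language).
edges = [
    ("犬", "ja", "hund", "de"),    # cross-language link into German: kept
    ("dog", "en", "animal", "en"),  # both ends in a target language: kept
    ("犬", "ja", "狗", "zh"),       # neither end in a target language: dropped
]

kept = [e for e in edges if keep_edge(e[1], e[3])]
```

Using OR rather than AND is what keeps the `ja` to `de` edge; requiring both endpoints to be in a target language would drop it.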
| ## Original Dataset Information | |
| This work includes data from ConceptNet 5, which was compiled by the Commonsense Computing Initiative. ConceptNet 5 is freely available under the Creative Commons Attribution-ShareAlike license (CC BY SA 4.0) from http://conceptnet.io. | |
| For a full list of licenses and attributions for included resources such as WordNet, Open Multilingual WordNet, and Wikimedia projects, please see the original dataset card. | |
| ## Citation Information | |
| If you use this data in your work, please cite the original ConceptNet 5.5 paper: | |
| ```bibtex | |
| @inproceedings{speer2017conceptnet, | |
| author = {Robyn Speer and Joshua Chin and Catherine Havasi}, | |
| title = {ConceptNet 5.5: An Open Multilingual Graph of General Knowledge}, | |
| booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence}, | |
| year = {2017}, | |
| pages = {4444--4451}, | |
| url = {http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14972} | |
| } | |
| ``` |