Spaces: snskrt/README

13Aluminium committed on May 17, 2025

Commit e9133ed (verified) · 1 Parent(s): e4c7b83

Delete README.md

Files changed (1)

README.md +0 -75

README.md DELETED

@@ -1,75 +0,0 @@
----
-title: README
-emoji: 🐢
-colorFrom: indigo
-colorTo: indigo
-sdk: static
-pinned: true
-license: apache-2.0
-short_description: Sanskrit scripture datasets, structured for NLP tasks.
-thumbnail: >-
-  https://cdn-uploads.huggingface.co/production/uploads/66fa59a2ec6983f03c2dd4e0/lF5toNU3-x6igbMxiA1wI.jpeg
----
-## 1. Shrimad Bhagavad Gita
-[Dataset on Hugging Face](https://huggingface.co/datasets/snskrt/Shrimad_Bhagavad_Gita)
-**Short Description:** A structured, chapter-wise dataset of the Śrīmad Bhagavad Gītā with expanded verse counts, enabling fine-grained analysis and modeling of each śloka.
-## 2. Devi Bhagavatam
-[Dataset on Hugging Face](https://huggingface.co/datasets/snskrt/Devi_Bhagavatam)
-**Dataset Structure:** Each record (CSV/JSON) includes:
-- `skanda` (string): Skanda number, e.g. "1"
-- `adhyaya_number` (string): Adhyāya index, e.g. "१.१"
-- `adhyaya_title` (string): Sanskrit chapter title
-- `a_index` (int): auxiliary sequence index
-- `m_index` (int): main sequence index
-- `text` (string): full śloka text
-**Dataset Description:**
-This dataset contains a complete, structured representation of the Śrīmad Devī-bhāgavatam mahāpurāṇe in CSV format, broken down into Skandas, Adhyāyas, and individual ślokas. It is suited to NLP tasks such as feature extraction, classification, translation, summarization, and generation.
-**Size:** ~18,702 ślokas
-## 3. Shiv Mahapuran
-[Dataset on Hugging Face](https://huggingface.co/datasets/snskrt/Shiv_Mahapuran)
-**Dataset Description:**
-This dataset contains a complete, structured representation of the Śiva Mahāpurāṇa (Śivapurāṇa) in CSV format. Data is organized into Saṃhitās (seven surviving Saṃhitās), Khaṇḍas, Adhyāyas, and individual ślokas, enabling precise NLP work on classical Sanskrit scripture.
-**Size:** ~24,489 ślokas
-**Dataset Structure:** Each record (CSV/JSON) includes:
-- `samhita` (string): Name of the Saṃhitā, e.g. "Rudrasaṃhitā"
-- `khanda` (string): Khaṇḍa name, e.g. "Parvati kand"
-- `khanda_number` (string): Khaṇḍa index, e.g. "1"
-- `adhyay` (string): Adhyāya title or number, e.g. "1.1"
-- `shloka_number` (int): Position of the śloka within the Adhyāya
-- `shloka_text` (string): Full Sanskrit text of the śloka
-## 4. Shiv Puran OCR (Image-Text)
-[Dataset on Hugging Face](https://huggingface.co/datasets/snskrt/Shiv_Puran_Image_text_OCR)
-**Dataset Description:**
-A dataset of cropped śloka images from the Vidyeśvara-saṃhitā, each paired with its transcribed text. Suitable for training or evaluating OCR systems on classical Sanskrit script.
-**Contents:**
-- 734 cropped śloka images
-- A CSV mapping each image filename to its corresponding śloka text
-## 5. Shiv Puran OCR (Object Detection)
-[Dataset on Hugging Face](https://huggingface.co/datasets/snskrt/Shiv_puran_OCR)
-**Dataset Description:**
-Annotations and imagery for training object detection models that distinguish śloka from non-śloka content in scanned scripture pages. Once detected, ślokas can be cropped for OCR or parallel corpus creation.
-**Annotation Structure:**
-- Pages 0–102: Vidyeśvara Saṃhitā (manually annotated)
-- Pages 103–463: Rudra Saṃhitā (model-inferred + manual corrections)
-- Pages 464–508: Śat Rudra Saṃhitā (model-inferred + manual corrections)
-**Data Includes:** Bounding-box coordinates and metadata for each detected region.
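
The page ranges listed under "Annotation Structure" in the deleted README can be expressed as a small lookup, which may help when weighting manually annotated pages differently from model-inferred ones. This is only a sketch: the names `PageRange`, `PAGE_RANGES`, and `describe_page` are hypothetical, and it assumes pages are identified by the same integer indices the README uses, with inclusive bounds. How the dataset itself stores page numbers is not stated in the README.

```python
from typing import NamedTuple, Optional


class PageRange(NamedTuple):
    """One annotated page range (hypothetical helper type)."""
    first: int    # first page index, inclusive
    last: int     # last page index, inclusive
    samhita: str  # Saṃhitā name as spelled in the README
    method: str   # how the annotations were produced


# Ranges copied from the README's "Annotation Structure" list.
# Assumption: both bounds are inclusive integer page indices.
PAGE_RANGES = [
    PageRange(0, 102, "Vidyeśvara Saṃhitā", "manually annotated"),
    PageRange(103, 463, "Rudra Saṃhitā", "model-inferred + manual corrections"),
    PageRange(464, 508, "Śat Rudra Saṃhitā", "model-inferred + manual corrections"),
]


def describe_page(page: int) -> Optional[PageRange]:
    """Return the range containing `page`, or None if the page is not covered."""
    for page_range in PAGE_RANGES:
        if page_range.first <= page <= page_range.last:
            return page_range
    return None


if __name__ == "__main__":
    for page in (0, 250, 508, 509):
        print(page, describe_page(page))
```

A caller could, for example, keep only pages whose `method` is `"manually annotated"` when building a clean evaluation split.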