Buckets:
| Name | Size | Xet hash |
|---|---|---|
| .gitattributes | 2.58 kB | 9e1510ab |
| README.md | 2.54 kB | 18029014 |
| reasoning_instruct_sample_5k.jsonl | 30 MB | d1aebf88 |
Chat2Find Unified Reasoning & Tool Dataset (Public Sample)
This repository contains a 5,000-record public preview of the Chat2Find Unified Reasoning & Tool Dataset.
The full dataset is a premium, reasoning-focused instruction dataset designed for training state-of-the-art conversational AI models. It contains 279,260 trilingual records optimized for complex problem-solving, chain-of-thought reasoning, and sophisticated tool-calling interactions in Sinhala, Tamil, and English.
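A minimal sketch of inspecting the preview file locally, assuming `reasoning_instruct_sample_5k.jsonl` has been downloaded to the working directory and follows the usual JSON Lines convention (one JSON object per line). The record schema is not documented here, so the helper names `load_jsonl` and `key_counts` are illustrative, and the code only reports which top-level keys appear rather than assuming any field names.

```python
import json
from collections import Counter
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file: one JSON value per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def key_counts(records):
    """Count how often each top-level key appears across dict records."""
    counts = Counter()
    for record in records:
        if isinstance(record, dict):
            counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Assumed local path; adjust to wherever the sample was saved.
    sample_path = Path("reasoning_instruct_sample_5k.jsonl")
    if sample_path.exists():
        records = load_jsonl(sample_path)
        print(len(records), "records")
        print(key_counts(records).most_common())
```

Listing the keys first is a cautious way to discover the actual field names before writing any training or filtering code against them.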
📂 Access the Full Dataset
The full 1.8 GB dataset is available as a Gated Repository for commercial and advanced research use.
👉 Access the Full Dataset Here
How to get a license:
- Purchase License: Use our secure Stripe link to purchase a commercial/advanced research license: 👉 Buy Full Dataset License (Stripe)
- Provide Username: During the Stripe checkout, please enter your Hugging Face Username.
- Approval: Once payment is confirmed, we will grant your account access to the gated repository within 24 hours.
📊 Sample Details & Composition
The 5,000 records in this preview are carefully curated to reflect the high quality of the full dataset.
Conversation Flow:
- Single-turn (SFT): 70.0% (3,500 records)
- Multi-turn (Agentic/Chat): 30.0% (1,500 records)
Reasoning & Execution:
- Pure Chain-of-Thought Reasoning: 72.0%
- Tool Calling & API Interaction: 28.0%
Language Breakdown:
- Tamil: 45.7%
- Sinhala: 36.3%
- English: 18.0%
- Note: Singlish and Tanglish code-mixed text appears throughout these records, so the data reflects how South Asian speakers actually mix languages in conversation.
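The percentages above can be turned into approximate record counts for the 5,000-record sample. The sketch below is plain arithmetic on the published figures. Only the conversation-flow counts (3,500 and 1,500) are stated in this card; the other counts are derived and may be off by a few records if the published percentages were rounded. The names `COMPOSITION` and `approximate_counts` are illustrative.

```python
SAMPLE_SIZE = 5_000

# Percentages copied from the composition lists in this card.
COMPOSITION = {
    "Conversation flow": {
        "Single-turn (SFT)": 70.0,
        "Multi-turn (Agentic/Chat)": 30.0,
    },
    "Reasoning & execution": {
        "Pure Chain-of-Thought Reasoning": 72.0,
        "Tool Calling & API Interaction": 28.0,
    },
    "Language": {
        "Tamil": 45.7,
        "Sinhala": 36.3,
        "English": 18.0,
    },
}


def approximate_counts(shares, total):
    """Convert percentage shares into rounded record counts."""
    return {label: round(total * pct / 100) for label, pct in shares.items()}


for group, shares in COMPOSITION.items():
    # Each breakdown should cover the whole sample.
    assert abs(sum(shares.values()) - 100.0) < 1e-6
    print(group)
    for label, count in approximate_counts(shares, SAMPLE_SIZE).items():
        print(f"  {label}: ~{count} records")
```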
🌟 What's in the Full Dataset?
The full 1.8 GB dataset contains 279,260 records offering a massive scale-up of everything seen in this sample.
- Over 10,800 deep multi-turn interactions.
- Hundreds of thousands of localized, culturally aware logic puzzles, tool invocations, and code-mixed conversations not found in standard open-source datasets.
Stay tuned to chat2find.com for updates.
- Total size: 30 MB
- Files: 3
- Last updated: May 24