Mutibrains/Chat2Find-Instruct-Reasoning

30 MB, 3 files, updated about 2 months ago

Name	Size	Uploaded	Xet hash
.gitattributes	2.58 kB	about 2 months ago	9e1510ab
README.md	2.54 kB	about 2 months ago	18029014
reasoning_instruct_sample_5k.jsonl	30 MB	about 2 months ago	d1aebf88


Chat2Find Unified Reasoning & Tool Dataset (Public Sample)

This repository contains a 5,000-record public preview of the Chat2Find Unified Reasoning & Tool Dataset.

The full dataset is a premium instruction dataset for training conversational AI models. It contains 279,260 trilingual records (Sinhala, Tamil, and English) focused on complex problem-solving, chain-of-thought reasoning, and tool-calling interactions.

📂 Access the Full Dataset

The full 1.8 GB dataset is available as a gated Hugging Face repository for commercial and advanced research use.

👉 Access the Full Dataset Here

How to get a license:

Purchase License: Use our secure Stripe link to purchase a commercial/advanced research license: 👉 Buy Full Dataset License (Stripe)
Provide Username: During the Stripe checkout, enter your Hugging Face username.
Approval: Once payment is confirmed, we will grant your account access to the gated repository within 24 hours.

📊 Sample Details & Composition

The 5,000 records in this preview were selected to illustrate the quality and format of the full dataset.

Conversation Flow:

Single-turn (SFT): 70.0% (3,500 records)
Multi-turn (Agentic/Chat): 30.0% (1,500 records)

Reasoning & Execution:

Pure Chain-of-Thought Reasoning: 72.0%
Tool Calling & API Interaction: 28.0%

Language Breakdown:

Tamil: 45.7%
Sinhala: 36.3%
English: 18.0%
Note: Singlish (Sinhala-English) and Tanglish (Tamil-English) code-mixed text is embedded throughout these records to reflect realistic South Asian conversational usage.
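To check the composition figures above against the sample file, a short script can tally them. This is a minimal sketch that assumes a hypothetical schema: each line of `reasoning_instruct_sample_5k.jsonl` is taken to be a JSON object with a `messages` list (entries carrying a `role` key, with tool use marked by a `tool` role or a `tool_calls` key) and a `language` field. The README does not document the real field names, so adjust the keys before relying on the numbers.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str]) -> dict:
    """Tally conversation flow, execution type, and language shares (in %).

    ASSUMED schema (hypothetical, not documented by the dataset):
      {"language": str, "messages": [{"role": str, ...}, ...]}
    """
    flow, execution, language = Counter(), Counter(), Counter()
    total = 0
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines in the JSONL file
            continue
        record = json.loads(line)
        total += 1
        messages = record.get("messages", [])

        # Assumed definition: more than one user turn means multi-turn.
        user_turns = sum(1 for m in messages if m.get("role") == "user")
        flow["multi-turn" if user_turns > 1 else "single-turn"] += 1

        # Assumed marker of tool use: a "tool" role or a "tool_calls" key.
        has_tools = any(m.get("role") == "tool" or "tool_calls" in m for m in messages)
        execution["tool calling" if has_tools else "reasoning only"] += 1

        language[record.get("language", "unknown")] += 1

    def shares(counter: Counter) -> dict:
        # Percentage of all records, rounded to one decimal like the README.
        return {k: round(100 * v / total, 1) if total else 0.0 for k, v in counter.items()}

    return {
        "total": total,
        "flow": shares(flow),
        "execution": shares(execution),
        "language": shares(language),
    }
```

Usage, assuming the file sits in the working directory: `with open("reasoning_instruct_sample_5k.jsonl", encoding="utf-8") as f: print(summarize(f))`. Code-mixed Singlish or Tanglish records are counted under whatever `language` label they carry, so the tally mirrors the three-way split rather than separating out code-mixing.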

🌟 What's in the Full Dataset?

The full 1.8 GB dataset contains 279,260 records, about 56 times the size of this sample, including:

Over 10,800 multi-turn interactions.
Hundreds of thousands of localized, culturally aware logic puzzles, tool invocations, and code-mixed conversations not found in standard open-source datasets.

Stay tuned to chat2find.com for updates.

Last updated: May 24