data-gen / notes.md

points of diversity - use cases (tools, bot prompt types, KBs) | user persona (user characteristics, conversation characteristics)
conversation characteristics - recalls, length, personalisation, errors
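
One way to turn these diversity axes into concrete generation targets is to enumerate their combinations and sample from them. The sketch below is only an illustration: the axis values (`USE_CASES`, `PERSONAS`, `LENGTHS`) and the `ScenarioSpec` name are hypothetical placeholders, not values from these notes.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioSpec:
    """One point in the diversity grid (all field values are illustrative)."""
    use_case: str
    persona: str
    recalls: bool          # does the user refer back to earlier turns?
    length: str            # rough conversation length bucket
    personalisation: bool  # does the bot need user-specific context?
    errors: bool           # should the conversation contain an injected error?


# Hypothetical axis values; replace with the real use cases and personas.
USE_CASES = ["order_support", "billing_query", "kb_lookup"]
PERSONAS = ["impatient", "detail_oriented"]
LENGTHS = ["short", "long"]


def all_specs():
    """Enumerate the full cross product of the axes."""
    combos = itertools.product(
        USE_CASES, PERSONAS, [False, True], LENGTHS, [False, True], [False, True]
    )
    return [ScenarioSpec(*combo) for combo in combos]


def sample_specs(n, seed=0):
    """Draw n distinct specs, reproducibly, to use as generation prompts."""
    rng = random.Random(seed)
    specs = all_specs()
    return rng.sample(specs, min(n, len(specs)))
```

Sampling from the grid rather than hand-picking scenarios keeps coverage across the axes roughly even; whether uniform sampling is the right choice here is an open assumption.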

DATA GENERATION:

Economic Times India - 2022, 2023, 2024, 2025
https://www.bls.gov/

memory protocols - different sorts of memory, how to handle memory, deciding what to forget
long horizon context - 10 hrs
human in the loop - pause and resume - what all is done
SQL on a large number of rows
deep queries
good hypothesis of what to test - like DFS is a better way to solve the problem
deep research report - McKinsey reports - language and ways
generate long documents
verification and self-check loops - first I need to have confidence and then increase the confidence - what is important to verify here

tool output conflicts with actual variables

Moving To RL

adding verifier - add small verifier after we get the trajectory

Overall

Looking at Arya for data gen
Looking at Sierra and other workflow providers for data gen

// quantity works better than quality in data gen with LLMs --> generate more samples and then dedup, rather than constraining to a smaller, higher-quality set
lesser - 5 companies, 30 use cases --> 11/30 kept
more - 5 companies, 60 use cases --> 15/60 kept
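
The generate-then-dedup idea can be sketched as below. The notes do not say how deduplication was done, so the token-set Jaccard similarity and the `0.8` threshold are assumptions chosen for illustration. The keep-rate arithmetic uses the two runs recorded above: the larger run keeps a smaller fraction (15/60 = 25% vs. 11/30, about 37%) but more use cases in absolute terms (15 vs. 11).

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens as a set (a crude normalisation)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup(samples, threshold=0.8):
    """Greedy near-duplicate filter: keep a sample only if it is below the
    similarity threshold against everything already kept."""
    kept, kept_tokens = [], []
    for sample in samples:
        t = tokens(sample)
        if all(jaccard(t, k) < threshold for k in kept_tokens):
            kept.append(sample)
            kept_tokens.append(t)
    return kept


def keep_rate(kept, generated):
    """Fraction of generated samples that survive filtering."""
    return kept / generated


small_run = keep_rate(11, 30)  # 5 companies, 30 use cases
large_run = keep_rate(15, 60)  # 5 companies, 60 use cases
```

The greedy filter is quadratic in the number of kept samples, which is fine at the scale of tens of use cases; larger batches would likely need a different approach.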

removing stakeholders with X cross mapping

Types of errors:

user prompt was coming in Hindi because of user language - fixed by prompting
user did not comply with tool results - tool said order had tomato, banana - user said issue in tomato and spinach
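
Both error types lend themselves to the kind of small verifier run over a finished trajectory. The sketch below is a rough illustration, not the checks actually used: the function names are hypothetical, the item lists are assumed to have been extracted from the user turn and tool result already, and the language check only detects Devanagari script (Unicode block U+0900 to U+097F), so Hindi written in Latin script would pass unnoticed.

```python
import re

# Devanagari Unicode block, used to write Hindi.
DEVANAGARI = re.compile(r"[\u0900-\u097F]")


def contains_devanagari(text):
    """True if the text contains at least one Devanagari character."""
    return bool(DEVANAGARI.search(text))


def unsupported_items(user_items, tool_items):
    """Items the simulated user mentions that the tool result never returned."""
    tool = {item.strip().lower() for item in tool_items}
    user = {item.strip().lower() for item in user_items}
    return sorted(user - tool)


def verify_turn(user_text, user_items, tool_items, expected_language="en"):
    """Return a list of issues found in one user turn (empty list = passes)."""
    issues = []
    if expected_language == "en" and contains_devanagari(user_text):
        issues.append("user turn contains Devanagari text but English was expected")
    extra = unsupported_items(user_items, tool_items)
    if extra:
        issues.append("user mentions items not in tool result: " + ", ".join(extra))
    return issues
```

On the example above, `verify_turn("There is an issue with the tomato and spinach", ["tomato", "spinach"], ["tomato", "banana"])` flags `spinach` as unsupported by the tool result.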

--> gave complexity rubric for state budget