
	- point of diversity - use cases (tools, bot prompt types, KBs) \| user persona (user characteristics, conversation characteristics)
	- conversation characteristics - recalls, length, personalisation, errors




	DATA GENERATION:
	1. Economic Times India - 2022, 2023, 2024, 2025
	2. https://www.bls.gov/



	____
	memory protocols - different sort of memories
	how to handle memory
	decide what to forget
	long horizon context - 10hrs
	human in loop - pause and resume - what all is done
	SQL on a large number of rows - deep queries
	good hypothesis of what to test - e.g. DFS is a better way to solve the problem
	deep research report - McKinsey reports - language and ways
	generate long documents
	4. verification and self-check loops - first I need to have confidence, then increase it - what is important to verify here?


	----
	- tool output conflicts with actual variables





	__________________________________________________
	## Moving To RL
	- adding verifier - add small verifier after we get the trajectory
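A minimal sketch of what a small post-trajectory verifier could look like, assuming a trajectory is a list of role-tagged steps. The names `Step`, `Verdict`, `verify_trajectory`, and `reward`, the specific checks, and the binary reward are all illustrative assumptions, not something these notes specify.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    role: str      # assumed roles: "user", "assistant", or "tool"
    content: str


@dataclass
class Verdict:
    passed: bool
    reasons: list = field(default_factory=list)


def verify_trajectory(steps, max_steps=50):
    """Run cheap structural checks on a finished trajectory (hypothetical checks)."""
    reasons = []
    if not steps:
        reasons.append("empty trajectory")
    if len(steps) > max_steps:
        reasons.append("too many steps")
    if steps and steps[-1].role != "assistant":
        reasons.append("does not end with an assistant turn")
    if any(not s.content.strip() for s in steps):
        reasons.append("blank step content")
    return Verdict(passed=not reasons, reasons=reasons)


def reward(verdict):
    """Map the verdict to a scalar; a binary reward is only one possible choice."""
    return 1.0 if verdict.passed else 0.0
```

Task-specific checks (for example, comparing user claims against tool outputs) would plug in as further conditions inside `verify_trajectory`.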



	## Overall
	- Looking at Arya for data gen
	- Looking at Sierra and other workflow providers for data gen

	______________________
	// quantity works better than quality in data gen with LLMs --> generate more samples and then dedup, rather than constraining to a smaller, higher-quality set
	fewer - 5 companies, 30 use cases --> 11/30 kept
	more - 5 companies, 60 use cases --> 15/60 kept
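A rough sketch of the generate-then-dedup step, assuming near-duplicates are detected by token-set Jaccard similarity. The similarity measure, the 0.8 threshold, and all function names are assumptions for illustration; the notes do not say how dedup was actually done. `keep_rate` simply restates the counts above: the larger run keeps more use cases in absolute terms (15 vs 11) at a lower keep rate.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def token_set(text: str) -> frozenset:
    return frozenset(normalize(text).split())


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity of two token sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedup(samples, threshold=0.8):
    """Greedy filter: keep a sample only if it is below the threshold against every kept one."""
    kept, kept_tokens = [], []
    for s in samples:
        toks = token_set(s)
        if all(jaccard(toks, k) < threshold for k in kept_tokens):
            kept.append(s)
            kept_tokens.append(toks)
    return kept


def keep_rate(kept: int, generated: int) -> float:
    return kept / generated


if __name__ == "__main__":
    print(dedup(["Track my order status", "track my order status!", "Cancel a subscription"]))
    print(round(keep_rate(11, 30), 2), round(keep_rate(15, 60), 2))
```

The greedy pass is quadratic in the number of kept samples, which is fine at the scale of tens of use cases; a larger run would likely need hashing or embedding-based clustering instead.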

	removing stakeholders with X cross mapping


	## Types of errors:
	1. user prompt was coming in Hindi because of user language - fixed by prompting
	2. user did not comply with tool results - tool said order had tomato, banana - user said issue with tomato and spinach
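Both error types above can be caught with cheap automatic checks. The sketch below is one possible shape, under stated assumptions: the Hindi check only detects Devanagari script (romanized Hindi would slip through), and the item check assumes the items mentioned by the simulated user have already been extracted into a list. Both function names are hypothetical.

```python
def has_devanagari(text: str) -> bool:
    """True if any character falls in the Unicode Devanagari block (U+0900 to U+097F)."""
    return any("\u0900" <= ch <= "\u097F" for ch in text)


def items_not_in_tool_result(user_items, tool_items):
    """Return items the user mentioned that the tool output never listed."""
    tool = {i.strip().lower() for i in tool_items}
    mentioned = {i.strip().lower() for i in user_items}
    return sorted(mentioned - tool)


if __name__ == "__main__":
    # Error 2 from the list: the tool returned tomato and banana, the user complained about spinach.
    print(items_not_in_tool_result(["tomato", "spinach"], ["tomato", "banana"]))
```

A non-empty result from `items_not_in_tool_result` flags a turn where the simulated user contradicts the tool output, which can then be regenerated or dropped.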



	--> gave complexity rubric for state budget