- point of diversity
  - use cases (tools, bot prompt types, KBs) | user persona (user characteristics, conversation characteristics)
  - conversation characteristics: recalls, length, personalisation, errors

DATA GENERATION:

1. Economic Times India - 2022, 23, 24, 25
2. https://www.bls.gov/

____

- memory protocols - different sorts of memories, how to handle memory, decide what to forget
- long horizon context - 10 hrs
- human in loop - pause and resume - what all is done
- SQL on large number of rows
- deep queries
- good hypothesis of what to test - like DFS is a better way to solve the problem
- deep research report - McKinsey reports - language and ways
- generate long documents

4. verification and self-check loops - first I need to have confidence and then increase the confidence - what is important to verify here

----

- tool output conflicts with actual variables

__________________________________________________

## Moving To RL

- adding verifier - add small verifier after we get the trajectory

## Overall

- Looking at Arya for data gen
- Looking at Sierra and other workflow providers for data gen

______________________

// quantity works better than quality in data gen with LLM --> generate a larger number of samples and then dedup, rather than constraining to a smaller quality set

- lesser - 5 companies, 30 use cases --> 11/30 kept
- more - 5 companies, 60 use cases --> 15/60 kept
- removing stakeholders with X cross mapping

## Types of errors:

1. user prompt was coming in Hindi because of user language = fixed by prompting
2. user did not comply with tool results - tool said order had tomato, banana - user said issue in tomato and spinach

--> gave complexity rubric for state budget
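
The second error type above (simulated user naming items the tool result never returned) looks like something a small post-trajectory verifier could flag. A minimal sketch, assuming the tool output and the user's mentioned items are already extracted as lists of strings; the function names `unsupported_items` and `user_turn_is_consistent` are hypothetical, not from any existing pipeline:

```python
from typing import Iterable, List


def normalise(item: str) -> str:
    """Lowercase and trim so 'Tomato ' and 'tomato' compare equal."""
    return item.strip().lower()


def unsupported_items(tool_items: Iterable[str], user_items: Iterable[str]) -> List[str]:
    """Return items the user mentioned that the tool result does not contain.

    Assumption: item extraction from the raw user turn happens upstream.
    """
    known = {normalise(i) for i in tool_items}
    seen = set()
    missing = []
    for raw in user_items:
        item = normalise(raw)
        if item not in known and item not in seen:
            seen.add(item)
            missing.append(item)
    return missing


def user_turn_is_consistent(tool_items: Iterable[str], user_items: Iterable[str]) -> bool:
    """True when every item the user mentions appears in the tool result."""
    return not unsupported_items(tool_items, user_items)


if __name__ == "__main__":
    # Example from the notes: tool said tomato, banana; user said tomato, spinach.
    print(unsupported_items(["tomato", "banana"], ["tomato", "spinach"]))  # ['spinach']
```

Exact string matching after normalisation is the simplest option; synonyms or misspellings would need fuzzier matching, which this sketch does not attempt.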