Make a script that runs approximately 5-10 test prompts and reports general statistics on the results (precision, recall, etc.).
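One possible shape for such a script, as a minimal sketch. It assumes each test prompt carries a binary expected label and that the system under test can be wrapped in a callable that takes a prompt and returns a bool. The names `PromptCase`, `evaluate`, and `predict` are hypothetical and only for illustration. The stub predictor in the demo is a placeholder for the real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class PromptCase:
    """One test prompt with its expected (ground-truth) label. Hypothetical structure."""
    prompt: str
    expected: bool


def evaluate(cases: Iterable[PromptCase], predict: Callable[[str], bool]) -> Dict[str, float]:
    """Run every prompt through `predict` and compute confusion-matrix statistics."""
    tp = fp = fn = tn = 0
    for case in cases:
        predicted = predict(case.prompt)
        if predicted and case.expected:
            tp += 1
        elif predicted and not case.expected:
            fp += 1
        elif not predicted and case.expected:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    # Guard every ratio against a zero denominator (e.g. no positive predictions).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / total if total else 0.0

    return {
        "total": total, "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Assumption: a handful of hand-labelled prompts; replace with the real test set.
    cases = [
        PromptCase("refund my order", True),
        PromptCase("what is the weather", False),
        PromptCase("I want my money back", True),
        PromptCase("tell me a joke", False),
        PromptCase("cancel and refund", True),
    ]

    # Assumption: a keyword stub standing in for the real system under test.
    def predict(prompt: str) -> bool:
        return "refund" in prompt

    for name, value in evaluate(cases, predict).items():
        print(f"{name}: {value}")
```

Reporting the raw counts alongside the ratios is deliberate: with only 5-10 prompts, a single misclassification moves precision or recall by a large margin, so the counts help put the percentages in context.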