[bug] GPQA-Diamond ground truth answers are all D

This line [here](https://github.com/mlfoundations/evalchemy/blob/6ed674159b37f740f2353a86f596f49f6ac13c19/eval/chat_benchmarks/GPQADiamond/eval_instruct.py#L188) in `generate_multiple_choice_answers` causes every ground-truth answer to be mapped to D.
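One way to confirm the symptom is to count the ground-truth letters across all examples. If the correct answer is shuffled in with the three distractors, the counts should be spread roughly evenly over A to D, not all on D. Below is a minimal sketch under that assumption. It tracks the correct answer's position through the shuffle by index, so the letter is computed *after* shuffling. The helpers `build_choices` and `letter_distribution` are hypothetical and are not evalchemy's code.

```python
import random
from collections import Counter

LETTERS = "ABCD"


def build_choices(correct, incorrect, rng):
    """Hypothetical helper: shuffle the correct answer in with the distractors.

    Returns (shuffled_choices, ground_truth_letter). The correct answer starts
    at index 0; we shuffle an index permutation and find where index 0 landed,
    which also works if a distractor happens to have the same text.
    """
    options = [correct] + list(incorrect)
    order = list(range(len(options)))
    rng.shuffle(order)
    shuffled = [options[i] for i in order]
    gt_index = order.index(0)  # position of the original correct answer after shuffling
    return shuffled, LETTERS[gt_index]


def letter_distribution(examples, seed=0):
    """Count ground-truth letters over (correct, [incorrect, ...]) pairs."""
    rng = random.Random(seed)
    counts = Counter()
    for correct, incorrect in examples:
        _, letter = build_choices(correct, incorrect, rng)
        counts[letter] += 1
    return counts
```

Running `letter_distribution` on the real dataset's ground-truth letters (or on synthetic examples like those in the check below) should show all four letters. A result such as `Counter({'D': 198})` would reproduce the bug described here.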