When I evaluated Qwen2-7B-Instruct on data/test/GSM8K_test_data.jsonl and data/test/MATH_test_data.jsonl using eval_math.py, the accuracy I obtained was higher than the values reported in the paper. Could I have made a mistake somewhere in my evaluation setup?