I find that the evaluation always runs over all items for every user. Do you have an example of evaluation on sampled items?
I know that this kind of metric has some bias, but some papers still use it.
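
To be concrete, by sampled evaluation I mean ranking one held-out positive item against a fixed number of randomly sampled negatives and then computing HR@K and NDCG@K. Below is a minimal sketch of that protocol in plain Python. It is not NeuRec code: `sampled_eval`, `score_fn`, and every parameter (including the default of 99 negatives) are hypothetical names and values chosen only for illustration.

```python
import math
import random


def sampled_eval(score_fn, user, pos_item, interacted, num_items,
                 num_neg=99, k=10, rng=None):
    """Rank one held-out positive against sampled negatives.

    score_fn(user, item) -> float is a hypothetical stand-in for a model.
    Returns (HR@k, NDCG@k) for this single user.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible

    # Candidate negatives: items the user has not interacted with.
    candidates = [i for i in range(num_items)
                  if i not in interacted and i != pos_item]
    negatives = rng.sample(candidates, min(num_neg, len(candidates)))

    pos_score = score_fn(user, pos_item)
    # 0-based rank: number of sampled negatives scored above the positive.
    rank = sum(1 for i in negatives if score_fn(user, i) > pos_score)

    hit = 1.0 if rank < k else 0.0
    ndcg = 1.0 / math.log2(rank + 2) if rank < k else 0.0
    return hit, ndcg
```

Averaging the returned pair over all test users would give the sampled HR@K and NDCG@K. Each user needs only `num_neg + 1` scores instead of one score per item, which is why I would expect this to be much faster than full ranking.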

Also, why is there no NeuMF implementation in NeuRec3.X? Is it too slow to evaluate on all negative items?