It may be worth adding non-NaN-aware versions of these functions to this module. They are usually faster, since they skip the NaN handling. Here are some:
# taken from xgboost
from numpy.core import umath as um
# save those O(100) nanoseconds!
umr_maximum = um.maximum.reduce
umr_minimum = um.minimum.reduce
umr_sum = um.add.reduce
umr_prod = um.multiply.reduce
umr_any = um.logical_or.reduce
umr_all = um.logical_and.reduce
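For illustration, here is a minimal sketch of how the plain reductions differ from the nan-aware ones. It assumes the public `np.<ufunc>.reduce` spelling is an acceptable stand-in for the `numpy.core.umath` import above (they are the same ufunc objects). The variable names are made up for this example, not part of any module.

```python
import numpy as np

# Same reductions as above, reached through the public ufunc objects
# (assumption: the module does not need the private numpy.core path).
umr_maximum = np.maximum.reduce
umr_sum = np.add.reduce

clean = np.array([1.0, 4.0, 2.0])
dirty = np.array([1.0, np.nan, 2.0])

# On NaN-free data the plain reductions give the usual results.
plain_max = umr_maximum(clean)
plain_sum = umr_sum(clean)

# With a NaN present, the plain reduction propagates it,
# while the nan-aware version skips it.
propagated = umr_sum(dirty)
skipped = np.nansum(dirty)
```

So the plain versions are only a drop-in replacement when the caller already knows the input has no NaNs, or wants NaN to propagate.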