Why does ReLU train faster than sigmoid?


Answer

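Efficiency: ReLU, max(0, x), is faster to compute than the sigmoid, 1/(1 + e^(-x)), because it needs only a comparison rather than an exponential and a division. Its derivative is also cheaper: it is simply 0 or 1, whereas the sigmoid's derivative is σ(x)(1 − σ(x)). This affects both training and inference time for neural networks. The saving is only a constant factor, but constant factors can matter when the activation is evaluated for every neuron on every forward and backward pass.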
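A minimal Python sketch of the two activations and their derivatives, using only the standard library (the function names are illustrative, not taken from the answer). The timing loop is only a rough comparison: results depend on hardware and interpreter, and in pure Python the interpreter overhead may hide much of the difference.

```python
import math
import timeit


def relu(x: float) -> float:
    # max(0, x): a single comparison, no transcendental function
    return x if x > 0.0 else 0.0


def relu_grad(x: float) -> float:
    # Derivative is 1 for x > 0 and 0 otherwise (0 chosen at x == 0 by convention)
    return 1.0 if x > 0.0 else 0.0


def sigmoid(x: float) -> float:
    # 1 / (1 + e^-x): needs an exponential and a division
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    # sigma'(x) = sigma(x) * (1 - sigma(x)); this sketch recomputes sigma,
    # whereas frameworks usually reuse the value saved from the forward pass
    s = sigmoid(x)
    return s * (1.0 - s)


if __name__ == "__main__":
    xs = [i / 100.0 for i in range(-500, 501)]  # inputs in [-5, 5]
    # Forward value times derivative for each input, repeated for measurable times
    t_relu = timeit.timeit(lambda: [relu(x) * relu_grad(x) for x in xs], number=100)
    t_sig = timeit.timeit(lambda: [sigmoid(x) * sigmoid_grad(x) for x in xs], number=100)
    print(f"relu: {t_relu:.4f}s  sigmoid: {t_sig:.4f}s")
```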

Answer

SITHU Nyein: Does it also have to do with the fact that ReLU has less noise (it deactivates neurons below zero completely, unlike sigmoid)?
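A small sketch of the effect the comment describes, assuming pre-activations drawn from a standard normal distribution (an assumption for illustration only). About half of the ReLU outputs come out exactly zero, so those neurons pass no signal and no gradient, while sigmoid outputs are never exactly zero. It should print a value near 0.5 for ReLU and 0.0 for sigmoid; whether this sparsity actually speeds up training is not shown here.

```python
import math
import random


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fraction_exact_zero(values: list[float]) -> float:
    # Share of activations that are exactly 0 ("switched off" neurons)
    return sum(1 for v in values if v == 0.0) / len(values)


random.seed(0)
# Hypothetical pre-activations: 10,000 samples from N(0, 1)
pre = [random.gauss(0.0, 1.0) for _ in range(10_000)]
relu_out = [relu(x) for x in pre]
sig_out = [sigmoid(x) for x in pre]

print(f"ReLU exact zeros:    {fraction_exact_zero(relu_out):.3f}")
print(f"sigmoid exact zeros: {fraction_exact_zero(sig_out):.3f}")
```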

