Parametric flatten T-swish: An adaptive nonlinear activation function for deep learning
ReLU exhibits two shortcomings that hinder the training of deep neural networks: 1) the negative cancellation property of ReLU treats all negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness to the networks.
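To make the two shortcomings concrete, the sketch below contrasts standard ReLU with a flatten T-swish-style activation carrying a shift parameter T. The exact PFTS formula is not given in this excerpt; the form used here (x·sigmoid(x) + T for x ≥ 0, and T otherwise, with T treated as learnable in the parametric variant) is an assumption for illustration.

```python
import numpy as np

def relu(x):
    # Standard ReLU: every negative input is cancelled to zero,
    # discarding whatever information it carried (shortcoming 1).
    return np.maximum(0.0, x)

def pfts(x, T=-0.2):
    # Hedged sketch of a flatten T-swish-style activation.
    # For x >= 0 it behaves like swish (x * sigmoid(x)) shifted by T;
    # for x < 0 it outputs T instead of zero, so negative inputs are
    # not fully cancelled. Making T a trainable parameter (rather than
    # a fixed constant) addresses shortcoming 2: the activation can
    # adapt its shape during training. T = -0.2 here is an assumed
    # default, not a value taken from the paper.
    sig = 1.0 / (1.0 + np.exp(-x))
    return np.where(x >= 0.0, x * sig + T, T)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))   # all negatives mapped to 0
print(pfts(x))   # negatives mapped to T instead of being zeroed out
```

In a deep-learning framework, T would be registered as a per-layer learnable parameter and updated by backpropagation alongside the weights.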