The primary reason NNs fail to learn such numerical representations is the use of non-linear activation functions in the hidden layers of the network. These activations are crucial for learning the abstract, non-linear relationships between inputs and labels, but they break down for numbers outside the range seen in the training data: saturating functions like tanh or sigmoid flatten out for large inputs, so the network's output is effectively clamped to what it saw during training. As a result, such networks are very good at memorizing the numerical patterns in the training set but extrapolate that representation poorly.
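The failure is easy to reproduce. Below is a minimal sketch (assuming PyTorch; the architecture, training range, and hyperparameters are illustrative choices, not taken from any specific paper) that trains a small tanh MLP on the identity function f(x) = x for inputs in [-5, 5], then queries it outside that range:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small MLP with a saturating non-linearity (tanh), trained to learn
# the identity function f(x) = x on inputs drawn from [-5, 5].
model = nn.Sequential(
    nn.Linear(1, 8),
    nn.Tanh(),
    nn.Linear(8, 8),
    nn.Tanh(),
    nn.Linear(8, 1),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

# Training data covers only the interval [-5, 5] (an assumed range).
x_train = torch.linspace(-5, 5, 200).unsqueeze(1)
y_train = x_train.clone()  # identity target: y = x

for step in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(x_train), y_train)
    loss.backward()
    optimizer.step()

# Probe the model inside and outside the training range.
with torch.no_grad():
    for x in [0.0, 4.0, 10.0, 50.0]:
        pred = model(torch.tensor([[x]])).item()
        print(f"input={x:6.1f}  prediction={pred:8.3f}  abs error={abs(pred - x):8.3f}")
```

Inside [-5, 5] the predictions track the target closely, but at 10 or 50 the tanh units saturate and the output barely moves past the edge of the training range, so the absolute error grows roughly linearly with the input. The network has memorized the mapping on the interval it saw rather than learning the underlying numerical identity.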