Created by: PeganovAnton
Consider seq2seq generating several sentences. y_predicted is list of lists of integers or 2D numpy.ndarray. When language model task is trained y_predicted, has shape [sequence_length, batch_size]. Old function does not allow by-token accuracy calculation in these cases.