In an RNN, you could connect each hidden state at time step t, h(t) to h(t-N), i...

gwern · on Feb 4, 2019

You could, but it's not equivalent, and no one seems to have been able to use clockwork RNNs or related archs to achieve similar performance, so the differences would seem to make a difference.

trott · on Feb 4, 2019

Right. I'm just saying that this myopia is not a fundamental property of the recurrence any more than of convolution.

Clockwork RNNs subsample, BTW, so they are more analogous stride=2 in CNNs than to dilation.