Long-Horizon Dependency Modeling in Machine Learning Without Explicit Temporal Supervision
Keywords:
Long-horizon, dependency modeling, machine learning, implicit supervision, sequence learning, temporal modeling, deep learning, attention-free models, S4, RWKV, Hyena

Abstract
Long-horizon dependency modeling is a central challenge in machine learning, particularly in sequential decision-making, generative modeling, and video understanding. Traditional approaches such as recurrent architectures and Transformers rely on explicit temporal supervision and attention mechanisms, which are computationally costly and limit scalability to long sequences. Recent advances aim to circumvent the need for such explicit supervision by learning implicit representations that capture temporal dependencies over long horizons. This paper surveys the current state of these methodologies, evaluates them through comparative analysis, and discusses their potential and limitations.
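The attention-free models named in the keywords (S4, RWKV, Hyena) all build long-range memory from recurrences or long convolutions rather than pairwise attention. As a minimal illustration of the underlying idea, the sketch below runs a diagonal linear state-space recurrence over a signal: per-channel decay rates close to 1 let information from early timesteps persist for hundreds of steps, with no attention and no explicit temporal labels. The function name `diagonal_ssm` and the toy parameter values are illustrative assumptions, not code from the paper.

```python
import numpy as np

def diagonal_ssm(u, a, b, c):
    """Diagonal linear state-space recurrence over a 1-D input signal.

    x_t = a * x_{t-1} + b * u_t   (elementwise; a, b are per-channel)
    y_t = c . x_t                 (readout is a dot product with c)
    """
    state = np.zeros_like(a, dtype=float)
    ys = []
    for u_t in u:
        state = a * state + b * u_t
        ys.append(float(c @ state))
    return np.array(ys)

# Toy parameters (assumed for illustration): decay rates with |a| < 1
# keep the recurrence stable while retaining information over many steps.
a = np.array([0.99, 0.9])   # per-channel decay rates
b = np.array([1.0, 1.0])    # input projection
c = np.array([0.5, 0.5])    # output projection

signal = np.zeros(200)
signal[0] = 1.0             # single impulse at t = 0
response = diagonal_ssm(signal, a, b, c)
# The impulse response decays only exponentially slowly: the channel with
# a = 0.99 still contributes noticeably 200 steps after the input event.
```

Trained models like S4 learn such decay spectra (and much richer parameterizations) directly from data, which is what lets them capture long-horizon dependencies without explicit temporal supervision.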
References
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. In: Advances in Neural Information Processing Systems, 30, 5998–6008.
Gu, A., Goel, K., & Ré, C. (2022). Efficiently modeling long sequences with structured state spaces. In: International Conference on Learning Representations (ICLR).
Peng, B., Alcaide, E., Anthony, Q., et al. (2023). RWKV: Reinventing RNNs for the transformer era. arXiv preprint arXiv:2305.13048.
Dao, T., Fu, D. Y., Ermon, S., Rudra, A., & Ré, C. (2022). FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In: Advances in Neural Information Processing Systems, 35.
Poli, M., Massaroli, S., Nguyen, E., Fu, D. Y., Dao, T., Baccus, S., Bengio, Y., Ermon, S., & Ré, C. (2023). Hyena hierarchy: Towards larger convolutional language models. arXiv preprint arXiv:2302.10866.
Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., ... & Weller, A. (2021). Rethinking attention with performers. In: International Conference on Learning Representations (ICLR).
Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. (2020). Transformers are RNNs: Fast autoregressive transformers with linear attention. In: Proceedings of the 37th International Conference on Machine Learning, 119, 5156–5165.
Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., ... & Ahmed, A. (2020). Big bird: Transformers for longer sequences. In: Advances in Neural Information Processing Systems, 33, 17283–17297.
Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150.
Lin, Z., Feng, M., Santos, C. N. d., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. In: International Conference on Learning Representations (ICLR).
Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In: International Conference on Learning Representations (ICLR).
Al-Rfou, R., Choe, D., Constant, N., Guo, M., & Jones, L. (2019). Character-level language modeling with deeper self-attention. In: Proceedings of the AAAI Conference on Artificial Intelligence, 33(1), 3159–3166.
Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). Efficient transformers: A survey. arXiv preprint arXiv:2009.06732.
Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1, 2978–2988.
License
Copyright (c) 2026 Geoffrey Hinton (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
