Notes · Essay · May 21, 2026

Towards Adaptive Learning Systems.

The potential of Reinforcement Learning in eLearning systems.

By Frederik Willemsen · Reading ~12 min · Sources: 20 references

01 Introduction

With steadily improving Artificial Intelligence algorithms and more datasets available to train them on, the integration of Reinforcement Learning into eLearning systems has become a recent trend. E-learning systems gained enormous popularity during the COVID-19 pandemic, when schools had to close and people had to go online to educate themselves. Duolingo and Babbel are examples of e-learning systems for learning languages, and many more platforms for online education exist, such as Moodle or Kahoot.

As platforms like Duolingo have millions of users, the question arose of how to optimize the learning experience for each user, to make learning more fun, easy, and accessible. Reinforcement Learning offers many possibilities for tackling these problems. Combined with the large amount of data collected on online platforms such as Duolingo [1], it might be a very good fit for making learning more fun and personalized.

Traditionally, learning, and especially education, has been a system stuck without progress. The same questions and the same answers are given to people with different levels of knowledge. Is this the optimal way to learn? Unlikely.

This article is a review of the potential Reinforcement Learning has in eLearning Systems, to make learning more adaptive, efficient, and fun.

Fig 01 · Where reinforcement learning sits within machine learning — and where deep learning intersects all three paradigms.

02 Background

eLearning and adaptive systems

E-Learning systems are defined as all forms of online, electronically supported systems that make it possible to learn and teach. Furthermore, e-learning systems should aid the learner's construction of knowledge, practice, and individual experience. The electronic devices used serve as the medium through which the learning process is implemented. What makes an e-learning system adaptive is its capability to track the learner's knowledge, style, preferences, and behaviours over time. With this data, an adaptive system can use AI/ML to tailor the content to the learner, suggest an optimal learning path, analyse weaknesses, and adjust the learning process and difficulty to the learner's level in real time [2].

Basics of Reinforcement Learning

Reinforcement Learning is a method in machine learning. It focuses on how agents can learn to make decisions through interaction with an environment in order to maximise the cumulative reward over time. While Supervised Learning relies on labeled data, and Unsupervised Learning uncovers hidden patterns, Reinforcement Learning learns optimal actions through trial and error without explicit instructions [3].

Reinforcement Learning therefore faces the exploration-exploitation trade-off: the dilemma between exploring new actions, which could turn out worse than an action already known, and exploiting the known action that currently promises the best reward among all actions tried so far [3].
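One common way to handle this trade-off is an epsilon-greedy rule. The sketch below is a minimal illustration and is not taken from any of the systems reviewed here. The reward estimates, the value of `epsilon`, and the function name are assumptions chosen for the example.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: try any action, chosen uniformly at random.
        return rng.randrange(len(estimates))
    # Exploit: take the action with the highest estimated reward so far.
    return max(range(len(estimates)), key=lambda a: estimates[a])

rng = random.Random(0)           # fixed seed so the run is repeatable
estimates = [0.2, 0.5, 0.1]      # hypothetical reward estimates for three actions

greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
picks = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(1000)]
share_best = picks.count(1) / len(picks)
print(greedy_choice, share_best)
```

With `epsilon=0.0` the agent never explores and always returns the action with the best estimate. With `epsilon=0.1` it still picks that action most of the time but occasionally samples the others, which is how it can discover that an estimate was wrong.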

Fig 02 · The agent–environment loop. Action flows one way, state and reward come back.

State

The part of the environment at a particular moment — the situation the agent is in.

Action

At any state, the agent can choose from multiple actions to take.

Reward

For any action taken in a state, the agent receives a reward telling it how good or bad the action was.

Discount factor

Controls how important future rewards are compared to immediate rewards [4].
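To make the role of the discount factor concrete, the sketch below computes a discounted return, G = r_0 + γ·r_1 + γ²·r_2 + ..., for a short, made-up reward sequence. The reward values and the function name are assumptions for illustration only.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    # Work backwards so each step adds its reward plus the discounted future.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

rewards = [1.0, 0.0, 1.0]        # hypothetical rewards for three consecutive steps

patient = discounted_return(rewards, gamma=1.0)    # future counts fully
balanced = discounted_return(rewards, gamma=0.5)   # future counts half per step
myopic = discounted_return(rewards, gamma=0.0)     # only the first reward counts
print(patient, balanced, myopic)
```

A discount factor close to 1 makes the agent value long-term learning outcomes, while a value close to 0 makes it care almost only about the immediate reward.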

Pedagogical theories

Adaptive eLearning systems are the most effective when they support established theories from pedagogical research and are focused on the learner and their user experience. In Constructivism, learners should build knowledge through experience and interaction, as learning is seen as an active process; this aligns with adaptive technologies such as eLearning systems [5]. Additionally, Mastery Learning states that given enough time and effort, all learners can achieve a high level of understanding [6]. Reinforcement Learning can help here by adjusting the difficulty.

03 Reinforcement Learning in Education

Use cases of RL applications in eLearning

Reinforcement Learning can be applied across a wide range of tasks in e-learning. A few of the tasks for which it has been studied:

Curriculum sequencing
Determine the most effective sequence in which learning material should be presented to the student to maximise learning gains. [7] explored how to optimise teaching with RL by modeling it as a POMDP.
Adaptive feedback
Adapt feedback timing and content based on previous behaviour and interaction of the student. The goal is to help learners with the right information at the right time. [8] studied how RL-driven feedback improves problem solving in physics tutoring systems.
Dynamic difficulty adjustment
Adapt the difficulty level in real time to match the student's ability, reducing frustration and enhancing engagement. In [9], RL-tuned visual memory games led to better engagement and outcomes.
Spaced repetition
RL agents are trained to suggest when learning material should be reviewed so that long-term memorisation is maximised.

Techniques used

A multitude of RL strategies are used in e-learning, depending on the context, the intended implementation, and various design choices. In most systems, the problem is modelled as a Markov Decision Process, which treats a learner's progress as a sequential process from state to state. Techniques such as Q-Learning and Deep Q-Networks are commonly used to learn the value of actions in these states. Multi-armed bandit algorithms, by contrast, are used when the goal is immediate feedback and short-term planning, such as choosing optimal reminders or learning tips. In addition, offline RL methods, which learn from previously logged data, are used for educational purposes because safety and ethical constraints are important.
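As a rough illustration of Q-Learning on a Markov Decision Process, the sketch below trains a tabular agent on a toy learner model. Everything in it is an assumption made for the example: the four mastery levels, the two exercise types, the rewards, and the hyperparameters. It is not the setup of any study cited here.

```python
import random

GOAL = 3                 # mastery levels 0..3; level 3 counts as "mastered"
ACTIONS = (0, 1)         # 0 = easier exercise, 1 = harder exercise

def step(state, action):
    """Hypothetical learner model: hard exercises only pay off after the basics."""
    if action == 0:
        next_state = min(state + 1, GOAL)
    else:
        next_state = min(state + 2, GOAL) if state >= 1 else state
    reward = 1.0 if next_state == GOAL else -0.1   # small cost per extra exercise
    return next_state, reward

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL + 1)]      # Q-table: q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(50):                        # cap on episode length
            if state == GOAL:
                break
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)       # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            next_state, reward = step(state, action)
            # Q-learning update: move toward reward + discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)
```

In this toy model the learned greedy policy starts a beginner on the easier exercise and switches to the harder one once the first level is reached. A Deep Q-Network follows the same update idea but replaces the table with a neural network.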

Rule-based vs. data-driven systems

Legacy eLearning systems rely on rule-based systems, which are designed in an if-then-else manner. This effectively treats all learners alike, assumes everyone starts from the same knowledge level, and provides no personalized support. Data-driven methods use machine learning or reinforcement learning to tailor feedback to the user; however, classical machine learning models depend on the quality of their training data. Reinforcement Learning is therefore used to keep improving these systems on real interaction data after deployment. Such systems dynamically learn decisions based on long-term learning outcomes and then suggest the best decision for each learner. Research has shown that data-driven systems often outperform rule-based systems: a study about adaptive learning in higher education [10] showed that personalized feedback increased academic performance and student engagement.

04 Existing Research

Case studies

In [11], the authors describe how a multi-armed bandit algorithm (a type of RL algorithm) is used in Duolingo to send personalized push notifications. These daily push notifications are selected by a multi-armed bandit system to maximise engagement with the app. The algorithm is context-aware, which means a user with a daily streak is more likely to get a notification to keep their streak than a user who does not have one. Furthermore, the algorithm introduces a recency penalty: a reminder that was sent recently is penalised, so it becomes more effective again the longer it has not been used.
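The interplay of context and recency penalty can be sketched as follows. This is a simplified illustration and not the algorithm from [11]: the template names, reward estimates, penalty size, half-life, and the purely greedy choice are all assumptions made for the example.

```python
def penalized_score(estimate, days_since_used, penalty=0.02, half_life=7.0):
    """Estimated reward minus a penalty that halves every `half_life` days."""
    if days_since_used is None:          # never sent to this user: no penalty
        return estimate
    return estimate - penalty * 0.5 ** (days_since_used / half_life)

def choose_template(estimates, days_since_used, eligible):
    """Greedy choice among the templates that are eligible in this context."""
    candidates = [name for name in estimates if eligible[name]]
    return max(
        candidates,
        key=lambda name: penalized_score(estimates[name], days_since_used[name]),
    )

# Hypothetical reward estimates per notification template.
estimates = {"streak": 0.140, "generic": 0.131, "new_lesson": 0.128}
has_streak = {"streak": True, "generic": True, "new_lesson": True}
no_streak = {"streak": False, "generic": True, "new_lesson": True}

sent_today = {"streak": 0, "generic": 10, "new_lesson": None}
sent_long_ago = {"streak": 30, "generic": 10, "new_lesson": None}

just_used = choose_template(estimates, sent_today, has_streak)
recovered = choose_template(estimates, sent_long_ago, has_streak)
without_streak = choose_template(estimates, sent_long_ago, no_streak)
print(just_used, recovered, without_streak)
```

In this sketch the streak reminder has the best raw estimate, but it loses to another template right after being sent and wins again once the penalty has decayed. For a user without a streak it is never a candidate at all.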

The deployment of these measures led to a measurable increase in user engagement, lesson completions, and daily active users. The RL algorithm works in production and is used daily to suggest automated messages for millions of Duolingo users. Testing showed that 0.4% more lessons were completed, and new-user retention increased by approximately 2% compared to the baseline.

Policy                    Avg. reward (r̄ ±0.00015)    Change vs. baseline
Baseline (random)         0.1295                       -
Template                  0.1311                       +1.2%
Template + UI language    0.1318                       +1.8%

Fig 03 · Multi-armed bandit policies vs. random baseline for Duolingo push notifications. Adapted from Yancey & Settles (2020).
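The percentage changes reported for Fig 03 follow directly from the average rewards. The short check below recomputes them as relative change against the random baseline. The variable names are chosen for this example, and the numbers are the ones given above.

```python
baseline = 0.1295                      # average reward of the random baseline
policies = {
    "Template": 0.1311,
    "Template + UI language": 0.1318,
}

# Relative change in percent: (policy - baseline) / baseline * 100
lifts = {
    name: round((reward - baseline) / baseline * 100, 1)
    for name, reward in policies.items()
}

for name, lift in lifts.items():
    print(f"{name}: +{lift}%")
```

The absolute differences are small, which is typical for engagement metrics at this scale, but each of them is several times larger than the stated ±0.00015 uncertainty.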

In another paper, [12], the authors implemented a Reinforcement Learning pipeline that personalizes the order of questions for each learner. This was achieved by framing the problem as a Markov Decision Process, where states represent the student's knowledge, actions are the selection of the next question, and rewards are based on the correctness and engagement of the learner. For training and testing, the EdNet dataset was used, without live deployment. The pipeline serves as a proof-of-concept, showing how such a system could work and which design principles matter for performance and generalisation. The authors demonstrated that an RL policy can be used effectively in eLearning systems, under the assumption that state representations are well-designed, and highlighted the importance of feature engineering in educational RL.
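To show what such an MDP framing can look like on logged data, the sketch below turns a few made-up interactions into (state, action, reward, next state) transitions. It is not the representation used in [12] and does not use real EdNet fields. The mastery-estimate update, the engagement proxy, the weights, and all names are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    skill: int        # action: which skill the next question targeted
    correct: bool     # logged outcome of the answer
    continued: bool   # hypothetical engagement proxy: did the learner keep going?

def update_state(state, interaction, rate=0.3):
    """Move the mastery estimate of the practised skill toward the observed outcome."""
    new_state = list(state)
    target = 1.0 if interaction.correct else 0.0
    new_state[interaction.skill] += rate * (target - new_state[interaction.skill])
    return tuple(new_state)

def reward(interaction, engagement_weight=0.5):
    """Reward combines correctness with a weighted engagement signal."""
    return float(interaction.correct) + engagement_weight * float(interaction.continued)

# A made-up log for one learner and two skills.
log = [
    Interaction(skill=0, correct=True, continued=True),
    Interaction(skill=1, correct=False, continued=True),
    Interaction(skill=0, correct=True, continued=False),
]

state = (0.5, 0.5)        # initial mastery estimate per skill
transitions = []
for item in log:
    next_state = update_state(state, item)
    transitions.append((state, item.skill, reward(item), next_state))
    state = next_state

rewards = [t[2] for t in transitions]
print(rewards, state)
```

Transitions in this form are what an offline RL method would train on. How the state is built from raw logs is a design choice, and this is the part for which the paper stresses feature engineering.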

On top of that, [13] looks at how deep RL can be applied to provide feedback, hints, or explanations in an intelligent tutoring system. The system seeks to learn when students need help during their assignments and gives feedback at the right moment to maximise cognitive learning. Furthermore, it seeks to explain ideas creatively rather than encouraging clicking and guessing, enhancing engagement and deeper thinking. It learns from past interactions to decide on the best moment to offer guidance, instead of following strict intervals. Trained on real student data, the system did a better job at providing feedback than a human tutor following preset strategies, a striking demonstration of adaptive ML outperforming human action by providing help only when needed.

05 Challenges and Limitations

The large-scale deployment of Reinforcement Learning has already begun with leading companies like Duolingo implementing it to optimise push notifications. In more traditional domains in education, it is still more difficult to implement.

Deployment challenges

RL algorithms generally return better results on larger data samples than on smaller ones. This is related to the exploration-exploitation trade-off: with little data to explore, the algorithm settles quickly on the best option it has seen, but the overall results might still be suboptimal. RL algorithms, like all data-driven methods, are extremely data hungry, so how the states are modelled also matters greatly for the results. In educational institutions, collecting large amounts of data might clash with privacy concerns. Effective data collection might also be a problem in developing countries, which can lead to worse results [14].

Privacy and ethical concerns

Machine learning methods are hungry for data; they need it to improve the models and become more accurate. If you interact with a chatbot, your data will be saved and processed to improve the algorithms, and it is often sold to third parties. On some online platforms you do not actually own your data, because you transfer your copyright when uploading something. Digital assistants like Amazon's “Alexa” have been shown to be vulnerable to privacy exploits and to listen even when deactivated [15].

In the context of e-learning, this raises serious issues. While you interact with online learning solutions, the algorithm learns your habits, how you respond to certain actions, and your consumer behaviour. Your actions are absorbed into the system and used to improve the results for other people. Ultimately, this can lead to a level of surveillance that nobody should have to expect when using an e-learning system. If these systems collect enough data about you, this can potentially lead to identity theft or fraud. [16] outlines several security concerns and common privacy breaches involving AI: data breaches, weak security protocols, third-party vendors, misuse, and inadequate control of data. Other problems include bias and inference risks stemming from the training data, which can lead to worse results for minorities and favour certain demographics over others [17].

06 Opportunities and Future Directions

Hybrid models

In [18], the author presents the idea of blended learning. Blended learning emerged during the COVID-19 pandemic with the popularity of e-learning and “combines online with offline.” Hybrid models can be beneficial because they combine human interaction with the benefits of tech-driven personalization. Reinforcement learning can guide the online part of learning and adjust content in real time, while the offline part can be adjusted according to the suggestions of the models, with live learning data collected for further training. The author concludes that this leads to increased autonomy and engagement, and to gains in the cognitive, social, and emotional aspects of learning. [19] also highlighted in particular the increased flexibility in “time, pace, and mode” of hybrid learning models.

01 · Online preparation · prior to the class · async, RL-driven content sequencing
02 · Classroom learning · in-person · comprehensive, human-led
03 · Feedback analysis · RL agent reads engagement + correctness data
04 · Recap · personalized review · feeds back into 01

Fig 04 · Blended learning workflow. The RL loop wraps offline classroom learning instead of replacing it.

Conclusion

The goal of this review was to showcase the potential and risks Reinforcement Learning has in e-learning systems. It was shown that RL has already been successfully integrated in deployed e-learning systems, and millions of users already interact with it knowingly or unknowingly. Use cases range from curriculum sequencing to adaptive feedback, dynamic difficulty adjustment, and spaced repetition. While the potential is immense, the loss of control over one's own data is a serious problem that should not be forgotten.

Future implementations in education should focus on hybrid learning. A promising idea is blended learning, which combines offline with online learning and tries to optimise offline learning with online algorithms.

Reinforcement Learning offers a powerful framework for developing smarter learning systems, ones that focus more on the individual and can give more precise learning instructions. But its success will depend on how well it is integrated with educational theory and the human side of learning. Without human interaction, other aspects of learning, such as the social and emotional ones, might suffer.

07 References

[1] Bicknell, K., Brust, C., & Settles, B. (2023). How Duolingo's AI learns what you need to learn. IEEE Spectrum. spectrum.ieee.org/duolingo
[2] Sweta, S. (2021). Adaptive E-Learning System. Pages 13–24.
[3] Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. 2nd ed., MIT Press.
[4] Ghasemi, M., & Ebrahimi, D. (2024). Introduction to Reinforcement Learning. arXiv:2408.07712.
[5] Tam, M. (2000). Constructivism, instructional design, and technology: Implications for transforming distance learning. Educational Technology & Society, 3.
[6] Guskey, T. (2007). Closing achievement gaps: Revisiting Benjamin S. Bloom's “Learning for Mastery”. Journal of Advanced Academics, 19, 8–31.
[7] Rafferty, A. N., Brunskill, E., Griffiths, T. L., & Shafto, P. (2011). Faster teaching by POMDP planning. Artificial Intelligence in Education, 280–287, Springer.
[8] Chi, M., Vanlehn, K., Litman, D., & Jordan, P. (2011). Empirically evaluating the application of reinforcement learning to the induction of effective and adaptive pedagogical strategies. User Model. User-Adapt. Interact., 21, 137–180.
[9] Rahimi, M., Moradi, H., Vahabie, A., & Kebriaei, H. (2023). Continuous reinforcement learning-based dynamic difficulty adjustment in a visual working memory game.
[10] du Plooy, E., Casteleijn, D., & Franzsen, D. (2024). Personalized adaptive learning in higher education. Heliyon, 10(21), e39630.
[11] Yancey, K. P., & Settles, B. (2020). A sleeping, recovering bandit algorithm for optimizing recurring notifications. Proceedings of KDD '20, 3008–3016. ACM.
[12] Azhar, A. Z., Segal, A., & Gal, K. (2022). Optimizing representations and policies for question sequencing using reinforcement learning. Proceedings of EDM 2022.
[13] Fahid, A. A., & Chi, M. T. H. (2021). Adaptively scaffolding cognitive engagement with batch constrained deep Q-networks. AIED 2021, LNCS 12748, 280–292.
[14] Dake, D. (2023). Reinforcement learning in education 4.0: Open applications and deployment challenges. International Journal of Computer Science and Information Technology, 15, 47–61.
[15] Bartneck, C., Lütge, C., Wagner, A., & Welsh, S. (2021). Privacy Issues of AI. Pages 61–70.
[16] Paul, J. (2024). Privacy and data security concerns in AI.
[17] Yulianti, S. (2025). The hidden bias in AI: How artificial intelligence reflects and reinforces social inequalities.
[18] Ma, Z. (2023). Hybrid learning: A new learning model that connects online and offline. Journal of Education and Educational Research, 3, 130–132.
[19] Ingabire, H., & Kiu Publication Extension. (2024). Hybrid learning models: Combining in-person and online education effectively. Pages 16–19.
[20] Mon, B. F., Wasfi, A., Hayajneh, M., Slim, A., & Abu Ali, N. (2023). Reinforcement learning in education: A literature review. Informatics, 10(3), 74.