I need help with two questions, 3 and 4, NOT 5. Robotics subject.
3. Reinforcement Learning
Reinforcement learning (RL) agents learn by taking state-dependent actions and experiencing the rewards that arise from interaction with their environments. One method is to use a table-based Q-learning algorithm.
Figure 1: The inverted pendulum problem
Q-learning tables are discrete, but most real-world tasks involve systems that have continuous states and are controlled using continuous actions. With this in mind, consider how a table-based Q-learning algorithm could learn to balance an inverted pendulum (as shown in Fig. 1). To achieve this:
(a) Describe a suitable reward function.
(b) Describe a suitable choice of states and explain why they are appropriate.
(c) Describe a suitable choice of actions and explain why they are appropriate and how they relate to the states discussed in part (b).
(d) Discuss how an inverted pendulum task could be either an MDP or a POMDP. [2 marks]
(e) Discuss how simulated experience generated from a model within an RL agent can increase the speed with which the RL algorithm converges. How can this assist in finding a solution to the inverted pendulum task?
(f) The Dyna-Q algorithm is one such model-based approach to RL. Using high-level pseudocode in no more than 12 lines, describe the operation of the Dyna-Q algorithm and describe all its key terms.
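To make the ideas in Question 3 concrete, here is a minimal sketch of tabular Dyna-Q in Python. It is only an illustration under stated assumptions, not a model answer. The `discretise` helper shows one possible way to map a continuous quantity (for example the pendulum angle or angular velocity) to a table index. The `chain_step` environment is a hypothetical 5-state stand-in used so the example stays short and runnable; it is not a pendulum model. All names and parameter values (`alpha`, `gamma`, `epsilon`, `planning_steps`) are assumptions chosen for illustration.

```python
import random

def discretise(value, low, high, n_bins):
    """Map a continuous value to a bin index in [0, n_bins - 1]."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)

def chain_step(s, a):
    """Hypothetical toy environment: states 0..4, action 0 = left, 1 = right.
    Reaching state 4 gives reward 1 and ends the episode."""
    s_next = max(s - 1, 0) if a == 0 else s + 1
    done = s_next == 4
    return (1.0 if done else 0.0), s_next, done

def dyna_q(step, n_states, n_actions, start_state, episodes=200,
           planning_steps=20, alpha=0.1, gamma=0.95, epsilon=0.1,
           max_steps=100, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]  # the Q-table
    model = {}  # learned model: (s, a) -> (r, s_next, done), assumed deterministic

    def update(s, a, r, s_next, done):
        # One-step Q-learning update toward r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else gamma * max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # Epsilon-greedy action selection with random tie-breaking
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([i for i in range(n_actions) if Q[s][i] == best])
            r, s_next, done = step(s, a)       # real experience
            update(s, a, r, s_next, done)      # direct RL
            model[(s, a)] = (r, s_next, done)  # model learning
            # Planning: replay simulated experience drawn from the model
            for _ in range(planning_steps):
                (ps, pa), (pr, ps_next, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps_next, pdone)
            s = s_next
            if done:
                break
    return Q

Q = dyna_q(chain_step, n_states=5, n_actions=2, start_state=0)
# Greedy action in each non-terminal state (1 means "move right")
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(4)]
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real step is followed by several updates on remembered transitions, so value information spreads through the table with far fewer real interactions. For the pendulum, the state index would come from combining the discretised angle and angular velocity, and the actions would be a small set of discrete torques or forces.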
4. State estimation
(a) When building a full state feedback controller, why is it often necessary to use some form of state estimator?
(b) The Luenberger observer is a deterministic state estimator. Draw its signal flow graph to illustrate its operation and explain the design and function of the Luenberger gain L.
(c) The Kalman filter is a stochastic state estimator. Draw and compare a signal flow graph of the Kalman estimator with that of the Luenberger observer, illustrating all the Kalman estimator’s important components, including its noise sources.
(d) The Kalman filter iteratively computes 5 variables as illustrated below
Write a short paragraph on each of the terms 1 – 5 to explain their meaning and function.
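The illustration referred to in part (d) is not reproduced here, so the following is a sketch under an assumption: that the five quantities are the usual ones in the discrete Kalman filter recursion (predicted state, predicted error covariance, Kalman gain, updated state, updated error covariance). It uses a scalar system x[k+1] = a x[k] + b u[k] + w, y[k] = c x[k] + v so that no matrix library is needed; the names `a`, `b`, `c`, `q`, `r` are chosen for illustration, with `q` and `r` standing for the process and measurement noise variances.

```python
def kalman_step(x, P, u, y, a=1.0, b=0.0, c=1.0, q=0.0, r=1.0):
    """One iteration of a scalar Kalman filter (illustrative sketch)."""
    # 1. Predicted state: propagate the previous estimate through the model
    x_pred = a * x + b * u
    # 2. Predicted error covariance: uncertainty grows by the process noise q
    P_pred = a * P * a + q
    # 3. Kalman gain: weighs model uncertainty against measurement noise r
    K = P_pred * c / (c * P_pred * c + r)
    # 4. Updated state: correct the prediction using the innovation y - c * x_pred
    x_new = x_pred + K * (y - c * x_pred)
    # 5. Updated error covariance: uncertainty shrinks after the measurement
    P_new = (1.0 - K * c) * P_pred
    return x_new, P_new, K

# Example: prior estimate 0 with variance 1, measurement 1 with variance 1
x_new, P_new, K = kalman_step(x=0.0, P=1.0, u=0.0, y=1.0)
```

With equal prior and measurement variances the gain is 0.5, so the updated estimate sits halfway between the prediction and the measurement. The Luenberger observer in part (b) has the same predict-and-correct structure, but its gain L is fixed in advance (for example by pole placement) rather than recomputed from the covariances at every step.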
5. Gaussian processes
Describe the main difference between using Gaussian Processes and Support Vector Machines in approximating linear functions.