I need help with two questions, 3 and 4, NOT 5. Robotics subject.

3. Reinforcement Learning

Reinforcement learning (RL) agents learn by taking state-dependent actions and experiencing rewards arising from interaction with their environments. One method is to use a table-based Q-learning algorithm.

Figure 1: The inverted pendulum problem

Q-learning tables are discrete, but most real-world tasks involve systems that have continuous states and are controlled using continuous actions. With this in mind, consider how a table-based Q-learning algorithm could learn to balance an inverted pendulum (as shown in Fig. 1). To achieve this:

(a) Describe a suitable reward function.

[3 marks]

(b) Describe a suitable choice of states and explain why they are appropriate.

[3 marks]

(c) Describe a suitable choice of actions and explain why they are appropriate and how they relate to the states discussed in part (b).

[3 marks]

(d) Discuss how an inverted pendulum task could be either an MDP or a POMDP.

[2 marks]

(e) Discuss how simulated experience generated from a model within an RL agent can increase the speed with which the RL algorithm converges. How can this assist in finding a solution to the inverted pendulum task?

[4 marks]

(f) The Dyna-Q algorithm is one such model-based approach to RL. Using high-level pseudocode in no more than 12 lines, describe the operation of the Dyna-Q algorithm and describe all its key terms.

[5 marks]
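
For orientation on parts (b), (e) and (f), here is a minimal Python sketch of tabular Dyna-Q. It is an illustration under stated assumptions, not a model answer: the names `discretize`, `dyna_q` and `chain_step`, the hyperparameter values, and the toy five-state chain are all hypothetical and do not come from the question. For the pendulum, one plausible choice would be to bin the pole angle and angular velocity with something like `discretize` and use the resulting pair of indices as the table state.

```python
import random
from collections import defaultdict

def discretize(value, low, high, bins):
    """Map a continuous value to a bin index in [0, bins - 1] (clipping to [low, high])."""
    clipped = min(max(value, low), high)
    idx = int((clipped - low) / (high - low) * bins)
    return min(idx, bins - 1)

def chain_step(state, action):
    """Hypothetical toy environment: states 0..4, reward 1 on reaching state 4."""
    next_state = min(max(state + action, 0), 4)
    done = next_state == 4
    return (1.0 if done else 0.0), next_state, done

def dyna_q(step, start_state, actions, episodes=50, planning_steps=10,
           alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)  # Q[(state, action)] -> estimated return
    model = {}              # model[(state, action)] -> (reward, next_state, done)

    def greedy(s):
        # Greedy action with random tie-breaking
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])

    def update(s, a, r, s2, done):
        # One-step Q-learning backup
        target = r if done else r + gamma * max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            a = rng.choice(actions) if rng.random() < epsilon else greedy(s)
            r, s2, done = step(s, a)         # real experience
            update(s, a, r, s2, done)        # direct RL from real experience
            model[(s, a)] = (r, s2, done)    # model learning (deterministic model)
            for _ in range(planning_steps):  # planning from simulated experience
                ps, pa = rng.choice(list(model))
                pr, ps2, pdone = model[(ps, pa)]
                update(ps, pa, pr, ps2, pdone)
            s = s2
            if done:
                break
    return Q
```

Calling `dyna_q(chain_step, 0, [-1, 1])` returns the learned table. The inner planning loop is what distinguishes Dyna-Q from plain Q-learning: each real step is followed by `planning_steps` extra backups replayed from the learned model, so value information spreads through the table with fewer real interactions.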

4. State estimation

(a) When building a full state feedback controller, why is it often necessary to use some form of state estimator?

[3 marks]

(b) The Luenberger observer is a deterministic state estimator. Draw its signal flow graph to illustrate its operation and explain the design and function of the Luenberger gain L.

[3 marks]

(c) The Kalman filter is a stochastic state estimator. Draw and compare a signal flow graph of the Kalman estimator with that of the Luenberger observer, illustrating all the Kalman estimator’s important components, including its noise sources.

[4 marks]

(d) The Kalman filter iteratively computes 5 variables as illustrated below

Write a short paragraph on each of the terms 1 – 5 to explain their meaning and function.

[10 marks]
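
The figure referred to in part (d) is not reproduced in this post. Assuming the five quantities are the usual ones (predicted state, predicted error covariance, Kalman gain, updated state, updated error covariance), one iteration can be sketched in Python for a scalar system. The function name `kalman_step` and the symbols `a`, `b`, `c`, `q`, `r` (state, input and output coefficients, process and measurement noise variances) are illustrative assumptions, not taken from the question.

```python
def kalman_step(x_est, p_est, u, y, a=1.0, b=0.0, c=1.0, q=1e-3, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    Assumed model: x[k+1] = a*x[k] + b*u[k] + w,  y[k] = c*x[k] + v,
    with process noise variance q and measurement noise variance r.
    """
    # 1. Predicted (a priori) state estimate
    x_pred = a * x_est + b * u
    # 2. Predicted (a priori) error covariance
    p_pred = a * p_est * a + q
    # 3. Kalman gain: weights the measurement against the prediction
    k = p_pred * c / (c * p_pred * c + r)
    # 4. Updated (a posteriori) state estimate using the innovation y - c*x_pred
    x_new = x_pred + k * (y - c * x_pred)
    # 5. Updated (a posteriori) error covariance
    p_new = (1.0 - k * c) * p_pred
    return x_new, p_new, k
```

The structure mirrors a Luenberger observer (predict, then correct with a gain times the output error). The difference is that the gain `k` is recomputed each step from the noise variances and the current covariance rather than being fixed in advance by pole placement.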

5. Gaussian processes

Describe the main difference between using Gaussian Processes and Support Vector Machines in approximating linear functions.

[20 marks]