You might find it helpful to read the original Deep Q-Learning (DQN) paper.

The Problem • Traffic congestion is estimated to cost Americans $121 billion in lost productivity, fuel, and other costs.

University of California, Berkeley: Reinforcement Learning Overview. We have focused a lot on supervised learning, with training examples (x1, y1), (x2, y2), …. But consider a different type of learning problem, in which a robot has to learn to do tasks in a particular environment: at each step t it chooses an action a in A.

Mar 10, 2024 · This is a simplified description of a reinforcement learning problem (Kaelbling et al.). Unlike previous works that pinpoint features important to the agent's current action, our explanation is at the step level.

Common assumption #2: episodic learning. Deep recurrent neural networks can be used to encode the state-action history.

Reinforcement Learning. Russell and Norvig: Chapter 21. CMSC 421, Fall 2006. The goal is to learn a policy that maximizes the cumulative reward.

Apr 1, 2019 · The document discusses reinforcement learning, a machine learning technique where an agent learns from interacting with an environment. An example of a reward function.

Alignment techniques such as supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) greatly reduce the skill and domain knowledge required to effectively harness the capabilities of LLMs.

The following recent papers and reports have a strong connection to material in my reinforcement learning books, and amplify on their analysis and its range of applications.
In operant conditioning, positive and negative do not mean good and bad. Instead, positive means you are adding something, and negative means you are taking something away. Reinforcement can be positive or negative, and punishment can also be positive or negative.

Reward shaping: hand-craft intermediate objectives that yield reward, and encourage the right type of exploration, while guarding against the risk of the agent learning to game the rewards.

Common assumption #3: continuity or smoothness. Assumed by some model-based RL methods.

However, let's go ahead and talk more about the difference between supervised, unsupervised, and reinforcement learning. Two widely used learning models are 1) the Markov decision process and 2) Q-learning.

Reinforcement learning (RL) is a type of machine learning process that focuses on decision making by autonomous agents.

Sep 16, 2012 · Temporal Difference Learning. Idea: use observed transitions to adjust the values of observed states so that they comply with the constraint equation, using the following update rule:

Uπ(s) ← Uπ(s) + α [ R(s) + γ Uπ(s′) − Uπ(s) ],

where α is the learning rate and γ is the discount rate. This is the temporal difference equation.

Markov assumption: rt = r(st, at) and st+1 = δ(st, at) depend only on the current state and action.

Mar 29, 2019 · Grid World • The agent lives in a grid • Walls block the agent's path.

Nov 28, 2012 · Reinforcement learning 2: action selection. The proof of Theorem 3 and the appendices are optional.

Main dimensions: model-based vs. model-free.
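The TD update rule above can be sketched in a few lines of Python. This is a minimal sketch; the states, rewards, and step sizes are invented purely for illustration:

```python
# Temporal-difference (TD(0)) policy evaluation sketch:
#   U[s] <- U[s] + alpha * (R(s) + gamma * U[s'] - U[s])
alpha, gamma = 0.1, 0.9

U = {"s0": 0.0, "s1": 0.0}            # utility estimates per state
trajectory = [("s0", 1.0, "s1"),      # (state, reward observed in state, next state)
              ("s1", 0.0, "s0")]

def td_update(U, s, r, s_next, alpha, gamma):
    # Move U[s] a fraction alpha toward the one-step target R(s) + gamma * U[s'].
    U[s] += alpha * (r + gamma * U[s_next] - U[s])

for s, r, s_next in trajectory:
    td_update(U, s, r, s_next, alpha, gamma)
```

Each update nudges U(s) toward the one-step target R(s) + γU(s′), so repeated sweeps over observed transitions push the estimates toward satisfying the constraint equation.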
In this presentation, we introduce value function approximation and cover three different approaches to generating features for linear models.

Teaching material from David Silver, including video lectures, is a great introductory course on RL. RL is a subfield of machine learning where an agent interacts with an environment to achieve a goal, and learning takes place interaction after interaction. Instructor: Sergey Levine, UC Berkeley.

By representing policies as estimates of their expected values, we leverage techniques from contextual bandits to guide exploration and exploitation. Reinforcement learning methods provide a natural way of tackling the problem of optimal market making (MM).

Jun 12, 2024 · Two types of reinforcement learning are 1) positive and 2) negative.

Gt = Rt+1 + Rt+2 + Rt+3 + …; we call this the return.

It begins with an introduction to reinforcement learning concepts like Markov decision processes and value-based methods. I hope this example explained the major difference between reinforcement learning and other models. This class will provide a solid introduction to the field of reinforcement learning, and students will learn about the core challenges and approaches.

Jul 26, 2014 · Final Presentation: Traffic Light Control Using Reinforcement Learning. Daniel Goldberg, Andrew Elstein.

Consider learning to choose actions, e.g.: navigate without crashing into anything; locate and retrieve an object; perform some multi-step manipulation of objects resulting in a desired configuration.

Research Scientist Hado van Hasselt introduces the reinforcement learning course and explains how reinforcement learning relates to AI.

Most existing deep face presentation attack detection approaches extract features from the entire image or from several fixed regions.
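The return defined above is easy to compute from a recorded reward sequence. This sketch also covers the discounted variant used later in these notes; the reward values are invented for illustration:

```python
# Return G_t: cumulative reward following time t.
# Undiscounted: G_t = R_{t+1} + R_{t+2} + ...
# Discounted:   G_t = sum_k gamma^k * R_{t+1+k}
def compute_return(rewards, gamma=1.0):
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g

rewards = [1.0, 0.0, 2.0]             # illustrative sequence R_{t+1}, R_{t+2}, R_{t+3}
print(compute_return(rewards))        # undiscounted sum: 3.0
print(compute_return(rewards, 0.9))   # discounted with gamma = 0.9
```

With gamma = 1 this is the plain sum shown above; gamma < 1 exponentially down-weights rewards received further in the future.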
Jan 26, 2021 · This presentation on reinforcement learning will help you understand the basics, including concepts like rewards, states, and actions.

Data is streaming into the learner: x1, y1, …, xn, yn, with yi = f(xi).

Reinforcement learning 2: action selection.

Doubly robust estimators and other improved importance-sampling estimators: Jiang, N. and Li, L. (2016), Doubly robust off-policy value evaluation for reinforcement learning.

For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or a penalty. Random exploring will fall off of the rope ~97% of the time. Slides: https://dpmd.ai

Reinforcement learning I (Wednesday): prediction; classical conditioning; dopamine. Reinforcement learning II: dynamic programming; action selection; sequential sensory decisions; vigor.

Passive Reinforcement Learning. Ruti Glick, Bar-Ilan University. The agent performs actions and receives rewards or penalties as feedback to learn which actions yield the best outcomes: learn to map states to utilities. Partial observability can be mitigated by adding recurrence.

It begins with an overview of reinforcement learning and how it differs from supervised and unsupervised learning.

Reinforcement Learning. Guest lecturer: Chengxiang Zhai, 15-681 Machine Learning, December 6, 2001. Outline for today: • The reinforcement learning problem • Markov decision processes • Q-learning • Summary.

This approach generates a more varied set of payloads than existing approaches.

Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto, second edition (see here for the first edition), MIT Press, Cambridge, MA, 2018.

Reading: Sections 1, 2, 4, and 5, and the proof of Theorem 1 in Section 3.
The agent has a repertoire of actions, perceives states, and learns a policy from its experiences in the form of rewards.

Given: a finite set of states S and a set of actions A.

Introduction to Reinforcement Learning. Robots and self-driving cars are examples of autonomous agents. Control learning.

Aug 26, 2017 · Introduction to Reinforcement Learning, part III: basic approximate methods. This is the final presentation in a three-part series covering the basics of reinforcement learning (RL). In doing so, the agent tries to minimize wrong moves and maximize the right ones.

Nov 13, 2019 · RL has revolutionized AI gameplay: most Atari games, Backgammon, Go, DOTA2, and StarCraft II.

Machine Learning, Chapter 13. Introduction, presented by Alp Sardağ. It then discusses how to model reinforcement learning problems using Markov decision processes (MDPs) and partially observable MDPs (POMDPs).

Oct 3, 2014 · Direct Utility Estimation: convert the problem to a supervised learning problem, using observed sample utilities such as U(1,1) = 0.72, U(2,1) = ….

Roughly, the agent's goal is to get as much reward as it can over the long run.

Apr 24, 2021 · Reinforcement learning is a machine learning technique where an agent learns how to behave in an environment by receiving rewards or punishments for its actions. Subrat Panda gave an introduction to reinforcement learning. This document provides an introduction to reinforcement learning.

Reinforcement Learning • Basic idea: • Receive feedback in the form of rewards • The agent's utility is defined by the reward function • Must learn to act so as to maximize expected rewards. This slide deck courtesy of Dan Klein at UC Berkeley.

Does self-learning through a simulator.

Week 0: class overview, introduction. Agenda. Global plan.
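Direct utility estimation, as described above, treats each observed return as a supervised training target for the visited state. A minimal Monte Carlo sketch; the grid cells and trajectories below are invented for illustration:

```python
from collections import defaultdict

# Direct (Monte Carlo) utility estimation sketch: the utility of a state is
# estimated as the average of the returns observed after visiting it.
# Each trajectory is a list of (state, reward) pairs, made up for illustration.
trajectories = [
    [((1, 1), 0.0), ((1, 2), 0.0), ((1, 3), 1.0)],
    [((1, 1), 0.0), ((2, 1), -1.0)],
]

sums, visits = defaultdict(float), defaultdict(int)
for episode in trajectories:
    for i, (state, _) in enumerate(episode):
        # Return-to-go from this visit: sum of rewards from here to the end.
        g = sum(r for _, r in episode[i:])
        sums[state] += g
        visits[state] += 1

U = {s: sums[s] / visits[s] for s in sums}
print(U[(1, 1)])   # average of returns 1.0 and -1.0 -> 0.0
```

This is the "convert to supervised learning" view; as the notes point out later, it ignores that state utilities are not independent of each other, which is what the Bellman equation exploits.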
Dec 24, 2011 · Peter Dayan. Assumed by some continuous value function learning methods. He defined reinforcement learning as dealing with agents that must sense and act upon their environment to receive delayed scalar feedback in the form of rewards.

We study pre-training representations for decision-making using video data, which is abundantly available for tasks such as game agents and software testing.

Reinforcement is a key concept in behaviorism, a school of psychology that emphasizes the role of the environment in shaping behavior.

This relaxes the assumption that intervening experts in interactive imitation learning should be near-optimal, and enables the algorithm to learn behaviors that improve over a potentially suboptimal human expert.

Objective(s) of Reinforcement Learning. [Infants don't get to "simulate" the world, since they have neither T(·) nor R(·) for their world.]

Supervised vs. Unsupervised Learning. The Goal in Reinforcement Learning.

Oct 23, 2020 · Priority dispatching rules (PDRs) are widely used for solving the real-world job-shop scheduling problem (JSSP). However, the design of effective PDRs is a tedious task, requiring a myriad of specialized knowledge and often delivering limited performance.

Learn about reinforcement learning from Berkeley AI's lecture slides, covering topics such as Q-learning, exploration, and policy iteration. Online learning. Model-free vs. model-based.

PowerPoint slides for teaching each chapter of the book have been prepared and made available by Professor Barto.

That is, the agent is given S: a set of all the states the agent could encounter. In reinforcement learning (RL), agents are trained via a reward and punishment mechanism.
Today's Lecture. A reward indicates how well the agent is doing at step t; it defines the goal.

• Schulman, Abbeel, Chen.

Solving for the optimal policy: Q-learning. Use a function approximator to estimate the action-value function; if the function approximator is a deep neural network, this is deep Q-learning, and the function parameters are the network weights.

Incremental ("online") function learning. Reinforcement Learning Applications in Robotics.

Reinforcement learning is based on the reward hypothesis: any goal can be formalized as the outcome of maximizing a cumulative reward.

Mar 19, 2018 · Reinforcement Learning: An Introduction, a book by the father of reinforcement learning, Richard Sutton, and his doctoral advisor Andrew Barto.

1. Peshkin, L. and Shelton, C. (2002).

Passive vs. active learning; the exploration-exploitation tradeoff.

An autonomous agent is any system that can make decisions and act in response to its environment, independent of direct instruction by a human user.

First of all, let us understand the reinforcement learning framework, which contains several important terms: the agent represents an object whose goal is to learn a strategy to optimize a certain process; the environment acts as the world in which the agent is located, and consists of a set of different states.

Jan 3, 2020 · Each step has a ~50% probability of going the wrong way, so P(reaching goal) ~ 0.01%.
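Before bringing in a neural-network approximator, the tabular form of the Q-learning update is worth seeing on its own. A minimal sketch; the tiny two-state environment below is invented purely for illustration:

```python
# Tabular Q-learning sketch:
#   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
# Deep Q-learning replaces this table with a neural network whose weights
# are the "function parameters" mentioned above.
states, actions = ["s0", "s1"], ["left", "right"]
Q = {(s, a): 0.0 for s in states for a in actions}
alpha, gamma = 0.5, 0.9

def q_update(Q, s, a, r, s_next):
    # One-step bootstrapped target: reward plus best estimated future value.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

q_update(Q, "s0", "right", 1.0, "s1")   # one observed transition with reward 1
print(Q[("s0", "right")])               # 0.5
```

The same update is applied to sampled transitions regardless of which policy generated them, which is why Q-learning is off-policy.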
Apr 18, 2017 · Reinforcement learning is one powerful paradigm for doing so, and it is relevant to an enormous range of tasks, including robotics, game playing, consumer modeling, and healthcare.

This tutorial shows how to use PyTorch to train a Deep Q-Learning (DQN) agent on the CartPole-v1 task from Gymnasium.

A Short Trip to Reinforcement Learning.

In this paper, we propose a salience-aware face presentation attack detection (SAFPAD) approach, which takes advantage of deep reinforcement learning to exploit the salient local part information in face images.

What if we want to learn the reward function from observing an expert, and then use reinforcement learning?

Any situation in which both the inputs and outputs of a component can be perceived is called supervised learning.

Traffic lights are imperfect and contribute to this.

It then describes Concept-Network Reinforcement Learning, which decomposes complex tasks into high-level concepts or actions.

Learn to take correct actions over time by experience, similar to how humans learn: "trial and error". Try an action and "see" what happens. Finally, you'll look at a demo of Tic-Tac-Toe using reinforcement learning in Python.

Our proposed method uses reinforcement learning with user intervention signals themselves as rewards.

The agent's job is to maximize cumulative reward.

Equivalence between policy gradients and soft Q-learning.

At each discrete time t, the agent observes the state st in S and chooses an action.

Buy from Amazon. Errata and notes. Full PDF, trimmed for viewing on computers (latest release April 26, 2022). Code.

Nov 13, 2014 · Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes).
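The interaction just described (observe st, choose an action, receive rt, move to st+1) can be sketched without any RL library. The toy chain environment below is invented for illustration; Gymnasium's reset/step API follows the same observe-act-reward pattern:

```python
import random

# Minimal agent-environment loop: at each step t the agent observes s_t,
# picks a_t, then receives r_t and the next state s_{t+1}.
class ChainEnv:
    """Five states in a row; reaching state 4 yields reward 1 and ends."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):               # action: +1 (right) or -1 (left)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

env = ChainEnv()
state, total_reward = env.reset(), 0.0
random.seed(0)
for t in range(100):
    action = random.choice([-1, 1])       # random policy as a placeholder
    state, reward, done = env.step(action)
    total_reward += reward
    if done:
        break
```

A learning agent would replace the random action choice with one driven by its value estimates or policy.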
Inverse Reinforcement Learning. Reward Shaping.

Policy gradient algorithms: RL for quadrupedal locomotion; the PEGASUS algorithm; autonomous helicopter flight; high-speed obstacle avoidance; RL for biped locomotion; Poincaré-map RL; dynamic planning.

Jan 6, 2020 · Temporal Difference Learning (cont'd): subtleties and ongoing research.

He described key concepts like the Markov decision process framework, value functions, Q-functions, and exploration vs. exploitation.

Apr 1, 2019 · E0397 Lecture Slides by Richard Sutton (with small changes). Reinforcement learning can be applied to problems like game playing, robot control, and scheduling.

Oct 22, 2021 · Finally, we emphasize that the majority of current works (13/22, or 59%) completely neglect trading fees and other market frictions, diminishing the practical importance of such approaches to a certain degree.

Bertsekas, D., "Model Predictive Control and Reinforcement Learning: A Unified Framework Based on Dynamic Programming," to be published in IFAC NMPC, March 2024. An online draft of the book is available here.

Oct 17, 2018 · Reinforcement learning is one of the most discussed, followed, and contemplated topics in artificial intelligence (AI), as it has the potential to transform most businesses.

CS 285. Data-efficient off-policy policy evaluation for reinforcement learning.

The agent then receives an immediate reward rt, and the state changes to st+1. δ and r may be nondeterministic.

Assignments. So far: manually design a reward function to define the task.

High response rate with pauses after reinforcement.

The Checker Problem Revisited • Goal: to win every game!
Lecture 1: Introduction to Reinforcement Learning. The RL Problem: Reward. A reward Rt is a scalar feedback signal that indicates how well the agent is doing at step t; the agent's job is to maximise cumulative reward. Reinforcement learning is based on the reward hypothesis. Definition (Reward Hypothesis): all goals can be described by the maximisation of expected cumulative reward.

In this paper we develop SQIRL, a novel approach to detecting SQL injection vulnerabilities based on deep reinforcement learning, using multiple worker agents and grey-box feedback. Each worker intelligently fuzzes the input fields discovered by an automated crawling component.

Types of reinforcement include positive and negative reinforcement.

Reinforcement learning (RL) is a class of algorithms that solve a Markov decision process. Supervised learning: classification, regression. Unsupervised learning: clustering, dimensionality reduction. Reinforcement learning: a generalization of supervised learning; learn from interaction with an environment to achieve a goal. The goal of the agent is to learn an optimal policy that maximizes long-term rewards.

THE REINFORCEMENT LEARNING PROBLEM • The agent interacts with the environment.

Sep 1, 2023 · 10 Real-Life Applications of Reinforcement Learning. Homework 4: model-based reinforcement learning.

Aligning large language models (LLMs) with human preferences has proven to drastically improve usability and has driven rapid adoption, as demonstrated by ChatGPT.

Nov 29, 2023 · Answer: Reinforcement learning (RL) is a type of machine learning where an agent learns to make a sequence of decisions by interacting with an environment. The agent is rewarded for correct moves and punished for the wrong ones.
In this paper, we propose to automatically learn PDRs via an end-to-end deep reinforcement learning agent.

Learning from scarce experience.

The agent has to decide between two actions, moving the cart left or right, so that the pole attached to it stays upright.

Evolutionary Reinforcement Learning Systems, presented by Alp Sardağ.

Reinforcement learning with deep energy-based models: the soft Q-learning algorithm; deep RL with continuous actions and soft optimality. • Nachum, Norouzi, Xu, Schuurmans.

Goal • Two main branches of reinforcement learning: • search the space of functions that assess the utility of states, or • search the space of functions that assess the value of behaviors (Q-learning).

Reinforcement means you are increasing a behavior, and punishment means you are decreasing a behavior.

These slides will give you a killer overview of reinforcement learning. Reinforcement Learning, presented by Bibhas Chakraborty and Lacey Gunter.

Jan 16, 2024 · Primary area: reinforcement learning.

It is quite different from supervised machine learning algorithms, where we need to ingest and process data.

The Agent-Environment Interface (diagram): the interaction unrolls as states st, st+1, st+2, …, actions at, at+1, …, and rewards rt+1, rt+2, ….

What is Machine Learning? • A method to learn about some phenomenon from data, when there is little scientific theory (e.g., physical or biological laws) relative to the size of the feature space.

UCB: Finite-time Analysis of the Multiarmed Bandit Problem.

In this SlideShare, I want to provide a simple guide that explains reinforcement learning and give some practical examples of how it is used today.
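The UCB work cited above gives one concrete recipe for the exploration-exploitation tradeoff. A UCB1-style selection sketch; the reward totals and play counts below are invented for illustration:

```python
import math

# UCB1 action selection sketch: pick the arm maximizing
#   mean_i + sqrt(2 * ln(n) / n_i),
# where n is the total number of plays and n_i the plays of arm i.
def ucb1_select(totals, counts):
    n = sum(counts)
    # Play each arm once before the index is well-defined.
    for i, c in enumerate(counts):
        if c == 0:
            return i
    scores = [totals[i] / counts[i] + math.sqrt(2 * math.log(n) / counts[i])
              for i in range(len(counts))]
    return max(range(len(counts)), key=lambda i: scores[i])

totals = [5.0, 1.0]   # cumulative reward per arm (illustrative)
counts = [10, 2]      # times each arm was played
print(ucb1_select(totals, counts))   # arm 1: equal means, bigger exploration bonus
```

Both arms have mean reward 0.5 here, so the less-explored arm wins on its confidence bonus; this is exploration driven by uncertainty rather than by randomness.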
Learning to choose actions, e.g.: a robot learning to dock on a battery charger; learning to choose actions to optimize factory output; learning to play Backgammon.

Apr 1, 2023 · This document discusses deep reinforcement learning and concept-network reinforcement learning.

Reinforcement Learning • Supervised learning is the simplest and best-studied type of learning • Another type of learning task is learning behaviors when we don't have a teacher to tell us how • The agent has a task to perform; it takes actions in the world.

Dec 14, 2021 · Abstract.

Daniel Goldberg, Andrew Elstein. Mark Towers.

Variable interval (e.g., checking social media): moderate yet steady response rate. Fixed ratio: reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses).

Markov Decision Processes.

• Describe the evolutionary algorithm.

Reinforcement learning (RL) is the part of the machine learning ecosystem where the agent learns by interacting with the environment to obtain the optimal strategy for achieving its goals.

It models the relationship between the final reward and the key steps that a DRL agent takes.

Apr 6, 2019 · Passive vs. active. Passive: assume the agent is already following a policy (so there is no action choice to be made; you just need to learn the state values, and maybe the action model). Active: need to learn both the optimal policy and the state values.
The reinforcement learning method works by interacting with the environment, whereas the supervised learning method works on given sample data or examples.

Resources for Reinforcement Learning: Theory and Practice. Bridging the gap between value- and policy-based reinforcement learning. Conclusions.

In this work, we propose AIRS, a general framework to explain deep reinforcement learning-based security applications.

Gerhard Neumann, Seminar A, SS 2006.

The Agent Learns a Policy • Reinforcement learning methods specify how the agent changes its policy as a result of experience. You will also learn about actions in reinforcement learning.

These are available as PowerPoint files and as PostScript files.

Apr 2, 2019 · DEFINITION: Reinforcement learning is the problem faced by an autonomous agent that learns behavior through trial-and-error interactions with a dynamic environment.

Reinforcement learning (RL) is a mathematical framework for problem solving that implies goal-directed interactions of an agent with its environment.

Previous Lectures. https://www.slideshare.net/slideshow/reinforcement-learning-40052403/40052403

Apr 9, 2024 · The reinforcement learning framework. Learning when there is no hint at all about correct outputs is called …. Reinforcement learning does not require labeled data.

Chapter 13: Reinforcement Learning.

Problem: utilities are not independent of each other! Bellman equation: utility values obey

U(s) = R(s) + γ · maxa Σs′ T(s, a, s′) U(s′).
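The Bellman equation above can be solved by iterating it as an update until the utilities stop changing (value iteration). A minimal sketch; the two-state MDP below, with its rewards and transition probabilities, is invented purely to make the backup concrete:

```python
# Value iteration sketch for the Bellman equation
#   U(s) = R(s) + gamma * max_a sum_{s'} T(s, a, s') * U(s').
R = {"A": 0.0, "B": 1.0}
T = {  # T[(s, a)] = list of (next_state, probability)
    ("A", "stay"): [("A", 1.0)],
    ("A", "go"):   [("B", 0.9), ("A", 0.1)],
    ("B", "stay"): [("B", 1.0)],
    ("B", "go"):   [("A", 1.0)],
}
gamma = 0.5
U = {s: 0.0 for s in R}

for _ in range(50):  # iterate the Bellman backup to (near) convergence
    U = {s: R[s] + gamma * max(
            sum(p * U[s2] for s2, p in T[(s, a)])
            for a in ("stay", "go"))
         for s in R}

print(round(U["B"], 3))   # 2.0: staying in B earns 1 per step, discounted by 0.5
```

Because the utilities are coupled through T, each state's value depends on its neighbors', which is exactly why they cannot be estimated independently.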
Passive Reinforcement Learning • We will assume full observation • The agent has a fixed policy π and always executes π(s) • Goal: to learn how good the policy is (similar to policy evaluation) • But the agent doesn't have all the knowledge of the environment.

Mar 15, 2019 · Reinforcement Learning with Hidden State (diagram: observations ot, actions at, hidden states st, and rewards rt unrolled over time) • Learning in a POMDP, or k-Markov environment • Planning in POMDPs is intractable • Factored POMDPs look promising • Policy search can work well.

May 31, 2023 · Representation-Driven Reinforcement Learning. We present a representation-driven framework for reinforcement learning.

Final project: research-level project of your choice (form a group of up to 2-3 students). Homework 1: imitation learning (control via supervised learning). Homework 2: policy gradients ("REINFORCE"). Homework 3: Q-learning with convolutional neural networks.

Jul 27, 2014 · Reinforcement Learning. Slides for this part are adapted from those of Dan Klein@UCB.

Discounted Cumulative Reward. Slides from week 0: pdf.

• Traffic lights are imperfect and contribute to this • Usually statically controlled • A better method of control is needed.

Particularly, embedding a policy network into a linear feature space allows us to reframe the exploration-exploitation tradeoff.

May 7, 2023 · In psychology, reinforcement refers to a process where behavior is strengthened or increased by the presentation or removal of a stimulus.

However, the discriminative information beneficial for ….

A reward Rt is a scalar feedback signal.

Nov 14, 2020 · Basics of Reinforcement Learning, with real-world analogies and a tutorial to train a self-driving cab to pick up and drop off passengers at the right destinations, using Python from scratch.
In this paper, we propose a salience-aware face presentation attack detection (SAFPAD) approach, which takes advantage of deep reinforcement learning to exploit the salient local part information in face images.

Jan 18, 2014 · Edward Balaban.

INTRODUCTION (1) • Agent, state, actions, policy • Topic: how can such agents learn a successful control policy by acting in the environment they are placed in? • The agent's goal is defined by a reward function • Control policy.

May 16, 2012 · Traffic Light Control Using Reinforcement Learning.

Peter Bodík. Megha Sharma.

Reinforcement learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions.

Model-free. Passive vs. active. Generally assumed by value function fitting methods. (Thanks to Nathaniel Daw.)

Even though significant empirical advances have been made on this problem, a theoretical understanding remains absent.

Games: S is all configurations of pixels that the game engine can render.

We present a representation-driven framework for reinforcement learning.

Reinforcement Learning: Basic Idea. Requires custom human work.
Lecture 15: Offline Reinforcement Learning (Part 1); Lecture 16: Offline Reinforcement Learning (Part 2); Lecture 17: Reinforcement Learning Theory Basics; Lecture 18: Variational Inference and Generative Models; Lecture 19: Connection between Inference and Control; Lecture 20: Inverse Reinforcement Learning; Lecture 21: RL with Sequence Models.

Jul 30, 2014 · Introduction to Reinforcement Learning. 1996: Reinforcement Learning.

Doubly robust off-policy value evaluation for reinforcement learning.

Goal: learn to choose actions that maximize r0 + γ r1 + γ² r2 + …, where 0 ≤ γ < 1. The discount factor γ is used to exponentially decrease the weight of reinforcements received in the future. This is called the discounted cumulative reward.

Slides for this part are adapted from those of Dan Klein@UCB.