21_reinforcement_learning #27
base: master
Conversation
_**"... What we want is a machine that can learn from experience." -Alan Turing, 1947**_
Reinforcement learning is a subcategory of Machine Learning which solves problems that involve learning what to do—how to
don't use unnecessary "that"s in a formal text... for example: "which solves problems that involve" can be "which solves problems involving", if you don't like the sound of "involving learning," consider reconstructing =))))))
<div id='TypesofReinforcementLearningAccordingtoLearningPolicy'/>
## Types of Reinforcement Learning According to Learning Policy
Ther are two types of RL according to Learning policy: **Passive Reinforcement Learning**, **Active Reinforcement Learning**. In below, we explain both methods.
Avoid typos, please... also, ameliorate comprehension by changing the structure: "According to Learning policy, there are two types of RL: Passive Reinforcement Learning and Active Reinforcement Learning."
Do you mean that we should only change the structure of that specific sentence?
<div id='ActiveReinforcementLearning'/>
## Active Reinforcement Learning
Active reinforcement learning is when the policy of the agent is not fixed and can change during the time of training. In this method exploration and exploition is done. Exploration refers to trying new actions that are rarely done and checking if they have a bigger reward. Exploitation refers to keep doing the optimal action at each state. _Q-learning_ is one of the active RL algorithms.
So basically revise your grammar :)))
- "policy of the agent": "the agent's policy"
- "exploition": typo
- "is done": "are done"
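The exploration/exploitation behaviour and the Q-learning update described in the quoted passage can be sketched as an epsilon-greedy agent (a minimal illustration only; the two-state toy setup, action names, and parameter values are assumptions, not taken from the notes):

```python
import random

# Hypothetical toy setup (not from the notes): two states, two actions.
# Q[s][a] holds the learned action value; EPSILON controls exploration.
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1
Q = {s: {a: 0.0 for a in ("left", "right")} for s in (0, 1)}

def choose_action(state):
    """Explore with probability EPSILON, otherwise exploit the best-known action."""
    if random.random() < EPSILON:
        return random.choice(("left", "right"))   # exploration: try a random action
    return max(Q[state], key=Q[state].get)        # exploitation: best action so far

def q_update(s, a, reward, s_next):
    """One Q-learning step: move Q(s, a) toward the sampled target."""
    target = reward + GAMMA * max(Q[s_next].values())
    Q[s][a] += ALPHA * (target - Q[s][a])

# Example transition: in state 0, action "right" gives reward 1 and lands in state 1.
q_update(0, "right", 1.0, 1)
print(Q[0]["right"])   # → 0.5
```

Because the policy is derived from the changing Q-table rather than fixed in advance, this is active reinforcement learning in the sense defined above.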
<div id='Definition'/>
**Definition:** In this method, the agent executes a sequence of trials or runs (sequences of state-action transitions that continue until the agent reaches the terminal state). Each trial gives a sample value and the agent estimates the utility based on the samples values. This can be calculated as running averages of sample values.
punctuation...
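The running-average estimate in the definition above can be written as an incremental mean over trial returns (a minimal sketch; the helper name `running_average` and the sample values are hypothetical):

```python
# Hypothetical sketch of direct utility estimation: each completed trial
# yields one sample return for a state, and the utility estimate is the
# running average of all samples seen so far.
def running_average(samples):
    estimate, n = 0.0, 0
    history = []
    for sample in samples:
        n += 1
        estimate += (sample - estimate) / n   # incremental mean update
        history.append(estimate)
    return history

# Suppose three trials gave sample returns 4, 8 and 6 for some state.
print(running_average([4.0, 8.0, 6.0]))   # → [4.0, 6.0, 6.0]
```

The incremental form avoids storing all past samples, which matches the "running averages" phrasing in the quoted definition.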
<div id='TemporalDifference(TD)Learning'/>
# Temporal Difference (TD) Learning
Provide the reader with some preparation & introduction before jumping to a topic. Work a bit more on the structure of your notes
Should we add an introduction before each subtopic? We have already divided the main topic into subtopics and there is an introduction at the beginning of the lecture note. There is also a definition at the beginning of some subtopics.
<div id='Conclusion'/>
# Conclusion
change this to summary & conclusion
<div id='SummaryofDiscussedRLMethods'/>
### Summary of Discussed RL Methods
Don't just use bullets to point out things that were mentioned before... do sth like "In this article we did X & Y. We learned X because... So..." (disclaimer: I don't mean you should lose the bullets) this is a very critical part of your work
* Aircraft control and robot motion control
<div id='resources'/>
you can have an "Other useful links" section and provide some interesting material related to these topics (just a suggestion)
<div id='ProblemwithTD'/>
## Problem with TD
All we want is to find the best policy that suits us. Although TD agent finds the value of each state (A value that converges to the real value during the time), it cannot find the best policy because for finding that and doing one-step expectimax, $T$ and $R$ functions are needed. However, in RL, they are not available. Therefore, a new method is needed which is called **Q-Learning**.
okay I know I should stop with the English-related comments, but the last line is bugging me :)))) how about "a new method is required, called..."
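A one-transition TD(0) update illustrates the point quoted above: the value estimate $V(s)$ is updated from a single observed transition with no $T$ or $R$ model, yet $V$ alone is not enough to act greedily, which is why the notes move on to Q-learning (the toy states `A`/`B` and all numbers below are assumptions for illustration):

```python
# Hypothetical sketch of TD(0): update V(s) from one observed transition
# (s, reward, s'). No transition model T or reward function R is used;
# the agent only needs the sample it just experienced. Extracting a
# greedy policy from V would still require T and R, whereas learning
# Q(s, a) lets the agent act by arg-maxing over actions directly.
ALPHA, GAMMA = 0.5, 1.0
V = {"A": 0.0, "B": 10.0}

def td0_update(s, reward, s_next):
    """Move V(s) toward the sampled target: reward + GAMMA * V(s')."""
    V[s] += ALPHA * (reward + GAMMA * V[s_next] - V[s])

td0_update("A", 2.0, "B")   # observed: from A, got reward 2, ended in B
print(V["A"])               # → 6.0
```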
first review