
21_reinforcement_learning #27

Open
wants to merge 13 commits into master

Conversation

sayehjarollahi

No description provided.

@sayehjarollahi
Author

@sinatav

_**"... What we want is a machine that can learn from experience." -Alan Turing, 1947**_


Reinforcement learning is a subcategory of Machine Learning which solves problems that involve learning what to do—how to

don't use unnecessary "that"s in a formal text... for example: "which solves problems that involve" can be "which solves problems involving", if you don't like the sound of "involving learning," consider reconstructing =))))))
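
To make the quoted definition concrete, here is a toy sketch of the agent-environment loop it describes; the 5-state line world, the actions, and the rewards below are hypothetical placeholders, not part of the note.

```python
import random

# Toy sketch of the loop behind "learning what to do": the agent picks an
# action, the environment answers with a new state and a reward, and the
# resulting experience is what the agent learns from. The 5-state line
# world below is a made-up illustration, not from the lecture note.
ACTIONS = ["left", "right"]

def step(state, action):
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0  # reward only at the right end
    return next_state, reward

state, history = 0, []
for t in range(20):
    action = random.choice(ACTIONS)          # a (so far) clueless agent
    state, reward = step(state, action)
    history.append((state, action, reward))  # experience to learn from
```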

<div id='TypesofReinforcementLearningAccordingtoLearningPolicy'/>

## Types of Reinforcement Learning According to Learning Policy
Ther are two types of RL according to Learning policy: **Passive Reinforcement Learning**, **Active Reinforcement Learning**. In below, we explain both methods.

Avoid typos, please... also, ameliorate comprehension by changing the structure: "According to Learning policy, there are two types of RL: Passive Reinforcement Learning and Active Reinforcement Learning."

Author


Do you mean that we should only change the structure of that specific sentence?

<div id='ActiveReinforcementLearning'/>

## Active Reinforcement Learning
Active reinforcement learning is when the policy of the agent is not fixed and can change during the time of training. In this method exploration and exploition is done. Exploration refers to trying new actions that are rarely done and checking if they have a bigger reward. Exploitation refers to keep doing the optimal action at each state. _Q-learning_ is one of the active RL algorithms.

So basically revise your grammar :)))
"policy of the agent": "the agent's policy"
"exploition": typo
"is done": "are done"
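
As a rough illustration of the excerpt above, here is a minimal sketch of tabular Q-learning with an epsilon-greedy rule: with probability epsilon the agent explores a random action, otherwise it exploits the action it currently believes is best. The environment interface (`reset()`, `step()`), the `actions` list, and all constants are assumptions for illustration, not part of the note.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of tabular Q-learning with an epsilon-greedy policy.

    Assumes `env.reset()` returns a state and `env.step(action)` returns
    (next_state, reward, done); every constant here is illustrative.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                # Exploration: try an action regardless of current estimates.
                action = random.choice(actions)
            else:
                # Exploitation: take the action currently believed optimal.
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: nudge Q(s, a) toward the one-step sample target.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```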


<div id='Definition'/>

**Definition:** In this method, the agent executes a sequence of trials or runs (sequences of state-action transitions that continue until the agent reaches the terminal state). Each trial gives a sample value and the agent estimates the utility based on the samples values. This can be calculated as running averages of sample values.

punctuation...
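
A small sketch of the running-average idea in the quoted definition: every completed trial yields a sample return for each state it visits, and the utility estimate is the average of the samples collected so far. The trial format and the discount factor below are assumptions for illustration.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=0.9):
    """Estimate U(s) as the running average of observed sample returns.

    `trials` is assumed to be a list of episodes, each a list of
    (state, reward) pairs ending at the terminal state (illustrative format).
    """
    totals = defaultdict(float)  # sum of sample returns seen for each state
    counts = defaultdict(int)    # number of samples seen for each state

    for trial in trials:
        ret = 0.0
        # Walk the trial backwards so the return-to-go accumulates in one pass.
        for state, reward in reversed(trial):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1

    return {state: totals[state] / counts[state] for state in totals}
```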


<div id='TemporalDifference(TD)Learning'/>

# Temporal Difference (TD) Learning

Provide the reader with some preparation & introduction before jumping to a topic. Work a bit more on the structure of your notes

Author


Should we add an introduction before each subtopic? We have already divided the main topic into subtopics, and there is an introduction at the beginning of the lecture note. There is also a definition at the beginning of some subtopics.


<div id='Conclusion'/>

# Conclusion

Change this to "Summary & Conclusion".


<div id='SummaryofDiscussedRLMethods'/>

### Summary of Discussed RL Methods

Don't just use bullets to point out things that were mentioned before... do something like "In this article we did X and Y. We learned X because... So..." (disclaimer: I don't mean you should lose the bullets). This is a very critical part of your work.

* Aircraft control and robot motion control

<div id='resources'/>


you can have an "Other useful links" section and provide some interesting material related to these topics (just a suggestion)

<div id='ProblemwithTD'/>

## Problem with TD
All we want is to find the best policy that suits us. Although TD agent finds the value of each state (A value that converges to the real value during the time), it cannot find the best policy because for finding that and doing one-step expectimax, $T$ and $R$ functions are needed. However, in RL, they are not available. Therefore, a new method is needed which is called **Q-Learning**.

okay I know I should stop with the English-related comments, but the last line is bugging me :)))) how about "a new method is required, called..."
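
To spell out the gap the excerpt points at (a sketch in the note's $T$, $R$ notation; the discount factor $\gamma$ and the use of $U$ for state values are assumptions here): extracting a policy from learned state values needs the one-step expectimax

$$\pi(s) = \arg\max_{a} \sum_{s'} T(s, a, s')\big[R(s, a, s') + \gamma\, U(s')\big],$$

which requires $T$ and $R$, whereas with learned Q-values the greedy policy needs no model:

$$\pi(s) = \arg\max_{a} Q(s, a).$$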


@sinatav left a comment


first review
