
21_reinforcement_learning #27

Open
wants to merge 13 commits into master

Conversation

sayehjarollahi

No description provided.

@sayehjarollahi
Author

@sinatav

_**"... What we want is a machine that can learn from experience." -Alan Turing, 1947**_


Reinforcement learning is a subcategory of Machine Learning which solves problems that involve learning what to do—how to

don't use unnecessary "that"s in a formal text... for example: "which solves problems that involve" can be "which solves problems involving", if you don't like the sound of "involving learning," consider reconstructing =))))))
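
To make the quoted definition concrete, here is a toy sketch of the agent-environment loop it describes; the 5-state line world, the actions, and the rewards below are hypothetical placeholders, not part of the note.

```python
import random

# Toy sketch of the loop behind "learning what to do": the agent picks an
# action, the environment answers with a new state and a reward, and the
# resulting experience is what the agent learns from. The 5-state line
# world below is a made-up illustration, not from the lecture note.
ACTIONS = ["left", "right"]

def step(state, action):
    next_state = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    reward = 1.0 if next_state == 4 else 0.0  # reward only at the right end
    return next_state, reward

state, history = 0, []
for t in range(20):
    action = random.choice(ACTIONS)          # a (so far) clueless agent
    state, reward = step(state, action)
    history.append((state, action, reward))  # experience to learn from
```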

<div id='TypesofReinforcementLearningAccordingtoLearningPolicy'/>

## Types of Reinforcement Learning According to Learning Policy
Ther are two types of RL according to Learning policy: **Passive Reinforcement Learning**, **Active Reinforcement Learning**. In below, we explain both methods.

Avoid typos, please... also, ameliorate comprehension by changing the structure: "According to Learning policy, there are two types of RL: Passive Reinforcement Learning and Active Reinforcement Learning."

Author


Do you mean that we should only change the structure of that specific sentence?

<div id='ActiveReinforcementLearning'/>

## Active Reinforcement Learning
Active reinforcement learning is when the policy of the agent is not fixed and can change during the time of training. In this method exploration and exploition is done. Exploration refers to trying new actions that are rarely done and checking if they have a bigger reward. Exploitation refers to keep doing the optimal action at each state. _Q-learning_ is one of the active RL algorithms.

So basically revise your grammar :)))
"policy of the agent": "the agent's policy"
"exploition": typo
"is done": "are done"
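
As a rough illustration of the excerpt above, here is a minimal sketch of tabular Q-learning with an epsilon-greedy rule: with probability epsilon the agent explores a random action, otherwise it exploits the action it currently believes is best. The environment interface (`reset()`, `step()`), the `actions` list, and all constants are assumptions for illustration, not part of the note.

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Sketch of tabular Q-learning with an epsilon-greedy policy.

    Assumes `env.reset()` returns a state and `env.step(action)` returns
    (next_state, reward, done); every constant here is illustrative.
    """
    Q = defaultdict(float)  # Q[(state, action)], defaults to 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                # Exploration: try an action regardless of current estimates.
                action = random.choice(actions)
            else:
                # Exploitation: take the action currently believed optimal.
                action = max(actions, key=lambda a: Q[(state, a)])

            next_state, reward, done = env.step(action)

            # Q-learning update: nudge Q(s, a) toward the one-step sample target.
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```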


<div id='Definition'/>

**Definition:** In this method, the agent executes a sequence of trials or runs (sequences of state-action transitions that continue until the agent reaches the terminal state). Each trial gives a sample value and the agent estimates the utility based on the samples values. This can be calculated as running averages of sample values.

punctuation...
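
A small sketch of the running-average idea in the quoted definition: every completed trial yields a sample return for each state it visits, and the utility estimate is the average of the samples collected so far. The trial format and the discount factor below are assumptions for illustration.

```python
from collections import defaultdict

def direct_utility_estimation(trials, gamma=0.9):
    """Estimate U(s) as the running average of observed sample returns.

    `trials` is assumed to be a list of episodes, each a list of
    (state, reward) pairs ending at the terminal state (illustrative format).
    """
    totals = defaultdict(float)  # sum of sample returns seen for each state
    counts = defaultdict(int)    # number of samples seen for each state

    for trial in trials:
        ret = 0.0
        # Walk the trial backwards so the return-to-go accumulates in one pass.
        for state, reward in reversed(trial):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1

    return {state: totals[state] / counts[state] for state in totals}
```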


<div id='TemporalDifference(TD)Learning'/>

# Temporal Difference (TD) Learning

Provide the reader with some preparation & introduction before jumping to a topic. Work a bit more on the structure of your notes

Author


Should we add an introduction before each subtopic? We have already divided the main topic into subtopics, and there is an introduction at the beginning of the lecture note. There is also a definition at the beginning of some subtopics.


<div id='Conclusion'/>

# Conclusion

Change this to "Summary & Conclusion".


<div id='SummaryofDiscussedRLMethods'/>

### Summary of Discussed RL Methods

Don't just use bullets to point out things that were mentioned before... do something like "In this article we did X and Y. We learned X because... So..." (disclaimer: I don't mean you should lose the bullets). This is a very critical part of your work.

* Aircraft control and robot motion control

<div id='resources'/>


you can have an "Other useful links" section and provide some interesting material related to these topics (just a suggestion)

<div id='ProblemwithTD'/>

## Problem with TD
All we want is to find the best policy that suits us. Although TD agent finds the value of each state (A value that converges to the real value during the time), it cannot find the best policy because for finding that and doing one-step expectimax, $T$ and $R$ functions are needed. However, in RL, they are not available. Therefore, a new method is needed which is called **Q-Learning**.

okay I know I should stop with the English-related comments, but the last line is bugging me :)))) how about "a new method is required, called..."
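
To spell out the gap the excerpt points at (a sketch in the note's $T$, $R$ notation; the discount factor $\gamma$ and the use of $U$ for state values are assumptions here): extracting a policy from learned state values needs the one-step expectimax

$$\pi(s) = \arg\max_{a} \sum_{s'} T(s, a, s')\big[R(s, a, s') + \gamma\, U(s')\big],$$

which requires $T$ and $R$, whereas with learned Q-values the greedy policy needs no model:

$$\pi(s) = \arg\max_{a} Q(s, a).$$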


@sinatav left a comment


first review
