✅ It is inconvenient to work with different sample spaces. To develop a unified probability theory, we consider a mapping
By nature, a random variable is a function (a mapping).
✅ In many applications, we may be interested only in some particular aspect of the outcomes of an experiment, rather than the outcomes themselves.
[💬 Definition 1. Random Variable] A random variable, $X(\cdot)$, is a
::: {.callout-important}
Convention: A capital letter
:::
🌰: When we throw a coin, the sample space
For the election of a candidate, the sample space
So it is not necessary to have the same number of basic outcomes for both
Suppose we throw three fair coins. Then the sample space
Suppose
Here
$X$ is called a binary random variable because there are only two possible values $X$ can take. Binary variables have wide applications in economics.
There are many other examples of random variables, including
- subjective well-being
- sentiment index of investors
- economic policy uncertainty (EPU) index
These indices are usually constructed from text data from social media platforms (e.g., WeChat, Facebook) and news media. Text data are an unstructured form of big data.
The probability function defined on the original sample space
First suppose space
with a probability function
where
Then the probability function
where
More formally, for any set $A \in \mathbb{B}_{\Omega}$, where $\mathbb{B}_{\Omega}$ is a
where
Now we complete the transformation from the original measurable space
💬 Definition 3.2 Measurable Function A function
- A $\mathbb{B}$-measurable function is simply called a measurable function (if it does not cause any confusion).
- A measurable function ensures that $P(X \in A)$ is always well-defined for all subsets $A$ in $\mathbb{B}_{\Omega}$.
- If $X(\cdot)$ is not measurable, then there exist subsets in the $\sigma$-field in $\mathbb{R}$ for which probabilities are not defined.
- In this document, the term "random variable" is restricted to being a $\mathbb{B}$-measurable function from $S$ to $\mathbb{R}$.
⚖️Theorem 3.1 Let
if function
$X(s)$ and $Y(s)$ are measurable mappings from $S$ to $\Omega$, then the ordinary algebraic operations on $X(s)$ and $Y(s)$ will produce new measurable mappings.
In particular, $aX(s)$, $X(s) + Y(s)$, $X(s)Y(s)$, and $\frac{X(s)}{Y(s)}$ (where $Y(s) \neq 0$) are measurable.
::: {.callout-tip}
- Condition (1) $$ 1 \ge P_{X}(x_i) = P(X = x_i) = P(C_i) \ge 0 $$
given
- Condition (2) $$ P_{X}(\Omega) = P(S) = 1 $$
- Condition (3)
consider two mutually exclusive events
where
However, we can also write
$$ C = \{ s \in S : X(s) \in A_1 \} \cup \{ s \in S : X(s) \in A_2 \} = C_1 \cup C_2 $$
The fact that
$$ P_{X}(A_1 \cup A_2) = P(C_1) + P(C_2) = P_{X}(A_1) + P_{X}(A_2) $$
:::
In the rest of this document, we will abuse notation for the original probability function
🌰: Suppose we throw three coins. Then the sample space $$ S = \{ HHH, HTH, HHT, THH, THT, TTH, HTT, TTT \} $$
Define a random variable
Now, suppose we are interested in calculating the probability that
It follows that: $$ P(0 \le X \le 1) = P(C) = \frac{1}{2} $$
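The value $\frac{1}{2}$ can be checked by brute-force enumeration. The sketch below assumes fair coins and assumes that $X$ counts the number of heads (an illustrative choice, since any definition of $X$ would be handled the same way); the helper names `X` and `prob` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for three coin tosses: all 8 strings over {H, T}.
S = ["".join(t) for t in product("HT", repeat=3)]

def X(s):
    """Assumed random variable: number of heads in outcome s."""
    return s.count("H")

def prob(event):
    """P(C) for an event C given as a predicate on outcomes.

    Assumes all outcomes in S are equally likely (fair coins).
    """
    favorable = [s for s in S if event(s)]
    return Fraction(len(favorable), len(S))

# C = {s in S : 0 <= X(s) <= 1}, so P_X([0, 1]) = P(C).
p = prob(lambda s: 0 <= X(s) <= 1)
print(p)  # 1/2
```

The induced probability of a set of values of $X$ is computed by pulling the set back to the event $C$ in $S$ and measuring $C$ with the original probability function.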
✅ How to characterize a random variable
💬 Definition 3.3 Cumulative Distribution Function (CDF) The cumulative distribution function (CDF) of a random variable
⚖️Theorem 3.2 Properties of
- $\lim_{x \to -\infty}F_{X}(x) = 0$ and $\lim_{x \to +\infty}F_{X}(x) = 1$.
- $F_{X}(x)$ is non-decreasing.
- $F_{X}(x)$ is right-continuous, i.e., for all $x$ and $\delta > 0$, $\lim_{\delta \to 0^{+}} F_{X}(x + \delta) = F_{X}(x)$.
⚖️ Theorem 3.3 Let
⚖️ Theorem 3.4
🌰: Suppose
IIF the
::: {.callout-tip}
- A mixture of two distributions can provide a great deal of flexibility, such as capturing skewness and heavy tails.
- One way a mixed distribution can arise is that, in observed data, some observations are generated from one distribution and the remaining observations are generated from another.
- Another possibility is that there exist two mutually exclusive states, state 1 and state 2, which arise with probabilities $p$ and $1 - p$, respectively. The random variable $X$ follows distribution $F_1(x)$ when state 1 occurs and distribution $F_2(x)$ when state 2 occurs. Then the distribution $F(x)$ of $X$ is a mixture of the distributions $F_1(x)$ and $F_2(x)$.
- A well-known example in econometrics is the so-called Markov regime-switching model, which is widely used in macroeconomics and finance, in which $p$ depends on a state variable characterizing the business cycle.

In practice, $p$ may depend on some economic variables $Z$. An example of $p = p(Z)$ is

$$ p(Z) = \frac{1}{1 + \exp(-\alpha^{'} Z)} $$
:::
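As a rough illustration of the two-state story above, the sketch below reads the mixture as $F(x) = p F_1(x) + (1 - p) F_2(x)$ and computes $p$ from the logistic form $p(Z)$. The normal components, the values of `alpha` and `z`, and all function names are assumptions made only for this example.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of N(mu, sigma^2), computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def logistic_p(z, alpha):
    """p(Z) = 1 / (1 + exp(-alpha'Z)) for equal-length sequences z and alpha."""
    index = sum(a * zi for a, zi in zip(alpha, z))
    return 1.0 / (1.0 + math.exp(-index))

def mixture_cdf(x, p, cdf1, cdf2):
    """F(x) = p * F1(x) + (1 - p) * F2(x)."""
    return p * cdf1(x) + (1.0 - p) * cdf2(x)

# Hypothetical components: state 1 is N(0, 1), state 2 is N(-1, 3^2).
F1 = lambda x: normal_cdf(x, 0.0, 1.0)
F2 = lambda x: normal_cdf(x, -1.0, 3.0)

p = logistic_p(z=[1.0, 0.5], alpha=[0.2, 1.0])  # illustrative values only
for x in (-3.0, 0.0, 3.0):
    print(x, round(mixture_cdf(x, p, F1, F2), 4))
```

Because $p(Z)$ always lies strictly between 0 and 1, the mixture is a valid CDF for any value of the index $\alpha^{'} Z$.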
💬 Definition 3.4 Identical Distributions Two random variables
- It is important to note that being identically distributed does not imply $X = Y$, although $X = Y$ implies that $X$ and $Y$ have the same distribution.
- It is not necessary that $X$ and $Y$ be defined on the same sample space. Only their distribution functions have to coincide.
⚖️ Theorem 3.5 Let
🌰: Income Distribution, Lorenz Curve, and Gini Coefficient
In economics, the Lorenz curve and the Gini coefficient are two popular measures of income inequality.
It graphically shows that for the bottom
A perfectly equal income distribution would be one in which every household has the same income. In this case, the bottom
In economics, the Gini coefficient is the most commonly used measure of inequality. It was developed by the Italian statistician and sociologist Corrado Gini in 1912. The Gini coefficient is defined based on the Lorenz curve: it can be thought of as the ratio of the area that lies between the line of equality and the Lorenz curve to the total area under the line of equality.

If all people have non-negative income (or wealth), the Gini coefficient can theoretically range from 0 (complete equality) to 1 (complete inequality); it is sometimes expressed as a percentage ranging between 0 and 100. If negative values are possible (such as the negative wealth of people with debts), then the Gini coefficient could theoretically be more than 1. Note that two countries with different income distributions can have the same Gini coefficient.
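The area-ratio definition can be sketched for a finite sample of incomes. The code below is a minimal illustration that assumes non-negative incomes with a positive total and uses a piecewise-linear empirical Lorenz curve; the function names `lorenz_curve` and `gini` are hypothetical.

```python
def lorenz_curve(incomes):
    """Return Lorenz points (cumulative population share, cumulative income share).

    Assumes non-negative incomes with a positive total.
    """
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini(incomes):
    """Area between the equality line and the Lorenz curve, over the area under the line."""
    pts = lorenz_curve(incomes)
    # Trapezoid rule gives the exact area under the piecewise-linear Lorenz curve.
    area_lorenz = sum((x1 - x0) * (y0 + y1) / 2.0
                      for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return (0.5 - area_lorenz) / 0.5

print(gini([1, 1, 1, 1]))    # 0.0  (everyone has the same income)
print(gini([0, 0, 0, 10]))   # 0.75 (one household holds all income)
```

With a finite sample of $n$ households, the most unequal case gives $(n-1)/n$ under this construction, which approaches 1 only as $n$ grows.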
🌰: First-Order Stochastic Dominance
If two distributions
First-order stochastic dominance is widely used in decision analysis, welfare economics, and finance. One application is the analysis of income distributions. If $x$ denotes an income level, then the inequality means that the proportion of individuals in distribution $F(\cdot)$ with income less than $x$ is no larger than the proportion of such individuals in $G(\cdot)$. In other words, there is a higher proportion of poorer people in $G(\cdot)$ than in $F(\cdot)$.

Another application: if portfolio $F(\cdot)$ first-order stochastically dominates portfolio $G(\cdot)$, then $P_{F}(X > x) \ge P_{G}(X > x)$ for all $x$; that is, the probability of a return above any given level $x$ is at least as high under portfolio $F(\cdot)$ as under portfolio $G(\cdot)$.
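For two samples, the comparison of proportions described above can be sketched with empirical CDFs. The check below treats "$F$ dominates $G$" as $F(x) \le G(x)$ at every point with strict inequality somewhere; the strictness requirement, the sample data, and the function names are assumptions for illustration.

```python
def ecdf(sample):
    """Return the empirical CDF: x -> share of observations <= x."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: sum(1 for v in xs if v <= x) / n

def first_order_dominates(sample_f, sample_g):
    """True if the ECDF of sample_f lies weakly below that of sample_g
    everywhere, and strictly below it at some point."""
    F, G = ecdf(sample_f), ecdf(sample_g)
    # Empirical CDFs only change at observed data points, so a grid of
    # all observed values is enough to compare them everywhere.
    grid = sorted(set(sample_f) | set(sample_g))
    weakly_below = all(F(x) <= G(x) for x in grid)
    strictly_somewhere = any(F(x) < G(x) for x in grid)
    return weakly_below and strictly_somewhere

incomes_f = [30, 40, 50, 60, 80]  # hypothetical income samples
incomes_g = [20, 30, 40, 50, 60]
print(first_order_dominates(incomes_f, incomes_g))  # True
print(first_order_dominates(incomes_g, incomes_f))  # False
```

This compares sample distributions only; deciding dominance for the underlying populations would require a statistical test, which is outside this sketch.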
🌰: Second-Order Stochastic Dominance
A probability distribution
with strict inequality for at least some
Risk-averse economic agents will always prefer distribution
$F(\cdot)$ because for any increasing and concave utility function $u(\cdot)$, we have $$ \int_{-\infty}^{\infty} u(x)\,dF(x) \ge \int_{-\infty}^{\infty} u(x)\,dG(x) $$
if and only if
::: {.callout-tip}
Intuitively, economic agents who prefer more will prefer a first-order stochastically dominating distribution, and economic agents who prefer more but are risk-averse will prefer a second-order stochastically dominating distribution.
Higher-order stochastic dominance can be defined in a similar way.
The various concepts of stochastic dominance are very useful in characterizing the risk behavior of economic agents. There exists a dual relationship between stochastic dominance orders and classes of utility functions.
Probability theory is a rather useful analytic tool in behavioral economics and behavioral finance.
:::
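The link between risk aversion and expected utility can be illustrated with two hypothetical discrete lotteries that share the same mean. In the sketch below, `G` is a mean-preserving spread of `F`, the square root stands in for an increasing concave utility, and all names and numbers are assumptions chosen for the example.

```python
import math

def expected_utility(lottery, u):
    """E[u(X)] for a discrete lottery given as {outcome: probability}."""
    return sum(prob * u(x) for x, prob in lottery.items())

F = {5.0: 1.0}             # a sure payoff of 5
G = {0.0: 0.5, 10.0: 0.5}  # same mean as F, but riskier

linear = lambda x: x       # risk-neutral utility
risk_averse = math.sqrt    # increasing and concave on [0, inf)

# A risk-neutral agent is indifferent: both expected payoffs equal 5.
print(expected_utility(F, linear), expected_utility(G, linear))
# A risk-averse agent strictly prefers the less risky lottery F.
print(expected_utility(F, risk_averse) > expected_utility(G, risk_averse))  # True
```

Neither lottery first-order dominates the other here, so the preference for `F` comes entirely from the concavity of the utility function.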
💬 Definition 3.5 Discrete Random Variable (DRV) If a random variable
::: {.callout-tip}
For a discrete random variable
:::
💬 Definition 3.6 Probability Mass Function (PMF) The PMF of a DRV is defined as $$ f_{X}(x) = P(X = x) \quad \text{for all } x \in \mathbb{R} $$
⚖️ Theorem 3.6 Properties of PMF
- $0 \le f_{X}(x) \le 1$ for all $x \in \mathbb{R}$
- $\sum_{x \in \Omega} f_{X}(x) = 1$
💬 Definition 3.7 Support The collection of the points on the real line
We pay attention to the points where the PMF is greater than zero.
The support of $X$ is the set of all possible values that $X$ can take with strictly positive probability. Although $f_{X}(x)$ is defined on the entire real line $\mathbb{R}$, it suffices to know the support of a DRV $X$ and the probabilities of all points in the support. The PMF $f_{X}(x)$ can be represented graphically via a so-called probability histogram.
💬 Definition Probability Histogram A probability histogram is a plot to represent a discrete probability distribution where rectangles are constructed so that their bases of equal width are centered at each value
⚖️ Theorem 3.7 Suppose
where the summation is over all values
::: {.callout-important}
- $F_{X}(x)$ is defined not only on the support of $X$ but on the whole real line.
:::
🌰: Suppose a random variable
👉🏻: To compute the CDF
case 1 :
case 2 :
case j : $j-1 \le x < j$, $2 \le j \le N$. Then the event
case N+1 : $x \ge N$. Then the event
To sum up, we have
This function is a step function whose jumps occur at the points with strictly positive probability, that is, at the points contained in the support of $X$.
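The case-by-case construction can be reproduced by summing the PMF over all support points not exceeding $x$. The sketch below assumes, purely for illustration, that $X$ is uniform on $\{1, \dots, N\}$ with $N = 4$; the helper `make_cdf` is a hypothetical name.

```python
from fractions import Fraction

def make_cdf(pmf):
    """Build F(x) = sum of f(y) over support points y <= x from a PMF dict."""
    support = sorted(pmf)
    def F(x):
        return sum((pmf[y] for y in support if y <= x), Fraction(0))
    return F

N = 4
# Assumed PMF: equal probability 1/N on each of the points 1, ..., N.
pmf = {j: Fraction(1, N) for j in range(1, N + 1)}
F = make_cdf(pmf)

# F is flat between support points and jumps by f(j) at each j.
for x in (0.5, 1, 1.9, 2, 3.5, 4, 10):
    print(x, F(x))
```

Evaluating at $x = 1.9$ and $x = 2$ shows the right-continuity of the step function: the jump is already included at the support point itself.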
**⚖️ Theorem 3.8** Suppose
💬 Definition 3.9 Continuous Random Variable A random variable
✅ can we define a PMF
For any constant
- For a CRV $X$, the probability that $X$ takes any single value is zero.
- Intuition: consider the analogous example of a satellite flying over Mainland China. Suppose it takes one hour for the satellite to fly over the Mainland, 2 minutes to fly over Fujian, and 0.1 second to fly over Xiamen. It is conceivable that it takes almost zero seconds for the satellite to fly over the Economic Building at XMU.
- The result that $P(X = x) = 0$ for all $x$ for a CRV $X$ has important implications. For example, $$ P(a < X \le b) = P(a \le X < b) = P(a \le X \le b) $$
💬 Definition Absolute Continuity (AC) A function
What is meant by "almost everywhere"? Intuitively, in any finite interval of $\mathbb{R}$, there are a finite number of points, or an infinite but countable number of points, where $F_{X}(x)$ is not differentiable. A continuously differentiable function is absolutely continuous.
💬 Definition 3.10 Probability Density Function (PDF) Let
The function
In other words, we have
$$ F_{X}^{'}(x) = f_{X}(x) \text{ for almost all } x \in \mathbb{R} $$
For interpretation of the PDF
for some
- Although $f_{X}(x)$ is not itself a probability (unlike the probability mass function of a DRV), it is proportional to the probability that $X$ takes values in a small interval centered at point $x$. Thus, $f_{X}(x)$ characterizes the relative magnitude of the probability that $X$ takes values in a small interval centered at $x$.
- The plot of $f_{X}(x)$ describes the shape of the probability distribution of a CRV $X$.
- Given a CDF $F_{X}(x)$, we can obtain the PDF $f_{X}(x) = F_{X}^{'}(x)$ at the points where $F_{X}(x)$ is differentiable. When $F_{X}(x)$ is differentiable on the entire real line, the PDF $f_{X}(x)$ is unique.
- However, when $F_{X}(x)$ is not differentiable at some points, $f_{X}(x)$ is not defined at those points.
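Both readings of the PDF, as the derivative of the CDF and as a probability per unit length over a small interval, can be checked numerically. The sketch below uses an exponential distribution with a hypothetical rate `lam = 2.0` as the example; the evaluation point and step sizes are also illustrative assumptions.

```python
import math

lam = 2.0  # hypothetical rate parameter of an exponential distribution

def F(x):
    """CDF of Exponential(lam): 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def f(x):
    """PDF of Exponential(lam): lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

x, eps = 0.7, 1e-4
# Exact probability of a small interval centered at x, taken from the CDF.
interval_prob = F(x + eps / 2) - F(x - eps / 2)
# Density-based approximation: f(x) times the interval width.
approx = f(x) * eps
print(interval_prob, approx)

# A central finite difference recovers the PDF where F is differentiable.
h = 1e-6
derivative = (F(x + h) - F(x - h)) / (2 * h)
print(derivative, f(x))
```

This particular CDF has a kink at $x = 0$, so it is an instance of the non-differentiable case: the value assigned to the PDF at that single point does not affect any probability.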
✅ So how to define the values of PDF
- We can define $f_{X}(x)$ arbitrarily at those points, but for computational convenience it is best to keep it as smooth as possible.
- Example: consider two PDFs
⚖️ Theorem 3.9 Properties of PDF A function
- $f_{X}(x) \ge 0$ for all $x \in \mathbb{R}$
- $\int_{-\infty}^{\infty} f_{X}(x)\,dx = 1$
⚖️ Theorem 3.12 Support of a CRV The support of a CRV
- The support of a CRV $X$ is the set of all possible points on $\mathbb{R}$ with strictly positive PDF $f_{X}(x)$.
- The probability that a CRV $X$ takes values in a small neighborhood of any point in its support is always positive.
- In contrast, the probability that $X$ takes values in some small neighborhood of any point outside the support will be zero.
- It suffices to focus on the support of $X$ when calculating the probabilities of a CRV.
🌰: Location-Scale Family
Let
is a PDF
- The family of $f(x - \mu)$, indexed by parameter $\mu$, is called the location family with standard PDF $f(x)$, where parameter $\mu$ is called the location parameter.
- The family of $\frac{1}{\sigma} f\left(\frac{x}{\sigma}\right)$, indexed by parameter $\sigma$, is called the scale family with standard PDF $f(x)$, where parameter $\sigma$ is called the scale parameter.
- The family of $\frac{1}{\sigma} f\left(\frac{x - \mu}{\sigma}\right)$, indexed by parameters $(\mu, \sigma)$, is called the location-scale family with standard PDF $f(x)$, where $\mu$ and $\sigma$ are called the location and scale parameters, respectively.
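A quick numerical check can confirm that a location-scale transform of a PDF still integrates to one. The sketch below takes the transform to be $\frac{1}{\sigma} f\left(\frac{x - \mu}{\sigma}\right)$ with $\sigma > 0$ and uses the standard normal density as the standard PDF; that choice, the parameter values, and the helper names are assumptions for illustration.

```python
import math

def std_normal_pdf(z):
    """Standard PDF f(z); the standard normal is an illustrative choice."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def location_scale_pdf(f, mu, sigma):
    """Return g(x) = (1 / sigma) * f((x - mu) / sigma), requiring sigma > 0."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return lambda x: f((x - mu) / sigma) / sigma

def integrate(g, a, b, n=20000):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return total * h

mu, sigma = 3.0, 2.0
g = location_scale_pdf(std_normal_pdf, mu, sigma)

# Integrate over mu +/- 10 sigma, which captures essentially all the mass.
area = integrate(g, mu - 10 * sigma, mu + 10 * sigma)
print(round(area, 6))  # close to 1
```

Shifting by $\mu$ moves the density along the real line, while dividing by $\sigma$ both stretches it horizontally and rescales its height so the total area is preserved.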