A Comprehensive Approach to Constructing a Portfolio: Factor Model, Bayesian Shrinkage, Black-Litterman Model, and Smart Beta
The primary objective of our project is to maximize portfolio return through stock selection and allocation. To accomplish this, we have adopted a series of steps. The project's source code is available in the GitHub repository.
Firstly, we employ the dual momentum method to select stocks and narrow down the list for portfolio construction. This method combines absolute and relative momentum to identify assets with strong performance potential.
Secondly, we apply Bayesian shrinkage (the first layer of Bayesian updating) and the Black-Litterman model (the second layer) to factor returns, re-evaluating the posterior predictive moments of stock returns with the newest information and the investor's views.
Thirdly, we utilize Smart Beta to calculate the weight of each stock within our portfolio. By incorporating the 10-industry factor model and implementing monthly rebalancing via stock selection, we aim to boost the return of our strategy.
Lastly, we utilize the `Backtrader`[^1] framework and `QuantStats`[^2] for rigorous backtesting to thoroughly evaluate the performance of our strategy. This allows us to assess the historical effectiveness of our approach and make informed decisions on potential refinements.
In our strategy, we implement a six-month formation period for stock selection. This means that we compute the returns of every asset in our stock universe, which comprises the constituents of the S&P 500 index, over the past six months. This calculation is performed on a rolling basis, at monthly intervals, to ensure regular portfolio rebalancing.
During each rebalancing period, we evaluate the six-month returns of all the assets and adjust the portfolio composition accordingly. Specifically, we select a subset of assets with the highest six-month returns to construct the portfolio. This approach allows us to capture and capitalize on the momentum effect exhibited by assets within the S&P 500 index.
By incorporating this rolling six-month return calculation and frequent portfolio adjustments, our strategy aims to adapt to changing market conditions and exploit potential trends in the selected assets.
```python
import pandas as pd
from datetime import timedelta

clean_stock_price.to_excel("Cleaned SPX Price.xlsx")
clean_stock_price.index = pd.to_datetime(clean_stock_price.index)
# Rolling six-month (formation period) return at each month end; raw=True so x is an ndarray
clean_stock_semiannually_return = clean_stock_price.resample("M").last().rolling(window=6).apply(
    lambda x: (x[-1] - x[0]) / x[0], raw=True).dropna()
# Relabel each month end to the first day of the following month (the rebalancing date)
clean_stock_semiannually_return.index = [str(i + timedelta(days=15)).rsplit("-", 1)[0] + "-01"
                                         for i in clean_stock_semiannually_return.index]
```
In our stock selection process, we create a long portfolio by sorting stocks by return and taking the top 50. However, it is essential that each stock in the long portfolio also has a positive absolute return; any stock with a negative return is removed from the portfolio.

Similarly, we form a short portfolio by selecting the 50 stocks with the lowest returns, keeping only those with negative absolute returns. This ensures that we take advantage of downward price movements in these stocks.
```python
import numpy as np

# Build the long (top 50 positive-return) and short (bottom 50 negative-return) lists;
# company_names holds the tickers aligned with the return columns
long_list_output = pd.DataFrame(columns=range(1, 51))
short_list_output = pd.DataFrame(columns=range(1, 51))
for i in range(clean_stock_semiannually_return.shape[0]):
    semiannual_return = clean_stock_semiannually_return.iloc[i].values
    sorted_indices = np.argsort(-semiannual_return)  # descending by return
    sorted_semiannual_return = semiannual_return[sorted_indices]
    sorted_company_names = company_names[sorted_indices]
    # Absolute-momentum filter: longs need positive returns, shorts negative
    positive_number = min(50, np.sum(sorted_semiannual_return > 0))
    negative_number = min(50, np.sum(sorted_semiannual_return < 0))
    # Pad with empty strings when fewer than 50 names qualify
    long_row = np.full(50, "", dtype=object)
    short_row = np.full(50, "", dtype=object)
    long_row[:positive_number] = sorted_company_names[:positive_number]
    if negative_number > 0:
        short_row[:negative_number] = sorted_company_names[-negative_number:]
    long_list_output.loc[i] = long_row
    short_list_output.loc[i] = short_row
```
With the factors provided, each stock's return follows the linear factor model

$$r_{m,t} = \mathbf{f}_t^{\top} \boldsymbol{\beta}_m + \epsilon_{m,t}, \qquad \epsilon_{m,t} \overset{iid}{\sim} \mathcal{N}(0, \sigma_m^2),$$

where $r_{m,t}$ is the return of stock $m$ at time $t$ ($m = 1, \dots, M$, $t = 1, \dots, T$), $\mathbf{f}_t \in \mathbb{R}^K$ is the vector of factor returns, and $\boldsymbol{\beta}_m$ is the factor-loading vector of stock $m$.

We are aiming to model the Bayesian posterior predictive moments $\mathbb{E}[\mathbf{r}_{T+1} \mid \mathcal{D}]$ and $\operatorname{Cov}[\mathbf{r}_{T+1} \mid \mathcal{D}]$ of next-period stock returns, given the observed data $\mathcal{D}$.

To maintain closed-form solutions in MV analysis, we adopted fully conjugate and well-established priors: Zellner's g-prior for $(\boldsymbol{\beta}_m, \sigma_m^2)$ and the diffuse Jeffreys prior for the factor moments $(\boldsymbol{\mu}_f, \Lambda)$. Here we propose

$$\boldsymbol{\beta}_m \mid \sigma_m^2 \sim \mathcal{N}\!\left(\boldsymbol{\beta}_0,\; g\,\sigma_m^2\,(F^{\top}F)^{-1}\right), \qquad p(\sigma_m^2) \propto \frac{1}{\sigma_m^2},$$

with $F \in \mathbb{R}^{T \times K}$ the matrix of factor observations and $\boldsymbol{\beta}_0 = \mathbf{0}$ by default. The marginal posterior of $\boldsymbol{\beta}_m$ is a multivariate Student-t distribution with $T$ degrees of freedom, location, scale, and variance

$$\bar{\boldsymbol{\beta}}_m = \frac{\boldsymbol{\beta}_0 + g\,\hat{\boldsymbol{\beta}}_m}{1+g}, \qquad \bar{\Sigma}_m = \frac{g}{g+1}\,(F^{\top}F)^{-1}\,\frac{\mathrm{SSR}_m}{T}, \qquad \operatorname{Var}[\boldsymbol{\beta}_m \mid \mathcal{D}] = \frac{T}{T-2}\,\bar{\Sigma}_m,$$

where $\hat{\boldsymbol{\beta}}_m = (F^{\top}F)^{-1}F^{\top}\mathbf{r}_m$ is the OLS estimate and

$$\mathrm{SSR}_m = (\mathbf{r}_m - F\hat{\boldsymbol{\beta}}_m)^{\top}(\mathbf{r}_m - F\hat{\boldsymbol{\beta}}_m) + \frac{1}{g+1}\,(\hat{\boldsymbol{\beta}}_m - \boldsymbol{\beta}_0)^{\top} F^{\top} F\,(\hat{\boldsymbol{\beta}}_m - \boldsymbol{\beta}_0).$$
```python
# List of Mean and Var of beta_m (each list has length M)
def post_beta(self, beta_0=None, g=None) -> tuple[list[np.ndarray], list[np.ndarray]]:
    if beta_0 is None:
        beta_0 = np.zeros(self.K)
    if g is None:
        g = self.g_star
    beta_mean_list = []
    beta_var_list = []
    for m in range(self.M):
        r_m = self.stock_data[:, m]
        # OLS estimate beta_hat_m = (F'F)^{-1} F' r_m
        beta_hat_m = np.linalg.inv(self.F.T @ self.F) @ self.F.T @ r_m
        # Posterior mean: OLS estimate shrunk towards the prior mean beta_0
        beta_m_bar = (beta_0 + g * beta_hat_m) / (1 + g)
        beta_mean_list.append(np.array(beta_m_bar))
        SSR = (r_m - self.F @ beta_hat_m).T @ (r_m - self.F @ beta_hat_m) + 1 / (g + 1) * (beta_hat_m - beta_0).T @ self.F.T @ self.F @ (
            beta_hat_m - beta_0
        )
        # Scale matrix of the marginal multivariate-t posterior of beta_m
        sig_m = g / (g + 1) * np.linalg.inv(self.F.T @ self.F) * SSR / self.T
        # Variance of a t distribution with T degrees of freedom: df / (df - 2) * scale
        beta_var_list.append(self.T / (self.T - 2) * sig_m)
    return beta_mean_list, beta_var_list
```
```python
# List of Mean of sigma^2_m (length: M, the number of stocks)
def post_sig2_mean(self, beta_0=None, g=None) -> list[float]:
    if beta_0 is None:
        beta_0 = np.zeros(self.K)
    if g is None:
        g = self.g_star
    sig2_list = []
    for m in range(self.M):
        r_m = self.stock_data[:, m]
        beta_hat_m = np.linalg.inv(self.F.T @ self.F) @ self.F.T @ r_m
        SSR = (r_m - self.F @ beta_hat_m).T @ (r_m - self.F @ beta_hat_m) + 1 / (g + 1) * (beta_hat_m - beta_0).T @ self.F.T @ self.F @ (
            beta_hat_m - beta_0
        )
        # Mean of the inverse-gamma posterior IG(T/2, SSR/2)
        sig2_list.append(SSR / 2 / (self.T / 2 - 1))
    return sig2_list
```
Assuming $\mathbf{f}_t \overset{iid}{\sim} \mathcal{N}(\boldsymbol{\mu}_f, \Lambda)$ with the Jeffreys prior on $(\boldsymbol{\mu}_f, \Lambda)$, the marginal posterior of $\boldsymbol{\mu}_f$ is a multivariate Student-t distribution with $T - K$ degrees of freedom, with

$$\mathbb{E}[\boldsymbol{\mu}_f \mid \mathcal{D}] = \bar{\mathbf{f}}, \qquad \operatorname{Var}[\boldsymbol{\mu}_f \mid \mathcal{D}] = \frac{1}{T-K-2}\,\frac{\Lambda_n}{T},$$

where $\bar{\mathbf{f}} = \frac{1}{T}\sum_{t=1}^{T}\mathbf{f}_t$ and $\Lambda_n = \sum_{t=1}^{T}(\mathbf{f}_t - \bar{\mathbf{f}})(\mathbf{f}_t - \bar{\mathbf{f}})^{\top}$; the posterior mean of the factor covariance matrix is $\mathbb{E}[\Lambda \mid \mathcal{D}] = \Lambda_n / (T-K-2)$.
```python
# Mean and Var of miu_f (branch without views; the branch with views
# is shown in the Black-Litterman section below)
def post_miu_f(self) -> tuple[np.ndarray, np.ndarray]:
    f_bar = np.array(self.F.mean(axis=0)).T
    # Scatter matrix Lambda_n = sum_t (f_t - f_bar)(f_t - f_bar)'
    Lambda_n = np.zeros((self.K, self.K))
    for t in range(self.T):
        f_t = self.F[t, :]
        Lambda_n += np.outer(f_t - f_bar, f_t - f_bar)
    # Without views about future factor returns
    if self.P is None or self.Q is None:
        miu_f_mean = f_bar
        miu_f_var = 1 / (self.T - self.K - 2) * Lambda_n / self.T
        return miu_f_mean, miu_f_var
```
```python
# Mean of Lambda_n: posterior mean of the factor covariance matrix
def post_Lambda_n(self) -> np.ndarray:
    f_bar = self.F.mean(axis=0)
    Lambda_n = np.zeros((self.K, self.K))
    for t in range(self.T):
        f_t = self.F[t, :]
        Lambda_n += np.outer(f_t - f_bar, f_t - f_bar)
    # Mean of the inverse-Wishart posterior
    return Lambda_n / (self.T - self.K - 2)
```
For Zellner's g-prior with $\boldsymbol{\beta}_0 = \mathbf{0}$, the marginal likelihood of stock $m$'s returns given $g$ satisfies

$$p(\mathbf{r}_m \mid g) \propto \frac{(1+g)^{(T-K-1)/2}}{\left(1 + g\,(1 - R_m^2)\right)^{(T-1)/2}},$$

where $R_m^2$ is the coefficient of determination of the OLS regression of $\mathbf{r}_m$ on the factors. Then we employ the empirical Bayes estimate

$$g^* = \arg\max_{g > 0} \sum_{m=1}^{M} \log p(\mathbf{r}_m \mid g),$$

which we compute numerically by minimizing the negative log marginal likelihood implemented below.
```python
# Objective function for finding g*: the negative log marginal likelihood, summed over stocks
def g_likelihood(self, g) -> float:
    R_squared_list = []
    for m in range(self.M):
        r_m = self.stock_data[:, m]
        r_m_bar = r_m.mean(axis=0)
        beta_hat_m = np.linalg.inv(self.F.T @ self.F) @ self.F.T @ r_m
        # R^2 of the OLS regression of r_m on the factors
        R_squared_m = 1 - ((r_m - self.F @ beta_hat_m).T @ (r_m - self.F @ beta_hat_m)) / ((r_m - r_m_bar).T @ (r_m - r_m_bar))
        R_squared_list.append(R_squared_m)
    R_squared_list = np.array(R_squared_list)
    return sum(-(self.T - self.K - 1) / 2 * np.log(1 + g) + (self.T - 1) / 2 * np.log(1 + g * (1 - R_squared_list)))
```
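The post does not show how `g_star` is set; a minimal sketch of one way to obtain it from the objective above, assuming `scipy` is available and that `model` is an instance of the factor-model class:

```python
from scipy.optimize import minimize_scalar

# Minimize the negative log marginal likelihood over g on a wide bounded interval
res = minimize_scalar(model.g_likelihood, bounds=(1e-6, 1e6), method="bounded")
model.g_star = res.x  # empirical Bayes estimate g*
```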
Denote the predictive factor moments by

$$\operatorname{Var}[\mathbf{f}_{T+1} \mid \mathcal{D}] = \mathbb{E}[\Lambda \mid \mathcal{D}] + \operatorname{Var}[\boldsymbol{\mu}_f \mid \mathcal{D}], \qquad \mathbb{E}[\mathbf{f}_{T+1}\mathbf{f}_{T+1}^{\top} \mid \mathcal{D}] = \operatorname{Var}[\mathbf{f}_{T+1} \mid \mathcal{D}] + \mathbb{E}[\boldsymbol{\mu}_f \mid \mathcal{D}]\,\mathbb{E}[\boldsymbol{\mu}_f \mid \mathcal{D}]^{\top}.$$

The posterior predictive moments of stock returns are then

$$\mathbb{E}[r_{m,T+1} \mid \mathcal{D}] = \bar{\boldsymbol{\beta}}_m^{\top}\,\mathbb{E}[\boldsymbol{\mu}_f \mid \mathcal{D}],$$

where the predictive variances are

$$\operatorname{Var}[r_{m,T+1} \mid \mathcal{D}] = \mathbb{E}[\sigma_m^2 \mid \mathcal{D}] + \operatorname{tr}\!\left(\mathbb{E}[\mathbf{f}_{T+1}\mathbf{f}_{T+1}^{\top} \mid \mathcal{D}]\;\operatorname{Var}[\boldsymbol{\beta}_m \mid \mathcal{D}]\right) + \bar{\boldsymbol{\beta}}_m^{\top}\,\operatorname{Var}[\mathbf{f}_{T+1} \mid \mathcal{D}]\,\bar{\boldsymbol{\beta}}_m,$$

with cross-covariances, for $m \neq j$,

$$\operatorname{Cov}[r_{m,T+1},\, r_{j,T+1} \mid \mathcal{D}] = \bar{\boldsymbol{\beta}}_m^{\top}\,\operatorname{Var}[\mathbf{f}_{T+1} \mid \mathcal{D}]\,\bar{\boldsymbol{\beta}}_j.$$
```python
# Posterior predictive return distribution (mean vector and covariance matrix) and shrinkage parameter g*
def posterior_predictive(self) -> tuple[np.ndarray, np.ndarray, float]:
    sig2_mean = self.post_sig2_mean()
    miu_f_mean, miu_f_var = self.post_miu_f()
    Lambda_n_mean = self.post_Lambda_n()
    beta_mean_list, beta_var_list = self.post_beta()
    # Predictive E[f f'] and Var[f] of the factor distribution
    f_ft_mean = Lambda_n_mean + miu_f_var + np.outer(miu_f_mean, miu_f_mean)
    f_var = Lambda_n_mean + miu_f_var
    r_mean_list = []
    r_cov_mat = np.zeros((self.M, self.M))
    for m in range(self.M):
        r_mean = beta_mean_list[m] @ miu_f_mean
        r_mean_list.append(r_mean)
        for j in range(m, self.M):
            if m == j:
                # Variance: idiosyncratic noise + beta estimation uncertainty + factor variance
                r_cov_mat[m, m] = sig2_mean[m] + np.trace(f_ft_mean @ beta_var_list[m]) + beta_mean_list[m].T @ f_var @ beta_mean_list[m]
            else:
                # Covariance between stocks arises only through the common factors
                r_cov_mat[m, j] = beta_mean_list[m].T @ f_var @ beta_mean_list[j]
                r_cov_mat[j, m] = r_cov_mat[m, j]
    return np.array(r_mean_list), np.array(r_cov_mat), self.g_star
```
To track the differences in the mean and covariance estimates between the Bayesian approach and the historical-sample approach, we employed two distance metrics, one for the mean vectors and one for the covariance matrices. The plots below show the distances between the estimates and the estimated $g^*$ over time. We note that there is a significant shock in the covariance-estimate distance and in the estimated $g^*$.
In this section, we aim to incorporate the investor's forward-looking views on the mean and covariance matrix of the underlying factors via a second layer of Bayesian updating. This information is not necessarily captured by the historical data, but can be strategically integrated into the statistical model. Here we only discuss its application to the factor returns $\boldsymbol{\mu}_f$.
Let $P \in \mathbb{R}^{V \times K}$ be the pick matrix encoding $V$ views on the factor returns, and let $Q \in \mathbb{R}^{V}$ collect the corresponding view values (absolute views use $P = I_K$; relative views compare consecutive factors). With the formulation:

$$Q = P\,\boldsymbol{\mu}_f + \boldsymbol{\varepsilon},$$

and assume that the noise $\boldsymbol{\varepsilon}$ is independent of the historical data, with

$$\boldsymbol{\varepsilon} \sim \mathcal{N}\!\left(\mathbf{0},\; c\,P\,\Sigma_n P^{\top}\right), \qquad \Sigma_n = \frac{\Lambda_n}{T\,(T-K)},$$

for some constant $c > 0$ that controls the (inverse) confidence in the views; we default to $c = \sqrt{T}$. According to Schöttle et al. (2010), the updated joint posterior distribution of $(\boldsymbol{\mu}_f, \Lambda)$ remains in the same conjugate family, so closed-form moments are preserved.

Subsequent to the second Bayesian update, the posterior distribution of $\boldsymbol{\mu}_f$ has mean and variance

$$\mathbb{E}[\boldsymbol{\mu}_f \mid \mathcal{D}, Q] = \bar{\mathbf{f}} + \frac{1}{c+1}\,\Sigma_n P^{\top}\!\left(P\,\Sigma_n P^{\top}\right)^{-1}\!\left(Q - P\,\bar{\mathbf{f}}\right),$$

$$\operatorname{Var}[\boldsymbol{\mu}_f \mid \mathcal{D}, Q] = \frac{T-K}{T-K-2}\left(\Sigma_n - \frac{1}{c+1}\,\Sigma_n P^{\top}\!\left(P\,\Sigma_n P^{\top}\right)^{-1} P\,\Sigma_n\right).$$
```python
# Mean and Var of miu_f (branch with views)
def post_miu_f(self) -> tuple[np.ndarray, np.ndarray]:
    f_bar = np.array(self.F.mean(axis=0)).T
    Lambda_n = np.zeros((self.K, self.K))
    for t in range(self.T):
        f_t = self.F[t, :]
        Lambda_n += np.outer(f_t - f_bar, f_t - f_bar)
    # With views about future factor returns
    if self.P == "absolute":
        # One view per factor: pick matrix is the identity
        self.P = np.eye(self.K)
    elif self.P == "relative":
        # Views on differences between consecutive factors
        self.P = np.eye(self.K)
        for i in range(self.K - 1):
            self.P[i, i + 1] = -1
    if self.c is None:
        self.c = np.sqrt(self.T)
    Sigma_n = Lambda_n / self.T / (self.T - self.K)
    # Black-Litterman style update of the posterior mean towards the views Q
    miu_f_mean = f_bar + 1 / (self.c + 1) * Sigma_n @ self.P.T @ np.linalg.inv(self.P @ Sigma_n @ self.P.T) @ (self.Q - self.P @ f_bar)
    miu_f_var = (
        (self.T - self.K)
        / (self.T - self.K - 2)
        * (Sigma_n - 1 / (self.c + 1) * Sigma_n @ self.P.T @ np.linalg.inv(self.P @ Sigma_n @ self.P.T) @ self.P @ Sigma_n)
    )
    return miu_f_mean, miu_f_var
```
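For illustration, a hypothetical instantiation with absolute views on the 10 industry factors might look like the following; the class name `BayesianFactorModel` and its constructor signature are assumptions, since the post only shows individual methods:

```python
import numpy as np

# Hypothetical constructor: only the attribute names (F, stock_data, P, Q, c)
# are visible in the methods above.
model = BayesianFactorModel(
    F=factor_returns,          # T x K matrix of the 10 industry factor returns
    stock_data=stock_returns,  # T x M matrix of selected stock returns
    P="absolute",              # identity pick matrix: one view per factor
    Q=np.full(10, 0.0005),     # view: each factor earns 5 bps per day
    c=None,                    # view-confidence constant; defaults to sqrt(T)
)
miu_f_mean, miu_f_var = model.post_miu_f()  # second-layer (Black-Litterman) update
```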
In scenarios where no views are supplied (i.e. $P$ or $Q$ is `None`), the second update is skipped and the first-layer posterior moments are used directly.
After we have obtained the Bayesian posterior predictive mean and covariance matrix of stock returns, we employ Smart Beta to calculate the weight of each stock within our portfolio. The methods include Risk Parity, Maximum Diversification Ratio (MDR), Global Minimum Variance (GMV), and Maximum Sharpe Ratio (MSR), as mentioned in the lecture notes.

We also add a weight-calculation scheme that allows us to specify a required expected return: the portfolio variance is minimized subject to the portfolio's expected return, under the posterior predictive moments, meeting the required level. A sketch of this scheme follows.
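A minimal sketch of this scheme, assuming a long-only, fully invested portfolio and using `scipy`; the function name `target_return_weights` is ours, with `mu` and `cov` being the posterior predictive moments returned by `posterior_predictive` above:

```python
import numpy as np
from scipy.optimize import minimize

def target_return_weights(mu: np.ndarray, cov: np.ndarray, mu_0: float) -> np.ndarray:
    """Minimize w' cov w subject to w' mu = mu_0, sum(w) = 1, w >= 0."""
    M = len(mu)
    constraints = [
        {"type": "eq", "fun": lambda w: w @ mu - mu_0},  # required expected return
        {"type": "eq", "fun": lambda w: w.sum() - 1.0},  # fully invested
    ]
    res = minimize(
        lambda w: w @ cov @ w,     # portfolio variance
        x0=np.full(M, 1.0 / M),    # start from equal weights
        bounds=[(0.0, 1.0)] * M,   # long-only
        constraints=constraints,
        method="SLSQP",
    )
    return res.x
```

Note that a required return above the largest posterior mean is infeasible for long-only weights, which is consistent with the instability we observe for extreme targets below.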
With our stocks selected via the Dual Momentum approach and the factor data (10 industry portfolios fetched from the Fama and French database, following the paper's approach), we use the preceding 252 days' returns for parameter estimation and the Bayesian update at the end of each trading day. Then, with the posterior predictive moments of stock returns, we compute the next day's portfolio weights under each scheme.

The stock-selection universe is rebalanced monthly (i.e. on the first trading day of each month). For the long-only stocks, we set lower and upper bounds on each individual weight.
This section serves as a deliberately simplified cumulative-return comparison of each strategy (no transaction costs, no dividend adjustment, no market-liquidity assumptions; based purely on closing prices) on the long-only stocks selected by the Dual Momentum approach. The evaluation period is from 2020-01-01 to 2024-02-29.
For the long-only stocks, we notice that the equal-weight strategy already beats the market after stock selection. For the MDR scheme, even though the Bayesian approach beats the sample approach, both fail to outperform the equal-weight strategy.
It is more interesting to look at the cumulative return when we specify a required daily return. We discovered that when we specify a relatively small required return (0.1% daily) or an extremely large one (1% daily), both of which deviate considerably from the equal-weight method's daily return, the Bayesian approach tends to fail.

However, when we adjust the required daily return to 0.2% or 0.25%, the Bayesian approach beats the sample approach again, even earning an extremely abnormal return. This may result from allocating heavy weight to a single stock, which can be interpreted as an ability to capture the "trading signal".
In order to assess the performance of our portfolio relative to the benchmark SPX500, we employ backtesting techniques utilizing the `Backtrader`[^1] and `QuantStats`[^2] libraries. `Backtrader` is a widely used Python library designed for backtesting trading strategies, while `QuantStats` is a complementary Python library tailored for comprehensive portfolio analytics. Through the integration of these tools, we aim to rigorously evaluate the performance of our strategy by using the generated portfolio weights in comparison to the SPX500 benchmark.
Utilizing `Backtrader`, our methodology entails replicating the daily rebalancing of the portfolio in accordance with the weights generated by our strategy. With a portfolio comprising around 100 stocks, the selection of stocks is revised monthly based on prevailing market conditions. Consequently, our stock universe undergoes monthly adjustments, with purchases and sales executed daily in alignment with the specified weights.
The operational logic is as follows: leveraging the close price data for the current day, we calculate the requisite weights and corresponding share allocations. These share allocations are rounded down to integers, and orders are placed at the opening price of the subsequent day. We operate under the assumption of seamless execution at the opening price, devoid of any market impact.
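The `BLStrategy` class used by our backtest driver (shown further below) is not reproduced in the post; a minimal sketch consistent with this operational logic, with parameter names taken from the `RunBacktest` call and everything else an assumption, might look like:

```python
import math
import pandas as pd
import backtrader as bt

class BLStrategy(bt.Strategy):
    # weights: DataFrame of daily target weights indexed by date, one column per ticker
    params = (("weights", None), ("stocks", None), ("printnotify", False), ("printlog", False))

    def next(self):
        ts = pd.Timestamp(self.datas[0].datetime.date(0))
        if ts not in self.p.weights.index:
            return
        value = self.broker.getvalue()
        for name in self.p.stocks:
            data = self.getdatabyname(name)
            weight = self.p.weights.loc[ts, name]
            # Shares from today's close, rounded down; the order fills at the next open
            target = math.floor(value * weight / data.close[0])
            self.order_target_size(data=data, target=target)
```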
When configuring `Backtrader`, we eschew predefined templates for the data feeds, as only the open and close prices for all SPX stocks are required. Additionally, a new observer is integrated to calculate the portfolio value at each time point. The initial cash allocation is fixed at $100,000,000, with a margin of 10% and a commission rate of 0.1% applied. The execution of the backtesting is facilitated by the `Cerebro` engine.
```python
# define portfolio data feeds: only open and close prices are needed
class PandasData(bt.feeds.PandasData):
    lines = ("open", "close")
    params = (
        ("datetime", None),  # use the DataFrame index as datetime
        ("open", 0),         # column [0] is the open price
        ("close", 1),        # column [1] is the close price
        ("high", 0),         # unused: mapped to the open column as a placeholder
        ("low", 0),
        ("volume", 0),
        ("openinterest", 0),
    )

# new observer to record total portfolio value at each bar
class PortfolioValueObserver(bt.Observer):
    lines = ("value",)
    plotinfo = dict(plot=True, subplot=True)

    def next(self):
        self.lines.value[0] = self._owner.broker.getvalue()

# backtest given prices, weights, initial cash, commission fee
def RunBacktest(stock_list, combined_df, weights_df, ini_cash, comm_fee, notify, log):
    cerebro = bt.Cerebro()  # initiate cerebro engine
    # load one data feed per stock
    for col in stock_list:
        data = PandasData(dataname=combined_df[[col + "_open", col + "_close"]])
        cerebro.adddata(data, name=col)
    # strategy setting: scale weights to 90% exposure to respect the 10% margin
    weights_df = weights_df / weights_df.sum(axis=1).values.reshape(-1, 1) * 0.9
    cerebro.broker.setcash(ini_cash)  # set initial cash
    cerebro.broker.setcommission(commission=comm_fee)  # set commission
    cerebro.addstrategy(BLStrategy, weights=weights_df, stocks=stock_list, printnotify=notify, printlog=log)  # set strategy
    cerebro.addobserver(PortfolioValueObserver)  # add observer
    cerebro.addanalyzer(bt.analyzers.PyFolio, _name="pyfolio")  # add analyzer
    # run the strategy
    results = cerebro.run()
    return results
```
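A hypothetical invocation of the driver above; the keyword names of `LoadData` are assumptions, as the post only describes its role:

```python
# Hypothetical usage: LoadData returns the monthly stock lists, the combined
# open/close price frame, and the daily weights for a given configuration.
stock_list, combined_df, weights_df = LoadData(target_return=0.02, approach="bayesian")
results = RunBacktest(
    stock_list=stock_list,
    combined_df=combined_df,  # columns like "AAPL_open", "AAPL_close"
    weights_df=weights_df,    # one column of daily target weights per ticker
    ini_cash=100_000_000,
    comm_fee=0.001,           # 0.1% commission
    notify=False,
    log=False,
)
```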
Following the computation of daily returns, net of commissions, from the portfolio values produced by the `Backtrader` backtest, the subsequent analysis leverages `QuantStats` to comprehensively evaluate portfolio performance. Key metrics, including cumulative return, annualized return, annualized volatility, Sharpe ratio, Sortino ratio, maximum drawdown, Calmar ratio, value-at-risk, and expected shortfall, are assessed. `QuantStats` streamlines the process by automatically generating a report containing all the metrics and plots. Additionally, we dynamically compare the performance of our portfolio with the benchmark SPX500, while also visualizing the portfolio's dynamic drawdown plot.
The data utilized for backtesting is sourced from iFind, encompassing the close-price data for the SPX500 index, as well as the open[^3] and close[^4] prices for all individual stocks within the SPX500 index[^5]. The portfolio weights, as previously mentioned, are generated through our strategy[^6]. To ensure data integrity, missing values are replaced with `NaN`, and the index is converted to `datetime` format. Our backtesting approach involves testing our strategy with weights generated by targeting daily returns of 1.5%, 2%, and 2.5% using the Bayesian approach.
Given the target daily return and the approach for generating weights, we utilize the `LoadData` function to load the stock lists, price data, and weights data for each of the stocks. By inputting the acquired data along with our initialization parameters, such as initial cash and commission fee, into the `RunBacktest` function, we obtain the backtesting results. These results, along with the SPX500 index prices[^5], are then used to generate a comprehensive report using the `PortReport` function.
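The internals of `PortReport` are not shown here; a minimal sketch of the `QuantStats` step it wraps, assuming daily return series for the strategy and the benchmark (the function body is our assumption):

```python
import quantstats as qs

def PortReport(strategy_returns, benchmark_returns, filename="report.html"):
    # Full HTML tear sheet: cumulative return, Sharpe/Sortino, drawdown, VaR/ES, ...
    qs.reports.html(
        strategy_returns,
        benchmark=benchmark_returns,
        output=filename,
        title="Bayesian Smart Beta vs. SPX500",
    )
```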
The backtesting results for the three target daily returns indicated that a target daily return of 1.5% exhibited the poorest performance. Despite yielding a positive return, it underperformed the market (SPX500). Conversely, the other two target daily returns showcased superior performance compared to the market. This disparity in performance is visually evident in the comparison plots between the market and portfolio value below. However, it is notable that the drawdown appears to be more significant when the market is in a downturn.
In addition to the drawdown and portfolio value, we also present the key performance metrics for the three target daily returns in the tables below:
Metric | Value | Metric | Value |
---|---|---|---|
Target Daily Return | 1.5% | Cumulative Return | 33.12% |
Annualized Return | 4.86% | Annualized Volatility | 26.53% |
Sharpe Ratio | 0.4 | Sortino Ratio | 0.54 |
Calmar Ratio | 0.1 | Maximum Drawdown | -47.24% |
Value-at-Risk (VaR) | -2.71% | Expected Shortfall (ES) | -2.71% |
Metric | Value | Metric | Value |
---|---|---|---|
Target Daily Return | 2% | Cumulative Return | 74.62% |
Annualized Return | 9.68% | Annualized Volatility | 31.13% |
Sharpe Ratio | 0.59 | Sortino Ratio | 0.83 |
Calmar Ratio | 0.21 | Maximum Drawdown | -46.16% |
Value-at-Risk (VaR) | -3.15% | Expected Shortfall (ES) | -3.15% |
Metric | Value | Metric | Value |
---|---|---|---|
Target Daily Return | 2.5% | Cumulative Return | 143.35% |
Annualized Return | 15.89% | Annualized Volatility | 36.69% |
Sharpe Ratio | 0.77 | Sortino Ratio | 1.1 |
Calmar Ratio | 0.3 | Maximum Drawdown | -53.62% |
Value-at-Risk (VaR) | -3.69% | Expected Shortfall (ES) | -3.69% |
We have also generated more comprehensive plots and tables for the backtesting results. For detailed reports, please refer to the following links[^7].
Footnotes

[^1]: For detailed information on `Backtrader`, please visit the Backtrader Documentation.

[^2]: The source code for `QuantStats` can be found in the QuantStats GitHub Repository.

[^6]: The weights for target daily return = 1.5% are available for download from here, for target daily return = 2% from here, and for target daily return = 2.5% from here.

[^7]: The detailed report for target daily return = 1.5% can be found here, for target daily return = 2% here, and for target daily return = 2.5% here.