This is the companion notebook for the session on extensions of machine learning tools for factor investing.
The extensions go in four very different directions:
\(\rightarrow\) ABOUT THE NOTEBOOK:
We start by loading the data and the required packages. For SVMs, e1071 is a good choice as it is built on the well-known libsvm library.
if(!require(e1071)){install.packages("e1071")} # ML package used for SVM modelling
library(tidyverse)
library(e1071) # Package for SVMs
load("data.RData") # Load data
data <- data %>%
select(-ESG_rank) %>% # We will include ESG later on
group_by(Tick) %>% # Returns are computed asset-by-asset
mutate(F_Return = lead(Close) / Close - 1) %>% # Adding forward/future returns
ungroup() %>% # Ungroup
na.omit() # Remove missing data
Below, we also prepare the raw inputs. We start with the normalisation of features and with the train/test split. For simplicity, we work with a suboptimal dependent variable, namely simple future returns.
normalise <- function(v){ # This is a function that 'uniformalises' a vector
v <- v %>% as.matrix()
return(ecdf(v)(v))
}
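A quick usage example (the output is deterministic):
normalise(c(10, 2, 7)) # 1.0000000 0.3333333 0.6666667: each value is mapped to its empirical CDF rank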
data_f <- data %>% # Data with normalised values
group_by(Date) %>% # Normalisation takes place for each date
mutate_at(vars(Vol_1M:Prof_Marg), normalise) %>% # Apply normalisation on features
ungroup() # Ungroup
sep_date <- as.Date("2010-01-01") # Separates in-sample vs out-of-sample
t_oos <- data$Date[data$Date > sep_date] %>% # Out-of-sample dates (i.e., testing set)
unique() %>% # Remove duplicates
as.Date(origin = "1970-01-01") # Put in date format
features <- c("Mkt_Cap", "P2B", "Vol_1M", "D2E", "Prof_Marg") # Predictors
train_sample <- data_f %>% filter(Date <= sep_date) # Train sample
train_features <- select(train_sample, all_of(features)) # Train features
test_sample <- data_f %>% filter(Date > sep_date) # Test sample
test_features <- select(test_sample, all_of(features)) # Test features
We directly move to the implementation of an SVM-based model. We recall that there are two important options (among others): the kernel that is applied upfront to the inputs and the regularisation intensity, C (for cost), which is a scaling factor for the sum of slack errors in the objective function.
fit <- svm(y = train_sample$F_Return, # Train label
x = train_features, # Train features
type = "nu-regression", # SVM task type
kernel = "radial", # SVM kernel
gamma = 0.5, # Constant in the radial kernel
cost = 0.1) # Slack variable penalisation
pred <- predict(fit, test_features) # Store predictions
mean(pred * test_sample$F_Return > 0) # Compute hit ratio
## [1] 0.5401003
mean((pred-test_sample$F_Return)^2) # MSE
## [1] 0.004526179
The hit ratio is only slightly above 0.50, which means that the signal is not of outstanding quality.
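To gauge whether this figure is meaningfully above 0.5, one quick check (illustrative; not part of the original session) is an exact binomial test on the number of correct sign calls:
n_hits <- sum(pred * test_sample$F_Return > 0) # Number of correct sign predictions
binom.test(n_hits, length(pred), p = 0.5) # Test against a 50% hit rate (base stats function)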
Below, we switch to a classification task: the label becomes the sign of the future return.
fit <- svm(y = as.factor(train_sample$F_Return > 0), # Train label
x = train_features, # Train features
type = "nu-classification", # SVM task type
kernel = "radial", # SVM kernel
gamma = 0.5, # Constant in the radial kernel
cost = 0.1) # Slack variable penalisation
pred <- predict(fit, test_features) # Store predictions
mean(as.logical(pred) * test_sample$F_Return > 0) # Compute hit ratio
## [1] 0.3135338
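Note that the ratio above only counts positive calls that turn out to be correct (a FALSE prediction contributes zero to the product, so correct negative calls are ignored). A sketch of a symmetric hit ratio, which also rewards correct negative calls, would be:
mean(as.logical(pred) == (test_sample$F_Return > 0)) # Proportion of correct sign calls, both ways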
In this section, we implement one optimal combination of ‘independent’ learners. We start by defining the prediction error function in which we stack all models (random forest, boosted trees and SVM). We also add a weight feature in the function: it will compute the errors when the output is a (weighted) linear combination of the 3 algorithms.
library(randomForest)
library(xgboost)
pred_multi <- function(train_data, test_data, weights){
set.seed(42) # Sets the pseudo-random generator seed to fix the results (for RF)
features <- c("Mkt_Cap", "P2B", "Vol_1M", "D2E", "Prof_Marg") # Hard-coded
train_features <- train_data %>%
select(all_of(features)) %>% as.matrix() # Indep. variable
train_label <- train_data$F_Return # Dependent variable
test <- test_data %>% # Test sample
select(all_of(features)) %>%
as.matrix()
# 1. Random Forest
fit <- randomForest(y = train_label, # Label
x = train_features, # Features
ntree = 10, # Number of random trees
mtry = 4 # Nb of predictive variables sampled at each split
)
err_RF <- predict(fit, test) - test_data$F_Return # Return errors
# 2. Boosted Trees
train_matrix <- xgb.DMatrix(data = train_features, label = train_label) # XGB format
fit <- train_matrix %>%
xgb.train(data = ., # Data source (pipe input)
eta = 0.5, # Learning rate
objective = "reg:squarederror", # Number of random trees
max_depth = 4, # Maximum depth of trees
nrounds = 10 # Number of trees used
)
xgb_test <- test_data %>% # Test sample => XGB format
select(all_of(features)) %>%
as.matrix() %>%
xgb.DMatrix()
err_XGB <- predict(fit, xgb_test) - test_data$F_Return # Return errors
# 3. SVM
fit <- svm(y = train_label, # Train label
x = train_features, # Train features
type = "nu-regression", # SVM task type
kernel = "radial", # SVM kernel
gamma = 0.5, # Constant in the radial kernel
cost = 0.1) # Slack variable penalisation
err_SVM <- predict(fit, test) - test_data$F_Return # Return errors
err_AGG <- weights[1]*err_RF + weights[2]*err_XGB + weights[3]*err_SVM # Aggregate error
return(data.frame(err_RF, err_XGB, err_SVM, err_AGG)) # Return all errors
}
We are now ready to compute the errors across all three methods (+ the linear combination). The error is computed on the training samples!
errors <- pred_multi(train_data = train_sample,
test_data = train_sample, # On training data first!
weights = rep(1,3)/3) # Equal weights at first
errors %>% colMeans() # Average error
## err_RF err_XGB err_SVM err_AGG
## 0.0003423143 0.0004667837 0.0007595887 0.0005228955
errors %>% apply(2, sd) # Standard deviation of errors
## err_RF err_XGB err_SVM err_AGG
## 0.05486171 0.08661017 0.09329150 0.07584463
We then move to the ‘optimal’ combination. We compute the covariance of errors and then the derived optimal weights.
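As a reminder of where these weights come from: minimising the variance \(\mathbf{w}'\mathbf{\Sigma}\mathbf{w}\) of the aggregate error under the constraint \(\mathbf{1}'\mathbf{w}=1\) yields \[\mathbf{w}^*=\frac{\mathbf{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}'\mathbf{\Sigma}^{-1}\mathbf{1}},\] which is why the code below takes the row sums of the inverse covariance matrix and then rescales them so that they sum to one.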
cor(errors[,1:3]) # Corr. matrix of errors of first 3 models (singular if 4)
## err_RF err_XGB err_SVM
## err_RF 1.0000000 0.8514612 0.8582702
## err_XGB 0.8514612 1.0000000 0.9703092
## err_SVM 0.8582702 0.9703092 1.0000000
Sigma <- cov(errors[,1:3]) # => stored in Sigma (for future use)
w <- rowSums(solve(Sigma)) # Row sums of the inverse covariance matrix of errors
w / sum(w) # Optimal weights
## err_RF err_XGB err_SVM
## 1.4304300 0.2378478 -0.6682778
The optimal weights are risky because they are long-short: the combination leans heavily on RF and shorts the SVM. This comes from the high correlation between the models’ errors.
Let’s see how an EW combination performs on the testing set.
errors_2 <- pred_multi(train_data = train_sample,
test_data = test_sample,
weights = rep(1,3)/3)
errors_2 %>% colMeans() # Average error
## err_RF err_XGB err_SVM err_AGG
## -0.0002450426 0.0024838701 -0.0049413085 -0.0009008270
errors_2 %>% apply(2, sd) # Standard deviation of errors
## err_RF err_XGB err_SVM err_AGG
## 0.08121525 0.08378682 0.06710358 0.07163709
Because the first method (the random forest) is stochastic, rerunning the code would normally generate new figures for the first and last columns. This does not happen here because we froze the seed of the random number generator. In terms of bias, it is hard to conclude. In terms of variance, the aggregate forecast usually has a very small advantage.
Let’s see how the optimised combination performs on the testing set.
errors_3 <- pred_multi(train_data = train_sample,
test_data = test_sample,
weights = w / sum(w)) # Change weights!
errors_3 %>% colMeans() # Average error
## err_RF err_XGB err_SVM err_AGG
## -0.0002450426 0.0024838701 -0.0049413085 0.0035424332
errors_3 %>% apply(2, sd) # Standard deviation of errors
## err_RF err_XGB err_SVM err_AGG
## 0.08121525 0.08378682 0.06710358 0.09581434
Because of the negative (hence risky) weights, the dispersion of errors of the aggregate prediction is far from competitive. This is a well-known fact: diversification only works when the underlying building blocks tell different stories, which is not the case here. Working with variations in feature values could help solve this issue.
It is in fact well known that unconstrained optimisations of learners are risky. In 1996, Breiman introduced the notion of stacked regressions. He makes the case that enforcing positivity of weights (leading to convex combinations) is the right way to go.
More generally, this has led to a family of sophisticated approaches called stacked ensembles. One implementation is provided in the ML library h2o: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/stacked-ensembles.html
Below, we illustrate this idea using the optimisation package quadprog. If we write \(\mathbf{\Sigma}\) for the covariance matrix of errors, we seek \[\mathbf{w}^*=\underset{\mathbf{w}}{\text{argmin}} \ \mathbf{w}'\mathbf{\Sigma}\mathbf{w}, \quad \mathbf{1}'\mathbf{w}=1, \quad w_i\ge 0.\] The constraints will be handled as:
\[\mathbf{A} \mathbf{w}= \begin{bmatrix} 1 & 1 & 1 \\ 1 & 0 & 0\\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{w} \hspace{9mm} \text{ compared to} \hspace{9mm} \mathbf{b}=\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \]
where the first line will be an equality and the last three will be inequalities.
if(!require(quadprog)){install.packages("quadprog")}
library(quadprog)
nb_mods <- nrow(Sigma) # Number of models in the ensemble
w_stacked <- solve.QP(Dmat = Sigma, # Quadratic part of the objective: covariance of errors
dvec = rep(0, nb_mods), # No linear part in the objective
Amat = rbind(rep(1, nb_mods), diag(nb_mods)) %>% t(), # Constraints (transposed, as solve.QP expects)
bvec = c(1,rep(0, nb_mods)), # Right-hand side: weights sum to one, then positivity
meq = 1 # Only the first constraint is an equality
)
w_stacked$solution
## [1] 1.000000e+00 0.000000e+00 1.496491e-16
The optimal combination consists of taking RF only! A more diversified combination would require more models or models that capture different patterns (i.e., that are uncorrelated or negatively correlated).
In attempts to white-box complex machine learning models, one dichotomy stands out: global interpretability (explaining the model as a whole) versus local interpretability (explaining a single prediction).
Below, we implement the LIME (Local Interpretable Model-agnostic Explanations from Ribeiro et al. (2016)) algorithm from the eponymous R package. We detail the approach below (we omit the subtlety of the interpretable representations in the original paper as it is not relevant in our context).
The original (complex) model is \(f\) and it is approximated at instance \(x\) by the interpretable model \(g\) that belongs to a large class \(G\). The vicinity of \(x\) is denoted \(\pi_x\) and the complexity of \(g\) is written \(\Omega(g)\). LIME seeks an interpretation of the form \[\xi(x)=\underset{g \in G}{\text{argmin}} \, \mathcal{L}(f,g,\pi_x)+\Omega(g),\] where \(\mathcal{L}(f,g,\pi_x)\) is the loss function (error/imprecision) induced by \(g\) in the vicinity \(\pi_x\) of \(x\).
The penalisation \(\Omega(g)\) is for instance the number of leaves or depth of a tree or the number of predictors in a linear regression.
The vicinity of \(x\) is defined by \(\pi_x(z)=e^{-D(x,z)^2/\sigma^2},\) where \(D\) is some distance measure. Note: this function decreases when \(z\) shifts away from \(x\).
The tricky part is the loss function. In order to minimise it, LIME generates artificial samples close to \(x\) and averages/sums the error on the label that the simple representation makes. The formulation is the following: \[\mathcal{L}(f,g,\pi_x)=\sum_z \pi_x(z)(f(z)-g(z))^2,\] where the errors are weighted by the proximity \(\pi_x(z)\) of the samples \(z\) to the initial instance \(x\).
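To fix ideas, here is a minimal sketch of such a proximity kernel. The prox() function below is purely illustrative (it is not part of the lime package) and uses a Euclidean distance, which is only one possible choice for \(D\):
prox <- function(x, z, sigma = 1){ # Illustrative proximity kernel
D <- sqrt(sum((x - z)^2)) # Euclidean distance between instance x and simulated point z
exp(-D^2 / sigma^2) # The weight decays as z moves away from x
}
prox(c(0.5, 0.5), c(0.6, 0.4)) # Example: weight given to a nearby simulated point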
The question here is: how important is ESG in the determination of future returns?
library(rpart)
library(rpart.plot)
load("data_esg.RData")
data_esg <- data_esg %>%
group_by(Tick) %>%
mutate(F_Return = lead(Close)/Close - 1) %>% # Return for each stock
ungroup() %>%
group_by(Date) %>% # Normalisation takes place for each date
mutate_at(vars(Vol_1M:ESG_rank), normalise) %>% # Apply normalisation on features
ungroup()
fit_esg <- rpart(F_Return ~ Vol_1M + Mkt_Cap + P2B + D2E + ESG_rank,
data = data_esg,
cp = 0.001, # Complexity parameter: minimum improvement required to split
maxdepth = 5) # Maximum depth of the tree
rpart.plot(fit_esg)
fit_esg$variable.importance %>%
data.frame(importance = .) %>%
rownames_to_column(var = "variable")
## variable importance
## 1 D2E 1.1431022
## 2 P2B 0.8451331
## 3 Mkt_Cap 0.6669449
## 4 ESG_rank 0.4196548
## 5 Vol_1M 0.3295422
fit_esg$variable.importance %>%
data.frame(importance = .) %>%
rownames_to_column(var = "variable") %>%
ggplot(aes(x = importance, y = reorder(variable, importance))) +
geom_col() + theme_light() + ylab("Variable")
Thus, according to this data & decision tree, ESG is not the most prominent driver of future returns.
One all-purpose package for machine learning in R is caret (see, e.g., http://topepo.github.io/caret/index.html; it is in some sense the ancestor of tidymodels). We have avoided working with it up to now (we chose to work directly with individual, more narrowly-scoped packages), but LIME works well with caret, so it is time to test it! Also note that caret in fact resorts to pre-existing packages, like randomForest for instance. Finally, caret includes a built-in tool for grid optimisation over hyperparameters and another one that computes variable importance: caret does pretty much everything!
if(!require(lime)){install.packages(c("lime", "caret"))}
library(lime)
library(caret)
grid_default <- expand.grid( # Must include ALL XGB PARAMETERS! If not => error.
nrounds = 10,
max_depth = 5,
eta = 0.8,
gamma = 0.1,
colsample_bytree = 1,
min_child_weight = 1,
subsample = 1
)
fit <- train(y = train_sample$F_Return, # Train label
x = as.matrix(train_features), # Train features
method = "xgbTree", # For boosted trees
tuneGrid = grid_default,
objective = "reg:squarederror"
)
explainer <- lime(train_features, fit) # First step in LIME
The list of all models in caret can be found here (there are many, roughly 240): https://topepo.github.io/caret/available-models.html
The LIME package proceeds in two steps. First, the lime() function (used above) embeds the model and the distribution of features. This allows the second step (below) to generate random points around the chosen instance. This second step is performed by the explain() function. This function has many options which we do not review here. We refer to https://uc-r.github.io/lime for a detailed overview.
explanation <- explain(
x = train_features[1:2,], # We look at the first two instances in train_sample
explainer = explainer, # Input: the explainer variable created above
n_permutations = 1000, # Nb samples used for the loss function
dist_fun = "gower", # Distance function. "euclidean" is one alternative
n_features = 4 # Number of features shown in the explanation
)
plot_features(explanation) # Plot!
In the arguments of explain(), we asked for explanations for 2 instances (the first two in train_features). Moreover, we asked for exactly 4 features. The graph above shows these features for the two instances. A blue/red colour highlights a positive/negative effect of the feature, and the size of the rectangle represents the relative importance of the variables (they are ordered from most important at the top to least important at the bottom).
There is another way to quantify the relative importance of features in the determination of a local prediction. It stems from cooperative game theory. The idea is to investigate combinations of players and the payoffs that are generated by these combinations. The ‘value’ of a player can then be assessed by looking at what happens to payoffs when the player is removed from the combinations. In an ML framework, the players are the features and the payoffs are model values.
We give the formula below and explain it shortly after.
\[\phi_k=\sum_{S \subseteq \{x_1,\dots,x_K \} \backslash x_k}\underbrace{\frac{|S|!(K-|S|-1)!}{K!}}_{\text{weight of coalition}}\underbrace{\left(f_{S \cup \{x_k\}}(S \cup \{x_k\})-f_S(S)\right)}_{\text{gain when adding } x_k}\]
\(S\) is any subset of the coalition that doesn’t include feature \(k\); its size (cardinality) is \(|S|\).
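For instance, with \(K=3\) features, the subsets \(S\) excluding \(x_k\) have sizes 0, 1, 1 and 2, with respective weights \[\frac{0!\,2!}{3!}=\frac{1}{3}, \qquad \frac{1!\,1!}{3!}=\frac{1}{6}, \qquad \frac{1!\,1!}{3!}=\frac{1}{6}, \qquad \frac{2!\,0!}{3!}=\frac{1}{3},\] which sum to one.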
In the equation above, the model \(f\) must be altered because it is impossible to evaluate \(f\) when features are missing. Several options are possible in this case; a common one is to average the model output over sampled values of the missing features.
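A rough sketch of this averaging option is given below. The f_S() helper is hypothetical (it is not how the iml package is written) and assumes that x is a one-row data frame and coalition a vector of feature names: the features outside the coalition are resampled from the data, the coalition features are frozen at the values of the instance of interest, and the predictions are averaged.
f_S <- function(model, x, data, coalition, n_sim = 100){ # Hypothetical helper, illustration only
sims <- data[sample(nrow(data), n_sim, replace = TRUE), ] # Resampled rows supply the 'missing' features
sims[, coalition] <- x[rep(1, n_sim), coalition] # Coalition features frozen at the instance's values
mean(predict(model, sims)) # Average prediction approximates the value of the coalition
}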
Below, we use the iml package (for Interpretable Machine Learning). The package has several tools (e.g., variable importance), but we only resort to Shapley values.
if(!require(iml)){install.packages("iml")}
library(iml) # Interpretable ML package
predictor <- Predictor$new(fit, # This wraps the model & data
data = train_features,
y = train_sample$F_Return)
shapley <- Shapley$new(predictor, x.interest = train_features[1,]) # Compute the Shapley values
shapley$plot() + theme_light() # Plot the values!
Compared to the LIME interpretation, there are some discrepancies…
Green finance has soared recently and we refer to www.esgperspectives.com for a review on the topic. One central question pertains to the link between sustainability (green & social firms) and financial performance. Below, we use the data_esg dataset to investigate this question.
We resort to a panel model:
\[r_{t+1,n}= a_n+ \textbf{x}_{t,n}\textbf{b}+e_{t+1,n}\]
to explain returns. In the predictor variables \(\textbf{x}_{t,n}\), we include an ESG component.
if(!require(plm)){install.packages("plm")} # This is a package for panel models
library(plm)
plm(F_Return ~ Vol_1M + Mkt_Cap + P2B + D2E + ESG_rank,
data = data_esg,
model = "within",
index = "Tick") %>%
summary()
## Oneway (individual) effect Within Model
##
## Call:
## plm(formula = F_Return ~ Vol_1M + Mkt_Cap + P2B + D2E + ESG_rank,
## data = data_esg, model = "within", index = "Tick")
##
## Balanced Panel: n = 384, T = 84, N = 32256
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -0.86378356 -0.04239699 0.00081299 0.04243840 2.07170700
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## Vol_1M 0.0021091 0.0022124 0.9533 0.3404
## Mkt_Cap -0.0838442 0.0061635 -13.6033 < 2.2e-16 ***
## P2B -0.0334246 0.0053720 -6.2220 4.970e-10 ***
## D2E 0.0237141 0.0043055 5.5078 3.661e-08 ***
## ESG_rank -0.0040270 0.0038532 -1.0451 0.2960
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 227.46
## Residual Sum of Squares: 224.44
## R-Squared: 0.01328
## Adj. R-Squared: 0.0012665
## F-statistic: 85.7808 on 5 and 31867 DF, p-value: < 2.22e-16
Sadly, we see that the ESG variable does not help to explain future returns: its coefficient is not statistically significant.
If we resort to a simple sorting procedure…
data_esg %>%
group_by(Date) %>%
mutate(type = if_else(ESG_rank > 0.5, "green", "brown")) %>%
group_by(type) %>%
summarise(m = mean(F_Return, na.rm = T))
## # A tibble: 2 × 2
## type m
## <chr> <dbl>
## 1 brown 0.0107
## 2 green 0.0113
The green assets have average returns barely above those of the brown ones.
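A quick (illustrative) way to check whether this small difference is statistically meaningful, which was not part of the original session, is a two-sample t-test:
test_data <- data_esg %>%
mutate(type = if_else(ESG_rank > 0.5, "green", "brown")) # Same green/brown split as above
t.test(F_Return ~ type, data = test_data) # Welch two-sample t-test on mean future returns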
This section is based on Bailey and Lopez de Prado (2014). The problem we address is that of backtest overfitting. When a manager or analyst tests a strategy on past data, a major risk is that a false positive emerges: the strategy will seem profitable in the test, but will lose money in live trading. In practice the analyst will test many strategies and only select a few: those that performed best in-sample. In order to correct for in-sample results that seem too optimistic, it is customary to shrink Sharpe ratios by some factor, usually close to 0.5.
Bailey and Lopez de Prado (2014) propose an approach to evaluate the statistical significance of Sharpe ratios when \(M\) strategies have been tested. Usually, the one with maximum SR is favoured. But is it a statistically sound choice?
The following test procedure will assess its relevance: \[t = \phi\left(\frac{(SR-SR^*)\sqrt{T-1}}{\sqrt{1-\gamma_3SR+\frac{\gamma_4-1}{4}SR^2}} \right),\]
where \(\gamma_3\) and \(\gamma_4\) are the skewness and kurtosis of the returns of the strategy and \(SR^*\) is the theoretical average value of the maximum SR over \(M\) trials:
\[SR^*=\mathbb{E}[SR_m]+\sqrt{\mathbb{V}[SR_m]}\left((1-\gamma)\phi^{-1}\left(1-\frac{1}{M}\right)+\gamma \phi^{-1}\left(1-\frac{1}{M e}\right) \right),\]
where \(\phi\) is the cdf of the standard Gaussian law and \(\gamma\) is the Euler-Mascheroni constant. The index \(m\) runs over the strategy trials and \(M\) is the total number of tested strategies. If the \(t\) defined above is below a certain threshold (e.g., 0.95), then the \(SR\) cannot be deemed significant.
We now illustrate this result.
Our (very bad) research strategy is to test factor strategies based on thresholds. We pick one feature (e.g., Mkt_Cap) and test a policy that invests uniformly (EW portfolios) only in the stocks that are the most tilted towards one direction, i.e., above or below a particular threshold. In the Mkt_Cap case, this means investing either in small firms or in large firms. We then repeat the operation over all features. This generates a large array of strategies. Each strategy has three components: the feature, the threshold and the direction (above or below the threshold).
We create the corresponding function below.
strat <- function(data, feature, thresh, direction){
data_tmp <- select(data, all_of(feature), Date, F_Return) # Data
colnames(data_tmp)[1] <- "feature" # Colname
data_tmp %>%
mutate(decision = direction * feature > direction * thresh) %>% # Investment decision
group_by(Date) %>% # Date-by-date analysis
mutate(nb = sum(decision), # Nb assets in portfolio
w = decision / nb, # Weights of assets
return = w * F_Return) %>% # Asset contribution
summarise(p_return = sum(return)) %>% # Portfolio return
summarise(avg = mean(p_return), sd = sd(p_return), SR = avg/sd) %>% # Perf. metrics
return()
}
We then test the function on two possible strategies (i.e., two choices of triplet).
strat(data_f, "Mkt_Cap", 0.5, 1) # Large cap
## # A tibble: 1 × 3
## avg sd SR
## <dbl> <dbl> <dbl>
## 1 0.00473 0.0423 0.112
strat(data_f, "Mkt_Cap", 0.5, -1) # Small cap
## # A tibble: 1 × 3
## avg sd SR
## <dbl> <dbl> <dbl>
## 1 0.0128 0.0492 0.261
The first line tests a strategy that invests when Mkt_Cap is larger than 0.5 (note: we work with data_f, hence the feature values all lie in [0,1]). The second does the opposite (i.e., invests in small firms) and is much more profitable.
We then test a large panel of configurations via a grid approach: this is brute-force data mining, which is rarely the best idea.
feature <- c("Mkt_Cap", "P2B", "Vol_1M", "D2E", "Prof_Marg") # Features
thresh <- seq(0.2,0.8, by = 0.1) # Threshold values
direction <- c(1,-1) # Decision direction
pars <- expand.grid(feature, thresh, direction) # The grid
feature <- pars[,1] %>% as.character() # Features, one per grid configuration
thresh <- pars[,2] # Thresholds, one per grid configuration
direction <- pars[,3] # Directions, one per grid configuration
Finally, we run the loop inside the pmap() wrapper.
grd <- pmap(list(feature, thresh, direction), # Parameters for the grid search
strat, # Function on which to apply the grid search
data = data_f # Data source/input
) %>%
unlist() %>%
matrix(ncol = 3, byrow = T)
grd <- data.frame(feature, thresh, direction, grd) # Gather & reformat results
colnames(grd)[4:6] <- c("mean", "sd", "SR") # Change colnames
grd <- grd %>% mutate_at(vars(direction), as.factor) # Change type: factor (for plot)
grd %>% ggplot(aes(x = thresh, y = SR, color = feature)) + # Plot!
geom_point() + geom_line() + facet_grid(direction~.) + theme_light()
The biggest effect of direction is observed for Mkt_Cap. As usual, this is the most decisive variable in our sample.
Note: these are SR at the monthly frequency. There is a time-scale effect with the SR: the numerator is proportional to time while the denominator is proportional to its square root. Hence a simple proxy for annualised SR is the monthly SR times \(\sqrt{12}\).
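For instance (illustrative only, relying on the grd object computed above):
sqrt(12) * max(grd$SR) # Rough annualised proxy of the best monthly SR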
The best strategy in the above set is to invest in firms that are below the 0.3 quantile of market capitalisation. This would be our choice. Is it relevant? Let’s find out using the deflated SR test. First, we code the function.
DSR <- function(SR, Tt, M, g3, g4, SR_m, SR_v){ # First, we build the function
gamma <- -digamma(1) # Euler-Mascheroni constant
SR_star <- SR_m + sqrt(SR_v)*((1-gamma)*qnorm(1-1/M) + gamma*qnorm(1-1/M/exp(1))) # Avg. Max SR
num <- (SR-SR_star) * sqrt(Tt-1) # Numerator
den <- sqrt(1 - g3*SR + (g4-1)/4*SR^2) # Denominator
return(pnorm(num/den))
}
Next, we compute its arguments… and pass them to DSR().
# Next, the parameters:
M <- nrow(pars) # Number of strategies we tested
SR <- max(grd$SR) # The SR we want to test
SR_m <- mean(grd$SR) # Average SR across all strategies
SR_v <- var(grd$SR) # Variance of SR across all strategies
# Below, we compute the returns of the strategy by recycling the code of the strat() function
data_tmp <- select(data_f, "Mkt_Cap", Date, F_Return) # Mkt_Cap is the retained feature
colnames(data_tmp)[1] <- "feature"
returns <- data_tmp %>%
mutate(decision = feature < 0.3) %>% # Investment decision: 0.3 is the best threshold
group_by(Date) %>% # Date-by-date computations
mutate(nb = sum(decision), # Nb assets in portfolio
w = decision / nb, # Portfolio weights
return = w * F_Return) %>% # Asset contribution to return
summarise(p_return = sum(return)) # Portfolio return
g3 <- skewness(returns$p_return) # Function from the e1071 package
g4 <- kurtosis(returns$p_return) + 3 # Function from the e1071 package
Tt <- nrow(returns) # Number of dates
DSR(SR, Tt, M, g3, g4, SR_m, SR_v) # The sought value!
## [1] 0.45605
This value is very far from the 0.95 confidence threshold. The SR of our best strategy is not statistically significant: its location within the distribution of all SRs is not far enough to the right.
We have seen that the correlations between errors are too high in our base-case ensemble model (with 3 learners). Try to build a similar aggregation model, but with the predictors being the variations of the raw predictors (\(\Delta X_t=X_t-X_{t-1}\)). Use the mutate() or mutate_at() and lag() functions to create the variations.
Create a portfolio strategy based on SVM forecasts. Choose your variables wisely.
Create an ensemble with 5 or 6 learners (you can add LASSO and ridge regressions to the ones used above) and see if the combination involves more than one model.