In this first installment, I'm going to focus on:
- Building/evaluating a predictive model with partitioned data
- Saving the predictive model to disk
- Loading the predictive model from disk
- Scoring data against a predictive model (within R)
The example used throughout this three-part series centers on an eCommerce site and the spend associated with its promotions. The input to the model is the promotional mix, expressed as each promotion type's percentage of total promotional spend. The outputs are AOV (average order value), gross margin %, and conversion rate. The goal is to find the mix of promotional spend that maximizes AOV, gross margin %, and conversion rate.
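To make "percent of total promotional spend" concrete, here is a minimal sketch that turns per-channel spend into a mix. The channel names and dollar amounts are hypothetical illustrations only; the post's CSV isn't shown, and its model columns are named `*_UNITS`.

```r
# Hypothetical per-channel promotional spend for three periods (illustrative numbers only)
spend <- data.frame(
  AFFILIATE = c(100, 200, 50),
  EMAIL     = c(300, 100, 50),
  DISPLAY   = c(100, 200, 400)
)

# Divide each cell by its row total: each channel's share of that period's promo spend, in %
promo_mix <- 100 * spend / rowSums(spend)

print(promo_mix)
# Every row of the mix sums to 100
print(rowSums(promo_mix))
```

Because the shares in each row always sum to 100, the mix carries information about how spend is allocated, not how much is spent in total.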
Let's look at the code:
# load the data from a CSV
SRC_PATH <- '/analytics/margin_model/'
data <- read.csv(file=paste0(SRC_PATH, 'margin_modeling.csv'), header=TRUE)
# split the data 80% train/20% test (the seed value is arbitrary; it just makes the split reproducible)
set.seed(42)
sample_idx <- sample(nrow(data), floor(nrow(data) * 0.8))
data_train <- data[sample_idx, ]
data_test <- data[-sample_idx, ]
# create a linear model using the training partition
gm_pct_model <- lm(GROSS_MARGIN_RATE ~ PROMO_AFFILIATE_UNITS + PROMO_COMP_SHOP_ENGINES_UNITS +
                     PROMO_DISPLAY_ADS_UNITS + PROMO_EMAIL_UNITS + PROMO_LOCAL_SEM_UNITS +
                     PROMO_SEARCH_ENG_MKT_UNITS + PROMO_TELESALES_UNITS + PROMO_UNPAID_UNITS,
                   data = data_train)
# save the model to disk
save(gm_pct_model, file=paste0(SRC_PATH, 'gm_pct_model.model'))
# load the model back from disk (the object is restored under its original name, gm_pct_model)
load(paste0(SRC_PATH, 'gm_pct_model.model'))
# score the test data and plot predicted vs. observed
# (observed values come from the model's response column, GROSS_MARGIN_RATE)
plot(data.frame('Predicted'=predict(gm_pct_model, data_test), 'Observed'=data_test$GROSS_MARGIN_RATE))
# score the test data and append the predictions as a new column (for later use)
new_data <- cbind(data_test, 'PREDICTED_GROSS_MARGIN_PCT'=predict(gm_pct_model, data_test))
# score an individual row
predicted_gm_rate <- predict(gm_pct_model, data_test[1, ])
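The predicted-vs-observed plot gives a visual check; putting numbers on the holdout fit makes models easier to compare. The sketch below is self-contained, so it generates a small synthetic stand-in for `margin_modeling.csv` (two of the post's promo columns plus `GROSS_MARGIN_RATE`, with made-up coefficients and noise). Only the split-fit-score pattern mirrors the post; the data and the RMSE/R² metrics are assumptions added for illustration.

```r
# Synthetic stand-in for margin_modeling.csv (hypothetical coefficients and noise)
set.seed(1)
n <- 200
data <- data.frame(
  PROMO_EMAIL_UNITS       = runif(n, 0, 100),
  PROMO_DISPLAY_ADS_UNITS = runif(n, 0, 100)
)
data$GROSS_MARGIN_RATE <- 0.30 + 0.001 * data$PROMO_EMAIL_UNITS -
  0.0005 * data$PROMO_DISPLAY_ADS_UNITS + rnorm(n, sd = 0.01)

# Same 80/20 train/test split as in the post
sample_idx <- sample(nrow(data), floor(nrow(data) * 0.8))
data_train <- data[sample_idx, ]
data_test  <- data[-sample_idx, ]

gm_pct_model <- lm(GROSS_MARGIN_RATE ~ PROMO_EMAIL_UNITS + PROMO_DISPLAY_ADS_UNITS,
                   data = data_train)

# Score the held-out rows and compare them with the observed response
predicted <- predict(gm_pct_model, data_test)
observed  <- data_test$GROSS_MARGIN_RATE

# Root mean squared error on the test partition
rmse <- sqrt(mean((observed - predicted)^2))
# Out-of-sample R^2: 1 - SSE / total sum of squares around the test-set mean
r2 <- 1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)

cat(sprintf("Test RMSE: %.4f  Test R^2: %.3f\n", rmse, r2))
```

Computing the metrics on `data_test` rather than reading them from `summary(gm_pct_model)` matters: the summary's R² describes the training partition, while these numbers describe rows the model never saw.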
It's amazing how little code it takes to automate the modeling and scoring process. Next, I'll show you how to run a non-linear optimization over these predictive models to determine the optimal promotional mix.
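For automated scoring jobs, one alternative to `save()`/`load()` (which restores the object under its original name) is base R's `saveRDS()`/`readRDS()` pair, which lets the caller choose the name on reload. The sketch below uses the built-in `mtcars` data in place of the promo data, a temporary file in place of `SRC_PATH`, and a hypothetical helper, `score_with_saved_model()`, that is not part of the original post.

```r
# Fit a small model on a built-in dataset (stand-in for gm_pct_model)
model <- lm(mpg ~ wt + hp, data = mtcars)

# saveRDS() stores a single object; on reload, the caller assigns it to any name
model_path <- tempfile(fileext = ".rds")
saveRDS(model, model_path)

# Hypothetical helper: load a saved model and append its predictions as a new column
score_with_saved_model <- function(path, newdata, col_name = "PREDICTED") {
  fitted_model <- readRDS(path)
  newdata[[col_name]] <- predict(fitted_model, newdata)
  newdata
}

scored <- score_with_saved_model(model_path, mtcars[1:5, ], col_name = "PREDICTED_MPG")
print(scored[, c("mpg", "PREDICTED_MPG")])
```

Passing the path and the output column name as arguments keeps the helper independent of any one model, so the same function could score gross margin, AOV, or conversion-rate models saved the same way.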
Hi Scott, nice tutorial.
Might I suggest using the caret package for partitioning? http://cran.r-project.org/web/packages/caret/index.html
It has some nice features, including balancing the target variable across partitions.
Nick
http://www.sonamine.com
Hi, could you provide the CSV used in this example so that I could try it out? I am new to R and analytics and would like to get the hang of it. Your tutorial looks like a decent place to start with predictive models.
Thanks in advance