Here we try to solve the problem of predicting the relevance of a marketing email to a consumer and, in the process, understand the building blocks of the Naytra model, which can be applied to relevance predictions for many kinds of marketing messages.
Problem Statement
The purpose is to predict the relevance of a marketing email received by a consumer. The prediction is based on 1) past email content within the person’s mailbox, 2) fitness data collected by their fitness tracking device, and 3) the consumer’s activity within their mailbox, such as opens, clicks, etc.
Step 1: Identifying Features
Feature selection
We first determine which attributes of an email are most significant in predicting its relevance. Emails consist of several features falling into three main categories:
- Social Features: These are based on the degree of interaction between the sender and the recipient, such as the percentage of a sender’s mail that’s read by the recipient or the click-through rate for a particular sender.
- Content Features: These relate to the content of the email, such as subject lines, keywords, and recent terms that are correlated with the recipient acting on the email.
- Activity Features: These relate to the actions the user took with an email, such as whether the user opened it, archived it, labeled it, or left it unread for a while.
An extension of the content features, and one that’s most important for our problem statement, are Product Features.
These are keywords describing the underlying product or service that the marketing message promotes. Take, for example, an email offering a discount on a fitness supplement. Has the consumer viewed this type of product recently or taken an action on it? Has the consumer been doing workouts that would call for the marketed supplements? Does the consumer generally open emails about fitness product offers and discounts? These are aggregate notions of a consumer’s activity in relation to the product featured in the message.
Feature engineering
Some features will be selected directly, while others will be engineered, which involves transforming raw data into features that can be used for prediction. For example, with activity features we take a feature engineering approach such as computing “event counts” or “activity counts”.
For example, the count of the number of times one clicked offers inside marketing emails for fitness products in the last 15 days could be an engineered feature.
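Such an event-count feature can be computed with a simple aggregation over an activity log. The sketch below uses a hypothetical event log and a `click_count` helper (both are illustrative names, not part of any stated Naytra API):

```python
from datetime import datetime, timedelta

# Hypothetical event log: (timestamp, event_type, product_category) tuples.
events = [
    (datetime(2024, 5, 1), "click", "fitness"),
    (datetime(2024, 5, 3), "open", "fitness"),
    (datetime(2024, 5, 10), "click", "fitness"),
    (datetime(2024, 5, 12), "click", "electronics"),
]

def click_count(events, category, days, now):
    """Count clicks on offers for a product category in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return sum(
        1 for ts, etype, cat in events
        if etype == "click" and cat == category and ts >= cutoff
    )

now = datetime(2024, 5, 15)
print(click_count(events, "fitness", 15, now))  # prints 2
```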
Feature vector
A feature vector is an n-dimensional vector of numerical representations of the selected and engineered features. As an example, the action of opening an email could have the numeric value 1, the recency of an email could have the numeric value 30, and the type of workout a person does could have the numeric value 5. A feature vector representing all of this could be: f = [1, 30, 5].
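As a minimal sketch, the numeric encodings described above (the variable names here are illustrative, not from any stated schema) can be assembled into a vector directly:

```python
# Hypothetical encodings of the three example features into numbers.
opened = 1         # email was opened (binary: 1 = yes, 0 = no)
recency_days = 30  # days since the email arrived
workout_type = 5   # categorical workout type mapped to an integer code

feature_vector = [opened, recency_days, workout_type]
print(feature_vector)  # prints [1, 30, 5]
```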
Step 2: The Relevance Metric
The relevance is ultimately based on how the user interacts with the email after its delivery. Our goal is to predict the probability that the user will perform a “conversion” action on the mail, such as a click within it. If we call that probability p, then we predict

p = Pr(a ∈ A | f, s)

where Pr is the posterior probability, a is the action performed on the mail, A is the set of actions denoting conversions (e.g. clicking a link within the email), f is the feature vector, and s indicates that the user has had the opportunity to see the email.
So we are essentially computing the posterior probability of the user performing a conversion action dependent on the features of the email and that the user has had an opportunity to see the email.
Step 3: The Model
Let’s use a simple linear logistic regression model here, the core of which is the logistic function that generates what’s popularly known as the S-curve. It takes any real-valued number and maps it to a value between 0 and 1 using the following formula:

1 / (1 + e^(-value))

Here “value” is the numerical value that we want to transform, and e is Euler’s number (the base of the natural logarithm).
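The logistic function is a one-liner in most languages. A minimal Python sketch:

```python
import math

def logistic(value):
    """Map any real value into (0, 1) via the S-curve."""
    return 1.0 / (1.0 + math.exp(-value))

print(logistic(0))  # prints 0.5: the midpoint of the S-curve
print(logistic(4))  # close to 1: large positive inputs saturate toward 1
```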
Now, with logistic regression, input values (in our case the email features we chose in our feature vector f) are combined linearly using weights (w) to predict the output (the relevance classification), which is a binary value. Our prediction equation becomes:

p = 1 / (1 + e^(-(w0 + w1·f1 + w2·f2 + … + wn·fn)))

where n represents the number of features and p the relevance prediction.
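This equation translates directly into code. The weight values below are made up for illustration; in practice they come from the learning step described next:

```python
import math

def predict_relevance(weights, features):
    """p = 1 / (1 + e^-(w0 + w1*f1 + ... + wn*fn)).

    weights[0] is the bias term w0; weights[1:] pair up with the features.
    """
    z = weights[0] + sum(w * f for w, f in zip(weights[1:], features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical learned weights: bias, then one weight per feature.
weights = [-1.0, 2.0, -0.05, 0.3]
features = [1, 30, 5]  # opened, recency in days, workout type code
print(round(predict_relevance(weights, features), 3))  # prints 0.731
```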
In addition to the model, since learning happens online while the user is working in their mailbox, actions arrive one at a time, and we need a method that is robust to noise, such as a user accidentally opening an email. Online learning methods such as the Passive-Aggressive II (PA-II) variant can be used for this purpose.
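To make the PA-II idea concrete, here is a minimal sketch of a single PA-II update for binary classification (labels in {-1, +1}). This is the standard Crammer et al. formulation, not code from the Naytra system; the aggressiveness parameter C caps the step size, which is what absorbs noisy signals like accidental opens:

```python
def pa2_update(w, x, y, C=1.0):
    """One PA-II update: correct mistakes aggressively, but cap the
    step via C so a single noisy label cannot move the weights far."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    loss = max(0.0, 1.0 - margin)  # hinge loss on this example
    if loss == 0.0:
        return w                   # passive: prediction was fine, no change
    norm_sq = sum(xi * xi for xi in x)
    tau = loss / (norm_sq + 1.0 / (2.0 * C))  # PA-II step size
    return [wi + tau * y * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0, 0.0]
w = pa2_update(w, [1.0, 0.5, 0.0], +1)  # user converted on this email
```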
The Learning
We learn the weights using a popular method called the gradient descent algorithm. This uses the gradient of the log-likelihood (the log of the probability of the observed outcomes) with respect to each weight.
We do this in an iterative process where, in each iteration, we calculate new weights by adding a fraction of the computed gradient. Iteratively updating the weights increases the likelihood of the observed outcomes.
The process stops when the weights converge, i.e. when further updates no longer meaningfully increase the likelihood and the model’s predictions are as close as possible to the observed outcomes.
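The learning loop above can be sketched in a few lines. This toy trainer uses a fixed number of passes rather than a convergence test, and the sample data is invented purely for illustration:

```python
import math

def train_logistic(samples, lr=0.1, epochs=200):
    """Learn weights by gradient steps on the log-likelihood.

    samples: list of (features, label) pairs with label in {0, 1}.
    Returns [bias, w1, ..., wn].
    """
    n = len(samples[0][0])
    w = [0.0] * (n + 1)
    for _ in range(epochs):
        for x, y in samples:
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = y - p  # gradient of the log-likelihood w.r.t. z
            w[0] += lr * err
            for i, xi in enumerate(x):
                w[i + 1] += lr * err * xi
    return w

# Toy data: [opened, recency] -> converted. Users who opened
# recently arrived mail tended to convert in this made-up sample.
samples = [([1, 0.1], 1), ([1, 0.2], 1), ([0, 0.9], 0), ([0, 0.8], 0)]
w = train_logistic(samples)
```

After training, the sign of each weight reflects the correlation the data showed: opening mail pushes relevance up, staleness pushes it down.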
The Prediction
Finally, making predictions with this model is as simple as plugging numbers (features and learned weights) into the prediction equation and calculating the result.
This general sequence of steps remains the same for most predictions in the Naytra model. What differs is the learning mechanism and the parameters to learn, based on the type of message.
With this background, I will talk about the methods and considerations involved with Naytra to make relevance predictions.
Relevance prediction in the Naytra model
Seamless data collection
Naytra: The future of personalized marketing