The Naytra model grows out of concepts from recommendation systems, a rapidly evolving body of work within the field of Data Science.
The core idea behind the learning model is encapsulated well in the following belief:
“The Internet is a giant machine designed to give people what they want… We often think the Internet enables you to do new things… but people just want to do the same things they’ve always done.”
EVAN WILLIAMS, co-founder of Blogger, Twitter, Medium
The nétra learning model explores and exploits a consumer’s digital exhaust to meet one simple goal:
To score the relevance of marketing messages that have not been seen by the consumer.
In terms of a learning model, what does relevance mean?
A marketing message is wrapped around an item, which could be a product, service or even a piece of published content. So when we say “relevance of a marketing message”, in essence the model is predicting the relevance of the characteristics of the item packaged within that message. This not only includes finding relevance with textual content from the consumer’s digital exhaust but also media such as images and audio. Ultimately, based on certain reference characteristics, this model aims to predict an “opinion” a consumer will have of a marketing message. Based on this opinion we want to determine whether or not the message is relevant to the consumer.
Suppose that a consumer is looking for a pair of shoes and, let’s say, already owns multiple pairs. The relevant marketing messages might include shoes with similar characteristics, but may also include fashion collections that feature shoes with similar characteristics. So relevant messages could potentially have a wide range of characteristics while retaining important commonalities with what that consumer prefers.
Intuitively, this evaluation will generally be based on scores implicitly assigned to information that the consumer interacts with online. However, if relevance is predicted in a vacuum, using only a single consumer’s digital exhaust, we may run into some serious problems:
- Sparse or no digital exhaust for a particular item: Think about a person considering getting a mortgage for the first time. There may be few or no references within the consumer’s digital exhaust from which to predict the relevance of an ad from a bank presenting the consumer with an offer.
- Relevance Bubble: This is when the consumer is caught in a cycle where similar types of items keep being found relevant because the characteristics all come from the same person’s digital exhaust, thereby causing a “bubble” that’s hard to get out of.
- No Serendipity: This is an extension of the relevance bubble where there is little scope for elements of novelty, surprise or delight in items that the model’s evaluation deems relevant.
To avoid this, we will widen the scope of the digital exhaust to also include data from the social networks that the consumer is a part of. The consumer’s digital exhaust is thus augmented with social activity (posted pins, comments, reviews, etc.) from the consumer’s social connections, limited to data the consumer has access to. And because consumers tend to be heavily influenced by their social networks, this data is a good foundation to start with.
The Naytra model has two notions: filter and guide. The filter is responsible for selecting interesting or useful messages that are relevant to the consumer, while the guide is responsible for ordering them, determining when and where each relevant message is to be presented to the consumer.
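The split between filter and guide can be illustrated with a minimal sketch. The `ScoredMessage` type, the threshold of 0.5 and the sample scores are assumptions made for illustration, not part of the Naytra design, and the guide here only covers ordering, not the "when and where" of presentation.

```python
from dataclasses import dataclass


@dataclass
class ScoredMessage:
    message_id: str
    relevance: float  # hypothetical relevance score in [0, 1]


def filter_messages(messages, threshold=0.5):
    """Filter: keep only messages whose relevance meets the threshold."""
    return [m for m in messages if m.relevance >= threshold]


def guide_messages(messages):
    """Guide: order the kept messages, most relevant first."""
    return sorted(messages, key=lambda m: m.relevance, reverse=True)


candidates = [
    ScoredMessage("shoe-ad", 0.82),
    ScoredMessage("mortgage-offer", 0.31),
    ScoredMessage("fashion-collection", 0.67),
]
ordered = guide_messages(filter_messages(candidates))
print([m.message_id for m in ordered])  # ['shoe-ad', 'fashion-collection']
```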
Methods of scoring
As I talked about earlier, we want to learn a statistical function that models what the consumer will find relevant. The scoring of relevance could use one of these two main approaches:
- Score estimation: Here unknown scores are extrapolated from known scores via a model, as we saw earlier. The collection of known scores is used to learn a model, which is then used to predict scores, whether via a probabilistic approach, automatic learning techniques or statistical models.
- Data Exploitation: Here the already known scores are used directly to estimate the unknown scores, through one of the following methods:
  - Content based methods: Items similar (defined by a measure of feature similarity) to those which the consumer previously found relevant (defined by a measure of behavioral activity)
  - Collaborative filtering methods: Items deemed relevant by others within the consumer’s social network who have similar tastes and preferences (based on their social activity)
  - Hybrid methods: A combination of content based and collaborative filtering methods
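To make the three data exploitation methods concrete, here is a rough sketch. It assumes items are described by sparse feature dictionaries and that scores lie between 0 and 1; the function names, the cosine similarity measure and the `alpha` blending weight are illustrative choices, not the method Naytra prescribes.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def content_score(item_features, relevant_items):
    """Content based: mean similarity to items previously found relevant."""
    if not relevant_items:
        return 0.0
    total = sum(cosine(item_features, f) for f in relevant_items)
    return total / len(relevant_items)


def collaborative_score(item_id, consumer_scores, network_scores):
    """Collaborative: connections' scores, weighted by taste similarity."""
    num = den = 0.0
    for scores in network_scores.values():
        if item_id not in scores:
            continue
        weight = cosine(consumer_scores, scores)
        num += weight * scores[item_id]
        den += weight
    return num / den if den else 0.0


def hybrid_score(content, collaborative, alpha=0.5):
    """Hybrid: linear blend of the two scores."""
    return alpha * content + (1 - alpha) * collaborative


# Hypothetical data: the consumer has never scored "loafers".
consumer = {"boots": 0.9, "sandals": 0.2}
network = {
    "ana": {"boots": 0.8, "sandals": 0.1, "loafers": 0.9},
    "raj": {"boots": 0.1, "sandals": 0.9, "loafers": 0.2},
}
previously_relevant = [{"shoe": 1.0, "leather": 1.0}, {"shoe": 1.0, "canvas": 1.0}]
new_item = {"shoe": 1.0, "leather": 1.0}

c = content_score(new_item, previously_relevant)
cf = collaborative_score("loafers", consumer, network)
print(round(c, 3), round(cf, 3), round(hybrid_score(c, cf), 3))
```

Because Ana's scores resemble the consumer's more than Raj's do, her opinion of the loafers carries more weight in the collaborative score.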
Some of the algorithms that work well with the above approaches are:
Accounting for Serendipity
Serendipity in scoring the relevance of an item can be seen as the experience of receiving unexpected and fortuitous messages; therefore it’s a way to diversify relevance predictions. There is also a difference between novelty and serendipity. If we adopt the definition from Herlocker, novelty will occur when the model suggests to the consumer an unknown item that she might have autonomously discovered. However, a serendipitous recommendation helps the user to find a surprisingly interesting item she might not have otherwise discovered. There are a couple of strategies that could be adopted for the Naytra model to induce serendipity into relevance:
- Introduce a method to predict unexpectedness relative to some measure of expectedness. This could be a combination of popularity and the messages a consumer recently found relevant, in order to predict the relevance of a new message.
- Introduce a concept of randomness, in addition to including data from social networks in the dataset. For example, the use of genetic algorithms has been proposed in the context of information filtering.
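The first strategy can be sketched as follows. This is only one possible reading: expectedness is assumed to be an equal blend of an item's popularity and its closeness to recently relevant messages, items are assumed to be described by tag sets, and Jaccard overlap is an illustrative choice of similarity. None of these specifics come from the Naytra model itself.

```python
def jaccard(a, b):
    """Overlap between two tag sets, from 0 (disjoint) to 1 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def expectedness(item_tags, popularity, recent_tag_sets, weight=0.5):
    """Blend of popularity and closeness to recently relevant messages."""
    familiarity = max(
        (jaccard(item_tags, recent) for recent in recent_tag_sets),
        default=0.0,
    )
    return weight * popularity + (1 - weight) * familiarity


def serendipity_score(relevance, item_tags, popularity, recent_tag_sets):
    """Discount relevance by how expected the message is."""
    unexpectedness = 1.0 - expectedness(item_tags, popularity, recent_tag_sets)
    return relevance * unexpectedness


# Hypothetical example: the consumer recently engaged with running shoes.
recent = [{"shoes", "running"}]
familiar = serendipity_score(0.9, {"shoes", "running"}, 0.9, recent)
surprising = serendipity_score(0.6, {"hiking", "travel"}, 0.2, recent)
print(familiar < surprising)  # True
```

A popular, familiar message is pushed down even though its raw relevance is higher, which leaves room for the less expected one.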
The right model for the right prediction
The model I walked through to predict the relevance of an email message works well for predicting relevance based on the text in an email. But what about emails containing images? What about the relevance of ads embedded within YouTube videos? Also, different learners perform differently depending on the task. How do we unify all of these together under the consumer’s Naytra? How will Naytra determine the best learner for the prediction?
We can solve this challenge through machine learning itself! We have a metalearner for Naytra that learns about the learning models and is able to predict the appropriate model for the situation. The metalearner can itself be any learner, from a decision tree to a simple weighted method that I talked about earlier. In order to learn the weights, we replace the attributes of each original example with the learner’s predictions. So the learners that predict the correct relevance will get high weights and the inaccurate ones will tend to be ignored. This technique is called stacking and this is how Naytra could choose the best model for prediction.
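As a rough illustration of stacking, the sketch below trains a simple weighted metalearner (a logistic regression fitted by gradient descent) on the predictions of two hypothetical base learners. The data, learning rate and epoch count are made up for the example; in practice the base predictions fed to the metalearner would normally come from held-out examples rather than the data the base learners were trained on.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_meta_weights(base_predictions, labels, epochs=500, lr=0.5):
    """Learn one weight per base learner from the learners' predictions.

    base_predictions: rows holding one prediction per base learner.
    labels: 1 if the message turned out to be relevant, else 0.
    """
    weights = [0.0] * len(base_predictions[0])
    bias = 0.0
    for _ in range(epochs):
        for row, y in zip(base_predictions, labels):
            z = bias + sum(w * p for w, p in zip(weights, row))
            err = sigmoid(z) - y
            weights = [w - lr * err * p for w, p in zip(weights, row)]
            bias -= lr * err
    return weights, bias


def meta_predict(weights, bias, row):
    """Combine the base learners' predictions into one relevance score."""
    return sigmoid(bias + sum(w * p for w, p in zip(weights, row)))


# Hypothetical predictions: learner 0 tracks the labels, learner 1 does not.
rows = [
    [0.90, 0.5], [0.20, 0.5],
    [0.80, 0.6], [0.10, 0.6],
    [0.85, 0.3], [0.15, 0.3],
]
labels = [1, 0, 1, 0, 1, 0]

weights, bias = train_meta_weights(rows, labels)
print(weights[0] > abs(weights[1]))  # the accurate learner dominates
```

The attributes of each example have been replaced by the learners' predictions, so the learned weights reflect how much each learner can be trusted rather than anything about the message itself.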
Conclusion
The above-mentioned techniques are just a few of the techniques and considerations that need to be accounted for while designing the Naytra model. It’s evident that there is no single model that can deal with all these considerations and all possible message types. Therefore we looked at one type of ensemble learning that could be used to pick the best learning model.
In order to accomplish all of this, we need a lot of computational power and a technology stack that can support it, which is what we will look at next.