Seamless data collection

By Ravi Evani. Filed under Naytra.

Data collection from IoT device

Let’s envision the Naytra of you, the consumer. Assume you are in complete control of your Naytra: what data it collects and what it learns. To put your mind at ease, let’s assume everything it collects is encrypted and only you hold the key. Your Naytra runs in the cloud, and all collected data syncs to the cloud. If you think about it, much of your data already syncs to the cloud: your email, your files on Box, your photos, videos, blogs, your browsing history, etc. So this is nothing new. Your Naytra learns from you in two ways: implicitly, from your digital exhaust (much of which is shown in the picture above), and explicitly, when you train it in natural language, possibly by speaking to it on your phone.

In fact, your phone itself could be the single largest source of data for your Naytra. To say that your phone knows more about you than you do isn’t an exaggeration; it’s a fact. Face it, nothing else today lives closer to you than your phone. You probably sleep next to it, right? Do you remember your location every minute of every day? Do you remember what you messaged your friend last Tuesday at 10:47 am, word for word? Of course not. In fact, without photos, entire holidays would slide out of your mind. Compared to your phone, your brain holds a tiny amount of information, much of it wrong and all of it lossy. Even if the phone were replaced by something better in your life, maybe a tiny computer chip under your skin, that device would still hold more information about you than you do. From now on there will always be something with you that retains more information about you than you.

A consumer collects data about the world around her so that her model can make better predictions about her needs, context, priorities and desires. Today, the consumer voluntarily puts a lot of data on the cloud, and much of it could be used for better relevance predictions if there were a seamless way to collect it. Let us look at an example of how this could be done.

Capturing digital exhaust from an IoT device

Take the example of capturing a person’s digital exhaust from an IoT device such as a fitness tracker on their wrist. Before the data is actually stored, there are several things to consider: connecting the device to the network, reading from the hardware, sending data over the network and processing the data.

Four types of data

There are four types of data to be collected from the fitness tracker:

  • Device Data such as device IDs, classes, model numbers, etc. These are static for a particular device and do not change.
  • State Data is about the state of the device. It can change often: a device might change location, go to sleep, etc. It might be updated continually, perhaps every minute or every hour.
  • Telemetry Data is what the sensors generate. It is continuous and, depending on how much the sensors produce, can get very large very quickly.
  • Command Data captures the commands the user sends to the device to make it do something, such as starting to track a run.
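
As a rough illustration, the four types could be modelled as small record classes. This is only a sketch: the class names, field names and the `to_record` helper are assumptions made for this example and do not come from any particular tracker's API.

```python
from dataclasses import dataclass, asdict
from typing import Any, Dict


@dataclass(frozen=True)  # device data is static, so the record is immutable
class DeviceData:
    device_id: str
    device_class: str
    model_number: str


@dataclass
class StateData:
    device_id: str
    timestamp: float          # seconds since the epoch (assumed unit)
    state: Dict[str, Any]     # e.g. {"location": "...", "asleep": False}


@dataclass
class TelemetryData:
    device_id: str
    timestamp: float
    sensor: str               # e.g. "heart_rate"
    value: float


@dataclass
class CommandData:
    device_id: str
    timestamp: float
    command: str              # e.g. "start_run"


def to_record(message) -> Dict[str, Any]:
    """Flatten any of the four message types into a dict tagged with its type."""
    return {"type": type(message).__name__, **asdict(message)}


if __name__ == "__main__":
    reading = TelemetryData("tracker-1", 1700000000.0, "heart_rate", 72.0)
    print(to_record(reading))
```

Tagging each record with its type lets a downstream component decide where the message belongs without inspecting its fields.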

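To collect this kind of digital exhaust, we consider an architecture with four components: a message queue, a stream processing system, a state database and a time-series database.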

Message Queue

Depending on the device, it could stream a large amount of telemetry data. What happens if the device is trying to send a person’s data to the cloud and the cloud is not ready to handle it? For that we use a message queue. It takes a message, which in this case could be a person’s activity (calories burned, heart rate, etc.), and holds it in a queue until the stream processing system is ready. The message queue should be highly scalable; Apache Kafka and Google Cloud Pub/Sub are examples.
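
The buffering idea can be sketched with Python's standard-library `queue.Queue` standing in for a real message broker. This is not how Kafka or Pub/Sub are used in practice; the `publish` and `drain` helpers are hypothetical names chosen for this illustration.

```python
import queue
from typing import Any, Dict, List

Message = Dict[str, Any]


def publish(q: "queue.Queue[Message]", message: Message) -> None:
    """Device side: hand a message to the queue and move on."""
    q.put(message)


def drain(q: "queue.Queue[Message]", max_messages: int) -> List[Message]:
    """Consumer side: take up to max_messages, in arrival order, when ready."""
    batch: List[Message] = []
    while len(batch) < max_messages:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # nothing left to read right now
    return batch


if __name__ == "__main__":
    q: "queue.Queue[Message]" = queue.Queue()
    # The device keeps sending even though nothing is consuming yet.
    for second in range(5):
        publish(q, {"device_id": "tracker-1", "heart_rate": 70 + second})
    # Later, the stream processor pulls messages at its own pace.
    print(drain(q, 3))
    print(drain(q, 3))
```

The device never waits on the consumer; messages simply accumulate until something is ready to read them.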

Stream Processing

The stream processing system takes the data from the message queue and writes the continuous stream of data into the time-series database. It should scale to a large number of devices and handle both streaming and batch workloads, as Apache Spark or Google Cloud Dataflow do.
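
One common thing a stream processor does before writing is to group readings into fixed time windows. The following is a minimal plain-Python sketch of that step, assuming readings are dicts with `device_id`, `sensor`, `timestamp` (whole seconds) and `value` fields; those field names and the one-minute window are assumptions for this example, and a real deployment would express the same logic in Spark or Dataflow.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Tuple

WindowKey = Tuple[str, str, int]  # (device_id, sensor, window start in seconds)


def window_start(timestamp: int, window_seconds: int = 60) -> int:
    """Round a timestamp down to the start of its window."""
    return timestamp - (timestamp % window_seconds)


def average_per_window(
    readings: Iterable[Dict[str, Any]], window_seconds: int = 60
) -> Dict[WindowKey, float]:
    """Average each sensor's values per device and per time window."""
    buckets: Dict[WindowKey, List[float]] = defaultdict(list)
    for r in readings:
        key = (r["device_id"], r["sensor"], window_start(r["timestamp"], window_seconds))
        buckets[key].append(r["value"])
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    readings = [
        {"device_id": "tracker-1", "sensor": "heart_rate", "timestamp": 0, "value": 60.0},
        {"device_id": "tracker-1", "sensor": "heart_rate", "timestamp": 30, "value": 70.0},
        {"device_id": "tracker-1", "sensor": "heart_rate", "timestamp": 60, "value": 80.0},
    ]
    for key, avg in sorted(average_per_window(readings).items()):
        print(key, avg)
```

The resulting per-window values are what would then be written to the time-series database.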

State Database

As the application on the device runs, it sends state data to the state database, which stores both the state data and metadata such as device data. This could be a NoSQL database that stores data in JSON format, which is easy to query and access.
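
To make the idea concrete, here is a small sketch in which a Python dict stands in for the state database and each device has one JSON document. The `upsert_state` helper and the document fields are hypothetical; a real document store would offer its own update API.

```python
import json
from typing import Any, Dict

# Stand-in for a document store: device_id -> JSON document (as text).
StateDB = Dict[str, str]


def upsert_state(db: StateDB, device_id: str, update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge an update into the device's JSON document, creating it if needed."""
    doc = json.loads(db.get(device_id, "{}"))
    doc.update(update)  # newer values overwrite older ones
    db[device_id] = json.dumps(doc, sort_keys=True)
    return doc


if __name__ == "__main__":
    db: StateDB = {}
    # Static device data is written once.
    upsert_state(db, "tracker-1", {"model_number": "FT-100", "device_class": "fitness-tracker"})
    # State data overwrites only the fields that changed.
    upsert_state(db, "tracker-1", {"asleep": False, "location": "home"})
    upsert_state(db, "tracker-1", {"asleep": True})
    print(db["tracker-1"])
```

Only the latest state is kept per device, which is what separates this store from the time-series database.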

Time-series Database

Finally, this is where the continuous stream of telemetry data is stored. It could be a key-value type of NoSQL database in which the keys are sorted lexicographically, which makes it well suited to storing time-series data. If each key maps to a row of data, then the same key can hold any number of pieces of sensor data arriving in bursts. This design also scales to large data streams and large numbers of devices. Examples of such a time-series database could be Apache Cassandra or Google Cloud Bigtable.
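
The reason sorted keys suit time-series data is that a well-chosen key turns "all readings for this device between two times" into a single contiguous range. The sketch below imitates this with an in-memory sorted list; the `row_key` format, the `SortedStore` class and the column names are assumptions for illustration and are not the API of Cassandra or Bigtable.

```python
import bisect
from typing import Dict, List, Tuple


def row_key(device_id: str, sensor: str, timestamp: int) -> str:
    """Build a key whose lexicographic order matches chronological order.

    The timestamp is zero-padded because, compared as strings,
    "100" sorts before "20" while "0000000020" sorts before "0000000100".
    """
    return f"{device_id}#{sensor}#{timestamp:010d}"


class SortedStore:
    """Toy key-value store that keeps its row keys sorted."""

    def __init__(self) -> None:
        self._keys: List[str] = []
        self._rows: Dict[str, Dict[str, float]] = {}

    def write(self, key: str, column: str, value: float) -> None:
        """Add one piece of sensor data to the row for this key."""
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def scan(self, start_key: str, end_key: str) -> List[Tuple[str, Dict[str, float]]]:
        """Return rows with start_key <= key < end_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, end_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


if __name__ == "__main__":
    store = SortedStore()
    # Readings arrive out of order and in bursts.
    for ts, bpm in [(100, 75.0), (5, 62.0), (20, 68.0)]:
        store.write(row_key("tracker-1", "heart_rate", ts), "bpm", bpm)
    start = row_key("tracker-1", "heart_rate", 0)
    end = row_key("tracker-1", "heart_rate", 50)
    for key, row in store.scan(start, end):
        print(key, row)
```

Because one row can hold several columns, a later burst for the same key adds to the existing row rather than replacing it.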

Conclusion

Now we have seen how a consumer could collect their digital exhaust. The Naytra model acts on this data to make predictions. How will Naytra make predictions and respond to relevance requests from brands?

Before I talk about how it could do that, let us establish a common understanding of the science and technology of relevance prediction.
