Analytics solutions are used by enterprises to better understand their clients and refine their business.
Let’s say my business offers an application that controls the engine of e-bikes and provides insights on the users’ rides. The usual analytics solution collects data about the users and their position and then performs analytics on it. For example, I could find out where the best places to install a new charging station are. I would just have to query my data for places where e-bike users end up with less than 5% of battery.
Current way of doing analytics
The use case mentioned above seems simple, but it requires three steps.
First, I need to collect from the application a unique identifier, such as the user’s device ID, together with the user’s geolocation. This lets me perform further analytics on the data, like looking for the most popular places where e-bike batteries are under 5%. This data is personal, as it can be used to identify the individuals using the app. For example, we can infer where users live from it. In Europe, under the GDPR, this generally means I need a lawful basis, typically the end user’s consent, to collect and process such data. Transferring it outside Europe is also restricted, so in practice I would process it in Europe.
Second, I need a central place to store this data, one that scales well with the number of users and the granularity of the collected data.
Finally, the computational power must scale with the volume of data collected.
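To make these three steps concrete, here is a minimal sketch of the centralized approach in Python. The record layout (`RideEnd`), the grid-cell rounding and the sample coordinates are assumptions made for illustration, not part of any real product: ride ends are collected with a device identifier, stored in one place, then queried for low-battery hotspots.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RideEnd:
    """Hypothetical record collected centrally at the end of a ride."""
    device_id: str      # unique identifier: this is what makes the data personal
    lat: float
    lon: float
    battery_pct: float  # battery level at the end of the ride


def grid_cell(lat, lon, precision=2):
    """Round coordinates to a coarse grid cell (roughly 1 km at 2 decimals)."""
    return (round(lat, precision), round(lon, precision))


def low_battery_hotspots(rides, threshold=5.0, top=3):
    """Count ride ends below the battery threshold per grid cell."""
    counts = Counter(
        grid_cell(r.lat, r.lon) for r in rides if r.battery_pct < threshold
    )
    return counts.most_common(top)


# Made-up sample data standing in for the central store.
rides = [
    RideEnd("device-a", 48.8566, 2.3522, 3.0),
    RideEnd("device-b", 48.8571, 2.3519, 4.5),
    RideEnd("device-c", 45.7640, 4.8357, 2.0),
    RideEnd("device-a", 48.8566, 2.3522, 60.0),
]
hotspots = low_battery_hotspots(rides)
```

Note that every row carries both a device identifier and a position, which is exactly why this data counts as personal.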
The figure below describes how we usually do analytics.
Currently, in Europe, the GDPR requirements are handled by the legal department of the business. The collection and processing of data are provided by cloud platforms. Solutions for doing analytics therefore exist, and the remaining question is how much they will cost.
Now imagine that you could perform analytics in a much simpler way, without the previous requirements: you don’t need user consent because you do not collect personal data, and you do not face scalability issues for storage and processing.
This is the main topic of this article. We discuss an alternative solution: DLDB.io!
DLDB.io: a new way of doing analytics
DLDB.io provides an SDK (iOS, Android, Flutter, React Native, Unity) that stores geolocation and users’ data on the end user’s terminal. To do analytics, this data is processed by the terminal to answer queries sent by the analytics engine. The query results sent back by the terminal are already aggregated and do not allow identification of the users.
More precisely, queries sent by the analytics engine have a lifespan. The terminal pulls the query once it is online and the application is in use. If the terminal receives the query within the query lifespan, the result is collected and used by the analytics engine.
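The pull-and-answer mechanism can be sketched in Python as follows. This is not the DLDB.io SDK API, which the article does not describe: `Query`, `answer_on_device` and the timestamp fields are hypothetical names. The point is only that raw rides stay on the device and that a query pulled after its lifespan yields no result.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Query:
    """Hypothetical query published by the analytics engine."""
    query_id: str
    battery_threshold: float  # e.g. 5.0 for "less than 5% of battery"
    expires_at: float         # end of the query lifespan (Unix timestamp)


def answer_on_device(local_rides, query, now):
    """Run the query on data that never leaves the terminal.

    local_rides: list of (lat, lon, battery_pct) tuples stored on the device.
    Returns per-cell counts only, or None if the lifespan has ended.
    """
    if now > query.expires_at:
        return None  # pulled too late: the engine gets nothing from this device
    counts = Counter()
    for lat, lon, battery_pct in local_rides:
        if battery_pct < query.battery_threshold:
            counts[(round(lat, 2), round(lon, 2))] += 1
    return dict(counts)


local_rides = [(48.8566, 2.3522, 3.0), (48.8570, 2.3530, 80.0)]
query = Query("low-battery", battery_threshold=5.0, expires_at=1_000.0)

in_time = answer_on_device(local_rides, query, now=900.0)
too_late = answer_on_device(local_rides, query, now=1_100.0)
```

Only the counts in `in_time` would be sent back. The sketch does not show how the real SDK guarantees that results cannot identify a user, which would need more than coordinate rounding.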
The image below describes how DLDB.io does analytics.
Benefits of DLDB.io analytics solution
The benefits are:
- GDPR compliance by design:
  - Users’ data is not collected in the cloud. It stays on the end user’s terminal, which avoids the misuse that data dissemination makes possible: misuse is more likely when data is collected and copied multiple times.
  - No user consent is needed to collect data, as data is not collected (thanks, Captain Obvious!).
  - Right to be forgotten: users’ data is easy to delete because it is stored only on the user’s terminal.
  - A registry of the queries performed on users’ data is available on the terminal.
- Scalability:
  - Storage is mainly handled by the end user’s device, so a massive storage solution is not needed.
  - Processing is partly offloaded to the end user’s device. The data sent to the analytics engine is pre-aggregated and cleaned.
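To illustrate the scalability point, here is a minimal Python sketch of what the analytics engine has left to do when terminals send pre-aggregated results. The result format (a dictionary of counts per grid cell, or `None` when a terminal did not answer) is an assumption for illustration, not the actual DLDB.io wire format.

```python
from collections import Counter


def merge_results(device_results):
    """Sum per-cell counts returned by terminals; skip devices with no answer."""
    total = Counter()
    for result in device_results:
        if result is not None:
            total.update(result)  # Counter.update adds counts for a mapping
    return total


# Made-up results from three terminals.
device_results = [
    {(48.86, 2.35): 1},
    {(48.86, 2.35): 2, (45.76, 4.84): 1},
    None,  # this terminal did not answer within the query lifespan
]
merged = merge_results(device_results)
best_cell, best_count = merged.most_common(1)[0]
```

The engine only ever holds one small counter per query, however many rides each device has stored locally.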
All of these benefits make DLDB.io a great solution for analytics. The distributed computing performed by the end user’s terminal is an application of edge computing, which is very innovative in the analytics field.
Still, if we look more closely at how analytics works with this solution, we can find some limitations.
Limitations of DLDB.io analytics solution
- The most obvious limitation: if the device is not available during the lifespan of the query, the analytics engine will not get a result for the query from this device. Yet we can assume that, for a fleet of terminals, enough of them will answer to make the query result statistically valid even if we do not get results from all the terminals.
- The lifespan of a query depends on how often the application is used. If the application is used a lot, the query lifespan can be short, whereas if the application is used once a day, the query lifespan will be around a day. This limits the freshness of the data on which we want to run the query.
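The idea that a partial response can still be statistically valid can be made concrete with a margin of error. The Python sketch below assumes the responding terminals are a random sample of the fleet, which is a strong assumption since frequent users are more likely to answer in time. It uses the standard normal approximation for a proportion with a finite population correction. The function name and the figures are invented for illustration.

```python
import math


def estimate_share(hits, responders, fleet_size, z=1.96):
    """Estimate a fleet-wide share from the terminals that answered.

    Uses the normal approximation for a proportion, with a finite
    population correction because the fleet size is known.
    Returns (estimate, margin_of_error); z=1.96 is about 95% confidence.
    """
    if not 0 < responders <= fleet_size:
        raise ValueError("responders must be between 1 and fleet_size")
    p = hits / responders
    standard_error = math.sqrt(p * (1 - p) / responders)
    if fleet_size > 1:
        standard_error *= math.sqrt((fleet_size - responders) / (fleet_size - 1))
    return p, z * standard_error


# Hypothetical fleet: 10,000 terminals, 2,000 answered within the lifespan,
# and 120 of those reported a ride ending under 5% of battery.
share, margin = estimate_share(hits=120, responders=2_000, fleet_size=10_000)
```

With these made-up numbers the estimate is 6% with a margin of a little under one percentage point. A longer lifespan brings more responders and a tighter margin, at the cost of fresher results.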
Finally, I bet that this analytics solution is going with the flow of history: regulations are becoming more constraining, which represents a risk for businesses doing analytics, and the DLDB.io solution mitigates that risk. In addition, the volume of data collected by current analytics solutions keeps increasing, and edge computing might reduce the cost of storing and processing this data. We can even imagine that distributed processing on the end users’ terminals emits less CO2 than a single big query, which would be good for the planet!
If you are interested, you can request a beta invite and get support from the DLDB.io team.