Big Data, Machine Learning and Customer Segmentation

The information in this article was published in the Journal of Theoretical and Applied Electronic Commerce Research (JTAER). Some adaptations were made and headers were added. If you want to take advantage of investment opportunities related to this topic, see the bottom of the article for two relevant ETFs.

The second digital wave has provided companies with large quantities of data generated by customers in real time. It has also provided technologies, such as Big Data, that allow almost any kind of data to be processed, stored and analysed in real time as well. This has boosted the application of disciplines such as machine learning, a combination of statistical and computational theories of learning processes, whose algorithms are now commonly used to find customer patterns. These patterns help companies answer complex business questions by analysing massive volumes of online data [14].

Big Data Machine Learning: Volume, Variety, Velocity, Veracity (the 4 Vs)

In this new scenario, Big Data and machine learning extend the application of traditional customer segmentation methods to e-commerce. Customer segmentation methods now benefit from the 4 Vs (Volume, Variety, Velocity, Veracity) that define the characteristics of Big Data. Volume: large quantities of data are available in the cloud, which makes sample selection more robust and easier to perform while avoiding undesired biases. Variety: the data gathered fully captures the customers’ characteristics, together with the activities and behaviour they show on the websites. Velocity: the data can be processed and analysed in real time, which speeds up the evaluation and classification of customers into segments. This facilitates personalized communication between the customer and the brand, which can trigger the right message at the right moment. Finally, Veracity: the data is of higher quality, fully accessible and more reliable thanks to automated processes that avoid human intervention and manipulation. This is an important aspect to consider, as business decisions with a big impact on financial results will be based on customer segmentation in e-commerce [9].

Enriched Machine Learning

Customer segmentation methods also benefit from the evolution of enriched machine learning methods in two complementary dimensions.

Managing Rich Data Sets

The first consists of methods that can manage large and rich datasets of customers’ characteristics, behaviour and transactions, with both qualitative and quantitative variables, that are candidates to be included in the customer segmentation model.
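One common way to compare customers described by both qualitative and quantitative variables is a Gower-style dissimilarity. This is not a method named in the source; it is shown here only as a minimal sketch. The feature names (`orders`, `avg_basket`, `channel`, `device`) and all values are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[float, str]
Customer = Dict[str, Value]


def numeric_ranges(customers: List[Customer], numeric_keys: List[str]) -> Dict[str, float]:
    """Observed range (max - min) of each quantitative feature."""
    return {
        key: max(c[key] for c in customers) - min(c[key] for c in customers)
        for key in numeric_keys
    }


def gower_distance(a: Customer, b: Customer, ranges: Dict[str, float]) -> float:
    """Mean per-feature dissimilarity: 0.0 = identical, 1.0 = maximally different."""
    total = 0.0
    for key in a:
        if key in ranges:
            # Quantitative feature: absolute difference scaled by the observed range.
            span = ranges[key]
            total += abs(a[key] - b[key]) / span if span else 0.0
        else:
            # Qualitative feature: simple mismatch indicator.
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical customers (illustrative values only).
customers = [
    {"orders": 2, "avg_basket": 20.0, "channel": "email", "device": "mobile"},
    {"orders": 12, "avg_basket": 60.0, "channel": "email", "device": "desktop"},
    {"orders": 2, "avg_basket": 20.0, "channel": "search", "device": "mobile"},
]

ranges = numeric_ranges(customers, ["orders", "avg_basket"])
d01 = gower_distance(customers[0], customers[1], ranges)
d02 = gower_distance(customers[0], customers[2], ranges)
print(d01, d02)
```

A dissimilarity of this kind can feed any clustering routine that accepts pairwise distances, so qualitative variables do not have to be dropped or arbitrarily encoded as numbers.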

Data Collection with Dynamic Longitudinal View over Time

The second comprises methods that can segment customers in a dynamic, longitudinal view (using a data sample collected over time). By adding the time variable to segmentation, these methods not only classify customers into groups at a specific moment, but can also depict their evolution over time, drawing the customer journey from one segment to the next on the path from acquisition to brand advocacy [1]. Hence, the more advanced the segmentation methods, the more benefits they provide for business decision making.
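The movement of customers between segments over time can be illustrated by counting transitions between consecutive observation periods. The sketch below assumes each customer already has a segment label per period; the segment names (`new`, `active`, `advocate`) and the histories are hypothetical, and the approach is a generic illustration rather than the method used in [1].

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical segment label per customer, one entry per observation period.
history: Dict[str, List[str]] = {
    "c1": ["new", "active", "advocate"],
    "c2": ["new", "active", "active"],
    "c3": ["new", "new", "active"],
}


def transition_counts(paths: Dict[str, List[str]]) -> Counter:
    """Count moves (source segment, destination segment) between consecutive periods."""
    counts: Counter = Counter()
    for path in paths.values():
        for src, dst in zip(path, path[1:]):
            counts[(src, dst)] += 1
    return counts


def transition_probabilities(counts: Counter) -> Dict[str, Dict[str, float]]:
    """Turn counts into the share of customers leaving each segment for each destination."""
    totals: Dict[str, int] = defaultdict(int)
    for (src, _dst), n in counts.items():
        totals[src] += n
    probs: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (src, dst), n in counts.items():
        probs[src][dst] = n / totals[src]
    return dict(probs)


counts = transition_counts(history)
probs = transition_probabilities(counts)
print(probs)
```

Reading the result row by row shows, for each segment, where its customers tend to be one period later, which is one simple way to describe the customer journey between segments.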

Clustering and Identification of Profitable Customer Groups

Empirical research shows that accessing large audiences in e-commerce forces companies to deal with a wider range of customer groups. For example, in [2] eight different clusters of target customers were managed with marketing strategies designed according to their characteristics, giving a 2,947% difference between the most profitable group of customers and the least profitable one.
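A profitability gap between clusters can be expressed as a relative difference. The source does not state how the 2,947% figure was calculated, so the formula below, (best - worst) / worst x 100 on the mean profit per cluster, is an assumption, and the cluster ids and profit values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_profit_by_cluster(records: List[Tuple[int, float]]) -> Dict[int, float]:
    """Average profit per cluster from (cluster_id, profit) pairs."""
    totals: Dict[int, float] = defaultdict(float)
    sizes: Dict[int, int] = defaultdict(int)
    for cluster_id, profit in records:
        totals[cluster_id] += profit
        sizes[cluster_id] += 1
    return {cluster_id: totals[cluster_id] / sizes[cluster_id] for cluster_id in totals}


def relative_gap_percent(mean_profit: Dict[int, float]) -> float:
    """Percentage by which the most profitable cluster exceeds the least profitable one."""
    best = max(mean_profit.values())
    worst = min(mean_profit.values())
    if worst <= 0:
        # A relative difference is not meaningful with a zero or negative baseline.
        raise ValueError("least profitable cluster must have positive mean profit")
    return (best - worst) / worst * 100


# Hypothetical (cluster_id, profit) records, illustrative values only.
records = [(0, 10.0), (0, 30.0), (1, 2.0), (1, 2.0), (2, 5.0)]

means = mean_profit_by_cluster(records)
gap = relative_gap_percent(means)
print(means, gap)
```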

Benefits vs. Costs of AI Algorithms, ICT Infrastructure Impact, Privacy Issues

Information and Communication Technologies (ICTs) bring major benefits to society, but their extensive usage is also showing downsides. Some of them were expected, others have arisen unexpectedly, and they are very varied indeed. They range from privacy issues and the risk of unconscious or conscious bias in AI algorithms to sustainability and environmental costs. For example, the electric power needed by Information Technology (IT) infrastructures is forecast to reach 21% of global electricity demand in 2030 [10].

Source: Ballestar, María T. 2021. "Editorial: Segmenting the Future of E-Commerce, One Step at a Time." J. Theor. Appl. Electron. Commer. Res. 16, no. 2: I-III.

[1] M. T. Ballestar, L. M. Doncel, J. Sainz, and A. Ortigosa-Blanch, A novel machine learning approach for evaluation of public policies: An application in relation to the performance of university researchers, Technological Forecasting and Social Change, vol. 149, p. 119756, 2019.

[2] M. T. Ballestar, P. Grau-Carles, and J. Sainz, Customer segmentation in e-commerce: Applications to the cashback business model, Journal of Business Research, vol. 88, pp. 407-414, 2018.

[9] X. Jin, B. W. Wah, X. Cheng, and Y. Wang, Significance and challenges of big data research, Big Data Research, vol. 2, no. 2, pp. 59-64, 2015.

[10] N. Jones, How to stop data centres from gobbling up the world’s electricity, Nature, vol. 561, no. 7722, pp. 163-166, 2018.

[14] T. M. Mitchell, The discipline of machine learning. Vol. 9. Pittsburgh: Carnegie Mellon University, School of Computer Science, Machine Learning Department, 2006.

Investment Opportunities

LY Digital Economy ESG Filter, LU2023678878, EBUY (TER: 0.15%)
LY Disruptive Tech ESG Filter, LU2023678282, QBIT (TER: 0.15%)

If you would like to propose additions to the above list of investment opportunities, feel free to comment below or drop an email to
