Analyzing Consumer Behavior in Fresh Supermarkets using Association Rules, Self-Organizing Maps, and RFM Model
Nai-Chieh Wei
Department of Industrial Management,
I-Shou University, Taiwan
ncwei@isu.edu.tw
An-Yu Guo
Department of Industrial Management,
I-Shou University, Taiwan
Cheng-Jing Li
Department of Industrial Management,
I-Shou University, Taiwan
lichjing@hotmail.com
Abstract
This study presents a comprehensive analytical framework for exploring consumer behavior in fresh supermarkets by integrating association rule mining, customer segmentation, and value analysis. Specifically, the methodology employs the Apriori algorithm to uncover frequent itemsets and strong product associations from transactional data. These insights form the basis for customer segmentation using Self-Organizing Maps (SOM), a neural network-based clustering approach that groups customers with similar purchasing patterns. Finally, the Recency-Frequency-Monetary (RFM) model is applied to evaluate customer value within each cluster. The integration of these three techniques provides both behavioral and financial perspectives, enabling supermarket managers to identify high-value customer segments and tailor marketing strategies accordingly. The study demonstrates that linking product-level associations with customer-level segmentation enhances the ability to personalize promotions, optimize product placement, and allocate resources efficiently. The methodology can also be extended and adapted to other retail sectors. This hybrid approach offers a robust foundation for data-driven decision-making and paves the way for intelligent retail management based on consumer-centric insights.
Keywords: Apriori algorithm, Self-Organizing Maps (SOM), Recency-Frequency-Monetary (RFM)
Introduction
In the fast-evolving landscape of retail, fresh supermarkets have emerged as an essential component of modern consumer life. As consumer preferences diversify and competition intensifies, understanding shopping behavior has become crucial for retailers seeking to maintain a competitive edge. This study focuses on exploring the complex patterns of consumer purchases in fresh supermarkets, aiming to uncover actionable insights that can improve marketing strategies, customer relationship management, and inventory optimization.
The importance of analyzing consumer behavior in fresh supermarkets lies in its potential to directly influence profitability. Shoppers in these markets tend to exhibit unique consumption patterns driven by freshness, price sensitivity, seasonality, and household dietary habits. Unlike other retail environments, fresh supermarkets involve high-frequency, low-margin transactions where even marginal improvements in customer targeting can result in significant financial gains. By leveraging advanced data mining techniques, retailers can move beyond anecdotal or experience-based strategies and adopt evidence-based decision-making frameworks.
The motivation for this research stems from the increasing availability of consumer transaction data and the growing need for intelligent systems that can process this information to support business strategy. Traditional segmentation approaches often fail to capture the nuanced preferences and dynamic behaviors of fresh supermarket shoppers. Hence, integrating three robust analytical tools, namely the Apriori algorithm, Self-Organizing Maps (SOM), and RFM (Recency, Frequency, Monetary) analysis, offers a comprehensive solution. Each method contributes a distinct capability: Apriori uncovers purchasing patterns, SOM identifies customer segments, and RFM determines customer value.
This study not only aims to fill a gap in the literature by applying these techniques in the context of fresh food retailing but also provides practical guidelines for practitioners in the industry. Through the identification of frequent itemsets, behavioral clustering, and customer valuation, this research enables supermarkets to formulate personalized marketing strategies that enhance customer loyalty and drive business growth.
Literature Review
Association Rule Mining in Retail
Association rule mining is a powerful technique for uncovering hidden patterns in large transactional datasets. Agrawal and Srikant (1994) pioneered the Apriori algorithm, which remains one of the most widely used methods in market basket analysis. Gan et al. (2018) provided a comprehensive survey on utility-oriented pattern mining, highlighting its application in retail for identifying profitable product combinations. Wahidi and Ismailova (2024) successfully implemented association rule mining in e-commerce, enhancing sales strategies based on customer purchasing behavior. The effectiveness of association rule mining has been demonstrated in diverse sectors, including retail (Chen, Sain, & Guo, 2012), where it helped derive strategic insights from frequent itemsets. Hu and Yeh (2014) demonstrated that meaningful frequent patterns can be discovered even in the absence of customer identification.
Self-Organizing Maps (SOM) for Customer Segmentation
The Self-Organizing Map (SOM) developed by Kohonen (2001) is widely used for clustering high-dimensional data in an unsupervised manner. This neural network technique is especially useful in visualizing the topological structure of customer segments. Barman and Chowdhury (2019) demonstrated the value of SOM in market segmentation, allowing for clear distinction between consumer profiles.
Holmbom, Eklund, and Back (2011) utilized SOM for customer portfolio analysis in business intelligence contexts, while Kiang, Hu, and Fisher (2006) extended SOM applications to telecommunications market segmentation. Vellido, Lisboa, and Meehan (1999) successfully applied neural networks, including SOM, to segment online shopping markets, demonstrating its robustness across domains. Saitoh (2020) further enhanced the usability of SOM by integrating supervised learning for persona development and strategic market analysis.
Customer Value Analysis with the RFM Model
The RFM (Recency, Frequency, Monetary) model is a foundational tool in customer relationship management. Hughes (1994) was among the earliest to advocate this model in marketing. More recent applications, such as those by Wei, Lin, and Wu (2010), have shown how RFM can be used to profile customers and enhance targeting. Combining RFM with clustering methods has yielded even greater insight. Safari, Safari, and Montazer (2016) compared different segmentation strategies and found RFM-based clustering to be effective in identifying valuable customer segments. Yeh, Yang, and Ting (2008) extended RFM with a Bernoulli sequence to improve segmentation accuracy.
Liao, Chu, and Hsiao (2022) applied RFM and SOM together to e-commerce data, yielding practical customer segmentation strategies. Dogan, Ayçin, and Bulut (2018) supported similar findings in the retail context. Sarvari, Ustundag, and Takci (2016) evaluated RFM and demographic data to create actionable customer profiles. Nguyen (2021) proposed deep embedding clustering to improve segmentation and customer behavior prediction. Chattopadhyay et al. (2012) reviewed neural network-based segmentation trends, confirming the increasing integration of machine learning with traditional marketing models.
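Taken together, these studies show that Apriori, SOM, and RFM are each well established in retail analytics and that combinations such as RFM with SOM yield practical segmentations. This motivates integrating all three into a single methodology for analyzing consumer behavior in retail environments, with strategic value for both academic researchers and industry practitioners.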
Methodology
This research adopts a multi-stage analytical framework that integrates three core techniques: association rule mining using the Apriori algorithm, customer segmentation via Self-Organizing Maps (SOM), and customer value analysis using the Recency-Frequency-Monetary (RFM) model. Each method plays a distinct and complementary role in uncovering, interpreting, and quantifying consumer purchasing behavior in the context of fresh supermarkets. The integration of these methods ensures both behavioral segmentation and financial valuation, enabling decision-makers to implement targeted and profitable marketing strategies.
Association Rule Mining: Apriori Algorithm
Association rule mining is a data mining technique used to discover interesting relationships between items in large datasets. In the context of fresh supermarkets, it helps identify combinations of products that are frequently purchased together. This knowledge can be used for shelf placement, cross-selling strategies, and personalized promotions.
The Apriori algorithm is particularly well-suited for this task because it exploits the downward-closure property of support: every subset of a frequent itemset must itself be frequent, so any candidate itemset that contains an infrequent subset can be pruned before its support is counted. The algorithm proceeds in two stages: first identifying frequent itemsets that meet a minimum support threshold, and then generating strong association rules that meet a minimum confidence threshold.
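The first stage can be illustrated with a minimal, unoptimized Python sketch of the level-wise search, showing the join, prune, and count steps. The function name `apriori` and the toy baskets are hypothetical; this illustrates the technique and is not the software used in this study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori search for frequent itemsets.

    transactions: list of baskets (any iterable of item names)
    min_support: minimum support as a fraction of all transactions
    Returns a dict mapping frozenset(itemset) -> support.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    # Level 1: support of every single item
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c / n for s, c in counts.items() if c / n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:] if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: scan the baskets once per surviving candidate
        current = {}
        for c in candidates:
            support = sum(1 for basket in baskets if c <= basket) / n
            if support >= min_support:
                current[c] = support
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets (hypothetical, not the study's data)
baskets = [
    {"pork noodles", "seafood noodles"},
    {"pork noodles", "seafood noodles", "kids noodles"},
    {"taiwanese bread", "western pastries"},
    {"pork noodles", "kids noodles"},
]
frequent = apriori(baskets, min_support=0.5)
```

In this toy run, {seafood noodles, kids noodles} occurs in only one of four baskets, so it is not frequent. The three-item candidate that contains it is therefore pruned without being counted.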
The three metrics used to evaluate association rules are:
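- Support:
$$\mathrm{Support}(A \Rightarrow B) = \frac{|A \cap B|}{N}$$
where |A ∩ B| is the number of transactions that include both items A and B, and N is the total number of transactions.
- Confidence:
$$\mathrm{Confidence}(A \Rightarrow B) = \frac{|A \cap B|}{|A|}$$
where |A| is the number of transactions that include item A. Confidence estimates the probability that item B is purchased given that item A is purchased.
- Lift:
$$\mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)}$$
A lift greater than 1 indicates that A and B are purchased together more often than would be expected if they were bought independently. Lift measures association, not causation.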
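The three metrics translate directly into code. The sketch below computes support, confidence, and lift exactly as defined above, and enumerates candidate rules from one frequent itemset under a minimum-confidence threshold. The helper names (`support`, `rule_metrics`, `strong_rules`) and the toy baskets are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence and lift of the rule antecedent => consequent."""
    s_ab = support(set(antecedent) | set(consequent), transactions)  # |A ∩ B| / N
    s_a = support(antecedent, transactions)                          # |A| / N
    s_b = support(consequent, transactions)                          # Support(B)
    confidence = s_ab / s_a
    lift = confidence / s_b
    return s_ab, confidence, lift


def strong_rules(itemset, transactions, min_confidence):
    """Rules X => (itemset - X) from one frequent itemset that meet min_confidence."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = [i for i in items if i not in lhs]
            s, c, l = rule_metrics(lhs, rhs, transactions)
            if c >= min_confidence:
                rules.append((frozenset(lhs), frozenset(rhs), s, c, l))
    return rules


# Toy baskets (hypothetical, not the study's data)
baskets = [
    {"pork noodles", "seafood noodles"},
    {"pork noodles", "seafood noodles", "kids noodles"},
    {"taiwanese bread", "western pastries"},
    {"pork noodles", "kids noodles"},
]
s, c, l = rule_metrics({"seafood noodles"}, {"pork noodles"}, baskets)
rules = strong_rules({"pork noodles", "seafood noodles"}, baskets, min_confidence=0.8)
```

Here seafood noodles ⇒ pork noodles has confidence 1.0 and lift 4/3. The reverse rule has confidence 2/3 and is filtered out at a 0.8 threshold. This shows that confidence is asymmetric, whereas lift is the same in both directions.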
By applying the Apriori algorithm to transaction records, this study identifies 25 key products with the strongest associations. These products serve as a behavioral signature and become the feature base for subsequent SOM analysis.
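One way to turn the key products into SOM input is sketched below. It makes two assumptions that the study does not spell out. First, the key products are taken to be every item appearing on either side of a strong rule. Second, each transaction is a hypothetical (customer_id, items) pair. Each customer then becomes a vector that counts how many of their baskets contained each key product.

```python
from collections import Counter


def key_products_from_rules(rules):
    """Every product that appears on either side of a strong rule (sorted)."""
    products = set()
    for antecedent, consequent in rules:
        products |= set(antecedent) | set(consequent)
    return sorted(products)


def customer_vectors(transactions, key_products):
    """customer_id -> purchase-frequency vector over key_products.

    transactions: iterable of (customer_id, items) pairs, one per basket.
    Entry j counts the baskets in which the customer bought key_products[j].
    Customers who never bought a key product are left out.
    """
    counts = {}
    for customer_id, items in transactions:
        tally = counts.setdefault(customer_id, Counter())
        for product in key_products:
            if product in items:
                tally[product] += 1
    return {
        cid: [tally[p] for p in key_products]
        for cid, tally in counts.items()
        if sum(tally.values()) > 0
    }


# Hypothetical rules and baskets for illustration
rules = [({"pork noodles"}, {"kids noodles"}), ({"taiwanese bread"}, {"western pastries"})]
key_products = key_products_from_rules(rules)
transactions = [
    ("C1", {"pork noodles", "kids noodles"}),
    ("C1", {"pork noodles"}),
    ("C2", {"taiwanese bread", "milk"}),
    ("C3", {"milk"}),
]
vectors = customer_vectors(transactions, key_products)
```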
Customer Segmentation via Self-Organizing Maps (SOM)
The Self-Organizing Map (SOM) is an unsupervised neural network model introduced by Kohonen. It projects high-dimensional data onto a lower-dimensional (typically 2D) grid, preserving the topological relationships between data points. This is particularly effective for visualizing and clustering customer purchasing patterns.
In this study, SOM is used to cluster customers based on their interactions with the 25 key products identified via Apriori. Each customer is represented as a vector of purchase frequencies across these products. SOM organizes similar customers into neighborhoods on the grid. The training process uses the following update rule:
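$$w_i(t+1) = w_i(t) + \alpha(t)\, h_{bi}(t)\, \big(x(t) - w_i(t)\big)$$
Where:
- $w_i(t)$: weight vector of neuron i at time t
- $\alpha(t)$: learning rate, which decreases over time
- $h_{bi}(t)$: neighborhood function centered on the Best Matching Unit (BMU) b, i.e., the neuron whose weight vector is closest to x(t)
- $x(t)$: input vector (customer purchase vector)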
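The update rule can be sketched from scratch in Python. The schedules are assumptions made for illustration and do not come from the study: α(t) decays linearly from 0.5, h_bi(t) is a Gaussian of grid distance to the BMU, and its radius shrinks to a floor of 0.5. The 2×2 grid, the function names, and the data are also hypothetical. Customers that share a BMU form one micro-cluster.

```python
import math
import random


def best_matching_unit(x, weights):
    """Grid position (row, col) of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((xi - wi) ** 2 for xi, wi in zip(x, weights[node])))


def train_som(data, rows, cols, epochs=50, alpha0=0.5, seed=0):
    """Train a rows x cols SOM with w_i += alpha(t) * h_bi(t) * (x(t) - w_i).

    Assumed schedules: alpha(t) decays linearly from alpha0 towards 0, and
    h_bi(t) is a Gaussian of the grid distance to the BMU whose radius
    shrinks linearly to a floor of 0.5.
    """
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2
    # Initialise every neuron with a copy of a randomly chosen customer vector
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total_steps
            alpha = alpha0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            b = best_matching_unit(x, weights)
            for node, w in weights.items():
                # Gaussian neighbourhood h_bi(t) from squared grid distance to the BMU
                d2 = (node[0] - b[0]) ** 2 + (node[1] - b[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += alpha * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical purchase-frequency vectors: two noodle-heavy, two bakery-heavy customers
data = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 5, 4], [1, 0, 4, 5]]
weights = train_som(data, rows=2, cols=2)
# Each customer's BMU; customers sharing a BMU form one micro-cluster
assignments = {i: best_matching_unit(x, weights) for i, x in enumerate(data)}
```

Each update moves a weight vector only part of the way toward the input, because α(t)·h_bi(t) ≤ 0.5 under these settings. The trained prototypes therefore stay within the range of the purchase data. For a real dataset, an established SOM library and a tuned grid size would normally replace this sketch.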
The SOM output initially formed 21 micro-clusters, which were aggregated into 7 customer segments after interpreting product relevance and density. Each segment represents a distinctive consumer behavior profile, e.g., snack-focused, vegetable-focused, or bakery-loyal customers.
Customer Value Analysis Using RFM
The RFM model assesses the value of each customer to the business along three dimensions:
- Recency (R): How recently a customer made a purchase.
$$R = \text{Today's Date} - \text{Date of Last Purchase}$$
- Frequency (F): How often the customer made purchases.
$$F = \text{Total Number of Purchases}$$
- Monetary (M): How much the customer spent.
$$M = \sum_{i=1}^{F} \text{Transaction Amount}_i$$
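The three definitions above can be computed in one pass over a purchase log. The sketch assumes a hypothetical record format of (customer_id, purchase_date, amount) and uses a fixed reference date in place of "today". It expresses R in days.

```python
from datetime import date


def rfm_values(transactions, reference_date):
    """Raw (R, F, M) per customer from (customer_id, purchase_date, amount) records.

    R: days from the customer's last purchase to reference_date
    F: number of purchases
    M: total amount spent
    """
    last, freq, money = {}, {}, {}
    for cid, day, amount in transactions:
        if cid not in last or day > last[cid]:
            last[cid] = day
        freq[cid] = freq.get(cid, 0) + 1
        money[cid] = money.get(cid, 0) + amount
    return {cid: ((reference_date - last[cid]).days, freq[cid], money[cid]) for cid in last}


# Hypothetical purchase log
log = [
    ("C1", date(2024, 3, 1), 120),
    ("C1", date(2024, 3, 20), 80),
    ("C2", date(2024, 1, 5), 300),
]
rfm = rfm_values(log, reference_date=date(2024, 3, 31))
# rfm == {"C1": (11, 2, 200), "C2": (86, 1, 300)}
```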
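Each RFM component is typically ranked or scored (e.g., from 1 to 5), with the recency scale reversed so that the most recent purchasers receive the highest score. The three scores are combined into a single RFM score, and customers are then segmented into categories such as high-value, potential growth, at-risk, or lapsed.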
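A rank-based quintile scoring of raw RFM values might look like the following. Recency is sorted in reverse so that smaller day counts score higher. The tie handling is a simplification made for illustration: tied customers keep their input order and may fall into different quintiles. The function names and data are hypothetical.

```python
def quintile_scores(values, higher_is_better=True):
    """Rank-based 1-5 scores: the worst fifth of customers gets 1, the best fifth gets 5.

    values: dict customer_id -> numeric value.
    Ties keep their input order (a simplification; tied values may land in
    different quintiles).
    """
    ranked = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ranked)
    return {cid: 1 + (5 * i) // n for i, cid in enumerate(ranked)}


def rfm_scores(rfm):
    """(R, F, M) scores per customer; recency is reversed so recent buyers score 5."""
    r = quintile_scores({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
    f = quintile_scores({c: v[1] for c, v in rfm.items()})
    m = quintile_scores({c: v[2] for c, v in rfm.items()})
    return {c: (r[c], f[c], m[c]) for c in rfm}


# Hypothetical raw values: (days since last purchase, purchases, total spend)
rfm = {
    "A": (3, 12, 950),
    "B": (40, 2, 60),
    "C": (10, 6, 400),
    "D": (90, 1, 25),
    "E": (20, 4, 150),
}
scores = rfm_scores(rfm)
# scores["A"] == (5, 5, 5) and scores["D"] == (1, 1, 1)
```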
In this study, RFM scores are computed for the 374 customers included in the SOM clustering phase, which allows the behavioral clusters to be mapped onto customer value tiers.
Methodological Integration and Synergy
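The strength of this methodology lies in its sequential integration. The Apriori algorithm selects the key products, SOM clusters customers according to their purchases of those products, and RFM scores the customers within each resulting cluster. This creates a three-layered insight framework: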
- What products are connected? (Apriori)
- Who behaves similarly in buying those products? (SOM)
- Which groups are most valuable? (RFM)
The integration ensures that marketing decisions can be both behaviorally targeted and financially justified. For example, a cluster of frequent instant-noodle buyers who also have high monetary scores can be offered exclusive bundle promotions.
By applying these methods in tandem, fresh supermarket operators can:
- Tailor shelf layouts based on frequently associated items.
- Customize loyalty programs to high RFM scorers in specific SOM clusters.
- Adjust inventory and supply chain planning in line with purchasing clusters.
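Targeting high RFM scorers in specific SOM clusters first requires a cross-tabulation of the two labelings. A minimal sketch, assuming two hypothetical dictionaries that map customer IDs to SOM group and RFM segment, computes each segment's share within each cluster.

```python
from collections import Counter, defaultdict


def segment_mix(cluster_of, segment_of):
    """Share of each RFM segment within each SOM cluster.

    cluster_of: customer_id -> SOM group label
    segment_of: customer_id -> RFM segment label
    """
    counts = defaultdict(Counter)
    for cid, cluster in cluster_of.items():
        if cid in segment_of:
            counts[cluster][segment_of[cid]] += 1
    return {
        cluster: {seg: n / sum(tally.values()) for seg, n in tally.items()}
        for cluster, tally in counts.items()
    }


# Hypothetical labels for four customers
cluster_of = {"C1": "Group 2", "C2": "Group 2", "C3": "Group 4", "C4": "Group 2"}
segment_of = {"C1": "High Value", "C2": "High Value", "C3": "Low Value", "C4": "Moderate Value"}
mix = segment_mix(cluster_of, segment_of)
```

A cluster whose high-value share clearly exceeds the overall share is a natural candidate for the cluster-specific loyalty programs listed above.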
This methodology is not only technically sound but also managerially actionable. Its modularity allows for updating with new data, extension to other sectors, and incorporation of additional techniques like time-series forecasting or supervised learning.
Results
This section presents the results obtained by applying the three key analytical methods (the Apriori algorithm, SOM clustering, and RFM analysis) to the supermarket transaction dataset. The data-driven insights are interpreted in the context of customer behavior and retail strategy.
Using the Apriori algorithm, we extracted 38 strong association rules from 3,904 transaction records. These rules highlighted frequent co-occurrences between specific product categories, revealing consumer purchasing habits. Table 1 illustrates several high-lift rules:
Table 1. Sample Association Rules
| Rule No. | Rule (Antecedent → Consequent) | Support | Confidence | Lift |
|---|---|---|---|---|
| 1 | Pork Instant Noodles → Kids' Noodles | 0.0103 | 0.2395 | 3.94 |
| 10 | Taiwanese Bread → Western Pastries | 0.0169 | 0.3929 | 3.92 |
| 16 | Pork Instant Noodles → Seafood Instant Noodles | 0.0118 | 0.4792 | 7.89 |
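Lift values of roughly 4 to 8 indicate that these product pairs are purchased together about four to eight times more often than would be expected if they were bought independently. These findings support bundled marketing strategies and can be cross-validated against observed co-purchase frequencies in the raw transaction file.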
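Rearranging the lift definition from the Methodology gives a concrete reading of Rule 16 in Table 1, using only the reported values, with A = Pork Instant Noodles and B = Seafood Instant Noodles:

$$\mathrm{Support}(B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Lift}(A \Rightarrow B)} = \frac{0.4792}{7.89} \approx 0.061$$

Seafood instant noodles thus appear in roughly 6% of all transactions but in about 48% of transactions that contain pork instant noodles, which is what a lift of 7.89 expresses.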
From the 3,904 transactions, we identified 374 customers who frequently purchased items from the 25 most significant products found in the association rules. These customers were subjected to SOM clustering, which initially produced 21 distinct clusters. By evaluating product frequency within each cluster, these were further consolidated into 7 behaviorally meaningful groups.
Table 2. SOM Cluster Segments
| Segment | Key Product Categories |
|---|---|
| Group 1 | Braised, Roasted, Cold Dishes |
| Group 2 | Pork, Seafood, Beef Instant Noodles |
| Group 3 | Taiwanese Bread, Toast, Pastries |
| Group 4 | Fruits, Leafy Vegetables, Pork |
| Group 5 | Leafy Vegetables, Cakes |
| Group 6 | Southeast Asian Fruits |
| Group 7 | Poultry, Root Vegetables |
The clustering results align with the transaction file segments, demonstrating consistent behavioral groupings. Visualizations from the SOM grid show how clusters occupy distinct areas in the data space, with minimal overlap.
Figure 1. Clustering results by SOM.
The RFM analysis evaluated each of the 374 key customers along three axes: Recency, Frequency, and Monetary value.
Based on quintile-based scoring (1 = lowest, 5 = highest), customers were categorized into four groups:
Table 3. RFM Segment Descriptions
| Segment | Characteristics (R, F, M scores) | Strategic Recommendation |
|---|---|---|
| High Value | R:5, F:5, M:5 | Priority retention, loyalty perks |
| Potential Value | R:5, F:3-4, M:3-4 | Targeted promotions to build habits |
| Moderate Value | R:3, F:3, M:3 | General communication, upsell offers |
| Low Value | R:1-2, F:1-2, M:1-2 | Re-engagement campaigns, exit screening |
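Read literally as classification rules, the score patterns in Table 3 can be encoded as shown below. The study does not state how score combinations outside these four patterns (for example R:4) were assigned. The sketch therefore labels them "Unclassified", which is an assumption made for illustration, and the function name is hypothetical.

```python
def rfm_segment(r, f, m):
    """Map 1-5 R/F/M scores to the four segments of Table 3.

    Combinations not covered by Table 3 return "Unclassified" (an assumption).
    """
    if (r, f, m) == (5, 5, 5):
        return "High Value"
    if r == 5 and 3 <= f <= 4 and 3 <= m <= 4:
        return "Potential Value"
    if (r, f, m) == (3, 3, 3):
        return "Moderate Value"
    if all(1 <= s <= 2 for s in (r, f, m)):
        return "Low Value"
    return "Unclassified"


print(rfm_segment(5, 4, 3))  # Potential Value
```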
The analysis indicates that about 18% of the customer base are high-value customers who contribute disproportionately to revenue. These high-value customers are concentrated mainly in Group 2 and Group 3.
The integration of the three methods reveals layered insight:
For example, customers in Group 2 (frequent instant noodle buyers) show strong intra-group item associations (from Apriori), are clearly clustered in the SOM output, and appear with high RFM scores, suggesting that they are lucrative, habit-driven shoppers.
Conclusion
This study integrates association rule mining, SOM clustering, and RFM analysis to provide a comprehensive framework for understanding consumer behavior in fresh supermarkets. The findings reveal that customers exhibit distinct purchasing patterns, often preferring specific combinations of products such as instant noodles, bakery items, or fresh produce.
From a managerial perspective, these insights support the development of targeted marketing strategies. For instance:
Retailers can implement these findings using existing POS and CRM systems, tagging customers with segment and RFM scores to facilitate dynamic promotions and messaging. Overall, this study provides a structured analytical approach for transforming transaction data into strategic marketing intelligence in the fresh retail sector.
While the current study offers valuable contributions, several limitations should be acknowledged:
Future research may also explore hybrid models combining machine learning classification algorithms with traditional segmentation, or integrate external data such as social media reviews or customer feedback to enrich consumer profiling.