Using Data Science to understand the value of Acts of Kindness on Customer Conversions

Think about the last time you received a gift through a random Act of Kindness, something genuinely valuable that arrived when you needed it most. Do you remember how you felt? I bet you felt fantastic and were inclined to view that person positively. Acts of Kindness are a common way companies foster loyalty and engagement from their customer base; however, determining their impact on the bottom line can be difficult.

In this post, we will examine how eSage Group used data science and historical customer behavior data to answer a simple initial question: what distinguishes customers who made a single purchase from customers who made repeat purchases? We also discuss how to take this analysis forward and validate our hypothesis about the impact of Acts of Kindness.

First Things First: A Hypothesis is NOT an Experiment

Many people confuse these two concepts and think that by evaluating past behavior, they can definitively predict the future. If you’ve ever purchased a stock or made a financial investment, you have probably seen this written somewhere:

“Past performance is not indicative of future results.”

Now, customer behavior is not exactly like stocks, so past performance could point to future behavior, provided we first form a hypothesis and then test that hypothesis with a valid experiment. This article covers the first part: coming up with a hypothesis that our client can use to run an experiment.

Why do I need a Hypothesis?

A hypothesis is an “educated guess,” based on prior knowledge about a subject or group of cases. Think of it as “What am I trying to measure or prove?” It’s one thing to say, “I believe our free gift increases repeat purchases!” but that’s pretty vague. Do you have an idea of how much value is gained from a free gift, or whether it shortens the time between purchases? Do you know the make-up (demographics, tenure, recency, initial purchase price) of a customer who is likely to make a repeat purchase if given a free gift, or does it apply to everyone? A valid hypothesis, with clear items to measure, will make any subsequent experiment more accurate and, usually, less costly to run. A rundown of the qualities of a good hypothesis can be found on Shana Rusonis’s blog here.

We have the Data: Now What?

Our analysis for our eCommerce client started with a question: what is the core difference in engagement between customers who placed just one order and customers who placed more than one? This company sent small free gifts to select customers at different points during the sales process; however, we did not know this beforehand. Our goal was simply to see what makes repeat purchasers different.

To answer this question, let’s set up the scenario:

  • What data do we have at hand?
  • What is our objective for the analysis?
  • Which analytics method fits best?

We had historical sales order data for each of the customers, as well as website navigation records. For Phase 1, we wanted to keep the analysis simple and not worry about session tracking, different browsers and devices, bounce rates, and all the rest. So, we chose to focus on the order records, which already had enough separate data points to proceed.

Environment

For this study, we used R v3.5, a language and environment for statistical computing, along with RStudio v1.2.11 as the main IDE. Both are free tools supported by the open-source community. A deep dive into R is out of scope for this post, but you can learn more by visiting the R website. We also used the FactoMineR package in R.

Preparing the Data: Don’t Forget Your Labels

First, we needed to account for missing values. There is rarely a case where we don’t have to deal with missing data. For instance, we quickly found that not all customers left a review or a photo. Straight out of the database, these empty values were represented by “NA”, R’s marker for a missing value. Since many analysis functions in R are not friendly to missing values, we swapped these out for zero. Handling this ahead of time will save you lots of debugging headaches later on.
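As an illustration, here is a minimal sketch of that swap in R (the data frame and file names are our own placeholders, not the client’s actual schema):

    # Load the raw order data (file name is illustrative)
    customer_data <- read.csv("orders.csv")

    # Replace missing review counts and ratings with zero so downstream
    # functions don't trip over NA values
    customer_data$total_reviews[is.na(customer_data$total_reviews)] <- 0
    customer_data$product_review_rate[is.na(customer_data$product_review_rate)] <- 0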

Here is a sample of the raw data we had available for analysis:

Figure 1: Sample from the initial data

We also needed to create some Ranges for various fields. As we can see, the data has a lot of similar values, especially in the total_reviews and product_review_rate fields. When we observe the number of customers that left a review, we get a fairly wide range:

Figure 2: Table count for the total_reviews field

The vast majority of customers did not leave a review; those that did, we put into buckets to make our analysis easier. Likewise, customers who gave a review of 1 or 2 stars presumably did not have a high opinion of the product, so we created a group based on this assumption.
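One way to encode that grouping in R might look like the following sketch; we’re assuming product_review_rate holds the star rating, with 0 standing in for “no review” after the NA swap above:

    # Bucket star ratings into opinion groups; cut points are illustrative
    customer_data$review_opinion <- cut(customer_data$product_review_rate,
                                        breaks = c(-Inf, 0, 2, 3, 5),
                                        labels = c("none", "low", "neutral", "high"))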

The code used to create buckets via range conversion is straightforward: for each column of values, we take the minimum and maximum values and supply the total number of groups we would like to create. Here is a sample of the code:

Figure 3: Code for the “Range” function
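We can’t reproduce the client code here, but a minimal sketch of such a “Range” function might look like this (the function name and group count are our own):

    # Bucket a numeric column into a fixed number of equal-width groups
    # spanning the column's minimum to maximum
    make_range <- function(x, groups) {
      breaks <- seq(min(x), max(x), length.out = groups + 1)
      cut(x, breaks = breaks, include.lowest = TRUE)
    }

    # For example, five buckets for the review counts
    customer_data$total_reviews_group <- make_range(customer_data$total_reviews, 5)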

Next, we created our labels, which are just a way of differentiating one customer from another. In our case, we simply wanted one extra field telling us whether a customer was a one-time purchaser or had purchased more than once. We already had a field for the total number of orders (num_orders), so this was a simple calculation. We called our label field “num_orders_group”; each customer had a value of “O” for one order or “M” for multiple orders.
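In R, this is a one-liner; here is a sketch with our placeholder names:

    # "O" = one order, "M" = multiple orders; stored as a factor for profiling
    customer_data$num_orders_group <- factor(
      ifelse(customer_data$num_orders > 1, "M", "O"))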

The label will be used by the analysis function to create each group of cases. While the choice of label can seem arbitrary, its core purpose is to define the classes that the algorithm builds predictive profiles around. An impractical example would be a label with 50 values, such as the U.S. state in the order address; predictions over that many groups wouldn’t be useful for our business needs.

Now that our data is nicely bucketed, we are ready for the analysis. Here’s a sample of the data that we fed into the catdes function (explained later):

Figure 4: Sample of clean data

Running the Analysis

With our customers already sorted by label, we next ran a profiling exercise to determine what made each group special. Profiling is closely related to cluster analysis, in which cases are sorted into “clusters” (groups) according to the attributes that make them similar. Once a cluster is established, profiling determines the exact relevance of each attribute within a specific group.

A basic example of clustering can be found here. Also, a more in-depth look using Python can be found here.

Since we have already defined our groups (or clusters), we can move right on to the profiling step. The FactoMineR package provides a great profiling function called catdes (short for Category Description). It profiles the different groups and determines the relevance of each feature. In our case, we have only two groups: one-time purchasers (labeled O) and multi-purchasers (labeled M).

Figure 5: catdes function execution
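The call itself is short. Here is a sketch of what it looks like, assuming our label sits in column 11 of the data frame (as described below):

    library(FactoMineR)

    # Profile the groups defined by the label in column 11 (num_orders_group),
    # keeping only features whose p-value falls below 0.05
    profile <- catdes(customer_data, num.var = 11, proba = 0.05)

    # Description of each class by the categories of the other variables
    profile$category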

You can learn more about catdes and how it works by visiting the FactoMineR site.

In our data matrix, column #11 contains our label indicating whether customers have purchased once or more than once. catdes uses this field, identified by the num.var argument, to create the different groups.

The proba argument sets a significance threshold: to be included in the final output, a feature must have a p.value below this value. We’ll cover p.value and the other importance metrics shortly.

The catdes function shows the links between the cluster variable (our label) and the other variables in the dataset. Running it against our prepared data returned the following information about our two groups of customers:

Table: catdes output for the two customer groups

Cla/Mod vs Mod/Cla vs Global vs. P.Value? Help!

The fields returned by the catdes function help us determine which features in our dataset were most strongly linked to our label. These terms can be confusing, so here’s a quick definition of each field:

  • Cla/Mod: of all cases with this feature value, the percentage that belong to this class (e.g., what share of gift receivers are repeat purchasers).
  • Mod/Cla: of all cases in this class, the percentage that have this feature value (e.g., what share of repeat purchasers received a gift).
  • Global: the percentage of all cases, regardless of class, that have this feature value.
  • p.value: the significance of the link between the feature value and the class; smaller values indicate a stronger link.
  • v.test: a signed statistic; positive when the feature value is over-represented in the class, negative when under-represented.

Reviewing the Results: A Surprise Awaits

After reviewing the results, we noticed something interesting right away. Customers who received a gift just before or after a purchase had gone on to make more than one purchase and, even better, tended to leave a positive review of the products they bought. Customers making only a single purchase did not receive a gift and usually left no product review at all. From this analysis, we can form a hypothesis: giving a gift influences a customer’s decision to purchase again.

Now, we could design an experiment that sends gifts to one-time purchasers who left no product reviews, in an effort to move them into the repeat-purchaser camp. But can we do better? What else could tighten our hypothesis or make an experiment more focused?

Next Steps: Balance, Prediction, and More Data!

At this point, how close were we to solving this puzzle about gifts and their relevance? The analysis above suggests that gifts may matter in driving multiple purchases, but it cannot confirm a causal relationship. There is more we can do, however, to gather further evidence for our hypothesis:

1. Use Stratified Sampling

Check Distributions / Balance the Dataset: Our dataset has a balance issue – more than 77% of all customers have only ever purchased one time:

Figure 6: Distribution check of the target variable
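A quick way to check this balance in R (using our placeholder label column):

    # Share of customers per label; in our data, "O" exceeded 77%
    prop.table(table(customer_data$num_orders_group))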

This means attributes favoring one-time purchasers will always appear more often than those favoring multiple-purchase customers, and any random draw will naturally pull more cases from the dominant group. We are automatically biased toward finding features related to non-repeat purchasers.

To correct this problem and get a more balanced view of the dataset, we should use stratified sampling: taking random samples from each group so that each sample contains the same number of cases for every value of the target variable. For instance, if we know that 100 cases purchased more than once, then we also want 100 cases from the group that purchased only once. But which 100 cases? This is where the randomization and multiple passes come in:

Figure 7: A graphical example of sampling. Credits to this post

By taking repeated random samples of the overall population and averaging our findings across those samples, we can reach a much more realistic result. This technique derives its power from a mathematical fact known as the Central Limit Theorem, which is discussed at length here.

Figure 8: Example of the distribution of the target variable

For a working example on Stratified Sampling, check out Gianluca Malato’s post here.
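As a rough sketch, one balanced sampling pass in base R could look like this (the group size and number of passes are illustrative):

    set.seed(42)
    n_per_group <- 100  # illustrative; bounded by the size of the smaller group

    # Draw the same number of cases, at random, from each label
    one_pass <- function(df, n) {
      rows_m <- sample(which(df$num_orders_group == "M"), n)
      rows_o <- sample(which(df$num_orders_group == "O"), n)
      df[c(rows_m, rows_o), ]
    }

    # Multiple passes: downstream findings get averaged across these samples
    samples <- lapply(1:30, function(i) one_pass(customer_data, n_per_group))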

2. Use Prediction to Confirm Profiling Results

To confirm (or refute) our earlier profiling results, we should run a prediction model that takes our stratified samples as input. This model will predict which cases will be one-time vs. multiple purchasers and show how relevant each attribute is to that prediction. The sampling exercise performed above ensures that properly distributed data flows into model training.

A reasonable first approach is a Gradient Boosting Machine (GBM), an ensemble technique that, like a random forest, combines many decision trees, but builds them sequentially rather than independently: each new tree is fit to a weighted version of the original dataset, with higher weights going to the points that are difficult to classify and lighter weights to the easier ones. We can use it to see how well our data predicts single versus repeat customers and to determine feature importance in those predictions. A good primer on GBM can be found here.

Figure 9: Gradient Boosting Machine in action. From https://miro.medium.com/max/1400/0*paPv7vXuq4eBHZY7.png
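To make this concrete, here is a hedged sketch using the gbm package; the feature names are placeholders for whatever the prepared dataset actually contains, and the tuning values are illustrative rather than recommendations:

    library(gbm)

    # Binary target: 1 = multiple purchases, 0 = one purchase
    df <- samples[[1]]  # one balanced sample from the previous step
    df$is_repeat <- as.numeric(df$num_orders_group == "M")

    # Fit a boosted-tree classifier (feature names are hypothetical)
    fit <- gbm(is_repeat ~ gift_sent + total_reviews_group + review_opinion,
               data = df,
               distribution = "bernoulli",
               n.trees = 500,
               interaction.depth = 3,
               shrinkage = 0.05)

    # Relative influence: how much each feature contributes to the predictions
    summary(fit)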

At this point, with the model trained, we would be able to provide more evidence for our earlier finding that sending a gift is associated with repeat purchases: the profiling data as well as the prediction model’s results, with the importance it assigned to the different dataset features.

3. More Data Please!

At this point, more features would definitely be better, even if it means re-running the profiling exercise. At the moment, we have only a few features to provide a counterpoint to Gift Sent. Adding features would give the algorithm a chance to let those new attributes compete with Gift Sent for relevance within each of our classes (O or M). In our case, we have the user navigation data, which includes pages viewed, time since last session, average session time, and other web metrics that would let us build a more complete profile of our customers.

Summary

For this exercise, we started with the goal of identifying the relevant characteristics of two groups, repeat purchasers and one-time purchasers, for a consumer goods manufacturer.

For the initial analysis, we first needed to prepare the data. We created ranges out of certain numerical attributes, such as the number of orders and product reviews, since values close together usually told the same story. We also added our own labels to the cases, since we already knew how many orders each customer had placed. With the labels in place, we knew we had exactly two clusters (or groups) and moved on to profiling the data using R’s catdes function.

This initial analysis showed that Acts of Kindness (Gift Sent) ranked highly in importance for repeat purchasers. However, we still needed supporting evidence to back up the finding before a real experiment could be justified.

We then talked about possible next steps to corroborate our initial research. This included creating stratified samples of the initial dataset to ensure we took the same number of cases from each group for analysis. These samples could then be fed to a Gradient Boosting Machine (GBM), a tree-based ensemble model, to determine whether Gift Sent remained a determining factor when predicting which customers make repeat purchases.

Finally, although we started with only Order History data, we proposed adding additional features to the mix in order to give the Gift Sent attribute more competition.

Data preparation, prior labeling, profiling, creating stratified samples, and finally corroborating initial findings through modeling are all key steps toward arriving at a good hypothesis to support formal experimentation.

Ultimately, to truly test our hypothesis, we would conduct a formal experiment: select a random sample of new customers, split them into treatment and control groups, give a gift to just one of those groups, and evaluate whether the gift had a significant impact on repeat purchases. With this information, we would be able to confirm whether giving a gift truly causes repeat purchases!
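Evaluating such an experiment could be as simple as a two-proportion test; the counts below are invented purely for illustration:

    # Repeat-purchase counts in the treatment (gift) and control (no gift)
    # groups; these numbers are made up for the sake of the example
    repeats <- c(34, 21)
    totals  <- c(200, 200)

    # Is the difference in repeat-purchase rates statistically significant?
    prop.test(repeats, totals)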

Regardless of what the data says, a good mantra that you might want to consider is to always show appreciation to your customers. It’s a great way to retain their business and maintain a solid professional relationship.

For further reading on Acts of Kindness and how they can impact business, read Seven Ways to Properly Give a Gift to Your Customers by Shep Hyken and Giftology by John Ruhlin.

Thanks for reading!