How Engage Affects Retention - Buffer's Data Blog

Updated on February 1, 2021.

In this analysis we will attempt to estimate the effect that using the engagement feature has on retention for Buffer customers.

The dataset contains around 38 thousand Buffer customers who started subscriptions in or after January 2020. We’ll separate the customers into two groups, defined by whether or not they replied to a comment while their subscriptions were active.

Key Findings

The data suggests that customers that replied to a comment with the engagement feature churn at significantly lower rates than those that did not.

This outcome is probably affected by selection bias: customers that used the feature tend to have Instagram accounts with more followers and engagement, which would have granted them earlier access to the product. Customers that churned were less likely to have been invited to use Engage.

The data could suggest that using the engagement feature lowers the likelihood of churning, or that active customers with low churn probabilities are more likely to make use of the feature.

Data Exploration

Around 5% of customers in our dataset replied to a comment with the engagement feature.

# load packages (percent() is assumed to come from scales)
library(dplyr)
library(scales)

# count engage users
users %>%
  count(used_engage) %>%
  mutate(prop = percent(n / sum(n)))

## # A tibble: 2 x 3
##   used_engage     n prop 
##   <lgl>       <int> <chr>
## 1 FALSE       36677 95%  
## 2 TRUE         1878 5%

Of the customers that didn’t reply to a comment, around 35% have canceled their subscription. Only around 7% of the customers that did reply to a comment have canceled. The two groups are very different in size, and a gap this large suggests that the data may be unbalanced or biased.

# count churned users
users %>% 
  group_by(used_engage, canceled) %>% 
  summarise(customers = n_distinct(stripe_customer_id)) %>% 
  mutate(prop = customers / sum(customers))

## # A tibble: 4 x 4
## # Groups:   used_engage [2]
##   used_engage canceled customers   prop
##   <lgl>       <lgl>        <int>  <dbl>
## 1 FALSE       FALSE        23323 0.647 
## 2 FALSE       TRUE         12737 0.353 
## 3 TRUE        FALSE         1749 0.933 
## 4 TRUE        TRUE           126 0.0672

To analyze the potential effect on retention we’ll use a common technique called survival analysis.

Survival Analysis

Survival analysis is a common branch of statistics used for analyzing the expected duration of time before a certain event occurs. It is commonly used in the medical field to analyze mortality rates, hence the term “survival”.

It’s especially useful when the data is right-censored, meaning there are cases in which the event hasn’t happened yet, but will likely happen at some time in the future. In the output below, the numbers followed by a “+” refer to customers that haven’t churned yet: their time to churn is at least that many days.

# build survival object (Surv() comes from the survival package)
library(survival)
km <- Surv(users$time_to_cancel, users$canceled)

# preview data
head(km, 10)

##  [1]  17+  56+ 301+  89+  91+ 306+  64+ 336+ 199+  40+
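As a rough illustration of where such right-censored values come from, the sketch below derives the two inputs of Surv() from subscription dates. The data frame `subs`, its columns `created_at` and `canceled_at`, and the cutoff date `as_of` are hypothetical and not taken from the original analysis.

```r
library(survival)

# Hypothetical subscriptions: NA in canceled_at means "still active"
subs <- data.frame(
  created_at  = as.Date(c("2020-01-15", "2020-03-01", "2020-06-10")),
  canceled_at = as.Date(c("2020-04-15", NA, NA))
)

# Assumed date on which the data was pulled
as_of <- as.Date("2021-02-01")

# Event indicator: TRUE if the cancellation was observed
subs$canceled <- !is.na(subs$canceled_at)

# Active subscriptions are followed only up to the cutoff date
end_date <- subs$canceled_at
end_date[!subs$canceled] <- as_of

# Days from subscription start to cancellation or to the cutoff
subs$time_to_cancel <- as.numeric(end_date - subs$created_at)

# Censored observations are printed with a trailing "+"
Surv(subs$time_to_cancel, subs$canceled)
```

Only the first subscription has an observed cancellation; the other two are censored at the cutoff date, so all we know is that their time to churn is at least that long.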

To begin our analysis, we use the formula Surv(time_to_cancel, canceled) ~ 1 and the survfit() function to produce the Kaplan-Meier estimates of the probability of “survival” over time. The times parameter of the summary() function gives some control over which times to print. Here we set it to print the estimates for 1, 7, 14, 30, 60 and 90 days, and then every 90 days thereafter.

# get survival probabilities
km_fit <- survfit(Surv(time_to_cancel, canceled) ~ 1, data = users)

# summarise 
summary(km_fit, times = c(1, 7, 14, 30, 60, 90 * (1:10)))

## Call: survfit(formula = Surv(time_to_cancel, canceled) ~ 1, data = users)
## 
##  time n.risk n.event survival  std.err lower 95% CI upper 95% CI
##     1  38309     383    0.990 0.000505        0.989        0.991
##     7  37301     312    0.982 0.000680        0.981        0.983
##    14  36450     150    0.978 0.000751        0.976        0.979
##    30  34531    1367    0.940 0.001240        0.937        0.942
##    60  28760    2943    0.855 0.001866        0.852        0.859
##    90  24170    2135    0.790 0.002200        0.785        0.794
##   180  14169    3772    0.651 0.002750        0.646        0.657
##   270   7219    1559    0.565 0.003165        0.558        0.571
##   360   2076     593    0.500 0.003850        0.492        0.507

The survival column shows the estimate for the percentage of customers still active after a certain number of days after subscribing. For example, around 94% of subscriptions are still active by day 30 and around 85% are still active by day 60.
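For reference, the Kaplan-Meier estimate behind the survival column is a running product over the event times. The notation here is ours rather than the original post's: t_i are the distinct days on which cancellations occur, d_i is the number of cancellations on day t_i, and n_i is the number of customers still at risk just before t_i.

```latex
\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)
```

As a quick check against the first row of the table, and assuming all 383 events occurred on day 1 with 38309 customers at risk:

```latex
\hat{S}(1) = 1 - \frac{383}{38309} \approx 0.990
```

Censored customers never count as events. They simply leave the at-risk count n_i once their observed time has passed.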

This curve can be plotted.
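A minimal sketch of one way to draw such a curve uses the plot method that the survival package provides for survfit objects. The `users` data frame below is synthetic and only stands in for the real data, so the resulting curve will not show the same shape as the one discussed in this post.

```r
library(survival)

# Synthetic stand-in for `users` (NOT Buffer's data)
set.seed(42)
n <- 500
churn_day <- ceiling(rexp(n, rate = 1 / 200))
users <- data.frame(
  time_to_cancel = pmin(churn_day, 360),  # follow-up capped at 360 days
  canceled       = churn_day <= 360       # FALSE means right-censored
)

# Kaplan-Meier fit for all customers
km_fit <- survfit(Surv(time_to_cancel, canceled) ~ 1, data = users)

# Step curve with its 95% confidence band
plot(
  km_fit,
  conf.int = TRUE,
  xlab = "Days since subscription start",
  ylab = "Proportion of customers still subscribed"
)
```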

You can see dips in the curve every 30 days, which makes intuitive sense given monthly billing periods. Now let’s segment the customers by whether or not they replied to a comment and fit survival curves for each group.

# get survival probabilities for each segment
km_fit <- survfit(Surv(time_to_cancel, canceled) ~ used_engage, data = users)

# summarise 
summary(km_fit, times = c(1, 7, 14, 30, 60, 90, 180))

## Call: survfit(formula = Surv(time_to_cancel, canceled) ~ used_engage, 
##     data = users)
## 
##                 used_engage=FALSE 
##  time n.risk n.event survival  std.err lower 95% CI upper 95% CI
##     1  36433     383    0.990 0.000531        0.989        0.991
##     7  35463     312    0.981 0.000714        0.980        0.982
##    14  34667     150    0.977 0.000789        0.975        0.978
##    30  32835    1362    0.937 0.001299        0.934        0.940
##    60  27205    2934    0.849 0.001949        0.845        0.853
##    90  22725    2121    0.780 0.002292        0.776        0.785
##   180  13252    3719    0.637 0.002843        0.632        0.643
## 
##                 used_engage=TRUE 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     1   1876       0    1.000 0.00000        1.000        1.000
##     7   1838       0    1.000 0.00000        1.000        1.000
##    14   1783       0    1.000 0.00000        1.000        1.000
##    30   1696       5    0.997 0.00130        0.995        1.000
##    60   1555       9    0.992 0.00224        0.987        0.996
##    90   1445      14    0.982 0.00330        0.976        0.989
##   180    917      53    0.940 0.00653        0.928        0.953

We can see that, although the Engage group is small, customers that used Engage have churned at much lower rates than those that did not. For example, around 85% of customers that did not use Engage were still active by day 60, compared to around 99% of customers that did use Engage.
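One way to put a number on the difference between two survival curves is a log-rank test, available as survdiff() in the survival package. This test was not part of the original analysis. The sketch below runs it on synthetic data with made-up churn rates, so its output says nothing about Buffer's customers.

```r
library(survival)

# Synthetic stand-in for `users` (NOT Buffer's data); both rates are made up
set.seed(7)
n <- 2000
used_engage <- runif(n) < 0.1
churn_day <- ceiling(rexp(n, rate = ifelse(used_engage, 1 / 600, 1 / 200)))
users <- data.frame(
  used_engage    = used_engage,
  time_to_cancel = pmin(churn_day, 360),  # follow-up capped at 360 days
  canceled       = churn_day <= 360       # FALSE means right-censored
)

# Log-rank test of the null hypothesis that both groups share one survival curve
lr <- survdiff(Surv(time_to_cancel, canceled) ~ used_engage, data = users)
print(lr)

# p-value from the chi-squared statistic (two groups, so 1 degree of freedom)
p_value <- 1 - pchisq(lr$chisq, df = 1)
p_value
```

A small p-value only says that the curves differ. It does not say why they differ, so it does not address the bias in how customers ended up in each group.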

The survival curves show a seemingly huge difference in churn rates.

One should keep in mind that this is likely affected by bias in the data and causality has not been established.

Customers that replied to a comment in Engage were either part of a hand-selected batch of Business customers that had early access, or self-selected by replying to a comment on their own. Active customers were more likely to use the engagement feature, and customers that churned were less likely to have had the opportunity to use it.
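The last point can be illustrated with a small simulation. In the sketch below the feature has no effect on churn at all: every customer's churn day is drawn from the same distribution. A customer only counts as an Engage user if they are still subscribed on the day they would first reply to a comment. All names and rates are hypothetical.

```r
library(survival)

set.seed(1)
n <- 5000

# Churn day drawn from ONE distribution for everyone: no feature effect
churn_day <- ceiling(rexp(n, rate = 1 / 200))

# Hypothetical day on which each customer would first reply to a comment
first_reply_day <- ceiling(rexp(n, rate = 1 / 300))

# Only customers still subscribed on that day can become Engage users
used_engage <- first_reply_day < churn_day

sim <- data.frame(
  used_engage    = used_engage,
  time_to_cancel = pmin(churn_day, 360),  # follow-up capped at 360 days
  canceled       = churn_day <= 360       # FALSE means right-censored
)

# Naive comparison of the two groups
sim_fit <- survfit(Surv(time_to_cancel, canceled) ~ used_engage, data = sim)
sim_summary <- summary(sim_fit, times = c(30, 90, 180))
sim_summary
```

Even though usage changes nothing in this simulation, the used_engage=TRUE group shows higher survival, simply because customers who churn early never get the chance to use the feature.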

Pro Plans

We can segment the data further and see if the survival curves differ for Pro and Premium/Business subscriptions. The plot below shows that churn rates are significantly lower for both Pro and non-Pro subscriptions that have used Engage.
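A segmentation like this can be sketched by adding a second term to the survfit() formula, which fits one curve per combination of the two variables. The `is_pro` column is a hypothetical plan flag, and the data is synthetic, with no built-in difference between the groups.

```r
library(survival)

# Synthetic stand-in for `users` (NOT Buffer's data)
set.seed(11)
n <- 1000
churn_day <- ceiling(rexp(n, rate = 1 / 200))
users <- data.frame(
  used_engage    = runif(n) < 0.2,
  is_pro         = runif(n) < 0.6,        # hypothetical plan flag
  time_to_cancel = pmin(churn_day, 360),  # follow-up capped at 360 days
  canceled       = churn_day <= 360       # FALSE means right-censored
)

# One Kaplan-Meier curve per (used_engage, is_pro) combination
fit_plan <- survfit(
  Surv(time_to_cancel, canceled) ~ used_engage + is_pro,
  data = users
)

# The four strata, then their estimates at selected days
names(fit_plan$strata)
summary(fit_plan, times = c(30, 90, 180))
```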

It’s worth mentioning again that causality hasn’t been established. People that have replied to a comment in Engage have been less likely to churn, but that doesn’t mean that the introduction of the engagement feature made them less likely to churn. A more plausible explanation is that these users were already more likely to be retained.