
Sticky Header Experiment (EID21)

In this analysis we’ll explore the results from the sticky header A/B test, also known as experiment EID21. The experiment was run as an A/B test via our A/Bert framework, which split visitors randomly 50/50 between the control and variation groups.

The experiment hypothesis was:

  • If we show a sticky header to all visitors across the entire website, then we’ll see an increase in Publish Small Business trial starts, because more visitors will have the opportunity to click the “Try Buffer for Business” CTA in the header, which will drive more traffic to the /business page and lead to an increase in Small Business trial starts.

Given this hypothesis, our success metric was:

  • number of Publish Small Business trial starts

TL;DR

The result of the experiment is that there is insufficient evidence to confirm the hypothesis: no statistically significant difference was observed between the control and variation groups.

Data Collection
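
The code in this section assumes a handful of R packages are already loaded. The setup chunk isn’t shown here, but based on the functions used throughout the analysis, a plausible version would look something like this (the exact package list is an inference, not part of the original notebook):

# packages assumed throughout this analysis (list inferred from the functions used below)
library(DBI)          # dbConnect(), dbGetQuery()
library(bigrquery)    # BigQuery backend for DBI
library(dplyr)        # data wrangling verbs and the %>% pipe
library(stringr)      # str_detect()
library(skimr)        # skim()
library(knitr)        # kable()
library(kableExtra)   # kable_styling()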

To analyze the results of this experiment, we will use the following query to retrieve data about users enrolled in the experiment.

# connect to bigquery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data"
)
# define sql query to get experiment enrolled visitors
sql <- "
  with enrolled_users as (
    select
      anonymous_id
      , experiment_group
      , first_value(timestamp) over (
          partition by anonymous_id
          order by timestamp asc
          rows between unbounded preceding and unbounded following
        ) as enrolled_at
    from segment_marketing.experiment_viewed
    where 
      first_viewed 
      and experiment_id = 'eid21_sticky_header_all_times'
  )
  select
    e.anonymous_id
    , e.experiment_group
    , e.enrolled_at
    , i.user_id as account_id
    , c.email
    , c.publish_user_id
    , a.timestamp as account_created_at
    , t.product as trial_product
    , t.timestamp as trial_started_at
    , t.subscription_id as trial_subscription_id
    , t.stripe_event_id as stripe_trial_event_id
    , t.plan_id as trial_plan_id
    , t.cycle as trial_billing_cycle
    , t.cta as trial_started_cta
    , s.product as subscription_product
    , s.timestamp as subscription_started_at
    , s.subscription_id as subscription_id
    , s.stripe_event_id as stripe_subscription_event_id
    , s.plan_id as subscription_plan_id
    , s.cycle as subscription_billing_cycle
    , s.revenue as subscription_revenue
    , s.amount as subscription_amount
    , s.cta as subscription_started_cta
  from enrolled_users e
    left join segment_login_server.identifies i
      on e.anonymous_id = i.anonymous_id
    left join dbt_buffer.core_accounts c
      on i.user_id = c.id
    left join segment_login_server.account_created a
      on i.user_id = a.user_id
    left join segment_publish_server.trial_started t
      on i.user_id = t.user_id
      and t.timestamp > e.enrolled_at
    left join segment_publish_server.subscription_started s
      on i.user_id = s.user_id
      and s.timestamp > e.enrolled_at
  group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
"
  
# query BQ
users <- dbGetQuery(con, sql)

Exploratory Analysis

Let’s start by reviewing a few of the summary statistics from our data.

skim(users)
Table 1: Data summary
Name users
Number of rows 477118
Number of columns 23
_______________________
Column type frequency:
character 17
numeric 2
POSIXct 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
anonymous_id 0 1.00 36 36 0 473819 0
experiment_group 0 1.00 7 9 0 2 0
account_id 438745 0.08 24 24 0 35794 0
email 438858 0.08 7 58 0 35686 0
publish_user_id 439435 0.08 24 24 0 35114 0
trial_product 450771 0.06 7 7 0 2 0
trial_subscription_id 450771 0.06 18 18 0 25902 0
stripe_trial_event_id 450771 0.06 18 18 0 25963 0
trial_plan_id 450771 0.06 11 44 0 19 0
trial_billing_cycle 450771 0.06 4 5 0 2 0
trial_started_cta 450931 0.05 30 68 0 50 0
subscription_product 474311 0.01 7 7 0 2 0
subscription_id 474311 0.01 18 18 0 2183 0
stripe_subscription_event_id 474311 0.01 18 18 0 2186 0
subscription_plan_id 474311 0.01 11 44 0 19 0
subscription_billing_cycle 474311 0.01 4 5 0 2 0
subscription_started_cta 474625 0.01 30 67 0 40 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
subscription_revenue 474311 0.01 35.72 51.04 10 15 15 50 500 ▇▁▁▁▁
subscription_amount 474311 0.01 98.60 173.92 10 15 50 144 2030 ▇▁▁▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
enrolled_at 0 1.00 2019-12-04 18:23:45 2020-02-13 13:19:42 2019-12-18 19:49:21 473756
account_created_at 438745 0.08 2019-04-23 18:05:15 2020-02-18 19:22:35 2019-12-14 19:45:37 35794
trial_started_at 450771 0.06 2019-12-04 20:26:27 2020-02-18 19:22:47 2019-12-24 10:25:08 25968
subscription_started_at 474311 0.01 2019-12-04 20:55:43 2020-02-18 20:31:12 2020-01-07 21:02:03 2226

Let’s start with a quick validation of the visitor count split between the two experiment groups.

users %>% 
  group_by(experiment_group) %>% 
  summarise(
    visitors = n_distinct(anonymous_id),
    accounts = n_distinct(account_id),
    trials = n_distinct(trial_subscription_id),
    subscriptions = n_distinct(subscription_id)
  ) %>% 
  mutate(visitor_split_perct = visitors / sum(visitors)) %>%
  kable() %>% 
  kable_styling()
experiment_group visitors accounts trials subscriptions visitor_split_perct
control 237661 18102 13186 1093 0.501586
variant_1 236158 17694 12718 1092 0.498414

Great: a total of 475,219 unique visitors were enrolled in the experiment, and the split between the two experiment groups is within roughly 0.16% of an even 50/50, well within reason for our randomization. This confirms that our experiment framework correctly split enrollments for the experiment. Next, let’s test whether the proportion of enrolled visitors who went on to create a Buffer account differs between the two groups.

res <- prop.test(x = c(18348, 18055), n = c(238347, 236872), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(18348, 18055) out of c(238347, 236872)
## X-squared = 0.95332, df = 1, p-value = 0.3289
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.0007589254  0.0022741252
## sample estimates:
##    prop 1    prop 2 
## 0.0769802 0.0762226
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000757599899208872. The observed proportion for the first group is 0.0769802011353195 (18,348 events out of a total sample size of 238,347). For the second group, the observed proportion is 0.0762226012361106 (18,055, out of a total sample size of 236,872).

The confidence interval for the true difference in population proportions is (-0.0007589, 0.0022741). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.3288756. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000757599899208872 or less than -0.000757599899208872.
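
To make the confidence interval above a little more concrete, here is a rough by-hand sketch of an approximate (Wald) 95% interval for the difference in proportions. It omits the continuity correction that prop.test() applies, so the bounds will differ slightly from the output above.

# approximate (Wald) 95% CI for the difference in proportions, computed by hand
p1 <- 18348 / 238347
p2 <- 18055 / 236872
se <- sqrt(p1 * (1 - p1) / 238347 + p2 * (1 - p2) / 236872)
(p1 - p2) + c(-1, 1) * qnorm(0.975) * se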

We can see that the number of visitors who ended up creating a Buffer account (i.e. signed up for their first Buffer product) was a few hundred higher in the control group. The proportion test above shows that this difference in the proportion of enrolled visitors who created a Buffer account is NOT statistically significant, with a p-value of 0.329 (well above the generally accepted 0.05 threshold). TL;DR, there was no detectable difference between the variation and control in the visitor-to-signup rate.
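
Since we’ll repeat this prop.test() pattern several times below, an optional convenience is a small helper that returns each test as a tidy one-row data frame. This is only a sketch and assumes the broom package is available; the rest of this analysis simply calls prop.test() and explain() directly.

# hypothetical helper: run a two-sided proportion test and return a tidy summary
library(broom)

run_prop_test <- function(successes, totals) {
  prop.test(x = successes, n = totals, alternative = "two.sided") %>% 
    tidy() %>% 
    select(estimate1, estimate2, p.value, conf.low, conf.high)
}

# example: the visitor-to-signup comparison from above
run_prop_test(c(18348, 18055), c(238347, 236872))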

Next, we will calculate how many users from each experiment group started a Publish trial.

users %>% 
  mutate(has_publish_trial = trial_product == "publish") %>% 
  group_by(experiment_group, has_publish_trial) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_publish_trial) %>% 
  group_by(experiment_group) %>% 
  summarise(users_with_publish_trials = users) %>%
  kable() %>% 
  kable_styling()
experiment_group users_with_publish_trials
control 11761
variant_1 11406

There were 11,790 users in the control group that started a Publish trial, and 11,541 in the variation group. Just like above, we should also run a proportion test here.

res <- prop.test(x = c(11790, 11541), n = c(18348, 18055), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(11790, 11541) out of c(18348, 18055)
## X-squared = 0.43279, df = 1, p-value = 0.5106
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.006548244  0.013274911
## sample estimates:
##    prop 1    prop 2 
## 0.6425768 0.6392135
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.0033633333508416. The observed proportion for the first group is 0.642576847612819 (11,790 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.639213514261977 (11,541, out of a total sample size of 18,055).

The confidence interval for the true difference in population proportions is (-0.0065482, 0.0132749). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.5106212. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0033633333508416 or less than -0.0033633333508416.

We can see that the difference in the proportion of accounts that started a Publish trial is also NOT statistically significant, with a p-value of 0.51. TL;DR, there was no detectable difference between the two groups in total Publish trial starts.

users %>% 
  mutate(has_sbp_trial = (trial_product == "publish" & str_detect(trial_plan_id, ".small.") == TRUE)) %>% 
  group_by(experiment_group, has_sbp_trial) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_sbp_trial) %>% 
  group_by(experiment_group) %>% 
  summarise(users_with_sbp_trials = users) %>%
  kable() %>% 
  kable_styling()
experiment_group users_with_sbp_trials
control 2787
variant_1 2668

There were 2,769 users in the control group that started a Publish Small Business trial, and 2,670 in the variation group. Just like above, we should also run a proportion test here.

res <- prop.test(x = c(2769, 2670), n = c(18348, 18055), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(2769, 2670) out of c(18348, 18055)
## X-squared = 0.63555, df = 1, p-value = 0.4253
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.004344667  0.010412983
## sample estimates:
##    prop 1    prop 2 
## 0.1509156 0.1478815
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00303415785535768. The observed proportion for the first group is 0.150915631131458 (2,769 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.147881473276101 (2,670, out of a total sample size of 18,055).

The confidence interval for the true difference in population proportions is (-0.0043447, 0.010413). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.4253263. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00303415785535768 or less than -0.00303415785535768.

We can see that the difference in the proportion of accounts that started a Publish SBP trial is also NOT statistically significant, with a p-value of 0.43. In other words, we did not observe a statistically significant difference in our primary success metric, the number of Publish Small Business trials started, between the two experiment groups.
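
For additional context, we can translate the point estimates from the test above into a relative difference (a quick back-of-the-envelope calculation using the proportions reported by prop.test()):

# relative change in the signup -> SBP trial rate, variation vs control
(0.1478815 - 0.1509156) / 0.1509156

This works out to roughly a 2% relative decrease in the point estimate, a difference the test above cannot distinguish from zero.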

Next, we will calculate how many users from each experiment group started a paid Publish subscription.

users %>% 
  mutate(has_publish_sub = (subscription_product == "publish")) %>% 
  group_by(experiment_group, has_publish_sub) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_publish_sub) %>% 
  group_by(experiment_group) %>% 
  summarise(paying_publish_users = users) %>%
  kable() %>% 
  kable_styling()
experiment_group paying_publish_users
control 972
variant_1 959

There were 741 users in the control group that started a paid Publish subscription, and 709 in the variation group. Just like above, we should also run a proportion test here, for both the account to paid subscription proportion, and also the trial to paid subscription proportion.

res <- prop.test(x = c(741, 709), n = c(18348, 18055), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(741, 709) out of c(18348, 18055)
## X-squared = 0.26838, df = 1, p-value = 0.6044
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.002955545  0.005189490
## sample estimates:
##     prop 1     prop 2 
## 0.04038587 0.03926890
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00111697253812972. The observed proportion for the first group is 0.0403858731196861 (741 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.0392689005815564 (709, out of a total sample size of 18,055).

The confidence interval for the true difference in population proportions is (-0.0029555, 0.0051895). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.6044234. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00111697253812972 or less than -0.00111697253812972.

There is no statistically significant difference between the two proportions of signups that started a paid Publish subscription, as the p-value is 0.604.

res <- prop.test(x = c(741, 709), n = c(11790, 11541), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(741, 709) out of c(11790, 11541)
## X-squared = 0.17726, df = 1, p-value = 0.6737
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.004864409  0.007697851
## sample estimates:
##     prop 1     prop 2 
## 0.06284987 0.06143315
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00141672140017238. The observed proportion for the first group is 0.0628498727735369 (741 events out of a total sample size of 11,790). For the second group, the observed proportion is 0.0614331513733645 (709, out of a total sample size of 11,541).

The confidence interval for the true difference in population proportions is (-0.0048644, 0.0076979). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.6737409. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00141672140017238 or less than -0.00141672140017238.

There is no statistically significant difference between the two proportions of Publish trialists who went on to start a paid Publish subscription, as the p-value is 0.67.

Next, let’s look into the number of Publish SBP paid subscriptions between the two experiment groups.

users %>% 
  mutate(has_sbp_sub = (subscription_product == "publish" & str_detect(subscription_plan_id, ".small.") == TRUE)) %>% 
  group_by(experiment_group, has_sbp_sub) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_sbp_sub) %>% 
  group_by(experiment_group) %>% 
  summarise(paying_sbp_users = users) %>%
  kable() %>% 
  kable_styling()
experiment_group paying_sbp_users
control 76
variant_1 69

There were 47 users in the control group that started a paid Publish SBP subscription, and 43 in the variation group. Just like above, we should also run a couple of proportion tests here too.

res <- prop.test(x = c(47, 43), n = c(18348, 18055), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(47, 43) out of c(18348, 18055)
## X-squared = 0.057684, df = 1, p-value = 0.8102
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.0008949943  0.0012549450
## sample estimates:
##      prop 1      prop 2 
## 0.002561587 0.002381612
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000179975352061444. The observed proportion for the first group is 0.00256158709396119 (47 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.00238161174189975 (43, out of a total sample size of 18,055).

The confidence interval for the true difference in population proportions is (-0.0008950, 0.0012549). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.8101945. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000179975352061444 or less than -0.000179975352061444.

There is no statistically significant difference between the two proportions of signups that started a paid SBP subscription, as the p-value is 0.810.

res <- prop.test(x = c(47, 43), n = c(11790, 11541), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(47, 43) out of c(11790, 11541)
## X-squared = 0.0464, df = 1, p-value = 0.8294
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.001415507  0.001936671
## sample estimates:
##      prop 1      prop 2 
## 0.003986429 0.003725847
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000260582196937878. The observed proportion for the first group is 0.00398642917726887 (47 events out of a total sample size of 11,790). For the second group, the observed proportion is 0.00372584698033099 (43, out of a total sample size of 11,541).

The confidence interval for the true difference in population proportions is (-0.0014155, 0.0019367). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.8294495. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000260582196937878 or less than -0.000260582196937878.

There is no statistically significant difference between the two proportions of Publish trialists who went on to start a paid SBP subscription, as the p-value is 0.829.
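
Before wrapping up, it can be helpful to collect the p-values reported above into a single summary table (a quick sketch that simply re-enters the rounded values from the test output):

# summary of the p-values from the proportion tests above
tibble::tibble(
  comparison = c(
    "visitor -> signup",
    "signup -> Publish trial",
    "signup -> Publish SBP trial",
    "signup -> paid Publish subscription",
    "Publish trial -> paid Publish subscription",
    "signup -> paid SBP subscription",
    "Publish trial -> paid SBP subscription"
  ),
  p_value = c(0.329, 0.511, 0.425, 0.604, 0.674, 0.810, 0.829)
) %>% 
  kable() %>% 
  kable_styling()

None of the comparisons come anywhere close to the 0.05 threshold.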

Final Results

Given the above observations, the result of the experiment is that there is insufficient evidence to confirm the hypothesis: we did not observe a statistically significant difference between the control and the variation on any of the metrics examined.
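
One useful follow-up to an “insufficient evidence” result is to ask how large an effect this experiment could realistically have detected. A rough sketch with power.prop.test(), using the approximate baseline signup-to-SBP-trial rate and per-group signup counts from the tests above (both rounded, so the inputs are approximations), gives a sense of the minimum detectable effect:

# roughly what change in the signup -> SBP trial rate could this sample size detect?
# baseline rate ~15.1%, ~18,000 signups per group, 80% power, alpha = 0.05
power.prop.test(n = 18000, p1 = 0.151, power = 0.8, sig.level = 0.05)

The gap between p1 and the p2 this returns, on the order of a percentage point in absolute terms, is roughly the smallest difference the experiment was powered to detect; anything smaller would be indistinguishable from noise at this sample size, which is consistent with the results above.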