In this analysis we’ll explore the results from the sticky header A/B test, also known as experiment EID21. The experiment was run via our A/Bert framework, which split visitors randomly 50/50 between the control and variation groups.
The experiment hypothesis was:
- If we show a sticky header to all visitors across the entire website, then we’ll see an increase in Publish Small Business trial starts, because more visitors will have the opportunity to click the “Try Buffer for Business” CTA in the header, which will drive more traffic to the /business page and lead to an increase in Small Business trial starts.
Given this hypothesis, our success metric was:
- number of Publish Small Business trial starts
TL;DR
The result of the experiment is that there is insufficient evidence to confirm the hypothesis, as there was no statistically significant difference observed between the control and variation groups.
Data Collection
To analyze the results of this experiment, we will use the following query to retrieve data about users enrolled in the experiment.
# load the libraries used in this analysis
library(DBI)
library(bigrquery)
library(dplyr)
library(stringr)
library(skimr)
library(knitr)
library(kableExtra)

# connect to BigQuery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data"
)
# define sql query to get experiment enrolled visitors
sql <- "
with enrolled_users as (
select
anonymous_id
, experiment_group
, first_value(timestamp) over (
partition by anonymous_id order by timestamp asc
rows between unbounded preceding and unbounded following) as enrolled_at
from segment_marketing.experiment_viewed
where
first_viewed
and experiment_id = 'eid21_sticky_header_all_times'
)
select
e.anonymous_id
, e.experiment_group
, e.enrolled_at
, i.user_id as account_id
, c.email
, c.publish_user_id
, a.timestamp as account_created_at
, t.product as trial_product
, t.timestamp as trial_started_at
, t.subscription_id as trial_subscription_id
, t.stripe_event_id as stripe_trial_event_id
, t.plan_id as trial_plan_id
, t.cycle as trial_billing_cycle
, t.cta as trial_started_cta
, s.product as subscription_product
, s.timestamp as subscription_started_at
, s.subscription_id as subscription_id
, s.stripe_event_id as stripe_subscription_event_id
, s.plan_id as subscription_plan_id
, s.cycle as subscription_billing_cycle
, s.revenue as subscription_revenue
, s.amount as subscription_amount
, s.cta as subscription_started_cta
from enrolled_users e
left join segment_login_server.identifies i
on e.anonymous_id = i.anonymous_id
left join dbt_buffer.core_accounts c
on i.user_id = c.id
left join segment_login_server.account_created a
on i.user_id = a.user_id
left join segment_publish_server.trial_started t
on i.user_id = t.user_id
and t.timestamp > e.enrolled_at
left join segment_publish_server.subscription_started s
on i.user_id = s.user_id
and s.timestamp > e.enrolled_at
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
"
# query BQ
users <- dbGetQuery(con, sql)
Exploratory Analysis
Let’s start by reviewing a few of the summary statistics from our data.
skim(users)
Name | users |
Number of rows | 477118 |
Number of columns | 23 |
_______________________ | |
Column type frequency: | |
character | 17 |
numeric | 2 |
POSIXct | 4 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
anonymous_id | 0 | 1.00 | 36 | 36 | 0 | 473819 | 0 |
experiment_group | 0 | 1.00 | 7 | 9 | 0 | 2 | 0 |
account_id | 438745 | 0.08 | 24 | 24 | 0 | 35794 | 0 |
email | 438858 | 0.08 | 7 | 58 | 0 | 35686 | 0 |
publish_user_id | 439435 | 0.08 | 24 | 24 | 0 | 35114 | 0 |
trial_product | 450771 | 0.06 | 7 | 7 | 0 | 2 | 0 |
trial_subscription_id | 450771 | 0.06 | 18 | 18 | 0 | 25902 | 0 |
stripe_trial_event_id | 450771 | 0.06 | 18 | 18 | 0 | 25963 | 0 |
trial_plan_id | 450771 | 0.06 | 11 | 44 | 0 | 19 | 0 |
trial_billing_cycle | 450771 | 0.06 | 4 | 5 | 0 | 2 | 0 |
trial_started_cta | 450931 | 0.05 | 30 | 68 | 0 | 50 | 0 |
subscription_product | 474311 | 0.01 | 7 | 7 | 0 | 2 | 0 |
subscription_id | 474311 | 0.01 | 18 | 18 | 0 | 2183 | 0 |
stripe_subscription_event_id | 474311 | 0.01 | 18 | 18 | 0 | 2186 | 0 |
subscription_plan_id | 474311 | 0.01 | 11 | 44 | 0 | 19 | 0 |
subscription_billing_cycle | 474311 | 0.01 | 4 | 5 | 0 | 2 | 0 |
subscription_started_cta | 474625 | 0.01 | 30 | 67 | 0 | 40 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
subscription_revenue | 474311 | 0.01 | 35.72 | 51.04 | 10 | 15 | 15 | 50 | 500 | ▇▁▁▁▁ |
subscription_amount | 474311 | 0.01 | 98.60 | 173.92 | 10 | 15 | 50 | 144 | 2030 | ▇▁▁▁▁ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
enrolled_at | 0 | 1.00 | 2019-12-04 18:23:45 | 2020-02-13 13:19:42 | 2019-12-18 19:49:21 | 473756 |
account_created_at | 438745 | 0.08 | 2019-04-23 18:05:15 | 2020-02-18 19:22:35 | 2019-12-14 19:45:37 | 35794 |
trial_started_at | 450771 | 0.06 | 2019-12-04 20:26:27 | 2020-02-18 19:22:47 | 2019-12-24 10:25:08 | 25968 |
subscription_started_at | 474311 | 0.01 | 2019-12-04 20:55:43 | 2020-02-18 20:31:12 | 2020-01-07 21:02:03 | 2226 |
Let’s start with a quick validation of the visitor count split between the two experiment groups.
users %>%
group_by(experiment_group) %>%
summarise(
visitors = n_distinct(anonymous_id),
accounts = n_distinct(account_id),
trials = n_distinct(trial_subscription_id),
subscriptions = n_distinct(subscription_id)
) %>%
mutate(visitor_split_perct = visitors / sum(visitors)) %>%
kable() %>%
kable_styling()
experiment_group | visitors | accounts | trials | subscriptions | visitor_split_perct |
---|---|---|---|---|---|
control | 237661 | 18102 | 13186 | 1093 | 0.501586 |
variant_1 | 236158 | 17694 | 12718 | 1092 | 0.498414 |
Great, there are a total of 473,819 unique visitors enrolled in the experiment, and the split between the two experiment groups is within 0.16 percentage points of 50/50 (well within reason for our randomization). This confirms that our experiment framework correctly split enrollments for the experiment.
res <- prop.test(x = c(18348, 18055), n = c(238347, 236872), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(18348, 18055) out of c(238347, 236872)
## X-squared = 0.95332, df = 1, p-value = 0.3289
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.0007589254 0.0022741252
## sample estimates:
## prop 1 prop 2
## 0.0769802 0.0762226
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000757599899208872. The observed proportion for the first group is 0.0769802011353195 (18,348 events out of a total sample size of 238,347). For the second group, the observed proportion is 0.0762226012361106 (18,055, out of a total sample size of 236,872).
The confidence interval for the true difference in population proportions is (-7.5892542 × 10^-4, 0.0022741). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.3288756. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000757599899208872 or less than -0.000757599899208872.
We can see that the number of visitors that ended up creating a Buffer account (i.e., signed up for their first Buffer product) was a few hundred higher in the control group. A quick proportion test shows that this difference in the proportion of enrolled visitors that created a Buffer account is NOT statistically significant, with a p-value of 0.329 (far above the generally accepted 0.05 threshold). TL;DR, there was no difference between the variation and the control in the overall visitor-to-signup rate.
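To make the mechanics of this test concrete, here is a sketch of the same comparison computed by hand in base R, using the counts reported above. Note that `prop.test()` also applies a continuity correction, so its p-value differs slightly from this uncorrected version.

```r
# Manual two-proportion z-test for the signup rates above (a sketch;
# prop.test() adds a continuity correction, so results differ slightly).
x <- c(18348, 18055)         # signups in control / variation
n <- c(238347, 236872)       # enrolled visitors in control / variation
p_pooled <- sum(x) / sum(n)  # pooled signup rate under the null
se <- sqrt(p_pooled * (1 - p_pooled) * sum(1 / n))
z <- (x[1] / n[1] - x[2] / n[2]) / se
p_value <- 2 * pnorm(-abs(z))
p_value  # ~0.33, consistent with the non-significant result above
```

This is exactly the chi-squared statistic from the output above (X-squared is just z squared), which is why the conclusions match.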
Next, we will calculate how many users from each experiment group started a Publish trial.
users %>%
mutate(has_publish_trial = trial_product == "publish") %>%
group_by(experiment_group, has_publish_trial) %>%
summarise(users = n_distinct(account_id)) %>%
ungroup() %>%
filter(has_publish_trial) %>%
group_by(experiment_group) %>%
summarise(users_with_publish_trials = users) %>%
kable() %>%
kable_styling()
experiment_group | users_with_publish_trials |
---|---|
control | 11761 |
variant_1 | 11406 |
There were 11,790 users in the control group that started a Publish trial, and 11,541 in the variation group. Just like above, we should also run a proportion test here.
res <- prop.test(x = c(11790, 11541), n = c(18348, 18055), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(11790, 11541) out of c(18348, 18055)
## X-squared = 0.43279, df = 1, p-value = 0.5106
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.006548244 0.013274911
## sample estimates:
## prop 1 prop 2
## 0.6425768 0.6392135
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.0033633333508416. The observed proportion for the first group is 0.642576847612819 (11,790 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.639213514261977 (11,541, out of a total sample size of 18,055).
The confidence interval for the true difference in population proportions is (-0.0065482, 0.0132749). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.5106212. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0033633333508416 or less than -0.0033633333508416.
We can see that the difference in proportion of accounts that started a Publish trial is also NOT statistically significant, with a p-value of 0.51. TL;DR, there is no difference between the two groups in total Publish trial starts.
users %>%
mutate(has_sbp_trial = (trial_product == "publish" & str_detect(trial_plan_id, ".small."))) %>%
group_by(experiment_group, has_sbp_trial) %>%
summarise(users = n_distinct(account_id)) %>%
ungroup() %>%
filter(has_sbp_trial) %>%
group_by(experiment_group) %>%
summarise(users_with_sbp_trials = users) %>%
kable() %>%
kable_styling()
experiment_group | users_with_sbp_trials |
---|---|
control | 2787 |
variant_1 | 2668 |
There were 2,769 users in the control group that started a Publish Small Business trial, and 2,670 in the variation group. Just like above, we should also run a proportion test here.
res <- prop.test(x = c(2769, 2670), n = c(18348, 18055), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(2769, 2670) out of c(18348, 18055)
## X-squared = 0.63555, df = 1, p-value = 0.4253
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.004344667 0.010412983
## sample estimates:
## prop 1 prop 2
## 0.1509156 0.1478815
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00303415785535768. The observed proportion for the first group is 0.150915631131458 (2,769 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.147881473276101 (2,670, out of a total sample size of 18,055).
The confidence interval for the true difference in population proportions is (-0.0043447, 0.010413). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.4253263. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00303415785535768 or less than -0.00303415785535768.
We can see that the difference in the proportion of accounts that started a Publish SBP trial is also NOT statistically significant, with a p-value of 0.43. In other words, we found no evidence of a difference in Publish Small Business trial starts between the two experiment groups.
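A null result on the success metric is only meaningful if the experiment had enough power to detect a realistic effect. As a rough sketch (the 10% relative lift here is an assumed minimum detectable effect, not something specified in the experiment plan), `stats::power.prop.test` estimates the per-group sample size needed at the ~15.1% baseline SBP trial rate observed in the control group:

```r
# Sketch of a power check: signups needed per group to detect an
# assumed 10% relative lift on the ~15.1% baseline SBP trial rate,
# at alpha = 0.05 and 80% power.
pwr <- power.prop.test(p1 = 0.151, p2 = 0.151 * 1.10,
                       sig.level = 0.05, power = 0.80)
ceiling(pwr$n)  # required signups per group
```

With roughly 18,000 signups per group, the experiment comfortably clears this bar, so the null result is unlikely to be a simple power problem for effects of that size.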
Next, we will calculate how many users from each experiment group started a paid Publish subscription.
users %>%
mutate(has_publish_sub = (subscription_product == "publish")) %>%
group_by(experiment_group, has_publish_sub) %>%
summarise(users = n_distinct(account_id)) %>%
ungroup() %>%
filter(has_publish_sub) %>%
group_by(experiment_group) %>%
summarise(paying_publish_users = users) %>%
kable() %>%
kable_styling()
experiment_group | paying_publish_users |
---|---|
control | 972 |
variant_1 | 959 |
There were 741 users in the control group that started a paid Publish subscription, and 709 in the variation group. Just like above, we should run proportion tests here, for both the signup-to-paid-subscription rate and the trial-to-paid-subscription rate.
res <- prop.test(x = c(741, 709), n = c(18348, 18055), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(741, 709) out of c(18348, 18055)
## X-squared = 0.26838, df = 1, p-value = 0.6044
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.002955545 0.005189490
## sample estimates:
## prop 1 prop 2
## 0.04038587 0.03926890
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00111697253812972. The observed proportion for the first group is 0.0403858731196861 (741 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.0392689005815564 (709, out of a total sample size of 18,055).
The confidence interval for the true difference in population proportions is (-0.0029555, 0.0051895). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.6044234. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00111697253812972 or less than -0.00111697253812972.
There is no statistical difference between the two proportions of signups that started a paid Publish subscription, as the p-value is 0.604.
res <- prop.test(x = c(741, 709), n = c(11790, 11541), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(741, 709) out of c(11790, 11541)
## X-squared = 0.17726, df = 1, p-value = 0.6737
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.004864409 0.007697851
## sample estimates:
## prop 1 prop 2
## 0.06284987 0.06143315
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00141672140017238. The observed proportion for the first group is 0.0628498727735369 (741 events out of a total sample size of 11,790). For the second group, the observed proportion is 0.0614331513733645 (709, out of a total sample size of 11,541).
The confidence interval for the true difference in population proportions is (-0.0048644, 0.0076979). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.6737409. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00141672140017238 or less than -0.00141672140017238.
There is no statistical difference between the two proportions of trial starts that started a paid Publish subscription, as the p-value is 0.67.
Next, let’s look into the number of Publish SBP paid subscriptions between the two experiment groups.
users %>%
mutate(has_sbp_sub = (subscription_product == "publish" & str_detect(subscription_plan_id, ".small."))) %>%
group_by(experiment_group, has_sbp_sub) %>%
summarise(users = n_distinct(account_id)) %>%
ungroup() %>%
filter(has_sbp_sub) %>%
group_by(experiment_group) %>%
summarise(paying_sbp_users = users) %>%
kable() %>%
kable_styling()
experiment_group | paying_sbp_users |
---|---|
control | 76 |
variant_1 | 69 |
There were 47 users in the control group with a paid Publish SBP subscription, and 43 in the variation group. Just like above, we should run both proportion tests here too.
res <- prop.test(x = c(47, 43), n = c(18348, 18055), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(47, 43) out of c(18348, 18055)
## X-squared = 0.057684, df = 1, p-value = 0.8102
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.0008949943 0.0012549450
## sample estimates:
## prop 1 prop 2
## 0.002561587 0.002381612
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000179975352061444. The observed proportion for the first group is 0.00256158709396119 (47 events out of a total sample size of 18,348). For the second group, the observed proportion is 0.00238161174189975 (43, out of a total sample size of 18,055).
The confidence interval for the true difference in population proportions is (-8.9499427 × 10^-4, 0.0012549). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.8101945. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000179975352061444 or less than -0.000179975352061444.
There is no statistical difference between the two proportions of signups that started a paid SBP subscription, as the p-value is 0.810.
res <- prop.test(x = c(47, 43), n = c(11790, 11541), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(47, 43) out of c(11790, 11541)
## X-squared = 0.0464, df = 1, p-value = 0.8294
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.001415507 0.001936671
## sample estimates:
## prop 1 prop 2
## 0.003986429 0.003725847
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.000260582196937878. The observed proportion for the first group is 0.00398642917726887 (47 events out of a total sample size of 11,790). For the second group, the observed proportion is 0.00372584698033099 (43, out of a total sample size of 11,541).
The confidence interval for the true difference in population proportions is (-0.0014155, 0.0019367). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.8294495. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000260582196937878 or less than -0.000260582196937878.
There is no statistical difference between the two proportions of Publish trials that started a paid SBP subscription, as the p-value is 0.829.
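Since this analysis runs seven proportion tests in total, it is worth confirming that multiple comparisons do not change the picture. A quick sketch with `stats::p.adjust`, using the p-values reported above:

```r
# Holm adjustment of the seven p-values reported in this analysis.
# Every unadjusted p-value is already far above 0.05, so the adjusted
# values remain non-significant as well.
p_values <- c(0.3289, 0.5106, 0.4253, 0.6044, 0.6737, 0.8102, 0.8294)
adjusted <- p.adjust(p_values, method = "holm")
adjusted
```

Adjusting for multiplicity can only make p-values larger, so the overall null result stands.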
Final Results
Given the above observations, the result of the experiment is that there is insufficient evidence to confirm the hypothesis: none of the metrics we tested showed a statistically significant difference between the control and the variation.