
Publish & Analyze Bundle A/B Test (EID19)

In this analysis we’ll explore the results of the Publish and Analyze bundle A/B test, also known as experiment EID19. The experiment was run as an A/B test via our A/Bert framework, which split visitors randomly 50/50 between the control and variation groups. Visitors were enrolled in the experiment once they landed on a page that had a CTA promoting the experiment landing page (i.e., a visitor did not have to land on the experiment landing page itself to be enrolled, as the variation treatment included menu items and CTAs prompting users to visit the experiment landing page).

The experiment hypothesis was:

  • If we offer an easier, more convenient way for new-to-Buffer users to create a Buffer account and trial Publish & Analyze all at the same time (i.e., create a Buffer account, sign up for Analyze, sign up for Publish, start an Analyze trial, and start a Publish trial all in one workflow), then we will see an increase in the number of users paying for both products.

Given this hypothesis, our success metric was:

  • Number of users paying for both products

Also, it is important to note that we are interested in comparing the overall impact between the two groups, not just the specific acquisition flow within the experiment landing page.

TL;DR

The result of the experiment is that there is insufficient evidence to confirm the hypothesis. Though it appears the variation treatment led to more Analyze trials (which had a downstream effect of increasing the MRR value per paying account), the landing page underperformed compared to trials started from other locations, and there was no statistically significant difference between the two groups in the number of new users who ended up on paid subscriptions to both products. Given that only 3 users started a trial from the experiment landing page and converted to paid subscriptions in both products, it appears the positive results from the variation group were related to increasing the exposure of Analyze during user acquisition.

We recommend pursuing an additional iteration of this experiment, examining both ways to increase click-throughs on the experiment CTAs to the solutions landing page and ways to improve the performance of the solutions landing page itself via changes to messaging and to which plans are paired for the solution offered.

Data Collection

To analyze the results of this experiment, we will use the following query to retrieve data about users enrolled in the experiment.

# connect to bigquery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data"
)
# define sql query to get experiment enrolled visitors
sql <- "
  with enrolled_users as (
    select
      anonymous_id
      , experiment_group
      , first_value(timestamp) over (
      partition by anonymous_id order by timestamp asc
      rows between unbounded preceding and unbounded following) as enrolled_at
    from segment_marketing.experiment_viewed
    where 
      first_viewed 
      and experiment_id = 'eid19_publish_analyze_bundle_ab'
  )
  select
    e.anonymous_id
    , e.experiment_group
    , e.enrolled_at
    , i.user_id as account_id
    , c.email
    , c.publish_user_id
    , c.analyze_user_id
    , a.timestamp as account_created_at
    , t.product as trial_product
    , t.timestamp as trial_started_at
    , t.subscription_id as trial_subscription_id
    , t.stripe_event_id as stripe_trial_event_id
    , t.plan_id as trial_plan_id
    , t.cycle as trial_billing_cycle
    , t.cta as trial_started_cta
    , t.multi_product_bundle_name as trial_multi_product_bundle_name
    , s.product as subscription_product
    , s.timestamp as subscription_started_at
    , s.subscription_id as subscription_id
    , s.stripe_event_id as stripe_subscription_event_id
    , s.plan_id as subscription_plan_id
    , s.cycle as subscritpion_billing_cycle
    , s.revenue as subscription_revenue
    , s.amount as subscritpion_amount
    , s.cta as subscription_started_cta
    , s.multi_product_bundle_name as subscription_multi_product_bundle_name
  from enrolled_users e
    left join segment_login_server.identifies i
      on e.anonymous_id = i.anonymous_id
    left join dbt_buffer.core_accounts c
      on i.user_id = c.id
    left join segment_login_server.account_created a
      on i.user_id = a.user_id
    left join segment_publish_server.trial_started t
      on i.user_id = t.user_id
      and t.timestamp > e.enrolled_at
    left join segment_publish_server.subscription_started s
      on i.user_id = s.user_id
      and s.timestamp > e.enrolled_at
  group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
"
  
# query BQ
# users <- dbGetQuery(con, sql)

Exploratory Analysis

Let’s start by reviewing a few of the summary statistics from our data.

skim(users)
Table 1: Data summary
Name users
Number of rows 284326
Number of columns 26
_______________________
Column type frequency:
character 20
numeric 2
POSIXct 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
anonymous_id 0 1.00 36 36 0 282262 0
experiment_group 0 1.00 7 9 0 2 0
account_id 259040 0.09 24 24 0 23855 0
email 259123 0.09 8 50 0 23673 0
publish_user_id 259218 0.09 24 24 0 23677 0
analyze_user_id 281767 0.01 24 24 0 1411 0
trial_product 268322 0.06 7 7 0 2 0
trial_subscription_id 268322 0.06 18 18 0 15843 0
stripe_trial_event_id 268322 0.06 18 18 0 15843 0
trial_plan_id 268322 0.06 13 44 0 14 0
trial_billing_cycle 268322 0.06 4 5 0 2 0
trial_started_cta 268440 0.06 28 70 0 52 0
trial_multi_product_bundle_name 284123 0.00 22 22 0 1 0
subscription_product 282644 0.01 7 7 0 2 0
subscription_id 282644 0.01 18 18 0 1397 0
stripe_subscription_event_id 282644 0.01 18 18 0 1398 0
subscription_plan_id 282644 0.01 13 44 0 14 0
subscritpion_billing_cycle 282644 0.01 4 5 0 2 0
subscription_started_cta 282802 0.01 30 70 0 40 0
subscription_multi_product_bundle_name 284314 0.00 22 22 0 1 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
subscription_revenue 283212 0.00 35.19 33.49 12 15 15 50 399 ▇▁▁▁▁
subscritpion_amount 282644 0.01 83.31 142.14 15 15 50 144 1010 ▇▁▁▁▁

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
enrolled_at 0 1.00 2019-10-14 19:21:46 2019-12-06 22:17:51 2019-10-22 03:30:14 282226
account_created_at 259040 0.09 2019-04-23 17:03:04 2019-12-11 13:42:29 2019-10-18 09:57:29 23855
trial_started_at 268322 0.06 2019-10-15 06:52:41 2019-12-11 15:00:59 2019-10-24 08:58:03 15845
subscription_started_at 282644 0.01 2019-10-15 10:02:53 2019-12-11 15:05:22 2019-11-04 22:58:53 1399

Let’s start with a quick validation of the visitor count split between the two experiment groups.

users %>% 
  group_by(experiment_group) %>% 
  summarise(visitors = n_distinct(anonymous_id), accounts = n_distinct(account_id), trials = n_distinct(trial_subscription_id), subscriptions = n_distinct(subscription_id)) %>% 
  mutate(visitor_split_perct = visitors / sum(visitors)) %>%
  kable() %>% 
  kable_styling()
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group visitors accounts trials subscriptions visitor_split_perct
control 141129 12148 8081 690 0.4999929
variant_1 141133 11709 7764 709 0.5000071

Great, there are a total of 282,249 unique visitors enrolled in the experiment, and the split between the two experiment groups is within a few visitors of exactly 50/50. This confirms that our experiment framework correctly split enrollments for the experiment.
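As an extra sanity check on the assignment, we can run a one-sample proportion test against an expected 50/50 split, using the visitor counts from the table above (a quick sketch; a large p-value means no evidence of imbalance):

```r
# Sketch: test the assignment split against an expected 0.5 proportion,
# using the per-group visitor counts from the table above.
split_test <- prop.test(x = 141129, n = 141129 + 141133, p = 0.5)
split_test$p.value  # large p-value: no evidence of assignment imbalance
```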

res <- prop.test(x = c(12080, 11656), n = c(141123, 141126), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(12080, 11656) out of c(141123, 141126)
## X-squared = 8.2403, df = 1, p-value = 0.004097
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.0009514326 0.0050610215
## sample estimates:
##     prop 1     prop 2 
## 0.08559909 0.08259286
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is -0.00300622704010574. The observed proportion for the first group is 0.0855990873209895 (12,080 events out of a total sample size of 141,123). For the second group, the observed proportion is 0.0825928602808838 (11,656, out of a total sample size of 141,126).

The confidence interval for the true difference in population proportions is (0.0009514, 0.0050610). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.0040971. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00300622704010574 or less than -0.00300622704010574.

We can see that the number of visitors that ended up creating a Buffer account (i.e., signed up for their first Buffer product) was a few hundred higher in the control group. Using a quick proportion test, we can see that this difference in the proportion of enrolled visitors that created a Buffer account is statistically significant, with a p-value of 0.004 (less than the generally accepted 0.05 threshold).
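For intuition about what prop.test is doing, the test statistic can be approximated by hand with a pooled two-proportion z-test (a base-R sketch, without the continuity correction prop.test applies, so the numbers differ slightly from the output above):

```r
# Sketch: pooled two-proportion z-test for account creation,
# using the same counts passed to prop.test above.
x <- c(12080, 11656)    # accounts created (control, variant)
n <- c(141123, 141126)  # enrolled visitors per group
p_pool <- sum(x) / sum(n)                      # pooled proportion under the null
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n)) # standard error of the difference
z <- (x[1] / n[1] - x[2] / n[2]) / se          # z^2 approximates the X-squared above
p_value <- 2 * pnorm(-abs(z))
```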

Next, we will calculate how many users from each experiment group started a trial from each product.

users %>% 
  mutate(has_publish_trial = trial_product == "publish") %>% 
  group_by(experiment_group, has_publish_trial) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_publish_trial) %>% 
  group_by(experiment_group) %>% 
  summarise(users_with_publish_trials = users) %>%
  kable() %>% 
  kable_styling()
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group users_with_publish_trials
control 7440
variant_1 7012

There were 7372 users in the control group that started a Publish trial, and 6954 in the variation group. Just like above, we should also run a proportion test here.

res <- prop.test(x = c(7372, 6954), n = c(12080, 11656), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(7372, 6954) out of c(12080, 11656)
## X-squared = 4.5707, df = 1, p-value = 0.03252
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.001130084 0.026194502
## sample estimates:
##    prop 1    prop 2 
## 0.6102649 0.5966026
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is -0.0136622925634184. The observed proportion for the first group is 0.610264900662252 (7,372 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.596602608098833 (6,954, out of a total sample size of 11,656).

The confidence interval for the true difference in population proportions is (0.0011301, 0.0261945). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.0325235. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0136622925634184 or less than -0.0136622925634184.

We can see that the difference in proportion of accounts that started a Publish trial is also statistically significant, with a p-value of 0.03.

users %>% 
  mutate(has_analyze_trial = (trial_product == "analyze")) %>% 
  group_by(experiment_group, has_analyze_trial) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_analyze_trial) %>% 
  group_by(experiment_group) %>% 
  summarise(users_with_analyze_trials = users) %>%
  kable() %>% 
  kable_styling()
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group users_with_analyze_trials
control 466
variant_1 588

There were 449 users in the control group that started an Analyze trial, and 571 in the variation group. Just like above, we should also run a proportion test here.

res <- prop.test(x = c(449, 571), n = c(12080, 11656), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(449, 571) out of c(12080, 11656)
## X-squared = 19.862, df = 1, p-value = 8.324e-06
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.017073586 -0.006563957
## sample estimates:
##     prop 1     prop 2 
## 0.03716887 0.04898765
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is 0.0118187716754467. The observed proportion for the first group is 0.0371688741721854 (449 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0489876458476321 (571, out of a total sample size of 11,656).

The confidence interval for the true difference in population proportions is (-0.0170736, -0.006564). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 8.324447e-06. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0118187716754467 or less than -0.0118187716754467.

We can see that the difference in the proportion of accounts that started an Analyze trial is also statistically significant, with a p-value of 8.324e-06 (a very small number, much less than the generally accepted 0.05 threshold). This means we can say with confidence that the observed difference in the number of Analyze trials started between the two experiment groups was not the result of random variance.

Next, we will calculate how many users from each experiment group started a paid subscription from each product.

users %>% 
  mutate(has_publish_sub = (subscription_product == "publish")) %>% 
  group_by(experiment_group, has_publish_sub) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_publish_sub) %>% 
  group_by(experiment_group) %>% 
  summarise(paying_publish_users = users) %>%
  kable() %>% 
  kable_styling()
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group paying_publish_users
control 640
variant_1 639

There were 618 users in the control group that started a paid Publish subscription, and 620 in the variation group. Just like above, we should run proportion tests here, for both the account-to-paid-subscription proportion and the trial-to-paid-subscription proportion.

res <- prop.test(x = c(618, 620), n = c(12080, 11656), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(618, 620) out of c(12080, 11656)
## X-squared = 0.45546, df = 1, p-value = 0.4998
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.007776708  0.003711610
## sample estimates:
##     prop 1     prop 2 
## 0.05115894 0.05319149
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.00203254896435114. The observed proportion for the first group is 0.051158940397351 (618 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0531914893617021 (620, out of a total sample size of 11,656).

The confidence interval for the true difference in population proportions is (-0.0077767, 0.0037116). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.4997515. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00203254896435114 or less than -0.00203254896435114.

There is no statistical difference between the two proportions of accounts that started a paid Publish subscription, as the p-value is 0.499.

res <- prop.test(x = c(618, 620), n = c(7372, 6954), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(618, 620) out of c(7372, 6954)
## X-squared = 1.2195, df = 1, p-value = 0.2695
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.014679446  0.004026229
## sample estimates:
##     prop 1     prop 2 
## 0.08383071 0.08915732
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.00532660873071643. The observed proportion for the first group is 0.0838307107976126 (618 events out of a total sample size of 7,372). For the second group, the observed proportion is 0.089157319528329 (620, out of a total sample size of 6,954).

The confidence interval for the true difference in population proportions is (-0.0146794, 0.0040262). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.2694686. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00532660873071643 or less than -0.00532660873071643.

There is no statistical difference between the two proportions of Publish trials that converted to a paid Publish subscription, as the p-value is 0.269.

Next, let’s look into the number of Analyze paid subscriptions between the two experiment groups.

users %>% 
  mutate(has_analyze_sub = (subscription_product == "analyze")) %>% 
  group_by(experiment_group, has_analyze_sub) %>% 
  summarise(users = n_distinct(account_id)) %>% 
  ungroup() %>% 
  filter(has_analyze_sub) %>% 
  group_by(experiment_group) %>% 
  summarise(paying_analyze_users = users) %>%
  kable() %>% 
  kable_styling()
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group paying_analyze_users
control 47
variant_1 59

The number of paying users for Analyze was 44 in the control and 54 in the variation. Just like above, we should also run a couple of proportion tests here too.

explain(prop.test(x = c(44, 54), n = c(12080, 11656), alternative = "two.sided"))

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.000990423031994436. The observed proportion for the first group is 0.00364238410596027 (44 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0046328071379547 (54, out of a total sample size of 11,656).

The confidence interval for the true difference in population proportions is (-0.0027099, 0.0007290). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.2764203. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000990423031994436 or less than -0.000990423031994436.

There is no statistical difference between the two proportions of users that started a paid Analyze subscription, as the p-value is 0.276.

res <- prop.test(x = c(44, 54), n = c(449, 571), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(44, 54) out of c(449, 571)
## X-squared = 0.0059629, df = 1, p-value = 0.9384
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.03506551  0.04191475
## sample estimates:
##     prop 1     prop 2 
## 0.09799555 0.09457093
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00342461746086847. The observed proportion for the first group is 0.0979955456570156 (44 events out of a total sample size of 449). For the second group, the observed proportion is 0.0945709281961471 (54, out of a total sample size of 571).

The confidence interval for the true difference in population proportions is (-0.0350655, 0.0419147). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.9384488. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00342461746086847 or less than -0.00342461746086847.

There is no statistical difference between the two proportions of Analyze trials that started a paid Analyze subscription, as the p-value is 0.938. This means that the difference in paid Analyze subscriptions is the result of more Analyze trials started in the variation group, not a difference in conversion rate to paid between the two groups.

Next, we will look at the number of accounts that started paid subscriptions for both Publish and Analyze in both experiment groups.

users %>% 
  filter(!is.na(subscription_product)) %>% 
  group_by(account_id, experiment_group) %>% 
  summarise(products = n_distinct(subscription_product)) %>% 
  ungroup() %>% 
  filter(products > 1) %>% 
  group_by(experiment_group, products) %>% 
  summarise(users = n_distinct(account_id)) %>%
  kable() %>% 
  kable_styling()
## `summarise()` regrouping output by 'account_id' (override with `.groups` argument)
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
experiment_group products users
control 2 32
variant_1 2 44
res <- prop.test(x = c(30, 40), n = c(12080, 11656), alternative = "two.sided")
res
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(30, 40) out of c(12080, 11656)
## X-squared = 1.5059, df = 1, p-value = 0.2198
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.0024163450  0.0005198145
## sample estimates:
##      prop 1      prop 2 
## 0.002483444 0.003431709
explain(res)

This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.000948265282468285. The observed proportion for the first group is 0.00248344370860927 (30 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.00343170899107756 (40, out of a total sample size of 11,656).

The confidence interval for the true difference in population proportions is (-0.0024163, 0.0005198). This interval will contain the true difference in population proportions 95 times out of 100.

The p-value for this test is 0.2197602. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000948265282468285 or less than -0.000948265282468285.

There is no statistical difference between the two proportions of accounts started in each experiment group that ended up starting both a paid Analyze subscription and a paid Publish subscription, as the p-value is 0.219. Put another way, there is about a 1 out of 5 chance that the observed difference is the result of natural variance.
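One reason to hedge this conclusion is statistical power. A rough power calculation with base R's power.prop.test (a sketch using the observed rates) suggests that reliably detecting a difference of this size would require far more accounts per group than the roughly 12,000 we observed:

```r
# Sketch: sample size per group needed to detect the observed difference
# in dual-product paid conversion at 80% power.
pw <- power.prop.test(p1 = 30 / 12080, p2 = 40 / 11656,
                      sig.level = 0.05, power = 0.80)
ceiling(pw$n)  # required accounts per group; well above the ~12k observed
```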

To ensure we have a full picture of the variation group’s behavior with converting to paid for both products, let’s also look at how many trials of both products were started from the experiment landing page and what those trials’ conversion to paid was.

users %>% 
  mutate(has_multiproduct_trial = !is.na(trial_multi_product_bundle_name)) %>% 
  group_by(has_multiproduct_trial) %>% 
  summarise(users = n_distinct(account_id)) %>%
  kable() %>% 
  kable_styling()
## `summarise()` ungrouping output (override with `.groups` argument)
has_multiproduct_trial users
FALSE 23776
TRUE 115
users %>% 
  mutate(has_multiproduct_trial = !is.na(trial_multi_product_bundle_name)) %>%  
  filter(has_multiproduct_trial) %>% 
  group_by(account_id) %>% 
  summarise(products = n_distinct(subscription_product), paid_subscriptions = n_distinct(subscription_id)) %>% 
  ungroup() %>% 
  filter(products > 1) %>%
  summarise(users = n_distinct(account_id)) %>%
  kable() %>% 
  kable_styling()
## `summarise()` ungrouping output (override with `.groups` argument)
users
3

3 of 115 trials converted to both paid products, which is a 2.6% conversion rate. For comparison, the conversion rate to paid for Publish trials alone across both groups was 5%, and the conversion rate to paid for Analyze trials alone across both groups was 10%.
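With only 3 conversions, the uncertainty around that 2.6% is large. An exact binomial confidence interval (a sketch using base R's binom.test) makes this concrete:

```r
# Sketch: exact 95% confidence interval for the landing-page bundle
# conversion rate (3 conversions out of 115 trials).
ci <- binom.test(3, 115)$conf.int
round(ci, 3)  # a wide interval, compatible with rates well above 2.6%
```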

Finally, let’s also look at the MRR value of all converted trials per experiment group, to see if there was any overall difference in value between the two groups.

users %>% 
  filter(!is.na(subscription_id)) %>% 
  mutate(mrr_value = if_else(subscritpion_billing_cycle == "year", subscritpion_amount / 12, subscritpion_amount)) %>% 
  group_by(experiment_group) %>% 
  summarise(paying_user_count = n_distinct(account_id), total_mrr_value = round(sum(mrr_value), 2), mrr_value_per_account = round(total_mrr_value / paying_user_count, 2)) %>%
  kable() %>% 
  kable_styling()
## `summarise()` ungrouping output (override with `.groups` argument)
experiment_group paying_user_count total_mrr_value mrr_value_per_account
control 655 25232.25 38.52
variant_1 654 27187.50 41.57

Given that both groups ended up with almost the same number of paying customers, the difference in MRR value per account between the two groups indicates that the statistically significant difference in Analyze trial starts led to more overall revenue. This could imply that the realized benefit of this experiment was more exposure to Analyze, not a more convenient way to start trials for both products.
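To make the normalization explicit: annual subscriptions are converted to a monthly value by dividing the billed amount by 12, while monthly amounts are taken as-is. A minimal sketch mirroring the mutate above:

```r
# Sketch: the MRR normalization used in the summary above.
mrr_value <- function(amount, cycle) {
  ifelse(cycle == "year", amount / 12, amount)
}
mrr_value(c(120, 15), c("year", "month"))  # 10 15
```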

Final Results

Given the above observations, the result of the experiment is that there is insufficient evidence to confirm the hypothesis. Though it appears the variation treatment led to more Analyze trials, and thus a slightly higher MRR value per paying account, the landing page underperformed compared to trials started from other locations, and there was no statistically significant difference between the two groups in the number of new users who ended up on paid subscriptions to both products.

We recommend pursuing an iteration of this experiment, examining both ways to increase click-throughs on the experiment CTAs to the solutions landing page and ways to improve the performance of the solutions landing page itself via changes to messaging and to which plans are paired for the solution offered.