In this analysis we’ll explore the results of the Publish and Analyze bundle A/B test, also known as experiment EID19. The experiment was run as an A/B test via our A/Bert framework, which split visitors randomly 50/50 between the control and variation groups. Visitors were enrolled in the experiment once they landed on a page with a CTA promoting the experiment landing page (i.e., a visitor did not have to land on the experiment landing page itself to be enrolled, as the variation treatment included menu items and CTAs prompting users to visit the experiment landing page).
The experiment hypothesis was:
- If we offer an easier, more convenient way for new-to-Buffer users to create a Buffer account and trial Publish & Analyze all at the same time (i.e., create a Buffer account, sign up for Analyze, sign up for Publish, start an Analyze trial, and start a Publish trial all in one workflow), then we will see an increase in the number of users paying for both products.
Given this hypothesis, our success metric was:
- Number of users paying for both products
It is also important to note that we are interested in comparing the overall impact between the two groups, not just the specific acquisition flow within the experiment landing page.
TLDR
The result of the experiment is that there is insufficient evidence to confirm the hypothesis. Though it appears the variation treatment led to more Analyze trials (which had the downstream effect of increasing the MRR value per paying account), the landing page underperformed compared to trials started from other locations, and the number of new users who ended up on a paid subscription to both products showed no statistically significant difference between the two groups. Given that only 3 users who started a trial from the experiment landing page converted to paid subscriptions in both products, it appears the positive results from the variation group were related to increasing the exposure of Analyze during user acquisition.
We recommend pursuing an additional iteration of this experiment, examining both ways to increase click-throughs on the experiment CTAs to the solutions landing page, and ways to improve the performance of the solutions landing page itself via changes to messaging and to which plans are paired for the solution offered.
Data Collection
To analyze the results of this experiment, we will use the following query to retrieve data about users enrolled in the experiment.
# connect to bigquery (DBI provides dbConnect; bigrquery supplies the driver)
library(DBI)
library(bigrquery)

con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data"
)
# define sql query to get experiment enrolled visitors
sql <- "
with enrolled_users as (
select
anonymous_id
, experiment_group
, first_value(timestamp) over (
partition by anonymous_id order by timestamp asc
rows between unbounded preceding and unbounded following) as enrolled_at
from segment_marketing.experiment_viewed
where
first_viewed
and experiment_id = 'eid19_publish_analyze_bundle_ab'
)
select
e.anonymous_id
, e.experiment_group
, e.enrolled_at
, i.user_id as account_id
, c.email
, c.publish_user_id
, c.analyze_user_id
, a.timestamp as account_created_at
, t.product as trial_product
, t.timestamp as trial_started_at
, t.subscription_id as trial_subscription_id
, t.stripe_event_id as stripe_trial_event_id
, t.plan_id as trial_plan_id
, t.cycle as trial_billing_cycle
, t.cta as trial_started_cta
, t.multi_product_bundle_name as trial_multi_product_bundle_name
, s.product as subscription_product
, s.timestamp as subscription_started_at
, s.subscription_id as subscription_id
, s.stripe_event_id as stripe_subscription_event_id
, s.plan_id as subscription_plan_id
, s.cycle as subscritpion_billing_cycle
, s.revenue as subscription_revenue
, s.amount as subscritpion_amount
, s.cta as subscription_started_cta
, s.multi_product_bundle_name as subscription_multi_product_bundle_name
from enrolled_users e
left join segment_login_server.identifies i
on e.anonymous_id = i.anonymous_id
left join dbt_buffer.core_accounts c
on i.user_id = c.id
left join segment_login_server.account_created a
on i.user_id = a.user_id
left join segment_publish_server.trial_started t
on i.user_id = t.user_id
and t.timestamp > e.enrolled_at
left join segment_publish_server.subscription_started s
on i.user_id = s.user_id
and s.timestamp > e.enrolled_at
group by 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26
"
# query BQ
# users <- dbGetQuery(con, sql)
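Note that the dbGetQuery call itself is commented out above. For reproducibility, a minimal pattern one could use to pull the data once and cache it locally (the RDS file name is just an illustration) is:
# run the query once, then cache the result locally so the
# notebook can be re-knit without querying BigQuery again
if (!file.exists("eid19_users.rds")) {
  users <- dbGetQuery(con, sql)
  saveRDS(users, "eid19_users.rds")
}
users <- readRDS("eid19_users.rds")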
Exploratory Analysis
Let’s start by reviewing a few summary statistics from our data.
skim(users)
Name | users |
---|---|
Number of rows | 284326 |
Number of columns | 26 |
Column type frequency: | |
character | 20 |
numeric | 2 |
POSIXct | 4 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
anonymous_id | 0 | 1.00 | 36 | 36 | 0 | 282262 | 0 |
experiment_group | 0 | 1.00 | 7 | 9 | 0 | 2 | 0 |
account_id | 259040 | 0.09 | 24 | 24 | 0 | 23855 | 0 |
email | 259123 | 0.09 | 8 | 50 | 0 | 23673 | 0 |
publish_user_id | 259218 | 0.09 | 24 | 24 | 0 | 23677 | 0 |
analyze_user_id | 281767 | 0.01 | 24 | 24 | 0 | 1411 | 0 |
trial_product | 268322 | 0.06 | 7 | 7 | 0 | 2 | 0 |
trial_subscription_id | 268322 | 0.06 | 18 | 18 | 0 | 15843 | 0 |
stripe_trial_event_id | 268322 | 0.06 | 18 | 18 | 0 | 15843 | 0 |
trial_plan_id | 268322 | 0.06 | 13 | 44 | 0 | 14 | 0 |
trial_billing_cycle | 268322 | 0.06 | 4 | 5 | 0 | 2 | 0 |
trial_started_cta | 268440 | 0.06 | 28 | 70 | 0 | 52 | 0 |
trial_multi_product_bundle_name | 284123 | 0.00 | 22 | 22 | 0 | 1 | 0 |
subscription_product | 282644 | 0.01 | 7 | 7 | 0 | 2 | 0 |
subscription_id | 282644 | 0.01 | 18 | 18 | 0 | 1397 | 0 |
stripe_subscription_event_id | 282644 | 0.01 | 18 | 18 | 0 | 1398 | 0 |
subscription_plan_id | 282644 | 0.01 | 13 | 44 | 0 | 14 | 0 |
subscription_billing_cycle | 282644 | 0.01 | 4 | 5 | 0 | 2 | 0 |
subscription_started_cta | 282802 | 0.01 | 30 | 70 | 0 | 40 | 0 |
subscription_multi_product_bundle_name | 284314 | 0.00 | 22 | 22 | 0 | 1 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
subscription_revenue | 283212 | 0.00 | 35.19 | 33.49 | 12 | 15 | 15 | 50 | 399 | ▇▁▁▁▁ |
subscription_amount | 282644 | 0.01 | 83.31 | 142.14 | 15 | 15 | 50 | 144 | 1010 | ▇▁▁▁▁ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
enrolled_at | 0 | 1.00 | 2019-10-14 19:21:46 | 2019-12-06 22:17:51 | 2019-10-22 03:30:14 | 282226 |
account_created_at | 259040 | 0.09 | 2019-04-23 17:03:04 | 2019-12-11 13:42:29 | 2019-10-18 09:57:29 | 23855 |
trial_started_at | 268322 | 0.06 | 2019-10-15 06:52:41 | 2019-12-11 15:00:59 | 2019-10-24 08:58:03 | 15845 |
subscription_started_at | 282644 | 0.01 | 2019-10-15 10:02:53 | 2019-12-11 15:05:22 | 2019-11-04 22:58:53 | 1399 |
Let’s start with a quick validation of the visitor count split between the two experiment groups.
users %>%
  group_by(experiment_group) %>%
  summarise(
    visitors = n_distinct(anonymous_id),
    accounts = n_distinct(account_id),
    trials = n_distinct(trial_subscription_id),
    subscriptions = n_distinct(subscription_id)
  ) %>%
  mutate(visitor_split_perct = visitors / sum(visitors)) %>%
  kable() %>%
  kable_styling()
experiment_group | visitors | accounts | trials | subscriptions | visitor_split_perct |
---|---|---|---|---|---|
control | 141129 | 12148 | 8081 | 690 | 0.4999929 |
variant_1 | 141133 | 11709 | 7764 | 709 | 0.5000071 |
Great, there is a total of 282,262 unique visitors enrolled in the experiment, and the split between the two experiment groups is within a few visitors of exactly 50/50. This confirms that our experiment framework correctly split enrollments for the experiment.
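As an extra sanity check, we can formally test whether the observed split deviates from a fair 50/50 allocation, using a one-sample proportion test on the visitor counts from the table above:
# one-sample test: is the control group's share of visitors consistent with 0.5?
prop.test(x = 141129, n = 141129 + 141133, p = 0.5)
# the p-value is far above 0.05, consistent with a fair 50/50 split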
res <- prop.test(x = c(12080, 11656), n = c(141123, 141126), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(12080, 11656) out of c(141123, 141126)
## X-squared = 8.2403, df = 1, p-value = 0.004097
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.0009514326 0.0050610215
## sample estimates:
## prop 1 prop 2
## 0.08559909 0.08259286
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is -0.00300622704010574. The observed proportion for the first group is 0.0855990873209895 (12,080 events out of a total sample size of 141,123). For the second group, the observed proportion is 0.0825928602808838 (11,656, out of a total sample size of 141,126).
The confidence interval for the true difference in population proportions is (0.0009514, 0.0050610). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.0040971. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00300622704010574 or less than -0.00300622704010574.
We can see that the number of visitors who ended up creating a Buffer account (i.e., signed up for their first Buffer product) was a few hundred higher in the control group. Using a quick proportion test, we can see that this difference in the proportion of enrolled visitors that created a Buffer account is statistically significant, with a p-value of 0.004 (less than the generally accepted 0.05 threshold).
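For intuition, we can reproduce the test statistic behind prop.test by hand (a sketch without the continuity correction that prop.test applies by default, so the p-value differs very slightly):
# two-proportion z-test computed manually from the counts above
n1 <- 141123; n2 <- 141126   # enrolled visitors per group
x1 <- 12080;  x2 <- 11656    # visitors that created an account
p_pooled <- (x1 + x2) / (n1 + n2)
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z <- (x1 / n1 - x2 / n2) / se
2 * pnorm(-abs(z))  # two-sided p-value, ~0.004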
Next, we will calculate how many users from each experiment group started a trial from each product.
users %>%
  mutate(has_publish_trial = (trial_product == "publish")) %>%
  group_by(experiment_group, has_publish_trial) %>%
  summarise(users = n_distinct(account_id)) %>%
  ungroup() %>%
  filter(has_publish_trial) %>%
  group_by(experiment_group) %>%
  summarise(users_with_publish_trials = users) %>%
  kable() %>%
  kable_styling()
experiment_group | users_with_publish_trials |
---|---|
control | 7440 |
variant_1 | 7012 |
There were 7,372 users in the control group who started a Publish trial, and 6,954 in the variation group. Just like above, we should also run a proportion test here.
res <- prop.test(x = c(7372, 6954), n = c(12080, 11656), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(7372, 6954) out of c(12080, 11656)
## X-squared = 4.5707, df = 1, p-value = 0.03252
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.001130084 0.026194502
## sample estimates:
## prop 1 prop 2
## 0.6102649 0.5966026
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is -0.0136622925634184. The observed proportion for the first group is 0.610264900662252 (7,372 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.596602608098833 (6,954, out of a total sample size of 11,656).
The confidence interval for the true difference in population proportions is (0.0011301, 0.0261945). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.0325235. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0136622925634184 or less than -0.0136622925634184.
We can see that the difference in the proportion of accounts that started a Publish trial is also statistically significant, with a p-value of 0.03.
users %>%
  mutate(has_analyze_trial = (trial_product == "analyze")) %>%
  group_by(experiment_group, has_analyze_trial) %>%
  summarise(users = n_distinct(account_id)) %>%
  ungroup() %>%
  filter(has_analyze_trial) %>%
  group_by(experiment_group) %>%
  summarise(users_with_analyze_trials = users) %>%
  kable() %>%
  kable_styling()
experiment_group | users_with_analyze_trials |
---|---|
control | 466 |
variant_1 | 588 |
There were 449 users in the control group who started an Analyze trial, and 571 in the variation group. Just like above, we should also run a proportion test here.
res <- prop.test(x = c(449, 571), n = c(12080, 11656), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(449, 571) out of c(12080, 11656)
## X-squared = 19.862, df = 1, p-value = 8.324e-06
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.017073586 -0.006563957
## sample estimates:
## prop 1 prop 2
## 0.03716887 0.04898765
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we reject the null hypothesis, and conclude that two population proportions are not equal. The observed difference in proportions is 0.0118187716754467. The observed proportion for the first group is 0.0371688741721854 (449 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0489876458476321 (571, out of a total sample size of 11,656).
The confidence interval for the true difference in population proportions is (-0.0170736, -0.006564). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 8.32 × 10^-6. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.0118187716754467 or less than -0.0118187716754467.
We can see that the difference in the proportion of accounts that started an Analyze trial is also statistically significant, with a p-value of 8.32 × 10^-6 (a very small number, far below the generally accepted 0.05 threshold). This means we can say with confidence that the observed difference in the number of Analyze trials started between the two experiment groups is very unlikely to be the result of random variance.
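To put this difference in practical terms, we can express it as a lift in the Analyze trial rate (a quick sketch using the counts from the test above):
# absolute and relative lift in the Analyze trial rate, variation vs control
p_control <- 449 / 12080
p_variant <- 571 / 11656
p_variant - p_control      # absolute lift: ~1.2 percentage points
p_variant / p_control - 1  # relative lift: ~32%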
Next, we will calculate how many users from each experiment group started a paid subscription from each product.
users %>%
  mutate(has_publish_sub = (subscription_product == "publish")) %>%
  group_by(experiment_group, has_publish_sub) %>%
  summarise(users = n_distinct(account_id)) %>%
  ungroup() %>%
  filter(has_publish_sub) %>%
  group_by(experiment_group) %>%
  summarise(paying_publish_users = users) %>%
  kable() %>%
  kable_styling()
experiment_group | paying_publish_users |
---|---|
control | 640 |
variant_1 | 639 |
There were 618 users in the control group who started a paid Publish subscription, and 620 in the variation group. Just like above, we should run proportion tests here, both for the account-to-paid-subscription proportion and for the trial-to-paid-subscription proportion.
res <- prop.test(x = c(618, 620), n = c(12080, 11656), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(618, 620) out of c(12080, 11656)
## X-squared = 0.45546, df = 1, p-value = 0.4998
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.007776708 0.003711610
## sample estimates:
## prop 1 prop 2
## 0.05115894 0.05319149
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.00203254896435114. The observed proportion for the first group is 0.051158940397351 (618 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0531914893617021 (620, out of a total sample size of 11,656).
The confidence interval for the true difference in population proportions is (-0.0077767, 0.0037116). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.4997515. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00203254896435114 or less than -0.00203254896435114.
There is no statistically significant difference between the two groups in the proportion of accounts that started a paid Publish subscription, as the p-value is 0.499.
res <- prop.test(x = c(618, 620), n = c(7372, 6954), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(618, 620) out of c(7372, 6954)
## X-squared = 1.2195, df = 1, p-value = 0.2695
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.014679446 0.004026229
## sample estimates:
## prop 1 prop 2
## 0.08383071 0.08915732
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.00532660873071643. The observed proportion for the first group is 0.0838307107976126 (618 events out of a total sample size of 7,372). For the second group, the observed proportion is 0.089157319528329 (620, out of a total sample size of 6,954).
The confidence interval for the true difference in population proportions is (-0.0146794, 0.0040262). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.2694686. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00532660873071643 or less than -0.00532660873071643.
Likewise, there is no statistically significant difference between the two groups in the proportion of Publish trials that converted to a paid Publish subscription, as the p-value is 0.269.
Next, let’s look into the number of Analyze paid subscriptions between the two experiment groups.
users %>%
  mutate(has_analyze_sub = (subscription_product == "analyze")) %>%
  group_by(experiment_group, has_analyze_sub) %>%
  summarise(users = n_distinct(account_id)) %>%
  ungroup() %>%
  filter(has_analyze_sub) %>%
  group_by(experiment_group) %>%
  summarise(paying_analyze_users = users) %>%
  kable() %>%
  kable_styling()
experiment_group | paying_analyze_users |
---|---|
control | 47 |
variant_1 | 59 |
The number of paying users for Analyze was 44 in the control group and 54 in the variation group. Just like above, we should also run a couple of proportion tests here.
explain(prop.test(x = c(44, 54), n = c(12080, 11656), alternative = "two.sided"))
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.000990423031994436. The observed proportion for the first group is 0.00364238410596027 (44 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.0046328071379547 (54, out of a total sample size of 11,656).
The confidence interval for the true difference in population proportions is (-0.0027099, 0.0007290). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.2764203. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000990423031994436 or less than -0.000990423031994436.
There is no statistically significant difference between the two groups in the proportion of users that started a paid Analyze subscription, as the p-value is 0.276.
res <- prop.test(x = c(44, 54), n = c(449, 571), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(44, 54) out of c(449, 571)
## X-squared = 0.0059629, df = 1, p-value = 0.9384
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.03506551 0.04191475
## sample estimates:
## prop 1 prop 2
## 0.09799555 0.09457093
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is -0.00342461746086847. The observed proportion for the first group is 0.0979955456570156 (44 events out of a total sample size of 449). For the second group, the observed proportion is 0.0945709281961471 (54, out of a total sample size of 571).
The confidence interval for the true difference in population proportions is (-0.0350655, 0.0419147). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.9384488. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.00342461746086847 or less than -0.00342461746086847.
There is no statistically significant difference between the two groups in the proportion of Analyze trials that converted to a paid Analyze subscription, as the p-value is 0.938. This means the difference in paid Analyze subscriptions is the result of more Analyze trials being started in the variation group, not of a difference in trial-to-paid conversion rate between the two groups.
Next, we will look at the number of accounts that started paid subscriptions for both Publish and Analyze in both experiment groups.
users %>%
  filter(!is.na(subscription_product)) %>%
  group_by(account_id, experiment_group) %>%
  summarise(products = n_distinct(subscription_product)) %>%
  ungroup() %>%
  filter(products > 1) %>%
  group_by(experiment_group, products) %>%
  summarise(users = n_distinct(account_id)) %>%
  kable() %>%
  kable_styling()
experiment_group | products | users |
---|---|---|
control | 2 | 32 |
variant_1 | 2 | 44 |
res <- prop.test(x = c(30, 40), n = c(12080, 11656), alternative = "two.sided")
res
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(30, 40) out of c(12080, 11656)
## X-squared = 1.5059, df = 1, p-value = 0.2198
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.0024163450 0.0005198145
## sample estimates:
## prop 1 prop 2
## 0.002483444 0.003431709
explain(res)
This was a two-sample proportion test of the null hypothesis that the true population proportions are equal. Using a significance level of 0.05, we do not reject the null hypothesis, and cannot conclude that two population proportions are different from one another. The observed difference in proportions is 0.000948265282468285. The observed proportion for the first group is 0.00248344370860927 (30 events out of a total sample size of 12,080). For the second group, the observed proportion is 0.00343170899107756 (40, out of a total sample size of 11,656).
The confidence interval for the true difference in population proportions is (-0.0024163, 0.0005198). This interval will contain the true difference in population proportions 95 times out of 100.
The p-value for this test is 0.2197602. This, formally, is defined as the probability – if the null hypothesis is true – of observing a difference in sample proportions that is as or more extreme than the difference in sample proportions from this data set. In this case, this is the probability – if the true population proportions are equal – of observing a difference in sample proportions that is greater than 0.000948265282468285 or less than -0.000948265282468285.
There is no statistically significant difference between the proportions of accounts in each experiment group that ended up starting both a paid Analyze subscription and a paid Publish subscription, as the p-value is 0.219. Put another way, if there were truly no difference between the groups, we would expect to observe a difference at least this large roughly 1 time in 5.
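Part of the story here may be statistical power: the baseline rate of accounts paying for both products is tiny, so even a meaningful lift is hard to detect with roughly 12,000 accounts per group. A rough sketch with base R’s power.prop.test shows the sample size needed to reliably detect a difference of the size we observed:
# accounts per group needed to detect 0.248% -> 0.343% with 80% power
power.prop.test(p1 = 30 / 12080, p2 = 40 / 11656,
                sig.level = 0.05, power = 0.80)
# n comes out on the order of 50,000 accounts per group,
# far more than the ~12,000 we had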
To ensure we have a full picture of the variation group’s behavior in converting to paid for both products, let’s also look at how many trials of both products were started from the experiment landing page, and what those trials’ conversion rate to paid was.
users %>%
  mutate(has_multiproduct_trial = !is.na(trial_multi_product_bundle_name)) %>%
  group_by(has_multiproduct_trial) %>%
  summarise(users = n_distinct(account_id)) %>%
  kable() %>%
  kable_styling()
has_multiproduct_trial | users |
---|---|
FALSE | 23776 |
TRUE | 115 |
users %>%
  mutate(has_multiproduct_trial = !is.na(trial_multi_product_bundle_name)) %>%
  filter(has_multiproduct_trial) %>%
  group_by(account_id) %>%
  summarise(
    products = n_distinct(subscription_product),
    paid_subscriptions = n_distinct(subscription_id)
  ) %>%
  ungroup() %>%
  filter(products > 1) %>%
  summarise(users = n_distinct(account_id)) %>%
  kable() %>%
  kable_styling()
users |
---|
3 |
3 of the 115 bundle trials converted to paid subscriptions for both products, a conversion rate of 2.6%. For comparison, the trial-to-paid conversion rate for Publish-only trials across both groups was roughly 8-9%, and for Analyze-only trials roughly 10%.
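With only 3 conversions out of 115 bundle trials, the 2.6% figure carries a lot of uncertainty. An exact binomial confidence interval (a quick sketch) makes this concrete:
# exact 95% CI for the bundle trial to dual-paid conversion rate
binom.test(x = 3, n = 115)$conf.int
# roughly 0.5% to 7.5%, wide enough that the true rate could plausibly
# overlap the single-product benchmarks quoted above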
Finally, let’s also look at the MRR value of all converted trials per experiment group, to see if there was any overall difference in value between the two groups.
users %>%
  filter(!is.na(subscription_id)) %>%
  mutate(mrr_value = if_else(subscription_billing_cycle == "year",
                             subscription_amount / 12,
                             subscription_amount)) %>%
  group_by(experiment_group) %>%
  summarise(
    paying_user_count = n_distinct(account_id),
    total_mrr_value = round(sum(mrr_value), 2),
    mrr_value_per_account = round(total_mrr_value / paying_user_count, 2)
  ) %>%
  kable() %>%
  kable_styling()
experiment_group | paying_user_count | total_mrr_value | mrr_value_per_account |
---|---|---|---|
control | 655 | 25232.25 | 38.52 |
variant_1 | 654 | 27187.50 | 41.57 |
Given that both groups ended up with almost the same number of paying customers, the difference in MRR value per account between the two groups indicates that the statistically significant lift in Analyze trial starts led to more overall revenue. This could imply that the realized benefit of this experiment was greater exposure to Analyze, not a more convenient way to start trials for both products.
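If we wanted to formally test whether the per-account MRR difference is itself statistically meaningful, one option would be a Welch two-sample t-test on per-account MRR values (a sketch, assuming the columns as named in the query above):
# compare per-account MRR value between experiment groups
paying <- users %>%
  filter(!is.na(subscription_id)) %>%
  mutate(mrr_value = if_else(subscription_billing_cycle == "year",
                             subscription_amount / 12,
                             subscription_amount)) %>%
  group_by(experiment_group, account_id) %>%
  summarise(account_mrr = sum(mrr_value), .groups = "drop")

t.test(account_mrr ~ experiment_group, data = paying)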
Final Results
Given the above observations, the result of the experiment is that there is insufficient evidence to confirm the hypothesis. Though it appears the variation treatment led to more Analyze trials, and thus a slightly higher MRR value per paying account, the landing page underperformed compared to trials started from other locations, and the number of new users who ended up on a paid subscription to both products showed no statistically significant difference between the two groups.
We recommend pursuing an iteration of this experiment, examining both ways to increase click-throughs on the experiment CTAs to the solutions landing page, and ways to improve the performance of the solutions landing page itself via changes to messaging and to which plans are paired for the solution offered.