3 min read

Analyzing the Publish Analyze Bundle Experiment

TLDR

As of December 10, 40 users in the control group and 37 in the treatment are paying for Analyze and Publish.

Data Collection

We’ll use the SQL query below to retrieve data from Bigquery.

# connect to bigquery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data"
)

# define sql query
sql <- "
  select distinct
    e.user_id
    , e.anonymous_id
    , e.experiment_group
    , a.email
    , a.publish_user_id
    , a.analyze_user_id
    , s.customer_id
    , s.id as subscription_id
    , s.plan_id
    , s.product
  from segment_marketing.experiment_viewed e
  left join dbt_buffer.core_accounts a
    on e.user_id = a.id
  left join dbt_buffer.publish_users u
    on u.id = a.publish_user_id
  left join segment_stripe.customers_view c
    on u.stripe_customer_id = c.id
  left join dbt_buffer.stripe_paid_subscriptions s
    on c.id = s.customer_id
    and s.status = 'active'
    and s.created_at > e.original_timestamp
  left join dbt_buffer.stripe_plans p
    on s.plan_id = p.id
  where e.experiment_id = 'eid19_publish_analyze_bundle_ab'
"
  
# query BQ
users <- dbGetQuery(con, sql)

There are around 303K visitors in the experiment.

Exploratory Analysis

Let’s count the number of visitors in each experiment group.

# count visitors in groups
users %>% 
  group_by(experiment_group) %>% 
  summarise(visitors = n_distinct(anonymous_id)) %>% 
  mutate(prop = visitors / sum(visitors))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 2 x 3
##   experiment_group visitors  prop
##   <chr>               <int> <dbl>
## 1 control            142100 0.500
## 2 variant_1          142141 0.500

Good, it’s about a 50/50 split. Let’s see how many have an active publish subscription.

# count users with active publish subscription
users %>% 
  mutate(has_publish_sub = (!is.na(subscription_id) & product == "publish")) %>% 
  group_by(experiment_group, has_publish_sub) %>% 
  summarise(users = n_distinct(user_id)) %>% 
  mutate(prop = users / sum(users)) 
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## # A tibble: 4 x 4
## # Groups:   experiment_group [2]
##   experiment_group has_publish_sub users   prop
##   <chr>            <lgl>           <int>  <dbl>
## 1 control          FALSE           45547 0.977 
## 2 control          TRUE             1096 0.0235
## 3 variant_1        FALSE           45553 0.977 
## 4 variant_1        TRUE             1060 0.0227

Around 2.3% of users in each experiment group have an active Publish subscription. Now let’s see how many have active Analyze subscriptions.

# count users with active analyze subscription
users %>% 
  mutate(has_publish_sub = (!is.na(subscription_id) & product == "publish"),
         has_analyze_sub = (!is.na(subscription_id) & product == "analyze")) %>% 
  group_by(experiment_group, has_analyze_sub) %>% 
  summarise(users = n_distinct(user_id)) %>% 
  mutate(prop = users / sum(users)) 
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## # A tibble: 4 x 4
## # Groups:   experiment_group [2]
##   experiment_group has_analyze_sub users    prop
##   <chr>            <lgl>           <int>   <dbl>
## 1 control          FALSE           46439 0.997  
## 2 control          TRUE              159 0.00341
## 3 variant_1        FALSE           46408 0.996  
## 4 variant_1        TRUE              165 0.00354

Around 0.34% of users in the control group are paying for an Analyze subscription, and around 0.35% of users in the treatment group are paying for an Analyze subscription. Now let’s compare the percentage that are paying for both.

# count users paying for both publish and analyze
users %>% 
  filter(!is.na(product) & product != "reply" & !is.na(subscription_id)) %>% 
  group_by(user_id, experiment_group) %>% 
  summarise(products = n_distinct(product)) %>% 
  ungroup() %>% 
  filter(products > 1) %>% 
  group_by(experiment_group, products) %>% 
  summarise(users = n_distinct(user_id))
## `summarise()` regrouping output by 'user_id' (override with `.groups` argument)
## `summarise()` regrouping output by 'experiment_group' (override with `.groups` argument)
## # A tibble: 2 x 3
## # Groups:   experiment_group [2]
##   experiment_group products users
##   <chr>               <int> <int>
## 1 control                 2    45
## 2 variant_1               2    45

As of December 11, 45 users in the control group and 45 in the treatment are paying for Analyze and Publish.