Reply Login Flow Experiment Results

On Wednesday, February 6, we began an experiment in which 50% of visitors to the Reply marketing site were directed to a new signup flow that had been developed by the Core team. This was a test of non-inferiority, i.e. we were testing whether the new signup flow performs no worse than the old signup flow.
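To make the non-inferiority framing concrete, here is a minimal sketch in R. The counts and the 2 percentage point margin are hypothetical and chosen only for illustration; they are not results from this experiment. The idea is that we would call the new flow non-inferior if the lower bound of a one-sided confidence interval for the difference in conversion rates (new minus old) sits above the negative margin.

```r
# Hypothetical counts, for illustration only (not data from this experiment)
signups  <- c(new_flow = 240, old_flow = 236)
visitors <- c(new_flow = 2000, old_flow = 2000)

# Largest drop in conversion rate we would be willing to tolerate (assumed)
margin <- 0.02

# One-sided 95% confidence interval for p_new - p_old
test <- prop.test(signups, visitors,
                  alternative = "greater",
                  conf.level = 0.95,
                  correct = FALSE)
lower_bound <- test$conf.int[1]

# Non-inferior if even the pessimistic end of the interval is above -margin
non_inferior <- lower_bound > -margin
```

As it turned out, we could not make this kind of direct group comparison here, which is why the rest of this analysis relies on a time series approach instead.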

The results of this experiment suggest that the new signup flow is not inferior and does not negatively impact the number of people that sign up for Reply and complete the onboarding flow. We will therefore move ahead and direct all Reply visitors to the new signup flow.

Measuring Impact on Signups

First let’s plot the number of new Reply users over time.

# load the BigQuery client (provides query_exec)
library(bigrquery)

# set project id
project_id <- "buffer-data"

# sql query
sql_string <- "
  select 
    date(profile_created_at) as signup_date
    , count(distinct id) as reply_users
  from buda_reply_dbt.users
  group by 1
  order by 1 desc
  limit 60
"

# execute the query and store the result
reply_users <- query_exec(sql_string, project = project_id, use_legacy_sql = FALSE, max_pages = 10)

We can see that there is clearly some seasonality, i.e. the number of signups is dependent on the day of the week. Because we don’t quite know which experiment group each user was in (more on that below), we can run a causal impact analysis to estimate the effect that the new signup flow had.

Essentially, we take the number of daily signups before the experiment started and forecast it into the future. Then we compare the forecast with the actual observed values. The difference between the counterfactual (what signups would have been without the experiment) and the observed number of signups is our estimated effect size.
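The idea can be sketched in a few lines of base R. This is a deliberately naive stand-in and not what CausalImpact does internally (it fits a Bayesian structural time series model): here the counterfactual is simply the pre-period average for each day of the week. The data is simulated, and every name below is hypothetical.

```r
# Simulated daily signups with a weekly pattern (hypothetical data)
set.seed(42)
dates   <- seq(as.Date("2019-01-01"), by = "day", length.out = 56)
weekday <- as.POSIXlt(dates)$wday              # 0 = Sunday, ..., 6 = Saturday
weekly_pattern <- c(6, 12, 14, 14, 13, 11, 7)  # assumed mean signups per weekday
signups <- rpois(length(dates), lambda = weekly_pattern[weekday + 1])

# Split the series at the experiment start date
start_date <- as.Date("2019-02-06")
is_pre     <- dates < start_date

# Naive counterfactual: the pre-period mean for each day of the week
weekday_means  <- tapply(signups[is_pre], weekday[is_pre], mean)
counterfactual <- as.numeric(weekday_means[as.character(weekday[!is_pre])])

# Estimated effect: observed minus counterfactual, per day and cumulatively
pointwise_effect  <- signups[!is_pre] - counterfactual
cumulative_effect <- cumsum(pointwise_effect)
```

A cumulative effect that hovers around zero would be consistent with the new flow having no impact on signups.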

The “pre-intervention period” includes the dates before the experiment started on February 6. We use the daily signups from the pre-intervention period to create our forecast of what signups would have looked like without the new signup flow. The model includes a seven-day seasonal component (nseasons = 7) to account for the day-of-week pattern.
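As a rough sketch, the two periods can be expressed as pairs of start and end dates. The date range below is a hypothetical stand-in for the dates in our data, and only the February 6 start date comes from the experiment itself. CausalImpact accepts periods in this form when the series is indexed by date.

```r
# Hypothetical range of dates covered by the data
signup_dates     <- seq(as.Date("2018-12-15"), as.Date("2019-02-20"), by = "day")
experiment_start <- as.Date("2019-02-06")

# Each period is a vector of its first and last date
pre.period  <- c(min(signup_dates), experiment_start - 1)
post.period <- c(experiment_start, max(signup_dates))
```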

To perform inference, we run the analysis using the CausalImpact function from the CausalImpact package.

# load the package
library(CausalImpact)

# run analysis
impact <- CausalImpact(reply_ts, pre.period, post.period, model.args = list(niter = 5000, nseasons = 7))

# plot impact
plot(impact) +
  labs(title = "Impact on Reply Signups", subtitle = "There is no evidence of a negative impact on signups") +
  buffer_theme()

The pre-intervention period includes all data to the left of the first vertical dotted line, and the post-intervention period includes all data to the right of the second vertical dotted line.

The top panel in the graph shows the actual observed data (the black line) as well as the counterfactual, which is our best guess at what the number of signups would have been had we not introduced the new signup flow.

The second panel displays the estimated effect that the experiment had on signups each day, and the bottom panel shows the cumulative effect over time on signups.

We can see that there does not appear to be evidence of a negative effect on signups. Now, let’s take the same approach but look at complete_onboarding_success events.

Onboarding Completions

We’ll gather the raw events from Redshift and use the same methods to measure the experiment’s effect over time.

select 
  date(a.created_at) as date
  , m.value as visitor_id
  , a.id as event_id
from buda.actions_taken a
left join buda.actions_taken_metadata m
  on a.id = m.action_taken_id
  and m.name = 'visitorId'
where a.action = 'complete_onboarding_success'
and a.created_at >= current_date - 60

Now let’s plot the number of events that occurred each day.
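Before plotting, the raw events need to be rolled up into one row per day. Here is a minimal base R sketch of that step. The events data frame is a tiny hypothetical stand-in for the query result, and filling in days without events is our assumption about what a regular daily series needs.

```r
# Hypothetical raw events, one row per event (stand-in for the query result)
events <- data.frame(
  date     = as.Date(c("2019-02-05", "2019-02-05", "2019-02-06", "2019-02-08")),
  event_id = c("e1", "e2", "e3", "e4"),
  stringsAsFactors = FALSE
)

# Count distinct events per day
daily <- aggregate(events["event_id"],
                   by  = list(date = events$date),
                   FUN = function(x) length(unique(x)))
names(daily)[2] <- "events"

# Fill in days with no events so the series has no gaps
all_days <- data.frame(date = seq(min(events$date), max(events$date), by = "day"))
daily <- merge(all_days, daily, by = "date", all.x = TRUE)
daily <- daily[order(daily$date), ]
daily$events[is.na(daily$events)] <- 0
```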

We can see that the number of events has increased slightly since we started the experiment. This corresponds with the increase in signups that we saw earlier. Let’s run a causal impact analysis.

# run analysis
impact <- CausalImpact(events_ts, pre.period, post.period, model.args = list(niter = 5000, nseasons = 7))

# plot impact
plot(impact) +
  labs(title = "Impact on Onboarding Complete Events", subtitle = "There is no evidence of a negative impact on onboarding") +
  buffer_theme()

Similar to what we saw with signups, there does not appear to be evidence suggesting that the experiment negatively impacted the number of people that completed the onboarding flow.

Thoughts on Experiment Setup

Because we don’t use the Einstein library or something comparable in Reply, we made the decision to randomly assign visitors to experiment groups using actions_taken events. The action name of such an event is either ab_signup_flow_test:old_flow or ab_signup_flow_test:new_flow. Let’s quickly gather all of these events created since February 6, the day we started the experiment.

select
  a.id as action_id
  , v.value as visitor_id
  , left(a.action, charindex(':', a.action) - 1) as experiment_name
  , right(a.action, len(a.action) - charindex(':', a.action)) as "group"
from buda.actions_taken a
left join buda.actions_taken_metadata v
  on a.id = v.action_taken_id
  and v.name = 'visitorId'
where a.action in ('ab_signup_flow_test:old_flow', 'ab_signup_flow_test:new_flow')
and date(a.created_at) >= '2019-02-06' 
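group by 1,2,3,4

We also gather the account creation and onboarding completion events for each visitor over the same period.

select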
  v.value as visitor_id
  , date(min(a.created_at)) as signup_date
  , count(distinct case when a.action = 'create_account_success' then a.id else null end) as account_creations
  , count(distinct case when a.action = 'complete_onboarding_success' then a.id else null end) as onboarding_completed
from buda.actions_taken a
left join buda.actions_taken_metadata v
  on a.id = v.action_taken_id
  and v.name = 'visitorId'
where a.action in ('create_account_success', 'complete_onboarding_success')
and a.created_at >= '2019-02-06'
group by 1
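The first query splits each action name at the colon to get the experiment name and the group. For anyone who would rather do that step in R after pulling the raw action names, a small sketch might look like this. The vector below just repeats the two action names from the query.

```r
# The two action names used to record experiment assignment
actions <- c("ab_signup_flow_test:old_flow", "ab_signup_flow_test:new_flow")

# Everything before the first colon is the experiment name,
# everything after it is the group
experiment_name <- sub(":.*$", "", actions)
group           <- sub("^[^:]*:", "", actions)
```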

Now let’s join the two datasets and count the number of users that completed onboarding actions that were in each experiment group.

# join data
onboarding_events <- onboarding_events %>% 
  left_join(test_groups, by = "visitor_id")

# count visitors in each group (NA = not assigned to a group)
onboarding_events %>% 
  group_by(group) %>% 
  summarise(visitors = n_distinct(visitor_id))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
##   group    visitors
##   <chr>       <int>
## 1 new_flow       16
## 2 old_flow       23
## 3 <NA>          134
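To show what the left join is doing, here is a small base R sketch on made-up data. The visitor ids and values are hypothetical; only the counts in the last line come from the output above. Visitors with a success event but no assignment event come out of the join with a missing group.

```r
# Hypothetical stand-ins for the two query results
test_groups <- data.frame(
  visitor_id = c("v1", "v2"),
  group      = c("new_flow", "old_flow"),
  stringsAsFactors = FALSE
)
onboarding_events <- data.frame(
  visitor_id           = c("v1", "v2", "v3", "v4"),
  onboarding_completed = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Left join: keep every visitor with a success event, matched or not
joined <- merge(onboarding_events, test_groups, by = "visitor_id", all.x = TRUE)

# Visitors per group, with missing assignments as their own category
visitors_by_group <- table(joined$group, useNA = "ifany")

# Share of visitors in the toy data without an experiment group
unassigned_share <- mean(is.na(joined$group))

# The same share using the counts reported above: 134 of 173 visitors
observed_unassigned_share <- 134 / (16 + 23 + 134)
```

With the counts above, roughly 77% of visitors with a success event have no group, which is why we can't compare the two groups directly.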

There appear to be 134 visitors that have create_account_success or complete_onboarding_success events but have not been assigned to a group. Here are their visitor ids:

onboarding_events %>% 
  filter(is.na(group)) %>% 
  dplyr::select(visitor_id, group, completed_onboarding) %>% 
  unique()
##                               visitor_id group completed_onboarding
## 1   0e7ef03e-140f-42b1-8f62-9d764cde244b  <NA>                FALSE
## 2   33fb3547-5218-4ffc-aacb-9ed0c4bf366c  <NA>                 TRUE
## 3   128ab058-ebe8-4065-8dab-a714c02eaa27  <NA>                FALSE
## 4   5ff81ca8-2c74-40a9-9d1d-f9f171b2c122  <NA>                FALSE
## 5   176c92d1-8461-4e85-8d7d-588eb448c449  <NA>                FALSE
## 6   1970fad8-d6b4-49ac-8929-a4c48c247e31  <NA>                FALSE
## 7   e40f1fa0-8021-415a-80ec-570b2c8c93ae  <NA>                FALSE
## 8   ac2d5419-5a28-4e87-bfd2-8ae94e2cfbc5  <NA>                FALSE
## 9   fcdedf54-a895-47f0-8b75-34d0d84c4889  <NA>                 TRUE
## 10  6e6f079d-0d6d-447a-8bf1-fa173e821598  <NA>                 TRUE
## 11  ad926e66-d716-4cfa-b326-bee4ef5691b2  <NA>                 TRUE
## 12  defef137-1a94-4195-9b95-e44cf3d09f25  <NA>                 TRUE
## 13  ac357212-42c7-401a-965b-e4839ee3c637  <NA>                FALSE
## 14  4ff7b1a4-5b8e-4022-bd39-706be850ba8d  <NA>                FALSE
## 15  24d1172a-e5c4-4deb-9d68-926124f5d7a4  <NA>                FALSE
## 16  34270feb-18ff-4088-a4a4-1e1034f37153  <NA>                FALSE
## 17  8735c0ad-b571-48af-9119-b4cd1da2de5b  <NA>                FALSE
## 18  d9aa3c46-781f-4686-bd61-575d1a72cd9c  <NA>                 TRUE
## 19  73cfda34-197a-4fe0-a423-782150e79c42  <NA>                 TRUE
## 20  973d3355-79c6-4fe8-a8d2-c4a435bbe5b0  <NA>                 TRUE
## 21  6c64f4c5-18b7-4b3d-88fc-c63364a6ccc5  <NA>                 TRUE
## 22  ccd6573b-03fa-4904-af6f-bdf7d9519bd6  <NA>                FALSE
## 23  114854d5-09d4-44b0-b502-801b36b66b54  <NA>                FALSE
## 24  6995c2eb-18bb-47dc-a275-696c4437b955  <NA>                FALSE
## 25  602f213d-b826-42af-afa1-43f6aba02cfc  <NA>                 TRUE
## 26  89c768ce-164b-4822-9066-91288555bd60  <NA>                FALSE
## 27  cfa30769-314b-4de5-b233-1859f82f0e1f  <NA>                 TRUE
## 28  7bd2f76e-e164-4321-8b9f-e75459607272  <NA>                FALSE
## 29  47ee785a-8618-490b-8df0-5ff36592768e  <NA>                 TRUE
## 30  a43dbe72-2fe3-4356-8662-143fc4460b0c  <NA>                 TRUE
## 31  f7b469fb-0bc7-417a-81e5-5d18a12b1483  <NA>                FALSE
## 32  38854c77-801a-4c12-8f89-6360c92c9de2  <NA>                 TRUE
## 33  030d0754-48cb-419b-918a-32f291723cd4  <NA>                 TRUE
## 34  9e533544-d823-4fc6-8556-37f320313eff  <NA>                 TRUE
## 35  a10e1aab-ed1c-456b-b270-f95b0882e912  <NA>                 TRUE
## 36  b8293d04-7f48-4092-96f6-b9886953da19  <NA>                FALSE
## 37  02a4dbca-8fdc-4e94-8723-cacdd9de8bbc  <NA>                 TRUE
## 38  f79e0e82-cf25-4335-9473-7fb26f772c02  <NA>                 TRUE
## 39  23aeaf74-9436-46a9-bae2-712644bc29be  <NA>                 TRUE
## 40  163b2656-ac59-4f96-b530-cb5bc0a4f46b  <NA>                FALSE
## 41  2be5e2c2-24a0-4fcd-86e2-f6151126abec  <NA>                 TRUE
## 42  a96ed07d-93c8-47db-86a1-e964b625ff4e  <NA>                FALSE
## 43  e26ae21b-f968-4aca-8081-0be046513d61  <NA>                FALSE
## 44  a733c571-1bfa-456a-b53d-5107525df81f  <NA>                 TRUE
## 45  cea1b374-de45-4056-a072-95da08a5c9d8  <NA>                 TRUE
## 46  02f0f57f-88d0-479f-b26d-4d613381af54  <NA>                FALSE
## 47  d1496bc8-1ac1-4725-9b20-b6b11c901834  <NA>                 TRUE
## 48  23eb6aa0-3e1e-443d-9b8f-0c61ffb2b36f  <NA>                FALSE
## 49  552e928a-f785-4564-89a7-ec64ce859e6f  <NA>                 TRUE
## 50  f7548d1c-a5b4-4052-9282-410eed166849  <NA>                FALSE
## 51  2a6cb173-61ac-426d-a103-d4f2097bf6ea  <NA>                FALSE
## 52  73a777d1-4c47-453f-9cd5-62a4a704d40d  <NA>                FALSE
## 53  d944faf4-da70-4984-b6b2-171bf2075dbc  <NA>                 TRUE
## 54  c99580f5-e6b2-4773-806d-b410a028585b  <NA>                FALSE
## 55  fcc7b332-3f4d-4a34-989e-bb024001e9b2  <NA>                 TRUE
## 56  4b121e16-5f16-4d2d-8708-bcc7e5a6030d  <NA>                FALSE
## 57  fa967281-fafc-4d3d-8fb7-4af8e939090c  <NA>                FALSE
## 58  0be0f360-7db5-48b6-9b5b-bee8aa5cf6a1  <NA>                FALSE
## 59  77ce515d-5eb4-46b3-a2bc-5e14ff0b9e6b  <NA>                 TRUE
## 60  45c73c04-44e0-4b2a-8b88-57edbd2bf896  <NA>                 TRUE
## 61  a4756575-87fd-44a6-b02e-582130fe9b83  <NA>                 TRUE
## 62  f7f4eb0d-6a15-476d-84aa-840f1593e99d  <NA>                FALSE
## 63  faf67496-4342-4925-98b8-41e986dcf441  <NA>                FALSE
## 64  7e34ca61-a9cd-4160-abe6-2d5cb1703379  <NA>                FALSE
## 65  3e3129cb-dfc7-4704-b2d2-43134afe7d32  <NA>                FALSE
## 66  a02e76fd-57e0-4a35-80f9-9517f18b614e  <NA>                 TRUE
## 67  59ddde38-241f-4cbf-a4c2-3a30d12ebf77  <NA>                 TRUE
## 68  f5e40b9e-e608-4893-956a-b2724cc0d43d  <NA>                FALSE
## 69  03714413-e6ab-4fd1-a0b6-c115b3e4b128  <NA>                FALSE
## 70  5bd83ebe-6f40-4be1-a66a-980b875c3c92  <NA>                FALSE
## 71  cd1df4f8-dc03-47c9-8786-8f5ab1e376fb  <NA>                 TRUE
## 72  9a9315b4-5c64-4bdd-a303-a24df260a409  <NA>                 TRUE
## 73  28100e15-df35-4d37-82f2-0db3bfbac385  <NA>                 TRUE
## 74  f15b44a0-4047-4e94-a1cc-a58a3f7a7901  <NA>                 TRUE
## 75  a99e18da-71a4-40d0-8ad4-a83c0cdd68d5  <NA>                 TRUE
## 76  842dfb3d-31aa-4c2f-b7fd-b33c6fc1380e  <NA>                FALSE
## 77  3027b5ca-9dd4-46ee-bf0f-642641ba4233  <NA>                 TRUE
## 78  a05361af-ccb9-4af0-afbd-863b6cdccb87  <NA>                FALSE
## 79  6a12fdd7-17c2-4577-8304-792fab040f5a  <NA>                FALSE
## 80  3f41a360-c8c9-4a2d-8867-447bbf7dbd1f  <NA>                FALSE
## 81  e67b061f-739d-4398-825e-6fb180ae9690  <NA>                FALSE
## 82  419b9fbd-2cfd-4e29-b444-5875e717da5d  <NA>                 TRUE
## 83  cd218279-5875-4383-b476-d4918e2c76c8  <NA>                 TRUE
## 84  93611416-cc27-405c-be4c-b9dcb79f104f  <NA>                 TRUE
## 85  90d445f2-9231-4bc8-9df2-de7d5b9f9a6d  <NA>                 TRUE
## 86  17acf001-bb6f-4f65-a320-4111649de044  <NA>                 TRUE
## 87  2f469bf2-c479-40b4-92d0-70adacf88a76  <NA>                 TRUE
## 88  a871992c-11ae-49cf-b55b-0af448e299af  <NA>                FALSE
## 89  3034938e-3566-4b74-9f40-f4ac3f938f51  <NA>                FALSE
## 90  657482ad-9746-4c78-81c8-65e5a3707daf  <NA>                FALSE
## 91  87c2931e-5808-4299-986b-16de0f21f569  <NA>                 TRUE
## 92  c34e7a20-2b0c-4ae4-8551-bbd23d3040af  <NA>                 TRUE
## 93  8ad214d5-adaa-4dc7-9601-a2ff221ab5eb  <NA>                FALSE
## 94  81bf138e-7cd4-4a35-bf73-b13a63e53aa8  <NA>                FALSE
## 95  83a9828e-4503-45f2-ad86-a0cc5036ceb6  <NA>                FALSE
## 96  641f0a47-d5e9-4bd6-a929-47109aacc065  <NA>                 TRUE
## 97  de34ba83-0f1e-47e3-893c-de24611055aa  <NA>                 TRUE
## 98  b2451e71-7b38-4e79-94e7-0372beae4bb6  <NA>                FALSE
## 99  479c34a4-9d1e-442a-9906-773ce080e99a  <NA>                FALSE
## 100 7f1dc248-9883-4c25-9e1b-15c7e59aa076  <NA>                FALSE
## 101 13776a80-b948-4b83-b862-55a04ee6026e  <NA>                 TRUE
## 102 61aa0dab-c152-4286-863f-07c94f675ce0  <NA>                FALSE
## 103 d7891605-a8b3-4a0e-ba04-99f59563035a  <NA>                 TRUE
## 104 9d65dafb-6d8a-4add-a486-4fb721f41ab7  <NA>                 TRUE
## 105 a1de1e02-8d16-4c16-91bd-7e6eceef9613  <NA>                FALSE
## 106 829266b1-d805-47f6-bdca-0d888a402c9a  <NA>                 TRUE
## 107 6bf6dc20-7dda-4324-9893-00d693ad0cfc  <NA>                FALSE
## 108 f5efa976-5820-4fa9-8d2a-8751a79124c1  <NA>                 TRUE
## 109 0d06d6ab-9f85-4b26-8be3-46646210d383  <NA>                FALSE
## 110 6526ea81-a940-4e66-bd78-711871020fbf  <NA>                 TRUE
## 111 38e507a2-22d5-427d-91f9-a3fb8ce91d7f  <NA>                 TRUE
## 112 f38037ab-0ec6-402f-b072-7656ca260fe2  <NA>                 TRUE
## 113 4ae16709-4ef8-4873-ac3b-8c31dba408a9  <NA>                FALSE
## 114 465eb369-c911-408b-a9a5-dccfc8c25a5d  <NA>                FALSE
## 115 f302adb6-6742-4a7a-8024-4493b44ba406  <NA>                 TRUE
## 116 939247a6-edfe-4893-b229-0df3057c85d0  <NA>                FALSE
## 117 23952256-cbed-40f9-8c9d-cfb797d1fc39  <NA>                 TRUE
## 118 39fc539c-cdd0-47f7-a3a5-c6deae04d3f3  <NA>                FALSE
## 119 2c318434-6b16-4006-8577-178144fefcd5  <NA>                FALSE
## 120 91c77fc9-9e1f-4509-a9c2-eaa3196a9cd5  <NA>                FALSE
## 121 672f7025-d97f-46a0-ba28-eb61ca32bc56  <NA>                 TRUE
## 122 df67fd2b-0f7f-403b-b3e9-7316cc13a2e2  <NA>                 TRUE
## 123 6e3a7260-dc53-4112-afa0-2fa66633eb9b  <NA>                 TRUE
## 124 561d7258-ae2c-4046-a8bb-ed857d8a5b0a  <NA>                 TRUE
## 125 6f68ff78-52f4-4890-a692-d0891fa0603d  <NA>                FALSE
## 126 5122dedb-2d0e-42ff-bd1b-6a9545b39ff1  <NA>                 TRUE
## 127 17872429-72de-46fa-bb9e-29f3562dde34  <NA>                FALSE
## 128 6f1ce588-e597-4bee-8e09-699006f59b8c  <NA>                 TRUE
## 129 aaff4ab7-be26-4f85-94ce-6fea49eb5479  <NA>                FALSE
## 130 f48ac8c8-0c19-4ee4-b935-9ea6ffff7a7a  <NA>                 TRUE
## 131 c449eba4-63ea-47cc-85c9-858111b67746  <NA>                FALSE
## 132 3367f798-da06-4093-b2fb-2b48d821d28e  <NA>                 TRUE
## 133 4b6eda80-0a66-44b7-b244-13a4d8403266  <NA>                 TRUE
## 134 a8c32ace-5fb5-483c-a5b5-37d225fd2718  <NA>                 TRUE

In future experiments it will be important to make sure that every user who triggers a “success event” is assigned to one of the experiment groups. :)