# Audience Growth of Analyze Users

In this analysis we will explore the growth rates of the total audiences of active Analyze users’ profiles.

## Data Collection

Tom has exported a set of JSON files containing the history of follower counts for 536 Instagram profiles. Let’s read them into this R session.

library(jsonlite)
library(purrr)
library(dplyr)  # mutate(), group_by(), summarise(), lag(), lead()
library(skimr)  # skim()

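# get filenames
# (assumption: the exported .json files sit in the working directory; adjust the path as needed)
filenames <- list.files(pattern = "\\.json$")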

# create empty data frame
users <- data.frame(day = character(),
followers = numeric(),
numberOfProfiles = numeric(),
user = character(),
stringsAsFactors = FALSE)

# loop through files
for (name in filenames) {

# read one user's history and take the user id from the file name
user <- read_json(name, simplifyVector = TRUE) %>%
  mutate(user = gsub(".json", "", name, fixed = TRUE))

users <- rbind(users, user)

}

# set column names
names(users) <- c("date", "followers", "profiles", "user_id")

# set date
users$date <- as.Date(users$date)

# save data
saveRDS(users, "analyze_user_followers.rds")

Now let’s summarise the data to get a sense of what it looks like.

# summarise follower counts
summary(users$followers)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##   -88135      875     5368    67337    27031 25501687

There are some missing values and negative values. Let’s remove these and replace them with the average of the preceding and following follower count for each user.

# fill missing values with avg
filled <- users %>%
  mutate(followers = ifelse(followers <= 0, NA, followers)) %>%
  group_by(user_id, date) %>%
  summarise(followers = mean(followers, na.rm = TRUE)) %>%
  mutate(prev_followers = lag(followers, 1),
         next_followers = lead(followers, 1),
         avg_followers = (prev_followers + next_followers) / 2,
         followers = coalesce(followers, avg_followers)) %>%
  select(-c(prev_followers:avg_followers)) %>%
  filter(!is.na(followers))

Now let’s summarise the follower counts again.

# summarise follower counts
summary(filled$followers)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
##        1     1343     6787    72772    30587 25501687
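The fill step that produced this summary can be illustrated on a small, made-up series. This is only a base R sketch: the helper `fill_with_neighbour_avg` is hypothetical and stands in for the `lag()`/`lead()` logic used on the real data, and the numbers are invented.

```r
# Hypothetical helper: replace NA with the mean of the previous and next value.
# Mirrors the lag()/lead() logic used on the real data, in base R only.
fill_with_neighbour_avg <- function(x) {
  n <- length(x)
  prev_x <- c(NA, x[-n])  # value one step earlier, like lag(x, 1)
  next_x <- c(x[-1], NA)  # value one step later, like lead(x, 1)
  avg <- (prev_x + next_x) / 2
  ifelse(is.na(x), avg, x)
}

# Made-up series: one negative count and one gap at the end
followers <- c(100, -5, 110, 120, NA)

# Treat non-positive counts as missing, as in the analysis
followers <- ifelse(followers <= 0, NA, followers)

filled_example <- fill_with_neighbour_avg(followers)
print(filled_example)
```

The negative count is replaced by the average of its two neighbours, while the trailing gap stays `NA` because it has no following value. In the analysis, rows like that are dropped by the final `filter(!is.na(followers))`.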

The cleaned follower counts look much better: the minimum is now 1 rather than a negative number. Now we want to assign these users into buckets based on the maximum number of followers they have.

# follower buckets
buckets <- c(0, 10000, 100000, 200000, 500000, 1000000, Inf)

# assign profiles to buckets
follower_buckets <- filled %>%
group_by(user_id) %>%
summarise(max_followers = max(followers, na.rm = TRUE)) %>%
mutate(bucket = cut(max_followers, buckets, dig.lab = 10))
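To show how `cut()` treats the bucket boundaries, here is a small sketch using the same break points on made-up follower counts. The values are chosen to sit on and around the edges and are not taken from the data, apart from the maximum reported in the summary above.

```r
# Same break points as in the analysis
buckets <- c(0, 10000, 100000, 200000, 500000, 1000000, Inf)

# Made-up maximum follower counts on and around the bucket boundaries
max_followers <- c(1, 10000, 10001, 250000, 1000000, 25501687)

# By default cut() builds right-closed intervals: (lower, upper]
bucket <- cut(max_followers, buckets, dig.lab = 10)
print(data.frame(max_followers, bucket))
```

Because the intervals are right-closed, a user with exactly 10,000 followers falls into `(0,10000]`, and a count of 0 would not fall into any bucket. That is not a concern here, since non-positive counts were removed earlier.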

## Distribution of Follower Counts

Let’s plot the distribution of follower counts for these users. We’ll use the maximum follower count for each user here. We see that there is a long tail of users with many followers.

## Distribution of Follower Growth

Let’s calculate the rolling 7-day and 30-day growth rates for each user.

# calculate growth rates
growth_rates <- filled %>%
group_by(user_id, date) %>%
summarise(followers = mean(followers)) %>%
mutate(followers_last_week = lag(followers, 7),
followers_last_month = lag(followers, 30),
weekly_growth_rate = followers / followers_last_week - 1,
monthly_growth_rate = followers / followers_last_month - 1) %>%
left_join(follower_buckets, by = "user_id")
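The growth rate calculation can be checked by hand on a short, made-up series. The sketch below uses a hypothetical base R helper, `growth_rate`, that does the same thing as `followers / lag(followers, k) - 1` for a single user. Note that `lag()` shifts by row position, so the calculation assumes one row per user per day with no missing days.

```r
# Hypothetical helper: growth relative to the value k rows earlier.
# Equivalent to x / lag(x, k) - 1 for a single user's daily series.
growth_rate <- function(x, k) {
  n <- length(x)
  if (k >= n) return(rep(NA_real_, n))
  lagged <- c(rep(NA_real_, k), x[seq_len(n - k)])
  x / lagged - 1
}

# Made-up daily follower counts: 1000 on day 1, growing by 10 per day
followers <- 1000 + 10 * (0:9)

weekly_growth_rate <- growth_rate(followers, 7)
print(round(weekly_growth_rate, 4))
```

The first seven values are `NA` because there is no observation seven rows earlier. On day 8 the rate is 1070 / 1000 - 1 = 0.07.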

Let’s look at the rolling 7-day growth rate of a single user.

The big changes in the growth rate of followers for this user came when they added profiles to, or removed profiles from, their Analyze account. This will be true for all users that add and remove profiles.

Because of these extreme jumps in the growth rates, we will take the median 7-day and 30-day growth rates for each user.

# get median growth rates
growth_by_user <- growth_rates %>%
group_by(user_id, bucket) %>%
summarise(users = n_distinct(user_id),
med_7_day_growth = median(weekly_growth_rate, na.rm = TRUE),
med_30_day_growth = median(monthly_growth_rate, na.rm = TRUE))
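To see why the median is used rather than the mean, consider a made-up series of weekly growth rates for one user. The numbers are invented for illustration: mostly steady growth of about 1%, plus one spike and one drop of the kind that adding or removing a profile might cause.

```r
# Made-up weekly growth rates for one user: steady growth of about 1%,
# one spike (profile added), one drop (profile removed), and one missing value
weekly_growth_rate <- c(0.010, 0.012, 0.009, 0.011, 2.500, 0.010, -0.600, 0.011, NA)

mean_growth   <- mean(weekly_growth_rate, na.rm = TRUE)
median_growth <- median(weekly_growth_rate, na.rm = TRUE)

print(c(mean = mean_growth, median = median_growth))
```

The two extreme values pull the mean far away from the typical rate, while the median stays close to 1%.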

## Weekly Growth Rate

Let’s plot the distribution of weekly growth rates for each follower bucket.

# summarise distributions
growth_by_user %>%
ungroup() %>%
filter(!is.na(med_7_day_growth) & med_7_day_growth <= 1) %>%
group_by(bucket) %>%
select(bucket, med_7_day_growth) %>%
skim()
## Skim summary statistics
##  n obs: 372
##  n variables: 2
##  group variables: bucket
##
## ── Variable type:numeric ─────────────────────────────────────────────────────────────────────────────────────────────
##            bucket         variable missing complete   n   mean     sd
##         (0,10000] med_7_day_growth       0      114 114 0.018  0.026
##    (10000,100000] med_7_day_growth       0      166 166 0.016  0.018
##   (100000,200000] med_7_day_growth       0       26  26 0.012  0.011
##   (200000,500000] med_7_day_growth       0       36  36 0.013  0.014
##  (500000,1000000] med_7_day_growth       0       12  12 0.0089 0.0079
##     (1000000,Inf] med_7_day_growth       0       18  18 0.012  0.0087
##           p0    p25    p50    p75  p100     hist
##     -0.00045 0.0021 0.01   0.024  0.2   ▇▂▁▁▁▁▁▁
##     -0.0014  0.0048 0.01   0.019  0.094 ▇▅▂▁▁▁▁▁
##      0.00049 0.0047 0.0065 0.012  0.042 ▇▃▃▂▁▁▁▂
##      0       0.0067 0.012  0.016  0.082 ▇▇▂▁▁▁▁▁
##      7.4e-05 0.0048 0.0068 0.0087 0.027 ▆▆▇▁▁▁▂▂
##  1e-06       0.0065 0.011  0.015  0.035 ▅▆▇▅▂▂▁▂

This gives us summary statistics for each follower bucket.

• For the 114 users with 0-10k followers, the median 7-day growth rate is 1% and the average is 1.8%.
• For the 166 users with 10-100k followers, the median 7-day growth rate is 1% and the average is 1.6%.
• For the 26 users with 100k-200k followers, the median 7-day growth rate is 0.65% and the average is 1.2%.
• For the 36 users with 200k-500k followers, the median 7-day growth rate is 1.2% and the average is 1.3%.
• For the 12 users with 500k-1m followers, the median 7-day growth rate is 0.68% and the average is 0.89%.
• For the 18 users with over 1m followers, the median 7-day growth rate is 1.1% and the average is 1.2%.

## Monthly Growth Rates

Let’s plot the distribution of monthly growth rates for each follower bucket.

# summarise distributions
growth_by_user %>%
ungroup() %>%
filter(!is.na(med_30_day_growth) & med_30_day_growth <= 1) %>%
group_by(bucket) %>%
select(bucket, med_30_day_growth) %>%
skim()
## Skim summary statistics
##  n obs: 367
##  n variables: 2
##  group variables: bucket
##
## ── Variable type:numeric ─────────────────────────────────────────────────────────────────────────────────────────────
##            bucket          variable missing complete   n  mean    sd
##         (0,10000] med_30_day_growth       0      111 111 0.093 0.11
##    (10000,100000] med_30_day_growth       0      165 165 0.097 0.12
##   (100000,200000] med_30_day_growth       0       26  26 0.08  0.13
##   (200000,500000] med_30_day_growth       0       36  36 0.081 0.081
##  (500000,1000000] med_30_day_growth       0       12  12 0.097 0.16
##     (1000000,Inf] med_30_day_growth       0       17  17 0.094 0.11
##           p0   p25   p50   p75 p100     hist
##      -0.0041 0.018 0.055 0.13  0.78 ▇▃▁▁▁▁▁▁
##      -0.0057 0.028 0.061 0.11  0.94 ▇▂▁▁▁▁▁▁
##       0.0036 0.022 0.045 0.09  0.65 ▇▂▁▁▁▁▁▁
##       0      0.037 0.06  0.1   0.43 ▇▆▃▁▁▁▁▁
##       0.015  0.026 0.036 0.089 0.59 ▇▂▁▁▁▁▁▁
##  -2e-04      0.041 0.06  0.11  0.45 ▇▅▂▁▁▁▁▁

Here are summary statistics for each follower bucket.

• For the 111 users with 0-10k followers, the median 30-day growth rate is 5.5% and the average is 9.3%.
• For the 165 users with 10-100k followers, the median 30-day growth rate is 6.1% and the average is 9.7%.
• For the 26 users with 100k-200k followers, the median 30-day growth rate is 4.5% and the average is 8%.
• For the 36 users with 200k-500k followers, the median 30-day growth rate is 6% and the average is 8.1%.
• For the 12 users with 500k-1m followers, the median 30-day growth rate is 3.6% and the average is 9.7%.
• For the 17 users with over 1m followers, the median 30-day growth rate is 6% and the average is 9.4%.