6 min read

Audience Growth of Analyze Users

In this analysis we will explore the growth rates of the total audiences of active Analyze users’ profiles.

Data Collection

Tom has exported a file of json files containing the history of follower counts for 536 Instagram profiles. Let’s read them into this R session.

library(jsonlite)
library(purrr)

# get filenames
filenames <- list.files("~/Downloads/analyze/users/", pattern = "*.json", full.names = TRUE) 

# create empty data frame
users <- data.frame(day = character(),
                    followers = numeric(), 
                    numberOfProfiles = numeric(), 
                    user = character(),
                    stringsAsFactors = FALSE) 

# loop through files 
for (name in filenames) {
  
  user <- read_json(name, simplifyVector = TRUE) %>% 
    mutate(user = gsub("/Users/julianwinternheimer/Downloads/analyze/users//", "", name),
         user = gsub(".json", "", user))
  
  users <- rbind(users, user)
  
}

# set column names
names(users) <- c("date", "followers", "profiles", "user_id")

# set date
users$date <- as.Date(users$date)

# save data
saveRDS(users, "analyze_user_followers.rds")

Now let’s summarise the data to get a sense of what it looks like.

# skim followed by count
summary(users$followers)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##   -88135      875     5368    67337    27031 25501687

There are some missing values and negative values. Let’s remove these and replace them with the average of the preceding and following follower count for each user.

# fill missing values with avg
filled <- users %>% 
  mutate(followers = ifelse(followers <= 0, NA, followers)) %>% 
  group_by(user_id, date) %>% 
  summarise(followers = mean(followers, na.rm = TRUE)) %>% 
  mutate(prev_followers = lag(followers, 1),
         next_followers = lead(followers, 1),
         avg_followers = (prev_followers + next_followers) / 2,
         followers = coalesce(followers, avg_followers)) %>% 
  select(-c(prev_followers:avg_followers)) %>% 
  filter(!is.na(followers))
## `summarise()` regrouping output by 'user_id' (override with `.groups` argument)

Now let’s summarise the follower counts again.

# skim followed by count
summary(filled$followers)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        1     1343     6787    72772    30587 25501687

That looks much better. Now we want to assign these users into buckets based on the maximum number of followers they have.

# follower buckets
buckets <- c(0, 10000, 100000, 200000, 500000, 1000000, Inf)

# assign profiles to buckets
follower_buckets <- filled %>% 
  group_by(user_id) %>% 
  summarise(max_followers = max(followers, na.rm = TRUE)) %>% 
  mutate(bucket = cut(max_followers, buckets, dig.lab = 10))
## `summarise()` ungrouping output (override with `.groups` argument)

Distribution of Follower Counts

Let’s plot the distribution follower counts for these users We’ll use the maximum follower count for each user here. We see that there is a long tail of user with many followers.

Distribution of Follower Growth

Let’s calculate the rolling 7-day and 30-day growth rates for each user

# calculate growth rates
growth_rates <- filled %>% 
  group_by(user_id, ,date) %>% 
  summarise(followers = mean(followers)) %>% 
  mutate(followers_last_week = lag(followers, 7),
         followers_last_month = lag(followers, 30),
         weekly_growth_rate = followers / followers_last_week - 1,
         monthly_growth_rate = followers / followers_last_month - 1) %>% 
  left_join(follower_buckets, by = "user_id")
## `summarise()` regrouping output by 'user_id' (override with `.groups` argument)

Let’s look at the rolling 7-day growth rate of a single user

The big changes in the growth rate of followers for this user came when they added and removed profiles to the anlayze account. This will be true for all users that add and remove profiles.

Because of these extreme jumps in the growth rates, we will take the median 7-day and 30-day growth rates for each user.

# get median growth rates
growth_by_user <- growth_rates %>% 
  group_by(user_id, bucket) %>% 
  summarise(users = n_distinct(user_id),
            med_7_day_growth = median(weekly_growth_rate, na.rm = TRUE),
            med_30_day_growth = median(monthly_growth_rate, na.rm = TRUE))
## `summarise()` regrouping output by 'user_id' (override with `.groups` argument)

Weekly Growth Rate

Let’s plot the distribution of weekly growth rates for each follower bucket.

# summarise distributions
growth_by_user %>% 
  ungroup() %>% 
  filter(!is.na(med_7_day_growth) & med_7_day_growth <= 1) %>% 
  group_by(bucket) %>% 
  select(bucket, med_7_day_growth) %>% 
  skim()
Table 1: Data summary
Name Piped data
Number of rows 372
Number of columns 2
_______________________
Column type frequency:
numeric 1
________________________
Group variables bucket

Variable type: numeric

skim_variable bucket n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
med_7_day_growth (0,10000] 0 1 0.02 0.03 0 0.00 0.01 0.02 0.20 ▇▁▁▁▁
med_7_day_growth (10000,100000] 0 1 0.02 0.02 0 0.00 0.01 0.02 0.09 ▇▂▁▁▁
med_7_day_growth (100000,200000] 0 1 0.01 0.01 0 0.00 0.01 0.01 0.04 ▇▂▂▁▁
med_7_day_growth (200000,500000] 0 1 0.01 0.01 0 0.01 0.01 0.02 0.08 ▇▂▁▁▁
med_7_day_growth (500000,1000000] 0 1 0.01 0.01 0 0.00 0.01 0.01 0.03 ▅▇▁▁▂
med_7_day_growth (1000000,Inf] 0 1 0.01 0.01 0 0.01 0.01 0.02 0.03 ▇▇▂▂▁

This gives us summary statistics for each profile bucket.

  • For the 114 users with 0-10k followers, the median 7-day growth rate is 1% and the average is 1.8%.
  • For the 166 users with 10-100k followers, the median 7-day growth rate is 1% and the average is 1.6%.
  • For the 26 users with 100k-200k followers, the median 7-day growth rate is 0.65% and the average is 1.2%.
  • For the 36 users with 200k-500k followers, the median 7-day growth rate is 1.2% and the average is 1.3%.
  • For the 12 users with 500k-1m followers, the median 7-day growth rate is 0.87% and the average is 0.89%.
  • For the 18 users with over 1m followers, the median 7-day growth rate is 1.5% and the average is 1.2%.

Monthly Growth Rates

Let’s plot the distribution of monthly growth rates for each follower bucket.

# summarise distributions
growth_by_user %>% 
  ungroup() %>% 
  filter(!is.na(med_30_day_growth) & med_30_day_growth <= 1) %>% 
  group_by(bucket) %>% 
  select(bucket, med_30_day_growth) %>% 
  skim()
Table 2: Data summary
Name Piped data
Number of rows 367
Number of columns 2
_______________________
Column type frequency:
numeric 1
________________________
Group variables bucket

Variable type: numeric

skim_variable bucket n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
med_30_day_growth (0,10000] 0 1 0.09 0.11 0.00 0.02 0.06 0.13 0.78 ▇▂▁▁▁
med_30_day_growth (10000,100000] 0 1 0.10 0.12 -0.01 0.03 0.06 0.11 0.94 ▇▁▁▁▁
med_30_day_growth (100000,200000] 0 1 0.08 0.13 0.00 0.02 0.04 0.09 0.65 ▇▁▁▁▁
med_30_day_growth (200000,500000] 0 1 0.08 0.08 0.00 0.04 0.06 0.10 0.43 ▇▂▁▁▁
med_30_day_growth (500000,1000000] 0 1 0.10 0.16 0.01 0.03 0.04 0.09 0.59 ▇▁▁▁▁
med_30_day_growth (1000000,Inf] 0 1 0.09 0.11 0.00 0.04 0.06 0.11 0.45 ▇▃▁▁▁

Here are summary statistics for each profile bucket.

  • For the 114 users with 0-10k followers, the median 30-day growth rate is 5.5% and the average is 9.3%.
  • For the 166 users with 10-100k followers, the median 30-day growth rate is 6.1% and the average is 9.7%.
  • For the 26 users with 100k-200k followers, the median 30-day growth rate is 4.5% and the average is 8%.
  • For the 36 users with 200k-500k followers, the median 30-day growth rate is 6% and the average is 8.1%.
  • For the 12 users with 500k-1m followers, the median 30-day growth rate is 3.6% and the average is 9.7%.
  • For the 18 users with over 1m followers, the median 30-day growth rate is 6% and the average is 9.4%.