# Audience Growth Rate of Instagram Profiles

In this analysis we will explore the growth rates of the Instagram profiles of active Analyze users. These profiles all have more than 100k followers as of today.

## Data Collection

Tom has exported a folder of JSON files containing the history of follower counts for 536 Instagram profiles. Let’s read them into this R session.

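library(jsonlite)
library(janitor)
library(dplyr)
library(purrr)
library(skimr)

# clean column names
# (assumes `profiles`, with one row per profile, was loaded earlier in the session)
profiles <- clean_names(profiles) %>%
  mutate(date = as.Date(day)) %>%
  rename(current_followers = followed_by_count) %>%
  select(user_id, current_followers)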

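# set file names
# (assumes `path` holds the folder containing the exported JSON files)
# `pattern` is a regular expression, not a glob, so match the extension explicitly
files <- dir(path, pattern = "\\.json$")

# loop through files and map data
followers <- files %>%
  map_df(~fromJSON(file.path(path, .), flatten = TRUE))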

# clean names
followers <- clean_names(followers)

# set date
followers <- followers %>%
  mutate(date = as.Date(day)) %>%
  select(-day) %>%
  inner_join(profiles, by = "user_id")

# save data
saveRDS(followers, "ig_followers_100k.rds")
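
The file-reading step above can be tried on its own with a couple of throwaway files. This is only a minimal sketch: the folder name, file names, and the three fields below are made up for illustration and are not the real export format.

```r
library(jsonlite)
library(purrr)

# hypothetical example: write two small JSON files to a temporary folder
path <- file.path(tempdir(), "ig_json_example")
dir.create(path, showWarnings = FALSE)

write_json(
  data.frame(user_id = "a", day = "2018-11-04", followed_by_count = 1000),
  file.path(path, "a.json")
)
write_json(
  data.frame(user_id = "b", day = "2018-11-04", followed_by_count = 2000),
  file.path(path, "b.json")
)

# list the JSON files (the pattern is a regular expression)
files <- dir(path, pattern = "\\.json$")

# read each file and bind the results into one data frame
followers <- map_dfr(files, ~ fromJSON(file.path(path, .x), flatten = TRUE))

followers
```

Each file becomes one or more rows, and the rows are stacked in the order the files are listed.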

Let’s briefly skim the data to get a sense of what it looks like.

# skim followed by count
summary(followers$followed_by_count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
## -365063  107958  172543  386691  339284 7900228      73

The summary shows 73 missing values for followed_by_count, and there are also a lot of missing values for username. In addition, there are negative values in followed_by_count (the minimum is -365,063) that we’ll need to address. Let’s look at some of these.

# negative follower counts
followers %>%
  filter(followed_by_count < 0) %>%
  head()
##   followed_by_count           user_id         username       date
## 1             -2741 17841400018960244           pacers 2018-11-04
## 2              -269 17841400039956300            lions 2018-11-04
## 3              -365 17841400040295328 pegasusyayinlari 2018-11-04
## 4              -139 17841400040937216             <NA> 2018-11-04
## 5              -700 17841400054473477    markmansonnet 2018-11-04
## 6              -209 17841400060450692  globalstreetart 2018-11-04
##   current_followers
## 1           1618214
## 2            203478
## 3            225235
## 4            370798
## 5            228932
## 6            250632

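Below we treat zero and negative counts as missing, then fill each missing value with the average of the preceding and following follower counts. Values that cannot be filled this way (for example, two consecutive missing days) are dropped.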

# fill missing values with avg
filled <- followers %>%
  mutate(followed_by_count = ifelse(followed_by_count <= 0, NA, followed_by_count)) %>%
  group_by(user_id, date) %>%
  summarise(followed_by_count = mean(followed_by_count)) %>%
  # summarise() drops the date grouping, so lag() and lead() run within each profile
  mutate(prev_followers = lag(followed_by_count, 1),
         next_followers = lead(followed_by_count, 1),
         avg_followers = (prev_followers + next_followers) / 2,
         followed_by_count = coalesce(followed_by_count, avg_followers)) %>%
  select(-c(prev_followers:avg_followers)) %>%
  filter(!is.na(followed_by_count))
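
To see what the neighbour-average fill does, here is a minimal sketch on a made-up profile with one negative count and one missing count. The data frame `toy` and its values are hypothetical; only the technique mirrors the step above.

```r
library(dplyr)

# hypothetical toy data: one profile, five days, one negative and one missing count
toy <- data.frame(
  user_id = "a",
  date = as.Date("2018-11-01") + 0:4,
  followed_by_count = c(1000, -50, 1020, NA, 1040)
)

toy_filled <- toy %>%
  # treat non-positive counts as missing
  mutate(followed_by_count = ifelse(followed_by_count <= 0, NA, followed_by_count)) %>%
  group_by(user_id) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    prev_followers = lag(followed_by_count, 1),
    next_followers = lead(followed_by_count, 1),
    # keep observed values, otherwise use the average of the two neighbours
    followed_by_count = coalesce(followed_by_count, (prev_followers + next_followers) / 2)
  ) %>%
  ungroup() %>%
  select(user_id, date, followed_by_count)

toy_filled$followed_by_count
# 1000 1010 1020 1030 1040
```

The negative value and the missing value are both replaced by the midpoint of the days on either side.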

Next, we assign these profiles to buckets based on the maximum number of followers they have had.

# follower buckets
buckets <- c(0, 100000, 200000, 500000, 1000000, Inf)

# assign profiles to buckets
follower_buckets <- filled %>%
  group_by(user_id) %>%
  summarise(max_followers = max(followed_by_count, na.rm = TRUE)) %>%
  mutate(bucket = cut(max_followers, buckets, dig.lab = 10))
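
As a quick illustration of how `cut()` assigns values to these buckets, the sketch below uses a few made-up follower counts (`example_counts` is hypothetical). Intervals are closed on the right by default, so a profile with exactly 200,000 followers lands in the 100k-200k bucket.

```r
# same break points as above
buckets <- c(0, 100000, 200000, 500000, 1000000, Inf)

# hypothetical follower counts, including one exactly on a boundary
example_counts <- c(150000, 200000, 200001, 750000, 2500000)

# dig.lab = 10 keeps the labels in plain digits instead of scientific notation
example_buckets <- cut(example_counts, buckets, dig.lab = 10)

as.character(example_buckets)
# "(100000,200000]" "(100000,200000]" "(200000,500000]" "(500000,1000000]" "(1000000,Inf]"
```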

## Distribution of Follower Counts

Let’s plot the distribution of follower counts for these profiles. We’ll use the maximum follower count for each profile here. We see that there is a long tail of profiles with many followers.
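
One way such a plot could be drawn is sketched below. This is not necessarily how the original chart was made: the data frame `toy_profiles` is simulated, and the log-scaled x-axis is an assumption chosen to keep a long right tail readable.

```r
library(ggplot2)

# hypothetical toy data standing in for `follower_buckets`:
# 500 profiles with right-skewed maximum follower counts above 100k
set.seed(1)
toy_profiles <- data.frame(
  user_id = seq_len(500),
  max_followers = 100000 + rlnorm(500, meanlog = 11.5, sdlog = 1.2)
)

# histogram on a log10 x-axis so the long right tail stays visible
p <- ggplot(toy_profiles, aes(x = max_followers)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  labs(x = "Maximum followers (log scale)", y = "Profiles")

p
```

With the real data, `toy_profiles` would be replaced by `follower_buckets`.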

## Distribution of Follower Growth

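Let’s calculate the rolling 7-day and 30-day growth rates for each profile. Note that lag() counts rows rather than calendar days, so this assumes one row per profile per day.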

# calculate growth rates
growth_rates <- filled %>%
  group_by(user_id, date) %>%
  summarise(followers = mean(followed_by_count)) %>%
  mutate(followers_last_week = lag(followers, 7),
         followers_last_month = lag(followers, 30),
         weekly_growth_rate = followers / followers_last_week - 1,
         monthly_growth_rate = followers / followers_last_month - 1) %>%
  left_join(follower_buckets, by = "user_id")
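
Because `lag(followers, 7)` looks back seven rows, a profile with a missing day would be compared against a value eight days old. The sketch below shows one possible guard, assuming `tidyr` is available: restore the missing days as `NA` rows before lagging. The data frame `toy` is made up, and this is an optional refinement rather than part of the analysis above.

```r
library(dplyr)
library(tidyr)

# hypothetical toy data: one profile gaining 10 followers a day, with 2018-11-05 missing
toy <- data.frame(
  user_id = "a",
  date = as.Date("2018-11-01") + 0:9,
  followers = 1000 + 10 * 0:9
) %>%
  filter(date != as.Date("2018-11-05"))

toy_growth <- toy %>%
  # add an NA row for every missing day so that 7 rows always equal 7 calendar days
  complete(user_id, date = seq(min(date), max(date), by = "day")) %>%
  arrange(user_id, date) %>%
  group_by(user_id) %>%
  mutate(
    followers_last_week = lag(followers, 7),
    weekly_growth_rate = followers / followers_last_week - 1
  ) %>%
  ungroup()

# growth on 2018-11-08 compares 1070 followers with 1000 on 2018-11-01
toy_growth %>% filter(date == as.Date("2018-11-08"))
```

Days whose comparison point is missing end up with an `NA` growth rate, which the later `mean(..., na.rm = TRUE)` calls would skip.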

Let’s look at the rolling 7-day growth rate of a single profile.

Now, for each profile, we average the 7-day and 30-day growth rates.

# get average growth rates
growth_by_profile <- growth_rates %>%
  group_by(user_id, bucket) %>%
  summarise(n = n(),
            avg_7_day_growth = mean(weekly_growth_rate, na.rm = TRUE),
            avg_30_day_growth = mean(monthly_growth_rate, na.rm = TRUE))

## Weekly Growth Rate

Now let’s plot the distribution of the weekly growth rate for each profile.

Let’s summarise this distribution to get a better sense of weekly growth.

growth_by_profile %>%
  ungroup() %>%
  filter(!is.na(avg_7_day_growth) & n > 5 & avg_7_day_growth <= 1) %>%
  select(avg_7_day_growth) %>%
  skim()
## Skim summary statistics
##  n obs: 534
##  n variables: 1
##
## ── Variable type:numeric ────────────────────────────────────────────────────────────────────────────────────────────
##          variable missing complete   n  mean    sd      p0    p25    p50
##  avg_7_day_growth       0      534 534 0.021 0.052 -0.0018 0.0042 0.0089
##    p75 p100     hist
##  0.017  0.5 ▇▁▁▁▁▁▁▁

The median 7-day growth rate for all of these profiles is 0.89% and the average is 2.1%. Around a quarter of profiles grow at a rate of 0.42% per week or less, and 75% of profiles grow at a rate of 1.7% or less.

Let’s break this down by bucket.

# summarise distributions
growth_by_profile %>%
  ungroup() %>%
  filter(!is.na(avg_7_day_growth) & n > 5 & avg_7_day_growth <= 1) %>%
  group_by(bucket) %>%
  select(bucket, avg_7_day_growth) %>%
  skim()
## Skim summary statistics
##  n obs: 534
##  n variables: 2
##  group variables: bucket
##
## ── Variable type:numeric ────────────────────────────────────────────────────────────────────────────────────────────
##            bucket         variable missing complete   n  mean    sd
##   (100000,200000] avg_7_day_growth       0      240 240 0.025 0.061
##   (200000,500000] avg_7_day_growth       0      181 181 0.019 0.05
##  (500000,1000000] avg_7_day_growth       0       53  53 0.014 0.031
##     (1000000,Inf] avg_7_day_growth       0       60  60 0.019 0.037
##        p0    p25    p50   p75 p100     hist
##  -0.0011  0.004  0.01   0.018 0.45 ▇▁▁▁▁▁▁▁
##  -0.0018  0.0047 0.0084 0.014 0.5  ▇▁▁▁▁▁▁▁
##  -0.0012  0.0031 0.0083 0.012 0.22 ▇▁▁▁▁▁▁▁
##   9.8e-05 0.0041 0.01   0.018 0.24 ▇▁▁▁▁▁▁▁

This gives us summary statistics for each profile bucket.

• For profiles with 100-200k followers, the median 7-day growth rate is 1% and the average is 2.5%.
• For profiles with 200-500k followers, the median 7-day growth rate is 0.84% and the average is 1.9%.
• For profiles with 500k-1m followers, the median 7-day growth rate is 0.83% and the average is 1.4%.
• For profiles with over 1m followers, the median 7-day growth rate is 1% and the average is 1.9%.

## 30-Day Growth Rates

Let’s take the same approach for looking at monthly growth rates.

The median 30-day growth rate for these profiles is 3.8%, and the average is 6.1%. Around a quarter of these profiles have an average 30-day growth rate of 1.7% or less, and three quarters have an average 30-day growth rate of 6.9% or less.
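
As a rough sanity check, we can compound the median weekly rate reported earlier over 30 days and compare it with the median 30-day rate. This is only an approximation, since medians do not compound exactly and the two summaries cover slightly different sets of profiles.

```r
# median 7-day growth rate reported above
weekly_median <- 0.0089

# compound the weekly rate over 30 / 7 weeks
implied_30_day <- (1 + weekly_median)^(30 / 7) - 1

round(implied_30_day, 3)
# 0.039
```

The implied rate of roughly 3.9% is close to the observed 30-day median of 3.8%.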

Let’s break this down by follower bucket again.

# summarise distributions
growth_by_profile %>%
  ungroup() %>%
  filter(!is.na(avg_30_day_growth) & n > 5 & avg_30_day_growth <= 1) %>%
  group_by(bucket) %>%
  select(bucket, avg_30_day_growth) %>%
  skim()
## Skim summary statistics
##  n obs: 519
##  n variables: 2
##  group variables: bucket
##
## ── Variable type:numeric ────────────────────────────────────────────────────────────────────────────────────────────
##            bucket          variable missing complete   n  mean    sd
##   (100000,200000] avg_30_day_growth       0      233 233 0.07  0.11
##   (200000,500000] avg_30_day_growth       0      176 176 0.05  0.051
##  (500000,1000000] avg_30_day_growth       0       52  52 0.046 0.051
##     (1000000,Inf] avg_30_day_growth       0       58  58 0.068 0.094
##          p0   p25   p50   p75 p100     hist
##     -0.0045 0.016 0.042 0.079 0.73 ▇▂▁▁▁▁▁▁
##     -0.0073 0.019 0.034 0.062 0.31 ▇▆▁▁▁▁▁▁
##     -0.0052 0.013 0.036 0.049 0.22 ▇▇▂▁▁▁▁▁
##  8e-04      0.019 0.044 0.079 0.63 ▇▂▁▁▁▁▁▁

This gives us summary statistics for each profile bucket.

• For profiles with 100-200k followers, the median 30-day growth rate is 4.2% and the average is 7%.
• For profiles with 200-500k followers, the median 30-day growth rate is 3.4% and the average is 5%.
• For profiles with 500k-1m followers, the median 30-day growth rate is 3.6% and the average is 4.6%.
• For profiles with over 1m followers, the median 30-day growth rate is 4.4% and the average is 6.8%.