In this analysis we’ll try to learn about the posting habits of our target customers. More specifically, we’ll look at how frequently they post to Instagram.
Data Collection
First we need to collect the profile names of our target customers. We’ll query the target_customer_survey_responses and profiles tables to get these. It should be noted that the post data is only a snapshot of what is in Analyze’s production databases. It is possible that we are missing some posts that occurred before the user signed up for Analyze or after the snapshot was taken.
select
  tc.user_id
  , p.service_id as profile_id
  , p.service_username as handle
from dbt.target_customer_survey_responses tc
inner join dbt.profiles p
  on tc.user_id = p.user_id
where tc.is_target_customer
  and p.service = 'instagram'
  and p.service_username != '[REDACTED]'
  and p.service_username != ''
  and p.service_username is not null
group by 1, 2, 3
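Running this query from R looks something like the following. This is only a sketch of that step: it assumes a DBI connection to the warehouse named warehouse_con, and that the SQL above is stored in the string profiles_query.
library(DBI)

# run the query above and store the results in a dataframe
# (warehouse_con and profiles_query are assumed names for illustration)
profiles <- dbGetQuery(warehouse_con, profiles_query)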
This dataset includes 830 Instagram handles of our target customers. Now we need to collect their post data from BigQuery.
# load the packages we need
library(DBI)
library(bigrquery)
library(dplyr)
library(dbplyr)

# connect to BigQuery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data",
  dataset = "fivetran_dynamodb"
)

# collapse the profile ids into an escaped SQL list for the in clause below
service_ids <- build_sql(profiles$profile_id)
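The build_sql function comes from dbplyr and escapes the character vector into a quoted, parenthesized list that can be dropped into the in clause. If you’d rather not lean on dbplyr for this, a base R equivalent (a sketch, assuming profile_id is a character column) looks like this:
# build a string like "('123', '456', '789')" for the in clause
service_ids <- paste0("('", paste(profiles$profile_id, collapse = "', '"), "')")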
# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'post' as type
  from instagram_analytics_user_media_totals m
  where cast(m.user_id as string) in profile_id_list
"

# swap the placeholder for the list of profile ids
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
posts <- dbGetQuery(con, final_query)
There are 2601 posts in this dataset. Now let’s get the Stories.
# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'story' as type
  from instagram_analytics_user_story_totals m
  where cast(m.user_id as string) in profile_id_list
"

# swap the placeholder for the list of profile ids
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
stories <- dbGetQuery(con, final_query)
# stack the stories underneath the posts
posts <- posts %>%
  rbind(stories)
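Note that dplyr::bind_rows(posts, stories) would do the same job here, and is a bit more forgiving: if one dataframe were ever missing a column, bind_rows would fill it with NA rather than erroring like rbind does.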
Cool, there are 313 stories in this dataset.
Exploratory Analysis
Let’s see how many unique profiles have shared posts in our dataset.
posts %>%
  inner_join(profiles, by = c("service_id" = "profile_id")) %>%
  group_by(handle) %>%
  summarise(posts = n_distinct(id),
            first_post = min(created_at),
            last_post = max(created_at)) %>%
  arrange(desc(posts))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 69 x 4
## handle posts first_post last_post
## <chr> <int> <chr> <chr>
## 1 taratemptations 789 2018-12-29T15:53:39+0000 2019-06-30T11:00:43+0000
## 2 nipandfab 197 2019-05-27T13:18:27+0000 2019-06-30T08:52:51+0000
## 3 licila.si 154 2019-05-27T13:01:25+0000 2019-06-30T13:03:36+0000
## 4 homiepieins 124 2018-12-30T12:29:29+0000 2019-06-18T07:46:29+0000
## 5 comfyfun 105 2018-12-30T12:27:54+0000 2019-06-18T07:43:07+0000
## 6 rodialbeauty 101 2019-05-27T19:01:06+0000 2019-06-30T08:00:13+0000
## 7 rodial_makeup 95 2019-06-03T12:57:10+0000 2019-06-30T11:38:01+0000
## 8 kidzappuae 92 2019-05-27T15:26:55+0000 2019-06-30T09:01:44+0000
## 9 bodybuildingcom 89 2019-05-27T13:54:41+0000 2019-06-30T23:35:44+0000
## 10 body_merry 62 2019-05-28T17:57:43+0000 2019-06-27T23:00:44+0000
## # … with 59 more rows
Only 69 of the 830 profiles we collected appear in the post data, and the posts only run through June 2019. This limited date range might make it difficult to determine the posting frequency of these users. To me this feels like an incomplete dataset, and we may not be able to answer the question sufficiently with it.
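For reference, once we have a more complete dataset, the posting frequency could be calculated along these lines. This is only a sketch: it assumes lubridate is available and that created_at parses as an ISO 8601 timestamp, as in the output above.
library(lubridate)

# parse the timestamps and compute average posts per week for each profile
posts %>%
  inner_join(profiles, by = c("service_id" = "profile_id")) %>%
  mutate(created_at = ymd_hms(created_at)) %>%
  group_by(handle) %>%
  summarise(posts = n_distinct(id),
            weeks_active = as.numeric(difftime(max(created_at), min(created_at), units = "weeks")),
            posts_per_week = posts / pmax(weeks_active, 1)) %>%
  arrange(desc(posts_per_week))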