In this analysis we’ll explore the posting habits of our target customers, specifically their posting frequency on Instagram.
First we need to collect the profile names of our target customers, which we’ll get by querying the profiles table. It should be noted that the post data is only a snapshot of what is in Analyze’s production databases, so it is possible that we are missing posts that occurred before the user signed up for Analyze or after the snapshot was taken.
```sql
select
  tc.user_id
  , p.service_id as profile_id
  , p.service_username as handle
from dbt.target_customer_survey_responses tc
inner join dbt.profiles p
  on tc.user_id = p.user_id
where tc.is_target_customer
  and p.service = 'instagram'
  and p.service_username != '[REDACTED]'
  and p.service_username != ''
  and p.service_username is not null
group by 1, 2, 3
```
This dataset includes 830 Instagram handles of our target customers. Now we need to collect their post data from BigQuery.
```r
# connect to BigQuery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data",
  dataset = "fivetran_dynamodb"
)

# build a SQL list of profile ids
service_ids <- build_sql(profiles$profile_id)

# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'post' as type
  from instagram_analytics_user_media_totals m
  where cast(m.user_id as string) in profile_id_list
"

# substitute the profile ids into the query
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
posts <- dbGetQuery(con, final_query)
```
There are 2601 posts in this dataset. Now let’s get the Stories.
```r
# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'story' as type
  from instagram_analytics_user_story_totals m
  where cast(m.user_id as string) in profile_id_list
"

# substitute the profile ids into the query
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
stories <- dbGetQuery(con, final_query)

# merge dataframes
posts <- posts %>%
  rbind(stories)
```
Cool, there are 313 stories in this dataset.
Let’s see how many unique profiles have shared posts in our dataset.
```r
posts %>%
  inner_join(profiles, by = c("service_id" = "profile_id")) %>%
  group_by(handle) %>%
  summarise(posts = n_distinct(id),
            first_post = min(created_at),
            last_post = max(created_at)) %>%
  arrange(desc(posts))
```
```
## # A tibble: 69 x 4
##    handle          posts first_post               last_post
##    <chr>           <int> <chr>                    <chr>
##  1 taratemptations   789 2018-12-29T15:53:39+0000 2019-06-30T11:00:43+0000
##  2 nipandfab         197 2019-05-27T13:18:27+0000 2019-06-30T08:52:51+0000
##  3 licila.si         154 2019-05-27T13:01:25+0000 2019-06-30T13:03:36+0000
##  4 homiepieins       124 2018-12-30T12:29:29+0000 2019-06-18T07:46:29+0000
##  5 comfyfun          105 2018-12-30T12:27:54+0000 2019-06-18T07:43:07+0000
##  6 rodialbeauty      101 2019-05-27T19:01:06+0000 2019-06-30T08:00:13+0000
##  7 rodial_makeup      95 2019-06-03T12:57:10+0000 2019-06-30T11:38:01+0000
##  8 kidzappuae         92 2019-05-27T15:26:55+0000 2019-06-30T09:01:44+0000
##  9 bodybuildingcom    89 2019-05-27T13:54:41+0000 2019-06-30T23:35:44+0000
## 10 body_merry         62 2019-05-28T17:57:43+0000 2019-06-27T23:00:44+0000
## # … with 59 more rows
```
Only 69 of the 830 profiles have posts in the dataset, and the most recent posts are from June 2019. This limited date range could make it difficult to determine these users’ posting frequency. To me this feels like an incomplete dataset, and we may not be able to answer the question sufficiently.
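If we did still want a rough estimate from the partial data, one simple approach would be to divide each profile’s post count by the number of weeks between its first and last post. Here is a minimal dplyr sketch of that calculation; the column names mirror the query results above, but the toy dataframe (and the `posting_frequency` name) are illustrative stand-ins, not part of the dataset:

```r
library(dplyr)

# toy data standing in for the joined posts/profiles dataframe above
posts <- tibble::tibble(
  handle = c("a", "a", "a", "b"),
  created_at = as.POSIXct(c("2019-06-01 10:00:00", "2019-06-03 09:00:00",
                            "2019-06-10 12:00:00", "2019-06-02 08:00:00"),
                          tz = "UTC")
)

# posts per week of activity for each profile, counting at least one week
# so that single-post profiles don't divide by zero
posting_frequency <- posts %>%
  group_by(handle) %>%
  summarise(
    posts = n(),
    weeks_active = max(1, ceiling(as.numeric(difftime(max(created_at),
                                                      min(created_at),
                                                      units = "weeks")))),
    posts_per_week = posts / weeks_active,
    .groups = "drop"
  )
```

Because the snapshot truncates each profile’s history, this would at best be a lower bound on true activity, which is part of why the incomplete date range is a problem.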