In this analysis we’ll try to learn about the posting habits of our target customers. More specifically, we’ll look at how frequently they post to Instagram.
Data Collection
First we need to collect the profile names of our target customers. We’ll query the target_customer_survey_responses and profiles tables to get these. It should be noted that the post data is only a snapshot of what is in Analyze’s production databases. It is possible that we are missing some posts that occurred before the user signed up for Analyze or after the snapshot was taken.
select
  tc.user_id
  , p.service_id as profile_id
  , p.service_username as handle
from dbt.target_customer_survey_responses tc
inner join dbt.profiles p
  on tc.user_id = p.user_id
where tc.is_target_customer
  and p.service = 'instagram'
  and p.service_username != '[REDACTED]'
  and p.service_username != ''
  and p.service_username is not null
group by 1, 2, 3
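Running this query from R looks something like the following. This is only a sketch of that step: it assumes a DBI connection to the warehouse named warehouse_con, and that the SQL above is stored in the string profiles_query.
library(DBI)

# run the query above and store the results in a dataframe
# (warehouse_con and profiles_query are assumed names for illustration)
profiles <- dbGetQuery(warehouse_con, profiles_query)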
This dataset includes 830 Instagram handles of our target customers. Now we need to collect their post data from BigQuery.
# load the packages we need
library(DBI)
library(bigrquery)
library(dplyr)
library(dbplyr)

# connect to BigQuery
con <- dbConnect(
  bigrquery::bigquery(),
  project = "buffer-data",
  dataset = "fivetran_dynamodb"
)

# collapse the profile ids into an escaped SQL list for the in clause below
service_ids <- build_sql(profiles$profile_id)
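The build_sql function comes from dbplyr and escapes the character vector into a quoted, parenthesized list that can be dropped into the in clause. If you’d rather not lean on dbplyr for this, a base R equivalent (a sketch, assuming profile_id is a character column) looks like this:
# build a string like "('123', '456', '789')" for the in clause
service_ids <- paste0("('", paste(profiles$profile_id, collapse = "', '"), "')")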
# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'post' as type
  from instagram_analytics_user_media_totals m
  where cast(m.user_id as string) in profile_id_list
"

# swap the placeholder for the list of profile ids
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
posts <- dbGetQuery(con, final_query)
There are 2601 posts in this dataset. Now let’s get the Stories.
# define sql query
sql <- "
  select
    m.created_time as created_at
    , cast(m.user_id as string) as service_id
    , cast(m.ig_id as string) as id
    , m.media_type
    , 'story' as type
  from instagram_analytics_user_story_totals m
  where cast(m.user_id as string) in profile_id_list
"

# swap the placeholder for the list of profile ids
final_query <- gsub("profile_id_list", service_ids, sql)

# query BQ
stories <- dbGetQuery(con, final_query)
# stack the stories underneath the posts
posts <- posts %>%
  rbind(stories)
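Note that dplyr::bind_rows(posts, stories) would do the same job here, and is a bit more forgiving: if one dataframe were ever missing a column, bind_rows would fill it with NA rather than erroring like rbind does.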
Cool, there are 313 stories in this dataset.
Exploratory Analysis
Let’s see how many unique profiles have shared posts in our dataset.
posts %>%
  inner_join(profiles, by = c("service_id" = "profile_id")) %>%
  group_by(handle) %>%
  summarise(posts = n_distinct(id),
            first_post = min(created_at),
            last_post = max(created_at)) %>%
  arrange(desc(posts))
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 69 x 4
## handle posts first_post last_post
## <chr> <int> <chr> <chr>
## 1 taratemptations 789 2018-12-29T15:53:39+0000 2019-06-30T11:00:43+0000
## 2 nipandfab 197 2019-05-27T13:18:27+0000 2019-06-30T08:52:51+0000
## 3 licila.si 154 2019-05-27T13:01:25+0000 2019-06-30T13:03:36+0000
## 4 homiepieins 124 2018-12-30T12:29:29+0000 2019-06-18T07:46:29+0000
## 5 comfyfun 105 2018-12-30T12:27:54+0000 2019-06-18T07:43:07+0000
## 6 rodialbeauty 101 2019-05-27T19:01:06+0000 2019-06-30T08:00:13+0000
## 7 rodial_makeup 95 2019-06-03T12:57:10+0000 2019-06-30T11:38:01+0000
## 8 kidzappuae 92 2019-05-27T15:26:55+0000 2019-06-30T09:01:44+0000
## 9 bodybuildingcom 89 2019-05-27T13:54:41+0000 2019-06-30T23:35:44+0000
## 10 body_merry 62 2019-05-28T17:57:43+0000 2019-06-27T23:00:44+0000
## # … with 59 more rows
Only 69 of the 830 profiles we collected appear in the post data, and the posts only run through June 2019. This limited date range might make it difficult to determine the posting frequency of these users. To me this feels like an incomplete dataset, and we may not be able to answer the question sufficiently with it.
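For reference, once we have a more complete dataset, the posting frequency could be calculated along these lines. This is only a sketch: it assumes lubridate is available and that created_at parses as an ISO 8601 timestamp, as in the output above.
library(lubridate)

# parse the timestamps and compute average posts per week for each profile
posts %>%
  inner_join(profiles, by = c("service_id" = "profile_id")) %>%
  mutate(created_at = ymd_hms(created_at)) %>%
  group_by(handle) %>%
  summarise(posts = n_distinct(id),
            weeks_active = as.numeric(difftime(max(created_at), min(created_at), units = "weeks")),
            posts_per_week = posts / pmax(weeks_active, 1)) %>%
  arrange(desc(posts_per_week))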