TV Viewer Fan Clustering 

Business Objective

A major entertainment client wanted to leverage a new, very noisy dataset consisting of terabytes of second-by-second television viewing data to identify high-value audiences and gain deeper insight into viewers’ needs and tastes, in order to inform targeted marketing and TV show development efforts.


  • eSage Group distilled terabytes of viewing data into meaningful datasets using SQL and integrated new data sources into Snowflake to augment feature engineering. 
  • We leveraged extensive television domain knowledge to account for the nuances of television viewing behavior for data cleansing, feature engineering, modeling and cluster assessment. 
  • During the process, the team conducted an Exploratory Data Analysis to assess the integrity of the data and understand feature relationships, and employed dimensionality reduction to focus on key features and improve processing speed. The team analyzed multiple models with Python and discovered feature interactions that significantly improved cluster separation, both thematically and quantitatively.
  • Throughout the engagement, eSage Group worked closely with client stakeholders to understand changing priorities, incorporate feedback on the approach, and name the identified clusters in a way that would best support executive adoption of the model.
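
As a rough illustration of the cleansing and feature engineering steps above, the sketch below rolls raw viewing segments up into per-viewer features. It is a minimal example and not the engagement's code: the segment schema, the 30-second minimum dwell time used to discard channel surfing, and the feature names are all assumptions chosen for illustration, and it is written in Python rather than the SQL used on the project.

```python
from collections import defaultdict

# Assumed threshold: segments shorter than this are treated as channel
# surfing and dropped. The real cleansing rules are not given in this summary.
MIN_DWELL_SECONDS = 30

# Hypothetical schema: (viewer_id, genre, start_second, end_second)
segments = [
    ("v1", "drama", 0, 1800),
    ("v1", "news", 1800, 1810),    # 10 s, dropped as channel surfing
    ("v1", "sports", 1810, 3610),
    ("v2", "news", 0, 3600),
    ("v2", "drama", 3600, 3620),   # 20 s, dropped as channel surfing
]


def build_features(segments, min_dwell=MIN_DWELL_SECONDS):
    """Aggregate viewing segments into one feature row per viewer."""
    seconds = defaultdict(lambda: defaultdict(int))
    for viewer, genre, start, end in segments:
        duration = end - start
        if duration >= min_dwell:
            seconds[viewer][genre] += duration

    features = {}
    for viewer, by_genre in seconds.items():
        total = sum(by_genre.values())
        row = {"total_hours": total / 3600}
        # Share of retained viewing time spent on each genre
        for genre, secs in by_genre.items():
            row[f"share_{genre}"] = secs / total
        features[viewer] = row
    return features


features = build_features(segments)
for viewer, row in features.items():
    print(viewer, row)
```

Filtering before aggregating keeps very short tune-ins from inflating genre shares, which is one way domain knowledge about viewing behavior can shape the feature set.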


  • The effort gave strong direction on the value of the new dataset with detailed guidance on identifying the signal in the noise for television viewers. 
  • The executive presentation illustrated thematic and statistical separation between clusters to inform future marketing and development efforts.
  • Results showed that specific feature interactions vastly improved cluster separation in both k-means and Gaussian Mixture Models. This approach also improved training time significantly due to a more compact feature space with more meaningful features.
  • As this was the first modeling effort of television viewers on this dataset, the model was handed off to internal teams for final iteration and ultimate integration of code into targeted marketing efforts.
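
To make the idea of an interaction feature improving cluster separation concrete, here is a small self-contained sketch on synthetic data. It is not the client data or the team's feature set: `feature_a` and `feature_b` are hypothetical, the points are arranged so that the two groups differ only in the product of the two features, and separation is measured with the mean silhouette score as one possible metric. Only k-means is shown, implemented from scratch to keep the example dependency-free; a Gaussian Mixture Model is omitted.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette score; values near 1 indicate well-separated clusters."""
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            d = [math.dist(p, q) for j, q in enumerate(points)
                 if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(v for c, v in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic viewers: four tight blobs at the corners of a square, so the
# two underlying groups are distinguished only by the sign of a * b.
centers = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
offsets = [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1), (0, 0)]
raw = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

# Compact feature space: a single interaction feature, feature_a * feature_b
interaction = [(a * b,) for a, b in raw]

raw_score = silhouette(raw, kmeans(raw, k=2))
interaction_score = silhouette(interaction, kmeans(interaction, k=2))
print(f"raw features:        {raw_score:.3f}")
print(f"interaction feature: {interaction_score:.3f}")
```

On this toy data the one-dimensional interaction space scores higher than the two raw features. How much real viewing data benefits depends on which interactions actually carry signal.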


  • Exploratory Data Analysis 
  • Feature Documentation 
  • Cluster Profiles 
  • Model Assessment
  • Executive Presentation
  • Python & SQL Code Handoff