Better Box Office Predictions

Business Objective

A major entertainment client asked eSage Group to refine a box office prediction model built by another vendor. The goals were to improve the model's overall stability, reduce error rates, and account for the performance of both big- and small-budget movies. The existing model also took excessive time to run and needed to be moved to a more performant platform.


  • eSage Group evaluated the prior vendor's approach and found that it used only a fraction of the available features for prediction. It also performed poorly in early-week scenarios (five or more weeks before release), and it had no flexibility for small vs. tentpole movies.
  • A feature selection analysis was undertaken using all available data: social media, YouTube trailer views, movie awareness and interest tracking, web traffic data, and more. In addition to front-line features, eSage Group created and evaluated thousands of derived features that crossed weekly boundaries, flagged when a metric reached a milestone (e.g., 20% week-over-week growth in negative Twitter sentiment), and formed ratios between metrics that uncovered new behavior. Certain new features, such as metrics surrounding a movie in the same genre released the week before, became more important than features directly related to the target movie that week.
  • Weekly models were built in R and fed weekly data via Drake, a computational engine library in R that was used to handle transformations and aggregations (retiring IBM DataStage). This platform aggregated raw data into weekly datasets, created inputs for new predictions, and sent out warnings when data was missing or metrics violated thresholds. It also gave the client the ability to change each week’s predictors via a configuration file if needed.
  • Finally, eSage Group stood up a Shiny application (an R-based web application) so that the client could upload new movies and their attributes, set a new prediction date range, and view model training and prediction progress.
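
To make the derived-feature idea concrete, here is a minimal sketch of two of the feature types described above: a week-over-week growth flag and a ratio between two metrics. It is written in Python purely for illustration (the production models were built in R), and every function name, metric name, and number in it is a made-up assumption rather than the client's data or eSage Group's actual code.

```python
# Illustrative sketch only: names and numbers are hypothetical,
# not taken from the actual project.

def week_over_week_growth(values):
    """Fractional change from each week to the next.

    Returns None for a step whose prior-week value is 0,
    since growth is undefined there.
    """
    growth = []
    for prev, curr in zip(values, values[1:]):
        growth.append(None if prev == 0 else (curr - prev) / prev)
    return growth


def growth_flags(values, threshold=0.20):
    """1 where week-over-week growth meets the threshold, else 0."""
    return [
        int(g is not None and g >= threshold)
        for g in week_over_week_growth(values)
    ]


def ratio(numerators, denominators):
    """Element-wise ratio of two weekly metrics; None where the denominator is 0."""
    return [None if d == 0 else n / d for n, d in zip(numerators, denominators)]


# Hypothetical weekly series for one movie, oldest week first.
negative_tweets = [100, 125, 130, 160]
trailer_views = [1000, 2000, 4000, 5000]
awareness = [10, 20, 40, 50]

# Flag weeks where negative sentiment grew 20% or more over the prior week.
flags = growth_flags(negative_tweets)

# Ratio feature: trailer views per point of tracked awareness.
views_per_awareness_point = ratio(trailer_views, awareness)

print(flags)
print(views_per_awareness_point)
```

Because the growth flag compares each week with the one before it, it is an example of a feature that crosses weekly boundaries; a series of n weeks yields n - 1 flags.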


  • This tool was adopted as the central product for the client’s box office prediction efforts. Prediction errors were significantly reduced and became more consistent throughout the 10 weeks leading up to a movie’s premiere.
  • The charts below show that the new model delivered a large improvement in box office prediction errors for both tentpole and medium-scale (i.e., “regular”) movie titles.

Errors in Box Office Predictions


  • R model in production
  • R Shiny application
  • Presentation of results to the client