The following analysis explores a dataset of movies released between 1986 and 2016. The aim of this project was to use data visualization to illustrate trends and gain insight into the drivers of box-office revenue (measured here by domestic box-office gross). The goal was to recommend to movie studios and production companies which types of projects to invest in, either to produce high-grossing films or to make the best use of their budgets. The project uses the data visualization tools Tableau and RStudio.
THE DATA: The dataset examined in this analysis was downloaded from Kaggle. It was created by scraping IMDb (Internet Movie Database) with a Python script that pulled the 220 most popular movies of each year, as measured by IMDb's popularity rating. IMDb is an online database of information about movies, television shows, cast members, crew, user ratings, and more.
The dataset contains 6,820 rows, one per movie, covering releases from 1986 through 2016.
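The dataset includes both discrete variables (for example, genre, rating, and production company) and continuous variables (for example, budget, gross, runtime, and score).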
THE ANALYSIS: I began with some summary analysis of the data. Because this project focuses on revenue drivers, it was useful to review how different variables relate to gross, and how the top-grossing films break down by these variables. In this dataset, the variable “gross” refers to a film’s domestic box-office gross. The graph below shows the 24 films with the highest domestic box-office gross from 1986 to 2016. Star Wars: The Force Awakens is the highest, with a gross of $936,662,225.
This distribution shows that most of the top-grossing films earned between $350 million and $500 million. There are a few notable outliers, such as Star Wars: The Force Awakens and Avatar. Another notable theme is that many of these films belong to franchises. The following table breaks down the top-grossing films by franchise status. Here, a franchise film is a later installment in an existing series. An independent film is either the first film in a franchise or a standalone film with no prequels or sequels. The Force Awakens is a franchise film, as it is the seventh film in the Star Wars series, while Avatar is categorized as independent. As the table shows, 17 of the top 24 grossing films are franchise films, while only 7 are independent.
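The franchise breakdown amounts to taking the 24 highest-grossing rows and counting each status. Below is a minimal pandas sketch. It assumes a numeric `gross` column and a hand-labelled `franchise_status` column. That column name is hypothetical, since the text defines the categories but not how they were stored. The toy rows are also hypothetical and are not values from the dataset.

```python
import pandas as pd


def franchise_breakdown(films: pd.DataFrame, top_n: int = 24) -> pd.Series:
    """Count franchise vs. independent films among the top_n films by gross.

    Assumes a hypothetical, hand-labelled 'franchise_status' column holding
    'Franchise' or 'Independent'.
    """
    top = films.nlargest(top_n, "gross")
    return top["franchise_status"].value_counts()


# Hypothetical toy rows, not values from the dataset.
films = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "gross": [900, 700, 500, 300, 100],
    "franchise_status": ["Franchise", "Independent", "Franchise",
                         "Franchise", "Independent"],
})
print(franchise_breakdown(films, top_n=4))
```

On the real data, `top_n=24` would reproduce the table's 17/7 split only if the labels match the definitions above.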
Given this breakdown and the monetary success of franchise films, I recommend that studios invest in franchise properties as their tent-pole films. Many of these films are not the first in their franchise. Studios should therefore keep in mind that the first film might not be the highest-grossing entry. It is still worth investing in the follow-up releases, because these are likely to be the biggest generators of box-office revenue.
Another point of interest in the data is seasonality. The following bar plot shows gross by release month [See Tableau Visualization here]. Gross is averaged within each month to control for the varying number of releases per month. Between 1986 and 2016, June is the top-grossing month for films, followed by December, July, and May. This tracks with studios' general release schedules: heavily anticipated, big-budget films are released during the summer blockbuster season and the holiday season (November and December). While this release pattern is already well known, the plot shows which months in particular may be best for these releases. The difference in average gross between May and June is more than $10 million. Studios should therefore reserve their biggest-budget releases for the summer season, and for June in particular.
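The month-level averaging can be sketched in pandas as follows. The sketch assumes a `released` column of parseable dates and a numeric `gross` column. Both names are assumptions about the file, and the Tableau workbook may have computed the average differently. The toy rows are hypothetical. Using the mean rather than the sum prevents a month from ranking highly merely because more films open in it.

```python
import pandas as pd


def average_gross_by_month(films: pd.DataFrame) -> pd.Series:
    """Mean gross per release month, highest first.

    Assumes a 'released' column of parseable dates and a numeric 'gross'
    column (both names are assumptions about the file).
    """
    month = pd.to_datetime(films["released"]).dt.month_name().rename("month")
    return films.groupby(month)["gross"].mean().sort_values(ascending=False)


# Hypothetical toy rows, not values from the dataset.
films = pd.DataFrame({
    "released": ["2015-06-12", "2015-06-19", "2015-05-01",
                 "2015-12-18", "2015-05-08"],
    "gross": [400, 200, 100, 350, 150],
})
print(average_gross_by_month(films))
```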
Next, I plotted gross by production company. The dataset labels this information explicitly, and this variable can shed light on which companies produce the highest-grossing hits [See the Tableau Visualization here]. Lucasfilm, the company that produced the Star Wars films, is the production company with the highest gross. Marvel Studios comes next. It produces the films of the Marvel Cinematic Universe, including The Avengers, Avengers: Age of Ultron, Iron Man 3, and Captain America: Civil War, which are four of the top 24 highest-grossing films. Notably, Lucasfilm and Marvel Studios are both subsidiaries of the Walt Disney Company (Gringer). So although “Walt Disney” is not listed as a production company on its own, it encompasses Lucasfilm, Marvel Studios, Pixar Animation Studios, and Walt Disney Animation Studios. All four are within the top 25 production companies by gross.
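To see Disney's combined position, subsidiaries can be rolled up to a parent company before aggregating. The sketch below hard-codes only the four subsidiaries named above. It assumes a `company` column whose labels match those names exactly, although the dataset's spelling may differ. It then sums gross per parent. The text does not say whether the original chart summed or averaged gross, so the sum is an assumption. The rows are hypothetical.

```python
import pandas as pd

# Subsidiary -> parent mapping for the four companies named in the text;
# any company not listed is treated as its own parent.
PARENT = {
    "Lucasfilm": "Walt Disney Company",
    "Marvel Studios": "Walt Disney Company",
    "Pixar Animation Studios": "Walt Disney Company",
    "Walt Disney Animation Studios": "Walt Disney Company",
}


def gross_by_parent(films: pd.DataFrame) -> pd.Series:
    """Total gross per parent company, highest first."""
    parent = films["company"].map(PARENT).fillna(films["company"]).rename("parent")
    return films.groupby(parent)["gross"].sum().sort_values(ascending=False)


# Hypothetical toy rows, not values from the dataset.
films = pd.DataFrame({
    "company": ["Lucasfilm", "Marvel Studios", "Universal Pictures",
                "Pixar Animation Studios", "Universal Pictures"],
    "gross": [500, 400, 300, 200, 350],
})
print(gross_by_parent(films))
```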
While Lucasfilm produces top-grossing features, it is by no means the most prolific production company. “Number of Films by Production Company” shows how many films each production company released. Universal Pictures leads, followed by Warner Bros. and Paramount Pictures. This bar plot differs markedly from “Gross by Production Company.” It illustrates that producing many films does not guarantee the highest domestic box-office returns.
For example, Color Force ranks fifth among production companies by gross, yet it does not even crack the top 23 companies by number of films produced. In other words, production companies do not need to be the most prolific to be successful. What ultimately drives success is the popularity of their properties. The four films in the Hunger Games franchise were enough to make Color Force a top-grossing production company. Studios should look to similar source material to make the most of a single property. The Hunger Games was a popular book series before it became a film franchise. It would therefore be beneficial to concentrate on the book market, specifically trending young adult books, as a source of profitable ideas (Roback). [See the Tableau Visualization comparing the two here].
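The contrast between the two bar plots can be expressed as two rankings of the same companies. Below is a minimal pandas sketch. It assumes `company`, `name`, and `gross` columns and ranks companies by total gross. Total gross is an assumption, since the chart may use a different aggregate. The companies and values are hypothetical.

```python
import pandas as pd


def company_ranks(films: pd.DataFrame) -> pd.DataFrame:
    """Rank production companies by total gross and by number of films."""
    summary = films.groupby("company").agg(
        total_gross=("gross", "sum"),
        n_films=("name", "count"),
    )
    # Rank 1 = highest; ties share the lowest rank number.
    summary["gross_rank"] = (
        summary["total_gross"].rank(ascending=False, method="min").astype(int)
    )
    summary["volume_rank"] = (
        summary["n_films"].rank(ascending=False, method="min").astype(int)
    )
    return summary.sort_values("gross_rank")


# Hypothetical toy rows, not values from the dataset.
films = pd.DataFrame({
    "company": ["Prolific Co.", "Prolific Co.", "Prolific Co.", "Hit-Driven Co."],
    "name": ["P1", "P2", "P3", "H1"],
    "gross": [100, 120, 80, 400],
})
print(company_ranks(films))
```

A company with a high gross rank but a low volume rank, like the hypothetical `Hit-Driven Co.`, matches the Color Force pattern.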
DIGGING DEEPER: These summary visualizations can reveal trends and themes in the data, in particular how individual variables compare against gross. However, they do not test which variables have a statistically significant relationship with a film’s domestic box-office gross. One way to test this is with a regression analysis. I ran the regression in RStudio, modeling gross as a function of budget, runtime, score, rating, and genre. The dependent variable is “gross_mm,” a gross variable created in RStudio. Fig. 9 lists the estimated coefficients. Each rating and genre coefficient is measured relative to the intercept, which corresponds to the baseline rating and genre categories.
The regression output highlights which variables are significant, that is, those with a p-value below 0.05. For this dataset, these are the intercept, budget, score, and the Biography, Comedy, Crime, Drama, and Horror genres. “Score” refers to a film’s average user rating on IMDb. The following plot, generated in RStudio, shows the regression coefficients relative to one another, with significant coefficients marked by a red dot. Budget, score, comedy, and horror have positive coefficients. Biography, crime, and drama have negative coefficients. This means that, holding the other variables constant, films in these three genres are associated with lower domestic box-office gross than films in the baseline genre.
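The R code behind Fig. 9 is not shown. As a rough Python equivalent, the sketch below fits ordinary least squares with dummy-coded genre. As in R's default factor coding, the alphabetically first level serves as the baseline. The sketch also reports approximate p-values. The data are synthetic, generated with known coefficients so the fit can be checked. The p-values use a normal approximation to the t distribution, which is close to R's exact values only when the sample is large. The function name `fit_ols` is hypothetical.

```python
import math

import numpy as np
import pandas as pd


def fit_ols(df, target, numeric, categorical):
    """OLS with dummy-coded categoricals; first level of each is the baseline."""
    X = pd.get_dummies(df[numeric + categorical], columns=categorical,
                       drop_first=True, dtype=float)
    X.insert(0, "(Intercept)", 1.0)
    Xm = X.to_numpy(dtype=float)
    y = df[target].to_numpy(dtype=float)

    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    resid = y - Xm @ beta
    n, k = Xm.shape
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xm.T @ Xm)))
    t = beta / se
    # Two-sided p-values via a normal approximation to the t distribution.
    p = [math.erfc(abs(tv) / math.sqrt(2)) for tv in t]
    return pd.DataFrame({"coef": beta, "std_err": se, "t": t, "p_approx": p},
                        index=X.columns)


# Synthetic data with known effects (not the IMDb data): Action is the baseline.
rng = np.random.default_rng(0)
n = 500
genre = rng.choice(["Action", "Comedy", "Drama"], size=n)
budget = rng.uniform(0, 100, size=n)
score = rng.uniform(1, 10, size=n)
effect = pd.Series(genre).map({"Action": 0.0, "Comedy": 10.0, "Drama": -15.0})
gross_mm = 5 + 2 * budget + 3 * score + effect.to_numpy() + rng.normal(0, 5, n)
demo = pd.DataFrame({"gross_mm": gross_mm, "budget": budget,
                     "score": score, "genre": genre})

table = fit_ols(demo, "gross_mm", ["budget", "score"], ["genre"])
significant = table[table["p_approx"] < 0.05]  # analogous to the red dots
print(table.round(3))
```

On the real data, `runtime` would be added to `numeric` and `rating` to `categorical`.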
To examine content, the following plots show how genre relates to gross. The first, made in R, plots each film's gross by year, with genre shown. Many of the outliers in terms of gross are films classified as Action.
“Top Grossing Genres” ranks genres by gross [See the Tableau Visualization here]. Animation leads, with Action and Adventure in second and third place at similar levels of gross. Gross is averaged within each genre to account for the different number of films produced in each. The next plot, “Top Grossing Films and Genre,” shows the top-grossing films and the genres they are classified under. Most are Action, with Animation taking five of the top 24 spots, followed by Adventure with three.
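Averaging gross per genre and counting genres among the top films can produce different leaders, which is the pattern described above. Below is a minimal pandas sketch with hypothetical rows. It assumes each film carries a single `genre` label.

```python
import pandas as pd


def genre_summary(films: pd.DataFrame, top_n: int = 24):
    """Return (mean gross per genre, genre counts among the top_n films by gross)."""
    avg = films.groupby("genre")["gross"].mean().sort_values(ascending=False)
    top_counts = films.nlargest(top_n, "gross")["genre"].value_counts()
    return avg, top_counts


# Hypothetical toy rows, not values from the dataset.
films = pd.DataFrame({
    "genre": ["Animation", "Animation", "Action", "Action",
              "Action", "Action", "Action"],
    "gross": [450, 420, 600, 500, 100, 40, 20],
})
avg, top_counts = genre_summary(films, top_n=3)
print(avg)          # Animation leads on average gross
print(top_counts)   # Action holds more of the top-3 spots
```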
Lastly, I looked at how a studio might best allocate its resources. In particular, how might a studio invest in films that have a low budget but a high gross relative to that budget? For this analysis, I created a new variable in Tableau, Gross/Budget, and plotted it by production company. The first plot below shows these results, and the next plot shows the same results using the log of Gross/Budget. In both cases, the top spots go to Solana Films, followed by Haxan Films, Brothers McMullen Productions, Plunge Pictures, and Can I Watch.
This last plot shows the top Gross/Budget films by production company, with color indicating genre [See Tableau Visualization here]. The top two, from Solana Films and Haxan Films, are Paranormal Activity and The Blair Witch Project. Paranormal Activity had a budget of only $15,000 and a domestic gross of $107,918,810. The Blair Witch Project had a budget of $60,000 and a domestic gross of $140,539,099. For smaller production companies or studios looking to allocate their resources more judiciously, horror is the genre to choose. Some larger studios want to make more films but are risk-averse and reluctant to spend on CGI or stars for every project. For them, low-budget horror films would be a good place to start.
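Gross/Budget is a simple ratio, so it can be checked directly against the two films named above. The sketch uses only the budget and gross figures quoted in the text. The text does not state the base of the logarithm used in Tableau, so base 10 is an assumption.

```python
import math

import pandas as pd

# Budget and domestic gross for the two films quoted in the text.
films = pd.DataFrame({
    "name": ["Paranormal Activity", "The Blair Witch Project"],
    "company": ["Solana Films", "Haxan Films"],
    "budget": [15_000, 60_000],
    "gross": [107_918_810, 140_539_099],
})

# Gross/Budget: dollars of domestic gross per dollar of budget.
films["gross_per_budget"] = films["gross"] / films["budget"]
# Log scale compresses extreme ratios (base 10 is an assumption).
films["log10_gross_per_budget"] = films["gross_per_budget"].map(math.log10)
print(films.sort_values("gross_per_budget", ascending=False))
```

Paranormal Activity returned roughly $7,195 of domestic gross per dollar of budget, against roughly $2,342 for The Blair Witch Project. This matches their order in the ranking.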
TAKEAWAYS: Filmmaking is a creative industry with no hard-and-fast rules for what makes a box-office hit, but certain factors can influence success. Based on this data visualization and analysis, I would advise a production company or studio to focus on the following three criteria to improve its chances of creating a box-office success:
Invest in properties that can be turned into multiple films, creating franchises
Invest big budgets in Action, Adventure, and Animation films
With limited resources, invest smaller budgets in the Horror genre for a higher return on investment
Going forward, one next step would be to bring in data from 2017 and 2018 to see whether the same trends hold for those years. Another would be to add a global box-office column, to see whether global hits follow different patterns from domestic hits. Additionally, it would be interesting to visualize movie data from another website that publishes user scores. This would show whether the top-grossing films are rated as highly there as they are on IMDb.