Introduction
I only watch movies when my favourite actors are in them, even if the movie is not a hit. I just love movie stuff, so I'm truly excited to be on this project, even though I have to predict the revenue, haha. In this project, we will explore the data (EDA) and train a model. Let's get started! The data is from Kaggle.
Basically, the data is quite tidy, with not many missing values, and I believe some columns are not important for the target variable.
The columns include title, budget, language, cast, revenue, release date, runtime, and so forth.
EDA
We will look at the data with the revenue column as the reference point. So, the relationships to explore between the revenue and the other features are:
- revenue vs. budget, popularity, production_companies, genres, release_date, runtime, cast
First of all, let's see the correlation with the revenue.
I do not think any feature is highly correlated with the target variable except budget!
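As a rough sketch of how this check could be done with pandas: the tiny DataFrame below is made-up toy data (not the real Kaggle file), and the name `train` is just my assumption for the training table. It ranks the numeric columns by their Pearson correlation with revenue.

```python
import pandas as pd

# Toy stand-in for the training data; the values are invented for illustration.
train = pd.DataFrame({
    "budget": [10, 20, 30, 40, 50],
    "popularity": [5.0, 3.0, 6.0, 2.0, 7.0],
    "runtime": [90, 120, 100, 150, 110],
    "revenue": [25, 45, 70, 85, 110],
})

# Pearson correlation of every numeric column with the target, strongest first.
corr_with_revenue = (
    train.select_dtypes("number")
    .corr()["revenue"]
    .drop("revenue")  # the target's correlation with itself is always 1
    .sort_values(ascending=False)
)
print(corr_with_revenue)
```

In this toy data budget comes out on top by construction; on the real data the ranking has to be read from the output.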
Visualization
Top 11 companies by number of released movies.
Revenue vs. the top 11 companies.
The red graph tells us that some companies have not released many movies but still earned pretty good money.
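A comparison like this (movie count vs. revenue per company) could be computed roughly as follows. This is only a sketch: the data is a toy example, I assume `production_companies` has already been parsed into a list of company names per movie, and `movie_count` and `mean_revenue` are names I made up.

```python
import pandas as pd

# Toy data; each movie lists its production companies (assumed already parsed).
train = pd.DataFrame({
    "production_companies": [["A", "B"], ["A"], ["C"], ["A", "C"]],
    "revenue": [100, 50, 400, 150],
})

# One row per (movie, company) pair, so a movie counts once for each company.
per_company = train.explode("production_companies")

# Number of movies and average revenue per company, busiest companies first.
summary = (
    per_company.groupby("production_companies")["revenue"]
    .agg(movie_count="count", mean_revenue="mean")
    .sort_values("movie_count", ascending=False)
)

top_companies = summary.head(2)  # the post uses the top 11; 2 is enough here
print(top_companies)
```

Here company C has fewer movies than A but a higher mean revenue, which is the same kind of pattern as in the red graph.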
What happened in 2017? Maybe not all of its movies are counted.
Friday is the hottest day!
July, August, and September are pretty busy, but overall it looks quite flat.
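The year, weekday, and month counts above can be derived from `release_date` along these lines. It is a minimal sketch with toy dates; I assume ISO-formatted date strings here, so the real column may need a different format passed to `pd.to_datetime`. The new column names are my own.

```python
import pandas as pd

# Toy release dates in ISO format (an assumption about the date format).
train = pd.DataFrame({
    "release_date": [
        "2015-02-20", "2016-07-01", "2016-07-15", "2014-08-08", "2013-12-25",
    ],
})

dates = pd.to_datetime(train["release_date"])

# Split the date into the parts used for the plots.
train["release_year"] = dates.dt.year
train["release_month"] = dates.dt.month
train["release_weekday"] = dates.dt.day_name()

# Count movies per weekday and per month.
movies_per_weekday = train["release_weekday"].value_counts()
movies_per_month = train["release_month"].value_counts().sort_index()
print(movies_per_weekday)
print(movies_per_month)
```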
Normally, a movie's runtime is around 100 to 150 min.
I actually assumed that the two movies with a runtime over 240 min were outliers, but the values are real: Cleopatra (1963 film) and Carlos.
Carlos is over 330 min. Insane.
Preprocessing
Missing value processing
genres, runtime, spoken_languages, production_companies, production_countries, and Keywords are filled with the mode.
tagline, crew, cast, and overview are filled with 0.
production_companies_count is filled with 1.
I dropped the 'belongs_to_collection', 'homepage', and 'status' columns.
poster_path is filled with backfill (fillna).
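The steps above could look roughly like this in pandas. It is only a sketch on a toy DataFrame with a few of the columns, not my exact notebook code; the values are invented, and only one column per rule is shown to keep it short.

```python
import numpy as np
import pandas as pd

# Toy data with missing values; only a handful of the real columns are included.
train = pd.DataFrame({
    "genres": ["Drama", None, "Drama", "Comedy"],
    "runtime": [100.0, 120.0, np.nan, 100.0],
    "tagline": pd.Series([None, "A tagline", None, "Another"], dtype=object),
    "production_companies_count": [2.0, np.nan, 1.0, 3.0],
    "poster_path": [None, "/a.jpg", None, "/b.jpg"],
    "homepage": [None, None, "http://example.com", None],
})

# Fill with the most frequent value (mode) of each column.
for col in ["genres", "runtime"]:
    train[col] = train[col].fillna(train[col].mode()[0])

# Fill with constants: 0 for text-like columns, 1 for the company count.
train["tagline"] = train["tagline"].fillna(0)
train["production_companies_count"] = train["production_companies_count"].fillna(1)

# Drop columns that are not used.
train = train.drop(columns=["homepage"])

# Backfill: take the next non-missing value further down the column.
# Note that a missing value in the very last row would stay missing.
train["poster_path"] = train["poster_path"].bfill()

print(train.isna().sum())
```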
Thank you for watching!