Hello everyone, and welcome back to another blog post, where we will explore the different ideas and concepts one can apply while performing an EDA (exploratory data analysis). In simple words, this post is a walk-through of a typical EDA process, which might include (in top-down order):
While walking through these major steps, we will try to answer typical analysis questions, such as how many times each categorical value appears, how the data is distributed over dates, how performance varies across certain cases, and so on.
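To make these questions concrete, here is a minimal sketch in plain pandas. The toy DataFrame and its column names (`category`, `date`, `score`) are made up purely for illustration and are not the dataset used in this post.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "category": ["a", "b", "a", "c", "a", "b"],
    "date": pd.to_datetime([
        "2021-01-01", "2021-01-01", "2021-01-02",
        "2021-01-02", "2021-01-02", "2021-01-03",
    ]),
    "score": [10, 20, 30, 40, 50, 60],
})

# How many times does each categorical value appear?
category_counts = df["category"].value_counts()

# How are the rows distributed over dates?
rows_per_day = df.groupby("date").size()

# How does a numeric column behave within each category?
mean_score_by_category = df.groupby("category")["score"].mean()

print(category_counts)
print(rows_per_day)
print(mean_score_by_category)
```

Each of these one-liners is the tabular counterpart of a chart we would normally draw during an EDA, so they are a handy sanity check before plotting.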
If you do not have these libraries installed, please install them as shown below:
!pip install autoviz
!pip install seaborn
!pip install plotly
!pip install cufflinks
!pip install pandas
!pip install pandas-profiling
import autoviz
from autoviz.AutoViz_Class import AutoViz_Class
from pandas_profiling import ProfileReport
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
from plotly.offline import init_notebook_mode, iplot
import plotly.figure_factory as ff
import cufflinks
import plotly.io as pio
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')
pio.renderers.default = "notebook" # pick the renderer that suits your environment; inspect pio.renderers to see the available options
pd.options.display.max_columns = None
%matplotlib inline
Alert! from autoviz version 0.1.35, after importing, you must %matplotlib inline to display charts in Jupyter Notebooks.
AV = AutoViz_Class()
AV.AutoViz(filename, sep=',', depVar='', dfte=None, header=0, verbose=0, lowess=False, chart_format='svg', max_rows_analyzed=150000, max_cols_analyzed=30, save_plot_dir=None)
Note: verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
verbose=2 does not display plots but saves them in AutoViz_Plots folder in local machine.
Updated: chart_format='bokeh' generates and displays charts in your local Jupyter notebook.
chart_format='server' generates and displays charts in the browser - one tab for each chart.
chart_format='html' silently saves charts HTML format - they are also interactive!
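The message above lists the arguments that `AV.AutoViz` accepts. As a rough sketch of how they might be put together for an in-memory DataFrame, the snippet below collects them in a dictionary and passes them on. The helper names `build_autoviz_kwargs` and `run_autoviz` and the toy data are hypothetical, and passing an empty `filename` together with a DataFrame in `dfte` is an assumption about how AutoViz is typically called, so please check it against your installed version.

```python
import pandas as pd


def build_autoviz_kwargs(df, target=""):
    """Collect the AutoViz arguments listed in the import message (hypothetical helper)."""
    return {
        "filename": "",            # assumption: left empty because the data comes in via dfte
        "sep": ",",
        "depVar": target,          # name of the target column, or "" if there is none
        "dfte": df,                # the DataFrame to analyze
        "header": 0,
        "verbose": 1,              # per the message: 0 or 1 displays charts in the notebook
        "lowess": False,
        "chart_format": "svg",
        "max_rows_analyzed": 150000,
        "max_cols_analyzed": 30,
        "save_plot_dir": None,
    }


def run_autoviz(df, target=""):
    """Run AutoViz on a DataFrame (hypothetical wrapper; needs autoviz installed)."""
    # Imported lazily so the rest of this sketch works without autoviz.
    from autoviz.AutoViz_Class import AutoViz_Class

    AV = AutoViz_Class()
    return AV.AutoViz(**build_autoviz_kwargs(df, target))


# Hypothetical toy data, only to show the shape of the call.
toy_df = pd.DataFrame({"category": ["a", "b", "a"], "score": [10, 20, 30]})
kwargs = build_autoviz_kwargs(toy_df, target="score")
print(sorted(kwargs))
```

In a notebook, calling `run_autoviz(toy_df, target="score")` after `%matplotlib inline` should then render the charts inline. Keeping the arguments in one dictionary makes it easy to switch `verbose` or `chart_format` later without retyping the whole call.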