Hello everyone, and welcome back to a new blog post in which we explore different ideas and concepts one can apply while performing exploratory data analysis (EDA). Put simply, this post walks through an average EDA process, which might include the following steps (in top-down order):
While walking through these major steps, we will try to answer different analytical questions, such as how many times each categorical value appears, how the data is distributed over dates, and how performance varies across certain cases.
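To make those three kinds of questions concrete, here is a minimal pandas sketch on a small, made-up DataFrame. The column names `category`, `date` and `score` and all of their values are hypothetical placeholders for your own data; "performance" is assumed here to mean the average score per category.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({
    "category": ["A", "B", "A", "C", "A", "B"],
    "date": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-03",
                            "2021-02-14", "2021-03-01", "2021-03-09"]),
    "score": [10, 20, 30, 40, 50, 60],
})

# 1. How many times has each categorical value appeared?
category_counts = df["category"].value_counts()

# 2. How are the records distributed over time (here: per calendar month)?
monthly_counts = df.groupby(df["date"].dt.to_period("M")).size()

# 3. What is the average performance (score) for each category?
mean_score_by_category = df.groupby("category")["score"].mean()

print(category_counts)
print(monthly_counts)
print(mean_score_by_category)
```

In a real analysis you would typically plot these results (for example as bar or line charts) rather than just printing them.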
If you do not have these libraries installed, please install them as shown below:

```python
!pip install autoviz
!pip install seaborn
!pip install plotly
!pip install cufflinks
!pip install pandas
!pip install pandas-profiling
```
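If you are not sure which of these packages are already present, a small check with the standard library's `importlib.util.find_spec` can list the missing ones without importing them. This is a minimal sketch; the helper name `missing_modules` is hypothetical, and the list uses import names, which can differ from pip package names.

```python
import importlib.util

# Import names used in this post; some differ from their pip names
# (for example, the pandas_profiling module comes from the pandas-profiling package).
REQUIRED_MODULES = ["autoviz", "seaborn", "plotly", "cufflinks", "pandas", "pandas_profiling"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(REQUIRED_MODULES))
```

Only the packages reported by this check need the corresponding `pip install` command.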
```python
import autoviz
from autoviz.AutoViz_Class import AutoViz_Class
from pandas_profiling import ProfileReport
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import warnings
from plotly.offline import init_notebook_mode, iplot
import plotly.figure_factory as ff
import cufflinks
import plotly.io as pio

# Let cufflinks render plotly charts offline, using the 'pearl' theme
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')

# Choose the plotly renderer; check pio.renderers for the options available in your environment
pio.renderers.default = "notebook"

# Show every column when displaying DataFrames
pd.options.display.max_columns = None

# Render matplotlib charts inline in the notebook
%matplotlib inline
```
Alert! From AutoViz version 0.1.35 onward, you must run %matplotlib inline after importing in order to display charts in Jupyter Notebooks.

```python
AV = AutoViz_Class()
AV.AutoViz(
    filename,  # path to your data file
    sep=',',
    depVar='',
    dfte=None,
    header=0,
    verbose=0,
    lowess=False,
    chart_format='svg',
    max_rows_analyzed=150000,
    max_cols_analyzed=30,
    save_plot_dir=None,
)
```

Note:
- verbose=0 or 1 generates charts and displays them in your local Jupyter notebook.
- verbose=2 does not display plots but saves them in the AutoViz_Plots folder on your local machine.
- chart_format='bokeh' generates and displays charts in your local Jupyter notebook.
- chart_format='server' generates and displays charts in the browser, one tab for each chart.
- chart_format='html' silently saves charts in HTML format; they are also interactive!
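AutoViz can also work on a DataFrame that is already in memory: the AutoViz README describes passing it through `dfte` while leaving `filename` as an empty string (verify this against the version you have installed). That lets you control the row sample yourself before handing it over. The sketch below is a hypothetical helper, `downsample`, that randomly keeps at most `max_rows` rows; it is not AutoViz's own sampling logic, and the CSV path in the usage comment is a placeholder.

```python
import pandas as pd


def downsample(df, max_rows=150_000, seed=42):
    """Hypothetical helper: return a reproducible random sample of at most max_rows rows."""
    if len(df) > max_rows:
        # A fixed random_state keeps the sample identical between runs
        return df.sample(n=max_rows, random_state=seed)
    return df


# Usage with AutoViz (requires autoviz; "your_data.csv" is a placeholder path):
# df = downsample(pd.read_csv("your_data.csv"))
# AV = AutoViz_Class()
# AV.AutoViz("", sep=",", depVar="", dfte=df, verbose=0, chart_format="svg")
```

Sampling up front keeps the charts quick to render on large files while still reflecting the overall shape of the data.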