This blog is a part of our series Python for Stock Market Analysis.
Disclaimer: This blog is for educational purposes only, and we do not recommend applying what you learn here to real financial decisions.
This blog implements some preliminary metrics used in stock market analysis. The dataset we will be using is available via Yahoo Finance.
Please install:
pip install yfinance for downloading a stock's historical data.
pip install pandas for data analysis.
pip install plotly for interactive visualizations.
pip install cufflinks for using interactive plots with pandas DataFrames.
You might also need pip install -U kaleido if you want to save plots as PNG images.
If you are new to plotly, we have a blog about it in which we build plots from a COVID-19 dataset.
!pip install yfinance
Requirement already satisfied: yfinance in c:\programdata\anaconda3\lib\site-packages (0.1.63)
Requirement already satisfied: numpy>=1.15 in c:\users\dell\appdata\roaming\python\python38\site-packages (from yfinance) (1.19.5)
Requirement already satisfied: requests>=2.20 in c:\users\dell\appdata\roaming\python\python38\site-packages (from yfinance) (2.26.0)
Requirement already satisfied: multitasking>=0.0.7 in c:\programdata\anaconda3\lib\site-packages (from yfinance) (0.0.9)
Requirement already satisfied: pandas>=0.24 in c:\programdata\anaconda3\lib\site-packages (from yfinance) (1.2.4)
Requirement already satisfied: lxml>=4.5.1 in c:\programdata\anaconda3\lib\site-packages (from yfinance) (4.6.3)
Requirement already satisfied: pytz>=2017.3 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.24->yfinance) (2021.1)
Requirement already satisfied: python-dateutil>=2.7.3 in c:\programdata\anaconda3\lib\site-packages (from pandas>=0.24->yfinance) (2.8.1)
Requirement already satisfied: six>=1.5 in c:\programdata\anaconda3\lib\site-packages (from python-dateutil>=2.7.3->pandas>=0.24->yfinance) (1.15.0)
Requirement already satisfied: charset-normalizer~=2.0.0 in c:\users\dell\appdata\roaming\python\python38\site-packages (from requests>=2.20->yfinance) (2.0.7)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.20->yfinance) (1.26.4)
Requirement already satisfied: certifi>=2017.4.17 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.20->yfinance) (2020.12.5)
Requirement already satisfied: idna<4,>=2.5 in c:\programdata\anaconda3\lib\site-packages (from requests>=2.20->yfinance) (2.10)
import pandas as pd
import plotly.express as px
import cufflinks
import plotly.io as pio
import yfinance as yf
cufflinks.go_offline()
cufflinks.set_config_file(world_readable=True, theme='pearl')
pio.renderers.default = "notebook"  # pick the renderer that suits your environment; inspect pio.renderers for the options
pd.options.display.max_columns = None
By default, yfinance lets us download data going back as far as 1900-01-01.
symbols = ["AAPL"]
df = yf.download(tickers=symbols)
df.head()
[*********************100%***********************] 1 of 1 completed
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
1980-12-12 | 0.128348 | 0.128906 | 0.128348 | 0.128348 | 0.100326 | 469033600 |
1980-12-15 | 0.122210 | 0.122210 | 0.121652 | 0.121652 | 0.095092 | 175884800 |
1980-12-16 | 0.113281 | 0.113281 | 0.112723 | 0.112723 | 0.088112 | 105728000 |
1980-12-17 | 0.115513 | 0.116071 | 0.115513 | 0.115513 | 0.090293 | 86441600 |
1980-12-18 | 0.118862 | 0.119420 | 0.118862 | 0.118862 | 0.092911 | 73449600 |
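It seems that data is only available from 1980-12-12 onward. The columns in the table above are Open (the price at market open), High (the highest price of the day), Low (the lowest price of the day), Close (the price at market close), Adj Close (the closing price adjusted for corporate actions such as splits and dividends), and Volume (the number of shares traded).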
EDA, or Exploratory Data Analysis, is the first step in any data analysis, so let's do that for our stock data too. We also have blogs on EDA and on statistical and inferential analysis; please check them out to learn more.
# convert column names into lowercase
df.columns = [c.lower() for c in df.columns]
df.rename(columns={"adj close":"adj_close"},inplace=True)
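The renaming step above can also be wrapped in a small reusable helper. The following is only a sketch: normalize_columns and the sample DataFrame are hypothetical names made up for illustration, and the handling of tuple labels assumes that columns might arrive as a MultiIndex in some setups, which is not something shown in this post.

```python
import pandas as pd


def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lowercase, underscore-separated column names."""

    def clean(label):
        # Assumption: for tuple labels (MultiIndex columns), keep the first level only.
        if isinstance(label, tuple):
            label = label[0]
        return str(label).strip().lower().replace(" ", "_")

    out = frame.copy()
    out.columns = [clean(c) for c in out.columns]
    return out


# Made-up sample data, used only to demonstrate the helper.
sample = pd.DataFrame({"Open": [1.0], "Adj Close": [0.9], "Volume": [100]})
normalized = normalize_columns(sample)
print(list(normalized.columns))
```

Because the helper returns a copy, the original DataFrame keeps its column names, which can be handy when comparing against the raw download.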
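# count the missing values in each column
# (DataFrame.append was removed in pandas 2.0, so we collect the rows in a list first)
rows = []
for c in df.columns:
    nc = df[c].isna().sum()  # number of null rows in this column
    tr = len(df[c])  # total number of rows
    rate = nc / tr  # fraction of rows that are null
    rows.append({"col_name": c, "null_rows": nc,
                 "rate": rate, "total_rows": tr})
ndf = pd.DataFrame(rows)
ndf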
index | col_name | null_rows | rate | total_rows |
---|---|---|---|---|
0 | open | 0.0 | 0.0 | 10390.0 |
1 | high | 0.0 | 0.0 | 10390.0 |
2 | low | 0.0 | 0.0 | 10390.0 |
3 | close | 0.0 | 0.0 | 10390.0 |
4 | adj_close | 0.0 | 0.0 | 10390.0 |
5 | volume | 0.0 | 0.0 | 10390.0 |
The null rate is zero for every column, so the data contains no null values.
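The per-column loop above can also be expressed with pandas' vectorized methods. This is a minimal sketch under stated assumptions: null_summary and the toy DataFrame are hypothetical names introduced here, and the toy values are made up so that one column actually contains a missing entry.

```python
import numpy as np
import pandas as pd


def null_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and express them as a rate."""
    summary = pd.DataFrame({
        "total_rows": len(frame),        # scalar, broadcast to every column
        "null_rows": frame.isna().sum(),  # nulls per column
        "rate": frame.isna().mean(),      # fraction of nulls per column
    })
    return summary.rename_axis("col_name").reset_index()


# Made-up data with one missing value in "open".
toy = pd.DataFrame({"open": [1.0, np.nan, 3.0, 4.0],
                    "volume": [10, 20, 30, 40]})
summary = null_summary(toy)
print(summary)
```

Since isna().mean() averages a boolean mask, it gives the null rate directly without dividing the count by the row total by hand.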
A distribution plot shows how often the values of a variable fall within each range. It is simply a histogram.
fig = df.iplot(kind="hist",subplots=True, title="Distribution of All Variables", asFigure=True)
fig.write_image("stock_analysis/dist.png")  # needs kaleido installed and an existing stock_analysis/ directory
fig.show()
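To make concrete what a histogram computes, the sketch below counts how many values fall into each of ten equal-width bins using NumPy. The data here is randomly generated for illustration and is not the stock data from this post; the names values, counts and edges are likewise only illustrative.

```python
import numpy as np

# Made-up data: 1,000 draws from a normal distribution.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=100.0, scale=5.0, size=1_000)

# np.histogram returns the count per bin and the bin edges.
counts, edges = np.histogram(values, bins=10)

# Print each range alongside the number of values that fall inside it.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:8.2f} to {right:8.2f}: {count}")
```

Each bar in a histogram plot corresponds to one of these ranges, with the bar height equal to the count.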