Python for Stock Market Analysis: Alpaca API

8 minute read

Introduction

Alpaca Trading API is an API using which we can retrieve stock data in realtime. It provides various APIs and even streaming services. Please read about it in their docs. What is most exciting about this API and the librariy is that it returns the data as a Pandas Dataframe or even simple Dict object. And here we will focus on how to do real time stock market data retrieval.

Getting API Keys

To access API, we need to have an account. So lets signup in https://app.alpaca.markets/signup. Then fill up the form and go to the home page.

png

In order to make first ever API request, we need to have API keys and secret. So go to papertrading and signup for keys. It requires real identity card and live camera at some point.
Save your API Keys and Secret somewhere safe.

png

Making First Request

Install the library first using pip3 install alpaca-trade-api. Or your own package manager in Python.

!pip3 install alpaca-trade-api

import pandas as pd

key="PK5OLX1GAB5TYELHCV1J"
secret="LFdvHRFf93NydXF2kDAskTKDjHtBPyBHnszciU4G"

Please insert your generated keys in above variable.

import alpaca_trade_api as tradeapi

# to which url we will send request
base_url="https://paper-api.alpaca.markets"

# instantiate REST API
api = tradeapi.REST(key, secret, base_url, api_version='v2')

# obtain account information
account = api.get_account()
print(account)

Account({   'account_blocked': False,
    'account_number': 'PA3BQI9QBTE3',
    'accrued_fees': '0',
    'buying_power': '200000',
    'cash': '100000',
    'created_at': '2022-04-30T14:29:12.722985Z',
    'crypto_status': 'ACTIVE',
    'currency': 'USD',
    'daytrade_count': 0,
    'daytrading_buying_power': '0',
    'equity': '100000',
    'id': 'ddeb8bc4-87d3-4ade-8b4a-8a40c7207fed',
    'initial_margin': '0',
    'last_equity': '100000',
    'last_maintenance_margin': '0',
    'long_market_value': '0',
    'maintenance_margin': '0',
    'multiplier': '2',
    'non_marginable_buying_power': '100000',
    'pattern_day_trader': False,
    'pending_transfer_in': '0',
    'portfolio_value': '100000',
    'regt_buying_power': '200000',
    'short_market_value': '0',
    'shorting_enabled': True,
    'sma': '0',
    'status': 'ACTIVE',
    'trade_suspended_by_user': False,
    'trading_blocked': False,
    'transfers_blocked': False})

It shows that the fresh new account that I just created.

Getting List of Symbols

We can get the list of assets like below.

active_assets=api.list_assets(status='active')
for asset in active_assets:
  print(asset)
  break

Asset({   'class': 'us_equity',
    'easy_to_borrow': False,
    'exchange': 'OTC',
    'fractionable': False,
    'id': 'f377f9ef-4b3b-425d-890d-dc7698edc623',
    'marginable': False,
    'name': 'GWG HLDGS INC Common Stock',
    'shortable': False,
    'status': 'active',
    'symbol': 'GWGHQ',
    'tradable': False})

Rest APIs

Alpaca provides various REST APIs that handles the most common requests like getting trade, quote, floorsheet.

Get Floorsheet Data

Now is the time to make first request to get data. So lets define a list of stock symbols that we will be focusing on. We will focus on Tesla and Apple stocks. And we want to see floorsheet value of each minute.

In terms of Alpaca, its bars.

All At one time

We can get all data at one time or iterate over it. Getting all data is little bit slow and might require extra memory.

symbols = ["TSLA", "AAPL"]
resp = api.get_bars(symbols,timeframe='1Min', start="2022-04-30",limit=10000)
resp.df.reset_index()

	index

Today is Friday and hence there will not be any data at the moment. Thus, we will need to change start date to yesterday.

symbols = ["TSLA", "AAPL"]
resp = api.get_bars(symbols,timeframe='1Min', start="2022-04-29",limit=None)
resp.df.reset_index()

	timestamp	open	high	low	close	volume	trade_count	vwap	symbol
0	2022-04-29 08:00:00+00:00	161.00	161.50	160.36	160.36	6107	116	160.736835	AAPL
1	2022-04-29 08:01:00+00:00	160.74	160.84	160.70	160.77	3929	87	160.773235	AAPL
2	2022-04-29 08:02:00+00:00	160.84	160.84	160.77	160.77	2929	69	160.803947	AAPL
3	2022-04-29 08:03:00+00:00	160.75	160.77	160.70	160.77	3335	71	160.734747	AAPL
4	2022-04-29 08:04:00+00:00	160.78	160.98	160.77	160.98	1496	42	160.826170	AAPL
...	...	...	...	...	...	...	...	...	...
1726	2022-04-29 23:55:00+00:00	876.68	876.68	876.68	876.68	432	36	876.634097	TSLA
1727	2022-04-29 23:56:00+00:00	876.64	877.00	876.64	877.00	3178	65	876.884078	TSLA
1728	2022-04-29 23:57:00+00:00	877.00	877.00	876.80	877.00	1599	87	877.110913	TSLA
1729	2022-04-29 23:58:00+00:00	878.00	878.02	877.58	877.58	1431	100	877.933099	TSLA
1730	2022-04-29 23:59:00+00:00	877.88	877.88	877.50	877.50	1299	90	877.808984	TSLA

1731 rows × 9 columns

Now looking over the dataframe above, we can see that there are 1731 rows with floorsheet value of each minute for each stock.

Custom Timeframe

We can even get floorsheet data of every 45, 10, 30 minute or even more customized using TimeFrame and TimeFrameUnit given by its package.

from alpaca_trade_api.rest import TimeFrame, TimeFrameUnit

df = api.get_bars(symbols, TimeFrame(13, TimeFrameUnit.Minute), "2022-04-29").df.reset_index()
df

	timestamp	open	high	low	close	volume	trade_count	vwap	symbol
0	2022-04-29 07:57:00+00:00	161.00	161.50	160.36	160.62	46766	847	160.805158	AAPL
1	2022-04-29 08:10:00+00:00	160.63	160.78	160.52	160.56	53331	755	160.655755	AAPL
2	2022-04-29 08:23:00+00:00	160.56	160.63	159.63	160.00	72334	1059	160.035541	AAPL
3	2022-04-29 08:36:00+00:00	159.92	160.00	159.85	160.00	15276	349	159.939272	AAPL
4	2022-04-29 08:49:00+00:00	159.96	160.00	159.74	159.80	14169	347	159.896077	AAPL
...	...	...	...	...	...	...	...	...	...
145	2022-04-29 23:07:00+00:00	877.26	877.71	876.09	876.24	6727	335	877.222977	TSLA
146	2022-04-29 23:20:00+00:00	875.20	875.60	875.01	875.60	2143	187	875.371510	TSLA
147	2022-04-29 23:33:00+00:00	875.57	876.50	875.57	876.50	9723	443	876.288069	TSLA
148	2022-04-29 23:46:00+00:00	876.50	878.02	876.49	877.58	11632	507	876.892889	TSLA
149	2022-04-29 23:59:00+00:00	877.88	877.88	877.50	877.50	1299	90	877.808984	TSLA

150 rows × 9 columns

In above example, we are looking for floorsheet of every 13 minutes. We can even use Hour there.

Iterating Over

Instead of getting entire data, it might be a good idea to loop through if we have many rows. Thus we can get a iterable and iterate over it like below.

fls = api.get_bars_iter(symbols, TimeFrame(10, TimeFrameUnit.Hour), "2022-04-29")
for fs in fls:
  print(fs)

Bar({   'S': 'AAPL',
    'c': 160.01,
    'h': 166.2,
    'l': 158.97,
    'n': 699101,
    'o': 161,
    't': '2022-04-29T08:00:00Z',
    'v': 75360694,
    'vw': 162.450706})
Bar({   'S': 'AAPL',
    'c': 157.93,
    'h': 160.39,
    'l': 157.25,
    'n': 323819,
    'o': 160,
    't': '2022-04-29T18:00:00Z',
    'v': 56041611,
    'vw': 158.373787})
Bar({   'S': 'TSLA',
    'c': 891.86,
    'h': 934.3999,
    'l': 884.46,
    'n': 651589,
    'o': 899,
    't': '2022-04-29T08:00:00Z',
    'v': 20826462,
    'vw': 910.364386})
Bar({   'S': 'TSLA',
    'c': 877.5,
    'h': 892.81,
    'l': 868,
    'n': 222398,
    'o': 891.88,
    't': '2022-04-29T18:00:00Z',
    'v': 8468767,
    'vw': 878.750304})

Getting Quotes

Quotes are the biddings that has been done.

df = api.get_quotes(symbols, start="2022-04-29",limit=1000).df.reset_index()
df

	timestamp	ask_exchange	ask_price	ask_size	bid_exchange	bid_price	bid_size	conditions	tape	symbol
0	2022-04-29 08:00:00.001169445+00:00		0.00	0	Q	160.66	4	[Y]	C	AAPL
1	2022-04-29 08:00:00.001241377+00:00	Q	160.93	4	Q	160.66	4	[R]	C	AAPL
2	2022-04-29 08:00:00.005655+00:00	Q	160.93	4	K	161.50	1	[R]	C	AAPL
3	2022-04-29 08:00:00.005655+00:00	Q	160.93	4	Q	160.66	4	[R]	C	AAPL
4	2022-04-29 08:00:00.005655+00:00	Q	160.93	4	K	161.00	1	[R]	C	AAPL
...	...	...	...	...	...	...	...	...	...	...
995	2022-04-29 08:22:05.119679488+00:00	P	160.55	2	Q	160.52	1	[R]	C	AAPL
996	2022-04-29 08:22:05.119927355+00:00	Q	160.59	6	Q	160.55	1	[R]	C	AAPL
997	2022-04-29 08:22:05.119937180+00:00	Q	160.59	6	P	160.52	1	[R]	C	AAPL
998	2022-04-29 08:22:05.120335360+00:00	P	160.55	1	Q	160.52	1	[R]	C	AAPL
999	2022-04-29 08:22:05.120487168+00:00	P	160.55	3	Q	160.52	1	[R]	C	AAPL

1000 rows × 10 columns

There are various fileds and there is a well described documentation in docs.

Getting Trade

To view the trades that has actually happened.

df = api.get_trades(symbols, start="2022-04-29",limit=1000).df.reset_index()
df

	timestamp	exchange	price	size	conditions	id	tape	symbol
0	2022-04-29 04:00:13.553000+00:00	D	160.09	1	[@, T, I]	927	C	AAPL
1	2022-04-29 04:00:43.779000+00:00	D	160.09	220	[@, T]	871	C	AAPL
2	2022-04-29 04:02:52.730000+00:00	D	160.10	1	[@, T, I]	912	C	AAPL
3	2022-04-29 04:02:52.730000+00:00	D	160.10	4	[@, T, I]	940	C	AAPL
4	2022-04-29 04:06:04.276000+00:00	D	160.10	20	[@, T, I]	941	C	AAPL
...	...	...	...	...	...	...	...	...
995	2022-04-29 08:09:30.317838547+00:00	Q	160.63	1	[@, F, T, I]	430	C	AAPL
996	2022-04-29 08:09:30.617365760+00:00	P	160.61	137	[@, F, T]	258	C	AAPL
997	2022-04-29 08:09:30.617365760+00:00	P	160.60	74	[@, F, T, I]	259	C	AAPL
998	2022-04-29 08:09:30.756697344+00:00	P	160.60	2	[@, T, I]	260	C	AAPL
999	2022-04-29 08:09:32.422441+00:00	K	160.60	10	[@, F, T, I]	129	C	AAPL

1000 rows × 8 columns

Alpaca provides other Rest APIs too which can be seen in their module https://github.com/alpacahq/alpaca-trade-api-python/blob/master/alpaca_trade_api/rest.py. Most of the REST APIs have iterable functions too like get_bars_iter, get_quote_iter.

Streaming Data

Streaming Data is done to get realtime data and Alpaca provides websockets to do that. To do so, we will use async keyword in front of the function that will be awaitable. Below code is taken from the GitHub repo of Alpaca package.

from alpaca_trade_api.stream import Stream
from alpaca_trade_api.common import URL

async def trade_callback(t):
    print('trade', t)


async def quote_callback(q):
    print('quote', q)


# Initiate Class Instance
stream = Stream(key,
                secret,
                base_url=URL('https://paper-api.alpaca.markets'),
                data_feed='iex') 

# subscribing to event
stream.subscribe_trades(trade_callback, 'AAPL')
stream.subscribe_quotes(quote_callback, 'IBM')

stream.run()

There are some other streamming APIs also like subscribe_bars which can be seen inside a module stream.py.

How to stream bars data?

By default there is not a possibility to stream a bars data but one can do it by using asyncio in Python.

This section will be updated soon. Sorry.

async def get_bars(symbols, start_time,timeframe="1Min"):
    tm = int(timeframe.split("Min")[0])
    end_time=None
    sym=None
    dump_table=f"bars_{timeframe.split('Min')[0]}m"
    try:
        while True:
            print(f"Start Time: {start_time} End Time {end_time} Timeframe: {timeframe}")
            resp = api.get_bars_iter(symbols,start=start_time,end=end_time, timeframe=timeframe)
            
            for r in resp:
                print(r)
                sym=r.symbol

            sym=symbol
            if start_time==timestamp.isoformat() and sym==symbol:
                logger.info(f"Start Time == End Time: {start_time} == {timestamp}. Sleeping for {tm} minutes")
                await asyncio.sleep(tm*60)
            start_time=timestamp.isoformat()
            end_time=timestamp+timedelta(days=2)
            end_time=end_time.isoformat()

            print("-----------------------------------------------------")
            print("Setting start time to: ", timestamp)
            
            clock = api.get_clock()
            next_open = clock.next_open
            curr_day = fdate.strftime("%A")            

            if clock.is_open==False:
                sleep_seconds = (clock.next_open-clock.timestamp).total_seconds()
                sleep_hours = sleep_seconds/(3600)
                logger.info(f"Market is closed. Market Opens on {clock.next_open}. Sleeping for {sleep_hours} hrs.")
                await asyncio.sleep(sleep_seconds)
            else:
                logger.info(f"No new data available. Sleeping for {tm} Minutes")
                await asyncio.sleep(tm*60)              
            
    
    except Exception as e:
            print(f"Error {e} in {timeframe}, {start_time}")

async def stream_mbars():
    rdate=date.today()-timedelta(days=2)
    allow_symbols = ["TSLA", "AAPL"]
    
    await asyncio.gather(
                        get_bars(allow_symbols,rdate,timeframe="1Min"),
                        get_bars(allow_symbols,rdate, timeframe="2Min"),
                        get_bars(allow_symbols,rdate, timeframe="3Min"),
                        get_bars(allow_symbols,rdate, timeframe="5Min"),
                        
                        )

def main_stream():
    asyncio.run(stream_mbars())
    loop = asyncio.get_event_loop()
    loop.run_until_complete(stream_mbars())

In above code, there are 3 functions and we will call the main_stream() function. It runs the stream_mbars event and it also runs other events like get_bars. It is little bit messy but lets explain it little:

The program never ends as there is while True is in the get_bars function.
However, get_bars is allowed to sleep when market is close, or no new data is available. Which means that if we are going to get 5m bars, then running it for each minute is not a good idea instead we need to run it every 5 minutes. So it will sleep for 5mins.
The data is retrieved from the iterator because it is much safer as it takes less memory.
After an iterator completes, we set the end time to two days from last timestamp. Its sure that there will not be any data for 2 days from last timestamp. But still.

In my original production side code, I have stored all these data into table and later used it to calculate indicators and based on the indicators, I make alerts and for the alerts, I have number of rules. If the alert is True in any time then I get mail of it. Phew…..

Twitter Facebook LinkedIn

Quassarian Viper