Polygon.io for Stock Market Data

4 minute read


Hello everyone, welcome back to a new blog about getting stock data in real time using Polygon.io. A few blogs ago, I shared how we can use the Alpaca API to stream stock data, but in this blog we will use Polygon.io, and choosing Polygon over Alpaca has its own pros and cons.

  • Alpaca is a little better than Polygon in terms of GitHub documentation and APIs, which can be seen by comparing Alpaca Trade API and Polygon IO Client Python.
  • Alpaca can return data as a DataFrame as well as JSON, but Polygon returns only JSON. However, we can build a DataFrame from the JSON ourselves.
  • Updated candles arrive more slowly from Alpaca than from Polygon. In my tests, Polygon took at least 3 seconds to send corrected bars, whereas Alpaca took more than 30 seconds.

So, to choose between Alpaca and Polygon, one should ask whether a 30-second delay in corrected data is acceptable. If it is not, Polygon is the better choice; otherwise Alpaca wins the race for me, as it provides some great helpers like get_clock().

Getting User and Key: Polygon.io

It is easy to get an API key. Just sign up for the free tier, open the dashboard, and the key will be listed there.


Installing Polygon API Client

I am using version 0.2.11 because this version worked for my project's requirements.

!pip install polygon-api-client==0.2.11
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting polygon-api-client==0.2.11
  Downloading polygon_api_client-0.2.11-py3-none-any.whl (22 kB)
Collecting websocket-client>=0.56.0
  Downloading websocket_client-1.3.3-py3-none-any.whl (54 kB)
Requirement already satisfied: requests>=2.22.0 in /usr/local/lib/python3.7/dist-packages (from polygon-api-client==0.2.11) (2.23.0)
Collecting websockets>=8.0.2
  Downloading websockets-10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (112 kB)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->polygon-api-client==0.2.11) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->polygon-api-client==0.2.11) (2.10)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->polygon-api-client==0.2.11) (3.0.4)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests>=2.22.0->polygon-api-client==0.2.11) (2022.6.15)
Installing collected packages: websockets, websocket-client, polygon-api-client
Successfully installed polygon-api-client-0.2.11 websocket-client-1.3.3 websockets-10.3

Rest API to Get Aggregate Data

Aggregate data is data that has been aggregated over time: OHLC candles on a one-minute or any other time frame. Within a time frame, open is the opening price of the first candle, close is the closing price of the last candle, high is the highest price among all candles, and low is the lowest price among all candles.
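As a small illustration of that aggregation rule (with toy prices, not Polygon data), here is how minute bars roll up into a larger time frame using pandas:

```python
import pandas as pd

# Minute-level OHLC bars (toy data), indexed by timestamp.
minute_bars = pd.DataFrame(
    {
        "o": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0],
        "h": [101.5, 102.0, 103.0, 104.5, 105.0, 106.0],
        "l": [99.5, 100.5, 101.0, 102.5, 103.5, 104.0],
        "c": [101.0, 102.0, 102.5, 104.0, 104.5, 105.5],
    },
    index=pd.date_range("2022-06-10 09:30", periods=6, freq="min"),
)

# Roll up into 3-minute candles: first open, max high, min low, last close.
three_min = minute_bars.resample("3min").agg(
    {"o": "first", "h": "max", "l": "min", "c": "last"}
)
print(three_min)
```

Note how each aggregation function mirrors the rule above: `first` for open, `max` for high, `min` for low, `last` for close.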

from polygon import RESTClient
import pandas as pd

client = RESTClient(key)  # key is your Polygon.io API key from the dashboard

Polygon uses Unix timestamps to specify the FROM and TO datetimes, so we need to get the timestamps first. It is also worth mentioning that the timestamps must be in milliseconds. The timestamps we get below are in seconds, so we append three zeros to both of them when calling the API.

int(pd.to_datetime("2022-06-10 01:22").timestamp()),int(pd.to_datetime("2022-06-21 06:22").timestamp())
(1654824120, 1655792520)
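Appending zeros to a string works, but a cleaner way (a small sketch, nothing Polygon-specific) is to multiply the second-resolution timestamp by 1000:

```python
import pandas as pd

# Polygon expects Unix timestamps in milliseconds, while .timestamp()
# returns seconds, so multiply by 1000 instead of appending "000".
from_ms = int(pd.to_datetime("2022-06-10 01:22").timestamp() * 1000)
to_ms = int(pd.to_datetime("2022-06-21 06:22").timestamp() * 1000)
print(from_ms, to_ms)  # 1654824120000 1655792520000
```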

The data we are looking for is 1-minute candles, which are available via stocks_equities_aggregates in this version. More about this function can be found in the documentation here.

res = client.stocks_equities_aggregates(ticker='AAPL', multiplier=1,
                                        timespan="minute", from_="1654824120000",
                                        to="1655792520000")

The result will be JSON, but we can use pandas to turn it into a DataFrame.


df = pd.DataFrame(res.results)

v vw o c h l t n
0 2292.0 142.9749 143.03 142.99 143.03 142.90 1654848000000 64
1 817.0 143.0116 143.02 143.03 143.03 143.02 1654848060000 53
2 513.0 143.0704 143.10 143.10 143.10 143.10 1654848120000 34
3 940.0 143.2342 143.23 143.25 143.25 143.23 1654848240000 30
4 1802.0 143.1876 143.20 143.15 143.20 143.15 1654848300000 58
... ... ... ... ... ... ... ... ...
5033 536.0 131.5226 131.52 131.52 131.52 131.52 1655509860000 18
5034 706.0 131.5344 131.52 131.55 131.55 131.52 1655509920000 11
5035 2014.0 131.5570 131.55 131.57 131.57 131.55 1655510040000 27
5036 647.0 131.5287 131.53 131.52 131.53 131.52 1655510220000 21
5037 902.0 131.5651 131.55 131.56 131.56 131.55 1655510340000 32

5038 rows × 8 columns

The columns in the table above use initials: o for open, h for high, l for low, c for close, v for volume, vw for the volume-weighted average price, t for the epoch-millisecond timestamp, and n for the number of trades.
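For readability, we can expand those initials into full names and convert the epoch-millisecond t column into a proper datetime. The two rows below are copied from the output above:

```python
import pandas as pd

# First two rows of the aggregate result, in Polygon's initialed format.
df = pd.DataFrame(
    {
        "v": [2292.0, 817.0],
        "vw": [142.9749, 143.0116],
        "o": [143.03, 143.02],
        "c": [142.99, 143.03],
        "h": [143.03, 143.03],
        "l": [142.90, 143.02],
        "t": [1654848000000, 1654848060000],
        "n": [64, 53],
    }
)

# Rename the initials and parse t as an epoch-millisecond timestamp.
df = df.rename(
    columns={
        "v": "volume", "vw": "vwap", "o": "open", "c": "close",
        "h": "high", "l": "low", "t": "timestamp", "n": "trades",
    }
)
df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
print(df)
```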

Web Socket to Get Realtime Data

Polygon also provides WebSocket which allows us to get data in near realtime.

from polygon import WebSocketClient, STOCKS_CLUSTER,RESTClient
import json

We need to create a handler in order to process each response. In our case, the handler could write the data to our database or any other desired destination.

def close_handler(ws):
    ws = json.loads(ws)
    for w in ws:
        if w["ev"] == "AM":
            print(w)

In the function above, we receive the WebSocket's response as ws and convert it into a list of dictionaries using json.loads. Polygon may batch several events into a single response if the system is slow or there are too many results to send one by one. So we loop through them, and if an event (ev) is named AM, we print the data. AM stands for the aggregate-per-minute event.
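Before wiring the handler into the client, we can sanity-check the filtering logic on a hand-made frame. The sample payload below is illustrative (fields modeled on the aggregate rows we saw earlier), not a captured Polygon message:

```python
import json

# Each WebSocket frame is a JSON array of event dicts; keep only the
# "AM" (aggregate-per-minute) events and ignore status messages.
def extract_minute_bars(message):
    events = json.loads(message)
    return [e for e in events if e.get("ev") == "AM"]

# A made-up frame batching a status event with one minute bar.
sample = json.dumps([
    {"ev": "status", "message": "authenticated"},
    {"ev": "AM", "sym": "AAPL", "o": 143.03, "c": 142.99,
     "h": 143.03, "l": 142.90, "v": 2292},
])
print(extract_minute_bars(sample))
```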

symbols = ["AAPL", "MSFT"]  # example tickers; use whichever you want to stream
my_client = WebSocketClient(STOCKS_CLUSTER, key, close_handler)
my_client.run_async()
my_client.subscribe(*[f"AM.{s}" for s in symbols])
  • We prepare the symbols in a list.
  • Then we create a WebSocketClient object by passing STOCKS_CLUSTER, the key, and our handler.
  • We start the client asynchronously and finally subscribe to the symbols. The AM prefix is what requests real-time aggregated data per minute.