Polygon.io Stock Market Data in Python: REST Aggregates, WebSockets, pandas, and Realtime Streams

In this tutorial, we will learn how to get stock market data in Python using Polygon.io. We will look at both REST API data and WebSocket streaming data.

The original version of this post was written in 2022. At that time, I was comparing Alpaca and Polygon for realtime stock data. In my experiments, Polygon’s updated minute bars arrived faster, which made it the more useful source for that project.

Today, the ecosystem has changed. Polygon.io has rebranded as Massive, but many developers still know the platform as Polygon.io. The Polygon-style API concepts still apply, so this tutorial keeps the original name while also pointing to the newer client library.

This post is for educational use only. It is not financial advice and should not be used as the only basis for trading or investing decisions.

What This Tutorial Covers

We will cover:

  • what Polygon.io is used for
  • REST API vs WebSocket API
  • how to keep API keys safe
  • installing the Python client
  • getting aggregate OHLC bars
  • converting JSON results to a pandas DataFrame
  • cleaning timestamp columns
  • plotting stock candles
  • using WebSocket for realtime minute aggregates
  • common errors and fixes
  • when to use REST and when to use WebSocket

Polygon.io vs Alpaca

A few posts before the original version of this one, I wrote about using the Alpaca API to stream stock data. Polygon.io and Alpaca both provide market data, but they are useful in slightly different ways.

In my earlier experiment:

  • Alpaca had very convenient Python tools.
  • Alpaca could return data in friendly formats.
  • Polygon returned JSON, but it was easy to convert to pandas.
  • Polygon’s corrected aggregate bars arrived faster in my use case.
  • Alpaca had useful trading-related helpers such as market clock functions.

So, the choice depends on what you need.

Use Polygon.io or Massive-style APIs when:

  • you mostly need market data
  • you need REST and WebSocket data streams
  • you want aggregates, trades, quotes, snapshots, or reference data
  • you are building dashboards, research tools, or data pipelines

Use Alpaca when:

  • you also need trading APIs
  • you want broker-style account features
  • you want paper trading and market-data features together

Current Note: Polygon.io Rebrand

Polygon.io has rebranded as Massive. If you are updating older code, you may see two styles:

Old style:

pip install polygon-api-client

Newer style:

pip install -U massive

Old imports may look like this:

from polygon import RESTClient

Newer imports may look like this:

from massive import RESTClient

If you are maintaining an older project, check which package version the project uses before changing imports.

Getting an API Key

To use Polygon.io or Massive market data APIs, you need an API key.

General steps:

  1. create an account
  2. open the dashboard
  3. find your API key
  4. copy it into an environment variable
  5. never commit it to GitHub

In the original notebook, the API key was written directly as:

key = ""

That is not safe: anyone who can see the notebook or the repository can read the key. A better approach is to use environment variables.

Store API Key Safely

Create a .env file:

POLYGON_API_KEY=your_api_key_here

Add .env to .gitignore:

.env
__pycache__/
*.pyc

Install python-dotenv:

pip install python-dotenv

Load the key:

import os

from dotenv import load_dotenv


load_dotenv()

API_KEY = os.environ["POLYGON_API_KEY"]

This keeps the key out of your source code.
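If the variable is missing, os.environ[...] fails with a bare KeyError. A small helper can fail with a clearer message instead. This is only a sketch: get_api_key is a hypothetical name, not part of any client library.

```python
import os


def get_api_key(env_var: str = "POLYGON_API_KEY") -> str:
    """Return the API key from the environment or raise a readable error.

    get_api_key is an illustrative helper, not part of any client library.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Add it to your .env file or export it "
            "in your shell before running this script."
        )
    return key
```

Call load_dotenv() first, then API_KEY = get_api_key(), so a missing or empty key is caught before the first request.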

Install Python Client

For newer projects, install the current official client:

pip install -U massive

Then import:

from massive import RESTClient

If you are following older code from 2022, you may see:

pip install polygon-api-client==0.2.11

and:

from polygon import RESTClient

The old version may still be useful if you are reproducing an old notebook, but for new projects, start from the latest client and official documentation.

REST API vs WebSocket API

There are two common ways to get market data.

API Type      | Best For
REST API      | Historical data, one-time queries, backtesting, dashboards
WebSocket API | Realtime streams, live dashboards, alert systems

Use REST when you want data for a date range.

Use WebSocket when you want to listen for live updates.

What Is Aggregate Data?

Aggregate data means OHLCV data grouped by time.

OHLCV means:

Column | Meaning
Open   | first price in the period
High   | highest price in the period
Low    | lowest price in the period
Close  | last price in the period
Volume | traded volume in the period

For example, one-minute aggregate bars contain one row per minute.
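To make the idea concrete, here is a minimal sketch that builds one-minute OHLCV bars from a handful of made-up trades with pandas. The trades are hypothetical, and real vendors apply extra rules (for example, which trade conditions count) that this toy example ignores.

```python
import pandas as pd

# Hypothetical trades: timestamp, price, and size of each print.
trades = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2022-06-10 13:30:05",
                "2022-06-10 13:30:40",
                "2022-06-10 13:30:55",
                "2022-06-10 13:31:10",
                "2022-06-10 13:31:45",
            ],
            utc=True,
        ),
        "price": [100.0, 101.5, 99.5, 100.5, 102.0],
        "size": [10, 20, 30, 40, 50],
    }
).set_index("time")

# Group trades into one-minute buckets and summarise each bucket.
bars = trades["price"].resample("1min").ohlc()
bars["volume"] = trades["size"].resample("1min").sum()

print(bars)
```

The five trades collapse into two rows, one per minute, each with open, high, low, close, and volume.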

Get Aggregate Stock Data with REST API

First, create the client.

import os

import pandas as pd
from dotenv import load_dotenv
from massive import RESTClient


load_dotenv()

API_KEY = os.environ["POLYGON_API_KEY"]

client = RESTClient(api_key=API_KEY)

Now request one-minute aggregate bars for Apple.

ticker = "AAPL"

aggs = []

for bar in client.list_aggs(
    ticker=ticker,
    multiplier=1,
    timespan="minute",
    from_="2022-06-10",
    to="2022-06-21",
    limit=50000
):
    aggs.append(bar)

This returns aggregate bar objects.

Convert Aggregates to pandas DataFrame

The exact object format can depend on the client version. A safe approach is to convert each item to a dictionary.

records = []

for bar in aggs:
    if hasattr(bar, "__dict__"):
        records.append(bar.__dict__)
    else:
        records.append(dict(bar))

df = pd.DataFrame(records)

df.head()

If your client already returns dictionaries, this can be simpler:

df = pd.DataFrame(aggs)
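If you are not sure which shape your client returns, a single conversion function can handle dictionaries, dataclass instances, and plain objects. The sketch below uses a stand-in class, FakeBar, whose field names are assumptions for illustration; it is not the client's real bar class.

```python
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class FakeBar:
    """Stand-in for a client bar object; field names are assumptions."""
    open: float
    close: float
    timestamp: int


def bar_to_record(bar) -> dict:
    """Convert a dict, dataclass instance, or plain object to a dict."""
    if isinstance(bar, dict):
        return dict(bar)
    if is_dataclass(bar) and not isinstance(bar, type):
        return asdict(bar)
    return dict(vars(bar))


records = [
    bar_to_record(FakeBar(open=1.0, close=2.0, timestamp=1654867800000)),
    bar_to_record({"o": 1.0, "c": 2.0, "t": 1654867800000}),
]
print(records)
```

With real data, records = [bar_to_record(bar) for bar in aggs] followed by pd.DataFrame(records) gives the same result as the loop above.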

Understand Aggregate Columns

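Depending on how you fetched the data, the columns use long names or short names. The raw JSON from the REST endpoint uses short keys, while recent client versions typically expose long attribute names such as open and close.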

Old JSON-style columns often looked like this:

Column | Meaning
v      | volume
vw     | volume-weighted average price
o      | open
c      | close
h      | high
l      | low
t      | timestamp in milliseconds
n      | number of transactions

A cleaned version can use readable column names:

rename_map = {
    "v": "volume",
    "vw": "vwap",
    "o": "open",
    "c": "close",
    "h": "high",
    "l": "low",
    "t": "timestamp",
    "n": "transactions",
}

df = df.rename(columns=rename_map)

Convert Timestamp to Datetime

Polygon-style aggregate timestamps are Unix epoch values in milliseconds that mark the start of each bar.

df["datetime"] = pd.to_datetime(
    df["timestamp"],
    unit="ms",
    utc=True
)

df = df.sort_values("datetime")

df.head()

If you want New York time:

df["datetime_ny"] = df["datetime"].dt.tz_convert("America/New_York")

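Timezone handling matters in market data because US equity exchanges run on Eastern Time. The regular session is 9:30 to 16:00 in New York, and the matching UTC hours shift with daylight saving time.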
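One common use of the New York column is keeping only regular-session bars. The sketch below assumes each bar's timestamp marks the start of its minute, so the last regular bar starts at 15:59. The sample values are made up.

```python
import pandas as pd

# Hypothetical one-minute bars with UTC timestamps (June, so New York is UTC-4).
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2022-06-10 13:29:00",  # 09:29 New York, pre-market
                "2022-06-10 13:30:00",  # 09:30 New York, first regular bar
                "2022-06-10 19:59:00",  # 15:59 New York, last regular bar
                "2022-06-10 20:00:00",  # 16:00 New York, after hours
            ],
            utc=True,
        ),
        "close": [136.9, 137.0, 137.1, 137.2],
    }
)

df["datetime_ny"] = df["datetime"].dt.tz_convert("America/New_York")

# between_time filters on the local wall-clock time of the index.
regular = (
    df.set_index("datetime_ny")
    .between_time("09:30", "15:59")
    .reset_index()
)

print(regular[["datetime_ny", "close"]])
```

Filtering on the New York column rather than on UTC means the same two times work in both summer and winter.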

Select Useful Columns

columns = [
    "datetime",
    "open",
    "high",
    "low",
    "close",
    "volume",
    "vwap",
    "transactions"
]

df = df[[column for column in columns if column in df.columns]]

df.head()

Now the data is ready for analysis.
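A typical first analysis step is resampling one-minute bars into a coarser timeframe. Here is a minimal sketch with made-up values; the column names match the cleaned frame above, and the five-minute target is just an example.

```python
import pandas as pd

# Hypothetical one-minute bars; values are made up for illustration.
minute_bars = pd.DataFrame(
    {
        "datetime": pd.date_range(
            "2022-06-10 13:30", periods=10, freq="1min", tz="UTC"
        ),
        "open": [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        "high": [12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
        "low": [9, 10, 11, 12, 13, 14, 15, 16, 17, 18],
        "close": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
        "volume": [100] * 10,
    }
)

# Each output bar takes the first open, highest high, lowest low,
# last close, and total volume of the minutes it covers.
five_min = (
    minute_bars.set_index("datetime")
    .resample("5min")
    .agg(
        {
            "open": "first",
            "high": "max",
            "low": "min",
            "close": "last",
            "volume": "sum",
        }
    )
    .dropna(subset=["open"])  # drop empty buckets such as overnight gaps
)

print(five_min)
```

The dropna step matters on real data, where resampling across nights and weekends creates empty buckets with no trades.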

Plot Close Price

import matplotlib.pyplot as plt

plt.figure(figsize=(12, 5))
plt.plot(df["datetime"], df["close"])

plt.title("AAPL 1-Minute Close Price")
plt.xlabel("Time")
plt.ylabel("Close Price")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

This gives a simple line plot of close prices.

Save Data to CSV

df.to_csv("aapl_1min_bars.csv", index=False)

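Read it later, parsing the datetime column back into timestamps (CSV stores it as plain text):

df = pd.read_csv("aapl_1min_bars.csv", parse_dates=["datetime"])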

Saving data is useful for debugging, analysis, and backtesting.

Raw REST Request Alternative

You can also use the REST API manually with requests.

import os

import pandas as pd
import requests
from dotenv import load_dotenv


load_dotenv()

API_KEY = os.environ["POLYGON_API_KEY"]

ticker = "AAPL"

url = (
    "https://api.polygon.io/v2/aggs/ticker/"
    f"{ticker}/range/1/minute/2022-06-10/2022-06-21"
)

params = {
    "adjusted": "true",
    "sort": "asc",
    "limit": 50000,
    "apiKey": API_KEY,
}

response = requests.get(url, params=params, timeout=30)
response.raise_for_status()

data = response.json()

df = pd.DataFrame(data.get("results", []))

This is useful when you want to understand what the client library is doing internally.
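One thing the client library handles for you is pagination. The sketch below assumes the response includes a next_url field whenever more pages exist, which is my understanding of the aggregates endpoint; check a real response before relying on it. The function iter_results and the fake pages are hypothetical, and the fetch function is injected so the loop can run without a network call.

```python
from typing import Callable, Iterator


def iter_results(first_url: str, fetch_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item in "results", following "next_url" until it is absent.

    fetch_json is any callable that takes a URL and returns the decoded JSON
    body, for example a thin wrapper around requests.get that also adds the
    API key to each request.
    """
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page.get("results", [])
        url = page.get("next_url")


# Fake two-page response used to exercise the loop without a network call.
fake_pages = {
    "page-1": {
        "results": [{"t": 1, "c": 10.0}, {"t": 2, "c": 11.0}],
        "next_url": "page-2",
    },
    "page-2": {"results": [{"t": 3, "c": 12.0}]},
}

rows = list(iter_results("page-1", fake_pages.__getitem__))
print(len(rows))
```

With real requests, fetch_json could be lambda u: requests.get(u, params={"apiKey": API_KEY}, timeout=30).json(), with error handling added as needed.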

WebSocket for Realtime Data

REST works well for historical data, but for realtime market data we need WebSockets.

WebSockets keep a connection open and send new events as they arrive.

In the old 2022 code, the WebSocket usage looked like this:

from polygon import WebSocketClient, STOCKS_CLUSTER, RESTClient
import json

Then:

symbols = ["AAPL"]
my_client = WebSocketClient(STOCKS_CLUSTER, key, close_handler)
my_client.run_async()
my_client.subscribe(*[f"AM.{s}" for s in symbols])

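This was based on the older client style. Here key is the API key string, and close_handler is a callback function that has to be defined before this snippet runs.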

For newer projects, check the current WebSocket client examples in the official client repository because the API style may change between versions.

WebSocket Event Types

For stock streams, common channels include:

Channel   | Meaning
T.SYMBOL  | trades
Q.SYMBOL  | quotes
AM.SYMBOL | minute aggregates
A.SYMBOL  | second aggregates, depending on plan/support

For example:

AM.AAPL

means minute aggregate bars for Apple.
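Subscription strings are just the channel prefix, a dot, and the ticker, so they are easy to build and split in plain Python. Both helper names below are hypothetical.

```python
def build_subscriptions(symbols, channel="AM"):
    """Return subscription strings such as "AM.AAPL" for each symbol."""
    return [
        f"{channel}.{symbol.strip().upper()}"
        for symbol in symbols
        if symbol.strip()
    ]


def split_subscription(subscription):
    """Split "AM.AAPL" into ("AM", "AAPL"); only the first dot separates."""
    channel, _, symbol = subscription.partition(".")
    return channel, symbol


print(build_subscriptions(["aapl", " msft "]))
print(split_subscription("T.BRK.B"))
```

Splitting on the first dot only keeps tickers that contain a dot of their own in one piece.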

Simple WebSocket Handler Concept

A WebSocket handler receives messages from the stream.

Example concept:

def handle_message(message):
    """Handle incoming WebSocket messages."""
    print(message)

In practice, the message format depends on the client version.

One message can carry several events at once, so handlers usually loop over them.

def handle_events(events):
    for event in events:
        event_type = getattr(event, "event_type", None)

        if event_type == "AM":
            print(event)

If the client gives dictionaries:

def handle_events(events):
    for event in events:
        if event.get("ev") == "AM":
            print(event)
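When you work with raw text frames rather than client objects, the handler usually decodes JSON first and then keeps only the events it cares about. The sketch below assumes the older short-key JSON format for minute aggregates (sym for the ticker, s and e for the start and end of the bar in milliseconds); verify the field names against a real message. The sample frame is made up.

```python
import json

# Short-key to long-name mapping for minute-aggregate events.
# The field names below are assumptions based on the older JSON format.
AM_FIELDS = {
    "sym": "ticker",
    "o": "open",
    "h": "high",
    "l": "low",
    "c": "close",
    "v": "volume",
    "vw": "vwap",
    "s": "start_ms",
    "e": "end_ms",
}


def parse_minute_bars(raw_message: str) -> list:
    """Decode one text frame and return cleaned minute-aggregate records."""
    events = json.loads(raw_message)
    if isinstance(events, dict):  # tolerate a single event object
        events = [events]
    bars = []
    for event in events:
        if event.get("ev") != "AM":
            continue  # skip status messages and other channels
        bars.append(
            {long_name: event.get(short_key) for short_key, long_name in AM_FIELDS.items()}
        )
    return bars


sample = json.dumps(
    [
        {"ev": "status", "status": "auth_success", "message": "authenticated"},
        {"ev": "AM", "sym": "AAPL", "o": 137.0, "h": 137.4, "l": 136.9,
         "c": 137.2, "v": 1200, "vw": 137.15,
         "s": 1654867800000, "e": 1654867860000},
    ]
)
print(parse_minute_bars(sample))
```

Returning plain dictionaries with long names keeps the storage and dashboard code independent of the wire format.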

Store Realtime Bars

For a real project, printing is not enough. We may want to store bars.

Possible storage options:

  • CSV file
  • SQLite
  • PostgreSQL
  • Redis
  • TimescaleDB
  • Parquet files
  • message queue

For a simple CSV example:

from pathlib import Path

import pandas as pd


def append_bar_to_csv(bar, file_path="bars.csv"):
    """Append one bar (a dict) to a CSV file, writing the header only once."""
    row = pd.DataFrame([bar])
    row.to_csv(
        file_path,
        mode="a",
        header=not Path(file_path).exists(),
        index=False
    )
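CSV appends cannot replace a bar that arrives again with updated values. SQLite from the standard library can, using a primary key on ticker and bar start time. This is a minimal sketch; the table layout and the function names are my own choices, not something the API prescribes.

```python
import sqlite3


def open_bar_store(path=":memory:"):
    """Open (or create) a SQLite database with a minute-bar table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS bars (
            ticker   TEXT    NOT NULL,
            start_ms INTEGER NOT NULL,
            open REAL, high REAL, low REAL, close REAL, volume REAL,
            PRIMARY KEY (ticker, start_ms)
        )
        """
    )
    return conn


def save_bar(conn, bar):
    """Insert a bar, replacing any earlier version of the same minute."""
    conn.execute(
        "INSERT OR REPLACE INTO bars VALUES "
        "(:ticker, :start_ms, :open, :high, :low, :close, :volume)",
        bar,
    )
    conn.commit()


conn = open_bar_store()
bar = {"ticker": "AAPL", "start_ms": 1654867800000,
       "open": 137.0, "high": 137.4, "low": 136.9, "close": 137.2,
       "volume": 1200}
save_bar(conn, bar)
save_bar(conn, {**bar, "close": 137.3})  # an updated bar for the same minute
print(conn.execute("SELECT COUNT(*), MAX(close) FROM bars").fetchone())
```

Pass a file path such as "data/bars.db" instead of the in-memory default to keep the bars between runs.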

Realtime Data Pipeline Idea

A simple realtime pipeline can look like this:

WebSocket stream
      |
      v
message handler
      |
      v
clean event
      |
      v
append to database
      |
      v
dashboard or alert system

This is useful for:

  • live dashboards
  • alert systems
  • paper trading experiments
  • market monitoring
  • data collection
  • strategy research
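As an example of the last stage of the pipeline, here is a sketch of a tiny alert rule: flag a bar when its close has moved more than a set percentage from the close a few bars earlier. The class name, threshold, and lookback are illustrative choices, and the prices are made up.

```python
from collections import deque


class PriceMoveAlert:
    """Flag a bar whose close moved at least threshold_pct percent from the
    close `lookback` bars earlier. Names and defaults are illustrative."""

    def __init__(self, threshold_pct=1.0, lookback=5):
        self.threshold_pct = threshold_pct
        self.closes = deque(maxlen=lookback + 1)

    def update(self, close):
        """Add a close price; return the percent move if it trips the alert."""
        self.closes.append(close)
        if len(self.closes) < self.closes.maxlen:
            return None  # not enough history yet
        reference = self.closes[0]
        move_pct = (close - reference) / reference * 100
        return move_pct if abs(move_pct) >= self.threshold_pct else None


alert = PriceMoveAlert(threshold_pct=1.0, lookback=2)
for close in [100.0, 100.2, 100.4, 101.5]:
    move = alert.update(close)
    if move is not None:
        print(f"ALERT: {move:+.2f}% over the last 2 bars")
```

In a live setup, the message handler would call alert.update(bar["close"]) for each new minute bar after storing it.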

REST vs WebSocket Example Use Cases

Task                                   | Better Choice
Download last 6 months of 1-minute bars | REST
Show live AAPL updates in dashboard     | WebSocket
Backtest a strategy                     | REST
Monitor realtime trade events           | WebSocket
Build a daily report                    | REST
Trigger alert when price moves          | WebSocket

Common Problems and Fixes

Problem 1: Authentication Error

Check:

  • API key is correct
  • API key is active
  • environment variable is loaded
  • subscription plan supports the endpoint

Problem 2: No Data Returned

Possible reasons:

  • ticker is wrong
  • market was closed
  • date range has no trading data
  • plan does not support the data
  • endpoint parameters are wrong
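Whatever the cause, downstream code should not crash on an empty response. A minimal sketch for the raw REST path: return an empty DataFrame with the expected columns and print what the server said. The function name is hypothetical, and the status and resultsCount fields are what I would expect in the response body, so treat them as assumptions.

```python
import pandas as pd

# Short column names used by the raw aggregates JSON.
EXPECTED_COLUMNS = ["t", "o", "h", "l", "c", "v", "vw", "n"]


def results_to_frame(payload: dict) -> pd.DataFrame:
    """Build a DataFrame from a decoded response, tolerating missing results."""
    results = payload.get("results") or []
    if not results:
        print(
            f"No bars returned (status={payload.get('status')!r}, "
            f"resultsCount={payload.get('resultsCount')!r})"
        )
        return pd.DataFrame(columns=EXPECTED_COLUMNS)
    return pd.DataFrame(results)


empty = results_to_frame({"status": "OK", "resultsCount": 0})
print(empty.empty, list(empty.columns))
```

Because the empty frame still has the expected columns, later steps such as renaming and selecting columns keep working.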

Problem 3: WebSocket Connects but No Messages Arrive

Possible reasons:

  • market is closed
  • subscribed channel is not supported by your plan
  • wrong ticker symbol
  • wrong channel name
  • connection is blocked
  • you are using old client code with a newer package

Problem 4: Too Many Requests

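REST APIs enforce rate limits, and the limit usually depends on your plan. When you exceed it, the server typically answers with HTTP status 429 (Too Many Requests).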

Solutions:

  • cache results
  • reduce request frequency
  • use pagination properly
  • use a paid plan if needed
  • use flat files for large historical downloads
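Reducing request frequency is often done with retries and exponential backoff. The sketch below is generic: RateLimited is a hypothetical exception that your own request function would raise when it sees an HTTP 429 response, and the sleep function is injectable so the logic can be tried without waiting.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's request function on an HTTP 429 response."""


def call_with_backoff(request_fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call request_fn, retrying on RateLimited with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # 1x, 2x, 4x ... the base delay, plus a little jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 10)
            sleep(delay)


# Demo: a fake request that is rate limited twice, then succeeds.
attempts = {"count": 0}


def flaky_request():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits = []
print(call_with_backoff(flaky_request, sleep=waits.append), len(waits))
```

In real code, request_fn would wrap requests.get, raise RateLimited when response.status_code is 429, and return the decoded JSON otherwise.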

Problem 5: Timestamp Looks Wrong

Check the timestamp unit.

If timestamp is in milliseconds:

pd.to_datetime(df["timestamp"], unit="ms", utc=True)

If timestamp is in seconds:

pd.to_datetime(df["timestamp"], unit="s", utc=True)
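If you do not know the unit, the size of the number is a good hint. The helper below is a heuristic sketch with a hypothetical name; the cut-offs assume dates after about 1973 and are not a guarantee.

```python
def guess_epoch_unit(timestamp) -> str:
    """Guess the unit of a Unix timestamp from its magnitude.

    A heuristic for dates after about 1973: seconds have about 10 digits
    today, milliseconds 13, microseconds 16, and nanoseconds 19.
    """
    value = abs(int(timestamp))
    if value < 10**11:
        return "s"
    if value < 10**14:
        return "ms"
    if value < 10**17:
        return "us"
    return "ns"


print(guess_epoch_unit(1654867800))     # 2022-06-10 13:30 UTC in seconds
print(guess_epoch_unit(1654867800000))  # the same instant in milliseconds
```

The result can be passed straight to pandas, for example pd.to_datetime(df["timestamp"], unit=guess_epoch_unit(df["timestamp"].iloc[0]), utc=True).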

Good Practices

Here are some good practices when working with stock market APIs:

  • keep API keys in environment variables
  • do not commit keys to GitHub
  • understand your data plan and limits
  • store raw data before cleaning
  • convert timestamps carefully
  • use UTC internally
  • convert to exchange timezone for display
  • handle empty API responses
  • add retry logic for network errors
  • log WebSocket disconnects
  • do not make trading decisions from untested data

Minimal Project Structure

A small project can look like this:

polygon-stock-data/
│
├── rest_bars.py
├── websocket_stream.py
├── config.py
├── requirements.txt
├── .env
├── .gitignore
└── data/

Example requirements.txt:

massive
pandas
python-dotenv
requests
matplotlib

If using old code:

polygon-api-client==0.2.11
pandas
python-dotenv
requests
matplotlib

Full REST Example

import os

import pandas as pd
from dotenv import load_dotenv
from massive import RESTClient


load_dotenv()

API_KEY = os.environ["POLYGON_API_KEY"]

client = RESTClient(api_key=API_KEY)

ticker = "AAPL"

aggs = []

for bar in client.list_aggs(
    ticker=ticker,
    multiplier=1,
    timespan="minute",
    from_="2022-06-10",
    to="2022-06-21",
    limit=50000
):
    aggs.append(bar)

records = []

for bar in aggs:
    if hasattr(bar, "__dict__"):
        records.append(bar.__dict__)
    else:
        records.append(dict(bar))

df = pd.DataFrame(records)

rename_map = {
    "v": "volume",
    "vw": "vwap",
    "o": "open",
    "c": "close",
    "h": "high",
    "l": "low",
    "t": "timestamp",
    "n": "transactions",
}

df = df.rename(columns=rename_map)

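if "timestamp" in df.columns:
    df["datetime"] = pd.to_datetime(
        df["timestamp"],
        unit="ms",
        utc=True
    )
    # Sort only when the datetime column exists, to avoid a KeyError.
    df = df.sort_values("datetime")

# Create the output folder if it does not exist yet.
os.makedirs("data", exist_ok=True)

df.to_csv("data/aapl_1min_bars.csv", index=False)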

print(df.head())
print(df.shape)

Final Thoughts

In this post, we learned how to use Polygon.io stock market data in Python. We covered REST aggregate bars, pandas conversion, timestamp cleaning, and the idea of WebSocket realtime streams.

The main ideas are:

  • use REST API for historical data
  • use WebSocket API for realtime data
  • keep API keys secure
  • convert JSON responses into pandas DataFrames
  • clean timestamps carefully
  • check the current client library before copying old code

The original 2022 code used polygon-api-client==0.2.11, which was useful at that time. For new projects, start from the current official client and documentation, then adapt the code based on your package version and subscription plan.
