Database Design Project

Summary

For this project, I designed and implemented a MySQL database hosted on AWS to store data about US stocks. I also created a Python script to clean and transfer financial data from my broker into the MySQL database.

A complete description of the project can be found below. The Python script can be accessed in the project's GitHub repository.

Motivation

My primary motivation for this project was to practice using the skills and tools listed below. Secondarily, I wanted to improve my stock trading system by storing my data in a relational database rather than in CSV files.

Skills and Tools Used

Skills

  • Database design
  • Data cleaning
  • Software development
  • Version control
  • Project documentation

Tools

  • Python
  • SQL
  • MySQL
  • AWS
  • Regular expressions
  • Git & GitHub
  • Jupyter Notebook

Database Design Process

Requirements Specification

I started by specifying the requirements for my database.

The database must be able to:

  • store data about thousands of US stocks
  • store price/volume data, fundamental data, and general identifying data for each stock
  • retroactively allow the addition of other types of financial instruments and fundamental data
  • easily join all data about a stock when queried
  • easily update to reflect new information
  • be queried with SQL and Python
  • run on a cloud computing platform such as AWS

ER Diagram

I then created an ER diagram to help me design a database that meets the specified requirements. Due to the highly structured nature of the data, a relational database model was chosen over a non-relational model.

There are four entities (exchange, contract, price_volume, and fundamental) which correspond to the four tables in the database. Each stock contract has an associated exchange on which it is listed as well as price/volume and fundamental data that characterize the performance of the stock. The database is normalized so it can be easily modified to support additional contract types and performance metrics.

Database Instantiation

I chose MySQL as the relational database management system as it is open-source, widely used, and extensively documented.

Next, I created a MySQL database instance using Amazon's Relational Database Service (RDS), one of the many services available through the Amazon Web Services (AWS) platform. I took advantage of the RDS free tier, which provides 20GB of SSD storage for one year. The database could have been instantiated on my local computer instead of through RDS but I opted for RDS to get further acquainted with AWS and to have a simple solution for scaling up my database in the future.

Schema

To create the schema for my database, I connected the database to MySQL Workbench, the official MySQL client. I then defined the schema below based on my ER diagram.

Data Cleaning and Insertion Process

Version Control

Before writing the Python scripts to add data to the database, I specified the version control strategy that I would use for this project and my trading system as a whole. I chose to use Git as my distributed version control system because it is open-source, robust, and well documented. I also decided to push my local commits to GitHub as a way to remotely backup my code and facilitate collaboration in the future if the need arises.

When designing a workflow for my trading system, I was inspired by the widely used Gitflow workflow detailed by Vincent Driessen. Gitflow is well documented and effectively handles many of the issues that may arise in a software project, but it is too complex for this relatively simple project. The workflow I designed for this project utilizes the same "master", "develop", and "feature" branches that are specified in Gitflow but simplifies the branching and semantic versioning process. This simplified workflow is illustrated below.

Python Development Environment

As the rest of my trading system was developed in Python, I decided to use Python to query and insert data into my database. This allowed me to interface with my broker and the MySQL database via the same Python script.

I used Anaconda and its virtual environments to manage my Python packages, making sure to create a new virtual environment for every semantic version upgrade of my trading system. The MySQL Connector Python package was installed in my virtual environment to provide an interface with MySQL via Python.

Exchange Data

I started by inserting data into the exchange table of the database. As my trading system focuses on only two exchanges, NASDAQ and NYSE, I performed this insertion manually using the code snippet below. The variables "cursor" and "cnx" refer to objects of the MySQL Connector module and the INSERT statement is standard SQL syntax combined with Python string formatting syntax. Refer to "data_acquisition.py" to see this code snippet in context.

insert_statement = ('INSERT INTO exchange '
                    '(abbrev, name, ib_name, state, city) '
                    'VALUES (%s, %s, %s, %s, %s)')
exchanges = [('NASDAQ', 'Nasdaq', 'ISLAND', 'New York', 'New York'),
             ('NYSE', 'New York Stock Exchange', 'NYSE', 'New York', 'New York')]

for exchange in exchanges:
    self.cursor.execute(insert_statement, exchange)
    self.cnx.commit()

Contract Data

I then used the data in the exchange table to populate the contract table. The data for the contracts was acquired from Interactive Brokers (IB) using their Python API. I followed the steps below, as seen in "data_acquisition.py."

  1. Select the id and abbrev attributes from the exchange table using MySQL Connector.

  2. For every exchange in the result set, use the IB API to scan for the 50 most actively traded stocks on that exchange priced between 10 and 20 USD.

  3. Use the IB API to request the full contract details for every contract returned by the scan.

  4. Parse the ContractDetail objects returned by the IB API to create a list of dictionaries, where each dictionary corresponds to a contract and the keys of each dictionary are the attributes of the contract table.

  5. Use MySQL Connector to iteratively insert the data in each contract dictionary into the contract table of the MySQL database.

Price/Volume Data

Next, I used the data in the contract table in conjunction with the IB API to populate the price_volume table. I followed the steps below, as seen in "data_acquisition.py."

  1. Use MySQL Connector to select the attributes from the contract table that are necessary to define a contract in the IB API.

  2. For every contract in the result set, use the IB API to get the historical 30min price/volume data for that contract.

  3. Parse the BarData objects returned by the IB API to create a list of dictionaries, where each dictionary corresponds to a row in the price_volume table and the keys of each dictionary are the attributes of the price_volume table.

    • Make sure to convert the Unix time string returned by IB into a standard ISO 8601 formatted datetime object.
  4. Use MySQL Connector to iteratively insert the data in each price_volume dictionary into the price_volume table of the MySQL database.

Fundamental Data

Finally, similar to my approach for the price_volume table, I used the data in the contract table in conjunction with the IB API to populate the fundamental table. I followed the steps below, as seen in "data_acquisition.py."

  1. Use MySQL Connector to select the attributes from the contract table that are necessary to define a contract in the IB API.

  2. For every contract in the result set, use the IB API to get the fundamental data for that contract.

  3. Parse the string objects returned by the IB API to create a list of dictionaries, where each dictionary corresponds to a row in the fundamental table and the keys of each dictionary are the attributes of the fundamental table.

    • The string returned for each contract by the IB API contains all available fundamental attributes in the format "FUND1=####;FUND2=####;..."
    • For every dictionary key, use a regular expression like the one below to extract the corresponding fundamental attribute's value from the string returned by IB. In the code snippet below, "re" is the built-in Python regular expression module, "value" is the string object returned by IB, and "MKTCAP" is the specified fundamental attribute.

      re.search('MKTCAP=(.+?);', value)
    • If a fundamental attribute is not present in the string object, specify None for the value of the corresponding dictionary key.

  4. Use MySQL Connector to iteratively insert the data in each fundamental dictionary into the fundamental table of the MySQL database.

Future Work

The functionality of the database can be extended by incorporating other data sources, accommodating additional contract types, and adding additional performance metrics such as the social sentiment towards a stock.