In this note, I show how to set up a local MongoDB database to house CRSP-COMPUSTAT data. I grew tired of having to use SAS to access these data on the WRDS server: SAS is difficult to program in, and the server is not particularly responsive. With MongoDB, I can query the data from a variety of languages, such as Python and R, and run queries in parallel to speed up computations.
I downloaded monthly CRSP data and annual COMPUSTAT data from WRDS covering 1950 to 2010. To set up the new database system, I installed MongoDB, PyMongo and Python. I am running Ubuntu 11.04 (“Natty Narwhal”), so I just selected the “mongodb” and “python-pymongo” packages in the Synaptic Package Manager. Installing all of the software took less than 5 minutes. I then used the short piece of Python code located here to populate a new database.
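For readers who want to see what such a loader can look like, here is a minimal sketch. It is not the original script. It assumes that the WRDS extracts were exported as CSV files with a header row and that `mongod` is running on the default `localhost:27017`. The database name `crsp_compustat`, the collection name `crsp_monthly`, the file name `crsp_monthly.csv`, and the column names `PERMNO`, `DATE` and `CUSIP` are illustrative assumptions. The helper functions `parse_value`, `csv_to_documents` and `load_in_batches` are hypothetical names. The script converts numeric cells to numbers, treats SAS-style `.` as missing, and keeps identifier columns such as CUSIP as strings so that leading zeros are not lost.

```python
import csv
from typing import Any, Dict, Iterable, Iterator, List


def parse_value(raw: Any) -> Any:
    """Convert a CSV cell to int, float, or None where possible; otherwise keep the string."""
    # Missing cells: empty strings, or "." as SAS exports often write them (assumption).
    if raw is None or raw.strip() in ("", "."):
        return None
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw


def csv_to_documents(lines: Iterable[str], keep_as_str=frozenset()) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row; columns in keep_as_str are left as raw strings."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield {
            key: (value if key in keep_as_str else parse_value(value))
            for key, value in row.items()
        }


def load_in_batches(collection: Any, docs: Iterable[Dict[str, Any]], batch_size: int = 1000) -> int:
    """Insert documents with insert_many in fixed-size batches; return the number inserted."""
    batch: List[Dict[str, Any]] = []
    total = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            collection.insert_many(batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        collection.insert_many(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    # Imported here so the helpers above can be used without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    db = client["crsp_compustat"]  # hypothetical database name
    # Hypothetical CSV export from WRDS; CUSIP kept as a string to preserve leading zeros.
    with open("crsp_monthly.csv", newline="") as fh:
        n = load_in_batches(db["crsp_monthly"], csv_to_documents(fh, keep_as_str={"CUSIP"}))
    print(f"inserted {n} documents")
    # A compound index on firm identifier and date keeps per-firm time-series queries fast.
    db["crsp_monthly"].create_index([("PERMNO", 1), ("DATE", 1)])
```

Batching with `insert_many` avoids one network round trip per row. The annual COMPUSTAT file can be loaded into its own collection in the same way, with its own identifier columns passed in `keep_as_str`.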