How I Set Up My Macs for Python Development
I've been asked about this a few times already this month, so I've decided to write down what I do to set up a new Mac for (mainly) Python development.
First, Install Dropbox and Sync Folders
I keep my development folders on Dropbox, as this gives me some degree of machine independence. More recently, I've also started to keep my dotfiles on Dropbox and use Mackup to handle syncing them between machines. Mackup is a pretty straightforward symbolic-linking system, and it basically does what I would have done if I had rolled my own utility. Be aware, though, that Mackup's default behavior is to put your .ssh folder and its contents on Dropbox. If you don't like this behavior, you must override it manually. I always start by installing Dropbox and syncing my development folders, as I typically have lots of stuff on Dropbox, so it takes a while to sync everything. You may want to consider using the Selective Sync feature to control the syncing process.
Upgrade Your Terminal: Install iTerm2
You're going to need a decent terminal application for your command line work. While the bundled OS X Terminal application has gotten a lot better with recent releases of OS X, it still has a way to go to match iTerm2. Some of the many features of iTerm2 are listed here. So use iTerm2 as your terminal.
Install Xcode: You'll Need at Least the Command Line Tools
You need to install either the full version of Xcode or at least the Command Line Tools. These can actually be installed from the command line by entering the following in your terminal window:
xcode-select --install
You will then be prompted to either install the full Xcode or just the command line developer tools, with the latter being the default. Once you’ve installed one or the other, you can proceed to installing Homebrew.
Homebrew is a package management system that simplifies the installation of the libraries, tools and utilities you typically need for development. Homebrew is actually a Ruby application. To install it, visit the Homebrew homepage, then copy and paste the code listed under Install Homebrew into your terminal. The install snippet changes occasionally; at the time of writing it is
ruby -e "$(curl -fsSL https://raw.github.com/Homebrew/homebrew/go/install)"
But remember to visit the Homebrew homepage to get the current code snippet. Next run
Then install some packages to make sure everything works
brew install ssh-copy-id git mercurial mackup
Set Up Your Shell Environment
I'm a convert to Zsh, or really oh-my-zsh. However, the same would apply if you used bash or any other shell. If you are using the bash shell, then edit your ~/.bash_profile
or, for oh-my-zsh, your ~/.zshrc
Add the following lines
# set the architecture flags
export ARCHFLAGS="-arch x86_64"
# Ensure that local bin is first in path
export PATH=/usr/local/bin:$PATH
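Once a shell has picked up these lines, one way to confirm that /usr/local/bin really does come ahead of the system directories is a small Python check like the sketch below. The function names and the choice of /usr/bin as the "system" directory to compare against are my own assumptions for illustration, not part of the original setup.

```python
import os


def first_index(entries, target):
    """Return the position of target in entries, or None if it is absent."""
    try:
        return entries.index(target)
    except ValueError:
        return None


def local_bin_comes_first(path_value, preferred="/usr/local/bin", system="/usr/bin"):
    """Return True if `preferred` appears in the PATH string before `system`.

    Assumes the Unix-style ':' separator used on OS X.
    """
    entries = [entry.rstrip("/") for entry in path_value.split(":") if entry]
    preferred_pos = first_index(entries, preferred)
    system_pos = first_index(entries, system)
    if preferred_pos is None:
        return False
    return system_pos is None or preferred_pos < system_pos


if __name__ == "__main__":
    # Check the PATH of the shell that launched this script
    print(local_bin_comes_first(os.environ.get("PATH", "")))
```

If this prints False in a fresh terminal, the export line is probably not being read, or something later in the startup files is prepending other directories.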
These changes will take effect the next time you restart the shell. Rather than restarting your shell, you can simply type the following in the terminal:
source ~/.zshrc
or, if you are using bash:
source ~/.bash_profile
Use the Homebrew Python
Apple has always bundled Python along with its OS releases. However, we advise you not to use the bundled Python for development and instead to install Python via Homebrew. We prefer the Homebrew Python for a number of reasons, including:
- Apple makes changes to its bundled Python, and this can give rise to bugs and incompatibilities.
- If you use the bundled Python and upgrade to the latest and greatest version of OS X, you may find that your packages, virtualenvs, etc. all need to be reinstalled and recreated.
- As new versions of Python are released, the bundled Python becomes more and more out of date. Homebrew allows you to upgrade to the most recent versions of Python (2 or 3).
- Homebrew allows you to easily play around with Python 3, as it comes with pip3 for installing packages.
To install Python with Homebrew, enter
brew install python
And optionally install Python 3
brew install python3
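To confirm that the `python` you get in a new terminal is the Homebrew one rather than the bundled one, a quick sketch like the following can help. It assumes Homebrew is installed under /usr/local (the same prefix used in the PATH setting earlier); the helper name `looks_like_homebrew` is hypothetical.

```python
import sys


def looks_like_homebrew(executable, prefix="/usr/local"):
    """Return True if the interpreter path sits under the assumed Homebrew prefix."""
    return executable.startswith(prefix.rstrip("/") + "/")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("version:", ".".join(str(part) for part in sys.version_info[:3]))
    print("under /usr/local:", looks_like_homebrew(sys.executable))
```

Save it to a file and run it with `python`, and again with `python3` if you installed it. If the interpreter path starts with /usr/bin, the shell is still picking up the bundled Python.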
Install VirtualEnv and VirtualEnv Wrapper
A virtual environment is an isolated working copy of Python and its installed packages that allows you to work on a specific project without worrying about affecting other projects.
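As a rough illustration of what "isolated" means in practice, the sketch below asks the running interpreter whether it is inside a virtual environment. It relies on two conventions I am confident of: older virtualenv releases set `sys.real_prefix`, while the standard-library `venv` module and newer virtualenv releases make `sys.base_prefix` differ from `sys.prefix`. The function name is my own.

```python
import sys


def in_virtual_environment():
    """Best-effort check for whether this interpreter runs inside a virtual environment."""
    # Older virtualenv releases record the original prefix in sys.real_prefix
    if hasattr(sys, "real_prefix"):
        return True
    # venv and newer virtualenv releases keep the original prefix in sys.base_prefix
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```

Run outside any environment, it should report the system-wide prefix; run after activating an environment, the prefix should point into that environment's directory.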
The virtualenv package allows you to create virtual environments while the virtualenvwrapper package provides a framework for organizing your virtual environments. Install them using pip
pip install virtualenv
pip install virtualenvwrapper
Next, create a directory to store your virtual environments
mkdir -p ~/virtualenvs
We’ll then edit the ~/.zshrc file (or .bashrc or .bash_profile)
and add the following lines to it
# cache pip-installed packages to avoid re-downloading
export PIP_DOWNLOAD_CACHE=$HOME/.pip/cache
export WORKON_HOME=$HOME/virtualenvs
source /usr/local/bin/virtualenvwrapper.sh
Restart your terminal or type the following in the terminal window
Next, we want to test out your setup by building a C-based Python package. Start by installing some shared libraries via brew
brew install libjpeg lcms libtiff libpng freetype
Next, create a test virtualenv and install the C-based Python package, in this case Pillow, which needs to be built against the shared libraries that you just installed using brew
mkvirtualenv test-env
pip install Pillow
Everything is fine if you get a post-install message similar to the one shown below
PIL SETUP SUMMARY
--------------------------------------------------------------------
version      Pillow 2.4.0
platform     darwin 2.7.6 (default, Apr 9 2014, 11:48:52)
             [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)]
--------------------------------------------------------------------
--- TKINTER support available
--- JPEG support available
*** OPENJPEG (JPEG2000) support not available
--- ZLIB (PNG/ZIP) support available
--- LIBTIFF support available
--- FREETYPE2 support available
*** LITTLECMS2 support not available
*** WEBP support not available
*** WEBPMUX support not available
You can of course use Homebrew to install PostgreSQL. However, I've found that it takes a lot of manual tweaking to get all the procedural languages (PL/pgSQL, PL/Python, PLV8, etc.) as well as extensions like PostGIS, hstore, etc. working. In contrast, Postgres.app pretty much works out of the box and gives you a stable, up-to-date version of PostgreSQL/PostGIS, pretty much configured for any kind of development that you are likely to do. Postgres.app also comes with the full set of PostgreSQL/PostGIS command line tools; however, you need to add the bin directory to your path. As before, depending on your shell of choice, add the following to your .zshrc, .bash_profile or .bashrc
Restart your terminal app or type the following in the terminal window
or, if you use another shell, for example bash
Finally, test out your setup by typing
createdb test-db  # create a new database
psql test-db
dropdb test-db
Install a GUI Version Control Client
While we can use Homebrew to install source control tools (i.e. git and/or mercurial), many new developers, or those coming from backgrounds in design, can sometimes feel a bit intimidated by using git and mercurial from the command line. So to help them get over the hump, we usually recommend they also install a free git and mercurial GUI client such as SourceTree.
Python Data Tools
Python is becoming more and more popular for statistics, data analysis and data science tasks. In particular, you may be interested in developing solutions using the SciPy stack and using tools like IPython, Matplotlib, Pandas, scikit-learn, etc. The usual recommendation is to use a binary distribution such as Anaconda or Enthought Canopy, but as a developer you will want/need to use the source distribution. Start by adding these lines to your .zshrc or .bash_profile.
export CFLAGS="-arch i386 -arch x86_64"
export FFLAGS="-m32 -m64"
export LDFLAGS="-Wall -undefined dynamic_lookup -bundle -arch i386 -arch x86_64"
export CC=gcc
export CXX="g++ -arch i386 -arch x86_64"
This should take care of problems you can sometimes encounter in building some numpy and scipy extensions.
Next, install some of the prerequisite libraries and applications using Homebrew.
brew install gfortran pkg-config zeromq readline
Then install numpy and scipy using pip. We will also install nose so we can run the test suite.
pip install numpy
pip install scipy
pip install nose
Note that I'm installing these packages directly to the system site packages, i.e. in /usr/local/lib/python2.7/site-packages, as opposed to a particular virtualenv, as I often use pydata packages for ad-hoc hacking and experimentation. You can run the test suite by starting Python and running
import scipy
import numpy
numpy.test()
scipy.test()
Almost all the numpy tests should pass but, interestingly, there will be quite a few known failures with scipy. This should not be a problem.
Next install pandas
pip install pandas
Then we can install IPython (including support for the IPython notebook), as well as matplotlib and the ipython-sql extension, which allows you to connect to a database and then issue SQL commands within IPython or the IPython Notebook.
pip install jinja2
pip install ipython pyzmq tornado pygments
pip install matplotlib
pip install psycopg2
pip install ipython-sql
Check that everything is set up properly by launching the IPython notebook with matplotlib integration.
ipython notebook --pylab=inline
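If the notebook fails to start or an import fails inside it, a small script like this sketch can narrow down which package is missing. The package list mirrors the pip commands above; the `installed_version` helper is a name I made up for illustration, and it only catches `ImportError`, so a package that is installed but broken in some other way would still raise.

```python
import importlib

# Import names of the packages installed in the steps above
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "IPython", "psycopg2"]


def installed_version(name):
    """Return the package's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Some modules do not expose __version__; report that rather than failing
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in PACKAGES:
        version = installed_version(name)
        print("%-12s %s" % (name, version if version else "NOT INSTALLED"))
```

Running it with the same interpreter that launches the notebook is the point; a package installed for a different Python will show up as not installed here.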
Here is a sample session from a notebook I created
Start by importing pandas and loading the sql magics
import pandas as pd
%load_ext sql
Connect to a PostgreSQL database with data from the Trinidad and Tobago stock exchange and execute a SQL query
%%sql postgresql://localhost/mass-db
select sd.dateix, s.ticker, sd.close_price, sd.volume
from markets_symboldata sd, markets_symbol s
where s.id=sd.symbol_id
order by sd.dateix desc
limit 10;
10 rows affected.
Execute another query using the connection established above to retrieve the last 100 days of data for GHL. Store the result in a variable called result
result = %sql select sd.dateix, s.ticker, sd.close_price, sd.volume from markets_symboldata sd, markets_symbol s where s.id=sd.symbol_id and s.ticker='GHL' order by sd.dateix desc limit 100
100 rows affected.
Because pandas is available we can use the DataFrame method to create a DataFrame from the resultset
df = result.DataFrame()
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 4 columns):
dateix         100 non-null object
ticker         100 non-null object
close_price    100 non-null float64
volume         100 non-null int64
dtypes: float64(1), int64(1), object(2)
Convert the df to a time series by setting the index to the dateix
Statistical summary of the closing price and volume
8 rows × 2 columns
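The notebook cells for these two steps are not reproduced here, so the following is only a sketch of how they might look with pandas. The three rows are made-up placeholder values that reuse the column names from the query; they are not real market data, and the variable names `ts` and `summary` are my own.

```python
import pandas as pd

# Hypothetical placeholder rows with the same columns as the query result;
# the values are invented for illustration only.
df = pd.DataFrame({
    "dateix": ["2014-04-01", "2014-04-02", "2014-04-03"],
    "ticker": ["GHL", "GHL", "GHL"],
    "close_price": [10.0, 10.5, 10.25],
    "volume": [1000, 1500, 1200],
})

# Convert the date column to datetimes and use it as the index
df["dateix"] = pd.to_datetime(df["dateix"])
ts = df.set_index("dateix").sort_index()

# describe() summarises each numeric column: count, mean, std, min, quartiles, max
summary = ts[["close_price", "volume"]].describe()
print(summary)
```

The summary has eight rows (one per statistic) and one column for each of the two numeric fields, which matches the shape reported above.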
Use matplotlib to plot the closing price
<matplotlib.axes.AxesSubplot at 0x11650bcd0>