VIM shortcuts

Basic navigation

2h - 2 steps LEFT
H - home (top screen)
10j - 10 steps DOWN
J - join this line to the line below
{ - UP a paragraph
} - DOWN a paragraph
o O - open line BELOW / open line ABOVE
b - beginning of current word
w - beginning of next word
e - end of current word
$ - end of line 
^ - start of line
H - TOP of screen
L - BOTTOM of screen
v - select text
V - select line
v3j - select 3 lines DOWN

Modifying text

i - insert
a - append
dh - DELETE left
dl - DELETE right
dd - DELETE line
dw - delete word
D - DELETE to end of line
x - DEL
X - BACKSPACE
<< - INDENT line LEFT (use visual for block)
>> - INDENT line RIGHT  (use visual for block)
p - paste after
P - paste before
3p - paste 3 copies
yy - yank whole line
yw - yank word
y$ - yank to end of line
4yy - yank four lines
Y - yank whole line
J - join current line to next

Save Exit and help

K - HELP (bash, req select word)
ZZ - save and quit
ZQ - quit without saving
z - save


[ - enter command mode
, - leader key
v - visual
: - enter command mode
i - insert mode
a - append mode
:tabnew - open new tab
,n - left tab
,m - right tab
/ - search # press enter, then n to repeat, N repeat up 
:%s/old/new/g - change globally (whole file); :s/old/new/g - current line only


F2 - paste code blocks from external (press again to end)
n - remove highlight last search
,e - quit
,E - qa! quit all windows
ctl j - move to lower window
ctl k - move to upper window
ctl h - move to left window etc
,s - sort selected
f - fold code block (again to undo)
F - fold all (again to undo)
ctl space - auto complete
shift p - filesearch


:Gstatus - status / stage by "-"
:Gcommit - message 'wq' save

Cool combinations

yH, dH - copy/delete to top of screen
d3+ - delete line + 3 lines below
y3+ - copy line + 3 lines below
d/test - delete from the cursor up to the next match of 'test'

Insert breakpoint and ipdb PDB bit…


Setting up VIM as a python IDE

If you are looking for a lightweight but feature-rich editor you can’t go past VIM! Sure, those keyboard shortcuts are difficult to remember and it’s frustratingly hard to stop reaching for that bloody mouse (the mouse will actually work in this set-up BTW). However, what I have found is that the best way to learn VIM (I don’t think you ever really learn vim) is to make it useful enough that you want to start using it; this helps build up the muscle memory required to become efficient [enough]. The ultimate payback is that with practice you can become really, really efficient relative to a user stuck with a mouse and a complex GUI. The mouse is so 1980s anyway 😛


So this post is about making VIM useful! The steps are based around Ubuntu/Debian Linux but should be adaptable for OSX and other *NIX distros. Windows users: like yourselves, I have no idea (-;

Of course nothing I ever do is original and this especially goes for this post which borrows heavily from an excellent (but old) PyCon Asia talk by Martin Brochhaus. The .vimrc and much of the instructions are taken from the talk so feel free to watch it on youtube. I’ve also tried to update some of the info and stick to the minimum to get you coding away ASAP.

Section 1: Basic VIM install

These steps will clone the current VIM source and build it with additional features enabled. If you already have a copy installed (ie vim starts when you type ‘vim’) I would first run ‘apt-get remove vim’ (or your equivalent). First, dependencies and some tools.

sudo apt-get update
sudo apt-get install mercurial curl python-pip flake8
sudo apt-get build-dep vim

#install python packages
sudo pip install ipdb
sudo pip install jedi
sudo pip install flake8

As we are going to use a bin folder in our home directory we need to put this location into our path so that our new version of vim will run from the command line.

#add a home bin dir to your path
nano ~/.bashrc

#add the following to the end of the file
if [ -d "$HOME/bin" ]; then
    PATH="$HOME/bin:$PATH"
fi

#save and exit

Now we make the ~/bin dir as well as an ~/opt for the vim install. Eventually we will use a symlink from ~/bin (now in our path) to the ~/opt folder in our home directory. The -p flag says ‘don’t complain if this dir is already there’. The final command reloads your bashrc without you having to log out. Cool.

mkdir -p ~/bin
mkdir -p ~/opt
source ~/.bashrc

Now to install VIM.

cd ~/Desktop
hg clone vim
cd vim/src
./configure --enable-pythoninterp --with-features=huge --prefix=$HOME/opt/vim
make install
cd ~/bin
ln -s ~/opt/vim/bin/vim

#the following should return /home/YOUR_USERNAME/bin/vim
which vim


Section 2: The vimrc settings file

Now we need a “.vimrc” file. This file will contain our customisations (the dot says “I’m a hidden file”). You can get mine from here (download to your desktop or copy the contents into a blank txt file). It is worth looking at this file in a text editor as it contains some information on the key short-cuts and the plugins we are going to use (BTW in .vimrc files a single double-quote character " starts a comment (like # in bash), while a doubled one "" marks a bit of code that can be uncommented to enable something). The next bit of code simply moves the file to your home dir and then gets a custom colour scheme (mostly just to show how to do it).

mv ~/Desktop/vimrc.txt ~/.vimrc
mkdir -p ~/.vim/colors
cd ~/.vim/colors
wget -O wombat256mod.vim

Section 3: Getting set-up for the plugins with pathogen

This next part is important. To manage plugins we will use a bit of kit called pathogen. The plugins can then be installed (mostly by a git clone) right into the “~/.vim/bundle/” (this will result in a folder structure like this: .vim/bundle/plugin-name) and pathogen will handle everything for us – awesome! Most plugin developers set-up their folders to work nicely with pathogen to make life easy.

mkdir -p ~/.vim/autoload ~/.vim/bundle
curl -so ~/.vim/autoload/pathogen.vim

Section 4: Installing plugins

The first plugin we will use is called powerline and it adds features and makes a better looking status line.

#install powerline plugin
cd ~/.vim/bundle
git clone git://

The next plugin allows code folding to make it easier to look through long blocks of code. It’s simple to use: just type f to collapse a section of code or F to collapse it all. Type again to reverse the folding – sweet as.

# install folding plugin; use f to fold a block or F to fold all
mkdir -p ~/.vim/ftplugin
wget -O ~/.vim/ftplugin/python_editing.vim

The next one installs ctrlp which allows nice fuzzy file searches from directly inside vim.

#install ctrlp
cd ~/.vim/bundle
git clone

Install the jedi plugin which allows awesome auto-completion of commands and imports using [ctl][space].

#install jedi plugin
git clone --recursive

Install syntastic, another awesome plugin that does syntax checking and will check your code for compliance with PEP8.

#install syntastic
git clone

Forget tabbing into the terminal to stage/commit/merge etc just install git support with fugitive.

# install git support
cd ~/.vim/bundle
git clone git://
vim -u NONE -c "helptags vim-fugitive/doc" -c q

So that is it! Your editor should look something like this:


Well that’s the install. Next post will be the shortcuts I use all the time and some that I want to learn as a bookmark for myself. Otherwise checkout the links to the plugins for detailed descriptions of what they do. But now you should be up and running with a pretty nice looking IDE!

Make an android app in minutes with the App Inventor 2


The App Inventor 2 (AI2) is a really cool bit of software from Google and [now] MIT. It contains two GUI tools to make creating Android apps a “relatively” easy task. The first tool is a designer that helps create your screen layouts, pretty much like the layout designers that come with the Android SDK. The really cool and unique part is the “blocks” designer, which helps you create code by moving object blocks around (rather than typing Java). These GUIs dramatically speed up development, whilst also providing an excellent framework to teach people how to code in a fun and accessible way (ie no language syntax).

The following is a quick run-through of a very simple conference app that I’m using for a demo at our teaching symposium. The goal is to do this demo in 10 min, so really this is bare-bones stuff! If you want more I highly recommend the book App Inventor, edition 2, which covers AI2 in much more detail through a series of fun (and useful) projects.

It’s worth pointing out now that AI2 has some additional cool features. Firstly, it’s all in the cloud so you don’t need any software except Chrome or Firefox (ha ha, IE is NOT supported). Secondly, if you are on WIFI, by installing a simple application on your phone you can test the app instantly by scanning a QR code from the computer screen. If you have ever created an app with Eclipse and used the emulator for testing you will be delighted by how much simpler and quicker this process is; it’s super cool, really!

The conference app is just a listview of names and session times that are clickable. If an item is selected a photo and brief bio pops up for that session. No fancy formatting at this stage, anyway.

1. Go to the AI2 page and login with your google account

The first screen will recommend you install some additional applications to allow easy testing on your phone. By far the simplest is to choose option one, as long as you are sharing a wifi network with your computer and phone. For the rest of this guide I’ll assume this is how you will test your app.


2. On the menu go to “Projects” – “Start new project” and call it something informative

After a few seconds the designer window will appear. The designer page allows you to lay out all of the elements you require for your app. The left hand “palette” contains elements for layout as well as user interaction widgets. These elements can be dragged onto the middle viewer pane or, if they are an invisible element (a database for example), to the bottom of the viewer screen. The “components” pane allows you to customise each element and finally the properties pane lets you modify size, layout, and text aspects.


3. Re-title

Just to see how simple things can get, let’s give the app a more informative title. All we need to do is change the “title text” in the property pane.

4. Add the list-view

Ok the meat and potatoes of the app, the listview will be populated with speaker names and times that are clickable. Just drag the listview icon from the palette window to the viewer.


5. Adjust the placement and re-sizing to suit your tastes

We can now adjust the listview properties, such as the width and height so that this element fills the available screen space.

6. Go to the blocks page and start creating the logic for your app

Now the nuts and bolts (and the cool part). We have to populate our listview with the speaker names when the app is opened, as well as specify some behaviour when the user clicks on the element in the listview. For this work we will need to use the code blocks on the left hand side of the “blocks” screen.


7. Initialise by populating the list-view with strings once the screen is loaded

Clicking on the “Screen1” icon will bring up all the blocks associated with this choice. We want to drag the “when screen initializes” block to the work area. This “block” of code will execute when the app opens. Notice the gap in the middle, this is where we place the “do” part. In this case what we want to “do” is populate the listview with text for the speaker names.


To do this we click on the listview block and select the “listview elements” block; notice how this block will fit nicely inside the previous block. Notice also the open jigsaw-like part at the end of this new block; this is where we add our new list that will hold the speaker string elements. I hope you notice that this “feels” like coding with objects, minus all the typing.


Clicking on the list block allows us to make a new list, we click this into our listview and now we populate the list with items that will be displayed.


So it should look like this (below). The little blue gear lets us increase the number of spots for elements to populate our list. For this app we will have four speakers for this section, so just drag the item block into the right “list block” twice to create two more spaces.


To actually populate the list we need some text. So we grab the empty text fields and drag them to the open slots on the list.


By typing between the quotes we can add our custom text. By right clicking on this first text element we can duplicate it for the remaining 3 spaces (or we could just drag 3 more empty elements across).


We modify each element to be a speaker name. Ok, now when the app opens the listview will be populated by string elements found in the list. Next we need to specify some behaviour when the user clicks on the speakers name.


At this stage our basic app should be “working”, as in the listview should display, although nothing will happen when we click on each item. Now this is where the simple testing functions of AI2 come to the fore. Just click on the build and then “provide QR code for apk”.


Once the QR code pops up just open up the App Inventor 2 companion app and scan the code, it’s bloody brilliant!


Now we create an action once an element (speaker) in the list is selected by the user. As this is a listview action we click back on the listview icon and grab the “afterPicking” block; that’s logical, don’t you think?

One thing you might notice about these blocks, if you have coded before, is that they are very much like a normal object that we would specify using code syntax; it’s just GUI based.

Our after picking behaviour will be opening a new screen that contains a photo of the speaker and a brief bio. So we need to set up a new screen to contain this new information. I’m being really lazy here and I’m just going to hard-wire the page so that only one of the listview items actually works as expected (mine of course). However, in real life you would set up some logic that would ask for the index of the item that was clicked, and use this to send the user to the correct screen.

So we have set up the method to open the new screen (above); now we need the screen. We’ll create the “DaveScreen” (as specified by another purple text element block) that will contain my mock bio and a great photo. To do this just click the “add screen” button at the top of the screen and rename it.

Back in the designer we can populate this new screen with a photo and some text. Using a vertical alignment is simple: just drag it onto the window and specify its attributes in the right hand panel.

Now drag an image icon onto the panel; then, by clicking on “image1” in the components pane, we can upload an image and resize it using the properties. You really can’t get much easier than that, can you!

Wow, what a good looking bloke! Finally, we create a label and populate it with text in the blocks screen. We could do this in the properties section of the designer page, but since I’m feeling guilty for hard coding the list view, I thought I would use the blocks to set the text.


That’s it. The app should provide a list of speakers, and when you click on my name (any name actually due to the hard wiring) the new page should open with my bio and a photo.

You can download the files associated with the app from here. These include the APKs that can be installed on any Android phone as well as the *.aia files that can be imported into AI2 if you want to play with the code. I’ve included a slightly improved (although still broken) version that replaces the hard-wired listview actions with a call to specific speaker pages (that don’t exist but could be easily made based on the above information).
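
For anyone who thinks better in text than in blocks, the "use the clicked index to pick the screen" logic mentioned above might look roughly like this in Python. This is only a sketch of the idea, not App Inventor code: the speaker names and every screen name other than “DaveScreen” are made up, and I'm assuming the list selection index starts at 1.

```python
# Hypothetical speaker list and screen names, for illustration only
# (only 'DaveScreen' comes from the app described above)
speakers = ['Dave', 'Speaker two', 'Speaker three', 'Speaker four']
screens = ['DaveScreen', 'SpeakerTwoScreen',
           'SpeakerThreeScreen', 'SpeakerFourScreen']


def screen_for_selection(selection_index):
    '''Return the screen name for a 1-based list selection index.'''
    if not 1 <= selection_index <= len(screens):
        raise IndexError('no speaker at position %d' % selection_index)
    return screens[selection_index - 1]


# the first item in the list opens the first screen
print(screen_for_selection(1))
```

In the blocks editor the same idea would be a lookup in a second list of screen names using the selection index, instead of the hard-wired text block.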

New Zealanders and their sheep – part 2

Ok, based on the graphs in the last post NZ is slowly being cow-a-fyed, so what’s driving this trend? Well, google will tell you that it’s …

WARNING: This data is dodgy, but I’m really just using it to demonstrate how cool pandas is. So I found some information on milk and lamb meat prices; we’ll load them up as dataframes and work out the percent change since 1994 like we did before. We’ll try out the datetime functionality of pandas, which is really quite nice. But first we just import our table from the last post and make the year the index so we can easily merge the new data.

import pandas as pd
import matplotlib.pyplot as plt
per_decline = pd.DataFrame(pd.read_csv('percent_decline.csv'))
cols = per_decline.columns.values
cols[0] = 'Year'
per_decline.columns = cols
per_decline.index = per_decline['Year']
per_decline = per_decline.iloc[:,1:] #all rows, skip first column (date is now the index)

	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488
2004	-11.895128	 34.207998	-20.609926	 42.707754	 -8.072078	 13.376472
2005	-12.366101	 32.506699	-19.379727	 38.499840	-19.230733	-100.000000
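
As an aside, the rename-then-set-index dance above can be folded into the read itself. This is a minimal sketch with made-up numbers standing in for percent_decline.csv (I'm assuming the first column of the file has an empty header, which is what pandas writes when you save a frame with its index):

```python
import io
import pandas as pd

# Made-up stand-in for percent_decline.csv: the first column has no header,
# which is how pandas writes an index when saving with to_csv()
csv_text = """,Total sheep,Total dairy cattle
1994,0.0,0.0
2002,-20.0,34.4
2003,-20.0,32.9
"""

# index_col=0 turns the first column into the index while reading
per_decline_demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
per_decline_demo.index.name = 'Year'

print(per_decline_demo.index.tolist())
print(per_decline_demo.columns.tolist())
```

Either way you end up with the years as the index and only the stock columns left in the frame.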

Now we are going to create a table from the dodgy lamb price data, this table is in a slightly different format so we will have to use the groupby method to wrangle it into the shape we need.

lamb_data = pd.DataFrame(pd.read_excel('lamb_usd.xlsx',sheet_name='lamb'))
	Month	Price	Change
0	 Apr 1994	 130.00	 -
1	 May 1994	 126.59	 -2.62 %
2	 Jun 1994	 127.03	 0.35 %
3	 Jul 1994	 126.11	 -0.72 %
4	 Aug 1994	 119.62	 -5.15 %

Now to use datetime to make an index based on the month data.

lamb_data.index = pd.to_datetime(lamb_data['Month'])
lamb_data=lamb_data.iloc[:,1:2] #just grab the price
1994-04-02	 130.00
1994-05-02	 126.59
1994-06-02	 127.03
1994-07-02	 126.11
1994-08-02	 119.62
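
If you want to try the date conversion without the spreadsheet, here is a small sketch using a few made-up rows in the same "Mon YYYY" shape. The explicit format string is my own addition (the code above lets pandas guess); with it, the day that gets filled in is the 1st, so the exact dates may differ from the listing above.

```python
import pandas as pd

# Made-up rows in the same "Mon YYYY" shape as the lamb price sheet
demo = pd.DataFrame({'Month': ['Apr 1994', 'May 1994', 'Jun 1994'],
                     'Price': [130.00, 126.59, 127.03]})

# An explicit format avoids guessing; with no day given, pandas uses the 1st
demo.index = pd.to_datetime(demo['Month'].str.strip(), format='%b %Y')
prices = demo[['Price']]

# the datetime index exposes the parts of each date directly
print(prices.index.year.tolist())
print(prices.index.month.tolist())
```

The .year and .month attributes on the index are what make the grouping in the next step so painless.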

Pandas did a good job of converting the date format into a datetime index. As you’ll see in a second this datetime object has some extra functionality that makes dealing with dates a breeze. Although this new data has the date and price information we need, it’s divided into monthly amounts. As you can see by the commented out code, initially I made a mistake and summed these values, but really we want the mean to get the average yearly price. I left the mistake code there as it shows how easy it would have been to get the sum using groupby.

#wrong! lamb_prices = lamb_data.groupby(lamb_data.index.year)['Price'].sum()
lamb_prices = lamb_data.groupby(lamb_data.index.year)['Price'].mean()
lamb_prices = pd.DataFrame(lamb_prices[:-1]) #get rid of 2014
1994	 124.010000
1995	 113.242500
1996	 145.461667
1997	 150.282500
1998	 116.013333
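
Here is the same grouping trick on a tiny made-up frame, so the difference between the sum (my mistake) and the mean is easy to check by eye. The numbers are invented purely for illustration.

```python
import pandas as pd

# Made-up monthly prices spanning two years
idx = pd.to_datetime(['1994-04-01', '1994-05-01', '1995-01-01', '1995-02-01'])
monthly = pd.DataFrame({'Price': [130.0, 126.0, 110.0, 114.0]}, index=idx)

# Group rows by the year part of the index, then aggregate within each year
yearly_mean = monthly.groupby(monthly.index.year)['Price'].mean()
yearly_sum = monthly.groupby(monthly.index.year)['Price'].sum()

print(yearly_mean.to_dict())
print(yearly_sum.to_dict())
```

The sum grows with the number of months recorded in a year, which is why it is the wrong tool when some years are incomplete.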

We pass the year index to groupby and get it to do its magic on the price column (our only column in this case, but you get the idea), we then just call the mean method to return the mean price per year. The datetime object made specifying the year easy. Now we are going to write a quick function to calculate the percent change since 1994.

def percent(start,data):
    '''calculate percent change relative to first column (1994), better than previous attempt )-:'''
    ans = 100*((start - data)/start)
    return 0-ans

lamb_change = percent(lamb_prices.Price[1994],lamb_prices)
1994	 0.000000
1995	 -8.682768
1996	 17.298336
1997	 21.185791
1998	 -6.448405

Great! Now just add that column to our original dataframe. Notice how only the intersection of the dates is used, very handy (ie it drops 1995-2001 from the lamb price data as these dates are not in our stock number table)!

per_decline['Lambprice'] = lamb_change
	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses	Lambprice
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093	 17.768056
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488	 28.869984
per_decline.index=per_decline.index.astype(int) #lamb2
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
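
The index alignment mentioned above is worth seeing in isolation. This sketch uses made-up numbers and a plain Series for the prices (an assumption on my part, to keep it short): the stock table only has three of the years, and the assignment quietly keeps just those.

```python
import pandas as pd

# Made-up stock table that only has some of the years
stock = pd.DataFrame({'Total sheep': [0.0, -20.0, -20.1]},
                     index=[1994, 2002, 2003])

# Made-up price series covering every year from 1994 to 2003
price_change = pd.Series(range(10), index=range(1994, 2004), dtype=float)

# Assignment aligns on the index: years missing from `stock` are dropped
stock['Lambprice'] = price_change

print(stock['Lambprice'].tolist())
```

Going the other way, a year that is in the table but missing from the series would end up as NaN rather than raising an error.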



The next series of code and graphs adds in milk and lamb prices to try and see why farmers are moving from ovines to bovines!

milk_data = pd.DataFrame(pd.read_excel('milk_prices_usd.xlsx',sheet_name='milk'))
	year	thousand head	pounds	mill lbs	price_cwt
1989	 1989	 10046	 14323	 143893	 13.56
1990	 1990	 9993	 14782	 147721	 13.68
1991	 1991	 9826	 15031	 147697	 12.24
1992	 1992	 9688	 15570	 150847	 13.09
1993	 1993	 9581	 15722	 150636	 12.80
#get rid of the info we don't need
milk_data = pd.DataFrame(milk_data.ix[5:,:])
milk_change = percent(milk_data.price_cwt[1994],milk_data)
per_decline['milk_price'] = milk_change
	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses	Lambprice	milk_price
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093	 17.768056	 -6.630686
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488	 28.869984	 -3.469545
2004	-11.895128	 34.207998	-20.609926	 42.707754	 -8.072078	 13.376472	 33.670672	 23.747109
2005	-12.366101	 32.506699	-19.379727	 38.499840	-19.230733	-100.000000	 29.762385	 16.653816

These graphs are a little busy, let’s just concentrate on the important stuff.

animals=['Total dairy cattle','Total sheep','Lambprice','milk_price']





Pandas data analysis – New Zealanders and their sheep!


As an Aussie I cop a little bit of flak living in New Zealand. It helps that since I’m from the South of Australia I follow Aussie Rules footy, not this rugby shenanigans, so I don’t bother too much with the constant focus on how much better the All Blacks are relative to our poor Wallabies (Australia has won as many world cups as New Zealand – Take that!).

Kiwis have bloody good gum boots; they have to do more than keep the mud off! – inside joke (-;

That said, as a bit of fun I thought I would do a post on the Pandas python module using data from StatsNZ. It has to be about sheep, right! The actual spreadsheet I downloaded from StatsNZ and used in this is here.

I’ll be using pandas, ipython, and matplotlib to create some graphs to show the decline of sheep in NZ. In the next post I’ll try and work out why (yes I know you could find this out in 3 seconds with google, but that’s no fun).

First we will read the file using pandas and make a dataframe. Pandas can import all kinds of file formats; with Excel you need to include a sheet name (that little tab at the bottom of the Excel sheet).

import pandas as pd
import matplotlib.pyplot as plt
#sheetname required for xlsx
data_file = pd.read_excel('livestock1.xlsx',sheet_name='livestock')

#make a dataframe
data = pd.DataFrame(data_file)
                                                 1994          2002      2003  \
Total beef cattle                             5047848       4491281   4626617
Calves born alive to beef heifers/cows        1262522       1083485   1079334
Total dairy cattle                            3839184       5161589   5101603
Calves born alive to dairy heifers/cows       2455975       3225238   3115897
Total sheep                              4.946605e+07  3.957184e+07  39552113
                                             2010      2011      2012
Total beef cattle                         3948520   3846414   3734412
Calves born alive to beef heifers/cows     901258    901375    827749
Total dairy cattle                        5915452   6174503   6445681
Calves born alive to dairy heifers/cows   3640914   3884257   3879543
Total sheep                              32562612  31132329  31262715

So we made a dataframe. The head method acts like bash head in that it shows the start of the frame rather than the whole thing. Currently the years are columns and the stock types are rows; let’s flip the table, which is super easy!

#we really want the dates as the index
data = data.T
     Total beef cattle Calves born alive to beef heifers/cows  \
1994           5047848                                1262522
2002           4491281                                1083485
2003           4626617                                1079334
2004           4447400                                1013893
2005           4423626                                1018730

Now that we have the years as rows, actually an index, it will be much easier to do our plotting. But the column names are overly informative, as in long, so let’s shorten them.

#the column names are pretty long, lets fix that now
Out[13]: Index([u'Total beef cattle', u'Calves born alive to beef heifers/cows', u'Total dairy cattle', u'Calves born alive to dairy heifers/cows', u'Total sheep', u'Total lambs marked and/or tailed', u'Total deer', u'Fawns weaned', u'Total pigs', u'Piglets weaned', u'Total horses'], dtype=object)
sub_data = data.loc[:,['Total beef cattle','Total dairy cattle','Total sheep','Total deer',
'Total pigs','Total horses']]
sub_data = sub_data.replace('..',0)#replace .. with 0
sub_data = sub_data.astype('float')
      Total beef cattle  Total dairy cattle  Total sheep  Total deer  \
1994            5047848             3839184     49466054     1231109
2002            4491281             5161589     39571837     1647938
2003            4626617             5101603     39552113     1689444
2004            4447400             5152492     39271137     1756888
2005            4423626             5087176     39879668     1705084
2006            4439136             5169557     40081594     1586918
2007            4393617             5260850     38460477     1396023
2008            4136872             5578440     34087864     1223324
2009            4100718             5860776     32383589     1145858
2010            3948520             5915452     32562612     1122695
2011            3846414             6174503     31132329     1088533
2012            3734412             6445681     31262715     1060694

Great! Now to make the plot easier to look at let’s divide the dataframe by 1 million.

#now divide by a million
sub_data = sub_data.divide(1000000)
#first plot everything
plt.ylabel('Total stock (millions)')
plt.title('NZ farm stock')

Yes that is correct, back in the 90s there were 50 million sheep in NZ, not bad for a country with a population of ~3 million people. Baa. But their numbers have been in serious decline since then, replaced by their bigger brothers the cows.
Poor sheepies!

Let’s face it, NZ is all cows and sheep, so let’s just look at that data.

#lets just plot cows and sheep, that being the first 3 columns
cow_sheep = sub_data.iloc[:,[0,1,2]]
cow_sheep.plot(legend=True,title="Numbers of cattle and sheep in NZ")
plt.ylabel('Total stock (millions)')

The data has meat cows and milk cows as separate columns, but it’s easy to combine them.

#a milk cow and meat cow are still cattle
cow_sheep['Total cows'] = cow_sheep.iloc[:,0] +  cow_sheep.iloc[:,1]
cow_sheep.iloc[:,2:].plot(legend=True,title="Total cattle and sheep in NZ")
plt.ylabel('Total stock (millions)')

So the country is cowafying, mostly dairy replacing beef cattle. Although the increase looks slight, basically it’s 25% up! That’s a lot of grass, cow shit, and methane!

Now let’s look at the change in the numbers of each stock since 1994 (the start of our data). We do this by passing all the data as well as all the data minus 1994 to a function that handles the calculation. Pandas handles this all in the back end and passes the data in the frame through the function, nice.

def percent(dataframe,d1):
    '''calculate percent change relative to first row (1994)'''
    a = 100*((dataframe.iloc[0,:]-d1)/dataframe.iloc[0,:])
    return 0-a

#pass the entire data frame to this function
percent_data = percent(sub_data,sub_data.iloc[0:,:])

      Total beef cattle  Total dairy cattle  Total sheep  Total deer  \
1994           0.000000            0.000000     0.000000    0.000000   
2002         -11.025827           34.444950   -20.002034   33.858009   
2003          -8.344764           32.882482   -20.041908   37.229441   
2004         -11.895128           34.207998   -20.609926   42.707754   
2005         -12.366101           32.506699   -19.379727   38.499840   

      Total pigs  Total horses  
1994    0.000000      0.000000  
2002  -19.100637     11.811093  
2003  -10.766476     18.504488  
2004   -8.072078     13.376472  
2005  -19.230733   -100.000000  
#the years are plotted as floats, its easy to convert them!
#figure 4
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
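
What pandas is doing "in the back end" here is lining up the first row with the columns of every other row. A minimal sketch with made-up numbers (the function name percent_change and the data are mine, not from the post) shows the same calculation written the direct way around:

```python
import pandas as pd

# Made-up stock counts (millions), years as the index
stock = pd.DataFrame({'Total sheep': [50.0, 40.0, 30.0],
                      'Total dairy cattle': [4.0, 5.0, 6.0]},
                     index=[1994, 2002, 2012])


def percent_change(frame):
    '''percent change of every row relative to the first row'''
    base = frame.iloc[0]   # first row as a Series, labelled by column name
    # frame - base subtracts the base value of each column from every row
    return 100 * (frame - base) / base


change = percent_change(stock)
print(change)
```

The first row always comes out as zero, which is the sanity check used in the listing above.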

I really want the graph the other way around, so let’s re-index. Also, let’s get rid of the nags, they look funny because the frame was missing some data. More proof that Phar Lap was an Australian horse.

horseless = sub_data.iloc[:,:-1]
horseless_per = percent(horseless,horseless.iloc[0:,:])
#flip the axis
horseless_per = horseless_per.reindex( index=data.index[ ::-1 ] )
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
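
The reason the horses looked funny is that the missing '..' entries were turned into 0 earlier, and a drop to 0 is a -100% change. Here is a small sketch with made-up counts showing that, plus one possible alternative (my suggestion, not what the post does): convert with pd.to_numeric so the gaps become NaN and simply don't get plotted.

```python
import numpy as np
import pandas as pd

# Made-up counts where '..' marks a missing value, as in the StatsNZ sheet
raw = pd.Series([70, 80, '..'], index=[1994, 2004, 2005], name='Total horses')

# errors='coerce' turns anything that is not a number into NaN
as_nan = pd.to_numeric(raw, errors='coerce')
# filling the gaps with 0 gives the same values as replace('..', 0)
as_zero = as_nan.fillna(0)


def percent_change(series):
    '''percent change relative to the first value'''
    base = series.iloc[0]
    return 100 * (series - base) / base


zero_change = percent_change(as_zero)
nan_change = percent_change(as_nan)
print(zero_change.tolist())
print(nan_change.tolist())
```

With the zero fill the last year shows up as a 100% collapse; with NaN it is just a gap, so the column could stay in the frame without skewing the graph.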


Finally, it’s silly having the 1994 data as it’s all zero, but it was a nice sanity check to make sure the % function was working correctly. But let’s get rid of it now by just plotting part of the frame.

plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')

The ‘barh’ is just bar graph horizontal.
So it looks like there was a venison craze in the early 00s, but mostly just more and more dairy.

#save the file

Ok, so although there are still a LOT of sheep in NZ, it really is the country of the cow now. What might be driving that? Let’s look at some commodity prices in the next post!

Arch linux on an eeepc 900

This is a summary of my attempt to install the Arch linux distro on my eeepc 900. I had some problems with the login manager and the desktop, but apart from that the base install was not that difficult if you follow the beginners’ guide. Arch is a hard core distro in that you have to set everything up yourself, but by doing that you get a better understanding of what is going on under the hood.

My little old eeepc 900 running on Arch

Download the Arch iso. I used an external CD-drive to install, but alternatively a USB thumb drive could be used. To make it easier, initially I used the gparted GUI to set / to the 4gb SSD and /home to the 15gb SSD. I formatted both as ext4, which is not recommended because it will destroy your SSD, but apparently ext4 is SSD aware; I just need to work out how to turn journaling off.

Boot to the CD, choose “Boot Arch Linux”. This first section is almost word for word from the beginners’ guide, which has a lot more detail; I have skipped the parts that are not relevant to the eeepc.

There was no wireless at startup. On the eeepc the wireless interface is identified as “wlan0”; once you have checked it is there, bring it up and scan

# ip link set wlan0 up
# iwlist wlan0 scan

Look for the ESSID:”yourwirelessnetworkname”

Back up the original file, then create a new one set up for WPA encryption.
# mv /etc/wpa_supplicant/wpa_supplicant.conf /etc/wpa_supplicant/wpa_supplicant.conf.original
# wpa_passphrase linksys "my_secret_passkey" > /etc/wpa_supplicant/wpa_supplicant.conf
#wpa_supplicant -B -Dwext -i wlan0 -c /etc/wpa_supplicant/wpa_supplicant.conf

Wait a few seconds for it to associate, then check the connection, request an IP address, and ping google to check
#iwconfig wlan0
# dhcpcd wlan0
#ping -c 3

Now the hard drive
# fdisk -l

If you set up your drives like I did (using gparted) the output should show sda1, which is your base, and sdb1, which is home. Since we used gparted I can skip all the ugly manual partitioning, so we just need to format them as ext4 (bad idea, see below).

# mkfs.ext4 /dev/sda1
# mkfs.ext4 /dev/sdb1

Remember that this is a bad idea, using ext2 would be safer, but I figure I can work out how to turn off journaling. Save your SSD, don’t make a swap partition!

Mount the base and the home, install the base and devel packages, then set up fstab
# mount /dev/sda1 /mnt
# mkdir /mnt/home && mount /dev/sdb1 /mnt/home
# pacstrap /mnt base base-devel
# genfstab -p /mnt >> /mnt/etc/fstab
# nano /mnt/etc/fstab

Modify the mount flags for the drives to avoid unnecessary writes, more information at
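As a minimal sketch of one such change, the snippet below swaps the `relatime` option for `noatime` on the ext4 entries, so that simply reading a file no longer causes an access-time write. It assumes genfstab wrote `relatime` into the options column. The `FSTAB` variable and the two demo entries are made up for illustration, and by default the snippet works on a scratch copy rather than the real file.

```shell
# Work on a scratch copy by default so this can be tried without root.
# Assumption: set FSTAB=/mnt/etc/fstab (as root) to edit the real file.
FSTAB="${FSTAB:-$(mktemp)}"

# Demo entries, written only when the target file is empty.
if [ ! -s "$FSTAB" ]; then
  printf '%s\n' \
    '/dev/sda1  /      ext4  rw,relatime  0 1' \
    '/dev/sdb1  /home  ext4  rw,relatime  0 2' > "$FSTAB"
fi

# On ext4 lines only, swap relatime for noatime so that reading a file
# no longer triggers an access-time write.
sed -i '/[[:space:]]ext4[[:space:]]/ s/relatime/noatime/' "$FSTAB"

# Show the result so it can be checked before rebooting.
cat "$FSTAB"
```

Check the printed result by eye before pointing it at the real fstab; a broken fstab can stop the system from booting.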

Now install the graphics environment

# pacman -S xorg-server xorg-xinit xorg-utils xorg-server-utils xterm

Now let’s enter the new system, start configuration by creating a hostname, or a name for your computer
#arch-chroot /mnt
#nano /etc/hostname

And add “daves-eeepc” or something like that, no quotes! Save [ctl][o] and exit [ctl][x], now modify /etc/hosts and add the new name
#nano /etc/hosts
add the name as below
127.0.0.1   localhost.localdomain   localhost daves-eeepc
::1         localhost.localdomain   localhost daves-eeepc

Now configure the time, first find your region and local time zone
# ls /usr/share/zoneinfo/
Mine is America, and then for the subregion
# ls /usr/share/zoneinfo/America
I’m New_York
Now create a symlink to the local time file
# ln -s /usr/share/zoneinfo/America/New_York /etc/localtime
Now set up the locale settings by uncommenting the en_US.UTF-8 UTF-8 line
# nano /etc/locale.gen
Then generate the file
# locale-gen

Set up system-wide preferences by adding “LANG=en_US.UTF-8” (don’t include the quotes) to the file shown, using nano.
#nano /etc/locale.conf
Then export the settings and set the hardware clock to UTC
# export LANG=en_US.UTF-8
# hwclock --systohc --utc
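The two nano edits for the locale can also be done without opening an editor. This is a rough sketch, not part of the original walkthrough: the `LOCALE_GEN` and `LOCALE_CONF` variables and the demo file content are assumptions for illustration, and by default it works on scratch files so it can be tried safely.

```shell
# Scratch files by default. Assumption: set LOCALE_GEN=/etc/locale.gen and
# LOCALE_CONF=/etc/locale.conf (inside the chroot, as root) for real use.
LOCALE_GEN="${LOCALE_GEN:-$(mktemp)}"
LOCALE_CONF="${LOCALE_CONF:-$(mktemp)}"

# Demo content, written only when the file is empty.
if [ ! -s "$LOCALE_GEN" ]; then
  printf '%s\n' \
    '#en_GB.UTF-8 UTF-8' \
    '#en_US.UTF-8 UTF-8' \
    '#en_US ISO-8859-1' > "$LOCALE_GEN"
fi

# Uncomment only the en_US.UTF-8 line, leave every other locale commented.
sed -i 's/^#\(en_US\.UTF-8 UTF-8\)/\1/' "$LOCALE_GEN"

# Write the system-wide default language setting.
echo 'LANG=en_US.UTF-8' > "$LOCALE_CONF"

# Show which locales are now enabled, and the resulting locale.conf.
grep -v '^#' "$LOCALE_GEN"
cat "$LOCALE_CONF"
```

When run against the real files, locale-gen still has to be run afterwards to generate the locale.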

Now set up the network. Check the /etc/rc.conf file and make sure “network” is between the brackets, and also add “net-auto-wireless” for our wireless.
DAEMONS=(syslog-ng network net-auto-wireless netfs crond)

Since we are wireless, we need to do a little work!
# pacstrap /mnt wireless_tools netcfg
# pacstrap /mnt wpa_supplicant wpa_actiond
# pacstrap /mnt zd1211-firmware
# arch-chroot /mnt

Set the interface in /etc/conf.d/netcfg to

Finally create the ramdisk environment, and install the grub boot loader on the root drive, and get rid of any error messages
# mkinitcpio -p linux
# pacman -S grub-bios
# grub-install --target=i386-pc --recheck /dev/sda
# mkdir -p /boot/grub/locale
# cp /usr/share/locale/en\@quot/LC_MESSAGES/grub.mo /boot/grub/locale/en.mo

Now set up grub and create a root password
# pacman -S os-prober
# grub-mkconfig -o /boot/grub/grub.cfg
# passwd

Nearly done with the base install, finally let’s reboot

# exit
# umount /mnt/{boot,home,}
# reboot

Once we are back, refresh pacman, then let’s add you as a user and install sudo

# pacman -Syy
# pacman -Syu
# adduser

Answer as follows; where an entry is blank, just hit return to accept the default, and add contact details when asked if you like. If you make a mistake just delete the user and try again (# userdel -r [username])

Login name for new user []: dave
User ID ('UID') [ defaults to next available ]:
Initial group [ users ]:
Additional groups (comma separated) []: audio,games,lp,optical,power,scanner,storage,video
Home directory [ /home/dave ]:
Shell [ /bin/bash ]:
Expiry date (YYYY-MM-DD) []:

Now install and set up sudo. We need to edit the sudoers file with a special editor, visudo, that has vi underneath; only use this editor as it has failsafes.

#pacman -S sudo
#visudo

Scroll through the file, find the “root  ALL=(ALL) ALL” and on the next line add yourself (without quotes) “dave   ALL=(ALL) ALL”

Get tab complete working

#pacman -S bash-completion

Now install the X-server for graphics

#pacman -S xorg-server xorg-xinit xorg-utils xorg-server-utils xterm
#pacman -S xf86-video-intel xf86-input-synaptics

This is where I ran into trouble following the online guides. We are going to install the XDM login manager and the LXDE desktop; both are light and work well on the eeepc.

First, log in as your user, then

#sudo pacman -S xorg-xdm
#cp /etc/skel/.xsession /etc/skel/.xinitrc ~

These are the launch scripts; .xsession calls .xinitrc. Change the permissions on them
#chmod 744 ~/.xinitrc ~/.xsession

Now we need to increase the run level. In /etc/inittab edit the top and bottom lines by
removing the quotes:
…. Unhash (it may already be) the line that refers to xdm, which we will be using as our login manager
x:5:respawn:/usr/sbin/xdm -nodaemon

Now install dbus

#sudo pacman -S dbus
#sudo nano /etc/rc.conf

add dbus to the array
DAEMONS=(syslog-ng dbus network net-auto-wireless netfs crond)

Now install LXDE and configure Openbox

#pacman -S lxde
#mkdir -p ~/.config/openbox
#cp /etc/xdg/openbox/menu.xml ~/.config/openbox
#cp /etc/xdg/openbox/rc.xml /etc/xdg/openbox/autostart ~/.config/openbox
# pacman -S gamin
# pacman -S leafpad obconf epdfview

Finally add an exec command to the .xinitrc you created during the XDM setup; add at the bottom (no quotes) “exec startlxde”

That should be it! Reboot and hopefully you will have a simple graphical login page and a functioning (if not light) desktop.


Pandas, matplotlib and IPython – all you need for great data analysis

I’ve been playing around a little with the stats-centric programming language R, mostly to get a better handle on the Bioconductor differential gene expression analysis packages edgeR and DESeq. The language is designed to allow for easy manipulation of tabular data as well as providing access to a rich library of statistical packages and graphing tools. One thing I learned was that often it was more efficient (at least for me) to spend a little time pre-formatting the data using Python/Perl before even getting started with R. The charting features of R are pretty cool, but once again I missed my familiar Python environment )-:

But, as always, there is a Pythonic way. I came across a great book called Python for Data Analysis by Wes McKinney. The book introduced me to the pandas library, which contains R-like tools for tabular data manipulation and analysis. The book also introduced the IPython development environment; basically a souped-up, feature-rich but lightweight “Python IDLE”. The best features of IPython for me are the logging capabilities, advanced history and debugging utilities – very cool! IPython has been designed to work well with matplotlib, thus allowing production and manipulation of nice-looking graphs and charts within an interactive Python environment. I’m still very much learning this (and I’m a hack programmer), but here is some fun data crunching based on a USDA food database wrangled into JSON format by Ashley Williams.

Ok, I think it would look better on one chart. Here is the top 50 protein containing foods with their fat content as part of one graph.

Note that I chopped the y-axis labels off, doh, no problem just modify them in real time using the slider editor!

Not too bad at all for a newbie. Hopefully by next post I will feel a little more comfortable to share some hard-learned lessons on data manipulation using pandas.

My Review of Python for Data Analysis

Data processing heaven

By dave_au from Rochester on 7/7/2012


4 out of 5

Pros: Well-written, Easy to understand, Helpful examples

Best Uses: Intermediate

I’ve only made it through the first few chapters, and the book is not complete, but I have been getting regular updates which is great. So far though it is well written and as far as I can tell there are very few technical errors.

It starts with a “Whetting your appetite” chapter where the author takes you on a journey through the possibilities of data analysis in Python; it’s a lot of fun.

The chapter on getting the most out of IPython has made this book worth the purchase price alone; it’s my new editor and I love it.

Highly recommended for Bioinformaticians or biologists who have to use R, but would prefer to stick with their beloved Python as much as possible.


Why Perl for Science should die!

I think it’s fair to say the most popular programming/scripting language for Bioinformatics is Perl. My language of choice, Python, would certainly be coming in second. I really, really don’t like Perl, and I think it is a poor language for Science in general; here’s why!

1. It’s hard to read

This is the biggest issue for me. Well-commented Perl is actually pretty fun to read; it’s a punchy language so you don’t have to do too much scrolling to work out what is going on. However, the key word in that last sentence is “commented”: poorly commented Perl scripts are a nightmare to read. And frankly, most of the scripts that I have read fall into this latter category. Turnover of people in Science is pretty high, so using a language that makes it really hard for other people to just pick up where the last person left off is such a silly idea. The other key thing is that Science should be about transparency, and Perl’s silly syntax just adds another layer of dust that hides mistakes and errors in logic. This latter point is especially important when bioinformaticians are working with biologists who don’t know a computer language: there is essentially no oversight or review because the language is so goddamn hard to understand unless you know the syntax.

Making MrBayes run on a multicore machine

So what makes the excellent phylogenetic program MrBayes even better? Multicore support!

MrBayes itself is pretty easy to install on a linux machine just by following the configure file notes, however I found it a little more tricky to get it to run in multicore mode. Others might find this useful, so I thought I would add it to my blog

First download and extract a copy of MrBayes and navigate to the /src directory in the terminal.

Install the required libraries
>sudo apt-get install mpich2 libmpich2-dev libmpich2-1.2 libreadline6-dev

Run the following to configure.
>./configure --enable-mpi=yes --with-beagle=no

As beagle is for graphics processors, we want to turn that off for a normal PC-like system.

Now in your home directory make a file called “.mpd.conf” and add this line to it, changing ‘secretword’ to whatever you like: “MPD_SECRETWORD=<secretword>”

Change the permissions so that only you can read and write
>chmod 600 .mpd.conf
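The file creation and permission change can be combined into a couple of lines. This is only a sketch: the `CONF` and `SECRET` variables and the “change-me” word are placeholders for illustration, and by default it writes to a scratch location instead of your real home directory. Mode 600 gives the owner read and write and nothing to anyone else.

```shell
# Assumption: CONF defaults to a scratch path for a dry run;
# set CONF="$HOME/.mpd.conf" to create the real file.
CONF="${CONF:-$(mktemp -d)/.mpd.conf}"
SECRET="change-me"   # hypothetical placeholder, pick your own word

# Create the file inside a subshell with a tight umask so it is never
# readable by other users, even briefly.
( umask 077; printf 'MPD_SECRETWORD=%s\n' "$SECRET" > "$CONF" )

# 600 = read and write for the owner, nothing for anyone else.
chmod 600 "$CONF"

# Show the resulting permissions.
ls -l "$CONF"
```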

Run the mpd in the background, it shouldn’t complain, but if it does do what it asks.
>mpd &

Now run the program on 6 cores (or however many you have available); stdout will be written to GT.txt, and all this will run in the background due to “&”
>mpirun -np 6 mb trimmed_nex.txt > GT.txt &
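If you would rather not hard-code the core count, something like the following sketch could pick it up from the machine. It assumes `nproc` from GNU coreutils is installed and reuses the file names from the command above; the `NP`, `INPUT`, `OUTPUT` and `CMD` variables are just for illustration. It prints the command instead of running it, so you can look it over first. My understanding is that the MPI build of MrBayes wants no more processes than it has chains in total, so a smaller number may be needed on big machines.

```shell
# Assumption: nproc (GNU coreutils) is available; fall back to 1 otherwise.
NP="$(nproc 2>/dev/null || echo 1)"

# File names taken from the example command; change them to suit.
INPUT="trimmed_nex.txt"
OUTPUT="GT.txt"

# Build the command and print it rather than running it,
# so it can be checked before use.
CMD="mpirun -np $NP mb $INPUT > $OUTPUT &"
echo "$CMD"
```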

You can check the progress by opening the output file or just typing:
>tail -f GT.txt