SAM / BAM files explained?

Featured

Posted on February 28, 2015 by Dave Wheeler

Lets face it there aren’t too many HTS pipelines that don’t touch a SAM/BAM files somewhere along the way. The name SAM comes from the acronym Sequence Alignment/Map format and they are really nothing more than tab delineated text files describing mapping information for short read data. Of course with the millions of reads generated by next gen sequencers these days the files become really really big. That is where BAM files come in, these contain the same information as SAM files but are encoded in condensed computer readable binary format to save disk space.

If you open up a SAM file in a text editor (or use “head” in bash) the first part of the file typically contains lines beginning with AT (@) symbol, which represents the “header”. The header contains general information about the alignment, such as the reference name, reference length, MD5 checksum, the program used for the alignment, etc etc. Every line after the header represents a single read, with 11 mandatory tab separated fields of information. Lets see how this works with a really simple example.

What I’ve done is mapped four ‘pairs’ of reads to the alcohol dehydrogenase gene (my favourite gene) using bowtie2. The figures below shows the stranded nature of the read pairs as well as where they map on the gene.

I have set this up so that the reads fall into these three groupings.

Two pairs of high quality reads that concordantly map
A read pair where only one read is high quality
An orphan read (the R1 read is only 1 nt long aka trimming)

The next figure describes each of the mapping scenario’s using the colour encoding. The red set of reads behave as we would expect for paired end data.

Where the reads map and the different read scenarios

For the alignment I’m going to use bowtie2, all of the sequences are here if you want to do this yourself.

#index the reference and specify the prefix as reference
bowtie2-build reference.fa reference

#map the reads
bowtie2 reference -1 test_R1.fq -2 test_R2.fq &gt; adh.sam

####bowtie output###
Warning: skipping mate #1 of read 'HWI-7001326F:39:C7N3UANXX:1:1101:10000:62296/1' because length (1) &lt;= # seed mismatches (0)
Warning: skipping mate #1 of read 'HWI-7001326F:39:C7N3UANXX:1:1101:10000:62296/1' because it was &lt; 2 characters long 4 reads; of these: 4 (100.00%) were paired; of these: 1 (25.00%) aligned concordantly 0 times 3 (75.00%) aligned concordantly exactly 1 time 0 (0.00%) aligned concordantly &gt;1 times
    ----
    1 pairs aligned concordantly 0 times; of these:
      0 (0.00%) aligned discordantly 1 time
    ----
    1 pairs aligned 0 times concordantly or discordantly; of these:
      2 mates make up the pairs; of these:
        1 (50.00%) aligned 0 times
        1 (50.00%) aligned exactly 1 time
        0 (0.00%) aligned &gt;1 times
87.50% overall alignment rate

The “adh.sam” file can be opened with any text editor, but for most of this post I’m going to use the spreadsheet version just to easily work with all the columns. Note the header lines indicated with the “@”. In this case the @SQ relates to the reference and @PG relates to the program parameters. Each header line has fields in the format tag:value. For example, the @SQ line has SN (SN=reference sequence name) and LN (LN=reference sequence length). A full set of these header can be found in the SAM specs.

 
@HD	VN:1.0	SO:unsorted
@SQ	SN:gi|568815594:c99321442-99306387	LN:1890
@PG	ID:bowtie2	PN:bowtie2	VN:2.1.0

Below the header each read is represented by a line, it is worth nothing that even read “62296/1” which shouldn’t map since it is only 1 nt long also gets a line. Two other things to note are that firstly the reads can be ordered by reference coordinate, this is the default Tophat behaviour. Alternatively, they can be listed in pairs. Samtools can be used to switch between formats using the “sort” command and why does this matter? Some software needs to know how the reads are sorted, for example htseq-count expects reads ordered in pairs because read pairs can map different distances away from each other and finding them in the file increases memory requirements.

OK the fist four column. The QNAME (seq name) and RNAME (reference name) are pretty self explanatory. The FLAG uses information dense bit-wise encoding containing information on the alignment, such as the strand, if the pair mapped, if this if the first or second pair. If your dumb like me you’ll be thankful for the Broads has a nifty website that translates these numbers into plain English. For example, the 83 represents “read paired”, “read mapped in proper pair”, “read reverse strand”, “first in pair”. The 163 is “read paired”, “read mapped in proper pair”, “mate reverse strand” and “second in pair”. I think that makes sense. For reference the other two numbers are 137 (read paired, mate unmapped) and 69 indicating an unmapped read pair. Very cool. The final flag here POS corresponds to the alignment coordinates. We can confirm the reads are ordered R1 then R2 based on the mapping coordinates (see the figure at the top of the page).

The next flags worth looking at are the MAPQ and CIGAR strings. The mapping quality is represented as a log transformed probability of mapping to the wrong position. Interestingly, the low quality (fastq quality “$”) has the same mapping score as the other high quality reads. I gather this is because the read really can only go one place when there is a 100% match and the reference is only short like this. The CIGAR string starts with a letter, in this case “M” for “alignment match” and 50 indicating the length of the alignment. The “*” indicates that no information is available. PNEXT is the position of the mate pair and TLEN is the observed template length, in other words the length of the sequenced fragment represented by the pairs.

Finally, we have sets of tags that provide other information on the alignment. Things like the number of ambiguous bases or gaps etc. Checkout the documentation of your aligner to find out what each tag defines if you are interested.

Make an android app in minutes with the App Inventor 2

Featured

Posted on December 10, 2014 by Dave Wheeler

The App Inventor 2 (AI2) is a really cool bit of software from Google and [now] MIT. It contains two GUI tools to make creating Android apps a “relatively” easy task. The first tool is a designer that helps create your screen layouts, pretty much like the layout designers that come with the Android SDK. The really cool and unique part is the “blocks” designer, which helps you create code by moving object blocks around (rather than typing java). These GUI’s dramatically speed up development, whilst also providing an excellent framework to teach people how to code in a fun and accessible way (ie no language syntax).

The following quick run through of a very simple conference app that I’m using for a demo at our teaching symposium. The goal is to do this demo in 10 min, so really this is bare-bones stuff! If you want more I highly recommend the book App Inventor, edition 2, which covers AI2 in much more detail through a series of fun (and useful) projects.

Its worth pointing out now that AI2 has some additional cool features. Firstly, its all on the cloud so you don’t need any software except chrome or firefox (ha ha IE is NOT supported). Secondly, if you are on WIFI, by installing a simple application on your phone you can test the app instantly by scanning a QT code from the computer screen. If you have ever created an app with eclipse and used the emulator for testing you will be delighted by how much simpler and quicker this process is, its super cool – really!

The conference app is just a listview of names and session times that are clickable. If a item is selected a photo and brief bio pops up for that session. No fancy formatting at this stage, anyway.

1. Go to the AI2 page and login with your google account

The first screen will recommend you install some additional applications to allow easy testing on your phone. By far the simplest is to choose option one, as long as you are sharing a wifi network with your computer and phone. For the rest of this guide I’ll assume this is how you will test your app.

2. On the menu goto “Projects” – “Start new project” and call it something informative

After a few seconds the designer window will appear. The designer page allows you to layout all of the elements your require for your app. The left hand “palette” contains elements for layout as well as user interaction widgets. These elements can be dragged onto the middle view pain or alternatively if they are a invisible element (a database for example) to the bottom of the viewer screen. The “components” pane allows you to customise each element and finally the properties pane lets you modify size, layout, and text aspects.

3. Re-title

Just to to see how simple things can get, lets give a more informative title. All we need to do is change the “title text” in the property pane.

4. Add the list-view

Ok the meat and potatoes of the app, the listview will be populated with speaker names and times that are clickable. Just drag the listview icon from the palette window to the viewer.

5. Adjust the placement and re-sizing to suit your tastes

We can now adjust the listview properties, such as the width and height so that this element fills the available screen space.

6. Go to the blocks page and start creating the logic for your app

Now the nuts and bolts (and the cool part). We have to populate our listview with the speaker names when the app is opened, as well as specify some behaviour when the user clicks on the element in the listview. For this work we will need to use the code blocks on the left hand side of the “blocks” screen.

7. Initialise by populating the list-view with strings once the screen is loaded

Clicking on the “Screen1” icon will bring up all the blocks associated with this choice. We want to drag the “when screen initializes” block to the work area. This “block” of code will execute when the app opens. Notice the gap in the middle, this is where we place the “do” part. In this case what we want to “do” is populate the listview with text for the speaker names.

To do this we click on the listview block and select the “listview elements” block, notice how this block will fit nicely inside the previous block. Notice also that the open jigsaw like parts at the end of this new block, this is where we add our new list that will hold the speaker string elements. I hope you notice this “feels” like coding nix with objects all the typing.

Clicking on the list block allows us to make a new list, we click this into our listview and now we populate the list with items that will be displayed.

So it should look like this (below). The little blue gear lets us increase the number of spots for elements to populate our list. For this app we will have four speakers for this section, so just drag the item block into the right “list block” twice to create two more spaces.

fig10

To actually populate the list we need some text. So we grab the empty text fields and drag them to the open slots on the list.

fig11

By typing between the quotes we can add our custom text. By right clicking on this first text element we can duplicate it for the remaining 3 spaces (or we could just drag 3 more empty elements across).

fig12

We modify each element to be a speaker name. Ok, now when the app opens the listview will be populated by string elements found in the list. Next we need to specify some behaviour when the user clicks on the speakers name.

At this stage our basic app should be “working”, as in the listview should display, although nothing will happen when we click on each item. Now this is where the simple testing functions of AI2 come to the fore. Just click on the build and then “provide QR code for apk”.

Once the QR code pops up just open up the App inventor 2 companion app and scan the code, its bloody brilliant!

Now we create an action once an element (speaker) in the list is selected by the user. As this is a listview action we click back on the listview icon and grab the “afterPicking” block; thats logical don’t you think?

One thing you might notice about these blocks, if you have coded before, is that they are very much like a normal object that we would specify using code syntax, its just its GUI based.

Our after picking behaviour will be opening a new screen that contains a photo of the speaker and a brief bio. So we need to setup a new screen to contain this new information. I’m being really lazy here and I’m just going to hard wire the page so that only one of the listview items actually works as expected (mine of course). However, in real life you would setup some logic that would ask for the index item that was clicked, and use this to send the user to the correct screen.
So we have setup the method to open the new screen (above), now we need the screen. We’ll create the “DaveScreen” (as specified by another purple text element block ) that will contain my mock bio and a great photo. To do this just click the “add screen” button at the top of the screen and re-name it. fig18 Back in the designer we can populate this new screen with a photo and some text, using a vertical alignment is simple just drag it onto the window and specify its attributes in the right hand panel. fig19 Now Drag an image icon onto the panel and then by clicking on “image1” in the components plane we can upload an image and resize it using the properties. You really can’t get much easier than that really, can you!

Wow, what a good looking bloke, finally, we create a label and populate it with text in the blocks screen. We could do this in the properties section of the designer page, but since I’m feeling guilty for hard coding the list view, I thought I would use the blocks to set the text.

Thats it. The app should provide a list of speakers, and when you click on my name (any name actually due to the hard wireing) the new page should open with my bio and a photo.

You can download the files associated with the app from here. These include the APK’s that can be installed on any android phone as well as the *aia files that can be imported into AI2 if you want to play with the code. I’ve included a slightly improved (although still broken) version that replaces the hard-wired listview actions with a call to specific speaker pages (that don’t exist but could be easily made based on the above information).

Install Arch Linux on virtualbox – the nuts and bolts (pt1)

Featured

Posted on July 25, 2014 by Dave Wheeler

Installing Arch is incredibly satisfying (maybe thats code for frustrating) as it really does introduce you to the flexibility offered by the modular nature of Linux. And with virtualisation software like virtualbox, we don’t have to worry about turning our computer into a fancy doorstop while we madly google a solution to that frozen black screen.

With the excellent Arch beginners guide installation wiki things really are not that difficult, however, there certainly are some got-yas (hair pulling), especially when installing inside a VM. Hopefully this step by step guide will help someone out.

Step 1. Download the latest Arch ISO, remember to use a torrent to save the arch servers bandwidth (and its a good FU to the MPAA).

Step 2. While you’re waiting, lets setup the VM. Open up virtualbox and click on ‘new’.

Typing Arch into the name box should populate the other options.

Set the memory to something sensible (1gb), remember you need to save some for the host, so stay away from the red section of the sliding bar.

We want to create a virtual hard drive now (the 8gb default is probably a little too frugal).

Select the top option on the next screen (VDI) then allow it to be dynamically allocated. Even though the drive will dynamically allocate space, we need to set the maximum size, at 8gb (the default on my machine) we will probably run out of space pretty quickly in real world desktop use. So lets make it something sensible. An alternative would be to do a minimum base install and then hook-up a shared folder to the host and keep data there (more on that latter).

Press “Create” and now we can setup the options on our new virtual machine. On the [general][advanced] tab setup bi-directional clipboard and file sharing.

vm4

Also tick the “show top of screen” or you’ll get driven crazy by the dropup menu destroying your life! On the system setting change the processors to suite your system. Display setting, might as well enable 3D. Now for the storage settings we can virtually stick our downloaded ISO into a imaginary CD-drive.

Click the ‘add CD’ button and point the file option to your freshly downloaded Arch ISO. The will allow the VM to boot from the ISO rather than; well nothing! Finally, lets setup a shared folder with the host. Select the folder path and click the auto-mount check box.

Now we are ready. Click “start” and we should get a boot screen, press enter (boot into Arch) and after a few seconds we should have a flashing cursor.

Bingo! Now I’m pretty much following along with the beginners guide.

Step 3 – Terminal action!

Firstly, I’m assuming you are using a standard keyboard layout (if not follow the guide). Secondly, lets check that the internet is working. A great thing about using a VM is that we don’t have to fiddle with wireless (it can be a pain) as the host just provides the link as if it was wired (remember for this to work the host must be connected to the internet). Lets check by pinging google, the “-c 3” flag just says to do it three times, if you forget it just [ctl][c] to kill ping.

ping -c 3 www.google.com

If you get a path not found error, check the host connection (then check the beginners guide if that doesn’t work)! Otherwise you’ll exchange some packets with google to verify everything is working dandy.

Now we use the “lsblk” command to check the name of the disk we are going to partition, in my case its “sda” (this is important), note I can tell that because its 20Gb big (remember we set that before).

So assuming your drive is sda we’ll use fdisk to make our new partitions.

 fdisk /dev/sda

fdisk will present us with some options. **NOTE** if at any time you want to back out just hit “q”!

Here I’m setting “n” for new partition table, “p” for primary, “1” for 1st partition, then accept the default for the start sector, finally “+10G” (this is all one word, it just slipped over the screen on the screenshot) to specify that we want this first partition to take up half our drive, we could probably make this less ( “+6G”?) if you wanted more space for the home. Now the home partition. Same as before really, but this time we specify “2” for the second partition and accept the defaults for start and end sectors to take up the remainder of the drive.

Now we can preview our new partitions with “p” and if we are happy write the table “w”, note that the partitions are called “/dev/sda1” and “/dev/sda2”, make note of this for latter. I’ve got plenty of RAM so I’m not going to worry about a swap partition (-:.

Next we format our new partitions as ext4 and then create a mount point for root and then mount a home directory on the other partition (remember to change sdx depending on your information above).

#format the drives
mkfs.ext4 /dev/sda1
mkfs.ext4 /dev/sda2
#mounting!
mount /dev/sda1 /mnt
mkdir /mnt/home
mount /dev/sda2 /mnt/home

Finally lets install the base system.

pacstrap -i /mnt base base-devel

Accepted the defaults and “y” to install. Take note of some of the things that are being installed here, it really is the base system (ie base programs like “which” etc, all those little bash utilities you just take for granted).

Now we organise the boot partition.

genfstab -U -p /mnt >> /mnt/etc/fstab
#now check the table
nano /mnt/etc/fstab

This is what my boot table looks like (the $ signs at the end mean that there is more text outside the screen view, if you scroll to the right you should see the numbers 1 for the “/” partition and 2 for the “/home” partition.

Now its time to setup the base system ie set locale and make the internet persistent. For this we will be entering change root, which is a special environment (read about it here). Then we will change to files that contain information on our location and character set, the guide suggest we stick to UTF-8. In my case I’m in New Zealand, so I’ll uncomment that line in the file (note you can search in nano by using [ctl][w]) set that (note we only have one time zone in NZ). You can enable multiple languages here by uncommenting them, this would be handy if you are working on a system with multiple users, or you like to curse in foreign languages!

arch-chroot /mnt /bin/bash
nano /etc/locale.gen
##in the nano editor##
#........
#en_NG ISO-8859-1
en_NV.UTF-8 UTF-8
#en_NV ISO-8859-1
#........

Save this file with [ctl][o] and [ctl][x], then run the following to generate the locale, it should report your chosen setting.

locale-gen

Now that we have enabled the language we need to set the system wide settings in a new file called /etc/locale.conf that contains our chosen default system setting. In my case the command would be below, but if you were in the USA you probably use “echo LANG=en_US.UTF-8 > /etc/locale.conf”. All we are doing is echo’ing some text to a new file. Then we just export that setting (substitute your lacale).

echo LANG=en_NZ.UTF-8 > /etc/locale.conf
export LANG=en_NZ.UTF-8

The timezone and subzone files are in a folder called “/usr/share/zoneinfo/Zone/SubZone”, with “Zone” and “SubZone” being replaced by your region. For example, if you lived in Rochester NY (eastern US time), the folder you would point to would be called “/usr/share/zoneinfo/US/Eastern”. You can see below how I use “ls” to list the directories, then I create a simlink that to “/etc/localtime” (remember you probably have regions so your directory path will be one longer (as shown for the Rochester NY example).

ln -s /usr/share/zoneinfo/NZ /etc/localtime

Now we set the clock to UTC (Coordinated Universal time).

hwclock --systohc --utc

Then set our hostname (that will be seen on a network) to whatever we like (oneword). First we create a file called “/etc/hostname” containing the name, then we use nano to edit as second file (as shown), note the “DavesArch is a [tab] from the “hostname” string on that line (the one starting with 127.0.0.1).

echo DavesArch > /etc/hostname
nano /etc/hosts
##in the nano editor!##
#
# /etc/hosts: static lookup table for host names
#

#
127.0.0.1 localhost.localdomain localhost DavesArch
::1 localhost.localdomain localhost
# End of file

We are nearly there! Lets setup the network, before configuring grub and setting a root password. For the network we will use netctl. We can copy an example file from “/etc/netctl/examples” to a new filename “/etc/netctl/my_network”; the one to use is pretty obvious since we have a virtual wired connection via our host.

cp /etc/netctl/examples/ethernet-dhcp /etc/netctl/my_network

Next we edit this file, replacing the “Interface=eth0” line shown by the “ip a” command (see below).

Now just use nano to edit the file we just created and replace “eth0” with our interface (in my case “enp0s3”) from the ip a output.

netctl enable my_network

We will quickly set a root password, make it hard to guess and easy to remember (ha ha).

passwd

Now to setup the grub bootloader.

pacman -S grub
grub-install --target=i386-pc --recheck /dev/sda
grub-mkconfig -o /boot/grub/grub.cfg

exit
umount /mnt/home
umount /mnt/
shutdown

We use shutdown here so that we can eject the virtual ISO before we reboot, otherwise we will be back to where we started.

This time select the ISO and click the small minus sign icon at the bottom of this window to delete the ISO drive (otherwise we will boot back into the live CD)

Once our ISO is removed we can hit the start button again on Virtualbox, and fingers crossed we’ll boot into arch!

What now? Unless you are a terminal jedi we’re going to have to install some GUI tools. For me this was the most challenging part of the install especially with the added complication of working with a VM. Since we are already up to 1600 words I think I might take a break, but to get your desktop GUI up and running click here!

Pandas data analysis – New Zealanders and their sheep!

Featured

Posted on May 28, 2014 by Dave Wheeler

As an Aussie I cop a little bit of flack living in New Zealand. It helps that since I’m from the South of Australia I follow Aussie Rules footy, not this rugby shenanigans, so I don’t bother too much with the constant focus on how much better the All Blacks are relative to our poor Wallabies (Australia has won as many world cups as New Zealand – Take that!).

They make bloody good gum boots (multi-purpose - inside joke (-; )

Kiwi’s have bloody good gum boots; they have to do more than keep the mud off! – inside joke (-;

That said as a bit of fun I thought I would do a post on the Pandas python module using data from StatsNZ. It has too be about sheep right! The actual spreadsheet I downloaded from StatsNZ and used in this is here.

I’ll be using pandas, ipython, and matplotlib to create some graphs to show the decline of sheep in NZ. In the next post I’ll try and work out why (yes I know you could find this out in 3 seconds with google, but thats no fun).

First we will read the file using pandas and make a dataframe. Pandas can import in all kinds of file formats, with excel you need to include a sheet name (that little tab at the bottom of the excel sheet).

#imports
import pandas as pd
import matplotlib.pyplot as plt
#sheetname required for xlsx
data_file = pd.read_excel('livestock1.xlsx',sheetname='livestock')

#make a dataframe
data = pd.DataFrame(data_file)
data.head()

Out[8]:
                                                 1994          2002      2003  \
Total beef cattle                             5047848       4491281   4626617
Calves born alive to beef heifers/cows        1262522       1083485   1079334
Total dairy cattle                            3839184       5161589   5101603
Calves born alive to dairy heifers/cows       2455975       3225238   3115897
Total sheep                              4.946605e+07  3.957184e+07  39552113
-----
                                             2010      2011      2012
Total beef cattle                         3948520   3846414   3734412
Calves born alive to beef heifers/cows     901258    901375    827749
Total dairy cattle                        5915452   6174503   6445681
Calves born alive to dairy heifers/cows   3640914   3884257   3879543
Total sheep                              32562612  31132329  31262715

So we made a dataframe, the head method acts like bash head in that it shows the start of the frame rather than the whole thing. Currently the years are columns and the stock type are rows, lets flip the table, which is super easy!

 #we really want the dates as the index
data = data.T
data.head()

-------
     Total beef cattle Calves born alive to beef heifers/cows  \
1994           5047848                                1262522
2002           4491281                                1083485
2003           4626617                                1079334
2004           4447400                                1013893
2005           4423626                                1018730

Now that we have the years as rows, actually an index, it will be much easier to do our plotting. But the column names are overly informative, as in long, lets shorten them.

#the column names are pretty long, lets fix that now
data.columns

Out[13]: Index([u'Total beef cattle', u'Calves born alive to beef heifers/cows', u'Total dairy cattle', u'Calves born alive to dairy heifers/cows', u'Total sheep', u'Total lambs marked and/or tailed', u'Total deer', u'Fawns weaned', u'Total pigs', u'Piglets weaned', u'Total horses'], dtype=object)

sub_data = data.loc[:,['Total beef cattle','Total dairy cattle','Total sheep','Total deer',
'Total pigs','Total horses']]
sub_data = sub_data.replace('..',0)#replace .. with 0
sub_data = sub_data.astype('float')
sub_data.head()

Out[17]:
      Total beef cattle  Total dairy cattle  Total sheep  Total deer  \
1994            5047848             3839184     49466054     1231109
2002            4491281             5161589     39571837     1647938
2003            4626617             5101603     39552113     1689444
2004            4447400             5152492     39271137     1756888
2005            4423626             5087176     39879668     1705084
2006            4439136             5169557     40081594     1586918
2007            4393617             5260850     38460477     1396023
2008            4136872             5578440     34087864     1223324
2009            4100718             5860776     32383589     1145858
2010            3948520             5915452     32562612     1122695
2011            3846414             6174503     31132329     1088533
2012            3734412             6445681     31262715     1060694

Great! Now to make the plot easier to look at lets divide the dataframe by 1 million.

#now divide by a million
sub_data = sub_data.divide(1000000)
#first plot everything
sub_data.plot(legend=True)
plt.xlabel('Year')
plt.ylabel('Total stock (millions)')
plt.title('NZ farm stock')

Yes that is correct, back in the 90’s there were 50 million sheep in NZ, not bad for a country with a population of ~3 million people. Baa. But their numbers have been in serious decline since then, replaced by their bigger brothers the cows.

Lets face it, NZ is all cows and sheep, lets just look at that data.

#lets just plot cows and sheep, that being the first 3 columns
cow_sheep = sub_data.ix[:,[0,1,2]]
cow_sheep.plot(label=True,title="Numbers of cattle and sheep in NZ")
plt.xlabel('Year')
plt.ylabel('Total stock (millions)')

The data has meat cows and milk cows as separate columns, its easy to combine them.

#a milk cow and meat cow are still cattle
cow_sheep['Total cows'] = cow_sheep.ix[:,0] +  cow_sheep.ix[:,1]
cow_sheep.ix[:,2:].plot(legend=True,title="Total cattle and sheep in NZ")
plt.xlabel('Year')
plt.ylabel('Total stock (millions)')

So the country is cowafying, mostly dairy replacing beef cattle. Although the increase looks slight, basically its 25% up! That’s a lot of grass, cow shit, and methane!

Now lets look at the change in the numbers of each stock since 1994 (the start of our data). We do this by passing all the data as well as all the data minus 1994 to a function that handles the calculation. Pandas handles this all in the back end and parses the data in the frame through the function, nice.

def percent(dataframe,d1):
    '''calculate percent change relative to first column (1994)'''
    a = 100*((dataframe.ix[0,:]-d1)/dataframe.ix[0,:])
    return 0-a

#pass the entire data frame to this function
percent_data = percent(sub_data,sub_data.ix[0:,:])

percent_data.head()

Out[38]: 
      Total beef cattle  Total dairy cattle  Total sheep  Total deer  \
1994           0.000000            0.000000     0.000000    0.000000   
2002         -11.025827           34.444950   -20.002034   33.858009   
2003          -8.344764           32.882482   -20.041908   37.229441   
2004         -11.895128           34.207998   -20.609926   42.707754   
2005         -12.366101           32.506699   -19.379727   38.499840   

      Total pigs  Total horses  
1994    0.000000      0.000000  
2002  -19.100637     11.811093  
2003  -10.766476     18.504488  
2004   -8.072078     13.376472  
2005  -19.230733   -100.000000

#the years are plotted as floats, its easy to convert them!
percent_data.index=percent_data.index.astype(int)
#figure 4
percent_data.index=percent_data.index.astype(int)
percent_data.plot(kind='barh')
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
plt.ylabel('Year')

I really want the graph the other way around, so lets re-index. Also, lets get rid of the nags, they look funny because the frame was missing some data. More proof that Far Lap was an Australian horse.

horseless = sub_data.ix[:,:-1]
horseless_per = percent(horseless,horseless.ix[0:,:])
#flip the axis
horseless_per = horseless_per.reindex( index=data.index[ ::-1 ] )
horseless_per.index=horseless_per.index.astype(int)
horseless_per.plot(kind='barh')
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
plt.ylabel('Year')

Finally, its silly having the 1994 data as its zero, but it was a nice sanity check to make sure the % function was working correctly. But lets get rid of it now by just plotting part of the frame.

horseless_per[:-1].plot(kind='barh')
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
plt.ylabel('Year')

The ‘barh’ is just bar graph horizontal.
So it looks like there was a venison craze in the early 00’s, but mostly just more and more dairy.

#save the file
horseless_per.to_csv('stock_num_diff_average.csv')

Ok, so although there are still a LOT of sheep in NZ, it really is the country of the cow now. What might be driving that? Lets look at some commodity prices in the next post!

Setting up a Lubuntu virtual machine with virtual box [and get guest additions working]

Featured

Posted on January 25, 2014 by Dave Wheeler

BTW – Learn more about your linux system; Try my how-to on Arch Linux via virtualbox page

Disclaimer – this is totally safe, but please backup your computers before trying it, I take no responsibility for anything that might happen

Updated 2015 and tested – let me know if something is broken

If you haven’t tried Linux then you are really missing out. Although I run windows in a virtual machine, I’m working my way to weaning myself of that dark beast (my colleagues doing the same would certainly hasten the split).

Although I use Xubuntu on my work machine, Lubuntu is another favourite of mine due to its small footprint and minimalist approach. Both are great because they don’t have that abomination that is Unity now shipping with Ubuntu.

For a python class that I am developing I wanted to put together a how-to install Lubuntu in a virtual machine using virtual box. At work I use vmware workstation to run windows as a vm and it works pretty well. However, I had trouble using the free player, mostly due our wonderful proxy creating problems with the download and install of vmware tools (there were display problems too for some reason)! Anyway, long story short I settled on virtual box (dam you Oracle for stealing our Americas cup).

In this example I’m installing a linux vm on a windows 7 host.

(Note to self: tur-cache1.massey.ac.nz and 8080 proxy)

Get virtual box from here and install it on your host system.
Get the Lubuntu from here, if you can use the torrent to save the foundation some $
Set-up your new virtual machine by starting virtual box.

First set up the name and the OS type. The name can be anything, but best call it “lubuntu” (no quotes). Type is “linux” and the version is either ubuntu64 or ubuntu32. Click OK or Next. [Note: Most modern machines are 64 bit, use that unless you have something old so you can access more ram].

Give as much as you can, don’t forget to leave some for your host, but 2gb is a good start on my 8gb system (but in real life I might give it 3-4 gb). This can be changed latter so leave it as the default if worst comes to worst.

STOP! Although dynamic resizing is probably a good way to save disc space – I want speed! but for this course just use dynamic sizing ie not what I have shown below!

This is only a demo, normally I would make the drive considerably bigger (>20gb) to allow for software installs. But you can get away with 10 Gb as a minimum. You can share folders on your host machine once the system is up and running and this is where I would store all my important data. As an aside, I normally don’t back up my vm’s and if you do the same then, yeah, don’t store anything on the virtual machine that is important!

Click on the headings to access the different settings, its worth looking as the defaults are likely to make your machine run sub-optimally. OR JUST LEAVE IT FOR NOW

Accept the defaults if this is all too confusing!

Hovering the mouse over the warning lets you know what is wrong, here it says I need to select another option if I use duel core processors, it also says it will fix it for me so I didn’t have to do anything. Display issues are a problem for me when using vms, so I’ve given a bit of RAM to the video to hopefully minimise the chance of problems. Accept the defaults if this is too confusing.

Lets set up the shared folders with the host machine now, make it mount on startup so it acts as a removable drive once the vm is running.

Now the machine is ready to go, but first we need to “insert” the ISO image into the virtual CD-drive. Click on the “Storage” settings option and select the little disk+ icon as shown below, then when prompted navigate to the ISO using “choose disk” .

Click on the newly created drive and select “live CD” option.

With the ISO file sitting in the virtual drive its time to boot up our new computer!

Here we have the option to try out the system first using the live CD. This does not install anything, but lets us know if we likely to run into compatibility issues. Its only a vm, so I’m just going to install.

Be patient here as it might not look like it is doing anything, there even might be some error messages, sit tight and hopefully you’ll see the screen below!

Hopefully you see the image below will load, or a new lubuntu desktop will load. If the desktop loads double click on the “install” icon on the desktop and you should see the image below (and next time follow instructions ie click the live CD option (-; ).

Non-open codecs like mp3 etc can be installed here and you can download updates on the fly during install. These options are up to you. But if you don’t have Internet leave the “download updates while installing” boxes unchecked.

If this were a normal install (dual boot for example) I would start to sweat at this point, whilst double checking my back drives (you have one right?)! If it goes wrong here you end up with a blinking cursor, where once you had a computer full of data and software (backup backup backup). But……The great advantage of VMs is that it doesn’t really matter! If things go wrong, its only a virtual computer after all, delete it and just start again (that said: backup backup backup). Also, as you will be running the vm inside a host, there is no need to worry about the hassle of setting up a dual booting (though its not actually that hard to do). So lets go ahead and format our virtual drive.

Fu*k the NSA, encrypt if you can! Click continue when asked unless you want to set up more complex swap and partition options.

Good to see NZ just made it onto the map! Choose the US keyboard (yes this is for you know who) unless you got your computer from some strange dodgy foreign online store.

The parts the matter for you here are your username and password (no spaces in username or password). Choosing NOT to log in automatically is certainly recommended from a security standpoint.

If this were a real install on a live CD you would need to eject the CD otherwise it will just boot back to the live image again (see below).

we need to “eject” the Lubuntu startup/install disk from the virtual CD, otherwise the VM will just boot from that again [and thats not what we want]. So goto VB settings, storage, then right-click “remove disk from virtual drive”. Now you should be good to press enter and reboot.

Great!

Hopefully the display issues are minor. Next we will install guest tools that should make things run more smoothly.

Neat Trick t: When you have your VM up and running make it full screen then use the switch desktop [ctl][alt][arrow] to change between the host and the VM. This is so much easier than minimising the VM all the time; you end up with two computers for the price of one!

——————-SKIP THIS SECTION IF YOU ARE NOT BEHIND A PROXY—————–

Dealing with the wonderful lovely helpful joyful PROXY!

Skip this if you are not behind a proxy (nearly everyone). Open a terminal using [ctl][alt] t (the first part is just for the proxy (insert your usrname password in place of ‘password’), ignore if you are just using a normal network.)

#the fu*king proxy, ignore if you live in a normal universe 
#make a new file, copy some text into it, then send this to where apt looks for settings
echo 'Acquire::http::Proxy "http://Username:Password@tur-cache.massey.ac.nz:8080";' > ~/Desktop/99proxy
sudo cp ~/Desktop/99proxy /etc/apt/apt.conf.d

But the fun doesn’t stop there, when you get home you have to disable this option

sudo nano /etc/apt/apt.conf.d/99proxy

#to save click [ctl][o] then [ctl][x]

When nano opens comment out ie add a # to the start of the “Acquire” line.

#Acquire::http::Proxy "http://Username:Password@tur-cache.massey.ac.nz:8080";

You have to undo this when you get back to the work proxy environment again”, for the love of the flying spaghetti monster!!!

Now to get copy paste working, full screen, shared folders, etc

We are going to install guest additions which helps copy and paste between the host and the linux system work, as well as display issues and shared folders etc. VB comes with its own package for helping with this, but the opensource community has its own version, we’ll use that. Open a terminal and type (copy and paste is unlikely to work yet) (via see (http://askubuntu.com/questions/22743/how-do-i-install-guest-additions-in-a-virtualbox-vm and https://forums.virtualbox.org/viewtopic.php?t=15679).

sudo apt-get install dkms build-essential linux-headers-generic virtualbox-guest-x11

Hit “Y” when prompted and download off the internet

Now reboot

sudo reboot

If the above worked you should be able to go to full screen mode [view][switch to full screen] — if that works yipee your done (but see optimisation steps below)!

———————————————-skip if things look OK——————————————-

If you are still having problems you can try this alt way

From the virtual box menu at the top or bottom of the screen (if the menu is hidden just put your mouse at the top or bottom and should appear), choose DEVICE the option “install guest addition”. (see bottom of device drop down).

A box should pop up, click open or ok. If it says something like problem mounting, GOOGLE IS YOUR FRIEND, I googled this problem and this looks like a solution http://askubuntu.com/questions/80341/unable-to-mount-virtualbox-guest-additions-as-a-guest-win7-host

Now! Open a terminal window using [CTL][ALT][DEL] and type.


cd /media/[username]/VBOX*

ie if your username is ‘fred’ it will be:


cd /media/fred/VBOX*

Type “ls” and make sure you can see a file called “VBoxLinuxAdditions.run”. IF so then use sudo to install.


sudo ./VBoxLinuxAdditions.run

Now reboot and hopefully any display issues you may have are fixed! This part below is to get the Internet working for apt-get.

*************************Final optimisation*************************

Look and feel

1) USE FULL SCREEN – it drives me crazy how people leave the VM as a window, you go to click something and all of a sudden you’re back in some sucky OS like windows! Just make the screen full screen and use [right ctl][F] to graciously switch between the host and the VM.

By default the settings “options bar” are hidden (click the pin if you want to lock it in place), move the mouse to the bottom middle of the screen and they should unhide themselves. Then click “view” “switch to full screen”.

2) Move the settings bar to the top of the screen so you don’t keep accidentally bringing it up when you go to the bottom of the screen. Goto the setting bar again, click “machine” “settings” in the general section click “advanced” and “show at top of screen” (note this will only start working when you shutdown the VM, then close then reopen virtual-box)

Functionality

1) Copy and paste, options bar, “device” -“shared clip-board” – “biodirectional”, do the same with “drag and drop” also in the “device” section.

2) shared folders with host (via http://ubuntuforums.org/showthread.php?t=2083709). Open a terminal and type (replace dwheeler with your username)

sudo adduser dwheeler vboxsf

This will only work after you reboot the guest OS!

The shared folder will be in “/media/XXXX”, if you type “/media/” into the file-manager it should show up. Once you are their click “bookmarks” and add the folder for quick reference.

Last update Feb 2015

VIM shortcuts

Posted on November 19, 2015 by Dave Wheeler

Basic navigation


h j k l - LEFT DOWN UP RIGHT
2h - 2 steps LEFT
H - home (top screen)
10j - 10 steps DOWN
J - join this line to the line below
{ - UP a section 
} - DOWN a section
o O - open line BELOW open line ABOVE
b - beginning of current word
w - beginning of next word
e - end of current word
$ - end of line 
^ - start of line
H - TOP of screen
L - BOTTOM of screen
v - select text
V - select line
v3j - select 3 lines DOWN

Modifying text


i - insert
a - append
dh - DELETE left
dl - DELETE right
dd - DELETE line
dw - delete word
D - DELETE to end of line
x - DEL
X - BACKSPACE - INDENT line LEFT (use visual for block) 
>> - INDENT line RIGHT  (use visual for block)
p - paste after
P - paste before
3p - paste 3 copies
yy - yank whole line
yw - yank word
y$ - yank to end of line
4yy - yank four lines
Y - yank whole line
J - join current line to next

Save Exit and help


K - HELP (bash, req select word)
ZZ- save and quit
ZQ - quite without saving
z - save

SPECIAL


[ - enter command mode
, - leader key
v - visual
: - enter command mode
i - insert mode
a - append mode
:tabnew - open new tab
,n - left tab
,m - right tab
/ - search # press enter, then n to repeat, N repeat up 
:s/old/new/g - globally change

Additional/Plugins


F2 - paste code blocks from external (press again to end)
n - remove highlight last search
,e - quit
,E - qa! quit all windows
ctl j - move to lower window
ctl k - move to upper window
ctl h - move to left window etc
,s - sort selected
f - fold code block (again to undo)
F - fold all (again to undo)
ctl space - auto complete
shift p - filesearch

Git


:Gstatus - status / stage by "-"
:Gcommit - message 'wq' save
:Gmerge

Cool combinations


cH, dH - copy/delete to top of screen
d3+ - delete line + 2 lines below
y3+ - copy lien + 2 lines below
d/test - delete all words 'test'

Insert breakpoint and ipdb PDB bit…

Setting up VIM as a python IDE

Posted on November 18, 2015 by Dave Wheeler

If you are looking for a lightweight but feature rich editor you can’t go past VIM! Sure those keyboard shortcuts are difficult remember and it’s frustratingly hard trying to stop reaching for that bloody mouse (the mouse will actually work in this set-up BTW). However, what I have found is the best way to learn VIM (I don’t think you ever really learn vim) is to make it useful enough that you want to start using it; this helps build up the muscle memory required to become efficient [enough]. The ultimate payback is that with practise you can become really really efficient relative to a user stuck with a mouse and complex GUI. The mouse is so 1980’s anyway 😛

vim_terminal_example1

So this post is about making VIM useful! The steps are based around Ubuntu/Debian Linux but should be adaptable for OSX and other NIX distros. Windows users, like yourselves I have no idea (-;?

Of course nothing I ever do is original and this especially goes for this post which borrows heavily from an excellent (but old) PyCon Asia talk by Martin Brochhaus. The .vimrc and much of the instructions are taken from the talk so feel free to watch it on youtube. I’ve also tried to update some of the info and stick to the minimum to get you coding away ASAP.

Section 1: Basic VIM install

These steps will clone the current version of VIM from source and enable additional features. If you already have a copy installed (ie vim starts when you type ‘vim’) I would first fun ‘apt-get remove vim’ (or your equivalent). First dependencies and some tools.

sudo apt-get update
sudo apt-get install mercurial curl python-pip flake8
sudo apt-get build-dep vim

#install python packages
sudo pip install ipdb
sudo pip install jedi
sudo pip install flake8

As we are going to use a bin folder in our home directory we need to put this location into our path so that our new version of vim will run from the command line.

#add a home bin dir to your path
nano ~/.bashrc

#add the following to the end of the file
if [ -d $HOME/bin ]; then
PATH=$PATH:$HOME/bin
fi

#save and exit

Now we make the ~/bin dir as well as an ~/opt for the vim install. Eventually we will use a simlink from ~/bin (now in our path) to the ~/opt folders in our home directory, the -p flag says ‘don’t complain if this dir is already there’. The final command reloads your bashrc without you having logout. Cool.

mkdir -p ~/bin
mkdir -p ~/opt
source ~/.bashrc

Now to install VIM.

cd ~/Desktop
hg clone https://vim.googlecode.com/hg/ vim
cd vim/src
./configure --enable-pythoninterp --with-features=huge --prefix=$HOME/opt/vim
make
make install
cd ~/bin
ln -s ~/opt/vim/bin/vim

#the following should return /home/YOUR_USERNAME/bin/vim
which vim

Great!

Section 2: The vimrc settings file

Now we need a “.vimrc file”. This file will contain our customisations (the dot says I’m a hidden file). You can get mine from here (download to your desktop or copy contents into a blank txt file). It is worth look at this file in a text editor as it contains some information on the key short-cuts and the plugins we are going to use (BTW in .vimrc files a single quote ” indicates a comment (aka #) a double quote “” indicates a bit of code that can be uncommented to enable something). The next bit of code simply copies the file to your home dir and the gets a custom colour scheme (mostly just to show how to do it).

mv ~/Desktop/vimrc.txt ~/.vimrc
mkdir -p ~/.vim/colors
cd ~/.vim/colors
wget -O wombat256mod.vim http://www.vim.org/scripts/download_script.php?src_id=13400

Section 3: Getting set-up for the plugins with pathogen

This next part is important. To manage plugins we will use a bit of kit called pathogen. The plugins can then be installed (mostly by a git clone) right into the “~/.vim/bundle/” (this will result in a folder structure like this: .vim/bundle/plugin-name) and pathogen will handle everything for us – awesome! Most plugin developers set-up their folders to work nicely with pathogen to make life easy.

mkdir -p ~/.vim/autoload ~/.vim/bundle
curl -so ~/.vim/autoload/pathogen.vim https://raw.githubusercontent.com/tpope/vim-pathogen/master/autoload/pathogen.vim

Section 4: Installing plugins

The first plugin we will use is called powerline and it adds features and makes a better looking status line.

#install powerline plugin
cd ~/.vim/bundle
git clone git://github.com/Lokaltog/vim-powerline.git

The next plugin allows code folding to make it easier to look through long blocks of code. Its simple to use, just type f to collapse a section of code of F to collapse it all. Type again to reverse the folding – sweet as.

# install folding flugin us f to fold block of F to fold all
mkdir -p ~/.vim/ftplugin
wget -O ~/.vim/ftplugin/python_editing.vim http://www.vim.org/scripts/download_script.php?src_id=5492

The next one installs ctrlp which allows nice fuzzy file searches from directly inside vim.

#install ctrlp
cd ~/.vim/bundle
git clone https://github.com/kien/ctrlp.vim.git

Install the jedi plugin which allows awesome auto-completion of commands and imports using [ctl][space].

#install jedi plugin
git clone --recursive https://github.com/davidhalter/jedi-vim.git

Install syntastic is another awesome plugin that does syntax checking and will check your code for compliance with PEP8.

#install syntastic
git clone https://github.com/scrooloose/syntastic.git

Forget tabbing into the terminal to stage/commit/merge etc just install git support with fugitive.

# install git support
cd ~/.vim/bundle
git clone git://github.com/tpope/vim-fugitive.git
vim -u NONE -c "helptags vim-fugitive/doc" -c q

So that is it! Your editor should look something like this:

vim_terminal_example2

Well that’s the install. Next post will be the shortcuts I use all the time and some that I want to learn as a bookmark for myself. Otherwise checkout the links to the plugins for detailed descriptions of what they do. But now you should be up and running with a pretty nice looking IDE!

Install Arch Linux on virtualbox – the nuts and bolts (pt2)

Posted on July 27, 2014 by Dave Wheeler

**This continues on from the previous post that can be found here**

Now that we have our base system installed, its time to add some tools that will give us a nice GUI desktop. But first off we will setup sudo so we can stop being root. First we’ll create a user. This first command creates a home directory called “dwheeler” using the -m flag, adds this user to the administrator group (wheel) with the -G flag and links us to bash. Enter a password after the passwd command. Obviously substitute for your username and password.

useradd -m -G wheel -s /bin/bash dwheeler
passwd dwheeler

Next we setup sudo. Sudo has a special editor to change it called “visudo”, we should always use this modify the config file. After typing the visudo command scroll down to the line that contains “root ALL=(ALL) ALL”, and underneath that add your username and the “ALL=(ALL) ALL” part. Note that visudo uses VI, which can be a little tricky to use for the uninitiated. Once you are at the line you want to insert your username, type “i” to insert, once you have finished type [esc] and then colon “:” and [wq] to save and exit (ie “:wq”).

root  ALL=(ALL) ALL
dwheeler  ALL=(ALL) ALL

Type “reboot” and now login as yourself. Now to install some graphics tools.

sudo pacman -S xorg-server xorg-xinit xorg-server-utils mesa
sudo pacman -S xorg-twm xorg-xclock xterm

Because we are using virtualbox, we need to install some helper tools that will allow the graphics to work properly. I got some good ideas from this post (http://wideaperture.net/blog/?p=3851), it might be worth checking it out for a second way of doing this.

sudo pacman -S virtualbox-guest-utils
sudo nano /etc/modules-load.d/virtualbox.conf

After nano opens, add these to these lines to the virtualbox.conf file:

vboxguest
vboxsf
vboxvideo

Then get this too load.

sudo systemctl enable vboxservice.service

Reboot using “sudo reboot”, once you are back into the environment type “startx”, and some very basic windows should open confirming that x is working! Type exit in these windows to return to the terminal.

Finally, we will install the desktop environment. This consists of two parts, first the display manager that will log us in and kick off the desktop, and then the desktop environment itself. You can choose from a bunch of different environments, from fancy feature rich to bare-bones, see the arch desktop page for the options. I’m all for saving resources so I’m installing the lightweight LXDE. Installing the lxde group using pacman also installs the display manager (called lxdm).

pacman -S lxde

Accept the defaults. Now we need to get the display manager to load automatically at boot, and set the default desktop (note you can install multiple desktops environments lxdm will give you options to boot into them instead of the default if you should wish)

sudo systemctl enable lxdm
nano /etc/lxdm/lxdm.conf

In nano uncomment the desktop environment you want to be the default, in my case it was this line ”
session=/usr/bin/startlxde”

Reboot, login, and bobs your uncle!

Note: If full screen doesn’t work, try typing “sudo depmod -a” in a terminal and then reboot again

Last job is just to change the permissions on the shared folder so that we can access the host (replace username with your username).

sudo usermod -a -G vboxsf username

New Zealanders and their sheep – part 2

Posted on July 25, 2014 by Dave Wheeler

Ok, based on the graphs in the last post NZ is slowly being cow-a-fyed, so whats driving this trend. Well google will tell you that its …

WARNING: This data is dodgy, but I’m really just using it to demonstrate how cool pandas is. So I found some information on milk and lamb meet prices, we’ll load them up as dataframes and work out the percent change since 1994 like we did before. We’ll try out the datetime functionality of pandas, which is really quite nice. But first just to import our table from the last post and make the year the index so we can easily merge the new data.

import pandas as pd
per_decline = pd.DataFrame(pd.read_csv('percent_decline.csv'))
cols = per_decline.columns.values
cols[0] = 'Year'
per_decline.columns = cols
per_decline.index = per_decline['Year']
per_decline = per_decline.ix[:,1:] #all rows, skip first column (date is now the index)
per_decline.head()

	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses
Year						
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488
2004	-11.895128	 34.207998	-20.609926	 42.707754	 -8.072078	 13.376472
2005	-12.366101	 32.506699	-19.379727	 38.499840	-19.230733	-100.000000

Now we are going to create a table from the dodgy lamb price data, this table is in a slightly different format so we will have to use the groupby method to wrangle it into the shape we need.

lamb_data = pd.DataFrame(pd.read_excel('lamb_usd.xlsx',sheetname='lamb'))
lamb_data.head()

	Month	Price	Change
0	 Apr 1994	 130.00	 -
1	 May 1994	 126.59	 -2.62 %
2	 Jun 1994	 127.03	 0.35 %
3	 Jul 1994	 126.11	 -0.72 %
4	 Aug 1994	 119.62	 -5.15 %

Now to use datetime to make an index based on the month data.

lamb_data.index = pd.to_datetime(lamb_data['Month'])
lamb_data=lamb_data.ix[:,1:2] #just grab the price
lamb_data.head()

	Price
Month	
1994-04-02	 130.00
1994-05-02	 126.59
1994-06-02	 127.03
1994-07-02	 126.11
1994-08-02	 119.62

Pandas did a good job of converting the date format into a datetime index. As you’ll see in a second this datetime object has some extra functionality that makes dealing with dates a breeze. Although this new data has the date and price information we need, its divided into quarterly amounts. As you can see by the commented out code, initially I made a mistake and summed these values, but really we want the mean to get the average yearly price. I left the mistake code there as it shows how easy it would have been to get the sum using groupby.

#wrong! lamb_prices = lamb_data.groupby(lamb_data.index.year)['Price'].sum()
lamb_prices = lamb_data.groupby(lamb_data.index.year)['Price'].mean()
lamb_prices = pd.DataFrame(lamb_prices[:-1]) #get rid of 2014
lamb_prices.head()

	Price
1994	 124.010000
1995	 113.242500
1996	 145.461667
1997	 150.282500
1998	 116.013333

We pass the year index to groupby and get it to do its magic on the price column (our only column in this case, but you get the idea), we then just call the mean method to return the mean price per year. The datetime object made specifying the year easy. Now we are going to write a quick function to calculate the percent change since 1994.

def percent(start,data):
    '''calculate percent change relative to first column (1994), better than previous attempt )-:'''
    ans = 100*((start - data)/start)
    return 0-ans

lamb_change = percent(lamb_prices.Price[1994],lamb_prices)
lamb_change.head()

	Price
1994	 0.000000
1995	 -8.682768
1996	 17.298336
1997	 21.185791
1998	 -6.448405

Great! Now just add that column to our original dataframe. Notice how only the intersect of the dates are used, very handy (ie it drops 1995-2001 from the lamb price data as these dates are not in our stock number table)!

per_decline['Lambprice'] = lamb_change
per_decline.head()

	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses	Lambprice
Year							
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093	 17.768056
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488	 28.869984

per_decline.index=per_decline.index.astype(int) #lamb2
per_decline.plot(kind='barh')
plt.title('Percent change in stock in NZ since 1994')
plt.xlabel('Percent change since 1994')
plt.ylabel('Year')

The next series of code and graphs adds in milk and lamb prices to try and see why farmers are moving from ovines to bovines!

milk_data = pd.DataFrame(pd.read_excel('milk_prices_usd.xlsx',sheetname='milk'))
milk_data.index=milk_data['year']
milk_data.head()

	year	thousand head	pounds	mill lbs	price_cwt
year					
1989	 1989	 10046	 14323	 143893	 13.56
1990	 1990	 9993	 14782	 147721	 13.68
1991	 1991	 9826	 15031	 147697	 12.24
1992	 1992	 9688	 15570	 150847	 13.09
1993	 1993	 9581	 15722	 150636	 12.80

#get rid of the info we don't need
milk_data = pd.DataFrame(milk_data.ix[5:,:])
milk_change = percent(milk_data.price_cwt[1994],milk_data)
per_decline['milk_price'] = milk_change
per_decline.head()

	Total beef cattle	Total dairy cattle	Total sheep	Total deer	Total pigs	Total horses	Lambprice	milk_price
Year								
1994	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000	 0.000000
2002	-11.025827	 34.444950	-20.002034	 33.858009	-19.100637	 11.811093	 17.768056	 -6.630686
2003	 -8.344764	 32.882482	-20.041908	 37.229441	-10.766476	 18.504488	 28.869984	 -3.469545
2004	-11.895128	 34.207998	-20.609926	 42.707754	 -8.072078	 13.376472	 33.670672	 23.747109
2005	-12.366101	 32.506699	-19.379727	 38.499840	-19.230733	-100.000000	 29.762385	 16.653816

per_decline.plot(kind='barh')

These graphs are a little busy, lets just concentrate on the important stuff.

<pre>animals=['Total dairy cattle','Total sheep','Lambprice','milk_price']
interesting_data=per_decline[animals]
interesting_data.plot(kind='barh')

interesting_data.plot()

Finally!

Oft repeated commands and work flows [work in progress]

Posted on January 10, 2014 by Dave Wheeler

This is just a dumping ground for commands I perform all the time so that I have somewhere to look them up!

Convert tophat bam file into a sorted sam file for htseq-count

#if used tophat should be already sorted samtools view -h tophat_sorted.bam > tophat_sorted.sam

#if need to sort before conver (output tophat_sorted.bam)
samtools sort tophat.bam tophat_sorted

Bash

wc -l seq_file.fas #count lines
grep -c '>' seq_file.fas #count seqs in fasta file
ls -l #permissions and file size
ls -lh #above with human format for sizes
chmod a+wx #add read write permissions to all
chmod 755 #read and execute for user/world
chmod -R 755 #change all files in folder
tail -n 400 #last 400 lines
tail -n +400 #from 400 to end
history | grep "command" #search history for commands
#sed prints to screen, redirect or us -i for inplace
#print a specific line (100) using sed -n just shows that line
sed -n -e '100d' test.txt
#delete a specfic line (line 1000) from a file and backup original
sed -i.bak -e '1000d' file.txt
#delete lines
sed -i.bak -e '1d,10d
#removes line 10 to 20 INCLUSIVE ie 11 lines
sed -i -e '10,20d' test.txt
#remove line 10, lines 15-20, line 100
sed -i -e '10d;15,20d;100d' test.txt
#convert fastq to fasta file!(effecient)[p is print]
sed -n '1~4 s/^@/>/p;2~4p' seq.fq > seq.fas
#get the md5 checksum of a folder
find FOLDER/ -type f -exec md5sum {} \; | sort -k 34 | md5sum > md5.txt

Htseq-count -s flag turn off stranded union is default

htseq-count -s no -m union tophat_sorted.sam > sample_counts.txt

Vim

v #visual
p #paste
# find word at hash
yy #yank word or line
d #delete line
wq #save and close
ma #mark a, use 'a' to goto this section
mz # mark z as above
{} #select paragraph
{d} #cut block</pre>

Python specific

pip install --upgrade packagename
#virtual env
virtualenv venv
source venv/bin/activate
deactivate venv

print "%f"%(np.mean(seq_length))
#prints 1019.662143
print "%.2f"%(np.mean(seq_length))
#prints 1019.66
print "%.1f"%(np.mean(seq_length))
#prints 1019.7 note rounding up
print "%d"%(np.mean(seq_length))
#prints 1019

#use with open('') as for automatic file closing
with open('infile.txt','r') as f:
    f.read()

Git

WordPress (see text for tags)

use the following tags
in sqaure_brackets code language="css" and end with code in sqr bracktets

import sys

test = sys.argv[1]

for name in test:
    print name
plot(xmir205,ycbx1,'go',fig_fn205(xmir205'),'-k')
plot(xmir205,ycbx1,'go',fig_fn205(xmir205),'-k')

browseVignettes()

Matplotlib
Change the font family.

font = {'family' : 'normal',
        'weight' : 'bold',
        'size'   : 22}

matplotlib.rc('font', **font)