Why Perl for Science should die!

I think it’s fair to say the most popular programming/scripting language for Bioinformatics is Perl. My language of choice, Python, would certainly come in second. I really, really don’t like Perl, and I think it is a poor language for Science in general. Here’s why!

1. It’s hard to read

This is the biggest issue for me. Well-commented Perl is actually pretty fun to read; it’s a punchy language, so you don’t have to do too much scrolling to work out what is going on. However, the key word in that last sentence is “commented”: poorly commented Perl scripts are a nightmare to read, and frankly, most of the scripts that I have read fall into this latter category. Turnover of people in Science is pretty high, so using a language that makes it really hard for someone else to pick up where the last person left off is a silly idea. The other key thing is that Science should be about transparency, and Perl’s silly syntax just adds another layer of dust that hides mistakes and errors in logic. This latter point is especially important when bioinformaticians are working with biologists who don’t know a programming language: there is essentially no oversight or review, because the code is so goddamn hard to understand unless you know the syntax.

Making MrBayes run on a multicore machine

So what makes the excellent phylogenetic program MrBayes even better? Multicore support!

MrBayes itself is pretty easy to install on a Linux machine just by following the configure file notes; however, I found it a little more tricky to get it to run in multicore mode. Others might find this useful, so I thought I would add it to my blog.

First download and extract a copy of MrBayes and navigate to the /src directory in the terminal.

Install the required libraries
>sudo apt-get install mpich2 libmpich2-dev libmpich2-1.2 libreadline6-dev

Run the following to configure.
>autoconf
>./configure --enable-mpi=yes --with-beagle=no
>make

As BEAGLE is mainly used to offload calculations to graphics processors, we want to turn that off for a normal PC-like system.

Now, in your home directory, make a file called “.mpd.conf” and add this line to it, changing ‘secretword’ to whatever you like: “MPD_SECRETWORD=<secretword>”

Change the permissions so that only you can read and write the file
>chmod 600 .mpd.conf
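
The last two steps can also be done in one go. Here is a minimal sketch, assuming a helper function called write_mpd_conf (my own name, not something that ships with MPICH) that writes the file and then sets it to mode 600, i.e. read and write for the owner and nothing for anyone else:

```shell
# Hypothetical helper (not part of MPICH): write an mpd config file
# that only the owner can read or write.
write_mpd_conf() {
    conf_path="$1"   # e.g. "$HOME/.mpd.conf"
    secret="$2"      # the secret word you chose
    (
        # umask 077 in a subshell: the file is private from the moment
        # it is created, and the umask of your own shell is untouched
        umask 077
        printf 'MPD_SECRETWORD=%s\n' "$secret" > "$conf_path"
    )
    chmod 600 "$conf_path"
}

# Usage (commented out so an existing file is not overwritten by accident):
# write_mpd_conf "$HOME/.mpd.conf" "my-secret-word"
```

Passing the path as an argument keeps the function easy to try out on a scratch file before pointing it at your real home directory.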

Run mpd in the background. It shouldn’t complain, but if it does, do what it asks.
>mpd &

Now run the program on 6 cores (or however many you have available). stdout will be written to GT.txt, and all this will run in the background due to the “&”.
>mpirun -np 6 mb trimmed_nex.txt > GT.txt &
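
If you are not sure how many cores your machine has, the nproc command (part of GNU coreutils) will tell you. A small sketch, assuming the same input and output file names as the command above; the mpirun line is left commented out because it needs mpd and the mb binary in place:

```shell
# Count the available cores; fall back to 1 if nproc is not installed
cores="$(nproc 2>/dev/null || echo 1)"
echo "Detected $cores core(s)"

# The run command from this post with the detected count substituted in:
# mpirun -np "$cores" mb trimmed_nex.txt > GT.txt &
```

Treat the detected number as an upper bound rather than a target. It may be worth checking that your analysis has at least as many chains as the number of processes you ask for, as MrBayes may not start otherwise.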

You can check the progress by opening the output file or just typing:
>tail -f GT.txt

Done!

Sources
http://matthewvavrek.com/2011/03/19/mrbayes-and-multicore-processors/
http://mrbayes.sourceforge.net/wiki/index.php/FAQ#How_do_I_compile_single-_and_multi-processor_versions_on_SGI_machines.3F

Linux – my favourite parts

If you have never used a Linux OS then I highly recommend you give it a go. Why? Well, the system gives you security, power, and most of all freedom! As they say though, with great power comes great responsibility, and you’ll learn this the hard way the first time you permanently delete your entire home directory with one careless keystroke (mine was “rm -rf *”, a really bad idea). Nowadays I loathe logging into Windows (I wish for a day when Google Docs has citation support, perhaps via Google Scholar integration. How cool would that be!).

If you’re interested in trying out a Linux distro then you can’t go past Ubuntu. Many in the Linux world despise this distro, but at the end of the day it is great for new users and, hell, it’s what I use to this day. Ubuntu can feel a little heavy (though nothing like Windoze); if you’re after something a little more scaled back that runs great on that old laptop you have sitting around, try CrunchBang. Be warned: if you decide to go with CrunchBang you are going to have to get used to using the terminal, but I actually think that’s a good thing. For those who don’t know, the terminal is a command line interface, i.e. not a Graphical User Interface (GUI). Power users know hundreds of commands and can do lots of things with a couple of neat keystrokes. I won’t lie, I still predominantly use the GUI to navigate around in Ubuntu, but as I learn more tricks and tips I find myself drawn to that little terminal more frequently. So here are some of those neat tricks.

sudo apt-get install nautilus-open-terminal installs a menu option to open the terminal in the folder you are currently browsing, which saves you the pain of having to cd there from the home directory.

This one’s a beauty: history | grep 'search term'. Stuck trying to remember that great new command you just learned? Type this into your terminal and it searches your command history for the keyword. The results come up as a numbered list of previous commands, and you can then use !1234 (where 1234 is the command’s number) to automatically run that command again.

df -h, ls -l and ls -rt I use all the time. The first tells you about the state of your hard drive space, the second tells you extra information about your files (permissions, modification time, size etc.), and the final one lists files sorted by modification time with the oldest first, so the most recently changed files end up at the bottom of the listing.
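
To see the ordering that ls -rt gives you, here is a small sketch that uses a throwaway directory and three made-up file names. touch -t sets each file’s modification time by hand so the result does not depend on how fast the files were created:

```shell
# Scratch directory with three files whose modification times are set by hand
demo_dir="$(mktemp -d)"
touch -t 202001010000 "$demo_dir/oldest.txt"
touch -t 202101010000 "$demo_dir/middle.txt"
touch -t 202201010000 "$demo_dir/newest.txt"

# -t sorts by modification time (newest first), -r reverses that,
# so the listing runs oldest.txt, middle.txt, newest.txt
ls -rt "$demo_dir"

# Tidy up the scratch directory
rm -rf "$demo_dir"
```

The handy part in day-to-day use is that the file you touched most recently is always the last line, right above your prompt.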

To quickly (and sloppily?) make a file executable without having to remember the permission codes, use chmod +x. BTW chmod means “change mode”, if that helps you remember it. The numbers in chmod refer to the permissions for the Owner, the Group and Other (the world), in that order. It works by 4=Read, 2=Write, 1=Execute, so the command “chmod 755” means the file’s owner can read, write and execute the file (4+2+1)=7, while the group and the world can read and execute the file (4+1)=5, but not change it. Other handy ones are chmod 700, where only the owner can read, write or execute the file, and chmod 600, where only the owner can read and write the file (4+2)=6 and nobody else can touch it. What power!
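
The arithmetic is easy to check for yourself. Below is a minimal sketch, assuming a helper called perm_digit (a name I made up for illustration) that adds up 4, 2 and 1 for whichever of r, w and x appear in its argument:

```shell
# Hypothetical helper: turn a string such as "rwx" or "rx" into its chmod digit
perm_digit() {
    digit=0
    case "$1" in *r*) digit=$((digit + 4)) ;; esac   # read    = 4
    case "$1" in *w*) digit=$((digit + 2)) ;; esac   # write   = 2
    case "$1" in *x*) digit=$((digit + 1)) ;; esac   # execute = 1
    echo "$digit"
}

# owner = rwx, group = rx, other = rx
echo "$(perm_digit rwx)$(perm_digit rx)$(perm_digit rx)"   # prints 755

# owner = rw, nothing for group or other
echo "$(perm_digit rw)$(perm_digit '')$(perm_digit '')"    # prints 600
```

Each digit is worked out separately for owner, group and other, which is why a mode is always three digits long.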

The alias command allows you to create shortcuts that save you typing. For example, I use “res” as an alias for cd /media/5E40E9D940E9B843/Users/Dave/Documents/RESEARCH, which saves some typing. They are easy to set up: just type alias list='command I want to execute'.
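
One thing to be aware of is that an alias typed at the prompt only lasts until you close that terminal. A short sketch, using a made-up path rather than my real one, of defining an alias, checking it, and making it stick:

```shell
# Define an alias for the current session (the path is only an example)
alias res='cd ~/Documents/RESEARCH'

# Print the definition back to check it was stored
alias res

# To keep it across sessions, put the same alias line in ~/.bashrc, e.g.:
# echo "alias res='cd ~/Documents/RESEARCH'" >> ~/.bashrc
```

The last line is commented out so that running the snippet does not edit your ~/.bashrc without you meaning to.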

To get files from and send files to a remote computer securely, use scp (secure copy, which runs over SSH). To copy a single file from the remote machine to your home directory, use scp remote_username@132.123.123.12:Nvit.txt ~ where Nvit.txt is the target file on the remote machine and the destination is home (~).

Finally, it is worth taking the time to learn some bash scripting so that you can automate common tasks. Many bioinformatics programs take hours to run, so automation is the only way to go, and the scripts also end up being great documentation of what you did and how you did it.
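
As a taste of what that can look like, here is a minimal sketch of a wrapper that runs any command and notes in a plain text log what was run, when, and whether it worked. The function name run_logged and the log format are my own inventions for illustration, and the echo command is just a stand-in for a real analysis step:

```shell
# Hypothetical helper: run a command and log start time, end time and exit code
run_logged() {
    log_file="$1"
    shift   # everything after the log file name is the command to run
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] START: $*" >> "$log_file"
    "$@"
    rc=$?
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] END (exit $rc): $*" >> "$log_file"
    return "$rc"
}

# Usage with a harmless stand-in for a long-running program
log="$(mktemp)"
run_logged "$log" echo "pretend this is a long analysis"
cat "$log"
rm -f "$log"
```

Because the wrapper hands back the exit code of the command it ran, a longer script can stop at the first step that fails instead of ploughing on with bad input.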

Next time, why Perl for Bioinformatics must die!

What’s happening. Work (paper writing, job apps, admin (ahh)). Rec (gardening, C/Java/Android programming).