Friday, April 09, 2010

Genius for gene networks

Now that I have cleaned the matrix data (see the previous three posts), I can display it with Genius (Google Code) without having to deal with long gene descriptions:

First steps with Gridworks (3)

Once again I go back to the original file and, since I know that the first column always has the same format, I try value.split(":")[-1]. The result is:

And here we go:


In this case I've created a new column but I could have selected 'edit cell -> transform' to transform the original column directly.

Finally, to create exactly what I wanted:
value.split(':')[-2]+':'+value.split(':')[-1]

or
value.replace('Affymetrix:CompositeSequence:','')
or
value.split(':')[2,4].join(':')
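
For readers without Gridworks at hand, here is a rough Python sketch of the three expressions above. It is not GEL: it only mirrors the logic, and it assumes that the GEL slice [2,4] excludes the end index in the same way as Python's [2:4].

```python
# Python sketch of the three GEL expressions above (not GEL itself).
value = "Affymetrix:CompositeSequence:Rat230_2:1389668_at"

parts = value.split(":")

# 1) take the last two fields and glue them back together
a = parts[-2] + ":" + parts[-1]

# 2) drop the fixed prefix
b = value.replace("Affymetrix:CompositeSequence:", "")

# 3) slice fields 2 and 3 (end index 4 is exclusive), then join
c = ":".join(parts[2:4])

print(a)
print(b)
print(c)
```

All three variables end up holding the same string, which is why any of the three expressions works as long as every cell has exactly four colon-separated fields.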


In summary, with the last line of code, in a few seconds total (load the file and run the command) I can perform the desired transformation. Also, it would be helpful to be able to use the command "split multi-valued cells" with the choice of splitting into rows (as it is already possible now) or into columns (which is currently not possible).

First steps with Gridworks (2)

I am back on the 'Genius matrix' project and I notice that the column name is a bit cryptic, so I give it a new, simpler name.


Even though that was not the goal of my data cleaning, I am still curious about how to split the first column into multiple ones using ':' as the separator.


I select 'add column based on this column' and I get the following screen:

Using the "Gridworks Expression Language" (GEL) I can create a new column where I got rid of the prefix "Affymetrix:CompositeSequence" (as you can see in the above pic):

if(value.startsWith("Affymetrix:CompositeSequence"), value.substring(29), value)

And the result is:


I still haven't used the separator ':' with my rule but the result is really close to what I wanted... and there is always the option to roll back.
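
The same rule can be sketched in Python, to show where the 29 comes from: it is the length of the prefix including its trailing colon. This is only an illustration; strip_prefix and PREFIX are names I made up for the example, and nothing here is part of Gridworks.

```python
# Python sketch of the GEL rule above; names are hypothetical.
PREFIX = "Affymetrix:CompositeSequence:"  # 29 characters, trailing colon included


def strip_prefix(value: str) -> str:
    """Return value without the leading PREFIX; leave other values untouched."""
    if value.startswith(PREFIX):
        return value[len(PREFIX):]
    return value


print(len(PREFIX))
print(strip_prefix("Affymetrix:CompositeSequence:Rat230_2:1389668_at"))
```

Using len(PREFIX) instead of a hard-coded 29 avoids having to count characters by hand.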

Thursday, April 08, 2010

First steps with Gridworks (1)

I have the chance to test the alpha of Freebase Gridworks and, given my appreciation for the work of David Huynh and Stefano Mazzocchi, I am really excited about it. These days I am working on a small program for visualizing gene networks and I need to perform a few simple data-cleaning steps, so I decided to go with Gridworks (video1 and video2). Gridworks is easy to install, and once I run it, it opens a page in my browser.

I then load a matrix in CSV format by simply filling in the data file location and a project name. Once the project is created, I get back the view of the matrix:


I now need to clean the first column that contains names such as:
Affymetrix:CompositeSequence:Rat230_2:1389668_at

and change them into something like: Rat230_2:1389668
But first I want to view a larger number of rows (50 instead of 20) to make sure that the pattern is always the same. I select page size: 50 and this time the operation takes a few seconds to complete.
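
Eyeballing 50 rows is enough for me here, but the same check could be sketched in Python for the whole column. The regular expression and the function name below are my own assumptions about what "the same pattern" means (the fixed prefix followed by two colon-free fields), and the second sample value is made up.

```python
import re

# Assumed pattern: fixed prefix, then two fields that contain no colon.
PATTERN = re.compile(r"^Affymetrix:CompositeSequence:[^:]+:[^:]+$")


def odd_ones(values):
    """Return the values that do not follow the assumed pattern."""
    return [v for v in values if not PATTERN.match(v)]


sample = [
    "Affymetrix:CompositeSequence:Rat230_2:1389668_at",
    "SomethingElse:1234",  # hypothetical value, only to show a mismatch
]
print(odd_ones(sample))
```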
Now I want to find the best way to modify the first column and I start exploring the contextual menu:

Several quick options are available: transform to uppercase, lowercase, title case, collapse white spaces and so on. I also see 'split multi-valued cells', which gets my attention. I select it, specify the colon ':' as the separator, and get back:

This is not exactly what I wanted, as I was expecting to get multiple columns out of the first one instead of multiple rows. Well, no problem: the undo feature is one click away. You can either click at the top of the screen to undo the last command, or browse the command history on the right of the screen and go back the desired number of steps.


And I can go back to the original state.
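
To make the difference between what I got (rows) and what I expected (columns) concrete, here is a small Python sketch. It looks at a single column only and does not reproduce how Gridworks treats the other columns of the matrix; the function names and the second probe id are made up for the example.

```python
# Single-column illustration of the two kinds of split; names are hypothetical.
cells = [
    "Affymetrix:CompositeSequence:Rat230_2:1389668_at",
    "Affymetrix:CompositeSequence:Rat230_2:0000000_at",  # made-up id
]


def split_into_rows(cells, sep=":"):
    """Every piece becomes its own one-cell row (what I got)."""
    return [[piece] for cell in cells for piece in cell.split(sep)]


def split_into_columns(cells, sep=":"):
    """Every cell stays on one row, pieces spread over columns (what I expected)."""
    return [cell.split(sep) for cell in cells]


print(len(split_into_rows(cells)), "rows of 1 cell")
print(len(split_into_columns(cells)), "rows of 4 cells")
```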