Vect   DNA to Protein Tutorial

Part 2: Extracting Coordinate Sequences (1)

We need more data from the GenBank report. This time, we need the coordinates of the coding sequences. The coordinates point to locations in the long BAC DNA sequence that we extracted in the previous step. They are given in the format “join(2423, 2941)” or “complement(2341, 6577)”. Complement means the coding sequence lies on the complementary strand, so it has to be reversed.
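To make the coordinate format concrete, here is a minimal Python sketch that splits such a string into its operator and its numbers. It is not part of Vect or of the Perl program Vect generates; the function name `parse_coordinates` is hypothetical, and the sketch assumes the coordinates look exactly like the two examples above.

```python
import re

def parse_coordinates(text):
    """Split a coordinate string into (operator, [numbers]).

    Hypothetical helper for illustration only; it assumes the
    "operator(number, number, ...)" shape shown in the tutorial.
    """
    match = re.match(r"\s*(\w+)\((.*)\)\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError(f"not a coordinate expression: {text!r}")
    operator = match.group(1)
    # Collect every run of digits between the outer parentheses.
    numbers = [int(n) for n in re.findall(r"\d+", match.group(2))]
    return operator, numbers

print(parse_coordinates("join(2423, 2941)"))
print(parse_coordinates("complement(2341, 6577)"))
```

The operator tells a later step whether the extracted DNA can be used as-is ("join") or has to be reversed first ("complement").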

Let's get the coordinates from the GenBank report. Coordinates appear throughout the report, but they always come before the /gene tag.

Red arrows point to coordinates; green arrows point to the /gene tag that always follows a coordinate sequence.

However, we only want the coordinates that are under the CDS tag. So, right-click on CDS and select New Block Open Condition. CDS will be highlighted green.

We can't use the closing parenthesis as a Block Close Condition because a line of coordinates can contain several closing parentheses. The length of a coordinate sequence also varies. So the logical choice is to use the /gene tag as the Block Close Condition.
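A short Python sketch of why the first closing parenthesis is an unreliable end marker. The nested coordinate string below is a hypothetical example made up for illustration; it is not taken from the report used in this tutorial.

```python
# Hypothetical coordinate line with more than one closing parenthesis.
line = "complement(join(2423, 2941))"

# Cut the line at the first ")" as a naive close condition would.
cut_at_first_paren = line[: line.index(")") + 1]

# The cut text has more "(" than ")", so part of the coordinates was lost.
balanced = cut_at_first_paren.count("(") == cut_at_first_paren.count(")")

print(cut_at_first_paren)
print("balanced:", balanced)
```

Stopping at a tag that always follows the coordinates, such as /gene, avoids this problem regardless of how many parentheses appear.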

Right-click on the /gene tag and select New Block Close Condition.
/gene should now be highlighted red. But we're not done! The coordinate sequences differ in length and end in different places, so if you try to select them now, you'll also select things you don't want, such as the entire /gene="at2g18300" tag.

So, right-click on the green highlighted CDS tag and select "Selection Exclusive". Then do the same for the /gene tag: right-click on the red highlighted /gene tag and select "Selection Exclusive".

Now, left-click on the rows of coordinate sequences and you should find that they are beautifully selected in grey :-) (Two clicks should be enough: the first on the first row and the second on the second row. This happens somewhere near line 120.) You need to use /gene as an end concatenation marker, so also click and drag-select "/gene" as the end marker for each coordinate set.

Click on Move to copy the data to the Convert Data panel. Name the rule something meaningful, such as “Raw Coordinates”.
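The rule built in the steps above can be sketched in Python as follows. This is only an illustration of the idea (open a block at CDS, close it at /gene, exclude both tags, and concatenate the rows in between); it is not the Perl program that Vect generates. The function name `extract_blocks` and the sample report lines are hypothetical.

```python
def extract_blocks(lines, open_tag="CDS", close_tag="/gene"):
    """Collect the text between open_tag and close_tag, excluding both tags."""
    blocks = []
    current = None  # None means we are outside a block
    for line in lines:
        stripped = line.strip()
        if current is None:
            if stripped.startswith(open_tag):
                # "Selection Exclusive": drop the CDS tag, keep the rest of the row.
                current = [stripped[len(open_tag):].strip()]
        elif stripped.startswith(close_tag):
            # /gene ends the block and is itself excluded.
            blocks.append("".join(current))
            current = None
        else:
            # A continuation row of the same coordinate set.
            current.append(stripped)
    return blocks

# Hypothetical report fragment, made up for illustration.
sample = """\
     gene            complement(2341, 6577)
                     /gene="at2g18300"
     CDS             join(2423, 2941,
                     3010, 3250)
                     /gene="at2g18300"
     CDS             complement(2341, 6577)
                     /gene="at2g18300"
""".splitlines()

for block in extract_blocks(sample):
    print(block)
```

Coordinates under the plain gene feature are skipped because no block is open when they appear, which mirrors the choice of CDS as the only Block Open Condition.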

Last modified June 13, 2008. All rights reserved.
