Vect: GenBank Report Data Extraction

Extraction and Conversion of Protein ID

In this second tutorial, we extract the protein IDs from the Arabidopsis file using the New Line Selection and 'Quoted Data' rules. In Vect, make sure you are in the 'Input Data' panel with the AC006439.txt file open. The protein ID appears only under the CDS blocks in the file.

Right-click and drag over 'protein_id=' and select New Line Selection Condition from the pull-down menu. A yellow block will appear. Then select the protein ID by left-clicking once on an ID such as "AAD15516.1", so that a grey highlighted region appears.

Note: If you do not have any grey highlighted regions in the 'Input Data' panel, you have not selected any data, and no data can be moved over to the next panel.
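To make the selection step concrete outside the GUI, here is a minimal Python sketch of what a line selection on 'protein_id=' amounts to. It is not the Perl that Vect generates. It assumes each protein ID sits on a line of the form /protein_id="...", and the sample text, the second ID and the function name `select_protein_id_lines` are made up for illustration.

```python
# Hypothetical sample resembling GenBank CDS blocks; NOT copied from AC006439.txt.
# Only "AAD15516.1" is mentioned in the tutorial; the second ID is a placeholder.
SAMPLE = '''\
     CDS             join(1000..1200,1300..1500)
                     /note="hypothetical protein"
                     /protein_id="AAD15516.1"
     CDS             2000..2600
                     /protein_id="AAD15517.1"
'''


def select_protein_id_lines(text, marker="protein_id="):
    """Return whatever follows the marker on every line that contains it."""
    selected = []
    for line in text.splitlines():
        pos = line.find(marker)
        if pos != -1:
            # Keep the rest of the line, quotes included (the "raw" value).
            selected.append(line[pos + len(marker):].strip())
    return selected


raw_protein_ids = select_protein_id_lines(SAMPLE)
print(raw_protein_ids)
```

The values returned here still carry their surrounding quotes, which is why a separate quoted-data rule is applied afterwards.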

As in the previous gene sequence step, select the 'Move' button from the icon panel and give your rule a descriptive name. To follow the rest of this tutorial, type "raw protein id" and press the Enter key.

In the 'Convert Data' panel, select 'Insert' and then 'To Extracted Quoted Data from other rule', which is on the second line of the pull-down menu. Give your rule a descriptive name (here, 'extracted protein id') and specify which data set to use (the one you just moved over, i.e. the raw protein id). Fill the 'nothing' blocks with quotes (") as shown in the following diagram. To check that you have the right data set, expand the yellow arrow. You should see 25 lines of protein IDs.
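The quoted-data rule can be pictured as pulling out the text between an opening and a closing delimiter. The following Python sketch illustrates that idea under stated assumptions: the function name `extract_quoted` and its parameters are hypothetical, the second ID is a placeholder, and Vect's own rule may treat edge cases (several quoted strings on one line, missing quotes) differently.

```python
import re


def extract_quoted(lines, open_quote='"', close_quote='"'):
    """Return the text between the first delimiter pair on each line.

    Lines without a complete delimiter pair are skipped.
    """
    pattern = re.compile(re.escape(open_quote) + r"(.*?)" + re.escape(close_quote))
    extracted = []
    for line in lines:
        match = pattern.search(line)
        if match:
            extracted.append(match.group(1))
    return extracted


# Raw values as they might arrive from the selection step (second ID is made up).
raw = ['"AAD15516.1"', '"AAD15517.1"']
protein_ids = extract_quoted(raw)
print(len(protein_ids), protein_ids)
```

Printing the count alongside the values mirrors the tutorial's check that the expected number of lines was extracted.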

Select Rule 2 (in this tutorial, the extracted protein id), then select the 'Copy' button from the icon panel to move your data to the 'Output Data' panel. Format the data set and view the changes by selecting the 'Output' icon in the icon panel.

The tag should not be modified, but it can be moved around. To limit the output to a fixed line width, edit the tag by adding ':width' before the closing bracket (>). This keeps the body from flowing past the specified width. Example: <gene sequence:60>.
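As an illustration of what a width of 60 means for the output, the Python sketch below breaks a body into lines of at most 60 characters. It assumes a hard break at exactly the given width, which may not match how Vect wraps text. The helper name `wrap_body` and the sample sequence are placeholders.

```python
def wrap_body(body, width):
    """Split body into consecutive chunks of at most `width` characters."""
    if width <= 0:
        raise ValueError("width must be positive")
    return [body[i:i + width] for i in range(0, len(body), width)]


sequence = "ATGC" * 40  # 160 placeholder characters, not real sequence data
lines = wrap_body(sequence, 60)
for line in lines:
    print(line)
```

Joining the chunks back together gives the original body, so the width only affects layout and not content.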

To show the Perl code, move to the 'Perl Program' panel and select 'Compile'. Your Perl program appears as shown below. To run the generated program, select the 'Run' icon. A new window will appear with the results of your Perl program.

For examples, see the previous mini tutorial on gene sequence extraction and conversion.

Last modified June 13, 2008. All rights reserved.
