staff project download information miscellaneous
Vect   Numerical Data Extraction
  Installing Perl
Mac
Windows
Unix


Download

Reference Manual
Introduction
Overview
Input Panel
Convert Panel
Output Panel
Perl Program Panel


Tutorials
Numerical Data Extraction
Statistical Data Extraction
Patent Calculation
PDB Data Extraction
GenBank Data Extraction
Tabular Data Analysis
Word Mapping
DNA to Protein Extraction

Change Log


FAQ

Cookbook
 
MangoPicky DownloadLucy2 DownloadTrend DownloadGRAMAUBViz DownloadgeneDBN Download

4. Manipulating Data with Perl (5)

4.2 Numerical Data Extraction

Now let’s get a little bit involved and do some more meaningful data extractions. Suppose you are a consumer’s data analyst, and you were asked: “How many states have cigarette sales more than 100, and what are these states?” With Vect, a couple of lines of Perl code can solve this problem.

First, you should select the data in the columns “State” and “Sales”, and make them numbers-only. You may follow the steps in Section 2 and Section 3 to do this. After those steps your Convert Data Panel should be similar to this one:

Next, Click Insert and choose the rule "To write simple Perl code to convert data". Delete all the lines begin with character “#” in the pop-up window, but leave every other line as is (we do need “$count” at this time). The content in you “define simple user rule” should look like this:

-----------------------------------------------------------------------------------------------
my $count = 0;

sub {

my ($input1, $input2) = @_;

return ('Total lines processed: ' . $count) if !defined($input1);

my $output = $count . ': ' . $input1 . ', ' . reverse $input2;

$count++;

return ($output);

}
-----------------------------------------------------------------------------------------------

There are 5 lines between the two braces. Let’s change them to achieve our goal:

Line 1: Let’s give some meaningful names to these two variables. This line could be:

my ($state, $sales) = @_;

Line 2: This line will be executed after we have processed all the data, we need this line to output how many states meet the given criteria ( in our case is “Sales greater than 100”), so let’s give a meaningful comments in the output:

return (' Number of states with sales more than 100: ' . $count) if !defined($state);

Note that the number of states with sales more than 100 is stored in the variable $count, next let’s see how to get this number.

Line 3 and Line 4: First we need to define $output in a single line, the reason of doing this is Line 3 will not always be executed in our case, but to prevent compiling error we always need to define $output. So we add one more line:

my $output;

Next, we should make the execution of line 3 an line 4 conditional, remember we only need to count the states with sales greater than 100:

if ($sales > 100 )
{
$output = $state . ': ' . $sales;

$count++;
}

What does this segment of the code do is: if the sales number is greater than 100, execute two lines of code between braces, which output the corresponding state name and the sales, and increment $count by 1. If the sales number is less than or equal to 100, do nothing. The output will be an empty string, and $count will not be incremented.

Line 5: This simply defines what variable to output. We don’t need to change it.

After these changes, let’s take a look of our code:

-------------------------------------------------------------------------------------------------------

my $count = 0;

sub {

my ($state, $sales) = @_;

return ('Number of states with sales more than 100: ' . $count) if !defined($state);

my $output;

if ($sales > 100 )
{
$output = $state . ': ' . $sales;

$count++;
}

return ($output);

}
-----------------------------------------------------------------------------------------------------

Click OK to close the pop-up window. Set your state data as the first input, and your sales data as the second input. Your fifth rule in the Convert Data Panel should look like this:

We are almost done, the only thing looks not so great is that for the state does not meet our criteria, there is an empty line of the output. To get rid of this, simply use Vect’s filter data rule:

Click “Insert”, and select “to filter data from other rules based on a pattern”.

Simply use our own rule as the input:

All right!! Now we’ve got everything right. To double check, click the small triangle between the rule number and rule name, make it a red one, like you’ve seen in the picture above. You can see the entire output, the last line of the output should be:



Yes, we have 42 states who have tobacco sales more and 100.

Congratulations! You have successfully manipulate some numbers with some simple Perl :-) Next we shall edit the template in Vect so that our output will look pretty!

Last modified June 13, 2008 . All rights reserved.

Contact Webmaster

lab