
Wednesday, March 30, 2016

Statistics: Error bars, T-test & ANOVA

1. Error bars show how the data are distributed, indicate the error or uncertainty in a reported measurement, and give a general idea of how precise a measurement is.


Descriptive error bars (such as the range or the standard deviation, SD) describe the data and show how they are spread.

Inferential error bars (such as the standard error of the mean, SE, or a confidence interval, CI) help judge whether groups differ significantly or whether the differences may simply be due to random fluctuation or chance.
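To make the two kinds concrete, here is a minimal Perl sketch that computes the descriptive statistics (range and SD) and the inferential standard error of the mean for one group. The values in @x are made up for illustration, and the SD uses the usual sample formula with n - 1 in the denominator.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum max min/;

my @x = (4.1, 5.0, 5.6, 4.8, 5.3, 4.6);    # made-up measurements for one group

my $n    = @x;                             # array in scalar context = number of values
my $mean = sum(@x) / $n;

# Descriptive: range and sample standard deviation (n - 1 in the denominator)
my ($lo, $hi) = (min(@x), max(@x));
my $sd = sqrt( sum(map { ($_ - $mean) ** 2 } @x) / ($n - 1) );

# Inferential: standard error of the mean
my $se = $sd / sqrt($n);

printf "mean = %.2f, range = %.1f-%.1f, SD = %.3f, SE = %.3f\n", $mean, $lo, $hi, $sd, $se;
```

The SD bar describes how individual values scatter; the SE bar is always narrower (by a factor of the square root of n) because it describes how precisely the mean is known.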


A larger sample size n is strongly recommended: inferential error bars become narrower as n grows (SE = SD/√n), giving more precise estimates of the true population values.
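A short worked version of that relationship, assuming the sample SD s stays roughly the same as more data are collected:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\frac{\mathrm{SE}_{4n}}{\mathrm{SE}_{n}} = \frac{s/\sqrt{4n}}{s/\sqrt{n}} = \frac{1}{2}, \qquad
95\%\ \mathrm{CI} = \bar{x} \pm t_{0.975,\,n-1}\,\mathrm{SE}
```

So quadrupling n only halves an SE bar. A 95% CI narrows slightly faster, because the critical value t with n - 1 degrees of freedom also shrinks toward 1.96 as n grows.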


Statistical hypothesis tests

These tests check whether the differences between groups are statistically significant.

• t-test (Student's t-test)

compares the means of two groups on some variable of interest

• ANOVA (analysis of variance)

tests whether the means of two or more groups (typically three or more) differ significantly

2. t-test


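Paired two sample for means: both sets of values come from the same samples (e.g., the same subjects measured before and after a treatment)

Two-sample assuming equal variances: two independent samples whose variances are assumed to be equal (Student's t-test)

Two-sample assuming unequal variances: two independent samples whose variances may differ (Welch's t-test)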
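A minimal sketch of the three t statistics in plain Perl, assuming made-up data (@before/@after for the paired case, @g1/@g2 for the independent cases). The helper names mean, var, t_paired, t_pooled and t_welch are hypothetical. The sketch stops at the t statistic; turning t into a p-value needs the t distribution (a table or a statistics package), with n - 1 degrees of freedom for the paired test, n1 + n2 - 2 for the equal-variance test, and an approximate (Welch-Satterthwaite) df for the unequal-variance test.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum/;

# Sample mean and sample variance (n - 1 in the denominator)
sub mean { return sum(@_) / @_ }
sub var {
    my $m = mean(@_);
    return sum(map { ($_ - $m) ** 2 } @_) / (@_ - 1);
}

# Paired: t = mean(d) / (SD(d) / sqrt(n)), d = within-pair differences, df = n - 1
sub t_paired {
    my ($x, $y) = @_;
    my @d = map { $x->[$_] - $y->[$_] } 0 .. $#$x;
    return mean(@d) / sqrt(var(@d) / @d);
}

# Equal variances (Student): pooled variance, df = n1 + n2 - 2
sub t_pooled {
    my ($x, $y) = @_;
    my ($n1, $n2) = (scalar @$x, scalar @$y);
    my $sp2 = (($n1 - 1) * var(@$x) + ($n2 - 1) * var(@$y)) / ($n1 + $n2 - 2);
    return (mean(@$x) - mean(@$y)) / sqrt($sp2 * (1 / $n1 + 1 / $n2));
}

# Unequal variances (Welch): each group keeps its own variance
sub t_welch {
    my ($x, $y) = @_;
    my $se2 = var(@$x) / @$x + var(@$y) / @$y;
    return (mean(@$x) - mean(@$y)) / sqrt($se2);
}

my @before = (10, 12, 9, 11, 13);     # made-up paired measurements
my @after  = (12, 13, 11, 12, 15);
my @g1 = (5, 7, 6, 8);                # made-up independent groups of different sizes
my @g2 = (9, 11, 10, 12, 13, 11);

printf "paired t = %.3f\n", t_paired(\@before, \@after);
printf "pooled t = %.3f\n", t_pooled(\@g1, \@g2);
printf "Welch  t = %.3f\n", t_welch(\@g1, \@g2);
```

With equal group sizes the pooled and Welch t statistics coincide (only their degrees of freedom differ); they differ here because @g1 and @g2 have different sizes, which is where the choice between the two options matters.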


3. ANOVA

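One-way ANOVA compares the variation between group means with the variation within groups: F = MS_between / MS_within, where each mean square is a sum of squares divided by its degrees of freedom (k - 1 between, N - k within, for k groups and N values). A minimal Perl sketch with three made-up groups; one_way_anova is a hypothetical helper name, the groups are assumed independent, and the p-value lookup in the F distribution is left to a table or statistics package.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum/;

# One-way ANOVA on a list of groups (array refs); returns (F, df_between, df_within)
sub one_way_anova {
    my @groups = @_;
    my @all    = map { @$_ } @groups;
    my $grand  = sum(@all) / @all;                        # grand mean of all values
    my ($ss_between, $ss_within) = (0, 0);
    for my $g (@groups) {
        my $m = sum(@$g) / @$g;                           # group mean
        $ss_between += @$g * ($m - $grand) ** 2;          # group size x squared distance to grand mean
        $ss_within  += sum(map { ($_ - $m) ** 2 } @$g);   # spread inside the group
    }
    my $df_between = @groups - 1;                         # k - 1
    my $df_within  = @all - @groups;                      # N - k
    my $f = ($ss_between / $df_between) / ($ss_within / $df_within);
    return ($f, $df_between, $df_within);
}

# Three made-up groups
my ($f, $df_b, $df_w) = one_way_anova([4, 5, 6], [6, 7, 8], [9, 10, 11]);
printf "F(%d, %d) = %.2f\n", $df_b, $df_w, $f;
```

For these groups SS_between = 38 and SS_within = 6, so the script prints F(2, 6) = 19.00.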

Tuesday, March 29, 2016

Perl: Combine rows

1. Combine rows

#!/usr/bin/perl -w
##########################################################################################
#Goal: group rows by the 1st column, then report max, min and mean of each following column
##########################################################################################
use strict;
use Data::Dumper;
use List::Util qw/max min sum/;

my %rec;    # key => [ [values of column 1], [values of column 2], ... ]

while (<DATA>) {
    chomp;
    my ($key, @vals) = split;
    next unless defined $key;                   # skip empty lines
    while (my ($i, $v) = each @vals) {          # each on an array needs Perl 5.12+
        push @{$rec{$key}->[$i]}, $v;
    }
}
print Dumper (\%rec);

while (my ($key, $cols) = each %rec) {          # while & each to go through %hash
    print $key, "\t";
    while (my ($i, $col) = each @$cols) {       # while & each to go through this key's @$cols
        my $max  = max @$col;
        my $min  = min @$col;
        my $mean = sum(@$col) / @$col;          # @$col in numeric context = number of values
        # Replace the raw values with their summary [max, min, mean]
        $cols->[$i] = [$max, $min, $mean];
        print "$max/$min/$mean\t";
    }
    print "\n";
}


__DATA__
a   2   3   1   3   2   2
s   2   2   2   2   2   3
s   1   2   3   1   2   1
b   3   2   2   1   5   2
a   2   3   3   5   2   4
s   6   8   4   9   2   5

Output:

$VAR1 = {
          'a' => [
                   [
                     '2',
                     '2'
                   ],
                   [
                     '3',
                     '3'
                   ],
                   [
                     '1',
                     '3'
                   ],
                   [
                     '3',
                     '5'
                   ],
                   [
                     '2',
                     '2'
                   ],
                   [
                     '2',
                     '4'
                   ]
                 ],
          'b' => [
                   [
                     '3'
                   ],
                   [
                     '2'
                   ],
                   [
                     '2'
                   ],
                   [
                     '1'
                   ],
                   [
                     '5'
                   ],
                   [
                     '2'
                   ]
                 ],
          's' => [
                   [
                     '2',
                     '1',
                     '6'
                   ],
                   [
                     '2',
                     '2',
                     '8'
                   ],
                   [
                     '2',
                     '3',
                     '4'
                   ],
                   [
                     '2',
                     '1',
                     '9'
                   ],
                   [
                     '2',
                     '2',
                     '2'
                   ],
                   [
                     '3',
                     '1',
                     '5'
                   ]
                 ]
        };
a 2/2/2 3/3/3 3/1/2 5/3/4 2/2/2 4/2/3 
b 3/3/3 2/2/2 2/2/2 1/1/1 5/5/5 2/2/2 
s 6/1/3 8/2/4 4/2/3 9/1/4 2/2/2 5/1/3 

Monday, March 28, 2016

Perl: Array of array

1. Array of array

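#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my (@all, @aoa, @all2, @all3, @all34, @aoa2);

while (<DATA>) {
    chomp;
    @all = split;                       # @all = (name, v1, v2, v3, v4, v5)
    push @all34, $all[2], $all[3];      # flat list: both values are simply appended
    push @aoa, [$all[2], $all[3]];      # array of arrays: one anonymous [v2, v3] per row
    push @all2, $all[2];                # v2 of every row
    push @all3, $all[3];                # v3 of every row
}
push @aoa2, (\@all2, \@all3);           # array of arrays built from references to two named arrays

print Dumper (@all34);                  # a plain list, so Dumper prints one $VARn per element
print Dumper (\@aoa);                   # pass a reference to dump the whole structure
print Dumper (\@aoa2);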


__DATA__
a   1   2   3   4   5
b   3   5   6   6   8
c   1   5   5   5   7
d   4   2   6   7   9

Output:

$VAR1 = '2';
$VAR2 = '3';
$VAR3 = '5';
$VAR4 = '6';
$VAR5 = '5';
$VAR6 = '5';
$VAR7 = '2';
$VAR8 = '6';
$VAR1 = [
          [
            '2',
            '3'
          ],
          [
            '5',
            '6'
          ],
          [
            '5',
            '5'
          ],
          [
            '2',
            '6'
          ]
        ];
$VAR1 = [
          [
            '2',
            '5',
            '5',
            '2'
          ],
          [
            '3',
            '6',
            '5',
            '6'
          ]
        ];
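Once built, an array of arrays is read with two subscripts. A small sketch, with @aoa typed in from the output above (the names $nrows, $ncols, $second and @cols are just illustrative), that reads single elements and transposes rows into columns, which reproduces the layout of @aoa2:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# @aoa typed in from the Dumper output above
my @aoa = ([2, 3], [5, 6], [5, 5], [2, 6]);

my $nrows  = @aoa;              # number of rows: 4
my $ncols  = @{ $aoa[0] };      # number of columns in the first row: 2
my $second = $aoa[1][0];        # row 1, column 0 (same as $aoa[1]->[0]): 5

# Transpose rows into columns; the result has the same layout as @aoa2
my @cols;
for my $row (@aoa) {
    push @{ $cols[$_] }, $row->[$_] for 0 .. $#$row;
}

print join(' ', map { "[@$_]" } @aoa), "\n";    # [2 3] [5 6] [5 5] [2 6]
print join(' ', map { "[@$_]" } @cols), "\n";   # [2 5 5 2] [3 6 5 6]
```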

Sunday, March 27, 2016

Perl: List::Util (reposted)

The following functions come from the List::Util module, a core module that ships with Perl, so there is no reason not to use them!

1. Sum of an array: no need to add the elements one by one, just call sum

use List::Util qw/sum/;
my @array = (10, 20, 30, 40);
my $sum = sum @array;       # gives 100

2. Maximum and minimum of an array: no need to compare the elements one by one, just call max and min

use List::Util qw/max min/;
my @array = (10, -1, 6, 25, 8);
my $max = max @array;           # gives 25
my $min = min @array;           # gives -1

3. What about the maximum and minimum in string order? Call maxstr and minstr

use List::Util qw/maxstr minstr/;
my @array = ("Beijing", "Shanghai", "Guangzhou", "Chengdu", "Nanjing");
my $maxstr = maxstr @array;     # gives Shanghai
my $minstr = minstr @array;     # gives Beijing

Reposted from: http://bnuzhutao.cn/archives/788
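Two details worth knowing about these functions, shown in a small sketch: sum of an empty list returns undef rather than 0 (sum0 returns 0, but it only exists in List::Util 1.26 and later, so its availability on older Perls is an assumption), and max and maxstr disagree on numeric strings because maxstr compares as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/sum sum0 max maxstr/;   # sum0 assumes List::Util 1.26 or later

my @empty = ();
my $s  = sum @empty;      # undef: the sum of an empty list is undefined
my $s0 = sum0 @empty;     # 0

my @nums   = (9, 10, 100);
my $max    = max @nums;       # 100 (numeric comparison)
my $maxstr = maxstr @nums;    # "9" (string comparison: "9" gt "100" gt "10")

my $shown = defined $s ? $s : 'undef';
print "sum = $shown, sum0 = $s0, max = $max, maxstr = $maxstr\n";
```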

Saturday, March 26, 2016

Perl: Hash of hash

1. Hash of hash

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $hash_ref = {};                          # empty anonymous hash, used through a reference
while (<DATA>) {
    next unless s/^(.*?):\s*//;             # strip the leading "family:" and capture the family name
    my $who = $1;
    for my $field (split) {                 # the rest of the line is "role=name" pairs
        my ($key, $value) = split /=/, $field;
        $hash_ref->{$who}->{$key} = $value; # the inner hash is created automatically (autovivification)
    }
}
print Dumper ($hash_ref);

# keys returns entries in no particular order, so the printed order may change between runs
for my $family (keys %{$hash_ref}) {
    print "$family: ";
    for my $role (keys %{$hash_ref->{$family}}) {
        print "$role=$hash_ref->{$family}->{$role} ";
    }
    print "\n";
}


__DATA__
flintstones: husband=fred pal=barney wife=wilma pet=dino
xiongxiong: husband=tom pal=jerry wife=pretty pet=garfield

Output:

$VAR1 = {
          'xiongxiong' => {
                            'pet' => 'garfield',
                            'husband' => 'tom',
                            'wife' => 'pretty',
                            'pal' => 'jerry'
                          },
          'flintstones' => {
                             'pet' => 'dino',
                             'wife' => 'wilma',
                             'husband' => 'fred',
                             'pal' => 'barney'
                           }
        };
xiongxiong: pet=garfield husband=tom wife=pretty pal=jerry 
flintstones: pet=dino wife=wilma husband=fred pal=barney

2. Hash of hash (a named %HoH instead of a reference)

#!/usr/bin/perl -w
use strict;

my %HoH;
while (<DATA>) {
    next unless s/^(.*?):\s*//;
    my $who = $1;
    for my $field (split) {
        my ($key, $value) = split /=/, $field;
        $HoH{$who}{$key} = $value;          # arrows between subscripts are optional
    }
}

for my $family (keys %HoH) {
    print "$family: ";
    for my $role (keys %{$HoH{$family}}) {
        print "$role=$HoH{$family}{$role} ";
    }
    print "\n";
}


__DATA__
flintstones: husband=fred pal=barney wife=wilma pet=dino
xiongxiong: husband=tom pal=jerry wife=pretty pet=garfield
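Because keys returns entries in no particular order, the output of both scripts can change from run to run. A minimal sketch, with %HoH built literally from the same data, that sorts both levels for stable output and checks a nested key without autovivifying it (simpsons is a made-up example of a missing family):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# %HoH built literally from the same family data
my %HoH = (
    flintstones => { husband => 'fred', pal => 'barney', wife => 'wilma',  pet => 'dino' },
    xiongxiong  => { husband => 'tom',  pal => 'jerry',  wife => 'pretty', pet => 'garfield' },
);

# Sort both levels so the output is the same on every run
my @lines;
for my $family (sort keys %HoH) {
    my $roles = $HoH{$family};    # reference to the inner hash
    push @lines, "$family: " . join(' ', map { "$_=$roles->{$_}" } sort keys %$roles);
}
print "$_\n" for @lines;

# Test the outer key first: exists $HoH{simpsons}{pet} on its own would
# autovivify an empty $HoH{simpsons} as a side effect
my $has_pet = exists $HoH{simpsons} && exists $HoH{simpsons}{pet};
print $has_pet ? "simpsons have a pet\n" : "no simpsons entry\n";
```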

Friday, March 25, 2016

Perl: Reference

1. Hash reference

#!/usr/bin/perl -w
use strict;

my $hash_ref = {}; # an empty anonymous hash, accessed through a reference

while (<DATA>) {
    chomp;
    my ($key, $value) = split;
    $hash_ref->{$key} += $value; # dereference: $hash_ref->{$key} is the same as ${$hash_ref}{$key}
}

for my $key (sort {$a cmp $b} keys %$hash_ref) { # %$hash_ref is the same as %{$hash_ref}
    print "$key\t$hash_ref->{$key}\n";
}


__DATA__
bb  1
bb  8
aa  2
aa  4
cc  6 

Output:

aa  6
bb  9
cc  6
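The same arrow and block syntax works for array references. A short sketch (all variable names are illustrative) showing array and hash references side by side, modifying data through a reference, and using ref to ask what a reference points to:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (1, 8, 2);
my %h    = (aa => 6);
my $aref = \@nums;          # reference to a named array
my $href = \%h;             # reference to a named hash
my $anon = [4, 6];          # reference to an anonymous array

my $first = $aref->[0];     # same as ${$aref}[0] and $$aref[0]
my $count = @{$aref};       # the whole array; in scalar context, its length
my $value = $href->{aa};    # same as ${$href}{aa}

$aref->[1] = 9;             # changes @nums itself, not a copy
push @$anon, 10;            # @$anon is the same as @{$anon}

print ref($aref), ' ', ref($href), ' ', ref(\$first), "\n";   # ARRAY HASH SCALAR
print "$first $count $value @nums @$anon\n";                   # 1 3 6 1 9 2 4 6 10
```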

Thursday, March 24, 2016

Perl: Array elements

1. Count

#!/usr/bin/perl -w
##############################
#Goal: count array elements
##############################
use strict;

my @arr1 = (1,2,3,3,4,2,3);
my %hash1;
$hash1{$_}++ for @arr1;                 # a missing key starts as undef, and ++ turns it into 1
print "$_\t$hash1{$_}\n" for keys %hash1;

my @arr2 = (23,5,9,109.29,23,23,9);
my %hash2;
# Same idea with map and ?: (equivalent to: $hash2{$_}++ for @arr2;)
map { !$hash2{$_} ? ($hash2{$_} = 1) : $hash2{$_}++ } @arr2;
print $_."\t".$hash2{$_}."\n" for keys %hash2;

Output:

4 1
1 1
3 3
2 2
109.29 1
9 2
23  3
5 1 
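Since keys has no fixed order, the counts above come out in an arbitrary order. A small sketch, reusing the @arr2 values, that sorts by count (highest first) and breaks ties by numeric value:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @arr2 = (23, 5, 9, 109.29, 23, 23, 9);
my %count;
$count{$_}++ for @arr2;

# Most frequent first; equal counts are ordered by numeric value
my @order = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
printf "%s\t%d\n", $_, $count{$_} for @order;
```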

2. Sort

#!/usr/bin/perl
##############################
#Goal: sort array elements
##############################
use strict;

my @arr = qw/int0 int1 int2 int3 int34 int5 int6 int7 int8 int69 int10 int10 int12 int13 int14 int15 int16 int17 int18 int19 int20/;
print '<=>: ', join( ' ', sort { $a <=> $b } @arr ), "\n"; # numeric comparison: every "intN" counts as 0, so the order is unchanged
print 'cmp: ', join( ' ', sort { $a cmp $b } @arr ), $/;   # string comparison: "int10" sorts before "int2"

my @sorted = map { "int$_" } 0..5, 17..20;
print "@sorted", $/; # $/ is the input record separator, "\n" by default

# Schwartzian transform: pair each string with its number, sort on the number, keep the string
my @new = map {$_->[1]} sort {$a->[0] <=> $b->[0]} map {/(\d+)/; [$1, $_]} @arr;
print "@new", $/;

Output:

<=>: int0 int1 int2 int3 int34 int5 int6 int7 int8 int69 int10 int10 int12 int13 int14 int15 int16 int17 int18 int19 int20
cmp: int0 int1 int10 int10 int12 int13 int14 int15 int16 int17 int18 int19 int2 int20 int3 int34 int5 int6 int69 int7 int8
int0 int1 int2 int3 int4 int5 int17 int18 int19 int20
int0 int1 int2 int3 int5 int6 int7 int8 int10 int10 int12 int13 int14 int15 int16 int17 int18 int19 int20 int34 int69
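The last statement of the script is a Schwartzian transform. A sketch on a shorter, made-up list compares it with a direct sort that runs the regex inside the comparator on every comparison; the transform extracts each number only once. Since a failed match does not reset $1, the sketch uses an explicit fallback of 0 for strings without digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @arr = qw/int3 int34 int5 int10 int69 int0/;    # made-up, shorter list

# Direct sort: runs the regex twice per comparison
my @direct = sort { ($a =~ /(\d+)/)[0] <=> ($b =~ /(\d+)/)[0] } @arr;

# Schwartzian transform: extract each number once, sort on it, then unwrap
my @st = map  { $_->[1] }
         sort { $a->[0] <=> $b->[0] }
         map  { [ (/(\d+)/ ? $1 : 0), $_ ] } @arr;

print "@direct\n";
print "@st\n";
```

Both print int0 int3 int5 int10 int34 int69; the transform pays off when the key is expensive to compute and the list is long.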

3. Sort

#!/usr/bin/perl -w
###################################
#Goal: 1. first column increasing
#      2. fifth column decreasing
#      3. second column decreasing
###################################
use strict;

my @lines = <DATA>;
# Build [col1, col5, col2, line], sort on the first three fields, then keep the line
my @newa = map {$_->[-1]} sort {$a->[0] <=> $b->[0] || $b->[1] <=> $a->[1] || $b->[2] <=> $a->[2]} map {[(split)[0,4,1], $_]} @lines;
# the original line is stored last, hence "$_->[-1]"
for (@newa) {
    print "$_";
}


__DATA__
200809 74889999 3.0 6.6 188
200810 74885444 4.0 1.0 200
200810 74885555 0.8 5.5 120
200810 74889888 4.0 1.0 200

Output:

200809 74889999 3.0 6.6 188
200810 74889888 4.0 1.0 200
200810 74885444 4.0 1.0 200
200810 74885555 0.8 5.5 120

4. Sort

#!/usr/bin/perl -w
##############################
#Goal: sort by column 3 (the numeric UID)
##############################
use strict;

my @by_uid = map {$_->[0]} sort {$a->[1] <=> $b->[1]} map {[$_, (split /:/)[2]]} <DATA>;
# the original line is stored first, hence "$_->[0]"
for (@by_uid) {
    print;    # prints $_; each line still ends with "\n" because it was never chomped
}


__DATA__
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync

Output:

root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync