question about the nature of DBM ties

nkuipers nkuipers at uvic.ca
Tue Sep 24 15:09:39 CDT 2002


Hi all,

When you tie a data structure to an external file and populate it from a 
large input, is each update written directly to the file, or does the data 
pile up in memory and only get dumped to the file later?  In other words, 
does tying in this manner actually reduce RAM usage?  The bottleneck in my 
code is the unique function, but that function is necessary.  All in all, 
the code works perfectly but takes too long.

#!/usr/bin/perl

use strict;
use warnings;
use DB_File;

my $infile = shift;
my $wordsize = 10;
my %clusters;   # key => value: 'id_string' => 'DNA_string'
my %k_strings;  # key => value, e.g. 'ACGTGGTCAC' => [id_string1, id_string2, ...]

tie(%k_strings, "DB_File", "index.tmp") or die "Can't open index.tmp: $!";

%k_strings = build_index(\%clusters);

untie %k_strings;

sub build_index {
	my $clusters_hashref = shift;
	my %k_hash;
	while ( my ($id, $sequence) = each %$clusters_hashref ) {
		my $tmp = $sequence;
		while ( length($tmp) >= $wordsize ) {
			my $kstring = substr($tmp, 0, $wordsize);
			if ( exists $k_hash{$kstring} ) {
				push @{ $k_hash{$kstring} }, $id
					if unique($k_hash{$kstring}, \$id);
			} else {
				$k_hash{$kstring} = [ $id ];
			}
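			# Drop the first character to slide the window; unlike s/^\w//,
			# this cannot loop forever on a non-word character.
			substr($tmp, 0, 1, '');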
		}
	}
	return %k_hash;
}

sub unique {
	my ($array_ref, $id_ref) = @_;
	# Linear scan: returns 0 if the id is already in the list, 1 otherwise.
	for (@$array_ref) {
		return 0 if $_ eq $$id_ref;
	}
	return 1;
}

__END__
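
For reference, here is a minimal sketch of one way the linear scan in
unique() might be avoided. The reasoning: each() visits every id exactly
once, so a repeated id under a given k-string can only come from the same
sequence, and a per-sequence %seen hash should then be enough. The sub name
build_index_direct, its $wordsize parameter, the demo data, and the
comma-joined value format are all made up for illustration, and the sketch
assumes ids never contain a comma. Values are plain strings rather than
array refs because a DB_File-tied hash stores only scalars. The sub writes
into whatever hash reference it is given, one element at a time, instead of
building a second hash and copying it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical replacement for build_index/unique (names are illustrative).
# $index is a reference to the hash that receives the k-string index.
sub build_index_direct {
	my ($clusters, $index, $wordsize) = @_;
	while ( my ($id, $sequence) = each %$clusters ) {
		my %seen;    # k-strings already recorded for this id
		for my $pos ( 0 .. length($sequence) - $wordsize ) {
			my $kstring = substr($sequence, $pos, $wordsize);
			next if $seen{$kstring}++;    # same k-string, same id: skip
			my $old = $index->{$kstring};
			# Store a comma-joined string, not an array ref.
			$index->{$kstring} = defined $old ? "$old,$id" : $id;
		}
	}
	return;
}

# Tiny demonstration with made-up data and a word size of 3.
my %demo_clusters = ( id1 => 'ACGACG', id2 => 'ACGT' );
my %demo_index;
build_index_direct(\%demo_clusters, \%demo_index, 3);
for my $kstring ( sort keys %demo_index ) {
	my @ids = sort split /,/, $demo_index{$kstring};
	print "$kstring => @ids\n";
}
```

To write into the DBM file, a reference to the tied %k_strings could be
passed as the second argument in place of \%demo_index. Whether this is
actually faster on real data is untested.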




More information about the Victoria-pm mailing list