[Charlotte.PM] Help me tune this for speed

drewhead@drewhead.org drewhead at drewhead.org
Tue Jul 26 20:20:08 PDT 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Here's an interesting problem I've hit recently [and totally diffrent from
the recent template discussions :) ]

I have a set of X,Y coordinates.  I'm trying to describe these
coordinates.  One of the measures I'm trying to make is a 'remoteness'
factor.  I'm going to define this factor as a set of statistical metrics
(mean, median, stddev) on the set of edge between 'neighbooring'
points.  A edge is the distance between to points.  A point has 
definable neighboors by having an edge that no shorter edge that also
describes a neighboor intersects.

So my goal is to find the complete list of non intersected edges between
neighboors.  After that the stats become trivial.  I'll compute all the
unique edges first, then loop though them sorted smallest to largest.
Per edge I'll look through all previously found non intersecting edges to
prove/disprove intersection.  Since the smallest edge HAS to to be non
intersecting bu a smaller one [non exist] this should work looking through
the list in sorted order.

The problem is that this algorithm is O(N^3).  That's kind of nasty cause
it scales badly.  The example I'm providing is 301 points (I'll need to
process several thousand)  which baloones up to 27+ million caculations at
the limit.

I'm going to use Math::Geometry::Planar which has a SegmentIntersection
function.  I've tinkere dwith this for about a week, and think I'm loosing
more time in my storage/control loops than the actual math.  The example
data file processes on my XP2500 (linux) in just shy of 25 minutes.  I ran
a 2001 point grid over the weekend that never came back.

So, here are the example code and an datafile with 301 coordinates.
Anyone have any ideas of how to speed this up?  Here's my output:

Found unique Edges = 90300
 1 wallclock secs ( 1.28 usr +  0.06 sys =  1.34 CPU)
Unintersected edges 1778
1487 wallclock secs (1458.49 usr +  1.35 sys = 1459.84 CPU)


- -- 
     Drew Dowling               Drewhead          http://www.drewhead.org
 drewhead at drewhead.org    |                    |    WWW      / \ Alpha Phi Omega
Concord, North Carolina   |                    |  Nimat     /   \  Gamma Lambda
CLEMSON UNIVERSITY ALUMNI |                    | Apatschin /_____\
      TIGER BAND!         |      VGAP4 Hosting at http://vgap.drewhead.org
-----BEGIN PGP SIGNATURE-----
Comment: Public key available at http://www.drewhead.org

iD8DBQFC5vxV8J7U7yHE638RAoJUAKCiV2SPMTo4x0LcWnZ4kd2AFiYDOACdEKqL
IdAbgBgseXlIRWLxNXANq6A=
=lKdU
-----END PGP SIGNATURE-----
-------------- next part --------------
#!/usr/bin/perl

######################################################################
# CPAN modules
######################################################################

use YAML qw(Dump Load);
use Carp;
use Benchmark;
use Math::Geometry::Planar;

use strict;


######################################################################
# A few globals
######################################################################

my $points; # hash ref holding the raw data {'1' => [X,Y], '2' => [X,Y], ...}
my @edges; # Array to keep refrences to all edges [edge, X1, Y1, X2, Y2] ...
my $neighborEdges; # hash ref to hold sorted neighbor edges {'1' => edge1, '2' => edge2, ...}

######################################################################
#
#       MAIN
#
######################################################################

# Read in the X,Y coordinates from supplied data
{
  local $/;
  open(my $fh, "dump_301.dat") or die "Unable to open file: $!\n";
  $points = Load( <$fh> );
  croak($@) if $@;
  close($fh);
}

# Find all the unique edges

eval {
  my $start_distance = Benchmark->new;
  my $found = {};

  # p1 is starting or anchor point of the line segment
  foreach my $p1 (@{$points}) {
    # p1 is end point of the line segment
    foreach my $p2 (@{$points}) {
      # We don't need to caculate if anchor and end are the same
      # or we have already seen these two pairs [reversed] before    
      unless ( ($p1 == $p2) || $found->{$p2}->{$p1} ) {

        # Compute the edge
        my $edge = sqrt(
          ($p1->[0] - $p2->[0])**2 +
          ($p1->[1] - $p2->[1])**2
        );

        # Push a ref of the whole thing on a stack
        push (@edges, [ $edge, $p1->[0], $p1->[1], $p2->[0], $p2->[1] ]);
      
        # Keep track of the edges we've already computed (no need to do so twice)
        $found->{$p2}->{$p1} = 1;

      }
    }
  }

  print "Found unique Edges = ", scalar(@edges), "\n";
  print timestr(timediff(Benchmark->new, $start_distance)), "\n";
};

croak($@) if ($@);

# Compute the neighboors
my $start_intersect = Benchmark->new;

# Need an integer to keep our sort order
my $neighboorCNT = 1;

# Loop trough all found unique edges from smallest to largest
foreach my $aref ( sort {$a->[0] <=> $b->[0] } @edges) {

  my $neighbor = 1; # default true

  # Loop through all previously found edges in order
  for (my $n=1;$n<=$neighboorCNT;$n++) {
    if ($neighborEdges->{$n}) {

      my $points_ref = []; # Data structure to pass to SegmentIntersection
      push (@{$points_ref}, [$aref->[1],$aref->[2]]); # line 1 point A X,Y
      push (@{$points_ref}, [$aref->[3],$aref->[4]]); # line 1 point B X,Y
      push (@{$points_ref}, [$neighborEdges->{$n}->[1],$neighborEdges->{$n}->[2]]); # line 2 point A X,Y
      push (@{$points_ref}, [$neighborEdges->{$n}->[3],$neighborEdges->{$n}->[4]]); # line 2 point B X,Y

      my @p = SegmentIntersection($points_ref);

      # If a intersect is found, set false and quit checking
      if (SegmentIntersection($points_ref)) { 
        $neighbor = 0;
        last;
      }

    }
  }

  if ($neighbor) { 
    $neighborEdges->{$neighboorCNT} = $aref;
    $neighboorCNT++;
  }
  
}

print "Unintersected edges ", scalar(keys %{$neighborEdges}), "\n";

print timestr(timediff(Benchmark->new, $start_intersect)), "\n";

-------------- next part --------------
---
-
  - 3000
  - 3000
-
  - 3173
  - 3667
-
  - 3817
  - 2706
-
  - 2309
  - 3198
-
  - 3613
  - 2917
-
  - 2953
  - 3366
-
  - 3023
  - 2702
-
  - 2558
  - 2708
-
  - 2918
  - 2353
-
  - 2948
  - 3210
-
  - 3662
  - 2469
-
  - 2457
  - 3585
-
  - 3277
  - 2893
-
  - 3000
  - 2818
-
  - 2974
  - 3494
-
  - 2988
  - 3171
-
  - 2528
  - 2910
-
  - 2154
  - 3854
-
  - 3126
  - 3718
-
  - 3087
  - 3790
-
  - 2787
  - 3257
-
  - 1872
  - 3383
-
  - 3143
  - 2742
-
  - 2741
  - 2767
-
  - 3333
  - 3356
-
  - 3115
  - 3414
-
  - 2427
  - 2329
-
  - 3313
  - 2925
-
  - 2531
  - 3534
-
  - 2743
  - 3226
-
  - 2849
  - 3265
-
  - 3945
  - 2518
-
  - 2839
  - 3287
-
  - 2449
  - 2498
-
  - 3125
  - 3792
-
  - 3503
  - 3135
-
  - 3063
  - 2492
-
  - 3424
  - 3125
-
  - 3761
  - 3385
-
  - 2439
  - 2904
-
  - 2818
  - 2839
-
  - 3148
  - 3186
-
  - 2925
  - 2216
-
  - 2965
  - 3256
-
  - 2882
  - 3324
-
  - 3185
  - 2991
-
  - 2723
  - 3811
-
  - 3309
  - 3115
-
  - 3275
  - 2773
-
  - 3059
  - 3718
-
  - 3388
  - 3022
-
  - 3337
  - 3383
-
  - 3588
  - 2823
-
  - 3463
  - 3149
-
  - 3633
  - 3660
-
  - 3086
  - 3541
-
  - 2959
  - 2441
-
  - 3222
  - 2574
-
  - 2962
  - 2793
-
  - 2238
  - 2594
-
  - 3337
  - 3032
-
  - 2938
  - 2713
-
  - 2936
  - 3359
-
  - 2644
  - 2735
-
  - 2589
  - 3791
-
  - 2683
  - 2494
-
  - 3183
  - 2552
-
  - 2974
  - 3120
-
  - 2706
  - 4180
-
  - 2730
  - 2529
-
  - 2730
  - 3205
-
  - 2911
  - 3524
-
  - 3181
  - 2675
-
  - 3048
  - 2286
-
  - 2651
  - 2772
-
  - 2799
  - 2900
-
  - 2912
  - 2464
-
  - 3587
  - 3147
-
  - 3251
  - 3181
-
  - 2799
  - 2156
-
  - 3004
  - 3928
-
  - 3780
  - 2813
-
  - 2745
  - 2736
-
  - 3157
  - 3181
-
  - 2991
  - 2591
-
  - 3557
  - 2763
-
  - 2925
  - 2696
-
  - 3010
  - 3300
-
  - 2686
  - 2532
-
  - 3331
  - 2724
-
  - 3032
  - 3834
-
  - 2686
  - 2880
-
  - 3076
  - 3314
-
  - 2774
  - 2911
-
  - 2857
  - 2416
-
  - 2955
  - 2762
-
  - 2535
  - 2981
-
  - 2598
  - 2767
-
  - 3459
  - 3010
-
  - 3078
  - 3539
-
  - 2756
  - 3453
-
  - 2721
  - 3711
-
  - 2609
  - 2133
-
  - 2565
  - 3090
-
  - 2763
  - 3231
-
  - 2701
  - 2645
-
  - 2791
  - 2963
-
  - 3154
  - 2673
-
  - 2512
  - 2749
-
  - 2348
  - 2689
-
  - 3097
  - 3297
-
  - 2304
  - 2918
-
  - 3065
  - 3281
-
  - 2907
  - 2777
-
  - 3495
  - 2896
-
  - 3451
  - 2845
-
  - 3145
  - 2405
-
  - 3167
  - 3509
-
  - 3238
  - 3624
-
  - 2824
  - 3314
-
  - 3361
  - 2782
-
  - 2717
  - 3126
-
  - 2934
  - 3239
-
  - 2826
  - 3373
-
  - 3217
  - 2854
-
  - 3975
  - 2611
-
  - 2560
  - 2850
-
  - 3138
  - 3555
-
  - 2949
  - 2863
-
  - 3118
  - 3395
-
  - 2538
  - 3119
-
  - 3056
  - 2398
-
  - 3191
  - 2578
-
  - 2590
  - 3183
-
  - 3539
  - 3345
-
  - 2802
  - 2990
-
  - 2707
  - 3257
-
  - 3580
  - 3157
-
  - 3299
  - 3205
-
  - 2712
  - 2922
-
  - 2901
  - 2539
-
  - 2992
  - 3249
-
  - 2499
  - 3111
-
  - 3103
  - 3120
-
  - 3031
  - 2933
-
  - 3298
  - 2955
-
  - 3465
  - 3452
-
  - 3617
  - 3627
-
  - 2645
  - 3446
-
  - 2876
  - 2847
-
  - 2929
  - 3014
-
  - 3481
  - 3329
-
  - 3082
  - 2934
-
  - 2953
  - 2711
-
  - 3239
  - 2108
-
  - 3293
  - 3168
-
  - 2230
  - 2904
-
  - 3657
  - 2852
-
  - 2046
  - 2934
-
  - 3291
  - 3243
-
  - 3748
  - 2576
-
  - 3475
  - 3455
-
  - 2586
  - 2869
-
  - 3683
  - 2615
-
  - 3260
  - 2988
-
  - 3672
  - 2938
-
  - 2178
  - 2317
-
  - 3455
  - 3307
-
  - 3197
  - 2996
-
  - 3215
  - 3746
-
  - 2771
  - 3208
-
  - 2454
  - 2732
-
  - 3344
  - 2609
-
  - 2706
  - 2313
-
  - 2772
  - 2693
-
  - 3114
  - 3374
-
  - 3482
  - 2927
-
  - 2887
  - 2625
-
  - 2474
  - 3002
-
  - 2712
  - 1858
-
  - 2979
  - 3304
-
  - 3251
  - 2732
-
  - 2828
  - 3071
-
  - 3342
  - 3747
-
  - 2192
  - 3265
-
  - 3160
  - 2658
-
  - 2394
  - 3358
-
  - 2667
  - 3390
-
  - 2612
  - 2497
-
  - 2753
  - 3108
-
  - 3521
  - 2839
-
  - 2840
  - 2920
-
  - 3742
  - 3082
-
  - 2710
  - 3233
-
  - 3070
  - 2840
-
  - 3309
  - 2894
-
  - 3033
  - 2684
-
  - 2464
  - 2876
-
  - 2826
  - 3492
-
  - 2069
  - 2379
-
  - 3446
  - 2943
-
  - 2500
  - 3122
-
  - 2748
  - 3024
-
  - 2813
  - 2591
-
  - 2573
  - 2871
-
  - 3325
  - 3017
-
  - 3595
  - 3255
-
  - 3569
  - 2639
-
  - 2714
  - 3475
-
  - 3133
  - 3024
-
  - 2155
  - 3133
-
  - 3320
  - 3303
-
  - 3999
  - 2247
-
  - 3601
  - 3322
-
  - 2619
  - 2782
-
  - 2709
  - 2670
-
  - 3081
  - 2985
-
  - 3623
  - 3498
-
  - 3088
  - 3076
-
  - 2795
  - 3799
-
  - 2344
  - 3187
-
  - 3029
  - 3348
-
  - 2257
  - 2628
-
  - 2910
  - 3563
-
  - 3206
  - 2674
-
  - 3447
  - 3079
-
  - 2966
  - 3759
-
  - 3446
  - 3207
-
  - 3904
  - 2947
-
  - 3320
  - 3086
-
  - 3063
  - 2716
-
  - 2638
  - 2919
-
  - 2380
  - 2913
-
  - 3210
  - 3027
-
  - 3333
  - 2668
-
  - 3200
  - 1955
-
  - 3285
  - 2637
-
  - 2537
  - 2582
-
  - 3678
  - 3411
-
  - 2871
  - 3594
-
  - 2757
  - 3157
-
  - 3384
  - 2871
-
  - 2727
  - 3279
-
  - 4219
  - 2948
-
  - 3207
  - 2274
-
  - 2273
  - 3298
-
  - 2728
  - 2875
-
  - 2995
  - 3091
-
  - 2775
  - 2624
-
  - 2901
  - 3334
-
  - 2548
  - 4252
-
  - 3063
  - 2879
-
  - 2800
  - 2398
-
  - 2681
  - 2964
-
  - 3231
  - 3185
-
  - 3578
  - 3063
-
  - 3255
  - 2697
-
  - 2433
  - 2821
-
  - 2592
  - 2147
-
  - 3300
  - 2580
-
  - 2738
  - 4050
-
  - 2651
  - 3082
-
  - 2948
  - 2703
-
  - 2963
  - 3084
-
  - 3054
  - 3372
-
  - 2778
  - 3497
-
  - 2722
  - 3325
-
  - 3370
  - 3582
-
  - 3403
  - 3704
-
  - 2685
  - 3075
-
  - 2071
  - 3389
-
  - 2909
  - 2435
-
  - 3474
  - 2698
-
  - 2628
  - 2260
-
  - 2868
  - 3273
-
  - 2177
  - 2833
-
  - 2188
  - 2026
-
  - 3125
  - 2615
-
  - 2723
  - 2649
-
  - 2350
  - 3173
-
  - 2166
  - 2800
-
  - 2533
  - 2361
-
  - 2237
  - 3359
-
  - 3533
  - 2768
-
  - 2771
  - 2453
-
  - 2758
  - 3087
-
  - 2614
  - 2869
-
  - 3141
  - 2617
-
  - 2756
  - 2719
-
  - 2947
  - 3278
-
  - 3325
  - 3045
-
  - 2588
  - 2559
-
  - 3679
  - 2870
-
  - 2948
  - 3108
-
  - 2996
  - 3065
-
  - 3195
  - 2784
-
  - 2885
  - 2780
-
  - 2729
  - 3023
-
  - 3024
  - 2611
-
  - 2819
  - 3254
-
  - 2525
  - 2495


More information about the charlotte mailing list