[Edinburgh-pm] elastic search

Aaron Crane perl at aaroncrane.co.uk
Sat Mar 5 11:35:33 PST 2011

Miles Gould <miles at assyrian.org.uk> wrote:
>> Isn't that NP-hard, even with the restriction that the variables must
>> be binary?
> I'm not sure. It's certainly NP-hard without that restriction.

*surreptitiously checks Wikipedia*

Yes, it's still NP-hard.

http://en.wikipedia.org/wiki/Linear_programming#Integer_unknowns says:

"Binary integer programming (BIP) is the special case of integer
programming where variables are required to be 0 or 1 (rather than
arbitrary integers). This problem is also classified as NP-hard, and
in fact the decision version was one of Karp's 21 NP-complete

>> How many variables are you dealing with?
> Hmmmmmmm. That depends on the question the user wants answered,
> unfortunately. Average case is probably about 20-30, worst case is about
> 80.

Ouch.  20 sounds fine for exhaustive search, 30 probably not, and 80
is well into "give up and choose a different hobby" territory.

I was hoping it would be some nice easy problem in register
allocation, or something — you won't find many CPUs with more than 16
general-purpose registers. :-)

> It might be worth cacheing the worst-case calculation, because
> that's the one everyone's going to try first to test the site out :-)

Yes, that's a good idea.

Further, the answers are all going to be quite small — 80 bits if you
encode them carefully enough.  Given that, and assuming this is a
public service, it's almost certainly worth caching every answer you
ever emit, just on the off chance someone else wants it in the future.
 Like the same user coming back again the next day.

> a personalised "best rack for you" calculation

You might also consider letting users tell you which cams they already
have; each additional such piece of information halves the size of the
search space.

> We can restrict to cams whose range intersects with the
> range of interest; this would throw out a lot.


> We might be able to throw
> out cams that are overly-heavy or prohibitively expensive in the user's
> locale, though this would be tricky.

If you can get additional constraints of the form "maximum rack weight
is X", that certainly lets you discard cams which are individually
heavier than that.  But I fear that's unlikely to be practical — for
one thing, I bet there aren't many people who'd impose a rack weight
smaller than the weight of many commercially available cams.  Or do
you mean something cleverer than that?

> And we can coalesce ranges to
> minimise the number of coefficients to consider. But that still leaves
> us with an uncomfortably large n for an O(2**n) search.


> Thoughts?

The Wikipedia article says that there are efficient algorithms if the
constraint matrix is, like, totally unimodular, but I didn't bother
following the link to that definition, so I don't know whether this
problem meets that condition.  It also mentions some "advanced
algorithms" for solving integer LP problems; maybe one of those will
work for you.

I have a vague idea for a heuristic, but I've no idea if it will
actually work in practice.  First solve it as a real-valued¹
linear-programming problem, restricting each variable to the 0 < x ≤ 1
range.  Then take all the cams which came out in the answer, and if
there are few enough, do an exhaustive search over them.  That might
not produce a feasible answer, of course. :-(

¹ Yes, yes, I know: floating-point numbers are rationals, not reals.

Aaron Crane ** http://aaroncrane.co.uk/

More information about the Edinburgh-pm mailing list