[Pdx-pm] Musings on operator overloading (was: File-Fu overloading)

Aristotle Pagaltzis pagaltzis at gmx.de
Sun Feb 24 07:00:27 PST 2008


[Cc to perl6-language as I think this is of interest]

[Oh, and please read the entire thing before responding to any
one particular point. There are a number of arguments flowing
from one another here. (I am guilty of being too quick with the
Reply button myself, hence this friendly reminder.)]

* Eric Wilhelm <scratchcomputing at gmail.com> [2008-02-24 02:05]:
> # from Aristotle Pagaltzis
> # on Saturday 23 February 2008 14:48:
> >I find the basic File::Fu interface interesting… but operator
> >overloading always makes me just ever so slightly queasy, and
> >this example is no exception. 
> 
> Is that because of the syntax, the concepts, or the fact that
> perl5 doesn't quite get it right?

It’s a matter of readability. It’s the old argument about, if
not to say against, operator overloading: you’re giving `*` a
completely arbitrary meaning that has nothing in common in any
way with what `*` means in contexts that the reader of the code
had previously encountered.

> Does it help to know that error messages will be plentiful and
> informative?  (Not to mention the aforementioned disambiguation
> between mutations and stringifications.)

It has nothing to do with any of these factors.

> >I get the desire for syntactic sugar, I really do… but looking
> >at this, I think the sane way to accommodate that desire is to
> >attach overloaded semantics to a specially denoted scope
> >rather than hang them off the type of an object.
> 
> I can't picture that without an example.

Something like

    path { $app_base_dir / $conf_dir / $foo_cfg . $cfg_ext }

where the operators in that scope are overloaded irrespective of
the types of the variables (be they plain scalar strings,
instances of a certain class, or whatever).

Note that I’m not proposing this as something for File::Fu to
implement. It would be rather difficult, if at all possible, to
provide such an interface in Perl 5. You need macros or access to
the grammar or something like that in order to implement this at
all. Although I think that even if you have those, you wouldn’t
want to use them directly, but rather as a substrate to implement
scope-attached operator overloading as an abstraction over them.

But I think it’s desirable to use this abstraction instead of
using grammar modifications or macros directly, since it has
vastly less power than the former and still much less power than
the latter. It should therefore be easier both to use for the
programmer who designs the overloading scope and to read for the
maintenance programmer who reads code that uses overload
scopes.

It would particularly help the latter, of course, because the
code’s behaviour does not vary based on the types that happen to
pass through; the source code is explicit and direct about its
meaning.

> I suspect though that having the object carry the semantics
> around with it is still going to be preferred.

There are cases where it would be.

When the object is a mathematical abstraction in some broad
sense, e.g. it’s a complex number class, or it implements some
kind of container such as a set, then being able to overload
operators based on the type of that object would be useful.

But note that in all of these examples, it is very much
self-evident what the meaning of an overloaded `+` would be: that
meaning comes from the problem domain – a problem domain that has
the rare property of having concepts such as operators and
operands.

When you leave the broader domain of mathematical and “para-”
mathematical abstractions behind and start to define things like
division on arbitrary object types that model aspects of domains
which have nothing even resembling such concepts, you’re rapidly
moving into the territory of obfuscation.

A lot of C++ programmers could sing a song about that.

However, I think the way that Java reacted to this (“only the
language designer gets to overload operators!!”) is completely
wrong. I agree fully with the underlying desire you express:

> The essential motivation is that "if I can't make this
> interface work, I'm just going to slap strings together and be
> done with it."  The converse is that if I can make this
> interface work then cross-platform pathname compatibility
> becomes far less tedious.

Absolutely it is very, very useful to be able to define syntactic
sugar that makes it as easy and pleasant to do the right thing
(manipulate pathnames as pathnames) as it is to do the wrong
thing (use string operations to deal with pathnames). That is
precisely why I said that I do get why you’d want to overload
operators.

And this contradiction – that being able to declare sugar is
good, but the way that languages have permitted that so far leads
to insanity – is what sent me thinking along the lines that there
has to be some way to make overloading sane. And we all know that
all is fair if you predeclare. And that led me to the flash of
inspiration: why not make overloading a property of the source
(lexical, early-bound) rather than of the values (temporal, late-
bound)? And what we need to do that is a way to say “this scope
is special in that the operators herein follow rules that differ
from the normal semantics.” There you have it.

Note that even with mathematical abstractions, there are cases
where scope-bound overloading is a win over type-bound
overloading. Consider a hypothetical Math::Symbolic that lets you
do something like this:

    my $x = Math::Symbolic->new();
    print +( $x**2 + 4 * $x + 3 )->derivative( $x );

I hope it’s obvious how such a thing would be implemented. Now,
if you used type-bound overloading, then the following two
expressions cannot yield the same result:

    ( 2 / 3 ) * $x
    2 * $x / 3

But if overloading was scope-bound, they would!
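
To make the difference concrete, here is a minimal sketch of the
type-bound case in Perl 5. Toy::Symbolic and its _binop helper are
made-up names for illustration (this is not the real Math::Symbolic
from CPAN); each object merely records the text of the expression
that built it. The only real API assumed is the core overload
pragma.

```perl
use strict;
use warnings;

# Hypothetical toy class: an object just remembers its expression text.
package Toy::Symbolic;

use overload
    '*'  => sub { _binop( '*', @_ ) },
    '/'  => sub { _binop( '/', @_ ) },
    '""' => sub { $_[0]{expr} };

sub new {
    my ( $class, $expr ) = @_;
    return bless { expr => $expr }, $class;
}

# overload calls handlers with ($self, $other, $swapped); $swapped is
# true when the object was the right-hand operand.
sub _binop {
    my ( $op, $self, $other, $swapped ) = @_;
    my ( $lhs, $rhs ) = $swapped ? ( $other, $self ) : ( $self, $other );
    return Toy::Symbolic->new("($lhs $op $rhs)");
}

package main;

my $x = Toy::Symbolic->new('x');

# 2 / 3 involves no object, so Perl does ordinary numeric division
# first; the overloaded * only ever sees the resulting float.
my $first = ( 2 / 3 ) * $x;

# Here every operator has an object operand, so the structure of the
# expression is preserved.
my $second = 2 * $x / 3;

print "$first\n";
print "$second\n";
```

The first expression has already lost the exact fraction by the
time the object gets involved, while the second keeps the full
tree. A scope-bound overload would see the 2 / 3 as well, since
it would not depend on either operand being an object.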

(I have to credit and thank Kragen Sitaker for this example.
I spent a while last night chatting with him on IRC about this,
and he helped me clear up the idea a little in my head. He also
convinced me that type-bound operator overloading is important
for modeling concepts from the mathematical domain.)

> Consider the "load a group of files to be found in a given
> directory" task:
> 
>   my $dir = File::Fu->dir("foo");
> 
>   ...
> 
>   foreach my $fn (qw(bar baz bat)) {
>     my $file = $dir + $fn;
>     my $fh = $file->open;
>     while(my $line = <$fh>) {
>       ...
>     }
>   }
> 
> If you're uncomfortable with the lexical distance, you could
> put the dir() constructor inside (or next to) the loop, but
> that (IMO) places too much importance on the "specialness" of
> $dir.  I'm looking at it from the point of view that something
> named "$dir" which _isn't_ a File::Fu::Dir is an anomaly.

This is all precisely to my point: the lexical distance *is*
(potentially) too great, but reducing it *does* emphasise the
wrong thing. What’s *more*, you have to use this File::Fu class
and its instance methods to ensure that it all works properly.
I’ve seen Path::Class used extensively in the context of
Catalyst, and one thing that struck me is how often you must
explicitly stringify, because there’s this tension between APIs
that expect filenames as strings or objects of certain kinds,
where you can’t continue just pretending that your Path::Class
objects are mostly like strings – even though Path::Class does
make some effort to allow this.

My counterproposal, were it possible in Perl 5, which it isn’t,
would be something like this:

    my $dir = 'foo'; # no object at all!
    
    # ...
    
    foreach my $fn (qw(bar baz bat)) {
      my $file = path { $dir / $fn };
      open my $fh, '<', $file or die "$!\n";
      while(my $line = <$fh>) {
        # ...
      }
    }

This *really would* work regardless of whether you pass strings
or any kind of pretends-to-be-a-string path objects, it makes the
spot that does something unusual explicit about its unusualness,
and so avoids creating a desire to serve readability by skewing
lexical distance in a way not actually warranted by the intent of
the program.
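
For comparison, the nearest thing that works in Perl 5 today
without any overloading is an ordinary function call at the spot
where the path is built. This is only a rough sketch and not the
scoped-operator syntax described above: path() here is a
hypothetical helper, and the only real API assumed is catfile from
the core File::Spec module.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: joins its arguments as path segments. Each
# argument is stringified first, so plain strings and
# pretends-to-be-a-string path objects are both accepted.
sub path {
    return File::Spec->catfile( map { "$_" } @_ );
}

my $dir = 'foo';    # no object at all

for my $fn (qw(bar baz bat)) {
    my $file = path( $dir, $fn );
    print "$file\n";
}
```

It keeps the "this spot is special" property and returns a plain
string, but it gives up the operator sugar, which is the very
thing the scoped form is meant to retain.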

It would also be a terrifically awesome way to cure the batshit
insanity of IO::All without losing the deliciousness that is its
very raison d’être.

-- 
*AUTOLOAD=*_;sub _{s/(.*)::(.*)/print$2,(",$\/"," ")[defined wantarray]/e;$1}
&Just->another->Perl->hack;
#Aristotle Pagaltzis // <http://plasmasturm.org/>

