APM: File reading optimization

Brian Michalk michalk at awpi.com
Thu Feb 19 09:03:49 CST 2004


Thanks to all for their input on this matter.

The packet length solution would certainly be efficient, but I'm not willing
to set a policy like that, which I might regret in the future.  Take for
instance the GPS data.  Unless I write a shim, I can't very well have it
send me the format we discussed.  There are many other devices with serial
input, such as gyroscopes and a scanning laser.
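
For reference, the length-prefixed framing proposed earlier in the thread
might look roughly like the sketch below.  It is not code from the thread:
the helper names are hypothetical, and the 4-byte count is assumed to be a
big-endian unsigned integer (pack 'N'), which the thread does not specify.
The loops cover the short reads and writes that the quoted caveat about
send() and recv() warns of.

```perl
use strict;
use warnings;

# Read exactly $want bytes, looping over short reads.
# Returns undef on EOF or error before $want bytes arrive.
sub read_exactly {
    my ($fh, $want) = @_;
    my $data = '';
    while (length($data) < $want) {
        my $n = sysread($fh, $data, $want - length($data), length($data));
        return undef unless $n;    # 0 = EOF, undef = error
    }
    return $data;
}

# Write all of $data, looping over short writes.  Returns 1 on success.
sub write_all {
    my ($fh, $data) = @_;
    my $off = 0;
    while ($off < length($data)) {
        my $n = syswrite($fh, $data, length($data) - $off, $off);
        return 0 unless defined $n;
        $off += $n;
    }
    return 1;
}

# Frame = 4-byte length (assumed big-endian), then the payload.
sub send_packet {
    my ($fh, $payload) = @_;
    return write_all($fh, pack('N', length($payload)) . $payload);
}

# Block for the 4-byte count, then block for that many payload bytes.
sub recv_packet {
    my ($fh) = @_;
    my $header = read_exactly($fh, 4);
    return undef unless defined $header;
    return read_exactly($fh, unpack('N', $header));
}

# Round trip over a pipe, standing in for a real device or socket.
pipe(my $rd, my $wr) or die "pipe: $!";
send_packet($wr, "GPS fix 1") or die "write: $!";
my $msg = recv_packet($rd);
print "got: $msg\n";
```

This only helps where the sender can be changed to emit the count, which is
exactly the limitation with off-the-shelf serial devices.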

I've come up with some different scenarios.
1) Write kernel modules that do not emit data until a complete line is ready
to send, and go with blocking reads.  If I carefully mix select() and
readline(), this will work.  I just have to be careful with executing
readlines.
2) Rewrite my base class to fork, or spawn threads, and go with blocking
reads.  This would involve some complexity with communicating back to the
parent when a message or data is received.  It might be pretty, or ugly.
I'm still looking into this.  If there were a select() equivalent for
signals or pending messages, then this wouldn't be so bad.  My last
experience with signals in my perl modules led to system lockups, but from
what I'm reading, 5.8 has solved a lot of these issues.

This would all be so much easier if the kernel API had a select() variant
that returned only on some line separator, or once "x" characters of data
were ready.  As it stands, select() returns true if there is *any* data
available.  I understand the consequences though.  It would require a lot
more overhead to scan all of the input for line terminators.  Even so, it
would certainly be more efficient than any readline() command, because the
minute you do line oriented IO, every packet of data is scanned for a
terminator.
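
For reference, a minimal sketch of the select()-plus-sysread approach this
thread keeps coming back to, assuming newline-terminated records.  The
read_lines helper and the caller-owned buffer are hypothetical names, not
the code mentioned in the original question.  It looks for the last
newline with rindex and splits only the completed part, so a fragment that
dribbles in without a terminator costs a single scan and no split.

```perl
use strict;
use warnings;
use IO::Select;

# Append whatever is available on $fh to the caller's buffer and return an
# array ref of the complete lines now in it (possibly empty).  Any trailing
# partial line stays in the buffer.  Returns undef on EOF or error.
sub read_lines {
    my ($fh, $bufref) = @_;

    # sysread bypasses stdio buffering, so it is safe to combine with select.
    my $n = sysread($fh, $$bufref, 65536, length($$bufref));
    return undef unless $n;    # 0 = EOF, undef = error

    # No terminator yet: keep accumulating.
    my $last = rindex($$bufref, "\n");
    return [] if $last < 0;

    # Cut everything up to and including the last newline out of the buffer.
    my $complete = substr($$bufref, 0, $last + 1, '');

    # Limit -1 keeps empty lines; the final field is always empty, so drop it.
    my @lines = split /\n/, $complete, -1;
    pop @lines;
    return \@lines;
}

# Demo over a pipe: one complete line followed by a fragment.
pipe(my $reader, my $writer) or die "pipe: $!";
syswrite($writer, "first line\nsecond li");

my $sel = IO::Select->new($reader);
my %buf = (fileno($reader) => '');

for my $fh ($sel->can_read(1)) {
    my $lines = read_lines($fh, \$buf{fileno($fh)});
    print "line: $_\n" for @$lines;
}
```

With one buffer per handle, the same loop can serve slow serial devices and
fast sources alike; the 65536-byte read size is an arbitrary choice.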

Thoughts, feelings, comments?

> -----Original Message-----
> From: austin-bounces at mail.pm.org [mailto:austin-bounces at mail.pm.org]On
> Behalf Of Wayne Walker
> Sent: Tuesday, February 17, 2004 3:17 PM
> To: Chris Vaughan
> Cc: Austin at mail.pm.org
> Subject: Re: APM: File reading optimization
>
>
> On Tue, Feb 17, 2004 at 10:10:55AM -0800, Chris Vaughan wrote:
> > Brian,
> >
> > If you have the flexibility, you may want to consider changing
> > your protocol away from separators and towards packets.  If you
> > don't have the flexibility, then don't read on.
> >
> > Have the sender send packed data length (in bytes), then the
> > data itself, in a loop.  The reader would simply block waiting
> > for the first 4 bytes, construct a count by unpacking the
> > integer, and block reading that count of the handle, forming
> > your message.  After the reader reads the message, it blocks
> > again waiting for the next count.
> >
> > The downside to this solution is that the reader has two logical
> > reading states.  If the reader gets out of sync for any reason,
> > you're screwed.
>
> This is a pretty good solution.  The reader/writer should never get out
> of sync, but I've just been through a nightmare on this (that I created
> myself).  Just remember that send() and recv() are NOT guaranteed to
> send/get the number of bytes you told it to!
>
> >
> > Regards,
> > Chris
> >
> > --- Brian Michalk <michalk at awpi.com> wrote:
> > > > I am in a quandary about how to do efficient filehandle reading.
> > > > I'm trying to make it uniform across all of the filehandles that
> > > > may be named pipes, device driver handles, network sockets, or
> > > > stdio.
> > > >
> > > > I have some slow devices on a serial line, and other fast devices
> > > > that continually generate data at high rates.  My protocol is all
> > > > line oriented, and that naturally leads me to use something like
> > > > <>, but read the following:
> > > > perldoc -f select
> > > >             WARNING: One should not attempt to mix buffered I/O
> > > >             (like "read" or <FH>) with "select", except as
> > > >             permitted by POSIX, and even then only on POSIX
> > > >             systems. You have to use "sysread" instead.
> > > >
> > > > However, sysread doesn't care about line separators.  Instead, I
> > > > have to search through the incoming data for separators and store
> > > > partial reads in my own buffer.  This is not a problem; I have
> > > > code, and it works.  The performance is bad.  C code would have
> > > > the same type of problem.
> > > >
> > > > The serial port dribbles in GPS data at 9600 baud, causing the
> > > > select() to return without a complete line being available, so I
> > > > store all of the one or two characters at a time in the internal
> > > > buffer.  The radar data, however, can come in at 100 Hz, at about
> > > > 12K of data per line, and I've got buffering turned on for
> > > > performance, so I have to go searching through the data to find
> > > > the line separators.
> > >
> > > Are there any better solutions?
> > >
> > > _______________________________________________
> > > Austin mailing list
> > > Austin at mail.pm.org
> > > http://mail.pm.org/mailman/listinfo/austin
> >
> >
> > =====
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >     Chris Vaughan    | "I love deadlines.  I like the
> >                      |  swooshing sound as they fly by."
> >  vaughan99 at yahoo.com |   - Douglas Adams
> >
>
> --
>
> Wayne Walker
> wwalker at bybent.com                 Do you use Linux?!
> http://www.bybent.com              Get Counted!  http://counter.li.org/
> Perl - http://www.perl.org/        Perl User Groups - http://www.pm.org/
> Jabber IM:  wwalker at jabber.phototropia.org       AIM:     lwwalkerbybent



