APM: File reading optimization

Brian Michalk michalk at awpi.com
Thu Feb 19 14:53:39 CST 2004


Thanks.  I think I'm leaning towards a variant of this.

I have a big base class that implements all of the IO for network, devices,
etc.  Each type of node, like a GPS or radar, is subclassed, so there is
really only one program.  Sometimes two, if I need a fast C program to
process data from the device driver, but in that case the results come out
on FIFOs at the other end.

A simple node would be something like the GPS node.  It listens to four UDP
broadcast sockets, one for distance information, one for commands, and so
forth.  It opens a filehandle to /dev/ttyS0, which is the input from the GPS
receiver.  When clients connect to me, they can connect on a "command" port,
or a "data" port.  The command port is an interactive port where the client
can set and get various information, and set runmodes, etc.  The "data" port
is mostly one way: the client connects, asks for a version of data, and
from then on it is simply a one-way stream of that data to the client.

It looks like callbacks are what I'm looking for here.  It seems I would
change my existing select groups into callback subroutines.  Crude examples
will follow.  This is coming off the top of my head and not from perl -cw.

# assume an exaggeratedly simple case:
# 1 UDP broadcast commands receive
# 1 filehandle from device receive
# 1 filehandle to device write
# 1 filehandle to command socket clients read/write
# 1 filehandle to data    socket clients read/write

# after blessings, etc.
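# (\&$self->process_broadcast_command() does not give a reference to the
#  method, so wrap the call in an anonymous sub instead)
$self->{bc_commands_sub} = sub { $self->process_broadcast_command(@_) };
# alternatively: $self->{bc_commands_sub} = $self->can("process_broadcast_command");
# (a ref from can() must then be called as ->($self, ...))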
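A minimal sketch of the callback idea above, assuming every readable handle is registered with a code ref that receives one complete line at a time. `register`, `run_once` and `%handlers` are hypothetical names, not part of the existing base class. Reads go through `sysread` with a per-handle buffer, since perldoc -f select warns against mixing `<FH>` with select().

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical dispatch table: fileno => { fh, pending bytes, callback }
my %handlers;
my $select = IO::Select->new;

# Hypothetical helper: register a handle and the callback that should
# receive each complete line read from it (trailing "\n" removed).
sub register {
    my ($fh, $callback) = @_;
    $handlers{ fileno $fh } = { fh => $fh, buf => '', cb => $callback };
    $select->add($fh);
    return;
}

# Hypothetical helper: one pass of the event loop. Wait up to $timeout
# seconds, sysread whatever is available on each ready handle, and call
# its callback once per complete line. A partial line stays buffered.
sub run_once {
    my ($timeout) = @_;
    for my $fh ($select->can_read($timeout)) {
        my $h = $handlers{ fileno $fh };
        my $n = sysread($fh, $h->{buf}, 65536, length($h->{buf}));
        if (!$n) {    # 0 means EOF, undef means error: stop watching it
            $select->remove($fh);
            delete $handlers{ fileno $fh };
            next;
        }
        while ((my $pos = index($h->{buf}, "\n")) >= 0) {
            my $line = substr($h->{buf}, 0, $pos + 1, '');   # cut line + "\n"
            chop $line;                                       # drop the "\n"
            $h->{cb}->($line);
        }
    }
    return;
}
```

In the GPS node, the broadcast sockets, /dev/ttyS0 and each client socket would each get their own callback, and the main loop would just call run_once repeatedly. In this sketch a partial line still buffered when a handle hits EOF is discarded.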

Hmmm, I just looked at the RPC implementation in the Advanced Perl
Programming book.  It uses freeze/thaw.  At first glance this looks like a
data copy.  It would work fine for slow devices, but really clog up on the
fast ones.

Perhaps something more like a signal handler.  Keeping the same forking (or
threading) idea, I could use shared memory for my data structures and have a
signal handler in the base class (we get 32 minus 3 user defined signals now
with threaded perl) gather up the appropriate data from shared memory and
then call the data processing function.

Only the reading filehandles should fork/thread.  Blocking writes are not a
problem, and are preferred.  The thread gets data, copies it to shared
memory, then sends a signal to the parent process for handling.

Mostly ramblings here on my part.  Trying to flesh this thing out.


> -----Original Message-----
> From: Bill Raty [mailto:bill_raty at yahoo.com]
> Sent: Thursday, February 19, 2004 1:33 PM
> To: Brian Michalk
> Subject: RE: APM: File reading optimization
>
>
> Brian,
>
> This may amount to what you've already mentioned, but here goes
> anyway:
>
> * Open up an RPC.pm interface (i.e. a perl sub) to your program
> where you would normally place your event that is to occur when
> a line has been read.  Info on how to do this can be found in
> the ORA "Advanced Perl Programming" book.
>
> * Start up perl programs that buffer your input waiting for LF,
> then make the call to your event subroutine when the line has
> been received.
>
> You're using IPC via RPC in this, and forking is going on.  On
> the other hand the pieces get simpler, and you've turned your
> program into an event driven server (even though this may be
> happening on the loopback).
>
> Startup of all this shouldn't be problematic:
>
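> foreach my $perl_prog (qw(main_prog.pl gps_buffer.pl etc.pl)) {
>   # one string, so the shell sees the "&" and backgrounds the program;
>   # qw( $perl_prog & ) would not interpolate and bypasses the shell
>   system("$perl_prog &");
> }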
>
> Crude, but the forking is done by the OS.  Double forking in
> perl to detach a process isn't tough either.  It can be found
> in the Perl Cookbook.
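Bill's double-fork suggestion could be sketched like this, in the spirit of the Perl Cookbook daemon recipe rather than copied from it. `spawn_detached` is a hypothetical helper; it assumes a Unix system with POSIX::setsid and /dev/null.

```perl
use strict;
use warnings;
use POSIX qw(setsid _exit);

# Hypothetical helper: run @cmd fully detached from this process (double fork).
sub spawn_detached {
    my @cmd = @_;
    defined(my $pid = fork) or die "fork: $!";
    if ($pid == 0) {
        # first child: start a new session so there is no controlling tty,
        # then fork again and exit, leaving the grandchild orphaned to init
        setsid();
        defined(my $pid2 = fork) or _exit(1);
        _exit(0) if $pid2;
        # grandchild: detach stdin/stdout, then become the command
        open(STDIN,  '<', '/dev/null') or _exit(1);
        open(STDOUT, '>', '/dev/null') or _exit(1);
        exec { $cmd[0] } @cmd or _exit(127);    # _exit only if exec failed
    }
    waitpid($pid, 0);    # reap the short-lived first child right away
    return;
}
```

For example, spawn_detached('./gps_buffer.pl') could stand in for the system("... &") call in the startup loop, without going through the shell.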
>
> -Bill
>
> --- Brian Michalk <michalk at awpi.com> wrote:
> > Thanks to all for their input on this matter.
> >
> > The packet length solution would certainly be efficient, but I'm not
> > willing to set a policy like that, which I might regret in the future.
> > Take for instance the GPS data.  Unless I write a shim, I can't very
> > well have it send me the format we discussed.  There are many other
> > devices, like gyroscopes and a scanning laser, that have serial input.
> >
> > I've come up with some different scenarios.
> > 1) Write kernel modules that do not emit data until a complete line is
> > ready to send, and go with blocking reads.  If I carefully mix select()
> > and readline(), this will work.  I just have to be careful with
> > executing readlines.
> > 2) Rewrite my base class to fork, or spawn threads, and go with blocking
> > reads.  This would involve some complexity with communicating back to
> > the parent when a message or data is received.  It might be pretty, or
> > ugly.  I'm still looking into this.  If there were a select()
> > equivalent for signals or messages being ready, then this wouldn't be
> > so bad.  My last experience with signals in my perl modules led to
> > system lockups, but from what I'm reading, 5.8 has solved a lot of
> > these issues.
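The "select() equivalent for signals" wished for in option 2 can be approximated with the self-pipe trick, a standard Unix technique rather than anything proposed in this thread: the signal handler writes a byte into a pipe, and the read end of that pipe sits in the same IO::Select set as the device handles. A sketch, assuming a Unix system and Perl 5.8 or later, where %SIG handlers run at a safe point ("deferred signals"); `wait_for_event` and the choice of SIGUSR1 are illustrative.

```perl
use strict;
use warnings;
use IO::File;      # so lexical handles get the ->blocking method
use IO::Select;

# Self-pipe: a signal handler writes here, select() watches the read end.
pipe(my $wake_r, my $wake_w) or die "pipe: $!";
$wake_r->blocking(0);    # draining must never block
$wake_w->blocking(0);    # the signal handler must never block

my $got_signal = 0;
$SIG{USR1} = sub {
    $got_signal++;
    syswrite($wake_w, 'x');    # wake up any select() that is waiting
};

# Real device/socket handles would be added to this same set.
my $select = IO::Select->new($wake_r);

# Hypothetical helper: returns 1 if a signal arrived within $timeout.
sub wait_for_event {
    my ($timeout) = @_;
    for my $fh ($select->can_read($timeout)) {
        if (fileno($fh) == fileno($wake_r)) {
            my $junk;
            1 while sysread($wake_r, $junk, 512);   # drain all wake-up bytes
            return 1;
        }
        # ... other ready handles would be dispatched here ...
    }
    return 0;
}
```

A reader child or thread could then `kill USR1 => getppid()` after placing data in shared memory, and the parent's single can_read call would wake for signals and device data alike.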
> >
> > This would all be so much easier if the kernel API had a select()
> > function that would return on some line separator, or even on "x"
> > characters of data being ready.  Now, select always returns true if
> > there is *any* data available.  I understand the consequences, though:
> > it would require a lot more overhead to scan all of the input for line
> > terminators.  Even so, it would certainly be more efficient than any
> > readline() command, because the minute you do line-oriented IO, every
> > packet of data is scanned for a terminator.
> >
> > Thoughts, feelings, comments?
> >
> > > -----Original Message-----
> > > From: austin-bounces at mail.pm.org [mailto:austin-bounces at mail.pm.org]On
> > > Behalf Of Wayne Walker
> > > Sent: Tuesday, February 17, 2004 3:17 PM
> > > To: Chris Vaughan
> > > Cc: Austin at mail.pm.org
> > > Subject: Re: APM: File reading optimization
> > >
> > >
> > > On Tue, Feb 17, 2004 at 10:10:55AM -0800, Chris Vaughan wrote:
> > > > Brian,
> > > >
> > > > If you have the flexibility, you may want to consider changing
> > > > your protocol away from separators and towards packets.  If you
> > > > don't have the flexibility, then don't read on.
> > > >
> > > > Have the sender send the packed data length (in bytes), then the
> > > > data itself, in a loop.  The reader would simply block waiting
> > > > for the first 4 bytes, construct a count by unpacking the
> > > > integer, and block reading that many bytes from the handle,
> > > > forming your message.  After the reader reads the message, it
> > > > blocks again waiting for the next count.
> > > >
> > > > The downside to this solution is that the reader has two logical
> > > > reading states.  If the reader gets out of sync for any reason,
> > > > you're screwed.
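Chris's framing, sketched with pack's "N/a*" template (a 4-byte big-endian count followed by the bytes). `frame` and `unframe_all` are hypothetical helpers. Decoding from an accumulated buffer instead of blocking on the handle lets the same scheme sit behind select() and sysread; a frame whose body has not fully arrived simply waits in the buffer. Like the original proposal, it has no way to resynchronize after a corrupted count.

```perl
use strict;
use warnings;

# Hypothetical helpers for Chris's length-prefixed framing.
# frame(): 4-byte big-endian byte count ("N"), then the payload itself.
sub frame {
    my ($msg) = @_;
    return pack('N/a*', $msg);
}

# unframe_all(): pull every complete message out of the buffer passed by
# reference, leaving a partial header or body in place for the next read.
sub unframe_all {
    my ($bufref) = @_;
    my @msgs;
    while (length($$bufref) >= 4) {
        my $len = unpack('N', $$bufref);          # state 1: the count
        last if length($$bufref) < 4 + $len;      # body not all here yet
        push @msgs, substr($$bufref, 4, $len);    # state 2: the body
        substr($$bufref, 0, 4 + $len, '');        # consume header + body
    }
    return @msgs;
}
```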
> > >
> > > This is a pretty good solution.  The reader/writer should never get
> > > out of sync, but I've just been through a nightmare on this (that I
> > > created myself).  Just remember that send() and recv() are NOT
> > > guaranteed to send/get the number of bytes you told them to!
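Following Wayne's warning, a blocking reader for that framing has to loop until the full count has arrived. A sketch with hypothetical `read_exact` and `read_message` helpers; the same caution applies on the writing side, since syswrite and send can also report fewer bytes than requested.

```perl
use strict;
use warnings;

# Hypothetical helper: sysread (like recv) may return fewer bytes than
# asked for, so keep reading until exactly $want bytes have arrived.
# Returns undef if EOF comes first.
sub read_exact {
    my ($fh, $want) = @_;
    my $buf = '';
    while (length($buf) < $want) {
        my $n = sysread($fh, $buf, $want - length($buf), length($buf));
        die "sysread: $!" unless defined $n;    # (EINTR retry omitted)
        return undef if $n == 0;                # EOF in mid-message
    }
    return $buf;
}

# Hypothetical helper: blocking reader for a 4-byte count plus body.
sub read_message {
    my ($fh) = @_;
    my $hdr = read_exact($fh, 4);
    return undef unless defined $hdr;
    return read_exact($fh, unpack('N', $hdr));
}
```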
> > >
> > > >
> > > > Regards,
> > > > Chris
> > > >
> > > > --- Brian Michalk <michalk at awpi.com> wrote:
> > > > > I am in a quandary about how to do efficient filehandle reading.
> > > > > I'm trying to make it uniform across all of the filehandles,
> > > > > which may be named pipes, device driver handles, network
> > > > > sockets, or stdio.
> > > > >
> > > > > I have some slow devices on a serial line, and other fast
> > > > > devices that continually generate data at high rates.  My
> > > > > protocol is all line oriented, and that naturally leads me to
> > > > > use something like <>, but read the following:
> > > > > perldoc -f select
> > > > >     WARNING: One should not attempt to mix buffered I/O (like
> > > > >     "read" or <FH>) with "select", except as permitted by POSIX,
> > > > >     and even then only on POSIX systems. You have to use
> > > > >     "sysread" instead.
> > > > >
> > > > > However, sysread doesn't care about line separators.  Instead, I
> > > > > have to search through the incoming data for separators and
> > > > > store partial reads in my own buffer.  This is not a problem; I
> > > > > have code, and it works.  The performance is bad, though.  C
> > > > > code would have the same type of problem.
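One way to keep the separator search cheap is sketched below; it is not Brian's existing code. It assumes "\n" as the separator and hypothetical `new_assembler`/`add_data` helpers. `index` does the search in C, the assembler remembers how much of the pending partial line has already been scanned (so a 12K radar line that arrives in pieces is scanned only once), and consumed lines are cut from the buffer once per call instead of once per line.

```perl
use strict;
use warnings;

# Hypothetical per-handle line assembler for sysread-based input.
# 'scanned' = how much of 'buf' is already known to contain no "\n".
sub new_assembler { return +{ buf => '', scanned => 0 } }

# Hypothetical helper: append freshly sysread data and return all
# complete lines (without "\n"); a trailing partial line stays buffered.
sub add_data {
    my ($self, $data) = @_;
    $self->{buf} .= $data;
    my @lines;
    my $start = 0;                    # start of the current line
    my $from  = $self->{scanned};     # resume the search where we left off
    while ((my $nl = index($self->{buf}, "\n", $from)) >= 0) {
        push @lines, substr($self->{buf}, $start, $nl - $start);
        $start = $from = $nl + 1;
    }
    substr($self->{buf}, 0, $start, '');       # drop consumed lines once
    $self->{scanned} = length($self->{buf});   # remainder has no "\n"
    return @lines;
}
```

Each successful sysread would feed its chunk to `add_data($asm, $chunk)`. Reading in large chunks (for example 64 KB) also cuts down on system calls, since 100 lines per second at about 12K each is roughly 1.2 MB/s.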
> > > > >
> > > > > The serial port dribbles in GPS data at 9600 baud, causing
> > > > > select() to return without a complete line being available, so
> > > > > I store the one or two characters at a time in the internal
> > > > > buffer.  The radar data, however, can come in at 100 Hz, at
> > > > > about 12K of data per line, and I've got buffering turned on for
> > > > > performance, so I have to go searching through the data to find
> > > > > the line separators.
> > > > >
> > > > > Are there any better solutions?
> > > > >
> > > > > _______________________________________________
> > > > > Austin mailing list
> > > > > Austin at mail.pm.org
> > > > > http://mail.pm.org/mailman/listinfo/austin
> > > >
> > > >
> > > > =====
> > > > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > > >     Chris Vaughan    | "I love deadlines.  I like the
> > > >                      |  swooshing sound as they fly by."
> > > >  vaughan99 at yahoo.com |   - Douglas Adams
> > > >
> > >
> > > --
> > >
> > > Wayne Walker
> > > wwalker at bybent.com                 Do you use Linux?!
> > > http://www.bybent.com              Get Counted! http://counter.li.org/
> > > Perl - http://www.perl.org/        Perl User Groups - http://www.pm.org/
> > > Jabber IM:  wwalker at jabber.phototropia.org       AIM: lwwalkerbybent
> > >
> >
> === message truncated ===
>
>
> =====
> "There let the pealing organ blow,
> To the full-voiced choir below,
> In service high, and anthems clear,
> As may with sweetness, through mine ear,
> Dissolve me into ecstasies,
> And bring all Heav'n before mine eyes".
> John Milton - Il Penseroso (1632).
>



