[sb.pm] Design Issues (was: When should we meet next?)

Kostas Pentikousis kostas at cs.sunysb.edu
Thu Apr 29 18:52:30 CDT 2004


On Thu, 15 Apr 2004, Robert Rothenberg wrote:
|On 4/14/2004 3:11 PM Siddhartha Basu wrote:
|> * What would be a good approach for taking inputs for a program, especially
|> if I want to run it from a cron job?
|>     * From the command line by using @ARGV or Getopt::Long.
|>     * Or by setting environment variables and accessing them from the %ENV
|>         hash.
|> Right now, I have a mix of both, but it is becoming cumbersome and I am
|> thinking about settling on one approach.

I do not use ENV variables. The main reason is that you need to
document how and when they are used. And you pollute one's
environment regardless of whether s/he uses the program. If you
are concerned about code maintenance, in particular if you want
other people to use your code, avoid ENV variables.

Command line arguments also need documentation, but if a newcomer
in your lab asks to use your code, you do not need to start
explaining ENV settings and such (what if bash, what if csh, what
if Windows?). With command line options, you only need to email
her the command line.

|I'd use command-line arguments, since it's easier to pass arguments to when
|running the scripts as one-shots.

If you have several true/false switches, command line arguments
are probably the best way to go (esp. for a cron job). If you also
assign default values in your code, then the switches are needed
only for "exceptional cases" (possibly making the cron entry
considerably shorter). If, on the other hand, each option needs a
value, then having 8 switches and 10 value-pairs gets kind of
ugly, no?
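A minimal sketch of the defaults-plus-switches approach, assuming the core Getopt::Long module. The option names (--verbose, --dry-run, --outdir) and the default directory are hypothetical; GetOptionsFromArray is used only so the parsing can be exercised on any list, not just @ARGV.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments on top of in-code defaults.
# The option names and default values are hypothetical examples.
sub parse_options {
    my @args = @_;
    my %opt = (
        verbose => 0,            # on/off switch, off unless asked for
        dry_run => 0,            # on/off switch
        outdir  => '/tmp/out',   # option that takes a value
    );
    GetOptionsFromArray(
        \@args,
        'verbose!' => \$opt{verbose},   # --verbose or --noverbose
        'dry-run!' => \$opt{dry_run},   # --dry-run or --nodry-run
        'outdir=s' => \$opt{outdir},    # --outdir DIR
    ) or die "usage: $0 [--verbose] [--dry-run] [--outdir DIR]\n";
    return \%opt;
}

my $opt = parse_options(@ARGV);
printf "verbose=%d dry_run=%d outdir=%s\n", @{$opt}{qw(verbose dry_run outdir)};
```

With this layout a crontab entry only has to spell out the exceptions, e.g. `script.pl --outdir /var/data` (script.pl being a placeholder name).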

I would consider using default values in the code and a
configuration file. Configuration files are pretty standard in
Unix/Linux, and you can also use them without a problem on Windows
(no need to start playing with the registry), should you decide to
run your code there as well. Configuration files are easy to
document (and you just need to attach them to the aforementioned
email to the newcomer), and can be created/updated on the fly,
even just before the main program is called.
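One way the defaults-plus-configuration-file idea might look, as a sketch only: the "key = value" format, the '#' comment syntax, and the key names are assumptions, not a standard. Any key found in the file overrides the default passed in; everything else keeps its in-code default.

```perl
use strict;
use warnings;

# Read "key = value" lines from a config file on top of in-code
# defaults. Blank lines and lines starting with '#' are skipped.
# The file format is an assumption made for this illustration.
sub read_config {
    my ($path, %defaults) = @_;
    my %cfg = %defaults;
    open my $fh, '<', $path or die "cannot open $path: $!\n";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|$)/;          # comment or blank line
        if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $cfg{$1} = $2;                       # file value wins over default
        } else {
            warn "ignoring malformed line $. in $path: $line";
        }
    }
    close $fh;
    return \%cfg;
}
```

A script would then call something like `my $cfg = read_config('code.config', outdir => '/tmp/out', retries => 3);`, and the config file can be regenerated right before the cron job runs.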

If you feel that many parameters are pretty standard, take
advantage of the __DATA__ section strategically located at the end
of your Perl code. This way, you keep default values and code
together in one file, eliminating the need for a separate
"code.config" next to "code.pl".
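A sketch of the __DATA__ approach, reusing the same hypothetical "key = value" format: Perl exposes everything after the __DATA__ token through the DATA filehandle, so the defaults ship inside the script itself. The keys and values are made up for illustration.

```perl
use strict;
use warnings;

# Load default parameters from the __DATA__ section at the end of this
# file. The "key = value" format and the keys are hypothetical.
sub load_defaults {
    my %defaults;
    while (my $line = <DATA>) {
        next if $line =~ /^\s*(?:#|$)/;          # skip comments and blank lines
        $defaults{$1} = $2 if $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    }
    return \%defaults;
}

my $defaults = load_defaults();
print "$_ = $defaults->{$_}\n" for sort keys %$defaults;

__DATA__
# default parameters, one "key = value" per line
outdir  = /tmp/out
retries = 3
verbose = 0
```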

In sum, always use default values, don't worry about having many
command line options as long as they are only on/off switches, and
consider (if you have more than a dozen parameters) reading
defaults from a configuration file.

|> column and then check whether the data in that column is present in
|> another text file. So, my approach is to either read the second text
|
|BioPerl has some flat database file drivers.  DBD::CSV, DBD::Sprite or
|DBD::File might provide some DBI drivers to handle flat files and simplify
|your work.

I personally like CSV: it's text, portable and you can import it
to almost anything (from spreadsheets to DBs, even to plotting
applications). Of course, CSVs are extremely easy to handle in
Perl (with or without CPAN modules).
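As a rough illustration of the column check Siddhartha described, the sketch below loads one file's column into a hash and keeps the values from the other file that appear in it. It splits naively on commas, an assumption that breaks on quoted fields containing commas; the Text::CSV module from CPAN handles those properly. Column indices are 0-based, and the function names are hypothetical.

```perl
use strict;
use warnings;

# Naive CSV handling: split on commas. Quoted fields that contain
# commas are NOT supported; Text::CSV is the safer choice for those.
sub column_values {
    my ($fh, $col) = @_;                     # $col is a 0-based column index
    my @values;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line eq '';
        my @fields = split /,/, $line, -1;   # -1 keeps trailing empty fields
        push @values, $fields[$col] if defined $fields[$col];
    }
    return @values;
}

# Return the values in column $col1 of the first file that also appear
# in column $col2 of the second file. The hash gives constant-time lookups.
sub find_matches {
    my ($fh1, $col1, $fh2, $col2) = @_;
    my %seen = map { ($_ => 1) } column_values($fh2, $col2);
    return grep { $seen{$_} } column_values($fh1, $col1);
}
```

In a script, both files would be opened with `open my $fh, '<', $path or die $!;` and passed to `find_matches`.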

|> Nothing else comes to my mind at this moment, so I am dealing with a
|> bunch of scattered scripts.

I would consider putting the bulk of the code in a module. It
will simplify usage (each script will contain just a few module
function calls), centralize the code, and you may even consider
uploading it to CPAN :)
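A minimal sketch of what moving shared code into a module could look like. Lab::Tools and its two functions are hypothetical; the package sits in the same file only so the example runs on its own, whereas a real module would live in Lab/Tools.pm somewhere on @INC and scripts would load it with `use Lab::Tools qw(trim parse_pair);`.

```perl
use strict;
use warnings;

# Hypothetical module collecting logic that was copied across scripts.
package Lab::Tools;

use Exporter 'import';
our @EXPORT_OK = qw(trim parse_pair);

# Strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Split "key = value" into (key, value); returns an empty list
# (undef in scalar context) if the line is not an assignment.
sub parse_pair {
    my ($line) = @_;
    return unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
    return ($1, $2);
}

package main;

# Each script then shrinks to a few function calls.
Lab::Tools->import(qw(trim parse_pair));
my ($k, $v) = parse_pair('  retries =  3 ');
my $msg = trim("  $k is $v  ");
print "$msg\n";
```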

|Where are these text files coming from that they are in different formats?
|
|> * What kind of format should I use for writing a log file: a flat
|> text file or XML?
|
|I'd avoid XML like the plague, unless you need to pass the logs to a program
|which requires it in XML.

I'd prefer plain text, possibly CSV. However, XML is not the
plague :) and it's not a bad idea if you move the logs around from
one application to another. Besides, XML can render the logs
self-documenting. If you decide to use it, avoid manual XML
generation. Use a standard CPAN module for reading and writing.

|Do you really need sophisticated markup for a log file?

Not really, at least for the garden variety log file.
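For the garden-variety case, a plain-text CSV log takes very little code. This sketch assumes a timestamp,level,message layout (a choice made here, not a convention from the thread) and quotes the message field only when it contains a comma, a double quote, or a newline, doubling any embedded quotes.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Append one log record as a CSV line: timestamp,level,message.
# The record layout is an assumption made for this illustration.
sub log_line {
    my ($fh, $level, $msg) = @_;
    my $ts = strftime('%Y-%m-%d %H:%M:%S', localtime);
    if ($msg =~ /[",\n]/) {
        (my $q = $msg) =~ s/"/""/g;      # double embedded quotes
        $msg = qq{"$q"};                 # wrap the field in quotes
    }
    print {$fh} join(',', $ts, $level, $msg), "\n";
}

log_line(\*STDOUT, 'INFO', 'run started');
log_line(\*STDOUT, 'WARN', 'skipped 3 rows, bad "id" field');
```

Because each record is one CSV line, the log can later be pulled into a spreadsheet or database like any other CSV file.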

Just my $0.02.

Best regards,

Kostas



More information about the StonyBrook-PM mailing list