[sf-perl] Help with naming modules
David Muir Sharnoff
sfpug at dave.sharnoff.org
Sat Jul 18 01:18:07 PDT 2009
I hope to soon open source a bunch of modules and commands. I need
help naming them. I developed them for Searchme, Inc, and they are all
currently in the SearchMe:: namespace, but they should all be moved
out of the SearchMe namespace.
Please propose new names for my modules. I need to choose the
new names in the next few days. I also need to choose how to break
this up into distributions.
Thanks,
-Dave
Here's the list:
program: process_logs
A distributed log processing system. Working from a configuration
that describes the steps and their dependencies, it runs
the steps using a cluster of systems. Each step can consist of
the following parts:
The sources of the data
A filter (perl snippet) to choose which data to process.
A grouping function to choose how much data to process at once.
A transformation step to massage the data.
The output format.
How to bucketize the output data and into how many buckets.
How to sort the output data.
How to name the output data.
How often to run this step (on daily, weekly, etc data)
The start time (date) for which the step is valid (eg: 'last week')
The end time (date) for which the step is valid (eg: 'yesterday')
program: test_log_configs
A program to run sample data through particular steps defined
by a process_logs configuration file. Outputs Test Anything Protocol.
program: grab_data
Re-assemble bucketized data files generated by process_logs
program: ltsv_to_tsv
Convert bucketized multi-format data files from process_logs into
standard TSVs
program: ltsv_samples
Pull sample data from multi-format data files from process_logs into
program: ltsv_column_unique_counts
Count number of unique values per column in the multi-format data
files from process_logs into
SearchMe::Log::Jobs
The main driver for the process_logs program. It figures out the
time and job dependencies and queues all the jobs into a
SearchMe::JobQueue::DependencyQueue.
SearchMe::JobQueue
Generic queue of "jobs". Most likely to be subclassed
for different situations. Jobs are registered. Hosts are
registered. Jobs may or may not be tied to particular hosts.
Jobs are started on hosts when the hosts have less than their
maximum number of jobs running.
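To make the scheduling rule concrete, here is a rough sketch in Python (not the module's Perl API). The function name, the (job, host) pairs, and the least-loaded-first choice for unpinned jobs are all assumptions made for illustration; the only rule taken from the description is that a job starts on a host only while that host is below its maximum.

```python
def start_runnable(pending, hosts, running):
    """Start every pending job that fits on a host right now.

    pending: list of (job, pinned_host_or_None), mutated in place
    hosts:   dict host -> maximum simultaneous jobs
    running: dict host -> jobs currently running, mutated in place
    Returns the list of (job, host) pairs that were started.
    """
    started = []
    still_pending = []
    for job, pinned in pending:
        if pinned is not None:
            candidates = [pinned]
        else:
            # Assumption: unpinned jobs prefer the least-loaded host.
            candidates = sorted(hosts, key=lambda h: running.get(h, 0))
        for host in candidates:
            if running.get(host, 0) < hosts[host]:
                running[host] = running.get(host, 0) + 1
                started.append((job, host))
                break
        else:
            # No candidate host had a free slot; try again later.
            still_pending.append((job, pinned))
    pending[:] = still_pending
    return started
```

A real queue would call something like this again each time a job finishes and frees a slot.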
SearchMe::JobQueue::DependencyGraph
Maintain a dependency graph of objects. This does not depend
on SearchMe::JobQueue though it is used in combination with it
in SearchMe::JobQueue::DependencyQueue.
SearchMe::JobQueue::DependencyQueue
Combines SearchMe::JobQueue, SearchMe::JobQueue::DependencyGraph,
and SearchMe::JobQueue::DependencyTask to make a queue of jobs
that is run as their dependencies are met. Jobs are started by
calling perl callbacks.
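The idea of running jobs as their dependencies are met can be sketched as follows. This is Python for illustration only, single-threaded, with hypothetical names; the real module spreads jobs across hosts and is asynchronous.

```python
from collections import deque

def run_in_dependency_order(jobs, deps):
    """Run each callback once all of its prerequisites have run.

    jobs: dict name -> zero-argument callback
    deps: dict name -> set of prerequisite job names
    Returns the order in which the jobs were run.
    """
    remaining = {name: set(deps.get(name, ())) for name in jobs}
    dependents = {name: [] for name in jobs}
    for name, prereqs in remaining.items():
        for prereq in prereqs:
            if prereq not in jobs:
                raise ValueError(f"{name} depends on unknown job {prereq}")
            dependents[prereq].append(name)

    # Jobs with no prerequisites can start immediately.
    ready = deque(sorted(n for n, p in remaining.items() if not p))
    order = []
    while ready:
        name = ready.popleft()
        jobs[name]()  # "start" the job by calling its callback
        order.append(name)
        for dependent in dependents[name]:
            remaining[dependent].discard(name)
            if not remaining[dependent]:
                ready.append(dependent)

    if len(order) != len(jobs):
        raise ValueError("dependency cycle: some jobs can never run")
    return order
```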
SearchMe::JobQueue::Job
Base class of the jobs managed by SearchMe::JobQueue.
SearchMe::JobQueue::DependencyJob
Subclass of SearchMe::JobQueue::Job for use with
SearchMe::JobQueue::DependencyQueue
SearchMe::JobQueue::DependencyTask
Lighter than a job, a task is a callback. It doesn't get
scheduled, it simply runs when it can. Used with
SearchMe::JobQueue::DependencyQueue
SearchMe::JobQueue::BackgroundQueue
A job queue of jobs that are run using Proc::Background.
SearchMe::JobQueue::Command
A command line job for SearchMe::JobQueue::BackgroundQueue. Runs
any unix command.
SearchMe::JobQueue::Sort
A command line job that sorts files.
SearchMe::JobQueue::Move
A command line job that moves files, possibly remotely.
SearchMe::JobQueue::Sequence
A compound job that is actually a sequence of jobs.
SearchMe::JobQueue::RemoteDependencyJob
A remote perl callback job, using SearchMe::Misc::RemoteJob
SearchMe::Misc::RemoteJob
Make a RPC call to a remote system. It will start perl on the
remote system. Asynchronous. Requires IO::Event. For use when
farming work out from a central point. The remote job communicates
synchronously, but the master is asynchronous.
SearchMe::Misc::RemoteJob::MasterCall
Make a callback to the master system from a remote job started
with SearchMe::Misc::RemoteJob.
SearchMe::Log::TSV
The ltsv (log TSV) format used for most log processing jobs.
The set of columns is discovered as the logs are written.
SearchMe::Log::Raw
The writer and parser for raw data files.
SearchMe::Log::Writers
Base class for data writing modules.
SearchMe::Log::Paths
Translate path specifications into filenames. Generates
regular expressions to understand names of existing data files.
SearchMe::Log::Numbers
Force return values to be integer or float.
SearchMe::Log::Aggregate
Streaming generic data aggregation. Based on a configuration,
it generates perl code to aggregate a stream of input data. It
can do nested aggregations (eg: urls, hosts, & domains) and
can do cross-product aggregations (domains x time-of-day). For
nested aggregations, the input must be sorted.
Has built-in support for min, max, mean, median,
standard deviation, etc. Also supports custom code. Can limit
memory use.
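The reason nested aggregations need sorted input is that each group can then be finished and emitted as soon as its key changes, so only one group per level is in memory at a time. A minimal Python sketch of that idea (the record layout and function name are hypothetical, and the real module generates Perl code from a configuration rather than using a fixed function like this):

```python
from itertools import groupby

def nested_totals(rows):
    """Stream count/total per host and per domain.

    rows: iterable of (domain, host, value), sorted by domain, then host.
    Yields ("host", domain, host, count, total) when a host group ends
    and ("domain", domain, count, total) when a domain group ends.
    """
    for domain, domain_rows in groupby(rows, key=lambda r: r[0]):
        d_count, d_total = 0, 0
        for host, host_rows in groupby(domain_rows, key=lambda r: r[1]):
            h_count, h_total = 0, 0
            for _, _, value in host_rows:
                h_count += 1
                h_total += value
            yield ("host", domain, host, h_count, h_total)
            # Roll the finished inner group up into the outer one.
            d_count += h_count
            d_total += h_total
        yield ("domain", domain, d_count, d_total)
```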
SearchMe::Aggregation::Stats
A few statistics functions: standard_deviation, percentile,
dominant, etc. All operate on
$SearchMe::Aggregation::Stats::ps->{keep}{$fieldname} so that
SearchMe::Log::Aggregate can have rules like:
p90_foo: percentile(foo => 90)
SearchMe::Log::Metadata
For SearchMe::Log::Jobs metadata, compresses the data
structures by joining duplicates together.
SearchMe::Log::Parsers
Base class for data reading modules.
SearchMe::Log::CountryNames
Country code to name and region table.
SearchMe::Log::Trim
Trim strings to length and make sure they're properly encoded
for inserting into a fixed-length database column. Will limit by
bytes even though there might be multi-byte characters in the
string.
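Limiting by bytes while keeping the string validly encoded mostly comes down to not cutting a multi-byte character in half. A small Python sketch, assuming UTF-8 and a hypothetical function name (the Perl module's actual interface is not shown in this post):

```python
def trim_to_bytes(text, max_bytes, encoding="utf-8"):
    """Return text cut so its encoded form fits in max_bytes.

    If the cut lands inside a multi-byte character, the partial
    character is dropped rather than left as invalid bytes.
    """
    raw = text.encode(encoding)
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" discards the incomplete trailing sequence, if any.
    return raw[:max_bytes].decode(encoding, errors="ignore")
```

The result can be shorter than max_bytes by up to one character's worth of bytes.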
SearchMe::Log::Durations
Parse durations and frequency like: "daily", "last week",
"4th wednesday of each month"
SearchMe::Log::ConfigCheck
Validate SearchMe::Log::Jobs configuration files.
SearchMe::Log::Task
Run SearchMe::Log::Jobs steps, usually on a remote system.
SearchMe::Log::Misc
Misc support funcs for SearchMe::Log::Jobs.
SearchMe::Log::Sql
Insert data streams into a database. Will do safe interpolation
into the query and can apply updates conditionally.
SearchMe::Log::CountryCode
Translate IP addresses to country codes using a dump of data
from http://software77.net/geo-ip/
SearchMe::YAML
Light wrapper around YAML::Syck to improve error reporting.
SearchMe::Iterator
A simple iterator function, from David Wheeler, but
now looks like a filehandle too.
SearchMe::POD2Twiki
Translate POD to twiki format and upload it into a twiki.
Flawed since Pod::Simple::Wiki doesn't manage a very good
translation. Uses WWW::TWikiClient.
SearchMe::IO::Event::Callback
Callback API for IO::Event. This could be renamed
IO::Event::Callback and added to the IO::Event distribution.
SearchMe::Misc::SmartOpen
Open files locally or remotely, with or without compression
based on the name of the file.
SearchMe::Slurp
Like File::Slurp, but reads & writes remote files using ssh/scp
via SearchMe::Misc::SmartOpen.
SearchMe::Misc::RemoteKiller
Remember local and remote process ids. On receipt of a
control-C, try to kill them all.
SearchMe::Misc::MergeSort
Merge multiple input streams (SearchMe::Iterator's) using a
sort function to produce a single output stream. Can handle
large numbers of streams with only log N performance degradation.
Still not fast enough.
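The log N behavior typically comes from keeping the current head of each stream in a heap. A rough Python sketch of that technique, assuming each input is already sorted by the same key (this illustrates the general approach, not necessarily how the Perl module is written):

```python
import heapq

def merge_streams(streams, key=lambda x: x):
    """Merge already-sorted iterables into one sorted stream.

    Each output item costs O(log N) for N input streams, because only
    the head of every stream lives in the heap.
    """
    heap = []
    for idx, stream in enumerate(streams):
        it = iter(stream)
        for item in it:
            # idx breaks ties so items themselves are never compared.
            heap.append((key(item), idx, item, it))
            break
    heapq.heapify(heap)

    while heap:
        _, idx, item, it = heap[0]
        yield item
        for nxt in it:
            # Replace the emitted head with the stream's next item.
            heapq.heapreplace(heap, (key(nxt), idx, nxt, it))
            break
        else:
            # Stream exhausted: drop it from the heap.
            heapq.heappop(heap)
```

Python's standard library offers the same thing as heapq.merge; the version above just spells out the mechanism.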
SearchMe::Misc::Random
Call srand()
SearchMe::Misc::RunCommands
Run multiple commands and process their output in parallel.
SearchMe::Misc::TieFunc
Simple API for tie'ing functions to hashes. A whole bunch
of useful examples like
%q_shell for quoting things safely for /bin/sh
%round for rounding numbers
%q_perl for quoting things safely for perl
%thoucomma for adding commas to large numbers
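For readers who have not used tied hashes this way: looking up a key calls a function on that key, which makes the "hash" easy to interpolate into strings. A loose Python analogue (the class name is made up, and the two helpers only mimic the %q_shell and %thoucomma examples above):

```python
import shlex

class FuncHash:
    """Looks like a read-only mapping, but each lookup calls a function."""

    def __init__(self, func):
        self._func = func

    def __getitem__(self, key):
        return self._func(key)

# Rough counterparts of two of the examples listed above.
q_shell = FuncHash(shlex.quote)               # quote safely for /bin/sh
thoucomma = FuncHash(lambda n: f"{n:,}")      # add commas to large numbers

command = f"ls -l {q_shell['my file.txt']}"
```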
SearchMe::Misc::Hostname
Figure out hostnames the hard way: `ssh $host hostname` for
when DNS doesn't work right and you can't have aliases.
SearchMe::Misc::List
Things that should have been in List::Util like:
do_sublist( &selector, &callback, @list )
keys_to_regex(@list)
list2text(@list)
SearchMe::Misc::Misc
Misc support functions:
Add line numbers to eval blocks.
Detailed list comparison
Find differences in ordering
SearchMe::Config2
Enhanced YAML-format configuration files. Allows includes
and macros.
SearchMe::Config2::Checker
Configuration file validator. Describe the allowed configuration
in perl struct (embedded YAML) and compare to the
configuration supplied (another perl struct).
program: tsv
A Tab Separated Value slicer/dicer. All simple stuff, but handy
when working with tsvs. Can search them, cut them (by column name),
rotate them, etc.
program: do.hosts
Run commands across a cluster of systems. Can run N per host,
can run locally (rsync) or remotely. Can limit the number running simultaneously.
Can prefix output with the hostname, or not.
program: rkill
Given output of the form, "hostname: PID junk" as produced by
running do.hosts ps x | grep, go kill all the processes on their
hosts.
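A sketch of the parsing half of this, in Python for illustration. The regular expression and function names are assumptions, and building "ssh host kill" command lines is only one guess at how the kill could be delivered; the sketch builds the commands but does not run them.

```python
import re
from collections import defaultdict

# "hostname: PID junk": a host, a colon, whitespace, then a numeric PID.
LINE = re.compile(r"^(?P<host>[^:\s]+):\s+(?P<pid>\d+)\b")

def pids_by_host(lines):
    """Group PIDs by host, skipping lines that do not match."""
    found = defaultdict(list)
    for line in lines:
        m = LINE.match(line)
        if m:
            found[m.group("host")].append(int(m.group("pid")))
    return dict(found)

def kill_commands(found, signal="TERM"):
    """Build one argv list per host (assumption: kill is sent over ssh)."""
    return [
        ["ssh", host, "kill", f"-{signal}", *map(str, pids)]
        for host, pids in sorted(found.items())
    ]
```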
program: cronjob_wrapper
Run a command. Save the output. Send email only if the command
exits non-zero.
program: pods2htmls
Generate HTML documentation for all the pod documentation that
can be found in a source tree. Uses Pod::Html to do the work.
SearchMe::Features::Subprocess
Stream inputs through multiple slave processes, resending
items that get dropped and recovering from hung or crashed
slave processes. Not connected to the log processing system,
but dependent on some of the same modules.