[sf-perl] Help with naming modules

David Muir Sharnoff sfpug at dave.sharnoff.org
Sat Jul 18 01:18:07 PDT 2009


I hope to soon open source a bunch of modules and commands.   I need
help naming them.  I developed them for Searchme, Inc, and they are all
currently in the SearchMe:: namespace, but they should all be moved
out of the SearchMe namespace.

Please propose new names for my modules.  I need to choose the
new names in the next few days.  I also need to choose how to break
this up into distributions.

Thanks,

-Dave

Here's the list:

program: process_logs

	A distributed log processing system.  Working from a configuration
	that describes the steps and their dependencies, it runs
	the steps using a cluster of systems.  Each step can consist of
	the following parts:
	
	The sources of the data
	A filter (perl snippet) to choose which data to process.
	A grouping function to choose how much data to process at once.
	A transformation step to massage the data.
	The output format.
	How to bucketize the output data and into how many buckets.
	How to sort the output data.
	How to name the output data.
	How often to run this step (on daily, weekly, etc data)
	The start time (date) for which the step is valid (eg: 'last week')
	The end time (date) for which the step is valid (eg: 'yesterday')

program: test_log_configs

	A program to run sample data through particular steps defined
	by a process_logs configuration file.  Outputs Test Anything Protocol.

program: grab_data

	Re-assemble bucketized data files generated by process_logs

program: ltsv_to_tsv
	
	Convert bucketized multi-format data files from process_logs into
	standard TSVs

program: ltsv_samples

	Pull sample data from multi-format data files from process_logs

program: ltsv_column_unique_counts

	Count the number of unique values per column in the multi-format
	data files from process_logs

SearchMe::Log::Jobs

	The main driver for the process_logs program.  It figures out the
	time and job dependencies and queues all the jobs into a 
	SearchMe::JobQueue::DependencyQueue.

SearchMe::JobQueue

	Generic queue of "jobs".   Most likely to be subclassed
	for different situations.  Jobs are registered.  Hosts are
	registered.  Jobs may or may not be tied to particular hosts.
	Jobs are started on hosts when the hosts have less than their
	maximum number of jobs running.

SearchMe::JobQueue::DependencyGraph
	
	Maintain a dependency graph of objects.  This does not depend
	on SearchMe::JobQueue though it is used in combination with it
	in SearchMe::JobQueue::DependencyQueue.
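
	To give a feel for what "maintain a dependency graph of objects"
	means, here is a rough sketch.  It is not the module's code or
	API: the package name TinyDepGraph and the methods add, ready,
	and finish are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical minimal dependency tracker (not the real module's API).
package TinyDepGraph;

sub new { return bless { waiting_on => {}, done => {} }, shift }

# Declare that $item cannot run until all of @deps are done.
sub add {
    my ($self, $item, @deps) = @_;
    $self->{waiting_on}{$item} =
        { map { $_ => 1 } grep { !$self->{done}{$_} } @deps };
    return;
}

# Items that are not done and have no unfinished dependencies.
sub ready {
    my ($self) = @_;
    return sort grep {
        !$self->{done}{$_} && !%{ $self->{waiting_on}{$_} }
    } keys %{ $self->{waiting_on} };
}

# Mark $item finished and release anything that was waiting on it.
sub finish {
    my ($self, $item) = @_;
    $self->{done}{$item} = 1;
    delete $_->{$item} for values %{ $self->{waiting_on} };
    return;
}

package main;

my $graph = TinyDepGraph->new;
$graph->add('parse');
$graph->add('aggregate', 'parse');
$graph->add('report', 'parse', 'aggregate');

# Keep running whatever is ready until nothing is left.
while (my @ready = $graph->ready) {
    print "can run: @ready\n";
    $graph->finish($_) for @ready;
}
```

	A queue built on top of this would start the ready items as jobs
	and call finish() as each job completes, instead of finishing
	them immediately as the loop above does.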

SearchMe::JobQueue::DependencyQueue

	Combines SearchMe::JobQueue, SearchMe::JobQueue::DependencyGraph,
	and SearchMe::JobQueue::DependencyTask to make a queue of jobs
	that is run as their dependencies are met.  Jobs are started by
	calling perl callbacks.

SearchMe::JobQueue::Job

	Base class of the jobs managed by SearchMe::JobQueue.

SearchMe::JobQueue::DependencyJob
	
	Subclass of SearchMe::JobQueue::Job for use with 
	SearchMe::JobQueue::DependencyQueue

SearchMe::JobQueue::DependencyTask

	Lighter than a job, a task is a callback.  It doesn't get 
	scheduled, it simply runs when it can.  Used with
	SearchMe::JobQueue::DependencyQueue

SearchMe::JobQueue::BackgroundQueue

	A job queue of jobs that are run using Proc::Background.

SearchMe::JobQueue::Command

	A command line job for SearchMe::JobQueue::BackgroundQueue.  Runs
	any unix command.

SearchMe::JobQueue::Sort

	A command line job that sorts files.
	
SearchMe::JobQueue::Move
	
	A command line job that moves files, possibly remotely.

SearchMe::JobQueue::Sequence

	A compound job that is actually a sequence of jobs.

SearchMe::JobQueue::RemoteDependencyJob
	
	A remote perl callback job, using SearchMe::Misc::RemoteJob

SearchMe::Misc::RemoteJob

	Make an RPC call to a remote system.  It will start perl on the
	remote system.  Asynchronous.  Requires IO::Event.  For use when
	farming work out from a central point.  The remote job communicates
	synchronously, but the master is asynchronous.

SearchMe::Misc::RemoteJob::MasterCall

	Make a callback to the master system from a remote job started
	with SearchMe::Misc::RemoteJob.

SearchMe::Log::TSV
	
	The ltsv (log TSV) format used for most log processing jobs.
	The set of columns is discovered as the logs are written.

SearchMe::Log::Raw

	The writer and parser for raw data files.

SearchMe::Log::Writers
	
	Base class for data writing modules.

SearchMe::Log::Paths

	Translate path specifications into filenames.  Generates
	regular expressions to understand names of existing data files.

SearchMe::Log::Numbers
	
	Force return values to be integer or float.

SearchMe::Log::Aggregate

	Streaming generic data aggregation.  Based on a configuration,
	it generates perl code to aggregate a stream of input data.  It
	can do nested aggregations (eg: urls, hosts, & domains) and 
	can do cross-product aggregations (domains x time-of-day).  For
	nested aggregations, the input must be sorted.

	Has built-in support for min, max, mean, median, 
	standard deviation, etc.  Also supports custom code.  Can limit
	memory use.
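
	The reason sorted input matters can be shown with a small
	sketch.  This is not how the module works internally (it
	generates perl code from a configuration); the function
	aggregate_sorted and the record layout [key, value] are made up
	for illustration.

```perl
use strict;
use warnings;

# Aggregate a stream of [key, value] records that is sorted by key.
# Because equal keys are adjacent, only one group is in memory at a time.
sub aggregate_sorted {
    my ($next_record, $emit) = @_;
    my ($current, $count, $sum, $min, $max);

    # Hand the finished group (if any) to the caller.
    my $flush = sub {
        return unless defined $current;
        $emit->({ key => $current, count => $count, mean => $sum / $count,
                  min => $min, max => $max });
    };

    while (my $rec = $next_record->()) {
        my ($key, $value) = @$rec;
        if (!defined $current || $key ne $current) {
            $flush->();    # key changed: the previous group is complete
            ($current, $count, $sum, $min, $max) = ($key, 0, 0, $value, $value);
        }
        $count++;
        $sum += $value;
        $min = $value if $value < $min;
        $max = $value if $value > $max;
    }
    $flush->();            # emit the last group
    return;
}

my @input = (['a.com', 3], ['a.com', 5], ['b.com', 10]);
my @groups;
aggregate_sorted(sub { shift @input }, sub { push @groups, $_[0] });
printf "%s count=%d mean=%g\n", @{$_}{qw(key count mean)} for @groups;
```

	With unsorted input the same key could reappear later, so every
	group would have to stay in memory until the end of the stream.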

SearchMe::Aggregation::Stats
	
	A few statistics functions: standard_deviation, percentile,
	dominant, etc.  All operate on 
	$SearchMe::Aggregation::Stats::ps->{keep}{$fieldname} so that
	SearchMe::Log::Aggregate can have rules like:
		p90_foo: percentile(foo => 90)
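
	For reference, a standalone percentile function could look like
	the sketch below.  It uses the nearest-rank definition, which is
	an assumption (which definition the module uses is not stated
	here), and it takes its data as arguments rather than reading
	the shared {keep} structure.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Nearest-rank percentile: the smallest value such that at least
# $p percent of the data is less than or equal to it.
sub percentile {
    my ($p, @values) = @_;
    return unless @values;
    my @sorted = sort { $a <=> $b } @values;
    my $rank = ceil($p * @sorted / 100);    # 1-based rank
    $rank = 1 if $rank < 1;
    return $sorted[$rank - 1];
}

my @foo = (15, 20, 35, 40, 50);
print "p90_foo: ", percentile(90, @foo), "\n";
```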

SearchMe::Log::Metadata

	For SearchMe::Log::Jobs metadata, compresses the data 
	structures by joining duplicates together.

SearchMe::Log::Parsers

	Base class for data reading modules.

SearchMe::Log::CountryNames

	Country code to name and region table.

SearchMe::Log::Trim
	
	Trim strings to length and make sure they're properly encoded
	for inserting into a fixed-length database column.  Will limit by
	bytes even though there might be multi-byte characters in the
	string.
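
	A sketch of limiting by bytes without cutting a character in
	half, assuming the column stores UTF-8.  The function name
	trim_to_bytes is invented; this is not the module's interface.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Trim a character string so that its UTF-8 encoding fits in $max_bytes,
# without cutting a multi-byte character in half.
sub trim_to_bytes {
    my ($string, $max_bytes) = @_;
    my $bytes = encode('UTF-8', $string);
    return $string if length($bytes) <= $max_bytes;

    my $cut = $max_bytes;
    # If the first byte we would drop is a continuation byte (10xxxxxx),
    # the cut falls inside a character, so back up to its lead byte.
    $cut-- while $cut > 0
        && (ord(substr($bytes, $cut, 1)) & 0xC0) == 0x80;

    return decode('UTF-8', substr($bytes, 0, $cut));
}

# "h", e-acute (2 bytes in UTF-8), "llo": 5 characters, 6 bytes.
my $word = "h\x{e9}llo";
printf "%d characters kept\n", length trim_to_bytes($word, 2);
```

	The result can be shorter than the byte limit, since a character
	that does not fit entirely is dropped.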

SearchMe::Log::Durations
	
	Parse durations and frequency like: "daily", "last week",
	"4th wednesday of each month"

SearchMe::Log::ConfigCheck
	
	Validate SearchMe::Log::Jobs configuration files.

SearchMe::Log::Task

	Run SearchMe::Log::Jobs steps, usually on a remote system.
	
SearchMe::Log::Misc

	Misc support funcs for SearchMe::Log::Jobs.

SearchMe::Log::Sql

	Insert data streams into a database.  Will do safe interpolation 
	into the query and can apply updates conditionally.

SearchMe::Log::CountryCode

	Translate IP addresses to country codes using a dump of data
	from http://software77.net/geo-ip/
	
SearchMe::YAML

	Light wrapper around YAML::Syck to improve error reporting.

SearchMe::Iterator
	
	A simple iterator function, from David Wheeler, but
	now looks like a filehandle too.  

SearchMe::POD2Twiki

	Translate POD to twiki format and upload it into a twiki.
	Flawed since Pod::Simple::Wiki doesn't manage a very good
	translation.  Uses WWW::TWikiClient.

SearchMe::IO::Event::Callback

	Callback API for IO::Event.  This can be IO::Event::Callback
	and be added to the IO::Event distribution.

SearchMe::Misc::SmartOpen

	Open files locally or remotely, with or without compression
	based on the name of the file.

SearchMe::Slurp
	
	Like File::Slurp, but reads & writes remote files using ssh/scp
	via SearchMe::Misc::SmartOpen.

SearchMe::Misc::RemoteKiller

	Remember local and remote process ids.  On receipt of a
	control-C, try to kill them all.

SearchMe::Misc::MergeSort

	Merge multiple input streams (SearchMe::Iterator's) using a
	sort function to produce a single output stream.  Can handle
	large numbers of streams with only log N performance degradation.
	Still not fast enough.
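
	The log N behavior comes from keeping the head of each stream in
	a heap.  The sketch below shows that technique only; it is not
	the module's code.  It uses plain code refs instead of
	SearchMe::Iterator objects and collects the result in a list
	instead of producing a stream, both simplifications for the
	example.

```perl
use strict;
use warnings;

# Merge already-sorted streams into one sorted list.
# Each stream is a code ref returning the next item, or undef at the end.
# $cmp compares two items and returns <0, 0, or >0, like a sort block.
sub merge_streams {
    my ($cmp, @streams) = @_;
    my @heap;    # binary min-heap of [item, stream] pairs

    # Add an entry and sift it up toward the root.
    my $push = sub {
        my ($entry) = @_;
        push @heap, $entry;
        my $i = $#heap;
        while ($i > 0) {
            my $parent = int(($i - 1) / 2);
            last if $cmp->($heap[$parent][0], $heap[$i][0]) <= 0;
            @heap[$parent, $i] = @heap[$i, $parent];
            $i = $parent;
        }
    };

    # Remove the smallest entry and sift the replacement down.
    my $pop = sub {
        my $top  = $heap[0];
        my $last = pop @heap;
        if (@heap) {
            $heap[0] = $last;
            my $i = 0;
            while (1) {
                my $small = $i;
                for my $child (2 * $i + 1, 2 * $i + 2) {
                    next if $child > $#heap;
                    $small = $child
                        if $cmp->($heap[$child][0], $heap[$small][0]) < 0;
                }
                last if $small == $i;
                @heap[$small, $i] = @heap[$i, $small];
                $i = $small;
            }
        }
        return $top;
    };

    # Prime the heap with the first item of every stream.
    for my $stream (@streams) {
        my $item = $stream->();
        $push->([$item, $stream]) if defined $item;
    }

    # Repeatedly take the smallest head and refill from its stream.
    my @out;
    while (@heap) {
        my ($item, $stream) = @{ $pop->() };
        push @out, $item;
        my $next = $stream->();
        $push->([$next, $stream]) if defined $next;
    }
    return @out;
}

# Helper for the demo: turn a list into a stream.
sub stream_of { my @items = @_; return sub { shift @items } }

my @merged = merge_streams(
    sub { $_[0] <=> $_[1] },
    stream_of(1, 4, 9), stream_of(2, 3, 10), stream_of(5),
);
print "@merged\n";
```

	Each output item costs a pop and a push, each of which makes on
	the order of log N calls to the comparison function for N
	streams.  In Perl those calls are themselves expensive.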

SearchMe::Misc::Random

	Call srand() 

SearchMe::Misc::RunCommands

	Run multiple commands and process their output in parallel.

SearchMe::Misc::TieFunc

	Simple API for tie'ing functions to hashes.  A whole bunch 
	of useful examples like 

		%q_shell for quoting things safely for /bin/sh
		%round for rounding numbers
		%q_perl for quoting things safely for perl
		%thoucomma for adding commas to large numbers
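
	A minimal sketch of the idea, using perl's standard tie
	interface.  The package name TiedFunc is invented and this is
	not the module's actual API; the two functions are simplified
	stand-ins for %q_shell and %thoucomma.

```perl
use strict;
use warnings;

# Hypothetical package, not the real SearchMe::Misc::TieFunc API.
package TiedFunc;

sub TIEHASH {
    my ($class, $func) = @_;
    return bless { func => $func }, $class;
}

# Reading $hash{$key} calls the wrapped function with $key.
sub FETCH {
    my ($self, $key) = @_;
    return $self->{func}->($key);
}

package main;

# Quote a string for /bin/sh by wrapping it in single quotes.
tie my %q_shell, 'TiedFunc', sub {
    my ($s) = @_;
    $s =~ s/'/'\\''/g;
    return "'$s'";
};

# Insert commas into the integer part of a number.
tie my %thoucomma, 'TiedFunc', sub {
    my ($n) = @_;
    1 while $n =~ s/^(-?\d+)(\d{3})/$1,$2/;
    return $n;
};

# Tied hashes interpolate inside strings, which function calls do not.
my $file = "it's here.txt";
print "rm $q_shell{$file}\n";
print "total: $thoucomma{1234567}\n";
```

	The point of the hash syntax is the interpolation on the last
	two lines: a hash lookup can appear inside a double-quoted
	string or a here-document, where a function call cannot.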

SearchMe::Misc::Hostname
	
	Figure out hostnames the hard way: `ssh $host hostname` for
	when DNS doesn't work right and you can't have aliases.

SearchMe::Misc::List

	Things that should have been in List::Util like:
		
		do_sublist( &selector, &callback, @list )
		keys_to_regex(@list)
		list2text(@list)
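
	As an example of the kind of helper meant here, this is a guess
	at keys_to_regex().  Only the name and argument list come from
	the module; the behavior shown (an anchored regex matching
	exactly the given strings) is an assumption.

```perl
use strict;
use warnings;

# Guess at what keys_to_regex() might do: build one anchored regex
# that matches exactly the given strings.
sub keys_to_regex {
    my @keys = @_;
    # Longest first so that alternation tries the longest literal first,
    # and quotemeta so that "." or "+" in a key is matched literally.
    my $alternatives = join '|',
        map  { quotemeta($_) }
        sort { length($b) <=> length($a) || $a cmp $b } @keys;
    return qr/\A(?:$alternatives)\z/;
}

my $re = keys_to_regex(qw(a.com b.org));
print "a.com matches\n"  if     'a.com' =~ $re;
print "axcom does not\n" unless 'axcom' =~ $re;
```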

SearchMe::Misc::Misc
	
	Misc support functions: 
		Add line numbers to eval blocks.
		Detailed list comparison
		Find differences in ordering
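
	One way to get useful line numbers out of eval'ed code is
	perl's "#line" directive, sketched below.  Whether the module
	does it this way is an assumption; the function name
	compile_snippet and the file name steps.conf are invented for
	the example.

```perl
use strict;
use warnings;

# Compile a code snippet into a sub so that errors report a meaningful
# name and line number instead of something like "(eval 12) line 2".
sub compile_snippet {
    my ($code, $name, $first_line) = @_;
    # A '#line N "file"' comment at the start of a line resets perl's
    # idea of the line number and file name for the code after it.
    my $source = "sub {\n#line $first_line \"$name\"\n$code\n}";
    my $sub = eval $source;
    die "cannot compile $name: $@" unless $sub;
    return $sub;
}

# Pretend this snippet starts at line 40 of a configuration file.
my $snippet = <<'END';
my ($n) = @_;
die "negative input" if $n < 0;
return sqrt($n);
END

my $sqrt = compile_snippet($snippet, 'steps.conf', 40);
eval { $sqrt->(-1) };
print $@;    # the message names steps.conf and the snippet's own line
```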

SearchMe::Config2
	
	Enhanced YAML-format configuration files.  Allows includes
	and macros.

SearchMe::Config2::Checker
	
	Configuration file validator.  Describe the allowed configuration
	in perl struct (embedded YAML) and compare to the 
	configuration supplied (another perl struct).

program: tsv

	A Tab Separated Value slicer/dicer.  All simple stuff, but handy
	when working with tsvs.  Can search them, cut them (by column name),
	rotate them, etc.

program: do.hosts

	Run commands across a cluster of systems.  Can run N per host, 
	can run locally (rsync) or remotely.  Can limit the number running
	simultaneously.
	Can prefix output with the hostname, or not.

program: rkill

	Given output of the form, "hostname: PID junk" as produced by
	running do.hosts ps x | grep, go kill all the processes on their
	hosts.
	
program: cronjob_wrapper

	Run a command.  Save the output.  Send email only if the command 
	exits non-zero.
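
	The logic is simple enough to sketch.  This is not the program
	itself: the function names are invented, only standard output is
	captured, and a callback stands in for sending email.

```perl
use strict;
use warnings;

# Run a command, capturing its standard output (stderr is left alone in
# this sketch).  Returns the exit status and the captured output.
sub run_captured {
    my @command = @_;
    open(my $fh, '-|', @command) or die "cannot run $command[0]: $!";
    my $output = do { local $/; <$fh> };
    close $fh;                      # sets $? to the child's wait status
    return ($? >> 8, defined $output ? $output : '');
}

# $notify is called only on failure; a real wrapper would send email there.
sub cron_wrap {
    my ($notify, @command) = @_;
    my ($status, $output) = run_captured(@command);
    $notify->("'@command' exited $status\n$output") if $status != 0;
    return $status;
}

# Demo: $^X is the running perl, used here as a portable test command.
my @alerts;
cron_wrap(sub { push @alerts, $_[0] }, $^X, '-e', 'print "all good\n"');
cron_wrap(sub { push @alerts, $_[0] }, $^X, '-e', 'print "disk full\n"; exit 2');
print scalar(@alerts), " alert(s):\n", @alerts;
```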

program: pods2htmls
	
	Generate HTML documentation for all the pod documentation that
	can be found in a source tree.   Uses Pod::Html to do the work.

SearchMe::Features::Subprocess

	Stream inputs through multiple slave processes, resending
	items that get dropped and recovering from hung or crashed
	slave processes.  Not connected to the log processing system,
	but dependent on some of the same modules.


