[Cincinnati-pm] November Virtual cinci.pm - Nov. 10th
Ernst, Kevin
Kevin.Ernst at cchmc.org
Wed Nov 10 19:06:11 PST 2021
Thanks, Jon, for the perlform[0] presentation!
Apropos of that, here's another example of using Perl formats (see
attached, line 497), and the output they produce.
The idea was to give me a monthly report of disk usage by username,
path, and file extension. The script uses multiple formats, including
one for a page header. The formats in this case don't have any weird
ANSI stuff going on, so data lines line up with the picture lines.
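
For anyone who missed the talk, here is a minimal sketch of the same idea: a record format plus a page-header format, both attached to a filehandle by name. The names DEMO, DEMO_TOP, render_report, $name and $size are made up for illustration and do not appear in the attached script, which selects its formats through the English variables $FORMAT_NAME and $FORMAT_TOP_NAME instead of the IO::Handle methods used here.

```perl
use strict;
use warnings;
use IO::Handle;    # provides format_name() and format_top_name()

# Variables named in a format must be visible where the format is declared;
# package globals declared with 'our' are the simplest way to do that.
our ($name, $size);

# Page header: printed automatically before the first record on each page.
format DEMO_TOP =
extension                  disk utilization
---------                  ----------------
.

# One record: '<' left-justifies a field, '>' right-justifies it.
format DEMO =
@<<<<<<<<<<<<<<<<<<<<<<<<< @>>>>>>>>>>>>>>>
$name,                     $size
.

# Render a list of [extension, size] pairs into a string (hypothetical helper).
sub render_report {
    my @rows = @_;
    my $out  = '';
    open my $fh, '>', \$out or die "can't open in-memory handle: $!";
    $fh->format_name('DEMO');
    $fh->format_top_name('DEMO_TOP');
    for my $row (@rows) {
        ($name, $size) = @$row;    # fill the globals the format refers to
        write($fh);                # emit one formatted record
    }
    close $fh;
    return $out;
}

print render_report(['.bam', '11.5T'], ['.fastq.gz', '14.7T']);
```

Writing to an in-memory handle is only a convenience for the sketch; with STDOUT selected, a plain 'write' behaves the same way.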
In a cron job, I pipe this into another script utilizing MIME::Lite[1]
under the hood to create a MIME multipart email, with the HTML part
being the plain text part wrapped in <html><body><pre> tags so it looks
right for people using Outlook.
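
The mailing step could be sketched roughly as follows. This is not the actual cron script: html_wrap and build_report_email are hypothetical names, the 'multipart/alternative' type is an assumption (the message above only says "multipart"), and the entity escaping is an addition so that an entry such as <none> is not read as an HTML tag. Only MIME::Lite's documented new(), attach() and as_string()/send() calls are used.

```perl
use strict;
use warnings;

# Escape the characters HTML would treat as markup, then wrap the report in
# <pre> so the columns survive in a proportional-font mail client.
sub html_wrap {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
    (my $escaped = $text) =~ s/([&<>])/$entity{$1}/g;
    return "<html><body><pre>\n$escaped</pre></body></html>\n";
}

# Build (but do not send) a two-part message: plain text plus HTML.
# Expects the keys from, to, subject and report (all illustrative).
sub build_report_email {
    my (%args) = @_;
    require MIME::Lite;    # CPAN module, loaded only when a message is built
    my $msg = MIME::Lite->new(
        From    => $args{from},
        To      => $args{to},
        Subject => $args{subject},
        Type    => 'multipart/alternative',    # assumption, see above
    );
    $msg->attach(Type => 'text/plain', Data => $args{report});
    $msg->attach(Type => 'text/html',  Data => html_wrap($args{report}));
    return $msg;
}

# Show what the HTML part looks like for one report line.
print html_wrap("<none>                                 3.6T\n");
```

In a cron job the report text would be read from STDIN and the returned object either printed with as_string() or handed to send().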
Here's what one of the formats looks like:
format BYEXT_TOP =
Top disk utilization by file extension
================================================================================
extension disk utilization
--------- ----------------
.
format BYEXT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<... @>>>>>>>>>>>>>>>
@LINE
.
…and the corresponding chunk of output would look something like this:
Top disk utilization by file extension
================================================================================
extension disk utilization
--------- ----------------
.bed 32.5T
.fastq.gz 14.7T
.bam 11.5T
.fq.gz 8.7T
.vcf 7.1T
.fastq 4.9T
<none> 3.6T
.vcf.gz 3.1T
.body.sam 2.6T
.bgen 2.3T
.txt 1.4T
.pgen 1.3T
.body.sorted.sam 885.7G
.fq 874.8G
.read2.fastq.gz 865.7G
.read1.fastq.gz 850.7G
.trim.srt.bam 806.0G
.pvar 750.6G
.bwt2glob.unmap.fastq 735.4G
.sam 678.1G
…where the input file used to generate the report would've been created
in advance with a 'find -printf' command line like this:
find . -type f -printf '%p\t%s\t%T@\t%A@\t%u\n' > usage.tsv
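
To make the connection between that input and the by-extension report concrete, here is a small sketch of the aggregation step. It is a simplification, not the attached script: sizes_by_extension and top_n are hypothetical names, the sample records are invented, and the regex just mirrors the script's defaults (up to 3 extensions of up to 8 word characters each).

```perl
use strict;
use warnings;

# Sum file sizes per extension from 'find -printf' records
# (path, size, mtime, atime, owner; tab-separated).
sub sizes_by_extension {
    my @records = @_;
    my %total;
    for my $record (@records) {
        chomp(my $line = $record);
        my ($path, $size, $mtime, $atime, $owner) = split /\t/, $line;
        # skip short or garbled records (e.g. filenames with embedded newlines)
        next unless defined $owner && $size =~ /^\d+$/;
        # up to 3 trailing ".ext" groups of 1-8 word characters each
        my ($ext) = $path =~ /((?:\.\w{1,8}){1,3})\z/;
        $total{ defined $ext ? $ext : '<none>' } += $size;
    }
    return \%total;
}

# Return the $n largest entries as [extension, bytes] pairs, biggest first.
sub top_n {
    my ($total, $n) = @_;
    my @keys = sort { $total->{$b} <=> $total->{$a} || $a cmp $b } keys %$total;
    splice @keys, $n if @keys > $n;
    return map { [ $_, $total->{$_} ] } @keys;
}

# Invented sample records, in the same layout the 'find' command produces.
my @sample = (
    "./run1/reads.fastq.gz\t300\t1636570000.0\t1636570000.0\talice\n",
    "./run2/reads.fastq.gz\t200\t1636570000.0\t1636570000.0\tbob\n",
    "./run1/aligned.bam\t400\t1636570000.0\t1636570000.0\talice\n",
    "./README\t50\t1636570000.0\t1636570000.0\tbob\n",
);

printf "%-12s %8d\n", @$_ for top_n(sizes_by_extension(@sample), 3);
```

The totals here are raw byte counts; the attached script additionally converts them to K/M/G/T figures and prints them through a format.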
Now, disclaimer: I don't earn a living writing Perl, so I wouldn't hold
this up as an example of any kind of best practices or anything. But I
do *really* like Perl formats, and every few years I find a new reason
to bust them out for some little reporting task like this one!
--Kevin
[0]: https://perldoc.perl.org/perlform
[1]: https://metacpan.org/pod/MIME::Lite#Create-a-multipart-message
On 11/10/21 at 8:58 AM, Jon Gentle wrote:
> Good Morning,
>
> Quick reminder that we will be doing a virtual cinci.pm this evening
> about perlform. Hope to see or hear you then.
-------------- next part --------------
#!/usr/bin/env perl
##
## Summarize disk usage based on directory, owner, file extension
##
## Input: the results of a previous 'find -printf' invocation, the details
## of which are described in the manual (at the bottom of this file)
##
## Output: (by default) top five subdirectories by disk utilization, two
## levels below the top-level directory in the 'find' results; other
## options are configurable; see the output of '--help' or the manual
##
## Author: Kevin Ernst <kevin.ernst -at- cchmc.org>
## Date: 29 October 2019; major updates ~17 August 2020
##
use strict;
use warnings;
use autodie;
use Carp; # for 'croak'
use English; # for $ACCUMULATOR, $FORMAT_NAME, etc.
# someday...
#package ReportUsage;
#require Exporter;
#@ISA = qw(Exporter);
#@EXPORT_OK = qw( humanize volume_free_space );
# someday...
#BEGIN { $ENV{ANSI_COLORS_DISABLED} = 1 unless -t STDOUT }
#use Term::ANSIColor ':constants';
our $TSV_HEADER = "path\tsize\tmtime\tctime\towner";
our $HEADER_SCAN_LINES = 0; # header must appear in this many lines
# or '0' to disable checking for a header
our $DEFAULT_EXTENSION_LENGTH = 8; # max. length of extension(s) to consider
our $DEFAULT_EXTENSION_NUMBER = 3; # don't consider more than 3 extensions
our $DEFAULT_LIMIT = 5; # limit top "n" to this many ("0=unlimited" is not implemented yet)
our $DEFAULT_DEPTH = 2; # depth from "root" of 'find' command
our $DEFAULT_SEPARATOR = "\t"; # default output separator for '--raw'
our ($QUIET, $DEBUG, $WEND); # $WEND: print warnings/errors w/ context?
# need to use these in formats, so they need to be package global
our @LINE; # holds format line; see perlform "WARNINGS"
our $BASEPATH; # longest common parent path
our $DEPTH; # how deep to descend from there
our $DISKFREE; # stats on disk utilization
# allow this script to be used like a library, but also run like a script
main() if not caller();
sub main {
use Getopt::Long;
use Pod::Usage;
my $help;
my $man;
my $filename = '';
my $human;
my $raw;
my $deptharg = '';
my @depthlist;
# output modes ("by depth" is default)
my $bydepth = 1;
my $byowner = 0;
my $byext = 0;
my $limit = $DEFAULT_LIMIT;
my $sep = $DEFAULT_SEPARATOR;
my $depths = {};
my $owners = {};
my $exts = {};
my $usage = {};
Getopt::Long::Configure ('bundling');
# source: "Documentation and help texts" section of Getopt::Long
GetOptions(
'help|?' => \$help,
'manual' => \$man,
'human-readable|human|h!' => \$human,
'limit|l=i' => \$limit,
'depth|subdirs|subdirectories|d=s' => \$deptharg,
'all|a' => sub { $bydepth = $byowner = $byext = 1; },
'by-depth|by-directory|by-dir!' => \$bydepth,
'by-owner!' => \$byowner,
'by-extension|by-ext!' => \$byext,
'raw|r' => \$raw,
'quiet|q!' => \$QUIET,
'debug' => \$DEBUG,
'separator|sep|s=s' => \$sep,
) or pod2usage(-exitval => 2);
pod2usage(-exitval => 0) if $help;
pod2usage(-exitval => 0, -verbose => 2) if $man;
$filename = shift @ARGV;
# fall back to reading from stdin implicitly if stdin is not a terminal
$filename = '-' if not defined($filename) and not -t STDIN; ## no critic
@depthlist = $deptharg ? split /,/, $deptharg : ($DEFAULT_DEPTH);
# add keys to $depths hashref for each one of the specified depths
$depths = { map { $_ => {} } @depthlist };
# if $DEBUG is *not* set, then append a new line to errors/warnings so that
# the line number won't be shown
$WEND = defined $DEBUG ? "" : ".\n";
die "ERROR: an input file generated by 'find' is required. Try "
. "'--help'$WEND" unless $filename;
die "ERROR: the file '$filename' does not exist or is unreadable.$WEND"
unless -r $filename or $filename eq '-';
if ($raw) {
$QUIET = 1 unless defined $QUIET;
warn "WARNING: '-h' / '--human' is ignored for \"raw\" output mode$WEND"
if $human;
die "ERROR: the '-r' / '--raw' option requires one and only one of the\n"
. " '--by-owner', '--by-extension', or '--by-depth' options"
. "$WEND" if not ($byowner or $byext or $bydepth);
# only accept *one* display option and *one* depth for "raw" output; other
# combinations don't make sense because you'll have different kinds of
# statistics concatenated together with no separators in between
die "ERROR: the '-r' / '--raw' option accepts no more than ONE display "
. "option.\n"
. " See '--help' or '--manual' for details$WEND"
if $raw and $byowner + $byext + $bydepth > 1;
die "ERROR: the '-r' / '--raw' option accepts no more than ONE depth.\n" .
" See '--help' or '--manual' for details$WEND"
if scalar(@depthlist) > 1;
} # if $raw
# default to human-readable figures
$human = 1 unless defined $human;
# returns hash reference of disk usage stats, and longest common prefix
warn "Parsing input file '$filename'...\n" unless $QUIET;
($usage, $BASEPATH) = parse_usage($filename);
# check to see that we actually got something back
die "ERROR: parsing input file failed (empty results)$WEND"
unless keys %$usage and $BASEPATH;
if ($bydepth) {
my $depthadd = 0;
# consider "depth" to mean "this many subdirs from the longest common
# path"; count how many path elements are in the $BASEPATH and add that
# many to $depth argument ('grep' filters empty list elements)
$depthadd = grep { $_ and $_ ne '.' } split /\//, $BASEPATH;
foreach my $depth (@depthlist) {
warn "Computing usage by depth=$depth in hierarchy...\n"
unless $QUIET;
$depths->{$depth} = usage_by_depth(
$usage,
depth=>$depth + $depthadd
);
# it's possible to go too deep with '-d' and get no results; check
# FIXME: see GitLab #47
if ($depths->{$depth}) {
$depths->{$depth} = reverse_sort_and_take_n(
$depths->{$depth},
n=>$limit
);
} else {
warn "WARNING: '-d' / '--depth' option ($depth) was *too* deep ".
"and yielded no results$WEND";
delete $depths->{$depth};
}
} # for each @depthlist
} # if '--by-depth'
if ($byowner) {
warn "Computing usage by file owner...\n" unless $QUIET;
$owners = reverse_sort_and_take_n( usage_by_owner($usage), n=>$limit );
}
if ($byext) {
warn "Computing usage by file extension...\n" unless $QUIET;
$exts = reverse_sort_and_take_n( usage_by_ext($usage), n=>$limit );
}
print "\n" unless $QUIET;
# if it's a mounted filesystem, get stats for it
$DISKFREE = disk_free($BASEPATH);
if ($DISKFREE) {
$FORMAT_NAME = 'DISKFREE';
write;
# start a new page
$FORMAT_LINES_LEFT = 0;
}
if ($raw) {
if ($bydepth) {
foreach my $depth (@depthlist) {
print_delimited($depths->{$depth}, sep=>$sep)
if $depths->{$depth};
}
} elsif ($byowner) {
print_delimited($owners, sep=>$sep)
} elsif ($byext) {
print_delimited($exts, sep=>$sep);
} else {
croak "Shouldn't get here!";
}
}
else {
if ($bydepth) {
foreach my $depth (@depthlist) {
# set package-global $DEPTH because it's in the format header
$DEPTH = $depth;
print_formatted($depths->{$depth}, format=>'BYDEPTH',
human=>$human, trim=>$BASEPATH);
}
if (not ($byowner or $byext)) {
warn "\nHint: Try adding '--by-owner' and/or '--by-extension'"
. ".\n" unless $QUIET;
}
}
if ($byowner) {
print_formatted($owners, format=>'BYOWNER', human=>$human);
}
#? else {
#? warn "\nHint: Try adding '--by-owner' for per-user utilization.\n"
#? unless $QUIET;
#? }
if ($byext) {
print_formatted($exts, format=>'BYEXT', human=>$human);
}
#? else {
#? warn "\nHint: Try adding '--by-extension' for per-extension "
#? . "utilization.\n" unless $QUIET;
#? }
} # if '--raw'
print "\n";
} # main
##############################################################################
## h e l p e r f u n c t i o n s ##
##############################################################################
sub parse_usage {
my $filename = shift;
my $usage = {};
my ($fh, $path, $size, $mtime, $ctime, $owner, $prefix);
my ($has_header, $count) = (0, 0);
# format is: path ⇥ size ⇥ mtime ⇥ ctime ⇥ owner
# (the documented 'find' command fills the fourth column with %A@, i.e. atime)
if ($filename eq '-') {
$fh = *STDIN;
} else {
open $fh, '<', $filename;
}
while (<$fh>) {
$count++;
next if /^#/; # skip comments
# require header within the first $HEADER_SCAN_LINES lines (or 0=don't)
if ($HEADER_SCAN_LINES && $count > $HEADER_SCAN_LINES && !$has_header) {
die "ERROR: Invalid input file. See '--manual' for required " .
"format.\n" unless $has_header;
}
if (/$TSV_HEADER/) {
$has_header = 1;
next;
}
chomp;
# add leading './' if it's a relative path without one
$_ =~ s/^/\.\// if /^\w+/;
($path, $size, $mtime, $ctime, $owner) = split /\t/;
# funny story: this literally happened to me when 'find' came across
# files with embedded newlines in the filename and wrote the '-printf'
# format over two lines
if (grep { !defined } $path, $size, $mtime, $ctime, $owner) {
warn "WARNING: record for $path had empty fields; skipping$WEND";
next;
}
# sanity checks:
# if you wanted to terminate on non-existent files
#? croak "ERROR: Non-existent file '$path'" unless -f $path;
croak "ERROR: Bad size '$size'" unless $size =~ /^\d+$/;
croak "ERROR: Bad mtime '$mtime'" unless $mtime =~ /^-?[.\d]+$/;
croak "ERROR: Bad ctime '$ctime'" unless $ctime =~ /^-?[.\d]+$/;
croak "ERROR: Bad owner '$owner'" unless $owner =~ /^[-\w]+$/;
$usage->{$path} = {
size => $size,
mtime => $mtime,
ctime => $ctime,
owner => $owner,
};
# h/t https://rene.seindal.dk/2005/09/09/longest-common-prefix-in-perl/
$prefix ||= $path;
chop $prefix while ($path !~ /^\Q$prefix\E/); # \Q,\E = escape meta
}
close $fh;
return ($usage, $prefix);
} # parse_usage
# if path argument is absolute, return free space for mounted volume
sub disk_free {
my $path = shift;
return if $path =~ /^\./; # reject relative paths
# [0] device, [1] size, [2] used, [3] free, [4] percent, [5] mount point
my @stats = split /\s+/, `LC_ALL=C df -h '$path' 2>/dev/null | tail -n +2`;
# '$?' only reflects 'tail', so also check that 'df' printed a full record
if ($? or @stats < 5) {
warn "WARNING: Unable to get free space for '$path'$WEND" if $DEBUG;
# or do nothing
} else {
return "$stats[3] of $stats[1] free ($stats[4] full)";
}
} # disk_free
sub usage_by_owner {
my $usage = shift;
my ($owner, $size);
my $ownersizes = {};
croak "Got an empty '\$usage' hashref" unless keys %$usage;
# make a list of file owners and total byte sizes
foreach my $path (keys %$usage) {
$owner = $usage->{$path}->{owner};
$size = $usage->{$path}->{size};
$ownersizes->{$owner} += $size;
}
return $ownersizes;
} # usage_by_owner
sub usage_by_ext {
my $usage = shift;
my ($ext, $size);
my $extsizes = {};
croak "Got an empty '\$usage' hashref" unless keys %$usage;
# make a list of file extensions and total byte sizes
foreach my $path (keys %$usage) {
# strip off './' if it's there
($ext = $path) =~ s/^\.\///;
# Pro Tip: m/// returns list of capture subexpressions in list context
($ext) = ($ext =~ /
(
(?:\.\w{1,$DEFAULT_EXTENSION_LENGTH}) # a dot, then <= 8 chars
{1,$DEFAULT_EXTENSION_NUMBER} # up to 3 of them
)$ # at EOL; capture #1
/x);
$ext = '<none>' if not $ext;
$size = $usage->{$path}->{size};
$extsizes->{$ext} += $size;
}
return $extsizes;
} # usage_by_ext
sub usage_by_depth {
my $usage = shift;
my %opts = @_;
my ($subpath, $size);
my $depthsizes = {};
# sum up disk utilization by the parent directory up to $opts{depth}
# the regex matches $opts{depth} path elements followed by a filename
foreach my $path (keys %$usage) {
# Pro Tip: m/// returns list of capture subexpressions in list context
($subpath) = ($path =~ qr(
^( # at the beginning of the line
\.? # maybe a period (relative paths)
(?:/[^/]+) # a slash, then a bunch of non-'/' chars
{$opts{depth}} # exactly <depth> of them
/ # followed by a literal '/'
) # capture #1
)x);
next unless $subpath;
$size = $usage->{$path}->{size};
$depthsizes->{$subpath} += $size
}
return $depthsizes;
} # usage_by_depth
# sorts a hash based on value; returns an array ref
sub reverse_sort_and_take_n {
my $hashref = shift;
my %opts = @_;
my $result = [];
croak "Hashref references an empty hash" unless keys %$hashref;
croak "Need 'n' option" unless exists $opts{n} and $opts{n};
# sort entries based on the value, in reverse order
my @ranked = sort { $hashref->{$b} <=> $hashref->{$a} } keys %$hashref;
# subset the ranked keys, only if the "n" option is smaller than # of keys
@ranked = @ranked[0 .. $opts{n}-1] if $opts{n} < scalar(@ranked);
# take first $n
foreach my $entry (@ranked) {
push @$result, [$entry, $hashref->{$entry}];
}
return $result;
} # reverse_sort_and_take_n
sub print_delimited {
my $arrayref = shift;
my %opts = @_;
croak "Got an empty arrayref" unless @$arrayref;
$opts{sep} = $DEFAULT_SEPARATOR unless exists $opts{sep};
foreach my $entry (@$arrayref) {
print join($opts{sep}, @$entry), "\n";
}
} # print_delimited
sub print_formatted {
my $arrayref = shift;
my %opts = @_;
my (@pwent, $fullname);
croak "Got an empty arrayref" unless @$arrayref;
croak "Need 'format' option" unless exists $opts{format} and $opts{format};
$FORMAT_NAME = $opts{format};
$FORMAT_TOP_NAME = $opts{format} . '_TOP';
$FORMAT_FORMFEED = "\n\n";
foreach my $entry (@$arrayref) {
if ($opts{format} eq 'BYOWNER') {
@pwent = getpwnam $$entry[0];
$fullname = @pwent ? $pwent[6] : '<unknown>';
$$entry[0] = "$$entry[0] ($fullname)";
} elsif ($opts{format} eq 'BYDEPTH') {
$$entry[0] =~ s/^\Q$opts{trim}\E//;
}
@LINE = ($$entry[0]);
push @LINE, $opts{human} ? humanize($$entry[1]) : $$entry[1];
write;
}
# start a new page
$FORMAT_LINES_LEFT = 0;
} # print_formatted
# source: http://perldoc.perl.org/5.26.1/perlform.html
sub swrite {
croak "usage: swrite PICTURE ARGS" unless @_;
my $format = shift;
$ACCUMULATOR = "";
formline($format, @_);
return $ACCUMULATOR;
}
sub humanize {
my $bytes = shift;
# check to see if divisible by next higher prefix (remainder != self)
return $bytes if $bytes % 1024 == $bytes;
return sprintf("%0.1fK", $bytes/1024) if $bytes % 1024**2 == $bytes;
return sprintf("%0.1fM", $bytes/1024**2) if $bytes % 1024**3 == $bytes;
return sprintf("%0.1fG", $bytes/1024**3) if $bytes % 1024**4 == $bytes;
return sprintf("%0.1fT", $bytes/1024**4) if $bytes % 1024**5 == $bytes;
return sprintf("%0.1fP", $bytes/1024**5) if $bytes % 1024**6 == $bytes;
return sprintf("%0.1fE", $bytes/1024**6); # otherwise
} # humanize
##############################################################################
## f o r m a t s ##
##############################################################################
format DISKFREE =
================================================================================
Disk usage for volume @<<<<<<<<<<<<<<<<<<<<<<<<<... @>>>>>>>>>>>>>>>>>>>>>>>>>>
$BASEPATH, $DISKFREE
================================================================================
.
format BYDEPTH_TOP =
Top disk utilization in @*, @* level(s) deep
$BASEPATH, $DEPTH
================================================================================
subpath disk utilization
------- ----------------
.
format BYDEPTH =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<... @>>>>>>>>>>>>>>>
@LINE
.
format BYOWNER_TOP =
Top disk utilization by file owner
================================================================================
file owner disk utilization
---------- ----------------
.
format BYOWNER =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<... @>>>>>>>>>>>>>>>
@LINE
.
format BYEXT_TOP =
Top disk utilization by file extension
================================================================================
extension disk utilization
--------- ----------------
.
format BYEXT =
@<<<<<<<<<<<<<<<<<<<<<<<<<<<<<... @>>>>>>>>>>>>>>>
@LINE
.
1;
__END__
=encoding utf8
=head1 NAME
reportusage - Generate a report of disk utilization
=head1 SYNOPSIS
Processes the results of a previous C<find -printf> command (see
L<"Input File Format"> for complete details) into a concise report of total disk
utilization at arbitrary depth(s) in the directory hierarchy, by file owner, and
file extension.
reportusage [-?|--help] [--manual] [-h] [-l LIMIT] [-d DEPTH[,...]] [-r]
[-s SEP] [--quiet] [--debug] [FILE]
where:
-?, --help; --manual prints a brief help message; displays the manual
-h, --human-readable uses 'K', 'M', 'G', and 'T' suffixes if appropriate
-l, --limit LIMIT limits results to LIMIT (default: 5)
-d, --depth DEPTH[,...] displays disk usage DEPTH levels into hierarchy
(default: 2); separate multiple DEPTHs with commas
--[no-]by-depth [suppresses] displays usage by depth in hierarchy
--[no-]by-owner [suppresses] displays usage by file owner
--[no-]by-extension [suppresses] displays usage by extension
-a, --all print usage in all three of the above formats
(use '--no-by-X' options to exclude specific ones)
-r, --raw means no fancy reports; plain text delimited records
-s, --separator SEP is the separator to use for '-r' / '--raw' records
-q, --[no-]quiet suppress progress messages (implied for '-r')
--debug prints detailed errors/warnings for troubleshooting
FILE is the output of a previous 'find DIR -printf ...'
invocation (see the manual); read stdin if omitted
Run C<reportusage --manual> for further details. Please report bugs at
L<https://tf.cchmc.org/s/ykbdo>.
=head1 DESCRIPTION
Command line options may be "cuddled" together (I<e.g.>, C<-hd5>), and
option/arguments may be separated with an C<=> if desired (I<e.g.>,
C<--limit=10>).
If the C<FILE> argument is omitted, C<reportusage> will read input from stdin
(or print an error message if run interactively). You can also specify C<-> as
the filename, if that makes you happy, but only one file or stream will be read
at a time.
If C<-r> / C<--raw> (raw output) is specified, then only a single C<-d> /
C<--depth> option may be specified, and only one of C<--by-depth>,
C<--by-owner>, or C<--by-extension> may be specified.
If C<--all> is given (all three report formats), you can "subtract" formats you
don't want with the C<--no-by-X> options; I<e.g.>, C<--all --no-by-owner>.
If the longest common path of all the files in the report is 1) an absolute
path; and 2) corresponds to a mounted filesystem, a summary of used/free space
for the mounted volume will be reported at the top. For relative paths, this is
automatically suppressed.
=head2 Input File Format
The C<FILE> parameter is expected to be produced by running C<find> like this:
# header is only required if $HEADER_SCAN_LINES is set (see below)
echo -e "path\tsize\tmtime\tctime\towner" > output.tsv
# format string is:
# path, size, mtime (epoch secs), atime, file owner, newline (LF)
find /path/to/dir -type f -printf "%p\t%s\t%T@\t%A@\t%u\n" >> output.tsv
Don't forget the C<-type f>, or you might get some odd results, since every line
in the input file is expected to be a file. You may wish to add some other
criteria to restrict the number of results from C<find>, such as C<-mtime +100>
or C<-size +10M>, because this will speed up the report generation.
Comments with the C<#> character are allowed (these lines are ignored when
parsing). The tab-delimited header row is not required unless
C<$HEADER_SCAN_LINES> is set to a non-zero value, and in that case, it must
appear within the first C<$HEADER_SCAN_LINES> lines of the input file or the
script will terminate with an error.
Files of the appropriate format are routinely generated and stored in
C</data/CAGE_clusterdata/.accounting> and C</data/weirauchlab/.accounting>,
with a C<.tsv> extension. Use the newest one you find there, or use the symlink
named C<latest.tsv>, if present.
=head1 EXAMPLES
=head2 Summary disk usage in current working directory, by file owner only
The C<--quiet> option suppresses printing of progress messages, which you may
not care about if the number of input records is small (in the tens of
thousands).
find . -type f -printf '%p\t%s\t%T@\t%A@\t%u\n' \
| reportusage --quiet --no-by-dir --by-owner
Remember that the C<--by-dir> option is the default, so you always have to
switch that off if you don't want it, before adding one of the other two
options.
=head2 Summary disk usage by file extension only, machine-readable
Adding C<--raw> suppresses the normal progress messages (unless you also give
C<--no-quiet>) and prints in a tab-delimited output format by default.
This is a good output format for processing the results with another tool.
find . -type f -printf '%p\t%s\t%T@\t%A@\t%u\n' \
| reportusage --raw --no-by-dir --by-extension > byext.tsv
=head2 Summary disk usage by file extension only, CSV output
The C<--separator> option can be used to specify a different output field
separator than the default of tab.
find . -type f -printf '%p\t%s\t%T@\t%A@\t%u\n' \
| reportusage --raw --limit=20 --sep=, --no-by-dir --by-ext > byext.csv
The C<--limit> option will print more than the usual top 5 entries with the
most disk utilization, which you might want if you're processing the output with
some other tool (I<e.g.>, plots with Excel). There is currently no way to ask
for "unlimited"; see #51 in the L<GitLab issue tracker|https://tf.cchmc.org/s/ykbdo>.
The longer "long" options have reasonable abbreviations, too, like C<--sep> for
C<--separator> and C<--by-ext> for C<--by-extension>. Have a look at the
C<GetOptions> invocation in the source for all the supported ones.
=head2 Summary disk usage for files ≥ 1 GB, not modified in last 100 days
The C<--limit> option will behave as described in the previous example, and the
C<--depth> option will show three different levels of hierarchy so you can zero
in on where the big files are.
find / -type f -size +1G -mtime +100 -printf '%p\t%s\t%T@\t%A@\t%u\n' \
| reportusage --limit=20 --depth=1,2,3
=head1 TROUBLESHOOTING
If you suspect problems with your input file, try the C<--debug> switch.
This can possibly reveal corrupted/partial records that could have arisen due
to, for example, embedded newlines or other funny characters in filenames
uncovered by your C<find> command (true story!).
Finally, double-check the L<"Input File Format"> section to make sure that your
input file has the correct columns.
=head1 BUGS
If you discover some behavior that could be a bug, report that behavior
L<here|https://tf.cchmc.org/s/ykbdo>.
Please include the exact command line invocation you tried and any relevant
error messages verbatim, in a L<Markdown code block|https://tf.cchmc.org/s/gitlab-markdown>.
=head1 AUTHOR
Kevin Ernst (kevin.ernst at cchmc.org)
=head1 LICENSE
MIT.
=cut
# vim: tw=80 colorcolumn=80