[Oc-pm] OCPM March Meeting Discussions

Tue Mar 28 12:17:22 PDT 2017

Team,

  Here is a recap of what we discussed impromptu. We can certainly revisit
these topics for a deeper dive in the coming months.

Thanks,
Ramana

*A) dt - a data transformation/tracking framework*
Goal: The ability to compose tabular data from multiple sources in an
automation-friendly framework, and to track the results in a
revision-controlled fashion.

Sources can be structured (e.g. a relational DB, or xls/csv/tsv/ods and
similar formats) or unstructured (e.g. an HTML table in a web page).
Jobs can be scheduled for automation (say, via a cron job), or results can
be viewed interactively.
The ability to combine tables (e.g. join, pivot, group, order, subset,
union, intersection) is essential.
Persistence of results (final as well as intermediate).
Tracking to find trends over time, e.g. how a row (or a field of a table)
evolved over time.
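
To make the "combine" requirement concrete, here is a minimal sketch of an
inner join on a shared key column. It is written in Python with only the
standard library and is not how dt does it; the helper names (read_table,
inner_join) and the sample rows are made up for illustration.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into a list of dict rows keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on a shared key column.

    Columns with the same name in both tables (other than the key) are
    overwritten by the right-hand value; a real tool would rename them.
    """
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            merged = dict(lrow)
            merged.update({k: v for k, v in rrow.items() if k != key})
            joined.append(merged)
    return joined


# Hypothetical sample data, only to exercise the join.
people = read_table("Name,City\nalice,Irvine\nbob,Tustin\n")
teams = read_table("Name,Team\nalice,blue\ncarol,red\n")
result = inner_join(people, teams, "Name")
```

Only "alice" appears in both inputs, so she is the only row in the result.
Pivot, group and the other operations would need their own helpers.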

Imagine a use case: combine diverse data (from an MS Excel file on Dropbox,
a Numbers worksheet on a Mac, and an HTML table in a web page) using web
services (e.g. to find a stock price), put the result into a SQLite
database, and push it to a web page and a Google Sheet for users to view.
The same result can be viewed interactively from a terminal, with periodic
refresh.

Wow...that seems like a valuable weapon that saves tons of cycles, at least
in my world. :-)

We reviewed version 0.01 of the tool during yesterday's OCPM meeting.
The tool was built using easydatabase
(https://sites.google.com/site/easydatabase/) and sqlite3, using Perl's
ability to glue diverse data sources together.

[snip]
$ dt infile=test.csv [outformat=psv]
              Formats input to uniform-width 'psv' (pipe-separated values)
$ cat test.csv | dt
              Takes input from a pipe
$ dt infile=test.csv infile=test1.csv \
    command='$dt->[0]=$dt->[0]->join($dt->[1], 0, ["Name"], ["Name"], {renameCol => 1})' \
    outformat=xls outfile=t.xls
               Composes tables and stores results
$ cat t.csv | dt informat=csv command='dt2db($dt->[0], undef, "t.db", "t")'
                Creates persistent DB tables from in-memory Data::Table
                objects
$ sqlite3 -header t.db "select * from t"
                DB and SQL access
$ cat t.csv | dt informat=csv command='dt2db($dt->[0], undef, "t.db", "t")'
                 When data changes, only changed rows are updated, giving a
                 time history of the data
[/snip]
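
One way the "only changed rows are updated" idea could work is sketched
below: each load appends a new version of a row only when it differs from
the latest stored version for the same key. This is a Python illustration
using the standard sqlite3 module, not the actual dt2db implementation; the
function name track, the table layout (version, row_key, payload, seen_at)
and the sample rows are all assumptions.

```python
import json
import sqlite3


def track(conn, table, rows, key, stamp):
    """Append a new version of each row only if it changed.

    `table` is interpolated into the SQL, so it must be a trusted name.
    Returns the number of row versions written.
    """
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" ('
        "version INTEGER PRIMARY KEY AUTOINCREMENT, "
        "row_key TEXT NOT NULL, payload TEXT NOT NULL, seen_at TEXT NOT NULL)"
    )
    written = 0
    for row in rows:
        # Serialize with sorted keys so equal rows compare equal as text.
        payload = json.dumps(row, sort_keys=True)
        latest = conn.execute(
            f'SELECT payload FROM "{table}" WHERE row_key = ? '
            "ORDER BY version DESC LIMIT 1",
            (row[key],),
        ).fetchone()
        if latest is None or latest[0] != payload:
            conn.execute(
                f'INSERT INTO "{table}" (row_key, payload, seen_at) '
                "VALUES (?, ?, ?)",
                (row[key], payload, stamp),
            )
            written += 1
    conn.commit()
    return written


# Hypothetical data: two loads, where only row "b" changes in between.
conn = sqlite3.connect(":memory:")
first = track(conn, "t_history",
              [{"Name": "a", "Qty": "1"}, {"Name": "b", "Qty": "2"}],
              "Name", "2017-03-27")
second = track(conn, "t_history",
               [{"Name": "a", "Qty": "1"}, {"Name": "b", "Qty": "3"}],
               "Name", "2017-03-28")
history_b = [
    json.loads(payload)
    for (payload,) in conn.execute(
        'SELECT payload FROM "t_history" WHERE row_key = ? ORDER BY version',
        ("b",),
    )
]
```

The first load writes both rows, the second writes only the changed row
"b", and history_b then holds both versions of that row in order. Rows that
disappear from the input are not recorded in this sketch.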

Question: Is there a tool out there that solves this class of problems? If
so, we can learn and adopt; else we can refine and release the tool.

*B) Webscraping question*

How do we fetch the JSON from a URL such as
"https://www.tipranks.com/api/stockInfo/getDetails/?name=aapl" from a Perl
script?
It looks like requests sent with the usual headers run into some challenge.
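
As a starting point for experiments, here is a minimal sketch of sending
browser-like headers with a request. It is in Python rather than Perl, and
it rests on an unverified guess: that the server objects to the HTTP
library's default User-Agent. The header values and the helper names
(build_request, fetch_json) are made up for illustration.

```python
import json
import urllib.request

# Assumption: browser-like headers are what the server wants to see.
# This has not been checked against the site in question.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "application/json",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Build a GET request that carries the given headers."""
    return urllib.request.Request(url, headers=dict(headers))


def fetch_json(url, timeout=10):
    """Fetch a URL with browser-like headers and decode the body as JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example call (needs network access, so it is left commented out):
# data = fetch_json("https://www.tipranks.com/api/stockInfo/getDetails/?name=aapl")
```

In Perl, the same experiment could presumably be done with LWP::UserAgent
by setting the agent string and a default Accept header. If the challenge
turns out to involve cookies or JavaScript, headers alone will not be
enough.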