From mvr707 at gmail.com  Tue Mar 28 12:17:22 2017
From: mvr707 at gmail.com (Ramana V Mokkapati)
Date: Tue, 28 Mar 2017 12:17:22 -0700
Subject: [Oc-pm] OCPM March Meeting Discussions
Message-ID:

Team,

Sharing a recap of what we discussed impromptu. We can certainly revisit it for a deep dive in the coming months.

Thanks,
Ramana

*A) dt - a data transformation/tracking framework*

Goal: The ability to compose tabular data from multiple sources in an automation-friendly framework, AND to track the results in a revision-controlled fashion.

Sources can be structured (e.g. relational DB, xls/csv/tsv/ods and similar formats) or unstructured (e.g. an HTML table in a webpage).
Automation can schedule the jobs (say, via a cron job), or the results can be viewed interactively.
The ability to combine tables (e.g. join, pivot, group, order, subset, union, intersection) is essential.
Persistence of results (final as well as intermediate).
Tracking to find trends over time, e.g. how a row (or a field of a table) evolved.

Imagine a use case: combine diverse data (from an MS Excel file on Dropbox, a Numbers worksheet on a Mac, and an HTML table in a webpage) using web services (e.g. to find a stock price), put the result into a SQLite database, and push the result to a web page and a Google Sheet for users to view. The same result can be viewed interactively from a terminal, with periodic refresh.

Wow...that seems like a valuable weapon that saves tons of cycles, at least in my world. :-)

We reviewed version 0.01 of the tool during yesterday's OCPM meeting. The tool was built using easydatabase (https://sites.google.com/site/easydatabase/) and sqlite3, using Perl's ability to glue diverse data sources together.
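To make the tracking goal above a bit more concrete, here is a minimal sketch of the idea in plain Perl. It is not how dt does it; the helper name track_changes, the in-memory history hash, the run labels and the sample values are all made up for illustration. It only shows the principle of storing a new version of a row when that row actually changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of dt): append a new version of a row
# to its history only when the row differs from the last stored version.
#   $history : hashref, row key => arrayref of stored versions
#   $rows    : arrayref of hashrefs (the current table)
#   $key     : name of the column that identifies a row
#   $stamp   : label for this snapshot (a timestamp in real use)
sub track_changes {
    my ($history, $rows, $key, $stamp) = @_;
    my $changed = 0;
    for my $row (@$rows) {
        my $versions = $history->{ $row->{$key} } ||= [];
        # Fingerprint the row so two versions can be compared cheaply
        my $fingerprint = join "\x1f",
            map { "$_=" . ($row->{$_} // '') } sort keys %$row;
        next if @$versions && $versions->[-1]{fingerprint} eq $fingerprint;
        push @$versions,
            { at => $stamp, fingerprint => $fingerprint, row => {%$row} };
        $changed++;
    }
    return $changed;
}

my %history;

# Made-up sample data: two runs of the same table, one row changes
my $n1 = track_changes(\%history,
    [ { Name => 'AAPL', Price => 140 }, { Name => 'GOOG', Price => 820 } ],
    'Name', 'run1');
my $n2 = track_changes(\%history,
    [ { Name => 'AAPL', Price => 143 }, { Name => 'GOOG', Price => 820 } ],
    'Name', 'run2');

# Show how each row evolved over time
for my $name (sort keys %history) {
    print "$name: ",
        join(' -> ', map { "$_->{row}{Price} ($_->{at})" } @{ $history{$name} }),
        "\n";
}
```

In this sketch the second run stores one new version (the changed row) and skips the unchanged one, so the history grows only when data changes. A real tool would keep the history in a database table rather than in memory.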
[snip]
$ dt infile=test.csv [outformat=psv]
    Formats input to uniform-width 'psv' (pipe-separated values)

$ cat test.csv | dt
    Takes input from a pipe

$ dt infile=test.csv infile=test1.csv command='$dt->[0]=$dt->[0]->join($dt->[1], 0, ["Name"], ["Name"], {renameCol => 1})' outformat=xls outfile=t.xls
    Composes tables and stores results

$ cat t.csv | dt informat=csv command='dt2db($dt->[0], undef, "t.db", "t")'
    Creates persistent DB tables from in-memory Data::Table objects

$ sqlite3 -header t.db "select * from t"
    DB and SQL access

$ cat t.csv | dt informat=csv command='dt2db($dt->[0], undef, "t.db", "t")'
    When data changes, only changed rows are updated, giving a time history of the data
[/snip]

Question: Is there a tool out there that solves this class of problems? If so, we can learn from it and adopt it; otherwise we can refine and release the tool.

*B) Webscraping question*

How do we get the JSON from a URL, e.g. "https://www.tipranks.com/api/stockInfo/getDetails/?name=aapl", via a Perl script? There seems to be some challenge with the usual headers.
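As a starting point for the webscraping question, here is a minimal sketch using only core modules (HTTP::Tiny and JSON::PP). The header values are assumptions: it is not known which headers this site actually checks, and if the site relies on a JavaScript or cookie challenge, headers alone will not be enough. The helper names browser_like_headers and fetch_json are made up for illustration. Note that HTTP::Tiny needs IO::Socket::SSL and Net::SSLeay installed for https URLs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Assumed browser-like headers; which of these (if any) the site
# really requires is unknown and needs experimenting.
sub browser_like_headers {
    return {
        'User-Agent'      => 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
                           . '(KHTML, like Gecko) Chrome/56.0 Safari/537.36',
        'Accept'          => 'application/json, text/plain, */*',
        'Accept-Language' => 'en-US,en;q=0.9',
    };
}

# Fetch a URL with the headers above and decode the body as JSON.
sub fetch_json {
    my ($url) = @_;
    my $http = HTTP::Tiny->new(timeout => 20);
    my $res  = $http->get($url, { headers => browser_like_headers() });
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json($res->{content});
}

# Only touch the network when a URL is given on the command line
if (@ARGV) {
    my $data = fetch_json($ARGV[0]);
    print JSON::PP->new->pretty->canonical->encode($data);
}
```

Saved as, say, getjson.pl (a made-up file name), it would be run with the URL as its only argument. If the request still fails, comparing these headers with what a browser sends for the same URL (via the browser's developer tools) is a reasonable next step.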