From mark at purdue.edu Wed Jan 2 08:18:43 2019
From: mark at purdue.edu (Mark Senn)
Date: Wed, 02 Jan 2019 11:18:43 -0500
Subject: [Purdue-pm] hardware at January 9 meeting
Message-ID: <41086.1546445923@pier.ecn.purdue.edu>

Purdue Perl Mongers,

I'll bring the following hardware to the January 9 meeting (5:30 pm,
enter at EE 270) so people can see it and try it. I don't have a laptop
to hook these up to, so it will just be a chance to see if each one
fits your hand, etc.

KEYBOARD.IO MODEL 01 KEYBOARD
A fully programmable ergonomic keyboard. See https://shop.keyboard.io
for a picture. I'm programming this so I'll only use the thumb keys,
the home row, and the rows directly above and below the home row, and
will never need to press more than one key at a time.

TYPEMATRIX 2020 KEYBOARD
See http://www.ergonomic-keyboard.us/reviews/typematrix-2020 for a
picture and review of the 2020. The 2020 is my all-time favorite
keyboard. I haven't used the Model 01 enough to get used to it. The
TypeMatrix 2020 is no longer made. The TypeMatrix 2030 is slightly
narrower but is too small for me to use comfortably. See
http://www.ergonomic-keyboard.us/reviews/typematrix-2030/ for a picture
and review of the 2030. I love the feel of the keys and how they don't
need to be pushed far to activate.

CLEARLY SUPERIOR TECHNOLOGIES CST2545W-RC 2" TRACKBALL
See
https://www.amazon.com/CST2545W-L-TRAC-Wired-Performance-Trackball/dp/B00ECHGE3O
for a picture and reviews (ignore the fact that the model numbers don't
match---don't know why---Amazon claims this is the model I bought).
My all-time favorite pointing device.

-mark

From mark at purdue.edu Thu Jan 24 10:27:36 2019
From: mark at purdue.edu (Mark Senn)
Date: Thu, 24 Jan 2019 13:27:36 -0500
Subject: [Purdue-pm] challenge problem: sentiment analysis
Message-ID: <31035.1548354456@pier.ecn.purdue.edu>

Purdue Perl Mongers,

A person (I don't know offhand if they want to be identified)
demonstrated how to get information from federalregister.gov and/or (I
forget for sure) regulations.gov during our last meeting using an API.

From
    https://www.regulations.gov/document?D=EPA-HQ-OAR-2017-0355-21117
    EPA received more than 270,000 comments on the ANPRM, which have
    informed this proposed rulemaking.

From
    https://www.wolframalpha.com/input/?i=270000+seconds
    [270000 seconds is] 3.125 days

Challenge problem: figure out how to use the API for regulations.gov and
"sentiment analysis" (google it) to automatically classify comments. I
understand regulations.gov limits the rate at which one can download
information, but if some "sentiment analysis" software can automatically
classify comments faster/better/cheaper than humans or other existing
software on a small trial, regulations.gov may be interested in that. I
certainly wouldn't want to read 270K comments and summarize them.

-mark

From zeewfo at gmail.com Sun Jan 27 06:31:47 2019
From: zeewfo at gmail.com (dsk)
Date: Sun, 27 Jan 2019 09:31:47 -0500
Subject: [Purdue-pm] challenge problem: sentiment analysis
In-Reply-To: <31035.1548354456@pier.ecn.purdue.edu>
References: <31035.1548354456@pier.ecn.purdue.edu>
Message-ID:

Interesting challenge. A quick search on CPAN led me to the Text::Mining
package. Has anyone used it for this type of project?

https://metacpan.org/pod/Text::Mining

With the government shutdown, regulations.gov may not be approving new
API keys. If anyone needs some example comments, I can put together a
small archive.

Thanks,
dsk

On Thu, Jan 24, 2019 at 1:27 PM Mark Senn wrote:

> Purdue Perl Mongers,
>
> A person (I don't know offhand if they want to be identified)
> demonstrated how to get information from federalregister.gov and/or (I
> forget for sure) regulations.gov during our last meeting using an API.
>
> From
> https://www.regulations.gov/document?D=EPA-HQ-OAR-2017-0355-21117
> EPA received more than 270,000 comments on the ANPRM, which have
> informed this proposed rulemaking.
>
> From
> https://www.wolframalpha.com/input/?i=270000+seconds
> [270000 seconds is] 3.125 days
>
> Challenge problem: figure out how to use the API for regulations.gov and
> "sentiment analysis" (google it) to automatically classify comments. I
> understand regulations.gov limits the rate at which one can download
> information, but if some "sentiment analysis" software can automatically
> classify comments faster/better/cheaper than humans or other existing
> software on a small trial, regulations.gov may be interested in that. I
> certainly wouldn't want to read 270K comments and summarize them.
>
> -mark
> _______________________________________________
> Purdue-pm mailing list
> Purdue-pm at pm.org
> https://mail.pm.org/mailman/listinfo/purdue-pm
>

From zeewfo at gmail.com Mon Jan 28 09:38:51 2019
From: zeewfo at gmail.com (dsk)
Date: Mon, 28 Jan 2019 12:38:51 -0500
Subject: [Purdue-pm] challenge problem: sentiment analysis
In-Reply-To:
References: <31035.1548354456@pier.ecn.purdue.edu>
Message-ID:

I put together a couple of files that hold 2438 comment records from the
https://www.regulations.gov/docket?D=EPA-HQ-OAR-2017-0355 docket.

In https://x646b.org/projects/ppm/sentiment/ , the .csv file is a
comma-separated values file and the .db file is a SQLite file with the
same information. They have two columns of interest, comment_text and
attachments. comment_text holds the text of the comment record.
If the comment text says something like "See attached" or "See attached
file(s)" then the comment text is only available as an attachment and
the name of the attachment is stored in the attachments column. The
attachments have been archived and can be downloaded as a separate .tgz
file from the same directory as the other two files.

Thanks,
dsk
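
The comment_text / attachments convention described in the last message
can be sketched roughly as follows, in Python. This is only an
illustration under stated assumptions: the column names comment_text and
attachments come from the message, but the sample rows, the attachment
file names, and the exact "See attached" pattern are hypothetical, and
the real files may word the placeholder differently.

```python
import csv
import io
import re

# Assumption: a comment is a placeholder when, ignoring case and trailing
# punctuation, it is just "See attached" or "See attached file(s)".
# The real data may use other wordings; adjust after inspecting it.
PLACEHOLDER = re.compile(
    r"^\s*see\s+attached(\s+file(s|\(s\))?)?\s*[.!]*\s*$",
    re.IGNORECASE,
)


def is_placeholder(comment_text):
    """Return True when the comment text only points at an attachment."""
    return bool(PLACEHOLDER.match(comment_text or ""))


def split_comments(rows):
    """Split rows into (inline_texts, attachment_names).

    Each row is a mapping with 'comment_text' and 'attachments' keys.
    Rows whose text is a placeholder contribute their attachment name;
    all other rows contribute the comment text itself.
    """
    inline, attached = [], []
    for row in rows:
        text = row.get("comment_text") or ""
        if is_placeholder(text):
            attached.append(row.get("attachments") or "")
        else:
            inline.append(text)
    return inline, attached


# Hypothetical sample standing in for the real .csv file.
SAMPLE_CSV = """comment_text,attachments
I support the proposed rule.,
See attached file(s),comment_0001.pdf
See attached,letter.docx
"Please see my notes, attached for reference, and reconsider.",notes.pdf
"""

if __name__ == "__main__":
    rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
    inline, attached = split_comments(rows)
    print(len(inline), "inline comments;", len(attached), "attachment-only")
```

Only the inline texts can go straight to a sentiment classifier; the
attachment-only records would first need their text extracted from the
files in the .tgz archive. The same split_comments function should work
on rows read from the .db file, provided they are presented as mappings
with the same two keys.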