Week 7, July 27-August 2
- Seth Kurke
- Aug 3, 2020
Updated: Aug 10, 2020

Here we are at the homestretch! It will be a busy two weeks, and I am pushing harder until I reach the finish line. This project, now that I am working on my third deliverable, took a detour last week. My goal was to start learning a new software program, OpenRefine. But for the purposes of this project, something like Microsoft Access would have been more appropriate. I have Microsoft Access experience, so I decided to continue with Excel and eschew OpenRefine. This is not to denigrate the software, but I think it was not right for the project. The software requires a completed spreadsheet or document to upload, and then it separates the data into metadata categories. That seems redundant, since I need to make the spreadsheet first. I understand how it could be beneficial within the scope of a different project, but since my deliverables need to be in the form of an MS Excel spreadsheet, I have decided to tackle the data and complete as much as I can.
This doesn't mean I am putting my head down and bulldozing through the data. Each collection demands full attention because there is such variety in the resources. This means there is a minimum of cutting and pasting, and a maximum of skimming and browsing documents to find the information needed to fill in the blank cells. At this juncture, one of the most complex areas to address is the "contributor" category. Many of the documents were submitted by a chairperson (chairman, really, because most of the documents were written in the 1940s and 1950s) within the US Senate or House of Representatives. These groups of congresspeople are then divided into committees. Oftentimes the committee stands on its own, with no individuals named, and it is marked as the contributor. Sometimes the name of the committee changes while it is still tackling the same topic. I can also mark the committee as the "organization responsible for publication," since there is no true publisher and this group is the one that wrote the document and printed it. This is why I decided not to have a Publisher category. These resources were published through government channels and are not bound by the rules of a publishing house. Sometimes these reports were printed and distributed to a limited group of people, meaning they were not for public consumption, at least not at the time of printing and distribution.
Yet sometimes I want to find persons of importance within these documents. Maybe someone is searching for "Chester Nimitz" or "Charles Elston." Well, sometimes their fingerprints are all over the documents, though their names aren't mentioned. This is where some research comes into play. Many times you may get a name, but with only initials for the first name, and more research is involved. Google deserves some credit, but so do my searching skills. Multiply this by hundreds of documents, soon to be nearly 1,000, and you can fill up dozens of hours fast. This is the plan for the homestretch. I have two weeks to go. Wish me luck!