This week we were given several data sets. The task was threefold: discover how data is made available, how much data there is, and what format it is in. I have included below my findings for each data set.
The first data set was OpenStreetMap, an open source map creation site. Here is the website. I discovered a Wiki page entitled "Downloading data" which pretty much covered all the topics we are interested in. All data use is free as long as OpenStreetMap is attributed. The data can be downloaded from this page in its entirety (i.e., the entire planet) or smaller regions (such as country or state) can be downloaded from one of several third party sources, listed here. The data is already in XML format, which is a huge plus for us.
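Since the OpenStreetMap data arrives as XML, here is a minimal sketch of how we might answer the "how much data is there" question for a downloaded extract. It uses only Python's standard library and streams through the file rather than loading it all at once, which matters for anything close to planet size. The sample string and the `count_elements` helper are my own illustrations, not anything from the OpenStreetMap site; I am assuming the usual layout of `node`, `way`, and `relation` elements under a root `osm` element.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny hand-written sample in the OSM XML layout (illustrative, not real map data).
SAMPLE_OSM = """<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6">
  <node id="1" lat="40.0" lon="-75.0"/>
  <node id="2" lat="40.1" lon="-75.1">
    <tag k="amenity" v="cafe"/>
  </node>
  <way id="10">
    <nd ref="1"/>
    <nd ref="2"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def count_elements(source):
    """Stream through OSM XML and count nodes, ways, and relations.

    `source` may be a file path or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in ("node", "way", "relation"):
            counts[elem.tag] += 1
            elem.clear()  # drop the element's children once it has been counted
    return counts


counts = count_elements(io.BytesIO(SAMPLE_OSM.encode("utf-8")))
print(dict(counts))
```

For a real extract, the same function should work if we pass it the path to the downloaded file instead of the in-memory sample.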
The next two data sets are both Amazon Web Services (AWS) Public Data Sets, website here. It looks like this data is free to anyone, as long as you sign up for the free 12-month trial. This page describes how to use AWS services to download data. However, it seems that the data format depends on the specific data set, and that this information is not always given in the data set's description. Here is one thread that talks about this, although I had to really search through the forums to find it. The two data sets that we are looking at on AWS are Twilio/WiGLE.net (US street names and addresses mapped to their zip codes and latitude/longitude ranges) and TIGER/Line Shapefiles (Census 2000 data and shape files of American states, counties, subdivisions, districts, and places). It is also worth noting that the OpenStreetMap data is available via AWS as well.
The final data set we were given was broadly described as "Transportation dataset from US Dept. of Transportation providing data on aviation, maritime, highway, transit, rail, etc." I typed the beginning of this phrase into Google and discovered here that the Department of Transportation (DOT) has been publishing all of its data since September 2010 and will be finished by December 2014. Apparently, the DOT alone has contributed more than 700 data sets to the overall governmental data set site, data.gov. When I explored further I was rewarded with this site, which contains more than 150,000 governmental data sets. WOW. That is a lot of data! Here are some resources, including APIs, which the DOT provides to developers to make it easier to obtain and use all this data.
In conclusion, we have a lot of options when it comes to what kind of data we want to use. Hopefully during our next meeting we can begin to home in on the topic or theme of the application(s) that we plan to develop. The more I explore this field, the more I am amazed by the multitudinous capabilities of big data.