CSPP 53017: Assignment 1
Due by 1:59am on Tuesday, January 29, 2013
Project proposal (first draft)
First, you will need to form a team with one or two classmates.
If you would like to work on the project on your own, please discuss
it with the instructor. You can organize your team and distribute the
work among team members in any way you like, but please make sure
that everyone understands (though not necessarily implements) all
aspects of the project.
Your draft proposal should discuss the following points:
- What are the key question(s) that you would like to answer by
building the proposed data warehouse? Please be as specific as
possible, and list up to 10 questions.
- What are the data sources that you would like to use? Please be
as exhaustive as possible, but prioritize the data sources, since you
will likely end up using only a few of your top choices.
- For each data source, list some details such as whether it is
available via API or as a flat file, size in terms of number of tuples
and disk volume, and any limitations.
Please avoid any data sources that require screen-scraping
(extracting the data from the HTML pages of a web site) and only list
data sources that are publicly available.
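For a flat-file source, the size details requested above (number of
tuples and disk volume) can be measured with a short script. The
following is a minimal sketch, assuming the source is a CSV file with
one header row; the function name describe_flat_file and the file
name in the usage note are hypothetical.

```python
import csv
import os


def describe_flat_file(path, has_header=True):
    """Return (tuple_count, size_in_bytes) for a CSV flat file.

    Assumption: one record per CSV row, with an optional header row.
    """
    size_bytes = os.path.getsize(path)  # disk volume in bytes
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row, if present
        tuple_count = sum(1 for _ in reader)
    return tuple_count, size_bytes
```

For example, describe_flat_file("my_source.csv") returns the row count
and the file size in bytes, which can be reported directly in the
proposal.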
You will submit your proposal by emailing your first draft as a PDF or
text document to the instructor. Please use the following title for
your submission: CSPPDW_P1.
Problem Set
You will complete the problem set using Gradiance
(http://www.newgradiance.com/), an online learning system developed
by a team led by Professor Jeff Ullman.
You will need to create an account on Gradiance by following the instructions on the site. Please note that the URL of the site is newgradiance.com.
To sign up for our course, specify the class sign-up token provided in class.
The name of the homework is CSPPDW-HW1-Win13. All questions in this
problem set are multiple choice. However, to answer them correctly
you will need to work out their long (general) answers. A correct
answer is worth 3 points. You lose a point for each incorrect
answer. You can attempt the problem set as many times as you like; only
your highest score will count. Note that you will probably get slightly
different questions each time you take it.
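The scoring rule above can be sketched as a small calculation. This is
only an illustration, assuming that an attempt's score is not floored
at zero and that unanswered questions do not change the score (neither
is stated here); the function names are hypothetical.

```python
def attempt_score(correct, incorrect):
    """Score one attempt: +3 per correct answer, -1 per incorrect answer."""
    return 3 * correct - incorrect


def counted_score(attempts):
    """Return the highest score over all attempts.

    attempts is a list of (correct, incorrect) pairs, one per attempt.
    """
    return max(attempt_score(c, i) for c, i in attempts)
```

For example, an attempt with 5 correct and 3 incorrect answers scores
12, and a later attempt with 7 correct and 1 incorrect scores 20, so
20 is the score that counts.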
The due date for the Gradiance part of the homework is 1:59am on Tuesday, January 29, 2013.