CS 235: Assignment 3

Due in class, Wednesday, April 23, 2007


PDA: Part 3

The third part of your PDA project is to create a database in MySQL, populate it with some real and lots of fabricated data and run several interesting queries.
  1. Get a MySQL account. For specific instructions on how to get a MySQL account, login to MySQL, create relations, etc. please, refer to Getting Started With MySQL. Get acquainted with MySQL, change your initial password, experiment with some simple commands, and check out the help command. This part is for your enjoyment only - there is nothing to hand in.

  2. Create relations for your PDA based on your final relational schema from Part 2. In addition to creating the appropriate attributes and types, please declare primary keys for your relations. The attribute types supported by MySQL are described in the MySQL online documentation.

    Turn in a script showing a MySQL session in which your relations are created successfully. Also show, for each relation, the result of the mysql DESCRIBE command once the relation has been created: for a relation T, type "DESCRIBE T;". Please see Recording Your Session from the Getting Started With MySQL document for a guide to preparing output to be submitted for this and subsequent project parts.

  3. Develop a small amount (10-20 tuples per relation) of real data and load it into your relations using the SQL load command. See Loading Data in MySQL for information on how to bulk-load data.

    Turn in a script showing the loading of your data into MySQL.

  4. Develop a substantial amount of data for your database and load it into your relations using the SQL load command. To create the data, write a program in any programming language you like that creates large files of records in a format acceptable to the MySQL bulk loader, then load the data into your PDA relations. If you are using real data, your program will need to transform the data into files of records conforming to your PDA schema. The rest of you will write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values--strings especially--to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than the small "toy" databases often used in classes. The data you generate and load should be on the order of:

    If the semantics of your schema includes relations that are expected to be relatively small (e.g., states in the US), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

    1. Be sure not to generate duplicate values for your key attributes.

    2. Your PDA almost certainly includes relations that are expected to join with each other. For example, if your project is a dating site, you may have a Users relation with an attribute ID that's expected to join with the attribute userID in the relation Dates. In generating data, be sure to generate values that actually do join--otherwise all of your interesting queries will have empty results! One way to guarantee joinability is to generate the values in one relation, then use the generated values in one relation to select joining values for the other relation. For example, you could generate user IDs first (either sequentially or randomly), then use these IDs to fill in the userID values in the Dates relation.

    Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the loading of your data into MySQL.

  5. Develop and test:

    Please note:


    There is no problem set this week.