Develop a substantial amount of data for your database and
load it into your relations using the SQL load command. To
create the data, write a program in any programming language you like
that creates large files of records in a format acceptable to the
MySQL bulk loader, then load the data into your PDA
relations. If you are using real data, your program will need to
transform the data into files of records conforming to your PDA
schema. The rest of you will write a program to fabricate
data: your program will generate either random or nonrandom (e.g.,
sequential) records conforming to your schema. Note that it is both
fine and expected for your data values--strings especially--to be
meaningless gibberish. The point of generating large amounts of data
is so that you can experiment with a database of realistic size,
rather than the small "toy" databases often used in classes. The data
you generate and load should be on the order of:
- At least two relations with thousands of tuples
- At least one additional relation with hundreds of tuples
If the semantics of your schema includes relations that are
expected to be relatively small (e.g., states in the US),
it is fine to use some small relations, but please ensure that you
have relations of the sizes prescribed above as well. When writing a
program to fabricate data, there are two important points to keep in
mind:
- Be sure not to generate duplicate values for your key attributes.
- Your PDA almost certainly includes relations that are expected to
join with each other. For example, if your project is a dating site,
you may have a Users relation with an attribute ID
that's expected to join with the attribute userID in the relation
Dates. In generating data, be sure to generate values that
actually do join--otherwise all of your interesting queries will have
empty results! One way to guarantee joinability is to generate the
values in one relation, then use the generated values in one relation
to select joining values for the other relation. For example, you
could generate user IDs first (either sequentially or randomly), then
use these IDs to fill in the userID values in the
Dates relation.
Turn in your program code for generating or transforming data, a small
sample of the records generated for each relation (5 or so records per
relation), and a script showing the loading of your data into MySQL.