Using the Oracle Bulk Loader

Overview
Creating the Control File
Creating the Data File
Loading Your Data
Loading Without a Separate Data File
Loading DATE Data
Entering NULL Values

Overview To use the Oracle bulk loader, you need a control file, which specifies how data should be loaded into the database; and a data file, which specifies what data should be loaded. You will learn how to create these files in turn.

Creating the Control File A simple control file has the following form:

LOAD DATA
INFILE <dataFile>
APPEND INTO TABLE <tableName>
FIELDS TERMINATED BY '<separator>'
(<list of all attribute names to load>)

<dataFile> is the name of the data file. If you did not give a file name extension for <dataFile>, Oracle will assume the default extension ".dat". Therefore, it is a good idea to name every data file with an extension, and specify the complete file name with the extension.
<tableName> is the name of the table to which data will be loaded. Of course, it should have been created already before the bulk load operation.
The optional keyword APPEND says that data will be appended to <tableName>. If APPEND is omitted, the table must be empty before the bulk load operation or else an error will occur.
<separator> specifies the field separator for your data file. This can be any single character. It is a good idea to use a character that you know will never appear in the data, so the separator will not be confused with data fields.
Finally, list the names of attributes of <tableName> that are set by your data file, separated by commas and enclosed in parentheses. This list need not be the complete list of attributes in the actual schema of the table, nor must it be arranged in the same order as the attributes when the table was created -- sqlldr will match attributes to by their names in the table schema. Any attributes unspecified in the list of attributes will be set to NULL.

As a concrete example, here are the contents of a control file test.ctl:

LOAD DATA
INFILE test.dat
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)

Creating the Data File Each line in the data file specifies one tuple to be loaded into <tableName>. It lists, in order, values for the attributes in the list specified in the control file, separated by <separator>. As a concrete example, test.dat might look like:

1|foo
2|bar
3| baz

Recall that the attribute list of test specified in test.ctl is (i, s), where i has the type int, and s has the type char(10). As the result of loading test.dat, the following tuples are inserted into test:

(1, 'foo')
(2, 'bar')
(3, ' baz')

Warning: Note that the third line of test.dat has a blank after "|". This blank is not ignored by the loader. The value to be loaded for attribute s is ' baz', a four-character string with a leading blank. It is a common mistake to assume that 'baz', a three-character string with no leading blank, will be loaded instead. This can lead to some very frustrating problems that you will not notice until you try to query your loaded data, because ' baz' and 'baz' are different strings.

Note: Oracle literally considers every single line to be one tuple, even an empty line! When it tries to load data from an empty line, however, an error would occur and the tuple will be rejected. Some text editors love to add multiple newlines to the end of a file; if you see any strange errors in your .log file about tuples with all NULL columns, this may be the cause. It shouldn't affect other tuples loaded.

Loading Your Data The Oracle bulk loader is called sqlldr. It is a UNIX-level command, i.e., it should be issued directly from your UNIX shell, rather than within sqlplus. A bulk load command has the following form:

sqlldr <yourName>@weasel control=<ctlFile> log=<logFile> bad=<badFile>

Everything but sqlldr is optional -- you will be prompted for your username, password, and control file. <ctlFile> is the name of the control file. If no file name extension is provided, sqlldr will assume the default extension ".ctl". The name of the data file is not needed on the command line because it is specified within the control file. You may designate <logFile> as the log file. If no file name extension is provided, ".log" will be assumed. sqlldr will fill the log file with relevant information about the bulk load operation, such as the number of tuples loaded, and a description of errors that may have occurred. Finally, you may designate <badFile> as the file where bad tuples (any tuples for which an error occurs on an attempt to load them) are recorded (if they occur). Again, if no file extension is specified, Oracle uses ".bad". If no log file or bad file are specified, sqlldr will use the name of the control file with the .log and .bad extensions, respectively.

As a concrete example, if sally wishes to run the control file test.ctl and have the log output stored in test.log, then she should type

sqlldr sally control=test.ctl log=test.log

Reminder: Before you run any Oracle commands such as sqlldr and sqlplus, make sure you have already set up the correct environment by sourcing ~matei/bin/cs235.sh (or ~matei/bin/cs235.csh - see Getting Started With Oracle).

Loading Without a Separate Data File It is possible to use just the control file to load data, instead of using a separate data file. Here is an example:

LOAD DATA
INFILE *
INTO TABLE test
FIELDS TERMINATED BY '|'
(i, s)
BEGINDATA
1|foo
2|bar
3| baz

The trick is to specify "*" as the name of the data file, and use BEGINDATA to start the data section in the control file.

Loading DATE Data The DATE datatype can have its data loaded in a format you specify with considerable flexibility. First, suppose that you have created a relation with an attribute of type DATE:

CREATE TABLE foo (
    i int,
    d date
);

In the control file, when you describe the attributes of foo being loaded, you follow the attribute d by its type DATE and a date mask. A date mask specifies the format your date data will use. It is a quoted string with the following conventions:

Sequences of d, m, or y, denote fields in your data that will be interpreted as days, months, years, respectively. As with almost all of SQL, capitals are equally acceptable, e.g., MM is a month field.
The lengths of these fields specify the maximum lengths for the corresponding values in your data. However, the data can be shorter.
Other characters, such as dash, are treated literally, and must appear in your data if you put them in the mask.

Here is an example control file:

LOAD DATA
INFILE *
INTO TABLE foo
FIELDS TERMINATED BY '|'
(i, d DATE 'dd-mm-yyyy')
BEGINDATA
1|01-01-1990
2|4-1-1998

Notice that, as illustrated by the second tuple above, a field can be shorter than the corresponding field in the date mask. The punctuation "-" tells the loader that the day and month fields of the second tuple terminate early.

Entering NULL Values

You may specify NULL values simply by entering fields in your data file without content. For example, if we were entering integer values into a table with schema (a, b, c) specified in the .ctl file, the following lines in the data file:

3||5
|2|4
1||6
||7

would result in inserting the following tuples in the relation:

(3, NULL, 5)
(NULL, 2, 4)
(1, NULL, 6)
(NULL, NULL, 7)

Keep in mind that any primary keys or other constraints requiring that values be non-NULL will reject tuples for which those attributes are unspecified.

Note:If the final field in a given row of your data file will be unspecified (NULL), you have to include the line TRAILING NULLCOLS after the FIELDS TERMINATED BY line in your control file, otherwise sqlldr will reject that tuple. sqlldr will also reject a tuple whose columns are all set to NULL in the data file.

If you do not wish to enter values for any row of a given column, you can, as mentioned above, leave that column out of the attribute list altogether.

This document was written originally for Prof. Jeff Ullman's CS145 class in Autumn, 1997; revised by Jun Yang for Prof. Jennifer Widom's CS145 class in Spring, 1998; further revisions by Jeff Ullman, Autumn, 1998; further revisions by Srinivas Vemuri for Prof. Jeff Ullman's CS145 class in Autumn, 1999; further revisions by Nathan Folkert for Prof. Jennifer Widom's CS145 class in Spring, 2001.