Originally published on 08-10-2011 09:22 AM
In many situations you may have large binary objects, for example, an image, that you want to move between a file and a database table or between database tables. If the binary content is less than 64K, the Read File, Read Table, Write File, and Write Table operators can process this content. But if your binary content is larger than this limit, these operators are no longer suitable and you need to use the Read Custom and Write Custom operators. However, moving large binary content through a data integration application is not a best practice as it would significantly slow the application and require extraordinarily large memory allocations. This article details how an expressor data integration application can work with these large data objects in a way that is easy to implement and minimizes the impact on performance.
When working with these large binary objects, whether they are stored in files or database tables, you will manipulate the content using expressor Datascript in the Read Custom and Write Custom operators. If the data is in a file, the I/O library functions can be used to read and write the content. If the data is in a database table, you will use the Expressor Datascript SQL Module to extract the content from the table and database management system utilities (such as the Oracle SQL Loader or the SQL Server SQL Command utilities) to write the content. The best way to learn these techniques is by reviewing some examples. Let's start by reading content from a file.
Suppose you have an image file named Download.jpg.
To read the content of this file into expressor, use the I/O library functions open and read to assign the content to a variable in your code.
However, reading a large binary object into your code is not something you will do with any regularity as you will most likely not be able to pass the content through the remainder of the dataflow.
What you are more likely to do is to read binary content from a database table and write it to a file. For this you will use the expressor Datascript SQL Module, as shown in the following example.
The statement in line 1 makes the functionality of the expressor Datascript SQL Module available to the coding within the Read Custom operator. Then in the initialize function a connection to the source database is established. This connection is based on an existing ODBC DSN, which in this example is named MSSQLDemo and is directed against a SQL Server database. Be sure to create the ODBC DSN using the expressor-3 drivers. The remaining arguments are credentials (username and password) to connect to the database. The statements in the finalize function release the database connection.
Lines 11-12 in the read function execute a SELECT statement against the source table, which returns a cursor object. Line 13 then fetches the first entry from the result set.
The while loop within the read function carries out the processing logic. In this example, two things are happening. The binary data, contained in the result set field named image, is written to a file (lines 19-23).
file_name=".\\" .. tostring(row.entry) .. ".bmp"
while the other result set fields are used to initialize the record that will be emitted by the operator (lines 17-18). In line 24, the next entry is retrieved from the result set. Each time an initialized output record is returned, it is emitted and the read function re-invoked. When there are no more entries in the result set, the while loop ends and the read function returns nil. To the expressor runtime, the nil return indicates that the operator has processed all available data.
For each entry in the result set, a file containing the binary data has been written. Note how the name of each file includes some identifying information from other result set fields, in this example, the value in the field entry, which is the primary key column in the table.
Writing the binary data to a database table currently requires use of the utilities supplied with each relational database management system. For example, with SQL Server, the sqlcmd utility provides the necessary functionality while for Oracle you use the sqlldr utility.
This is an example of the Write Custom operator code used to write the binary content to an Oracle database table.
The sqlldr utility is capable of directly loading the content from the file into the database, so there is no need to explicitly read the file.
Lines 7-12 are used to set up a record specific detail file used by the sqlldr utility. As each record is processed by the Write Table operator, this file is rewritten containing content specific to the record being processed. In this simple example, the file contains three comma separated fields: the record identifier named entry, a company name, and the name of the file containing the binary content.
Line 14 sets up the sqlldr command, which is dependent on a loader control file named loader2.ctl. Line15 then runs the sqlldr utility. Note how the OS library execute function can be used to invoke a process external to the expressor dataflow. The control file provides the sqlldr utility with the information it needs to insert data into the table. Once you have written the control file, place it into the external sub-directory under the project; this makes the file a artifact in the project and insures that it will be available to the sqlldr utility.
INFILE 'images.txt' "STR X'0d0a'"
INTO TABLE oimage2
FIELDS TERMINATED BY ','
ext_fname FILLER CHAR(40),
image LOBFILE(ext_fname) TERMINATED BY EOF)
When the table insert operation succeeds, the record specific detail file and the binary file are deleted (lines 17-18).
This next screen shot shows an example of the coding used to write to a SQL Server table.
To demonstrate a slightly different approach, this example first inserts a row into the table and then updates the row with the binary content. Note how the initial insert operation is executed using the functionality of the expressor Datascript SQL Module (lines 7-8 and 13-15) while the update is effected through the SQL Server sqlcmd utility (lines 17-19). If both the initial insert and the update operations succeed, the binary file is deleted.
The sqlcmd utility is dependent on the script file myscript.sql, which contains the following statements.
UPDATE demo.dbo.image3 SET image=
(SELECT * FROM OPENROWSET (BULK N'C:\image\$(fn)',
SINGLE_BLOB)as foo) WHERE entry=$(ent);
After you have written this script file, place it into the external sub-directory under the project; this makes the file a artifact in the project and insures that it will be available to the sqlcmd utility.
In lines 17 and 18, two command line arguments - fn and ent - were initialized. The values in these arguments are incorporated into the statement executed by the sqlcmd utility. The placeholder $(fn) will be replaced by the name of the binary file to load and the placeholder $(ent) becomes the primary key of the record to update.