
Saturday, February 27, 2010

Case Study: Oracle's JDeveloper IDE and Its Use for Data Migration

This case study introduces readers to JDeveloper's powerful BC4J technology and how it was used to create a Java application that performs a data migration. It is by no means an exhaustive treatment, nor does it imply that the implementation described here is the only way to build such an application. We offer it simply as an example that describes common problems, solutions, and the relevant technology, to aid readers with their own data-migration ventures.
The following topics will be discussed as they all relate to data migration:
The design pattern implemented in the application for code maintainability,
Where, when, and why to write custom code in the ADF-generated Java classes,
Solutions to some common problems encountered before and during implementation of the application,
Unit testing, and a simple design pattern that makes testing relatively easy and maintainable,
Exception handling.

On my most recent project, I was tasked with performing a data migration from one database to another. Contract requirements, by which my team and I were bound, required us to use the JDeveloper IDE (although Eclipse was preferred, and we will describe ways in which we also made use of Eclipse). This proved to be a major advantage over the open-source alternatives, mainly because of ADF's Business Components for Java (BC4J) technology, which is available from within JDeveloper.

With BC4J we can create accurate object-relational mappings of database tables as entity objects; an entity object represents a row in a database table and encapsulates its attributes and business rules. These entity objects can then be used to create one or more view objects, each based on one or more entity objects (or on no entity object at all, in the case of a transient view object, to be discussed in a later article). To access the data model, the view objects must be grouped into what is called an Application Module; an application requiring database connectivity, as developed in JDeveloper, must have one or more Application Modules. One design pattern is to nest application modules, choosing one Application Module as the root module and nesting the others within it. Another design pattern, the one used in this case study, is to keep the application modules separate rather than nested, which works well where transactional coordination across them is not necessary.
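As a point of reference, here is a minimal sketch of how a view object might be accessed programmatically through a root application module. The definition name, configuration name, view object name, and attribute name are all hypothetical; the Configuration helper class shown is the standard BC4J way to obtain a root module outside a web container.

import oracle.jbo.ApplicationModule;
import oracle.jbo.Row;
import oracle.jbo.ViewObject;
import oracle.jbo.client.Configuration;

public class Bc4jAccessSketch {
    public static void main(String[] args) {
        // Obtain a root application module by its definition and configuration
        // names (both hypothetical here).
        ApplicationModule am = Configuration.createRootApplicationModule(
                "com.example.migration.SourceAM", "SourceAMLocal");
        try {
            // Look up a view object built on a source entity object and query it.
            ViewObject meetings = am.findViewObject("SourceMeetingsVO");
            meetings.executeQuery();
            while (meetings.hasNext()) {
                Row row = meetings.next();
                System.out.println(row.getAttribute("MeetingId"));
            }
        } finally {
            // Release the module and its database connection.
            Configuration.releaseRootApplicationModule(am, true);
        }
    }
}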
The required task was to migrate data from one application's data model to another application's slightly similar data model. Both applications served the same purpose: tracking PDUFA-related meetings. The source application tracked the history of each meeting and its related drug applications, while the destination application tracked the current status of each meeting as well as the associated drug application sponsors.

As with any project, requirements gathering is performed first in the project's life cycle. The requirements are certainly needed, yet they are not always well elaborated; sometimes a single statement makes up the whole requirement, as was the case for this data migration. The stated requirement was almost literally "… migrate database A to database B …". That picture did not represent a thousand words, so the requirements had to be gathered and defined. The process was difficult at first. The schemas made available to me were not accurate: tables were missing from the source database, including the main table, and both schemas contained tables that were not used in production at all, which meant we had to determine which tables to use.

Several components are required for a successful migration. You need to understand the purpose of the source application and of the destination application (in this case study they are the same); this knowledge can come from the application's user guide, instruction manual, and related media, if such documentation was ever produced. You need read/write access to both schemas. You need points of contact (POCs) for both the source and destination applications, who can direct you to documentation; POCs who represent the users of the applications; and POCs for the databases themselves. You also need data dictionaries for both schemas. If these do not exist or cannot be found readily, the data dictionary must be created through research, as was the case here, and that can take significant time. The data dictionary is important because it gives you the data types, sizes, NOT NULL constraints, and the meaning and/or possible values of each column, all of which are required to create correct mappings between the data sources.

When building the mapping definitions, keep in mind which destination columns are required and which can be null. Some columns will not be mapped and should be noted; some columns from the source will be dropped. Some columns will have a direct mapping, whereas for others the relationship will be derived. The derivation could be based on one or more columns and/or on separate schemas, and the derived mapping requirement should include an algorithm detailing how the column is to be populated from the source data; this is usually worked out by talking to the POCs. Finally, keep in mind that a destination column cannot be smaller than its source column, and that you must compensate for data type differences: a source value larger than the destination column, or a value of the wrong type, will cause an exception to be thrown during the migration.
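To illustrate the size check mentioned above, here is a minimal sketch of a guard that could be applied before a translated value is handed to a destination row. The class, method name, and the idea of carrying the column size from the data dictionary are assumptions for illustration, not part of the original application.

/**
 * Hypothetical pre-insert guard: reject values that would not fit the
 * destination column. The column name and maximum length would come from
 * the data dictionary built during requirements gathering.
 */
public final class ColumnGuard {
    private ColumnGuard() {
    }

    public static String checkLength(String value, String columnName, int maxLength) {
        if (value != null && value.length() > maxLength) {
            // Rejecting here keeps the bad value out of the commit and lets the
            // caller record the failure in the migration status/exceptions table.
            throw new IllegalArgumentException(
                    "Value for " + columnName + " exceeds " + maxLength + " characters");
        }
        return value;
    }
}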
A separate schema can be created for the purpose of translation. In other words, let's say a specific source column maps directly to a destination column, and the only difference is that the possible values in the two columns have the same meaning but are not exactly the same (different words, same meaning). A 'middle' or 'look-up' table can be used in this case, where one column contains the possible source values and the other column contains the corresponding destination values. Once the mapping is complete and the client agrees with what has been defined, the requirements gathering process is complete and coding can begin.
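A translator method over such a look-up table might look like the following minimal sketch. The view object name (MeetingTypeLookupVO), the attribute names, and the fallback behaviour are assumptions; the bind-variable style shown is the classic positional BC4J idiom.

import oracle.jbo.ApplicationModule;
import oracle.jbo.Row;
import oracle.jbo.ViewObject;

/**
 * Sketch of a translator method that resolves a source value to its
 * destination equivalent through a look-up (translation) table.
 */
public class MeetingTypeTranslator {
    private final ApplicationModule middleAm;

    public MeetingTypeTranslator(ApplicationModule middleAm) {
        this.middleAm = middleAm;
    }

    public String translateMeetingType(String sourceValue) {
        ViewObject lookup = middleAm.findViewObject("MeetingTypeLookupVO");
        // Bind the source value into the WHERE clause rather than concatenating it.
        lookup.setWhereClause("SOURCE_VALUE = :1");
        lookup.setWhereClauseParam(0, sourceValue);
        lookup.executeQuery();
        Row row = lookup.first();
        // Fall back to the original value when no translation exists; the right
        // behaviour (fallback vs. exception record) depends on the requirements.
        return (row != null) ? (String) row.getAttribute("DestValue") : sourceValue;
    }
}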
You will need to create an application module for each data source (connection). This is the beauty of using Oracle's ADF BC4J technology: persistence is easy. Remember that a default database connection is defined at the project level, so if you cannot see the schema while running the view object creation wizard, you must change the default database connection at the project level. My application made use of four application modules: source, destination, middle (translation and exceptions tables), and an outside schema that was also used for translation purposes. Each of these contains the view objects pertaining to its respective tables.
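Because the modules are not nested, each one is a root application module with its own connection and its own transaction. A minimal sketch of how the four modules might be held together, with hypothetical definition and configuration names:

import oracle.jbo.ApplicationModule;
import oracle.jbo.client.Configuration;

/**
 * Sketch of holding one root application module per data source.
 * Each module points at its own database connection and transaction.
 */
public class MigrationContext {
    final ApplicationModule sourceAm = Configuration.createRootApplicationModule(
            "com.example.migration.SourceAM", "SourceAMLocal");
    final ApplicationModule destAm = Configuration.createRootApplicationModule(
            "com.example.migration.DestAM", "DestAMLocal");
    final ApplicationModule middleAm = Configuration.createRootApplicationModule(
            "com.example.migration.MiddleAM", "MiddleAMLocal");
    final ApplicationModule translationAm = Configuration.createRootApplicationModule(
            "com.example.migration.TranslationAM", "TranslationAMLocal");

    void release() {
        // Release every module so connections are returned when the run finishes.
        Configuration.releaseRootApplicationModule(sourceAm, true);
        Configuration.releaseRootApplicationModule(destAm, true);
        Configuration.releaseRootApplicationModule(middleAm, true);
        Configuration.releaseRootApplicationModule(translationAm, true);
    }
}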
The source application module contained view objects for about seven tables. The destination application module contained view objects for two tables: one parent table and one related child table. The middle application module contained view objects for two translation tables, the copies of the destination tables, and a migration-linking table.
The two translation tables were used as a mapping between source and destination columns that had the same meaning but different representations of the data.
The tables representing the copies of the destination tables are used to store the records being migrated, along with the status associated with each; all constraints are stripped from these copies and no parent/child relationship exists. They are important to the migration because they give an accurate account of each migrated record: whether or not it was successfully migrated. They can also aid in the manual clean-up work that inevitably accompanies migrations.
The linking table has two main purposes. First, both the source and destination schemas use a primary key, meeting id, each generated by its own sequence. To maintain a link between the two schemas we store the mapping of the migrated records, which consists of the meeting id from the source table and the newly generated meeting id for the destination table. Second, the table supports the initialization methods found in the main class, which clear out the newly migrated data when performing back-to-back migrations for testing purposes.
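Recording a link after a successful commit might look like the following sketch. The view object name, the attribute names, and the use of oracle.jbo.domain.Number for the keys are assumptions.

import oracle.jbo.ApplicationModule;
import oracle.jbo.Row;
import oracle.jbo.ViewObject;
import oracle.jbo.domain.Number;

/**
 * Sketch of recording a source-to-destination key mapping in the linking
 * table once a migrated record has committed successfully.
 */
public class LinkRecorder {
    private final ApplicationModule middleAm;

    public LinkRecorder(ApplicationModule middleAm) {
        this.middleAm = middleAm;
    }

    public void insertLinkedRecord(Number sourceMeetingId, Number destMeetingId) {
        ViewObject linkVo = middleAm.findViewObject("MigrationLinkVO");
        Row link = linkVo.createRow();
        link.setAttribute("SourceMeetingId", sourceMeetingId);
        link.setAttribute("DestMeetingId", destMeetingId);
        linkVo.insertRow(link);
        // Commit on the middle module's own transaction; it is independent
        // of the source and destination modules.
        middleAm.getTransaction().commit();
    }
}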
Two distinct design patterns were used in this effort: one at the application level and one at the unit-testing level.
At the application level, the design pattern I used consisted of a single translator class, a utility class, and two bean classes. View objects bound to the source, destination, and middle tables were contained in their respective application modules, and of course there was a main class for executing the application.
The translator class is used as a medium for translating source values into destination values; each of its methods implements a mapping algorithm based on the migration requirements. The translator class includes, as members, the application modules needed to perform the translations. The utility class contains methods for various repeated tasks; I also pulled private methods out of the translator class and placed them in the utility class, which allows those methods to be unit tested. The bean classes provided temporary storage for the translated source data and contained getters and setters for each column value of the destination tables. They also implemented validation, testing for values that must not be null based on the NOT NULL column constraints identified during requirements gathering. Each bean additionally carried a migration status variable whose purpose is to record the status of, and any message associated with, the migration of that record, ranging from a successful migration to some caught exception. We could also have used transient view objects for this temporary storage; I did not implement them here, but they are the recommended approach and will be covered in a later article.
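A simplified version of such a bean might look like the following sketch; the field names and status values are hypothetical.

/**
 * Sketch of a temporary-storage bean for one destination row. validate()
 * enforces the NOT NULL constraints identified during requirements gathering,
 * and migrationStatus records the outcome for the record.
 */
public class MeetingBean {
    private String meetingId;
    private String meetingType;
    private String sponsorName;
    private String migrationStatus = "PENDING";

    public String getMeetingId() { return meetingId; }
    public void setMeetingId(String meetingId) { this.meetingId = meetingId; }

    public String getMeetingType() { return meetingType; }
    public void setMeetingType(String meetingType) { this.meetingType = meetingType; }

    public String getSponsorName() { return sponsorName; }
    public void setSponsorName(String sponsorName) { this.sponsorName = sponsorName; }

    public String getMigrationStatus() { return migrationStatus; }
    public void setMigrationStatus(String migrationStatus) { this.migrationStatus = migrationStatus; }

    /** Returns true when every required (NOT NULL) destination column has a value. */
    public boolean validate() {
        boolean valid = meetingId != null && meetingType != null;
        if (!valid) {
            migrationStatus = "FAILED: required column is null";
        }
        return valid;
    }
}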
The design pattern for the unit testing consisted of connect fixtures for each application module, unit tests for the translator class, unit tests for the utility class and unit tests for the instance methods in the main class.
The connect fixtures initialize each application module used in the unit tests and are fairly easy to create (a minimal fixture is sketched below). I created a unit test class for each translator method. I could have created one class with a large number of test methods, but it would have become fairly unwieldy, and the one-class-per-method pattern makes it easier to test and debug one method at a time. I created a test suite to run all of the tests at once, and a single unit test class for the utility class's methods.

The unit tests for the instance methods in the main class require further discussion. I created create, update, and delete (CRUD) methods for each destination table, and these were used in the initialization test methods. The initialization methods, which allow us to run multiple migrations without creating duplicates, delete all records in the destination tables based on the records found in the middle table that links the source records to the newly created destination records. The initialization is performed in the following order: the child tables are cleared, then the parent tables, and if no exception occurs the transaction is committed; then all records in the exceptions tables and in the linking table are deleted. To test the initialization method we create records in each source table and run it, covering both the case where the destination tables contain no records and the case where they do. The migration should not continue if initialization fails.

Once initialization is successful we can continue with the migration. The selection query for the source data is based on the requirements. We use a view object and iterate through each record. We use the meeting id to create the beans for the child records and the associated parent records from each record factory (the record factories contain the translator classes), then take each bean and insert it into the destination view objects. We validate each record and commit; if the transaction is successful we insert a new linked record, and we insert into the exceptions tables. So we have a migrate method, insertIMTSRecord, insertIMTSSponsorRecord, insertLinkedRecord, insertSponsorExceptions, insertImtsExceptions and, as mentioned previously, the initialization methods, which in turn reference delete methods for each table. The test classes then contain test methods for the insert and delete methods. The migrate method is the only method that should have public access, so how do we test the other, non-public methods? As noted earlier, one answer is to pull them out into the utility class so the tests can exercise them directly.
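Here is a minimal sketch of a connect fixture for one application module, written in the JUnit 3 style bundled with JDeveloper at the time. The definition and configuration names are hypothetical, and each application module would get its own fixture.

import junit.framework.TestCase;
import oracle.jbo.ApplicationModule;
import oracle.jbo.client.Configuration;

/**
 * Connect fixture: acquires a root application module before each test
 * and releases it afterwards so back-to-back runs do not leak connections.
 */
public abstract class SourceAmFixture extends TestCase {
    protected ApplicationModule sourceAm;

    protected void setUp() throws Exception {
        super.setUp();
        sourceAm = Configuration.createRootApplicationModule(
                "com.example.migration.SourceAM", "SourceAMLocal");
    }

    protected void tearDown() throws Exception {
        Configuration.releaseRootApplicationModule(sourceAm, true);
        super.tearDown();
    }
}

A concrete test class then extends the fixture and works against the sourceAm field in its test methods.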

A few additional points worth noting:

Validation for column values that cannot be null can be done either by validating the row and catching the resulting exception, or through the temporary bean.

For each migration candidate, insert a record into the middle (status) table, which carries no constraints and includes a status column for the migrated record.

Initialization methods clear out the destination tables by using the table that maps source keys to destination keys.

A middle table holds each source key and its mapped destination key.

Use temporary sequence numbers rather than using up the real sequences.

Inserting parent/child records: what may cause errors and how to avoid them (see the sketch below).
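On the last point, here is a minimal sketch of inserting a parent row and its child in one transaction, with the child carrying the parent's key before the commit. The view object and attribute names are hypothetical.

import oracle.jbo.ApplicationModule;
import oracle.jbo.JboException;
import oracle.jbo.Row;
import oracle.jbo.ViewObject;

/**
 * Sketch of inserting a parent row and its child row in one transaction.
 * Committing without the foreign key set on the child would raise a
 * foreign-key violation when the rows are posted to the database.
 */
public class ParentChildInsert {
    public static void insertMeetingWithSponsor(ApplicationModule destAm,
                                                Object meetingId,
                                                Object sponsorName) {
        ViewObject meetingVo = destAm.findViewObject("MeetingVO");
        ViewObject sponsorVo = destAm.findViewObject("SponsorVO");

        Row meeting = meetingVo.createRow();
        meeting.setAttribute("MeetingId", meetingId);
        meetingVo.insertRow(meeting);

        Row sponsor = sponsorVo.createRow();
        // The child must carry the parent's key before the commit.
        sponsor.setAttribute("MeetingId", meetingId);
        sponsor.setAttribute("SponsorName", sponsorName);
        sponsorVo.insertRow(sponsor);

        try {
            destAm.getTransaction().commit();
        } catch (JboException e) {
            // Roll back both rows and let the caller record the failure
            // in the exceptions/status table.
            destAm.getTransaction().rollback();
            throw e;
        }
    }
}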
