Variables and Identifiers: what’s the diff?
Some ridiculously wise guy once said, “A house is only as strong as the foundation on which it is built.” The same principle applies to a Java developer: a Java developer is only as good as his knowledge of basic Java fundamentals. So if you’re seeking some fundamental Java truth, read on, because this month’s topic builds on those foundations as we attempt to clarify the difference between a variable and an identifier.
I was often confused by this ‘concept’ (actually, I didn’t know the difference) and treated the two as one and the same, because general discussions of Java rarely distinguish between variables and identifiers. In some cases, as will be discussed shortly, ignoring the distinction is harmless, particularly with primitives. Why the distinction is not made in more basic Java material is, in my humble opinion, because it doesn’t take an understanding of bytecode to implement ‘HelloWorld’. But as we approach more advanced topics in Java, the distinction between a variable and an identifier becomes quite important.
The difference between languages that attach data types to runtime objects (such as Java) and those that use data types only at compile time to generate the executable code that manipulates those objects (such as C/C++) rests largely on the distinction between identifiers and variables, regardless of the syntactic similarities the two languages share. One main consequence is that objects are handled quite differently at runtime, which in turn affects how objects are manipulated in the language in general. For a Java developer, understanding this difference, and specifically how Java treats objects, will aid in understanding various concepts throughout Java, such as what happens when objects are passed between other objects or across a network using tools such as RMI, or what an interface really is. These concepts and more become quite clear once the programmer understands the nature of objects at compile time and at runtime.
An identifier is the name and associated data type that identifies a variable in a Java source file or, more simply, the name of the variable as it appears in the source code of the program.
For instance:
int aPrimitiveInt;
where aPrimitiveInt is the name and int is the data type.
A variable, on the other hand, is the actual instantiation of that data type at runtime (the memory storage for the data values being manipulated by the program) or, more simply, the actual memory that is allocated at runtime.
For instance:
int aPrimitiveInt = 24;
where 24 is the value held by the variable, i.e. the actual contents stored in memory.
For clarity’s sake, a data type is a set of data values together with the operations on those values.
As a ‘newbie’ to Java I held a rather fuzzy understanding of the difference between a variable and an identifier, and hence confused the definitions and concepts and referred to both as a variable. As it stands, I was half right: in the case of primitives (int, float, double, etc.) the difference between the identifier and the variable is irrelevant.
…well, sort of.
Java implements two mechanisms to ensure type matching between compile-time identifiers and runtime variables: one for objects, which is by ‘design’, and one for primitives, which is not so much a mechanism by design as a mechanism that falls out of the design. The specifics that give rise to the mechanism for primitives are beyond the scope of this article and will not be discussed; the mechanism itself, however, will be.
That mechanism is this: the data type of the variable referenced by a primitive identifier cannot be changed in Java. In other words, if a value of a different data type is used in an assignment to a primitive, the compiler always arranges for the value to be copied and the copy converted to the correct type.
Primitive data types are built directly into the Java virtual machine, which therefore shares the same representations and operations for primitives as the Java compiler. This means the compiler can generate the code that manipulates these variables, and once compilation is complete their data types can be safely forgotten. The programmer therefore knows that the data type of the identifier is the same as the data type of the variable it references.
Below is a code fragment to help visualize this concept:
int i = (int) 1.234f;
Here, we have an int and we want to assign a float value to it. Because the data type of the variable referenced by the identifier cannot be changed (the int cannot become a float – because Java says so!), the compiler will not even accept the assignment without an explicit cast; with the cast, the float value is copied and the copy converted into an int, and the value (1.234f) never changes the data type of the identifier (i) it is assigned to. So while it is wrong to say that the identifier and the variable are the same, for primitive data types the distinction can safely be ignored.
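For completeness, here is a small sketch covering both directions of conversion (the variable names f and i are just placeholders); in each case the conversion produces a copy, and the declared type of the identifier never changes:

float f = 42;            // widening: the int value 42 is copied and converted to the float 42.0f
int i = (int) 1.234f;    // narrowing: an explicit cast is required; i receives the truncated copy, 1
// f is still a float and i is still an int; no assignment ever changes their declared types.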
Well, I hope you all now have a more foundational understanding of the Java language and, specifically, of the mechanism inherent in Java for type matching between compile-time identifiers and runtime variables for primitives. Remember, the distinction between an identifier (the name and the data type it represents) and its associated variable (the actual instantiation of the data at runtime) can be ignored for primitives, because Java does not allow the data type of the variable referenced by a primitive identifier to be changed. For those who did not know the distinction and thought they were one and the same, you now know different, and can continue ‘not knowing’ since, in a way, by ‘now knowing’, you are, in a sense, a wise guy.
Next time, we’ll talk about the other mechanism (the one that is by ‘design’) for ensuring that objects are type safe in Java: the runtime data type tag.
Saturday, February 27, 2010
Dynamically Calculating Primary Keys
Oracle’s Application Development Framework Business Components (ADF BC) allows for the implementation of business rules that supply default values for entity attributes, but what kind of rule can we implement for a sequence-generated primary key?
It is standard industry practice to define a primary key column on all relational database tables, which in most cases is a unique number (as opposed to a unique string of characters). This unique number can be generated by a database sequence generator or by calculating the next available value based on what is currently stored in that column. The latter presents obvious transactional problems, whereas the former is the standard way of handling number-based primary keys. This presents an interesting challenge when implementing business logic that inserts new records into the database. If we create a new Row, populate the various attributes (except for the primary key) and attempt to save the transaction, we will most assuredly receive a database error, as a primary key is always unique and non-null. To prevent this type of error we can implement the logic in one of two ways:
• Java code (dynamically calculated default values), or
• the ADF BC-provided oracle.jbo.domain.DBSequence type in tandem with a database trigger.
There are many approaches to dynamically calculating the sequence value with Java code. I will describe two: one is how I implemented the solution on one project, and the second is the solution recommended by Oracle and the one I currently practice.
The Java-based solution I implemented involved creating private methods on the ApplicationModuleImpl: getNextSequence(), which called a PL/SQL stored procedure to return the next available sequence value, and createRecord(), which creates a new row, populates the various attributes (including a call to getNextSequence() for the primary key) and inserts the row into the ViewObject, which in turn inserts it into the current transaction when changes are posted and into the database once commit is called.
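As a rough illustration of that first approach, here is a hedged sketch of what a getNextSequence() helper might look like; the application module class name and the PL/SQL function name GET_NEXT_MEETING_ID are placeholders, not the actual project code:

import java.sql.CallableStatement;
import java.sql.SQLException;
import java.sql.Types;
import oracle.jbo.JboException;
import oracle.jbo.server.ApplicationModuleImpl;

public class MigrationAppModuleImpl extends ApplicationModuleImpl {

    // GET_NEXT_MEETING_ID is a placeholder for whatever stored function the schema provides.
    private long getNextSequence() {
        CallableStatement stmt = null;
        try {
            // Anonymous PL/SQL block that returns the next sequence value through the bind variable.
            stmt = getDBTransaction().createCallableStatement(
                    "begin ? := GET_NEXT_MEETING_ID; end;", 0);
            stmt.registerOutParameter(1, Types.NUMERIC);
            stmt.executeUpdate();
            return stmt.getLong(1);
        } catch (SQLException e) {
            throw new JboException(e.getMessage());
        } finally {
            if (stmt != null) {
                try {
                    stmt.close();
                } catch (SQLException ignore) {
                    // nothing useful to do if close fails
                }
            }
        }
    }
}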
The Oracle solution is far more elegant and makes use of the existing ADF BC library.
The EntityImpl class contains a protected method, create(), which can be overridden to set attribute defaults or, in this case, to dynamically assign a database sequence number to the primary key. create() is called whenever the entity object instance is created, i.e. whenever a new record is created from the ViewObject. In the create() method, just after the call to super.create(), we use the oracle.jbo.server.SequenceImpl class, which wraps Oracle database sequences. To instantiate a SequenceImpl we need the database sequence name and the current DBTransaction, which can be obtained through a call to the getDBTransaction() method (conveniently found in the EntityImpl class). Once we have a SequenceImpl instance we can get the next sequence value and ‘automatically’ set the attribute representing the primary key immediately upon Row creation. Two observations, and potential issues, apply to both ‘Java code’ solutions: we had to add extra Java code, and sequence numbers may be used up when the current transaction has to be rolled back. I’m not saying that adding Java code is necessarily ‘bad’, as certain requirements warrant that type of solution, but we certainly don’t want to burn sequence numbers, since avoiding that helps keep our sequence values consecutive.
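Here is a hedged sketch of the create() override just described; the entity name MeetingImpl, the attribute name MeetingId and the sequence name MEETING_ID_SEQ are illustrative placeholders:

import oracle.jbo.AttributeList;
import oracle.jbo.domain.Number;
import oracle.jbo.server.EntityImpl;
import oracle.jbo.server.SequenceImpl;

public class MeetingImpl extends EntityImpl {

    @Override
    protected void create(AttributeList attributeList) {
        super.create(attributeList);
        // Wrap the database sequence in the current transaction and pull the next value.
        SequenceImpl sequence = new SequenceImpl("MEETING_ID_SEQ", getDBTransaction());
        Number nextVal = sequence.getSequenceNumber();
        // Assign the sequence value to the primary key attribute as soon as the row is created.
        setAttribute("MeetingId", nextVal);
    }
}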
The second way (and certainly not the final word) to generate a primary key using database sequences is to create a trigger in the database that populates the column on which the attribute is based. If a trigger is created to handle the creation of the primary key, then the data type of the entity attribute representing the primary key should be set to DBSequence. DBSequence maintains a temporary unique value in the entity cache until the data is posted, at which time the real sequence value is generated. This way, whenever a new record is created we do not have to populate the primary key attribute via Java code inside the service methods, nor do we use up our sequence numbers. Of note: any field that uses DBSequence should not be made visible to the user, except on rows that have already been posted, because the temporary value held in the cache has no relationship to the value the database column will eventually contain.
So there you have it: two options for an ADF BC business rule that inserts a sequence-generated primary key value into newly created rows – dynamically calculating the value with Java code using the ADF BC libraries, or using a database trigger together with DBSequence as the attribute type.
Next month, we’ll talk about adding default values to entity attributes other than primary keys.
Case Study: Oracle’s JDeveloper IDE and its Use for Data Migration
This case study attempts to acquaint readers with JDeveloper’s powerful BC4J technology and how it was used to create a Java application to perform a data migration. It is by no means exhaustive, nor does it imply that the implementation described here is the only way to build such an application. We write this case study merely as an example, to describe common problems, solutions, and the underlying technology, in order to aid readers with their own data migration ventures.
The following topics will be discussed as they all relate to data migration:
• the design pattern implemented in the application for code maintainability,
• where, when, and why to write custom code in the ADF-generated Java classes,
• solutions to some common problems encountered before and during implementation of the application,
• unit testing, and a simple design pattern that, as will be seen, makes testing relatively easy and quite maintainable,
• exception handling.
On my most recent project, I was tasked with performing a data migration between one database (specify) and another (specify). Contract requirements, by which my team and I were bound, required us to use the JDeveloper IDE (although Eclipse was preferred, and we will describe ways in which we used Eclipse). This proved to be a major advantage over other open-source technologies (specify), mainly because of ADF’s Business Components for Java (BC4J) technology available from within JDeveloper (specify other possible ways to use BC4J without JDeveloper). With BC4J we can create accurate object-relational mappings of database tables as entity objects (what is an ‘entity’ object). These entity objects can then be used to create one or more view objects based on one or more entity objects (or even no entity object at all, in the case of a transient view object – to be discussed in a later article). To access the data model, the view objects must be grouped into what is called an application module (an application requiring database connectivity, as developed in JDeveloper, must have one or more application modules). One design pattern is to create nested application modules, where one application module is chosen as the root and the others are nested within it. Another design pattern, the one used in this case study, is to have separate, non-nested application modules where shared transactional support is not necessary.
The required task was to migrate data from one application’s data model to another application’s slightly similar data model. Both applications served the same purpose, which was tracking PDUFA-related meetings. They were similar in that they both tracked meetings: the source application tracked the history of each meeting and the related drug applications, while the destination application tracked the current status of each meeting as well as the associated drug application sponsors.

As with any project, requirements gathering comes first in the project life cycle. The requirements are certainly needed, yet they are not always well elaborated; sometimes a single statement makes up the whole requirement, as was the case for this data migration. The stated requirement was almost literally “… migrate database A to database B…”. As can be seen, this picture didn’t paint a thousand words, so the requirements had to be gathered and defined. The process was difficult at first: the schemas made available to me were not accurate. For instance, tables were missing from the source database, specifically the main table, and there were tables in both schemas that were not being used in production at all, which meant determining which tables to use.

Several things are required for a successful migration. You’ll need points of contact (POCs) for both the source and destination applications, who can direct you to documentation; POCs who represent the users of the applications; and POCs for the databases themselves. You’ll need to understand the purpose of the source application as well as that of the destination application (in this case study they are the same); this knowledge can be obtained from the applications’ user guides, instruction manuals and related media (hopefully documentation was generated). You’ll need read/write access to both schemas. You’ll also need data dictionaries for both schemas; if these do not exist or cannot be found readily, then the data dictionary must be created through research, as was the case here, and this can take significant time. The data dictionary is important because it gives you the data types, sizes, not-null constraints, and the meaning and/or possible values of each column – information that is required to create correct mappings between the data sources.

When building the mapping definitions, keep in mind which source columns are required and which are not (i.e. can be null). Some columns will not be mapped and should be noted; some columns from the source will be dropped. Some columns will map directly, whereas for others the relationship will be derived; the derivation could be based on one or more columns and/or separate schemas. A derived mapping requirement should consist of an algorithm detailing how the column is to be populated based on the source data, and this is usually worked out by talking to the POCs. Finally, keep in mind that a destination column cannot be smaller than its source and that you must compensate for data-type differences: a source value larger than the destination column, or inserting the wrong type of data, will cause an exception to be thrown during the migration.
A separate schema can be created for translation purposes. In other words, let’s say that a specific source column maps directly to a destination column, and the only difference is that the possible values in these columns have the same meaning but are not literally the same (different words, same meaning). A ‘middle’ or ‘look-up’ table can be used in this case, where one column contains the possible source values and the other column contains the corresponding destination values. Once the mapping is complete and the client agrees with what has been defined, the requirements gathering process is complete and coding can begin.
You will need to create application modules for each data source (connection). This is the beauty of using Oracle’s ADF BC4J technology: persistence is easy. Remember that there is a connection defined at the project level, so if you cannot see the schema during the view object creation wizard, you must change the default database connection at the project level. My application made use of four application modules: source, destination, middle (translation and exceptions tables), and an outside schema that was used for translation purposes. Each of these contains the respective view objects for its tables.
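For readers who have not used business components outside a container before, here is a hedged sketch of how one of these application modules might be obtained and exercised from a plain Java class; the module definition, configuration and view object instance names are placeholders:

import oracle.jbo.ApplicationModule;
import oracle.jbo.ViewObject;
import oracle.jbo.client.Configuration;

public class SourceModuleSmokeTest {

    public static void main(String[] args) {
        // Placeholder definition and configuration names.
        ApplicationModule sourceAm = Configuration.createRootApplicationModule(
                "com.example.migration.model.SourceAppModule", "SourceAppModuleLocal");
        try {
            // Query one of the source view objects and report how many rows it sees.
            ViewObject meetings = sourceAm.findViewObject("SourceMeetingsView");
            meetings.executeQuery();
            System.out.println("Source meetings found: " + meetings.getEstimatedRowCount());
        } finally {
            // Release the module and its database connection.
            Configuration.releaseRootApplicationModule(sourceAm, true);
        }
    }
}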
The source application module contained view objects for about 7 tables. The destination application module contained view objects for 2 tables: 1 parent table and 1 related child table. The middle application module contained view objects for 2 translation tables, copies of the destination tables, and a migration-linking table.
The two translation tables were used as a mapping between source and destination columns that had the same meaning but different representations of the data.
The tables representing the copies of the source tables are used to store the records being migrated, along with any status associated with them; all constraints are stripped and no parent/child relationship exists. These tables are important to the migration because they give an accurate account of each migrated record, i.e. whether or not it was migrated successfully. They can also aid in the manual clean-up work that inevitably accompanies migrations.
The linking table has two main purposes. First, both the source and destination schemas contain a primary key labeled meeting id, each generated by its own sequence generator; to maintain a link between the two schemas we store the mapping of the migrated records, which consists of the meeting id from the source table and the newly generated meeting id for the destination table. Second, the table is used by the initialization methods found in the main class, which clear out the newly migrated data when performing back-to-back migrations for testing purposes.
Two distinct design patterns appear in this effort: the first at the application level and the second at the unit-testing level.
At the application level, the design pattern I used consisted of a single translator class, a utility class, and two bean classes. View objects bound to the source, destination and middle tables were contained in their respective application modules, and of course there was a main class for executing the application.
The translator class is used as a medium for translating source values into destination values, where each method implements a mapping algorithm based on the migration requirements; it includes, as members, the application modules needed to perform the translations. The utility class contains methods for various repeated tasks; I also pulled private methods out of the translator class and placed them in the utility class, which allows them to be unit tested. The bean classes implemented temporary storage for the translated source data and contained getters and setters for each column value of the destination tables. Validation was also implemented, checking for values that required a not-null value based on the not-null column constraints identified during requirements gathering. Also included in the bean classes was a migration status variable whose purpose is to record the status, and any message, associated with the migration of the record, ranging from a successful migration to some caught exception. We could also have used transient view objects, which I will explain shortly; I did not use transient view objects as the means of temporary storage, but it is the recommended way.
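To make the bean idea concrete, here is a hedged sketch of what one destination bean might look like; the field names and the simple string-based status are illustrative placeholders rather than the project’s actual classes:

public class MeetingBean {

    private Long meetingId;          // destination primary key, assigned during insertion
    private String meetingStatus;    // value translated from the source status code
    private String migrationStatus;  // outcome recorded for the exceptions/linking tables

    public Long getMeetingId() { return meetingId; }
    public void setMeetingId(Long meetingId) { this.meetingId = meetingId; }

    public String getMeetingStatus() { return meetingStatus; }
    public void setMeetingStatus(String meetingStatus) { this.meetingStatus = meetingStatus; }

    public String getMigrationStatus() { return migrationStatus; }
    public void setMigrationStatus(String migrationStatus) { this.migrationStatus = migrationStatus; }

    // Minimal validation mirroring a not-null constraint on the destination column.
    public boolean isValid() {
        return meetingStatus != null;
    }
}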
The design pattern for the unit testing consisted of connect fixtures for each application module, unit tests for the translator class, unit tests for the utility class and unit tests for the instance methods in the main class.
The connect fixtures initialize each application module as it is used in the unit tests and are fairly easy to create (a hedged sketch follows below). I created a unit test class for each translator method. I could have created one class with a bunch of test methods, but it would have become fairly unwieldy, and the class-per-method pattern makes it easier to test and debug one method at a time. I created a test suite to run all the tests at once, and one unit test class for the utility class’s methods.

The unit tests for the instance methods in the main class require further discussion. I created create, update and delete (CRUD) methods for each destination table, which were used by the initialization test methods. The initialization methods, which enable us to run multiple migrations without creating duplicates, delete all records in the destination tables based on the records found in the middle table that links the source records to the newly created destination records. The initializations are performed in the following order: the child tables are cleared, followed by the parent tables, and if no exception occurs we commit the transaction; then we delete all records in the exceptions tables and in the middle (linking) table. To test the initialization method we create records in each source table and run it, testing both the case where the destination tables contain no records and the case where they do. The migration shouldn’t continue if initialization fails.

Once initialization is successful we can continue with the migration. The selection query for the source data is based on the requirements. We use a view object and iterate through each record, using the meeting id to create the beans for the child records and the associated parent records from the record factories (the record factories contain the translator classes); we then insert each bean into the destination view objects, call validate on each record, and commit. If the transaction is successful we insert a new linking record, and we insert into the exceptions tables. So we have a migrate method, insertIMTSRecord, insertIMTSSponsorRecord, insertlinkedRecord, insertSponsorExceptions, insertImtsExceptions and, as mentioned previously, the initialization methods, which reference delete methods for each table. The test classes then contain test methods for the insert and delete methods. The migrate method is the only method that should have public access, so how do we test the other private methods?
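Here is a hedged sketch of a connect fixture for one of the application modules, mirroring the kind of fixture class JDeveloper can generate for business components tests; the class, definition and configuration names are placeholders:

import oracle.jbo.ApplicationModule;
import oracle.jbo.client.Configuration;

public class SourceAmFixture {

    private ApplicationModule applicationModule;

    // Called from a test's setUp(): obtain a root application module for the source schema.
    public void setUp() {
        applicationModule = Configuration.createRootApplicationModule(
                "com.example.migration.model.SourceAppModule", "SourceAppModuleLocal");
    }

    // Called from a test's tearDown(): release the module and its database connection.
    public void tearDown() {
        Configuration.releaseRootApplicationModule(applicationModule, true);
    }

    public ApplicationModule getApplicationModule() {
        return applicationModule;
    }
}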
Additional notes and lessons learned from this effort:
• Validation for column values that cannot be null, either by catching the exception raised when validating the row or by validating the temporary bean.
• For each migration candidate, insert a record into the middle table (which has no constraints), including a status column for the migrated record.
• Initialization methods for clearing out the destination tables using the source-key-to-destination-key mapping table.
• A middle table containing the source key and its mapped destination key.
• Temporary sequence numbers, instead of using up real sequences.
• Inserting parent/child records: what may cause errors and how to avoid them.