Intro
In my previous work experience, data migrations arrived in the final phase of a project, when development activities were almost complete. The migrations were always minor and concerned loading customers, suppliers, the chart of accounts and little else. At Dedalus, the company where I currently work, data migrations have been and continue to be an important part of the new ERP implementation process. In this post I would like to share some methodological suggestions drawn from my work experience over the last few months.
Method – Approach
The key concept of this post is that, in true agile fashion, migrations must be addressed and started as soon as possible. Talking about migrations from the early stages of the project brings only benefits. Planning the migration activity means starting analysis sessions in which you can define which data within the legacy systems is important to consider. During this exploration of the legacy data it will (unfortunately) be discovered that much of the existing data is inconsistent, incorrect or dirty, so data cleansing will be needed to obtain information that is atomic, coherent and clean.
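As a purely illustrative example of what cleansing can mean in practice, the sketch below normalizes a couple of free-text fields of a hypothetical legacy customer record; the field names and rules are invented for the example, not taken from any real system.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Illustrative cleansing helpers for a hypothetical legacy customer record:
// collapse whitespace in names, strip separators from VAT numbers.
public static class CustomerCleanser
{
    public static string CleanName(string rawName) =>
        string.IsNullOrWhiteSpace(rawName)
            ? string.Empty
            : Regex.Replace(rawName.Trim(), @"\s+", " ").ToUpperInvariant();

    public static string CleanVatNumber(string rawVat) =>
        string.IsNullOrWhiteSpace(rawVat)
            ? string.Empty
            : new string(rawVat.Where(char.IsLetterOrDigit).ToArray()).ToUpperInvariant();
}
```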
In addition to analyzing the legacy systems, you must of course understand the new target system that is taking shape in the meantime. The first stages of a migration usually concern master data, which in the case of an ERP means global address books with addresses and contacts, customers and vendors; transactional data comes only later.
The purpose of the migration is therefore to map the information in the legacy systems onto the target system. Performing this analysis is also a way to start understanding how the new target system works: identifying the key and mandatory fields of basic entities such as global address books, customers or suppliers is the first step in becoming familiar with a new ERP.
This analysis and mapping activity also reveals whether the new system needs additional fields to hold information that was of fundamental importance in the legacy systems. This shows how the migration activity is necessarily iterative and how placing it in an agile context can only bring benefits.
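To give an idea of what such a mapping can look like, here is a minimal sketch in C# (the language I later chose for the ETL). All field names are hypothetical; the point is that the target record carries the key and mandatory fields of the ERP, plus an extra field added to preserve information that mattered in the legacy system.

```csharp
// Hypothetical mapping between a legacy customer and the target ERP customer.
// Field names are invented for illustration; real names depend on both systems.
public record LegacyCustomer(string Code, string Name, string VatNumber, string SalesAgent);

public record TargetCustomer(string AccountNum, string Name, string VatNum, string LegacySalesAgent);

public static class CustomerMapper
{
    public static TargetCustomer Map(LegacyCustomer source, string newAccountNum) =>
        new TargetCustomer(
            AccountNum: newAccountNum,             // key field assigned in the target system
            Name: source.Name,                     // mandatory in the target ERP
            VatNum: source.VatNumber,
            LegacySalesAgent: source.SalesAgent);  // extra field added to preserve legacy info
}
```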
Tools
To carry out data migrations, as mentioned, the architectural pattern to use is pipes & filters. To implement this pattern, you can choose from software already available on the market (both open source and commercially licensed) or you can decide to build your own ETL software.
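Stripped to its essence, a pipes & filters ETL can be sketched as below: data is pulled from a source, pushed through a chain of filters and delivered to a sink. This is a deliberately simplified illustration, not the actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal pipes & filters sketch: the pipeline is a chain of filters
// (pure functions) applied to a stream of records pulled from a source
// and pushed to a sink. Names and types are illustrative only.
public static class Etl
{
    public static void Run<TSource, TTarget>(
        IEnumerable<TSource> extract,          // pipe from the legacy system
        Func<TSource, TSource> cleanse,        // filter: data cleansing
        Func<TSource, TTarget> transform,      // filter: map to the target model
        Action<TTarget> load)                  // sink: write into the target system
    {
        foreach (var target in extract.Select(cleanse).Select(transform))
            load(target);
    }
}
```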
To make this choice it is essential to understand:
- Can the data structure of the legacy systems change? (It should generally be stable, but new tables may have to be considered.)
- Is the data structure of the target system stable enough? (Since the target system is under development, new fields or even new tables could be added.)
- Is the transformation process simple, or does it require writing complex code to transform the data?
The answers to these three questions will determine whether a particular ETL tool is right for you. For example, if the input or output data structure is subject to change, it is essential to find software that can modify the mapping easily. If the transformation logic is complex, you need software that lets you write code easily (and I had a hard time finding one). In my case, given the complexity of the work, I decided to write a C#-based ETL from scratch. For the database access layer I chose a micro-ORM based on Dapper.
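To show how lightweight the database access can be, here is a sketch of a DAO written with plain Dapper (the micro-ORM I used is Dapper-based, but the connection string handling, table and column names below are invented, and the sketch reuses the LegacyCustomer record from the mapping example above).

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using Dapper;

// Illustrative Dapper-based DAO for one legacy system.
// Table and column names are assumptions for the example.
public class LegacyCustomerDao
{
    private readonly string _connectionString;

    public LegacyCustomerDao(string connectionString) => _connectionString = connectionString;

    public IEnumerable<LegacyCustomer> GetAll()
    {
        using var connection = new SqlConnection(_connectionString);
        return connection.Query<LegacyCustomer>(
            "SELECT Code, Name, VatNumber, SalesAgent FROM Customers");
    }
}
```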
I structured the code as a solution with several projects:
- a Helper project with utility classes for logging and for assigning sequence numbers for the target system
- a project, divided into subfolders, containing the DAOs and DTOs of each legacy system and of the target system
- a GUI project managing a simple graphical interface
- a main project containing macro classes, one per legacy system, whose purpose is to hold the methods that read from the legacy system, transform the data and structure it as required by the target system (see the sketch below).
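Putting the pieces together, a macro class for a hypothetical legacy system might look like the following sketch. It reuses the cleansing, mapping and DAO examples above; every name in it is illustrative, including the stand-in for the sequence-number helper.

```csharp
// Sketch of a "macro class" for one legacy system, following the project
// structure described above. All names are hypothetical and reuse the
// earlier sketches (CustomerCleanser, CustomerMapper, LegacyCustomerDao).
public class LegacySystemAMigrator
{
    private readonly LegacyCustomerDao _legacyDao;
    private readonly ITargetCustomerDao _targetDao;
    private int _sequence;

    public LegacySystemAMigrator(LegacyCustomerDao legacyDao, ITargetCustomerDao targetDao)
    {
        _legacyDao = legacyDao;
        _targetDao = targetDao;
    }

    public void MigrateCustomers()
    {
        foreach (var legacy in _legacyDao.GetAll())
        {
            // Filter 1: data cleansing
            var cleaned = legacy with
            {
                Name = CustomerCleanser.CleanName(legacy.Name),
                VatNumber = CustomerCleanser.CleanVatNumber(legacy.VatNumber)
            };

            // Filter 2: mapping to the target model, then load
            var target = CustomerMapper.Map(cleaned, NextAccountNum());
            _targetDao.Insert(target);
        }
    }

    // Stand-in for the sequence-number helper mentioned in the Helper project.
    private string NextAccountNum() => $"CUST{++_sequence:D6}";
}

public interface ITargetCustomerDao
{
    void Insert(TargetCustomer customer);
}
```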
Recommendations
On a more technical note, when migrating master data (customers, suppliers, contacts, etc.) it is important to keep track of both the primary key of the source record and the primary key of the record that will be created in the new system. One way is to build a table with these fields:
- OldLegacyCode: primary key of the record to migrate
- LegacySystem: name of the source legacy system
- NewCode: primary key of the record in the new system.
The LegacySystem field is important because, when there are several legacy systems to migrate, records from different systems may share the same key. This transcoding table should be created for each master data entity to be migrated. It is also very useful for building reports that show end users how a record was migrated and transformed: for example, it makes it easy to see whether two customers from two different systems were merged into a single customer in the final system. Furthermore, when transactional data has to be migrated, it makes it easy to determine which customer in the new target system the transactions should be assigned to.
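To make this concrete, here is an illustrative sketch of the transcoding table and of the lookup used later when migrating transactional data. The table name, column types and connection handling are assumptions for the example, not the actual implementation.

```csharp
using System.Data.SqlClient;
using Dapper;

// Illustrative transcoding table with the fields described above
// (OldLegacyCode, LegacySystem, NewCode) and a lookup helper.
public class CustomerTranscodingDao
{
    private readonly string _connectionString;

    public CustomerTranscodingDao(string connectionString) => _connectionString = connectionString;

    public void EnsureTable()
    {
        using var connection = new SqlConnection(_connectionString);
        connection.Execute(@"
            IF OBJECT_ID('CustomerTranscoding') IS NULL
            CREATE TABLE CustomerTranscoding (
                OldLegacyCode NVARCHAR(50) NOT NULL,  -- primary key of the record to migrate
                LegacySystem  NVARCHAR(50) NOT NULL,  -- name of the source legacy system
                NewCode       NVARCHAR(50) NOT NULL,  -- primary key in the new system
                PRIMARY KEY (OldLegacyCode, LegacySystem)
            )");
    }

    // Used when transactional data is migrated: given a legacy customer key,
    // find the customer it was mapped (or merged) into in the target system.
    public string FindNewCode(string oldLegacyCode, string legacySystem)
    {
        using var connection = new SqlConnection(_connectionString);
        return connection.QuerySingleOrDefault<string>(
            "SELECT NewCode FROM CustomerTranscoding WHERE OldLegacyCode = @oldLegacyCode AND LegacySystem = @legacySystem",
            new { oldLegacyCode, legacySystem });
    }
}
```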