5 Keys to Document Migration Without Screwing It Up
If anyone tells you migrating legacy documents is easy, you have my permission to kick them in the shin! Migrating legacy documents is difficult, and there are many factors businesses don’t think of when deciding that now is a great time to organize and clean house. I don’t want to scare you, I don’t. However, there are so many factors at stake, and of course no one wants to leave valuable data in the old system when the new one is oh so empty and shiny! Around 50% of the whole deployment effort relates to document migration and the activities surrounding it. How can we ensure a successful transition of legacy content into a brand spanking new system while preserving all its history? Here are some tips for you.
1. Migration is its own project
There is no one-button approach to document migration. It is a complex, time-consuming endeavor, and it deserves its own project plan, approach, budget, and team. An entity-level scope and plan are must-haves right at the beginning so there are no sudden exclamations of “Oh, we forgot to load THESE reports. Who will do that?” right before the deadline.
You also have the option of doing it in one go (not recommended) or in small batches every week. This is not the easy decision you think it is, though. Everyone needs to be in agreement, and there must be clear communication to all business and technical stakeholders about when and what data will be in the new system. This applies to any system outages as well.
- Whose content will be migrated? HR, IT, Finance, Enterprise-wide?
- What’s the business driver for migration? A new cloud-based system to store files, staff retirements, or retaining tacit knowledge, to name a few examples.
- What content is actually eligible for migration?
- Once the data is migrated to the new system, will the legacy repository still be in use or made read-only?
- When will decommissioning of the legacy repository happen? (e.g. 1 year)
- Do we have access to the source repository, a staging area (for cataloging, purging, and sorting), and a target system to store the files?
Which brings us to…
2. Realistically Estimate Time and Scope
Every stage of the project requires careful time consideration: understanding each field, mapping the source field to the target field, configuring or building transformations, performing tests, measuring data quality for the field, and deciding how many ‘tags’ or categories to apply to the documents.
There are tools to help, such as Sharegate, Jitterbit, Midas, or Starfish ETL. These will help reduce time, especially in the build phase.
But understanding the source data, the most crucial task in any document migration project, simply cannot be done by automated tools. It requires analysts to take the time to go through the documents and determine what effort will be required to classify, purge, and enrich them with new metadata.
If you want a very simplistic estimate, allow one 8-hour day for every collection of like content transferred from the legacy system to the new one (using Excel or a datasheet so you can quickly copy and paste shared property values).
There are of course exceptions, like data replication between the same source and target schemas without further transformation, also known as a 1:1 migration, where we can base the estimate on the number of tables to copy.
Creating a detailed estimate is very much an art.
Some questions to ask will be:
- Do we have the right document migration tools available?
- What formats are in scope? For example, are executables and databases in or out? Emails? Microsoft Office products? Older, out-of-date formats?
- Will the documents be related to and connected to system data, business intelligence or process automation?
- How many years of content will be included, and what’s the total volume and count of files?
- Is there any cleaning (categorizing) anticipated?
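To answer the volume question, a quick inventory pass over the legacy repository can ground your estimate in real numbers before you commit to it. Here is a minimal Python sketch, assuming you can reach a copy or mounted share of the repository (the `/mnt/legacy_repository` path below is hypothetical; point it at your own):

```python
# A rough inventory sketch over the legacy repository.
from collections import Counter
from pathlib import Path

def inventory(root: Path) -> tuple[int, int, Counter]:
    """Walk the repository and tally file count, total bytes, and extensions."""
    count, total_bytes = 0, 0
    by_extension: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            count += 1
            total_bytes += path.stat().st_size
            # Group by lowercased extension; files without one get "(none)"
            by_extension[path.suffix.lower() or "(none)"] += 1
    return count, total_bytes, by_extension

# Example usage (hypothetical mount point):
# count, total_bytes, by_ext = inventory(Path("/mnt/legacy_repository"))
# print(f"{count} files, {total_bytes / 1024**3:.1f} GiB")
# print(by_ext.most_common(20))
```

If at all possible, run this against a read-only copy rather than the live repository; even a listing pass generates load.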
3. Checking the quality of the data
Optimism is not your friend when it comes to data quality. Even if the legacy system isn’t reporting any issues, there will be issues, and there will be many.
All new systems have new rules that might even conflict with legacy rules. For instance, email correspondence may be required in the new system but was not in the legacy one.
Watch out for the occasional bump that comes with documents no one has touched in years. Perhaps there are legacy documents still in WordPerfect that now need to be converted to Microsoft Word or PDF. Check for media/format obsolescence and media degradation (e.g. floppy disks).
A good rule here is “the older it is, the bigger the mess we are going to find”. It is vital that you decide early on just how much history you want to transfer to the new system, based on its longer-term legal and operational retention value.
Here are a few questions that will need answers:
- What keywords would enable quicker search and relatability?
- Do we have to rename files with names longer than 255 characters?
- Are there files nested so many subfolders deep that the full path is too long to transfer?
- What do we do with files that have names like untitled.doc or doc1.doc or even joe.doc?
- Will the new repository accept special characters in file names (#, !, $, %, /)?
- What do we do with empty documents?
- Do we have to convert file formats?
- Are emails included? Skype, Slack, or Chat records? Zoom recordings?
- What do we do with redundant, outdated and trivial information (ROT)?
- After purging and cleansing, do we have the same count or a reduced count of files?
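Several of the checks above (empty documents, over-long paths, special characters in names) can be automated against your staging copy before the real load. Here is a minimal sketch; the `/mnt/staging` path, the character set, and the 255-character limit are all assumptions, so adjust them to your target system’s actual rules:

```python
# A minimal quality-scan sketch over a staging copy of the documents.
import re
from pathlib import Path

# Characters some repositories reject in file names (adjust per target system).
SPECIAL_CHARS = re.compile(r"[#!$%&]")
MAX_PATH = 255  # a common path-length limit; check your target system's rules

def scan_for_issues(root: Path) -> dict[str, list[Path]]:
    """Flag empty files, over-long full paths, and special characters in names."""
    issues: dict[str, list[Path]] = {"empty": [], "too_long": [], "special_chars": []}
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        if path.stat().st_size == 0:
            issues["empty"].append(path)
        if len(str(path)) > MAX_PATH:
            issues["too_long"].append(path)
        if SPECIAL_CHARS.search(path.name):
            issues["special_chars"].append(path)
    return issues

# Example usage (hypothetical staging area):
# issues = scan_for_issues(Path("/mnt/staging"))
# for category, paths in issues.items():
#     print(category, len(paths))
```

The output lists are a starting point for the business team’s cleanup pass, not an automatic fix.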
4. Engaging your business people (the content owners and creators)
Business people are the creators and consumers who truly understand the data and who, in the end, can recognize what has historical value to keep and what to dispose of. This is why it is important to have someone from the business team involved in classifying documents and mapping.
This is where running a test batch and then letting the business team go at it is in your best interest. You may hear “Oh, I see now, right, we are going to have to change that” a lot.
Business users add “context to content” and provide a deep understanding of the documents: where they come from, what the subject is, the location, and how they relate to other daily office duties (e.g. employee file, field investigation, road repair, mortgage assessment, litigation, event campaign, etc.).
If you don’t engage the subject matter experts, your new system is going to contain documents that won’t be accurately related to historical and current-day business activity. Furthermore, without this context, the ability to interconnect documents and system data for business intelligence and automation will be significantly hampered.
Some questions to nail down with the business owners and authors:
- Who does the information belong to?
- Have some of the owners left the organization? If so, who will take responsibility for their information?
- Is there 3rd-party information that we are keeping as an internal record, or is the 3rd party expected to keep the official record?
- Are users aware, and do they have time allocated to participate in cleaning?
5. Migration is not a “One-Time” activity
As I mentioned at the beginning, trying to do this in one big bang is not recommended. We know it is a crappy job and hope we can be one and done, but the truth is you will probably be doing the migration in ‘waves’. This means repeating the same actions multiple times.
Typically, you start with a dry run of about 25% of the documents, looking for accuracy and the time taken to load. Then comes a repeatable batch load of ‘like’ content until the entire scope of documents has been uploaded. The poorer the data quality, the more runs will have to take place.
Questions to ask to optimize the migration process:
- Have there been any changes made to the source documents during this transaction period, otherwise known as deltas?
- If there are data discrepancy errors during upload, will business users be available to investigate and update?
- Will the source repository be converted to ‘read only’ after document migration?
- When will the source and staging copies be decommissioned or deleted?
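The delta question lends itself to a simple automated check: record a hash of every source file before a wave, record it again at the next wave, and compare. Here is a minimal sketch (the paths are up to you, and a real run would persist each snapshot between waves, e.g. to a JSON file):

```python
# A sketch of detecting deltas between migration waves via file hashes.
import hashlib
from pathlib import Path

def snapshot(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for path in root.rglob("*"):
        if path.is_file():
            digests[str(path.relative_to(root))] = hashlib.sha256(
                path.read_bytes()
            ).hexdigest()
    return digests

def deltas(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two snapshots and report files added, changed, or removed."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "changed": sorted(k for k in before.keys() & after.keys()
                          if before[k] != after[k]),
        "removed": sorted(before.keys() - after.keys()),
    }
```

Anything the comparison flags was touched during the transaction period and needs to be re-migrated (or at least reviewed) in the next wave.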
Moving legacy documents into a new system is a complex journey with a lot of hidden potholes. Preparation is everything, so expect the unexpected. The key is clear communication about the process, a dedicated team, and patience.