College for Professional Studies
MS Computer and Information Technology
School of Computer & Information Science
Thesis - Open Access
Number of Pages
The integrated Public Health Information System (iPHIS) requires a maximum data latency of one hour for reporting and analysis. The existing system uses trigger-based replication to copy data from the source database to the reporting database, where the data is transformed into materialized views by an hourly full refresh. This solution is Central Processing Unit (CPU) intensive and is not scalable. This paper presents the results of a pilot project which demonstrated that near real-time Extract, Transform and Load (ETL), using a conventional ETL process with Change Data Capture (CDC), can replace the existing process to improve performance and scalability while maintaining near real-time data refresh. This paper also highlights the importance of carrying out a pilot project before a full-scale project to identify technology gaps and to provide a comprehensive roadmap, especially when new technology is involved. In this pilot project, the author uncovered critical prerequisites for near real-time ETL implementation, including the need for CDC, a dimensional model, and suitable ETL software. The author recommended that purchasers buy software based on currently available features, conduct a proof of concept for each critical requirement, and avoid vaporware. The author also recommended using the Business Dimensional Lifecycle Methodology and the Rapid-Prototype-Iterative Cycle for data warehouse related projects to substantially reduce project risk.
Date of Award
© Wei-Chwen Wilson
All content in this Collection is owned by and subject to the exclusive control of Regis University and the authors of the materials. It is available only for research purposes and may not be used in violation of copyright laws or for unlawful purposes. The materials may not be downloaded in whole or in part without permission of the copyright holder or as otherwise authorized in the “fair use” standards of the U.S. copyright laws and regulations.
Wilson, Wei-Chwen Soon, "Near Real-Time Extract, Transform and Load" (2007). Student Publications. 317.