Category

Technology

Submitted By

James Ashby, james.ashby@colorado.edu, Lead Data Engineer/Architect 

Project Team

James Ashby, james.ashby@colorado.edu, Lead Data Engineer/Architect 

Project Description

I improved how the OIT Data Services team manages and runs data flows, which deliver critical campus data such as student information from OIT/UIS to Boulder groups including Student Affairs, Parking Services, and the Bookstore. Our previous orchestration system combined multiple complex technologies in a confusing way, which made the flows hard to understand, monitor, debug, and modify. I migrated our data flows to a new orchestrator called Prefect, which provides a single dashboard for scheduling flows and viewing results. I also wrote a comprehensive Python package called “Prefect Tools,” which greatly simplifies developing and deploying new flows and connecting to external systems (for example, delivering patron info to Norlin Library's server). My initial migration involved rebuilding 30 flows serving 15 campus groups and laid the foundation for solving a wide range of data delivery needs with a single framework.

Project Efficiency 

This migration fundamentally changed how the team manages data flows, simplifying and clarifying the entire development lifecycle: building, testing, deploying, monitoring, and troubleshooting. Through my "Prefect Tools" package, I halved the number of source files and lines of code required across our data flows, and reduced nearly 300 configuration files scattered across 100+ folders to just 60 ultra-compact configuration pages listed in one place on the Prefect Cloud website. This makes it far easier for engineers to understand data flows and modify them when needed.

Project Inspiration 

I joined the OIT Data Services team in April 2022, just as the most senior engineer was leaving. My first challenge was to redirect all our data flows from an old server to a new one. As I examined how everything worked, I realized we didn't need an extra server at all. Between that and the difficulty I had learning how the existing data flows actually worked, I started thinking about how to eliminate unnecessary technologies and create a new, simple, flexible way to connect to external systems.

What Makes You Happiest about this Project?

I love a clean, well-organized shop, and I think this project was a significant first step toward achieving that on the OIT Data Services team. When I first started, it was very difficult to do basic tasks such as identifying when and how a data flow failed, viewing or changing a flow's data sources or destinations, or updating a package dependency. In particular, changing a key password would have been a monumental task, so several passwords had not been changed in years. Now all of these things can be done quickly and easily, and that's hugely satisfying.