There’s more digital data than ever to sort through these days, and that means there’s plenty of work for folks looking to make sense of it all.
For businesses, the promise of pulling insights from data to become more efficient and learn more about their operations led Harvard Business Review to famously declare in 2012 that data scientist was the “sexiest job of the 21st century.” But behind every elegant insight, there’s a system that pulls together data from different sources, cleans it up and gets it ready for examination.
These days, the engineers who build those systems are in high demand. The role of data engineer has been appearing on more lists of in-demand tech jobs, and technical training programs that prepare folks for the workforce are teaching the programming skills that help land these roles.
At its recently completed software development training program, UMBC Training Centers put a focus on the role. Tom Cain, the Columbia-based provider’s program director for technology and computer science, brought experience as a data engineer with Federal Hill-based cybersecurity company RedOwl Analytics. For the last three years, he’s seen demand for the role only grow.
“It’s really hot right now,” he said. “As data analytics continue to grow, the folks that get the data in a format for the analyst to use are pretty important.”
Plus, it pays: the average salary is around $102,000 a year, according to Glassdoor.
So after initially talking to him for an overview of the program, I called him back up to talk about what these roles actually involve.
Just as software has a front-end that handles what the user sees and a back-end that keeps it running, there are two sides to data operations. Cain said data engineers are essentially the “back-end engineers” in this equation.
The central question is: “There’s all this data out there. How do you pull that data into the database, and what pieces of the data do you want to get pulled into the database?”
They build the systems that access, filter and format data coming from disparate sources, ranging from websites to Microsoft Word documents. Once the data is in the database, it’s passed on to data analysts, who do the work of extracting meaning from it.
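To make the “filter and format” step a little more concrete, here is a minimal Python sketch. It assumes two hypothetical sources, a scraped web page and a table exported from a Word document, whose field names and date and price formats are invented for illustration. Each source gets a small function that maps its records into one shared shape, and a cleaning pass drops records that are missing a required field.

```python
from datetime import date

# Hypothetical raw records; field names and formats are invented for this example.
WEB_ROWS = [
    {"title": "Widget", "price_usd": "19.99", "scraped": "2019-10-01"},
    {"title": "", "price_usd": "5.00", "scraped": "2019-10-02"},  # missing title
]
DOC_ROWS = [
    {"Product Name": "Gadget", "Price": "$7.50", "Date": "10/03/2019"},
]

def from_web(row):
    """Map a scraped web record into the shared schema."""
    return {"name": row["title"].strip(),
            "price": float(row["price_usd"]),
            "seen": date.fromisoformat(row["scraped"])}

def from_doc(row):
    """Map a document-export record (US-style MM/DD/YYYY dates) into the shared schema."""
    month, day, year = (int(part) for part in row["Date"].split("/"))
    return {"name": row["Product Name"].strip(),
            "price": float(row["Price"].lstrip("$")),
            "seen": date(year, month, day)}

def clean(records):
    """Filter: keep only records that have a non-empty name."""
    return [r for r in records if r["name"]]

def transform(web_rows, doc_rows):
    """Format every source into one schema, then filter once."""
    unified = [from_web(r) for r in web_rows] + [from_doc(r) for r in doc_rows]
    return clean(unified)
```

Converting everything into one shape first means the filtering rules only have to be written once, no matter how many sources feed the pipeline.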
A primary programming language for this is Python, which, along with its associated libraries, can help get data from the internet into a database. That’s what Cain focused on in the software development course.
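As a rough illustration of getting data from the internet into a database with Python, here is a sketch that uses only the standard library (`urllib.request`, `json` and `sqlite3`) rather than any particular third-party package Cain may have taught. It assumes a hypothetical URL that returns a JSON array of objects with `id` and `name` fields, and it uses SQLite and a hypothetical `events` table as a stand-in for whatever database a real team would use.

```python
import json
import sqlite3
from urllib.request import urlopen

def fetch_records(url):
    """Download a JSON array of objects from `url` (hypothetical endpoint) and parse it."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

def load_records(conn, records):
    """Insert parsed records into a hypothetical `events` table, creating it if needed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Named placeholders are filled from each record's "id" and "name" keys;
    # re-running the load replaces rows that share an id instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO events (id, name) VALUES (:id, :name)",
        records,
    )
    conn.commit()

def ingest(url, db_path="pipeline.db"):
    """Fetch, load, and return how many rows the table now holds."""
    conn = sqlite3.connect(db_path)
    try:
        load_records(conn, fetch_records(url))
        return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    finally:
        conn.close()
```

Keeping the fetch and the load in separate functions makes each piece easier to test and swap out, for example pointing `load_records` at a different database later.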
SQL is another key language, used to communicate with the database. The data itself can come in a variety of formats, such as CSV or JSON.
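To show how Python and SQL divide the work, here is a minimal sketch that reads a small CSV export with Python’s built-in `csv` module, loads it into an in-memory SQLite database, and then lets a SQL query do the aggregation. The CSV contents and the `visits` table are hypothetical, chosen only for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; in practice it might arrive as a file or a download.
CSV_TEXT = """city,visits
Baltimore,120
Columbia,45
Baltimore,30
"""

def load_csv(conn, text):
    """Parse CSV text and insert each row into the `visits` table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits (city TEXT, visits INTEGER)")
    rows = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO visits (city, visits) VALUES (:city, :visits)",
        ({"city": r["city"], "visits": int(r["visits"])} for r in rows),
    )
    conn.commit()

def visits_by_city(conn):
    """SQL handles the summing and grouping inside the database."""
    return conn.execute(
        "SELECT city, SUM(visits) FROM visits GROUP BY city ORDER BY city"
    ).fetchall()

conn = sqlite3.connect(":memory:")
load_csv(conn, CSV_TEXT)
print(visits_by_city(conn))  # [('Baltimore', 150), ('Columbia', 45)]
```

A JSON source would differ only in the parsing step (`json.loads` instead of `csv.DictReader`); once the rows are in the table, the same SQL works regardless of the original format.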
There are different approaches to take, such as writing scripts that ingest the data into the database. Since many different scripts are often running at the same time, a key part of maintaining them is making sure they’re all working.
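One simple way to keep an eye on several ingestion scripts is to run them side by side and record which ones succeeded. The sketch below is an assumption about how such a check might look, not a description of any setup Cain mentioned: `ingest_orders` and `ingest_weather` are hypothetical stand-in jobs (one succeeds, one fails on purpose), and `concurrent.futures.ThreadPoolExecutor` runs them concurrently.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ingest")

def ingest_orders():
    return 42  # hypothetical stand-in: pretend 42 rows were loaded

def ingest_weather():
    raise ConnectionError("upstream API timed out")  # hypothetical stand-in failure

def run_all(jobs):
    """Run ingestion jobs concurrently and report each one as "ok" or "failed"."""
    with ThreadPoolExecutor() as pool:
        futures = {job.__name__: pool.submit(job) for job in jobs}
    # Leaving the `with` block waits for every job to finish.
    status = {}
    for name, future in futures.items():
        try:
            rows = future.result()  # re-raises any exception the job hit
            log.info("%s loaded %d rows", name, rows)
            status[name] = "ok"
        except Exception as exc:
            log.error("%s failed: %s", name, exc)
            status[name] = "failed"
    return status
```

A scheduler could call `run_all` on a timer and alert someone whenever a job comes back as `"failed"`, so a broken script gets noticed before analysts find gaps in the data.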
When it comes to career trajectory, Cain said junior data engineers typically handle most of the coding. At a more senior level, engineers meet with leaders or customers to determine what kind of data is needed in the database, and what kind of system should be built to do it.
After gaining the initial skills, there are also different directions to go, whether into a more specialized data engineering role or a customer-facing one.
“Once you’re a data engineer, the sky’s the limit,” he said.
This editorial article is a part of Software Development Month of Technical.ly's editorial calendar.