To understand what it takes to be a data engineer, start here

Among data scientists and analysts, think of data engineers as the architects. Here's a roundup of free courses and other resources to help you start learning about the role.
This is a guest post by Rajvi Mehta, Women in Data — Philadelphia's regional sponsorship lead.

Skill-Based Learning is a series brought to you by Women in Data — Philadelphia that highlights different skills required to succeed in data and tech and offers resources for you to start learning. This week, we are focusing on understanding data engineering.

To understand the basic skill set required to be a data engineer, let’s start with what a data engineer’s role comprises.

This role is ever growing and ever changing. A data engineer is responsible for collecting, storing, querying, cleaning and manipulating data efficiently. A typical data engineer works closely with data scientists and data analysts. Think of a data engineer as the architect who builds the data tables that the scientists and analysts then analyze.

A strong requirement for any data engineer role is broad knowledge of database technologies, both relational databases queried with SQL and NoSQL databases. A data engineer also needs to be proficient in Python and familiar with big data and data warehousing tools like Hadoop, MapReduce, Hive and Apache Spark.
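To make the responsibilities above a little more concrete, here is a minimal sketch of a collect, clean, store and query flow using only Python and SQL (through the standard library's `sqlite3` module). The sample records, the `purchases` table and the function names are all hypothetical and chosen purely for illustration; a real pipeline would read from actual sources and would likely use a production database or one of the big data tools mentioned above.

```python
import sqlite3

# Hypothetical raw records, standing in for data collected from an upstream source.
RAW_EVENTS = [
    {"user": " alice ", "city": "Philadelphia", "amount": "12.50"},
    {"user": "bob", "city": "philadelphia", "amount": "7.25"},
    {"user": "", "city": "Camden", "amount": "3.00"},       # missing user, dropped
    {"user": "carol", "city": "Camden", "amount": "oops"},  # bad amount, dropped
    {"user": "dave", "city": "Camden", "amount": "4.00"},
]


def clean(records):
    """Normalize text fields and drop rows that cannot be repaired."""
    cleaned = []
    for record in records:
        user = record["user"].strip()
        city = record["city"].strip().title()
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip rows whose amount is not a number
        if not user:
            continue  # skip rows with no user
        cleaned.append((user, city, amount))
    return cleaned


def store(connection, rows):
    """Create the (hypothetical) purchases table and insert the cleaned rows."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS purchases (user TEXT, city TEXT, amount REAL)"
    )
    connection.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    connection.commit()


def total_by_city(connection):
    """Query the stored data: total purchase amount per city."""
    cursor = connection.execute(
        "SELECT city, SUM(amount) FROM purchases GROUP BY city ORDER BY city"
    )
    return cursor.fetchall()


def run_pipeline():
    """Run clean -> store -> query against an in-memory database."""
    connection = sqlite3.connect(":memory:")
    try:
        store(connection, clean(RAW_EVENTS))
        return total_by_city(connection)
    finally:
        connection.close()


if __name__ == "__main__":
    for city, total in run_pipeline():
        print(f"{city}: {total:.2f}")
```

The table produced by `store` is the kind of cleaned, queryable output that a data scientist or analyst would then pick up for analysis.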

This well-known blog post by Robert Chang, a data engineer who worked at Airbnb, explains the day-to-day of a data engineer. It is especially useful if you are looking to understand the difference between the roles of a data scientist and a data engineer.

Free (or nearly free) courses

If you have decided to pursue the data engineer path, these three courses are a good starting point:

  1. Big Data in AWS (Amazon Web Services) Cloud — Learning about AWS is crucial for an aspiring data engineer. This Udemy course is a basic introduction to the AWS offerings. It expects the learner to have a basic understanding of data-related concepts like data streaming, databases and data warehousing.
  2. Data Engineering, Big Data, and Machine Learning on GCP Specialization — This online course introduces participants to designing and building data pipelines on Google Cloud Platform. By the end of the course, you will be able to work on a data engineering project.
  3. Microsoft Certified: Azure Data Engineer Associate — If you are looking for a more advanced course to build on your data engineering knowledge, this one is worth a look. Each module trains the learner to be a successful data engineer on the Azure platform.


If you are interested in learning about new and enhanced star schema dimensional modeling patterns, along with case studies that illustrate specific business use cases, read “The Data Engineering Cookbook.” The author, Andreas Kretz, shares his knowledge of data engineering, drawing on his data science workflow.


And the book “DW 2.0: The Architecture for the Next Generation of Data Warehousing” describes the future of data warehousing at both the architecture and technology levels.


If you work as a data engineer or have experience in this field, please let me know what other resources we could add to this data engineering series.
