Programming for Data Engineers/Warehousing? via /r/learnprogramming

I got my degree in statistics and took a few data science courses, and after that I dabbled in some personal projects using Python and R. However, I noticed that a lot of jobs are looking for people who have experience working with huge datasets, like petabytes of data, and they're asking for pipelines, scaling, ETL, etc.

Where can one learn about all these things and practice writing scripts for the various software?

Things like: Hadoop, Hive, Pig, and MapReduce (I see these mentioned a lot, but to be honest I don't understand exactly what they do, and a lot of employers seem to ask about them). Building pipelines and optimizing them. Practicing extract, transform, and load (ETL).

I'm not sure if I'm missing any other big concepts. Please feel free to mention anything that seems important, as well as good sources to learn about it.

Submitted July 13, 2017 at 08:05PM by TheAceInTheHole
via reddit

