Programming for Data Engineers/Warehousing? via /r/learnprogramming

I got my degree in statistics and took a few data science courses, and after that I dabbled in some personal projects using Python and R. However, I noticed that a lot of jobs are looking for people who have experience working with huge datasets, like petabytes of data, and they're asking for pipelines, scaling, ETL, etc.

Where can one learn about all these things and practice writing scripts for the various software?

Things like: Hadoop, Hive, Pig, and MapReduce (I see these mentioned a lot, and a lot of employers seem to ask about them, but to be honest I don't understand exactly what they do). Building pipelines and optimizing them. Practicing extract, transform, and load (ETL).

I'm not sure if I'm missing any other big concepts, so please feel free to mention anything that seems important, as well as a good source to learn about it.

Submitted July 13, 2017 at 08:05PM by TheAceInTheHole
