It's been more than half a year since I graduated from the "Applied Data Science: Machine Learning" program at the EPFL Extension School, and this post is long overdue.
I was quite enthusiastic when I first heard about the program. I had long wanted to formalize my knowledge in data science and machine learning, and this program seemed like the right combination: four theoretical modules consisting of short, concise lessons, each followed by practical exercises on the same topics and a project that brings everything together. The fifth and last module is a capstone project, typically a bigger project of your own choosing. All projects are complete end-to-end projects that let you practice every major step of the data analysis pipeline (data acquisition and transformation, predictive modelling and analytics, neural networks and deep learning).
1. Leave all assumptions behind.
There were many highs and lows, and of course, countless cups of coffee and some sleepless nights. I learned a lot and grew along the way. Having worked with (structured) data my whole life, I thought it would be a breeze, but I soon realized I needed to leave old thinking patterns behind. Data analytics works best when it is focused: you start with specific questions and answer them from existing data. Data science is less concerned with answering specific queries. Instead, it parses massive, sometimes unstructured datasets to expose insights, and it cares more about initial observations, future trends, and potential insights.
2. Don't start by jumping into the deep end.
There is still a lot of hype around artificial intelligence and deep learning, and around how mastering these techniques will build the technology of the future: self-driving cars, advanced robotics and so on. Before jumping into deep learning and natural language processing, it's important to master the fundamentals. The "Applied Data Science: Machine Learning" program starts with the techniques and algorithms of "classical" machine learning, which are the building blocks for more advanced topics.
Classical machine learning still has an incredible untapped potential. While the algorithms are already mature, we are still in the early stages of discovering impactful ways to use them.
3. Go beyond the scope.
For the capstone project, I decided to work on a topic that I personally feel very strongly about. Cyberbullying (known as 'cybermobbing' in German and French) is the act of one or several people acting together to deliberately insult, tease, humiliate, harass or even blackmail one or more victims. Although it generally involves children and teenagers, it can affect anyone. Traditional bullying used to be limited to schools and youth groups. With the growing popularity of social media and its swift adoption into our daily lives, however, cyberbullying has become an emerging problem, and it can continue at home. Social media platforms have several features that make them a convenient way for cyberbullies to target their victims, such as anonymity and a lack of supervision.
Working on this topic meant my capstone project would be on natural language processing, an area that wasn't covered in the program. Besides getting the chance to work with a real-world (and very messy) data set, it was really fun to compare different word embeddings and vectorization techniques and to use them in combination with classical machine learning models (such as Naive Bayes or XGBoost) and convolutional neural networks.
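To give a feel for what "vectorization plus a classical model" means, here is a minimal sketch in plain Python. It is not my capstone code: the `NaiveBayesText` class, the `tokenize` helper and the four toy sentences are all made up for illustration. It uses the simplest vectorization there is (bag-of-words counts) and a multinomial Naive Bayes classifier with Laplace smoothing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


class NaiveBayesText:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical helper)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][token]
                # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up toy sentences, only to make the sketch runnable.
train_texts = [
    "you are so stupid and ugly",
    "nobody likes you loser",
    "great game last night",
    "thanks for your help today",
]
train_labels = ["bullying", "bullying", "neutral", "neutral"]

model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("you are a loser"))
print(model.predict("thanks for the great game"))
```

On real data you would swap the whitespace tokenizer for proper text cleaning and try richer representations such as TF-IDF or word embeddings, but the overall shape stays the same: turn text into numbers, then hand those numbers to a model.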
Happy learning!