Why Keep-It-Simple Matters in Data Science, And How To Achieve It
"SEE IT BIG, AND KEEP IT SIMPLE" - Wilferd Peterson
There are many quotes about the importance of simplicity. Recently, I came across the one above by Wilferd Peterson. He might as well have been talking about data science, because the advice applies to almost every step and aspect of the data science process, as listed below:
Start Simple By Leveraging Existing Resources
Data science is a new discipline for most organizations. If that is the case for yours, there is no need to wait until a big team is in place. The organization can start small and simple by a) engaging an experienced data scientist, possibly as an external part-time contractor, and b) identifying potential talent among existing staff. The external resource should help the organization a) identify data science opportunities, b) coach existing staff, and c) execute simple use cases with the highest possible business value.
Make Your Communication Intuitive
Communication is one of the most critical aspects of a data science project, especially in the beginning. Many projects either take too long or never get kicked off properly because of communication challenges. Therefore, it is very important to identify the key business partners and apply the principle of simplicity when communicating with them. Avoid complicated and intimidating data science jargon when talking to the business. Instead, keep your communication intuitive and focused.
Don’t Try to Boil The Ocean
Data scientists are often tempted to bring in every possible internal and external data source for a given problem statement. Doing so can delay the start of data exploration and visualization by a long time. Instead, start with the data sources that are immediately available, and then decide whether the other sources are needed.
Simple Models Often Generalize Well
Everyone loves deep learning. However, many data science problems can be solved elegantly with simple machine learning models, such as regression or decision trees. Therefore, try the simple models first, and move to a more complex model only if they fall short. This can save a lot of time and computational resources.
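To make the "simple first" idea concrete, here is a minimal sketch that fits a one-variable linear regression and compares it against an even simpler baseline (always predicting the training mean) on held-out data. Everything in it is an assumption made for illustration: the data is synthetic, and the helper names (fit_line, mse, make_data) are hypothetical. It uses only the Python standard library.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return mean((p - y) ** 2 for p, y in zip(preds, ys))


def make_data(n=200, seed=0):
    """Synthetic data (an assumption): y = 3x + 2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3.0 * x + 2.0 + rng.gauss(0, 1) for x in xs]
    return xs, ys


xs, ys = make_data()

# Hold out the last 20% of the rows to check how well each model generalizes.
split = int(0.8 * len(xs))
x_train, y_train = xs[:split], ys[:split]
x_test, y_test = xs[split:], ys[split:]

# Simplest possible baseline: always predict the training mean.
baseline = mean(y_train)
mse_baseline = mse([baseline] * len(y_test), y_test)

# Simple model: a straight line fitted on the training rows only.
slope, intercept = fit_line(x_train, y_train)
mse_line = mse([slope * x + intercept for x in x_test], y_test)

print(f"baseline MSE: {mse_baseline:.2f}")
print(f"linear MSE:   {mse_line:.2f}")
```

If the simple model already beats the baseline by a wide margin on held-out data, that result becomes the bar a more complex model has to clear before its extra training time and compute are justified.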
Leverage the Power of the Cloud
Many data science projects get stuck at the model deployment phase. This is often caused by legacy on-premise systems and technologies. Cloud platforms offer a wide range of tools that enable businesses to deploy machine learning models quickly. If moving everything to the cloud is not an option, consider a hybrid deployment that combines on-premise systems with cloud services.