Software Engineering to Data Science

Rose Day - Mar 28 '20 - Dev Community

The other day I saw a question from @yadlra posted:

Hey guys, what skills do I need to be a data scientist? Do the same skills apply as being a software developer?

I posted a lengthy response but felt others would also benefit from a post on the topic.

In 2017, I graduated college as a computer engineer and began working full time before making the transition into data science. I have found the skills learned from software to be highly transferable to data science, especially when you are looking to build tools to work with big data.

Currently, I work on a team that focuses on tools and machine learning applications, but there are many areas of data science that one could consider. Below, I outline some of the areas, skills, and resources that I looked into during my transition into the field.

Areas of Data Science

There are many fields in which data science can be applied. I recently gave talks on data science at two colleges and was asked a series of questions:

Is there a field in which you cannot utilize data science?

My answer to this question was no. I believe that every field can utilize data science if the data is available, or will be able to in the future once the data is made available. I have even seen artists utilize data to create sculptures. Once you have picked a field of interest, you can find data-related jobs in data science, data engineering, and more.

Did you study data science as an undergraduate, or is it okay to have a bachelor's degree in another field first?

As mentioned earlier, I did not study data science initially but transitioned to it when I went back for a master's. Most data scientists I have worked with studied something else first before moving into data science. These individuals specialized in computer science, different areas of engineering, mathematics, and more. It is perfectly fine to specialize in one area and then use the knowledge gained there to further your data science education. Your background will let you bring perspectives to data science that others may not see.

How did you go about searching for jobs?

One thing I have learned while job searching in the field is that not all titles mean the same thing. Titles ranged from data scientist and data analyst to business analyst, data engineer, and more. I stopped searching by title and started searching by keywords, as it helped me narrow down job descriptions that fit what I was looking for. Keywords I used when searching focused on the languages I wanted to program in (Python), the tools I wanted to get exposure to or keep using (Tableau, Spark, Git), and the type of work I was looking for (IoT, Connected Systems, Sensors).

Another valuable method of job searching is networking, both virtually and by attending meetups. For example, I have found LinkedIn to be a wonderful tool to network virtually with colleagues in my field, and to connect with them on what their companies are doing and the types of conferences or events they are attending.
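
The keyword approach could be sketched in Python roughly as follows. This is only an illustration: the keyword list is borrowed from the paragraph above, the job postings are invented, and the function names `matched_keywords` and `rank_postings` are hypothetical helpers, not part of any job board's API.

```python
import re

# Keywords borrowed from the paragraph above; swap in your own interests.
KEYWORDS = ["Python", "Tableau", "Spark", "Git", "IoT", "Sensors"]


def matched_keywords(description, keywords=KEYWORDS):
    """Return the keywords that appear as whole words in a job description."""
    found = []
    for keyword in keywords:
        # Word boundaries keep "Git" from matching inside words like "digital".
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            found.append(keyword)
    return found


def rank_postings(postings, keywords=KEYWORDS):
    """Score (title, description) pairs by keyword hits, dropping zero-hit postings."""
    scored = [(len(matched_keywords(desc, keywords)), title) for title, desc in postings]
    scored = [(score, title) for score, title in scored if score > 0]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical postings, invented for illustration only.
postings = [
    ("Business Analyst", "Build Tableau dashboards and report on sales trends."),
    ("Data Engineer", "Maintain Spark pipelines in Python; code is versioned with Git."),
    ("Office Manager", "Coordinate schedules and order supplies."),
]

for score, title in rank_postings(postings):
    print(score, title)
```

The point of the sketch is that the title plays no part in the ranking; only the description text does, which mirrors searching by keyword rather than by title.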

What skills have helped you in your career and what tools have you used?

Once I found a field to specialize in, I began focusing on the skills I wanted to learn. The first job I had in data science was a co-op as a data engineer. During this time I worked to learn more about Docker for containerization of data applications, databases, and ETL processes. After this job, I began working as an engineer in applied data science. At this point, I began focusing my skills more on statistics / math, data cleaning, big data handling, data storytelling and visualizations, and tool development. Across these two jobs, tool development and application development were where I was able to leverage more of my software background, while data analysis and visualizations were where I was able to use more of my data science background. During both jobs, I have found programming to be a valuable skill, especially object-oriented programming and writing clean code. Skills can vary job by job, but finding out what you enjoy doing will aid you not just in finding a job but in enjoying it too!

If you are looking for tools and frameworks to learn, here are just a few to consider:

  • Jupyter Notebooks for prototyping code
  • Git for version controlling notebooks and other code
  • Spark and Hadoop for big data processing
  • Databases and data storage for different types of data; understanding when to use which type of data store will be helpful
  • Tableau and other BI tools for dashboarding and visualizations
  • Python visualization libraries: Bokeh, Matplotlib, etc.
  • R and its respective libraries / R Shiny

What resources have you found helpful to continue learning outside of going to college?

A good book to look at when starting in data science is An Introduction to Statistical Learning: With Applications in R. This was a great read and very helpful when I was starting out. I still keep a copy at work as a reference for when I am working on something.

If you are looking for another good read that is more software focused, Clean Code by Robert Cecil Martin is a great book focused on how to write well-structured code that is easy to read. The book uses Java to teach the skills, but I apply the same concepts to Python.

Another good resource for books is the publisher O'Reilly. They have tons of good books on data science and Python. I have found their books to be helpful and keep some of them as references at work as well.

I also recommend looking into the open source data sets available to see what flavors of data exist and to determine what kind of data you want to work with. Some types of data include spatial data, images, structured or unstructured documents, etc.

There is so much you can look into with data science; just pick somewhere to start and run with it!
