Here is my journey as a data analyst, software engineer, data scientist, and data engineer.
To simplify things, I will use the term AI to cover machine learning (ML), natural language processing (NLP), computer vision (CV), and deep learning (DL).
I think the job of a data analyst is to analyze and derive business insights from the data.
When I was working as a data analyst, the main tools I used were spreadsheets, business intelligence tools (Qlik, Tableau, PowerBI), and some programming.
Here are a few things you need to do to excel in this role:
- Be detail-oriented. You must be able to spot mistakes that others will miss
- Understand the business objective. Produce content that will contribute to marketing or sales
- Clear communication. Deliver the insights to the target audience in a clear and easy-to-understand format
After two years of working as a data analyst, I had gotten decent at building monthly reports and dashboards. I had worked with data in various industries (mobile dating, startup accelerator, logistics). I saw that the data didn't differ much from one industry to the next, and I felt ready to take on a new challenge.
I moved into software engineering (SE) without much background in it, so I had a steep learning curve. Since the job was to build data visualizations for an internal web dashboard, I decided to focus on front-end engineering.
SE is a means to an end, not an end in itself. I told myself not to go down the SE rabbit hole. Eventually, I realized that it's hard to focus on one area of SE and ignore the rest, as they are all connected. I am still learning about the different areas of SE today.
- Learn a language by building something
- Don't drift too far; focus on solving the task
- Learn by reading experts' code
After working as a Software Engineer who focused on building data visualizations (D3.js), I thought that my skillset was too niche. BI tools such as Tableau are sufficient for most companies to answer their data questions. It is not cost-effective for them to hire an engineer just to build dashboards and reports. I had to pivot into a more general role that was in higher demand. I knew that data science was my passion, so I got a job as a Data Scientist.
In this job, I was both a Data Scientist and a consultant. The company I worked for provided ML software to help financial institutions with fraud detection. As the banks kept their data centers on-premises, I had to work at the bank every day as a vendor.
As the second Data Scientist in a company with fewer than 15 employees, I got to work on different aspects of solution delivery. First, I would set up the equipment and install all the software the ML product required. Then I would look through the data and confirm that the bank had provided what we needed. Once that was verified, I would explore, clean, and transform the data and train the ML models. Finally, when the results looked promising, we would present our proof of concept (POC) to the client. When a POC succeeded and became a production project, I would spend months in a single bank, working closely with the Software Engineers to iterate and improve the ML software with new insights and outliers.
At the start, I enjoyed the job a lot as I got to work with (big) financial data and the different data schemas used by the banks. But after four POCs, once I knew what data points to look for, it became routine. While I still discovered small insights sometimes, I felt that my rate of growth had slowed down. I felt like a data science (DS) generalist: I could do ML and NLP, but DL was not required in this company as financial regulators do not accept a black-box ML solution. Specialization was not required either, as a simple solution worked well enough for most cases.
Understanding the limitations of AI, I only trust it for data exploration purposes. I will not bet my money or life on it to make the final decision. I think only companies like Google have the resources (data, money, brains) to build a reliable general-purpose AI. For a startup, the only way to survive is to focus on building a specialized AI that only does one thing, but is the best in the market.
I see myself more as an executor than a researcher. Spending the majority of my time reading research papers to improve a model's accuracy by 0.1% is not what I want. I know that in the long term I want to start my own company, and SE will be more useful than DS for that. Hence, I looked for a job as a data engineer/architect, a role that requires both SE and DS skills.
Data Scientist tips
- Clear communication is key. Most people don't understand the model you built, so you have to make it clear and simple
- Storytelling. People rely on ML predictions to get a rough idea, but they will not trust them to make any serious decision, especially when money is at stake. You have to convince people why they should continue to hire you when it's a system that they can probably live without in most cases
- Software engineering. You may not be the one who codes everything, but you need to write decent code and have a good understanding of basic SE in order to work smoothly with your software engineers
Data Engineer / Architect
As always, in a small startup, many things are barely functioning. I have the opportunity to design and build the data architecture without obstacles since I am the only person doing it.
With my software engineering and data science experience, I can see the gap between software and data teams.
I will use database design as an example.
To the software team, database design means making sure the data is normalized, the relationships are defined, and the schema is clean and fast.
To the data team or a non-technical team, data normalization is unnatural and doesn't make sense. Why would you split the data into different tables instead of consolidating everything in one table? To build an ML model, we need all the available features as columns (denormalized data), the complete opposite of what the software team is doing.
In a big company, the solution is data warehouses. The software team can design their normalized data in the database while the data team can get their consolidated data in the data warehouse.
In a startup, with limited resources, a data warehouse is not the solution as it will incur extra hardware costs.
My solution is to build both normalized and denormalized versions of the same data inside a database, one version for each team.
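To make the idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is only an illustration under my own assumptions: the `customers` and `orders` tables, the `orders_flat` view, and the sample rows are all hypothetical, and I use a plain SQL view for the denormalized version, although a regularly rebuilt table would work just as well.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding both versions of the data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Normalized tables: what the software team works with.
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            country     TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            amount      REAL NOT NULL
        );

        -- Denormalized view: one wide row per order for the data team.
        CREATE VIEW orders_flat AS
        SELECT o.order_id,
               o.amount,
               c.customer_id,
               c.name    AS customer_name,
               c.country AS customer_country
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id;
        """
    )
    # Hypothetical sample data, for illustration only.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, "Alice", "SG"), (2, "Bob", "MY")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
    )
    conn.commit()
    return conn


def fetch_flat_rows(conn: sqlite3.Connection) -> list:
    """Read the wide, feature-per-column rows the data team would use."""
    cur = conn.execute(
        "SELECT order_id, amount, customer_name, customer_country "
        "FROM orders_flat ORDER BY order_id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in fetch_flat_rows(build_demo_db()):
        print(row)
```

The software team keeps writing to the normalized tables, while the data team only reads from the flat version, so neither side has to change how it thinks about the data.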
It would not be a problem for me to rebuild if necessary since I am the creator. I am in charge of the data team. I have data scraping, pipelines, modeling, infrastructure, databases, APIs, and R&D projects to handle. With limited time, I know I can live with this compromise. Startups move fast, and most won't survive long enough for you to worry about scalability issues, so I need to balance speed of execution against the scalability of my design.
I am capable of building a production data-driven product myself. I have worked at 5 startups and am totally comfortable with starting my own company right now, if making good money were not a concern.
While the main reason I worked at startups was to learn the "secret" to building a successful company, the secondary goal was to become the head of data. I thought that this would be an accelerated career path, unlike at MNCs, where it would take decades to get that title. I thought that skills and achievements would make it a clear and straight path towards the head position. But I learned that this was naive. Founders are pretty much like dictators. They set the rules, and you are totally naive if you think that they will play or negotiate "fair". Some of them have such strong characters that even becoming the chief of data is not worth the emotional roller coaster.
Now, I no longer feel the excitement that I once felt with startups, with the exception of my own startup.
I am sure that as my skills reach the next level, what I am looking for will change again.