The objective of this data science project is to discover early-stage (<= Series A) Asia Pacific (APAC) startups with the potential to become unicorns (> $1 billion USD valuation).
I am a globalist. Before I look beyond APAC, I want to have an overview of my surroundings.
I am interested in startups that:
Tech in Asia (TIA) is a Singapore startup and a Y Combinator alumnus. It is the Asian version of TechCrunch and Crunchbase.
TIA plays a part in helping the Asian tech scene, so I will share neither the data nor the method I used to get it :slight_smile:
From the data, I managed to reverse engineer and get parts of the TIA database schema.
TIA has invested a decent amount of time in compiling its company data; this is the cleanest data I have scraped.
TIA updates the data regularly. The number of companies increases from
Based on the data, I do see some problems, so it's not perfect.
I managed to gauge the strength of TIA's data and software engineers from this process.
D3 is a JS library for manipulating documents based on data. It combines powerful visualization components with a data-driven approach to DOM manipulation.
D3 is the de facto standard for building complex data visualizations on the web.
Usually, engineers use a backend language (Python, Ruby, Java) for the ETL process and pass the finalized data to the frontend. With D3, you can do the ETL directly at the frontend.
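As a rough illustration of that backend flow, here is a minimal Python sketch. The field names (`name`, `country`, `funding_usd`) and the sample rows are made up for illustration and do not come from the TIA data. The only point is that the backend reshapes raw rows into a finished JSON payload that the frontend can plot without further processing.

```python
import json
from collections import defaultdict

# Hypothetical raw rows, standing in for whatever the extract step returns.
RAW_ROWS = [
    {"name": "Alpha", "country": "SG", "funding_usd": "1500000"},
    {"name": "Beta", "country": "SG", "funding_usd": None},
    {"name": "Gamma", "country": "ID", "funding_usd": "250000"},
]


def transform(rows):
    """Clean the rows and aggregate startup count and total funding per country."""
    totals = defaultdict(lambda: {"startups": 0, "funding_usd": 0})
    for row in rows:
        bucket = totals[row["country"]]
        bucket["startups"] += 1
        # Assumption: missing funding counts as zero instead of dropping the startup.
        bucket["funding_usd"] += int(row["funding_usd"] or 0)
    # Turn the dict (object) back into a list (array) of records for the frontend.
    return [{"country": country, **values} for country, values in sorted(totals.items())]


def load(records):
    """Serialize the finalized records as the JSON payload served to the frontend."""
    return json.dumps(records)


payload = load(transform(RAW_ROWS))
```

With this split, the frontend receives one small record per country instead of every raw row.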
D3 is a low-level JS library with a steep learning curve. To fully utilize D3, you need:
(array => object => array => object)
The data I would need to feed to D3 would bloat my web app. My D3 knowledge is also outdated (I know v3, and D3 is now at v5). I have neither the time nor the interest to update it at the moment.
R is a language and environment for statistical computing and graphics.
I used R extensively while pursuing my bachelor's degree in Statistics, but I have completely forgotten it :wink:. It would take me an hour or two to pick it back up if I wished.
R is good for academia and researchers but too limited for someone who is both a Data Scientist and a Software Engineer. Using Python, I can do everything that R does and more.
Based on my experience using Qlik (in 2015), the performance was horrible even if you only synced specific tables from the database.
The charts available are limited and not modern.
It has similar problems to Qlik.
I'm not going to pay for a BI tool unless:
My chosen tool: Bokeh
Bokeh is an interactive Python visualization library that targets modern web browsers for presentation.
Choosing between Bokeh and Plotly was a tough decision. I chose Bokeh because it has a stronger community.
Please watch this video to understand how to interact with this visualization:
This choropleth map is zoomed in on Singapore by default. I do this because:
Grey regions on the map are countries for which I have no data. Click the wheel-zoom button to enable zooming with the mouse scroll. The position where you scroll (over the x-axis or the y-axis) affects the zoom, so be sure to place the mouse over the correct axis.
From here onwards, I only use startup data from APAC countries.
The year used is the founding date of these APAC startups.
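The filtering described here can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the field names, the sample records, the (incomplete) set of APAC country codes and the stage ordering. The early-stage helper reflects the project objective (<= Series A); the text does not say whether that filter is applied to the charts at this point.

```python
from collections import Counter

# Assumed funding-stage order; "early-stage" means Series A or earlier.
STAGE_ORDER = ["Pre-Seed", "Seed", "Series A", "Series B", "Series C"]
EARLY_STAGES = set(STAGE_ORDER[: STAGE_ORDER.index("Series A") + 1])

# A small, incomplete set of APAC country codes, for illustration only.
APAC = {"SG", "ID", "MY", "VN", "TH", "PH", "IN", "CN", "JP", "KR", "AU"}

# Hypothetical records; the real field names are not known.
startups = [
    {"name": "Alpha", "country": "SG", "stage": "Seed", "founded": 2016},
    {"name": "Beta", "country": "US", "stage": "Seed", "founded": 2016},
    {"name": "Gamma", "country": "ID", "stage": "Series B", "founded": 2014},
    {"name": "Delta", "country": "IN", "stage": "Series A", "founded": 2016},
    {"name": "Epsilon", "country": "JP", "stage": "Pre-Seed", "founded": 2018},
]


def is_early_stage(startup):
    """True if the startup's latest stage is Series A or earlier."""
    return startup["stage"] in EARLY_STAGES


# Keep only APAC startups, then count them by founding year.
apac_startups = [s for s in startups if s["country"] in APAC]
founded_per_year = Counter(s["founded"] for s in apac_startups)

# Narrow further to the early-stage startups the project is looking for.
early_apac = [s for s in apac_startups if is_early_stage(s)]
```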
This visualization helps to identify outlier startups based on their uncommon funding raised for their series.
In the seed round, 48 startups raised between $10 million and $100 million USD, and one startup raised between $100 million and $500 million USD. There are a few explanations:
If these startups are so understanding, getting their shares will instantly make you a millionaire on paper.
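The bucketing behind this kind of outlier check can be sketched as follows. This is a minimal example under stated assumptions: the round records, the field names and the seed-round threshold are hypothetical, and only the $10M-$100M and $100M-$500M ranges come from the text above; the other bucket edges are made up.

```python
from bisect import bisect_right
from collections import Counter

# Bucket edges in USD. Only the two middle-to-upper ranges are quoted in the text;
# the remaining edges are assumptions.
EDGES = [1_000_000, 10_000_000, 100_000_000, 500_000_000]
LABELS = ["< $1M", "$1M-$10M", "$10M-$100M", "$100M-$500M", ">= $500M"]


def funding_bucket(amount_usd):
    """Return the label of the funding range that contains amount_usd."""
    return LABELS[bisect_right(EDGES, amount_usd)]


# Hypothetical funding rounds.
rounds = [
    {"startup": "Alpha", "series": "Seed", "raised_usd": 500_000},
    {"startup": "Beta", "series": "Seed", "raised_usd": 25_000_000},
    {"startup": "Gamma", "series": "Seed", "raised_usd": 150_000_000},
    {"startup": "Delta", "series": "Series A", "raised_usd": 8_000_000},
]

# Count rounds per (series, funding range), the numbers such a chart would show.
counts = Counter((r["series"], funding_bucket(r["raised_usd"])) for r in rounds)

# Assumed threshold: a seed round of $10M or more is unusual enough to flag.
OUTLIER_THRESHOLD = {"Seed": 10_000_000}
outliers = [
    r["startup"]
    for r in rounds
    if r["raised_usd"] >= OUTLIER_THRESHOLD.get(r["series"], float("inf"))
]
```

Series without an entry in `OUTLIER_THRESHOLD` are never flagged, so thresholds can be added one series at a time.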
For this chart, I only use the year starting from 2000. The earliest year I have seen in this data is 1804.
The idea is to see if we can observe any trend in the startup industries. Perhaps in late 2017 and early 2018, we see more AI startups due to the hype.
I think industry information is not that critical for entrepreneurs. After all, you wouldn't change your industry just because it's less popular now.
From the data, I can do more, such as social media scoring and APAC VC analysis, but for this article, I decided to focus solely on the startups.
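For readers who want to reproduce the per-year industry counts behind such a chart, here is a minimal sketch. The records and field names are hypothetical, and it assumes that a startup can be tagged with more than one industry.

```python
from collections import Counter, defaultdict

MIN_YEAR = 2000  # the chart only uses founding years from 2000 onwards

# Hypothetical records; a startup may list several industries.
startups = [
    {"name": "Alpha", "founded": 2017, "industries": ["AI", "Fintech"]},
    {"name": "Beta", "founded": 2018, "industries": ["AI"]},
    {"name": "Gamma", "founded": 1804, "industries": ["Banking"]},
    {"name": "Delta", "founded": 2017, "industries": ["E-commerce"]},
]

# founding year -> Counter of industries for startups founded that year
industry_by_year = defaultdict(Counter)
for startup in startups:
    if startup["founded"] < MIN_YEAR:
        continue  # drop very old entries such as the 1804 one
    industry_by_year[startup["founded"]].update(startup["industries"])

# Extract a single industry's trend over time, e.g. to look for an AI spike.
ai_trend = {year: counts["AI"] for year, counts in sorted(industry_by_year.items())}
```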
Now, I have a list of startups to keep an eye on.
In time, when I become a world-class executor, I would like to get a piece of you :wink: