The Big Data Gold Rush, but where are all the miners?

In 2017, The Economist declared that data had replaced oil as the world's most valuable resource. The more data a company collects, the more insight, and the more market power, it can wield.

For that reason, information has become the new “gold” of our modern technological economy.

Old time miner panning for gold

Over 70% of companies’ CIOs agree that ‘big data’ has the potential to revolutionize their way of doing business

In the recent past, collection and analysis of so-called "big data" was largely available only to tech giants such as Google and Facebook, because of the massive resources required to process and store it. Think of all the information generated daily by billions of active users.

However, with processing and storage becoming less and less of a constraint, almost every company has started exploring its own ‘big data’. Companies rushed to collect as much as they could, but the workforce wasn’t quite as prepared as the corporations were.

In the excitement that accompanied this new wave of technology, the personnel required to support it were, it’s safe to say, a bit of an oversight. While over 70% of companies’ CIOs agree that ‘big data’ has the potential to revolutionize their way of doing business and even generate new lines of revenue, a staggering 83% of data scientists reported a shortage in their field.


A staggering 83% of data scientists reported a shortage in their field.

Additionally, information technology and data science employment opportunities were projected to grow 11% from 2014 to 2024. As time has shown, this projection held true, and it was in fact quite conservative: in 2018 there were 151,000 unfilled data science positions in the United States alone.

The problem with this is that data, like any other commodity, is useless until refined or processed. Companies are clamoring for talent capable of sorting through their data so they can exploit the leverage it offers.

The problem is that data scientists are few and far between. Talent retention is difficult because larger companies with deeper pockets tend to woo employees away with outsized salaries. Training employees from other backgrounds is difficult too, because data analysis has such a steep learning curve.

This year on Indeed, there was a 15% gap between data science job postings and job searches, further underscoring the supply-and-demand crisis in this field. Some large companies are trying to combat this by offering data science bootcamps. IBM, for example, launched a 24-month data scientist certification program in an effort to counteract the imbalance.

If no one is applying for your jobs, create your own applicants, right?

Considering that data science graduates currently have (and are projected to keep) the highest entry-level salaries, the problem might just fix itself, right? Highly unlikely!

Some companies are trying to find ways to automate parts of the data scientist’s day-to-day work. From giants like Splunk, itself a data storage company, down to startups like Query.AI, innovators are looking toward the future optimization of this field.

This train of thought makes sense, especially because 60% of the average data scientist’s time is spent cleaning and organizing data, 19% is spent collecting data sets, and only 9% is spent actually mining data for patterns. The rest is spent building or refining tools.

Only 9% of a data scientist’s time is actually spent mining data

That last 9% is the time companies want more of: time spent identifying the patterns in the data that are invisible to humans, the ones that only ‘big data’ analysis can find.

The 79% of time split between cleaning, organizing, and collecting data can be useful, but it is also mind-bendingly tedious, and much of it is required only because of our desire to replicate data from its original location into a centralized platform. Cleaning and organization can certainly be optimized, while collecting data sets is a harder feat.
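To make the cleaning-and-organizing share of that time concrete, here is a minimal, hypothetical sketch of the kind of work it involves: normalizing inconsistent formats, dropping incomplete rows, and deduplicating records. The field names and raw records are invented for illustration.

```python
# Hypothetical raw records, as they might arrive from a source system:
# inconsistent casing, stray whitespace, duplicates, and missing values.
raw_records = [
    {"email": "ada@example.com ", "signup": "2018-01-05", "plan": "Pro"},
    {"email": "ADA@example.com",  "signup": "2018-01-05", "plan": "pro"},
    {"email": "bob@example.com",  "signup": "",           "plan": "Free"},
]

def clean(records):
    seen, cleaned = set(), []
    for r in records:
        email = r["email"].strip().lower()   # normalize whitespace and case
        if not r["signup"]:                  # drop rows missing a signup date
            continue
        if email in seen:                    # deduplicate by normalized email
            continue
        seen.add(email)
        cleaned.append({"email": email,
                        "signup": r["signup"],
                        "plan": r["plan"].lower()})
    return cleaned

print(clean(raw_records))
```

Three messy rows collapse to one clean one, and every rule here had to be discovered and written by hand, which is why this work dominates the workday.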

This problem isn't unsolvable, but the answer isn't what you may think. It's time we got past the legacy construct of a "central repo" for data. Data will never live in one place, so we need to explore better methods of accessing it where it lives, and in its native format. This is a key area of innovation currently underway.
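A minimal sketch of that "query it where it lives" idea, using SQLite as a stand-in for any source system (the file, table, and column names are invented for illustration): instead of copying rows into a central warehouse, the analysis connects to the data's native store and asks its question directly.

```python
import os
import sqlite3
import tempfile

# Pretend this file is an application's existing, native data store.
path = os.path.join(tempfile.mkdtemp(), "app.db")
with sqlite3.connect(path) as con:
    con.execute("CREATE TABLE events (user TEXT, action TEXT)")
    con.executemany("INSERT INTO events VALUES (?, ?)",
                    [("ada", "login"), ("ada", "purchase"), ("bob", "login")])

# Analysis queries the source in place -- no ETL into a central repo.
with sqlite3.connect(path) as con:
    rows = con.execute(
        "SELECT action, COUNT(*) FROM events GROUP BY action ORDER BY action"
    ).fetchall()

print(rows)  # -> [('login', 2), ('purchase', 1)]
```

The point is the shape of the workflow, not the engine: the replication and reformatting steps that eat most of a data scientist's day simply never happen.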

Data will NEVER live in only one place

Freeing data scientists from this less productive daily work will give them drastically more time to focus on what is important and increase overall productivity, while also providing an opportunity for tremendous cost savings.

Whether the salary incentive proves to be enough to bring in the necessary new talent, or whether innovations in technology will have to stand in for the lack of bodies, only time will tell. For now, the problem remains. It is said that the market will meet its own needs, and right now this is its single biggest need.

Gold in pan

One thing is for certain: data is GOLD, and when this problem is solved, companies will be able to tailor their businesses to their customers in an unprecedented way.

Have comments or suggestions? Please share them below!


Sources

https://www.glassdoor.com/blog/highest-paying-entry-level-jobs-19/

https://searchbusinessanalytics.techtarget.com/feature/Demand-for-data-scientists-is-booming-and-will-increase

https://www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource-is-no-longer-oil-but-data

https://www.dataversity.net/the-state-of-data-science-expertise-shortage-creates-a-need-for-innovation/#

https://www.datanami.com/2019/01/30/whats-driving-data-science-hiring-in-2019/

https://taxandbusinessonline.villanova.edu/blog/the-talent-gap-in-data-analytics/

Posted by Eric Konynenbelt

I am a Computer Science student currently studying at South Dakota State University, and I am enthusiastic about artificial intelligence and its impact on the world. I’m an intern at Query.AI where we are introducing the world to Iris, our A.I. Analyst that utilizes natural language processing to make cybersecurity more feasible by lowering the learning curve of data analysis.