2/22/18

We’re in a time when more companies are trending towards being data-driven, but there are still so many points of friction in adoption, not just in my own experience but in that of other leaders in the space as well.

For further affirmation a buddy of mine (a leader at another startup) sent me this article after I was griping to him about some of the issues I have to deal with.

I extracted the following awesome quotes from it, which clear up a lot of misconceptions, and I can attest to them first-hand.

Leading

Wrong Expectations

Doing data science and managing data science are not the same. Just like being an engineer and a product manager are not the same. There is a lot of overlap but overlap does not equal sameness. Sometimes, I envy the data scientist who just has to crunch the numbers. Being a data scientist without management duties is pretty easy. Most of the time it is cleaning data sets, testing algorithms and researching new methods. A pretty comfy job. Now compare that to someone who needs to run the practice.

A leader of data science practice, needs to focus on: data governance, MDM, compliance, legal issues around the use of algorithms created and ensuring documentation just in case someone sues for wrongful use. There are hiring issues, staffing problems to deal with, budget and funding to gain, P&L to run, business cases to build, market research to conduct, vendor meetings to hold, tech life-cycle management to run, evangelizing of projects (both internal and external) and turning the work into data products that are sell-able by the company and all this while trying to ensure a profit for the company. Big difference.

Most companies try to bait in their first Data Scientist as a pure IC and ultimately make their lives hell for the aforementioned reasons. They don’t realize that person is working two roles, especially if there isn’t additional budget for creating a larger team. The allure of having just one person instead of building out a team is simple: cost. But building out the team is one of those things that pays off in the long run.

Workflow

Bad Methods

Agile has taken the tech world by storm. It works fairly well for software development and as a result, many companies enforce it on data science. But data science is not software development, it’s really a field of discovery whereas software development is about assembly. I have worked with companies that demand agile and scrum for data science and then see half their team walk in less than a year. You can’t tell a team they will solve a problem in two sprints. If they don’t have the data or tools it won’t happen.

Data science is a discipline that requires its own methods. Add to that, most companies are still treating data products like they do physical products, the economics are not the same. When I build a recommendation engine, my costs per unit is pretty much zero. Unlike a physical product which has a per unit cost. I can make a million product recommendations with that engine or just one. Other than the electricity, my cost per recommendation is the same. The cost to do the same in the physical world would be a lot more with very different cost variables involved. We have to understand that the economics of data products is different. A lot of large companies don’t even have this conversation. Finance is used to physical products economics which can cause frustration for those running data products as they are having one hand tied behind their back.
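To put rough numbers on that economics point, here is a minimal Python sketch assuming a simple fixed-plus-marginal cost model. Every figure is made up for illustration, and `average_cost_per_unit` is a hypothetical helper of mine, not something from the article.

```python
def average_cost_per_unit(fixed_cost: float, marginal_cost: float, units: int) -> float:
    """Spread the total cost (fixed + marginal * units) over the units produced."""
    if units <= 0:
        raise ValueError("units must be positive")
    return (fixed_cost + marginal_cost * units) / units


# Hypothetical numbers, for illustration only.
# Data product: build the engine once, then pay a sliver of compute per recommendation.
ENGINE = {"fixed_cost": 200_000.0, "marginal_cost": 0.0001}
# Physical product: pay for tooling once, then materials and labor for every unit.
WIDGET = {"fixed_cost": 200_000.0, "marginal_cost": 12.0}

for units in (1, 1_000, 1_000_000):
    engine = average_cost_per_unit(units=units, **ENGINE)
    widget = average_cost_per_unit(units=units, **WIDGET)
    print(f"{units:>9,} units: engine ${engine:,.4f}/unit, widget ${widget:,.4f}/unit")
```

Under these assumed numbers, the engine's average cost keeps falling toward its near-zero marginal cost as volume grows, while the widget's average cost can never drop below its per-unit cost. That is the gap a finance team used to physical-product economics tends to miss.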

By the way, download this awesome book if you’re ever curious about workflow best-practices for a data scientist.

Recruiting

Academia can’t do For-Profit

Many large companies have fallen into the trap that you need a PhD to do data science, you don’t. The top 5 data scientists I have ever worked with, only 1 had a PhD and it wasn’t even in stats or data science, he was a bio physicist. I call this the academia trap because many companies believe they need that stats or data science PhD. There are some smart people who know a lot about a very narrow field, but data science is a very broad discipline. When these PhD’s are put in charge, they quickly find they are out of their league. They were never taught how to run a P&L, manage a team, deal with people, competitive intel, market assessments, building a business case, etc… They were taught numbers and a few tools.

Add to this, the world they come from. Many peer review papers in the academic world that are really good, don’t see the light of day. Why? The reviewers may have a competing theory and don’t want their current established ideas to get superseded. It is shocking how often this happens in the data science space. I always found the academic world more political than the corporate world and when your drive is profits and customer satisfaction, that academic mindset is more of a liability than an asset. Not to mention, I have yet to see a data science program I would personally endorse. It’s run by people who have never done the job of data science outside of a lab. That’s not what you want for your company.

_Most of my friends in Data Science are PhDs. I’ll be the first to say they’re very enjoyable people to be around and collaborate with. But the best performers I’ve seen never had a PhD. Mind you, aside from being an IC and building teams, I’ve spent a lot of time as a T.A. at a bootcamp, where I came across a wide variety of backgrounds: the top performer in each cohort was never a PhD while I was there.

This is one of the painful realities of external perception, as most outsiders tend not to take you seriously as a Data Scientist until they see those advanced credentials._
