Data Science is in a way similar to bread production. Both domains have raw materials as input and in the case of bread production, one has flour, yeast, sugar, oil, and water. All of which are nothing more than organic or chemical compounds which can be described by their chemical composition, which is data. Data Science, on the other hand, has data as its raw material. The data can be of any nature, shape, and form – ranging from scientific measurements, energy prices on a given day, or trending topics in a social network.
The ingredients that make up bread make little sense in isolation (they certainly would not be consumed that way) and the same holds for data. To get the bread we consume, the ingredients must be combined through a specific production process which does nothing more than rearranging and filtering the organic and chemical compounds to produce bread. Data must also be combined, filtered, and enhanced to produce the final product of Data Science – which we call information or insight.
Bread is a product people use to obtain energy, insight is a product people use to answer questions. And just like bread, insight is best enjoyed freshly baked.
Satisfying the appetite. In time.
With the industrial revolution, it is now easy to set up a bread factory. The only thing one really needs is capital (in addition to some delicious recipes and a touch of the secret sauce) as all the machinery needed to transform raw materials to bread can be purchased from several different companies. The delivery of raw material is taken care of by several logistics providers. The packaging of final products can be customized to target whatever consumer need, who in turn gets it cheaply delivered by yet another logistics provider. This whole process is heavily standardized with very few specialized components – making the operation low in risk and easy to scale.
Data Science is currently very non-standardized. Instead of purchasing machines to make a factory that can perform each step of the process, one hires software engineers to build machines. These machines, or programs, take time to build, are expensive, and typically do not handle new recipes very well. As a result, programmers are hired to modify and maintain the machines to suit a particular recipe. If you are in the business of Data Science you also are in the business of building machines.
As the most valuable insight tends to come with the shortest expiry, this lack of standardization and high operational risk is the seminal challenge of Data Science as we know it today.
Giving shape to the Secret Sauce.
Bread produced in a factory comes in shapes recognized by most people. However, in the case of Data Science, where software engineers build machines for a very specific purpose, the factories make bread with shapes that more often than not are unfamiliar. Sure, some produce loaves or bagels, but you get a bunch of factories producing dinosaur- or airplane-shaped bread which consumers find hard to combine with a burger or some jam.
When deciding on which bread to go for, it typically boils down to preference, purpose, and allergies. Ultimately, it is the recipe and secret sauce that will be guiding our decision. In the case of bread, the ingredients are found on the packaging and it is straightforward to decide whether the bread is for us or not. Also, the packaging often reveals hints of the secret sauce as well; like whether the bread is stone-baked or made using pure long-fermented Grade A flour. Some bread even contains gold dust.
Eating bread is considered a safe activity. Inspections and rules set by the Food Safety Authorities ensure that we eat what we think we eat. Unfortunately, this is not the case in Data Science as we know it todaay. Getting to the point of thinking we know what we eat is surprisingly hard, and certainly not something the consumers can figure out on their own. This topic of transparency in Data Science is a rather important one and will be covered in a later post.
Keeping it real. Keeping it fresh. Keeping it safe.
Unfortunately, Data Science today tends to be less like factories and more like data sweatshops. When consumers, compliance, and other interested parties legitimately request the nutrition facts, employees often need to work double shifts. As employees have built the machines, fixing them when they break down quickly becomes troublesome when employees move on. With the analogs of gluten-free and organic trends, like GDPR and MiFID, providing fresh daily bread becomes tricky, especially since transforming often means rebuilding half of the machines.
If you want to go fast, you need to go well.
Qapio is an operating environment for Data Science and as a company, we do not produce information or insight. What we provide is the ecosystem of standardized components and APIs (machinery) that can be wired together (like LEGO®) and specialized using our SDK (toolboxes) – so others can produce insight from whatever raw data is available.
The operating environment (conveyor belt) and marketplace (more on this later) make turning data into products streamlined, much like a bread factory. So that you can keep serving that yummy and good-looking insight.
Data Science is inherently hard. Equipping the right people with the right tools make all the difference.