Modern Data Science
While traditional Data Science has been around for years, it has taken center stage today as "Modern Data Science". Businesses now look to Big Data sources to understand how internal and external factors relate to Revenue and Profitability. Customer and Prospect behaviors and buying patterns are key to the Modern Data Science push. Big Data, or "Rich Data", is the main driver for Modern Data Science and has advanced the use of numerical and statistical techniques that Large Enterprises can leverage.
"The defining characteristic of our times is access to an abundance of data. Effectively, everything that we do is recorded. What we eat, what we watch, how we socialize, what we buy, and what our machines do. It almost appears like the DNA of our lives is recorded."
Philippe Rigollet, Professor with the Department of Mathematics and the Institute for Data, Systems, and Society at MIT
We have recently focused on Modern Data Science, contrasting its use of statistics with our Lean Six Sigma (LSS) practice and working out how to leverage the two bodies of knowledge within our Advanced Business Analytics solution set. Amazon is an outstanding example of a company we believe integrates these areas very successfully in day-to-day business activities.
Below is a graphic that shows Alexicon's view of Modern Data Science including Big Data and the important areas for overall Company success:
We found that Business Management, LSS and Modern Data Science complement each other individually and together as a learning platform. All areas can be coordinated and used efficiently and effectively with large Enterprises today from Strategy to the Store or Shop-floor.
If you are working to meet the next quarterly goal or looking for strategic or future solutions, we can help. Don’t just survive, thrive in the Modern Data Science era and welcome the Digital Age by getting your Company familiar with a cohesive enterprise approach to Advanced Business Analytics. With the current hotbed of rapid technology innovations with Analytics and Modern Data Science, major corporations are challenged with seamlessly integrating these major methodologies for performance improvements.
Business Management, Lean Six Sigma and Modern Data Science (Integrated Enterprise Approach)
Managing a number of analytic software tools with limited in-house expertise can slow analytic advancement and the response to online service demands from company users. A comprehensive analytic strategy is needed to use the best mix of tools, with an overall plan for the entire system that keeps databases as the main focus. Business requests for more Visualizations or Data Sources are ongoing and require new integrations in the enterprise data model or landscape. This is where we believe the cross-methodology experience described on this webpage will help us provide the right guidance, so that you and your Company can clarify current progress, direction and the benefit you expect or envision from spending project dollars on Advanced Business Analytics.
Since our founding, we have heavily used Business Management methodologies in BI systems (18 years). This also includes the use of advanced calculations and some use of statistics or computations around Regression, Prediction and Classification areas. We believe all business areas highlighted in orange below present opportunities today for Modern Data Science for most major corporations.
Most often statistical techniques and methods are handled outside BI systems with external applications (or Excel) and then result data is often piped back to BI through the data warehouse or joined in the frontend BI tool. Many times, data for the external tool is even taken from a BI download in the first place. Much of our work is moving this work to advanced backend data models while preserving the cutting-edge exploration and discovery work done by valuable Analysts using a variety of frontend tools including Minitab, SAS, MS Excel and others.
In-database Computations: Better and Faster
The place for proven computations is in the database (not client desktop tools). Algorithmically efficient computations are important with big data sets, and they must also be meticulously vetted for accuracy. Users expect response times under two seconds from newer databases, versus many seconds or minutes from traditional big reporting systems (detail and summary levels still need to be balanced). The goal is to run computations in the database at very high speed and send only summary or aggregate results to the visualization layer or report. Smaller result sets from the database are easier on the network as well. Fast query results give BI Tool Users quick access to what they want when they need it. User and Analyst time typically runs at a premium. Many BI systems are a learning hub for a company because they foster user exploration and use, so speed is important. Highly used systems provide a great Return on Investment (ROI) across the Income Statement (P&L) and Balance Sheet over many years.
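As a rough illustration of pushing the computation into the database, the sketch below lets the database do the grouping and returns only one summary row per region to the caller. It uses Python's built-in sqlite3 module purely as a stand-in for an enterprise database; the `sales` table, its columns and the sample rows are made up for the example.

```python
import sqlite3


def build_sales_db():
    """Create an in-memory database with a small, made-up sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [
        ("East", 100.0),
        ("East", 250.0),
        ("West", 75.0),
        ("West", 125.0),
        ("West", 300.0),
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def summarize_in_database(conn):
    """Aggregate inside the database; only one row per region is returned."""
    query = """
        SELECT region,
               COUNT(*)    AS orders,
               SUM(amount) AS revenue,
               AVG(amount) AS avg_order
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


conn = build_sales_db()
summary = summarize_in_database(conn)
for region, orders, revenue, avg_order in summary:
    print(region, orders, revenue, round(avg_order, 2))
```

The visualization layer receives two rows here instead of five; with millions of detail rows the same query shape keeps the result set, and the network traffic, small.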
Lean Six Sigma (LSS)
LSS gives us Process-based Statistics for Organizational Management Change and system changes. There is a heavy use of Regression, Prediction, Classification and Hypothesis Testing in LSS. A good example is "Design of Experiments" (DOE) or controlled tests to predict accurate outcomes by introducing changes in settings or parameters to tune the desired outcome. This is different than "Training Data" with Modern Data Science; however, it shares the same concept (tuning outcome accuracy). LSS uses "Control Charts" to monitor variation in processes to identify things that are out-of-control or out-of-the-norm (variation). Larger deployments are where these systems help users spot unwanted variation and deal with it quickly within high data volume and velocity environments (control limits are also used in databases as triggers for alerts and/or automated actions). LSS systems provide clarity around Root Causes for undesired occurrences and how they are resolved. Many companies share quality data at some level with all users in the enterprise to foster organization communication on issues or problems.
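The control-limit idea can be sketched in a few lines. The example below is a minimal individuals-chart calculation, assuming sigma is estimated from the average moving range divided by the standard constant 1.128 (d2 for subgroups of two); the measurements and function names are hypothetical and only illustrate how a point outside the limits could trigger an alert.

```python
from statistics import mean

# Standard d2 constant for a moving range of two consecutive points.
D2_FOR_MOVING_RANGE_OF_2 = 1.128


def control_limits(baseline):
    """Return (lower limit, center line, upper limit) from in-control baseline data."""
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma_hat = mean(moving_ranges) / D2_FOR_MOVING_RANGE_OF_2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(values, lcl, ucl):
    """Return (index, value) pairs that fall outside the control limits."""
    return [(i, v) for i, v in enumerate(values) if v < lcl or v > ucl]


# Made-up baseline measurements from a stable process.
baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 9.7]
lcl, center, ucl = control_limits(baseline)

# New measurements to monitor; the third one is deliberately far off.
new_points = [10.1, 9.9, 11.5, 10.0]
alerts = out_of_control(new_points, lcl, ucl)
print(alerts)
```

In a database deployment the same limits could be stored alongside the process data and checked by a trigger or scheduled query, which is the alerting pattern described above.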
We equally feel all Lean Six Sigma statistical areas below can be used at scale with Big Data sources and with Modern Data Science. We also believe that Modern Data Science will evolve to use many LSS methods and techniques for process-based monitoring and detection as well as many other improvement uses.
LSS fosters company-wide performance with proven process-based improvement techniques for major and smaller corporations. It is a tool box for senior managers looking to improve the way a Company operates through coordinated Change Management. Our focus over years has been to integrate LSS with Business Management methodologies to add and promote measurement method knowledge, user visibility and hands-on “Root Cause” identification and ways to resolve problems.
Modern Data Science
Today, Modern Data Science stands on the shoulders of LSS and traditional Data Science for statistical uses and goes much further with big and rich data. Modern Data Science uses Regression and Prediction, Classification and Hypothesis Testing, and adds Deep Learning and methodologies for “Recommendation Systems”. Some people see these as modern Artificial Intelligence (AI) systems. Early advances were made by Google, whose published work on MapReduce and the Google File System inspired Hadoop, while managing a world-class internet search service at scale. Facebook has been key for advanced Social enablement, data collection and recommendation engines. Our world has advanced rapidly with “Big Data” and Modern Data Science because of these commendable and early efforts. Data Science is an emerging area of study for many college programs.
"Data Scientist" has also become a popular occupation with Harvard Business Review dubbing it "The Sexiest Job of the 21st Century"
"Data Science Process" Diagram (Wikipedia) - Decomposition (orange points)
Below is the process shown in Wikipedia for Data Science. You will notice that steps 1 through 3 are traditional steps for the Extract, Transform and Load (ETL) process for Enterprise Data Warehouses (EDWs). Data Science can include formal ETL or informal end-user ETL. For Big Data sources, Trifacta is used to load Hadoop (HDFS) from multiple sources, including high-velocity streaming data, with fast algorithms running during the transformation (T) process for real-time complex cleaning, integrating and loading. The orange circle numbers and notes were added by Alexicon to demonstrate our understanding and our focus on embedding know-how learned from Data Science activities (4) into the formal ETL process (1, 2 & 3) and/or the database layer (3 & 5) as embedded Models & Algorithms for runtime computations. The EDW will end up holding the "Clean Dataset" table(s) and the Algorithms or Computation code for database use.
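A toy version of this flow is sketched below: a cleaning rule of the kind an analyst might discover during exploration is written into the transform step, so that only the clean dataset is loaded. The record layout, the specific rules and the `clean_sales` table are assumptions for illustration, and sqlite3 stands in for the warehouse.

```python
import sqlite3


def extract():
    """Stand-in for reading raw source records (made-up data)."""
    return [
        {"customer": " Acme ", "amount": "120.50"},
        {"customer": "acme", "amount": "n/a"},
        {"customer": "Globex", "amount": "80"},
    ]


def transform(records):
    """Apply cleaning rules learned during analysis to the raw records."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # Rule: drop rows whose amount is not numeric.
        # Rule: trim whitespace and normalize the customer name's casing.
        clean.append((rec["customer"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Write the clean dataset to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clean_sales (customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO clean_sales VALUES (?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT customer, amount FROM clean_sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping the rules in one transform function means the same logic runs on every load, rather than being reapplied by hand in a desktop tool.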
"From the business perspective, data science is an integral part of competitive intelligence, a newly emerging field that encompasses a number of activities, such as data mining and data analysis".
Advanced use in massive multidimensional spaces is key. These new systems consider many dimensions and numeric values including where you live, likes, dislikes, gender and many other aspects of your publicly available information or “Consumer Data”. There is also an abundance of Worldwide Government and Business Data. Common use areas for data are financial, geographic points/areas, weather and Industrial IoT (machine or sensor data). We see a very strong relationship between Multidimensional BI and High-dimensional Statistics. Both potentially have many dimensions and many numeric values in one related or analyzed schema, built on conformed dimensions or matched data, to relate and contrast different measurable aspects of a business.
Modern Data Science makes newer and valuable capabilities like “Recommendation Engines” possible. We see these Recommendation Engines in play when shopping online, with similar product recommendations or ads displayed while we browse or shop. These engines are able to understand and associate people-to-products or people-to-people because of High-dimensional Statistics. This is the big win! Looking at more than two dimensions gets difficult for people to see and comprehend (too complex for the human mind). We typically use a two-dimensional (2D) space, plane or X,Y Plot to correlate or associate variables. Moving from 2 to 5 dimensions is difficult enough, let alone the tens to hundreds of dimensions and/or numeric values that exist in Enterprise and Big Data environments today.
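To make the people-to-products idea concrete, here is a minimal sketch of one common approach, nearest-neighbor matching with cosine similarity. It is not how any particular retailer's engine works; the shoppers, products and ratings are invented, and each shopper is simply a point in a five-dimensional space (one dimension per product).

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, ratings, products, top_n=1):
    """Suggest products the most similar shopper rated but the target has not."""
    target_vec = ratings[target]
    others = [
        (cosine_similarity(target_vec, vec), name)
        for name, vec in ratings.items()
        if name != target
    ]
    _, nearest = max(others)  # shopper with the highest similarity
    nearest_vec = ratings[nearest]
    candidates = [
        (nearest_vec[i], products[i])
        for i in range(len(products))
        if target_vec[i] == 0 and nearest_vec[i] > 0
    ]
    candidates.sort(reverse=True)  # highest-rated candidates first
    return [product for _, product in candidates[:top_n]]


# Hypothetical ratings: one column per product, 0 means "not rated".
products = ["laptop", "mouse", "keyboard", "blender", "toaster"]
ratings = {
    "ann": [5, 4, 0, 0, 0],
    "bob": [5, 5, 4, 0, 0],
    "cat": [0, 0, 0, 5, 4],
}

suggestions = recommend("ann", ratings, products)
print(suggestions)
```

The same arithmetic works unchanged with hundreds of product dimensions, which is where a person looking at a 2D plot can no longer follow along.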
“Data Science promises to find the needle(s) of information in a haystack of data that could not be done well or at all manually”, Philippe Rigollet, Professor MIT. This is exactly right. We are doing things today that were not possible years back because of High-dimensional Statistics. This is where Modern Data Science and Machine Learning capabilities have taken a quantum leap in recent years around prediction within massive data sets that have High Volume, Variety and Velocity (the 3 Vs). This creates complex and fast-moving or even streaming data environments where Modern Data Science is used as ETL to Transform and Load data on the fly and/or perform real-time actions.
Data that is classified correctly allows users to contrast and compare different aspects of the Enterprise with Advanced Business Analytics capability. We could draw a resemblance between Modern Data Science and Radar when it was originally introduced. Radar was used for military purposes to give our forces long-range visibility even in darkness and fog (humans could not do this without radar). High-dimensional Statistics is much the same: it finds needed or important information in massive multidimensional spaces, relying on the machine to see what is not humanly possible otherwise, just as with radar.
Recommendation Systems have fostered modern advancements with High-dimensional Statistics through Deep Learning and other techniques used by early adopters like Amazon, Netflix, Yelp, Pandora and Tinder for online use and matching (recommending) People-to-products or People-to-people.
Amazon (integrated example)
As mentioned in the opening, Amazon is a fine example of a company that sees the use of Big Data, Modern Data Science and Lean Six Sigma for internal operations, website operations, product recommendations and taking orders. We believe there is much crossover that occurs within a company like Amazon with these bodies of knowledge. They have mastered the management of process and data so well that they offer Amazon Web Services (AWS) so other companies can take advantage of their advanced system infrastructure and architecture to store, compute and analyze data with advanced capabilities.
Alexicon is committed to helping Customers increase enterprise performance capabilities with Advanced Business Analytics using proven Business Management methods, Modern Data Science and Lean Six Sigma. We believe this combination is unique and valuable to all companies when integrated and coordinated (integrated techniques and methods). This is especially true for major corporations with initiatives to increase sales and/or profit while controlling costs.
The time to get involved in Modern Data Science is now
Your Company can start with an "Enterprise Analytics Health Check" and a review of known external and internal data sources or data landscapes to identify where Modern Data Science capability can be used to provide added analytic power for your Enterprise.
Modern Data Science is, in our mind, key to the success of Major Corporations in coming years as competition increases. It is also key to the new opportunities that are coming for American Business: the ability to ramp up operations to meet new demand helps secure the best possible portion of the market before new entrants or competitors take that early and important share.
Big Data Links:
► Big Data Analytics
► Trifacta - Big Data Wrangling (Modern ETL for Hadoop)
► MapD - GPU-powered Database and High-resolution Visualizations (Big Data power with much less hardware)
Contact Us to learn more about Modern Data Science