Introduction

What is Data Science?

Data Science is about storytelling and making sense of numbers.

A good preliminary step in most analyses is to visualize the data and examine it.

With various algorithms, we can extract insights from larger data sets and turn all types of data into actionable insights.

Basically:

  1. Writing code
  2. Creating a dendrogram
  3. Analyzing a heatmap
  4. Finding the optimal number of clusters

Skillset needed

  1. Intro to data and data science
  2. Math
  3. Statistics / Advanced statistics
  4. Python
  5. Machine Learning

Keywords of Data Science

  • Data - The data used for the analysis
  • Algorithm - The tool used for the analysis
  • Insight - The observation

Various Data Science Disciplines

The data science team will use some business analytics or data analytics tools to develop models that could predict future outcomes.

History

Statistician (25 years ago)

Statistics

Responsible for:

  • gathering and cleaning data sets

  • applying statistical methods

    +Growth of data

    +radical improvement of technology

  • extracting patterns from data

Data mining specialist (20 years ago)

Data mining

Responsible for:

  • gathering and cleaning data sets

  • applying statistical methods

    +Growth of data

    +radical improvement of technology

  • extracting patterns from data

    +new models

  • performing more accurate forecasts

Predictive analytics specialist (10 years ago)

Predictive analytics

Responsible for:

  • gathering and cleaning data sets
  • applying statistical methods
  • extracting patterns from data
  • performing more accurate forecasts

Data Scientist (Now)

Data Science

Responsible for:

  • gathering and cleaning data sets
  • applying statistical methods
  • extracting patterns from data
  • performing more accurate forecasts

Difference Between Analysis and Analytics

Analysis

  • Separates a dataset into chunks, studies them individually and examines how they relate to one another
  • Looks at things in the PAST (events that have already happened)
  • Explains how and why

Qualitative Analysis

intuition + analysis (e.g. SWOT)

Quantitative Analysis

formulas + algorithms (e.g. BI)

Analytics

  • Explores the potential FUTURE
  • The application of logical and computational reasoning to the component parts obtained in an analysis
  • Looks for patterns and explores what you can do with them in the future

Qualitative Analytics

intuition + analysis

Quantitative Analytics

formulas + algorithms

Business Analytics, Data Analytics and Data Science

[Figure: how Business Analytics, Data Analytics and Data Science relate. Not a strict representation. Left = Analysis, Right = Analytics.]

The figure is only meant to aid understanding; there is no need to memorize it.

Some terms explained

Data Science

  • A discipline that relies on data availability, whereas business analytics does not rely completely on data
  • can be used to improve the accuracy of predictions based on data extracted from various activities, e.g. those typical for drilling efficiency
  • Tools include: statistical, mathematical, programming, problem-solving and data-management techniques

Digital signal

  • used to represent data in the form of discrete values, which is an example of numeric data
  • data analytics can be applied to a digital signal in order to produce a higher-quality signal

Preliminary data report

  • first step of any data analysis

Artificial intelligence (AI)

  • simulating human knowledge and decision making with computers

Symbolic reasoning

  • based on high-level human-readable representation of problems and logic
    • Speech recognition
    • Image recognition

Connecting the Data Science Disciplines


When and Why - keyword explained

Data

Information stored in a digital format which can then be used as a base for performing analyses and decision making.

Traditional data

  • In the form of tables containing numeric or text values
  • Structured and stored in databases
  • Can be managed from one computer

Big Data

  • Extremely large data
  • can be structured, semi-structured or unstructured
  • 3V of big data
    • Volume - Requires a huge amount of memory space distributed between many computers
    • Variety - Deals not only with numbers and text, but also with images, audio and mobile data
    • Velocity - Results can be computed immediately after the source data has been obtained

Data Science

Business Intelligence (BI)

  • the process of analysing and reporting historical business data
  • aims to explain past events using business data
  • Preliminary step of predictive analytics
  • For making decisions, extracting insights and extracting ideas

Traditional methods

  • Perfect for forecasting future performance with great accuracy
  • Analyses include: linear regression, logistic regression, cluster analysis and time series

Machine Learning (ML)

  • The ability of machines to predict outcomes without being explicitly programmed to
  • creating and implementing algorithms that let machines receive data and use this data to:
    • make predictions
    • analyse patterns
    • give recommendations
  • Cannot be implemented without data


What and Where - keyword explained

Traditional Data - What

Data Gathering

  • Manual - e.g. Surveys
  • Automatic - e.g. Cookies

Data Pre-Processing

  • Incorrect or inconsistent data must be marked as invalid or corrected
  • Class labeling - assigning each data point the correct data type, or arranging data by category
  • Data cleansing - dealing with inconsistent data (e.g. spell check)
  • Dealing with missing values

Case Specific

  • Balancing
  • Shuffling datasets - prevents unwanted patterns and improves predictive performance
  • E.R diagram (Entity-relationship diagram)
  • Relational Schema
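
Balancing and shuffling from the list above could look roughly like this. The sketch assumes that balancing is done by undersampling (dropping extra rows from the larger groups), which is only one possible approach; the function names and the sample data are hypothetical.

```python
import random

def shuffle_rows(rows, seed=0):
    """Return a shuffled copy of rows; the seed is fixed for reproducibility."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    return shuffled

def balance_by_undersampling(rows, label_key, seed=0):
    """Keep the same number of rows per label by dropping extras at random."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[label_key], []).append(row)
    smallest = min(len(g) for g in groups.values())
    balanced = []
    for g in groups.values():
        balanced.extend(rng.sample(g, smallest))
    return balanced

# Hypothetical dataset: 8 "yes" rows and 2 "no" rows.
data = [{"id": i, "label": "yes" if i < 8 else "no"} for i in range(10)]
balanced = shuffle_rows(balance_by_undersampling(data, "label"))
```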

Big Data - What

Case Specific

  • Text data mining - the process of deriving valuable unstructured data from a text
  • Data Masking
    • analyse the information without compromising private details
    • conceals the original data with random and false data
    • conduct analysis
    • keep all confidential information in a secure place
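
A minimal sketch of data masking, assuming the private fields are plain strings and that replacing each character with a random one of the same kind is acceptable. The helper names and the customer record are made up for illustration.

```python
import random
import string

def mask_value(value, rng):
    """Replace each letter or digit with a random one of the same kind."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            out.append(rng.choice(string.ascii_lowercase))
        else:
            out.append(ch)  # keep separators such as '-' or ' '
    return "".join(out)

def mask_records(rows, private_fields, seed=0):
    """Return copies of rows with the private fields concealed."""
    rng = random.Random(seed)
    return [
        {k: mask_value(v, rng) if k in private_fields else v for k, v in row.items()}
        for row in rows
    ]

# Hypothetical customer data: name and phone are private, spend is not.
customers = [{"name": "Jane Doe", "phone": "555-0101", "spend": 120.0}]
masked = mask_records(customers, private_fields={"name", "phone"})
```

The non-private column (`spend`) is left untouched, so the analysis can still be conducted on the masked copy while the original stays in a secure place.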

Business Intelligence - What

Analyze The Data

  • Metrics - aim at gauging business performance or progress
  • KPI (Key Performance Indicators) - a metric that is tightly aligned with your business objectives

Traditional Methods

Linear Regression

a model used for quantifying causal relationships among the different variables included in your analysis

  • Regression line: y = bx, where b is the coefficient (slope) of x
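
For a regression line of the form y = bx (no intercept), the least-squares estimate of b is the sum of x*y divided by the sum of x*x. A small sketch with made-up data points; the function name `fit_slope` is an assumption of this example.

```python
def fit_slope(xs, ys):
    """Least-squares slope b for the no-intercept line y = b * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("need at least one non-zero x")
    return sxy / sxx

# Made-up observations that lie exactly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
b = fit_slope(xs, ys)
```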

Logistic Regression

the values on the vertical axis will be 1s and 0s only
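
One way to picture this: a logistic model turns a linear score into a probability between 0 and 1, which is then cut off at a threshold to give a 0 or a 1. The coefficients `b0` and `b1` below are hypothetical values chosen for illustration, not fitted ones.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(x, b0, b1, threshold=0.5):
    """Return (probability, class) for a one-variable logistic model."""
    p = sigmoid(b0 + b1 * x)
    return p, int(p >= threshold)

# Hypothetical coefficients: the predicted class flips from 0 to 1 at x = 2.
low = predict(1.0, b0=-4.0, b1=2.0)
high = predict(3.0, b0=-4.0, b1=2.0)
```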

Cluster Analysis

  • Grouping observations together
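
As an illustration of grouping observations, here is a bare-bones k-means on one-dimensional data. It is a sketch under simplifying assumptions: the starting centers are supplied by hand and the number of iterations is fixed.

```python
def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Made-up observations forming two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(values, centers=[0.0, 5.0])
```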

Time Series

Used to visualize whether something has performed well or not by plotting values against time.

Machine Learning - What

Data -> Model -> Objective Function -> Optimization Algorithm
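
The four ingredients can be sketched with a toy example. The assumptions here are mine: the model is a line y = bx, the objective function is the mean squared error, and the optimization algorithm is plain gradient descent with a hand-picked learning rate.

```python
# Data: made-up inputs and targets that follow y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def model(x, b):
    """Model: a line through the origin with slope b."""
    return b * x

def objective(b):
    """Objective function: mean squared error of the model on the data."""
    return sum((model(x, b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(b):
    """Derivative of the objective with respect to b."""
    return sum(2 * (model(x, b) - y) * x for x, y in zip(xs, ys)) / len(xs)

def optimize(b=0.0, learning_rate=0.01, steps=500):
    """Optimization algorithm: gradient descent on the objective."""
    for _ in range(steps):
        b -= learning_rate * gradient(b)
    return b

b = optimize()
```

Each step nudges b in the direction that lowers the error, so b ends up close to 3, the slope used to generate the data.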


Types of Machine Learning

Supervised Learning - training an algorithm resembles a teacher supervising her students (works with labelled data)

  • SVMs (support vector machines)
  • NNs (Neural Networks)
  • Deep Learning

Unsupervised Learning - no targets and no supervision, which saves time (works with unlabelled data)

  • K-means
  • Deep Learning

Reinforcement Learning - with a reward system to maximize the objective function

  • Deep Learning


How - keyword explained

Programming Languages

Python and R

  • suitable for mathematical and statistical computations
  • adaptable
  • not able to address problems specific to some domains, e.g. relational database management systems

SQL

  • working with relational database management systems
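
A short sketch of SQL in action, run from Python through the standard-library `sqlite3` module with an in-memory database. The `sales` table and its rows are invented for this example.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 250.0), ("north", 50.0)],
)

# Aggregate the amounts per region with a GROUP BY query.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```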

MATLAB

  • working with mathematical functions or matrix manipulations
  • it is paid software

Java and Scala

  • Useful when combining data from multiple sources (Big Data)

Software / Software Framework

Excel

  • able to do relatively complex computations and good visualizations quickly

SPSS

  • famous for working with traditional data and applying statistical analysis

MongoDB and Hadoop

  • designed for working with big data

Power BI, SAS, Qlik and Tableau

  • designed for business intelligence visualizations; can also be used for predictive analytics

EViews

  • working with econometric time-series models

Stata

  • for academic statistical and econometric research


Careers in Data Science

Data - Careers

  • Data Architect
    • designs the way data will be retrieved, processed and consumed
  • Data Engineer
    • processes the obtained data so that it is ready for analysis
  • Database Administrator
    • handles the control of data and works with traditional data

Business Intelligence - Careers

  • BI Analyst
    • performs analyses and reporting of historical data
  • BI Consultant
    • external BI analyst
  • BI developer
    • performs analyses specifically designed for the company with Python and SQL

Machine Learning - Careers

  • Data Scientist
    • employs traditional statistical methods or unconventional machine learning techniques for making predictions
  • Data Analyst
    • prepares more advanced types of analyses
  • ML Engineer
    • applies state-of-the-art computational models

Reference

The Data Science Course 2020: Complete Data Science Bootcamp