Data Science - Introduction
Introduction
What is Data Science?
Data Science is about storytelling and making sense of numbers.
A good preliminary step in most analyses is to visualize the data and examine it.
With various algorithms, we can turn larger data sets of all types into actionable insights.
Basically:
- Writing code
- Creating a dendrogram
- Analyzing a heatmap
- Finding the optimal number of clusters
Skillset needed
- Intro to data and data science
- Math
- Statistics / Advanced statistics
- Python
- Machine Learning
Keywords of Data Science
- Data - The data used for the analysis
- Algorithm - The tool used for the analysis
- Insight - The observation
Various Data Science Disciplines
The data science team will use some business analytics or data analytics tools to develop models that could predict future outcomes.
History
Statistician (25 years ago)
Statistics
Responsible for:
- gathering and cleaning data sets
- applying statistical methods
+ Growth of data
+ Radical improvement of technology
- extracting patterns from data
Data mining specialist (20 years ago)
Data mining
Responsible for:
- gathering and cleaning data sets
- applying statistical methods
+ Growth of data
+ Radical improvement of technology
- extracting patterns from data
+ New models
- performing more accurate forecasts
Predictive analytics specialist (10 years ago)
Predictive analytics
Responsible for:
- gathering and cleaning data sets
- applying statistical methods
- extracting patterns from data
- performing more accurate forecasts
Data Scientist (Now)
Data Science
Responsible for:
- gathering and cleaning data sets
- applying statistical methods
- extracting patterns from data
- performing more accurate forecasts
Difference Between Analysis and Analytics
Analysis
- Separates a dataset into chunks, studies them individually, and examines how they relate to the other parts
- Looks at things in the PAST (events that have already happened)
- Explains How and Why
Qualitative Analysis
intuition + analysis (e.g. SWOT)
Quantitative Analysis
formulas + algorithms (e.g. BI)
Analytics
- Explores potential FUTURE
- The application of logical and computational reasoning to the component parts obtained in an analysis
- Looks for patterns and explores what you can do with them in the future
Qualitative Analytics
intuition + analysis
Quantitative Analytics
formulas + algorithms
Business Analytics, Data Analytics and Data Science
Left = Analysis, Right = Analytics
The graph above is just there to help you understand the relationship better. There is no need to memorize it.
Some terms explained
Data Science
- A discipline reliant on data availability, while business analytics does not completely rely on data
- Can be used to improve the accuracy of predictions based on data extracted from various activities, for example those that determine drilling efficiency
- Tools include: statistical, mathematical, programming, problem-solving, and data-management techniques
Digital signal
- Used to represent data as discrete values, which is an example of numeric data
- Data analytics can be applied to a digital signal in order to produce a higher-quality signal
Preliminary data report
- first step of any data analysis
Artificial intelligence (AI)
- simulating human knowledge and decision making with computers
Symbolic reasoning
- based on high-level human-readable representation of problems and logic
- Speech recognition
- Image recognition
Connecting the Data Science Disciplines
When and Why - keyword explained
Data
Information stored in a digital format which can then be used as a base for performing analyses and decision making.
Traditional data
- In the form of tables containing numeric or text values
- Structured and stored in databases
- Can be managed from one computer
Big Data
- Extremely large data
- can be structured, semi-structured or unstructured
- 3V of big data
- Volume - Requires a huge amount of memory space distributed between many computers
- Variety - Deals not only with numbers and text, but also with images, audio, and mobile data
- Velocity - Results can be computed almost immediately after the source data has been obtained
Data Science
Business Intelligence (BI)
- the process of analysing and reporting historical business data
- aims to explain past events using business data
- Preliminary step of predictive analytics
- For making decisions, extracting insights, extracting ideas
Traditional methods
- Well suited to forecasting future performance with good accuracy
- Analyses include: linear regression, logistic regression, cluster analysis, and time series
Machine Learning (ML)
- The ability of machines to predict outcomes without being explicitly programmed to
- creating and implementing algorithms that let machines receive data and use this data to:
- make predictions
- analyse patterns
- Give recommendations
- Cannot be implemented without data
What and Where - keyword explained
Traditional Data - What
Data Gathering
- Manual - e.g. Surveys
- Automatic - e.g. Cookies
Data Pre-Processing
- Fixes problems in the gathered data: invalid entries must be marked as invalid or corrected
- Class labelling - labelling each data point with the correct data type, or arranging data by category
- Data cleansing - dealing with inconsistent data (e.g. spell check)
- Dealing with missing values
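The pre-processing steps above can be sketched in a few lines of Python using only the standard library. The records, the field names, and the rule that an age outside 0 to 120 counts as invalid are all made up for illustration. Filling missing values with the mean is just one common option, not the only way to handle them.

```python
from statistics import mean

# Hypothetical raw survey records; None marks a missing value.
raw_records = [
    {"name": "alice", "age": 34, "city": "new york"},
    {"name": "Bob", "age": None, "city": "New York"},
    {"name": "carol", "age": -5, "city": "Boston"},
]

def clean(records):
    """Return cleaned copies of the records."""
    # Assumed rule: ages outside 0..120 are invalid and treated like missing values.
    valid_ages = [r["age"] for r in records
                  if r["age"] is not None and 0 <= r["age"] <= 120]
    fill_value = mean(valid_ages)  # replace missing/invalid ages with the mean
    cleaned = []
    for r in records:
        age = r["age"]
        if age is None or not 0 <= age <= 120:
            age = fill_value
        cleaned.append({
            "name": r["name"].title(),  # data cleansing: consistent capitalisation
            "age": age,
            "city": r["city"].title(),
        })
    return cleaned

cleaned_records = clean(raw_records)
```

After cleaning, the two spellings of the city are identical, and the missing and invalid ages have been replaced.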
Case Specific
- Balancing
- Shuffling datasets - prevents unwanted patterns and improves predictive performance
- ER diagram (Entity-relationship diagram)
- Relational Schema
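Balancing and shuffling from the list above could look roughly like this. It is a sketch under assumptions: the dataset is invented, and balancing is done by undersampling every class to the size of the smallest one, which is only one of several possible strategies.

```python
import random

# Hypothetical labelled dataset: (feature, label) pairs, imbalanced 8:2.
dataset = [(i, "no") for i in range(8)] + [(i, "yes") for i in range(8, 10)]

def balance_and_shuffle(rows, seed=0):
    """Undersample every class to the size of the smallest one, then shuffle."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # remove any ordering by class
    return balanced

balanced = balance_and_shuffle(dataset)
```

The result contains the same number of "yes" and "no" rows, in random order.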
Big Data - What
Case Specific
- Text data mining - the process of deriving valuable unstructured data from a text
- Data Masking
- analyse the information without compromising private details
- conceals the original data with random and false data
- conduct analysis
- keep all confidential information in a secure place
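A minimal sketch of data masking, assuming made-up customer records and a hypothetical helper named mask. Private fields are overwritten with random characters, while the numeric field needed for the analysis is left untouched.

```python
import random
import string

# Hypothetical customer records containing private details.
customers = [
    {"name": "Alice Smith", "email": "alice@example.com", "spend": 120.0},
    {"name": "Bob Jones", "email": "bob@example.com", "spend": 80.0},
]

def mask(records, private_fields=("name", "email"), seed=0):
    """Replace private fields with random characters; keep other fields intact."""
    rng = random.Random(seed)
    masked = []
    for rec in records:
        copy = dict(rec)  # never modify the original, confidential record
        for field in private_fields:
            copy[field] = "".join(
                rng.choice(string.ascii_lowercase) for _ in range(8)
            )
        masked.append(copy)
    return masked

masked_customers = mask(customers)

# The analysis still works on the masked copy.
average_spend = sum(r["spend"] for r in masked_customers) / len(masked_customers)
```

The original records stay where they are (ideally in a secure place), and only the masked copy is handed over for analysis.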
Business Intelligence - What
Analyze The Data
- Metrics - aim at gauging business performance or progress
- KPI (Key Performance Indicators) - a metric that is tightly aligned with your business objectives
Traditional Methods
Linear Regression
a model used for quantifying causal relationships among the different variables included in your analysis
- Regression Line where b is co
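As an illustration, a regression line can be fitted by ordinary least squares. The sketch below assumes the common form y = b0 + b1 * x. The symbols b0 and b1, the function name fit_line, and the data are illustrative choices, not taken from the course.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x.
hours = [1, 2, 3, 4]
scores = [3, 5, 7, 9]
intercept, slope = fit_line(hours, scores)
```

Because the sample points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2.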
Logistic Regression
the values on the vertical axis will be 1s and 0s only
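A small sketch of how a logistic model produces 0/1 outcomes: the logistic (sigmoid) function turns a linear score into a probability, and a threshold turns the probability into a class. The coefficients below are hand-picked for illustration, not fitted to any data, and the 0.5 threshold is an assumption.

```python
import math

def predict_probability(x, b0, b1):
    """Logistic (sigmoid) function: maps b0 + b1 * x to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def predict_class(x, b0, b1, threshold=0.5):
    """Turn the probability into a 0/1 outcome."""
    return 1 if predict_probability(x, b0, b1) >= threshold else 0

# Hypothetical, hand-picked coefficients (not fitted to real data).
b0, b1 = -4.0, 1.0
outcomes = [predict_class(x, b0, b1) for x in [1, 3, 5, 7]]
```

With these coefficients the decision boundary sits at x = 4, so inputs below it map to 0 and inputs above it map to 1.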
Cluster Analysis
- Grouping observations together
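One way to group observations is k-means, which these notes list under unsupervised learning. The sketch below is a deliberately tiny one-dimensional version with made-up data and hand-chosen starting centroids. Real projects would normally rely on a library implementation.

```python
def k_means_1d(values, centroids, iterations=10):
    """Very small k-means for 1-D data; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = sum(members) / len(members)
    return centroids, clusters

# Made-up observations forming two obvious groups.
observations = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centres, groups = k_means_1d(observations, centroids=[0.0, 5.0])
```

The low values end up in one cluster and the high values in the other, with each centroid at the mean of its group.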
Time Series
Shows whether something performed well or not by plotting values against time.
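Plotting requires a charting library, so this sketch only prepares the numbers. It computes a moving average, one simple way to smooth a series and see its trend before plotting it against time. The sales figures and the window size are invented for illustration.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Made-up monthly sales figures, in time order.
monthly_sales = [10, 12, 14, 13, 17, 20]
trend = moving_average(monthly_sales, window=3)
```

The smoothed series rises steadily even though the raw figures dip in the fourth month.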
Machine Learning - What
Data -> Model -> Objective Function -> Optimization Algorithm
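The four building blocks can be seen in a very small example. This is a sketch under assumptions: the data is made up, the model is a line through the origin with a single weight w, the objective function is mean squared error, and the optimization algorithm is plain gradient descent with a hand-picked learning rate.

```python
# Data: made-up (x, y) pairs that follow y = 2x exactly.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

def model(x, w):
    """Model: a line through the origin with one weight."""
    return w * x

def objective(w):
    """Objective function: mean squared error over the data."""
    return sum((model(x, w) - y) ** 2 for x, y in data) / len(data)

def gradient(w):
    """Derivative of the objective with respect to w."""
    return sum(2 * (model(x, w) - y) * x for x, y in data) / len(data)

# Optimization algorithm: plain gradient descent.
w = 0.0
learning_rate = 0.05
for _ in range(200):
    w -= learning_rate * gradient(w)
```

Starting from w = 0, the loop moves w towards 2, the value that minimizes the objective for this data.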
To Better Illustrate:
Types of Machine Learning
Supervised Learning - training an algorithm resembles a teacher supervising her students (work with labelled data)
- SVMs (support vector machines)
- NNs (Neural Networks)
- Deep Learning
Unsupervised Learning - no targets, no supervising (save time) (work with unlabelled data)
- K-means
- Deep Learning
Reinforcement Learning - with a reward system to maximize the objective function
- Deep Learning
How - keyword explained
Programming Languages
Python and R
- suitable for mathematical and statistical computations
- adaptable
- not able to address problems specific to some domains, e.g. relational database management systems
SQL
- working with relational database management systems
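To give a feel for SQL, the sketch below sends a few statements to an in-memory SQLite database through Python's standard-library sqlite3 module. The table name, column names, and figures are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 100.0), ("south", 50.0), ("north", 25.0)],
)

# Aggregate with SQL: total sales per region.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

The query returns one row per region with its total, which is the kind of grouping and aggregation SQL is designed for.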
MATLAB
- working with mathematical functions or matrix manipulations
- it is paid software
Java and Scala
- Useful when combining data from multiple sources (Big Data)
Software / Software Framework
Excel
- able to do relatively complex computations and good visualizations quickly
SPSS
- famous for working with traditional data and applying statistical analysis
MongoDB and Hadoop
- designed for working with big data
Power BI, SAS, Qlik and Tableau
- designed for business intelligence visualizations, including those used in predictive analytics
EViews
- working with econometric time-series models
Stata
- for academic statistical and econometric research
Careers in Data Science
Data - Careers
- Data Architect
- designs the way data will be retrieved, processed and consumed
- Data Engineer
- processes the obtained data so that it is ready for analysis
- Database Administrator
- handles the control of data and works mainly with traditional data
Business Intelligence - Careers
- BI Analyst
- performs analyses and reporting of historical data
- BI Consultant
- external BI analyst
- BI developer
- performs analyses specifically designed for the company with Python and SQL
Machine Learning - Careers
- Data Scientist
- employs traditional statistical methods or unconventional machine learning techniques for making predictions
- Data Analyst
- prepares more advanced types of analyses
- ML Engineer
- applies state-of-the-art computational models
Reference
The Data Science Course 2020: Complete Data Science Bootcamp