Data Science Beginner’s Guide.

Harun Mwenda Mbaabu
Jan 21, 2021

Data Science is the field of study that extracts insights from large amounts of data using scientific methods, algorithms, and processes. It helps you discover hidden patterns in raw data. The term Data Science emerged from the evolution of mathematical statistics, data analysis, and big data.

Data Science is an interdisciplinary field that allows you to extract knowledge from structured or unstructured data. It enables you to translate a business problem into a research project and then translate the findings back into a practical solution.

Components of Data Science.

1). Statistics:

Statistics is the most fundamental component of Data Science. It is the science of collecting and analyzing large quantities of numerical data to get useful insights.
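As a small illustration of the kind of summary this refers to, here is a minimal sketch using Python's standard `statistics` module. The sales figures and variable names are invented for the example.

```python
import statistics

# Hypothetical sample: daily sales figures (invented for illustration).
daily_sales = [120, 135, 128, 150, 142, 138, 500]

mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value, robust to outliers
stdev_sales = statistics.stdev(daily_sales)    # sample standard deviation

print(f"mean={mean_sales:.1f} median={median_sales} stdev={stdev_sales:.1f}")
```

The single large value (500) pulls the mean well above the median, which is the sort of insight a quick statistical summary can surface.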

2). Visualization:

Visualization techniques help you present huge amounts of data as easy-to-understand, digestible visuals.
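To make the idea concrete, here is a rough sketch that turns a list of raw values into a text bar chart. Real projects usually rely on a plotting library; plain text is used here only to keep the example self-contained. The survey responses and the `text_bar_chart` helper are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (invented for illustration).
responses = ["Python", "R", "Python", "SQL", "Python", "R", "SQL", "Python"]

def text_bar_chart(values):
    """Return one 'label | #### count' line per distinct value, most common first."""
    counts = Counter(values)
    width = max(len(label) for label in counts)
    lines = []
    for label, count in counts.most_common():
        lines.append(f"{label.ljust(width)} | {'#' * count} {count}")
    return lines

chart = text_bar_chart(responses)
print("\n".join(chart))
```

Even in this crude form, the relative bar lengths are easier to read at a glance than the raw list.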

3). Machine Learning:

Machine Learning explores the building and study of algorithms that learn from data to make predictions about unseen or future data.
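One of the simplest examples of learning from data to predict an unseen value is fitting a straight line by ordinary least squares. The sketch below assumes made-up data (hours studied against exam score), and the `fit_line` helper is hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training data: hours studied vs. exam score (invented).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)

# Predict the score for an input the model has not seen before.
predicted = slope * 6 + intercept
print(predicted)
```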

4). Deep Learning:

Deep Learning is a newer area of machine learning research in which multi-layer neural networks learn the features to use from the data itself, instead of relying on features chosen by hand.
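Deep learning models are typically neural networks built from stacked layers. The sketch below shows only that layered structure: a two-layer network computing XOR. Its weights are set by hand for illustration, whereas a real network would learn them from data. The function names are hypothetical.

```python
def relu(value):
    """Rectified linear unit: pass positives through, clamp negatives to 0."""
    return max(0.0, value)

def tiny_network(x1, x2):
    """Two-layer network with hand-set (not learned) weights that computes XOR."""
    # Hidden layer: two units, each a weighted sum followed by ReLU.
    h1 = relu(1.0 * x1 + 1.0 * x2)
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)
    # Output layer: a weighted sum of the hidden units.
    return 1.0 * h1 - 2.0 * h2

outputs = {(a, b): tiny_network(a, b) for a in (0, 1) for b in (0, 1)}
print(outputs)
```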

Data Science Process

1. Discovery:

The discovery step involves acquiring data from all the identified internal and external sources that can help you answer the business question.

The data can be:

  • Logs from webservers
  • Data gathered from social media
  • Census datasets
  • Data streamed from online sources using APIs
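Taking the first source in the list as an example, here is a minimal sketch that parses one web server log line. It assumes the log follows the Common Log Format. The sample line and the `parse_log_line` helper are invented for illustration.

```python
import re

# Pattern for a Common Log Format line: host, timestamp, request, status, size.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record

# Hypothetical sample line (invented for illustration).
sample = '192.0.2.10 - - [21/Jan/2021:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1043'
record = parse_log_line(sample)
print(record)
```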

2. Preparation:

Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned. You need to process, explore, and condition the data before modeling. The cleaner your data, the better your predictions.
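The three problems named above (a missing value, a blank column, an incorrect format) can be sketched on a tiny made-up dataset using only the standard library. Filling gaps with the median is one possible choice among several, not a recommendation from the original text. All names and values are hypothetical.

```python
import csv
import io
import statistics

# Hypothetical raw data with a missing value, a blank column, and a bad format.
raw_csv = """name,age,notes
Amina,34,
Brian,,
Chege,forty,
Dalia,29,
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Keep only columns that contain a value in at least one row.
columns = [c for c in rows[0] if any(r[c].strip() for r in rows)]

def to_int_or_none(text):
    """Convert text to int; return None when it is missing or malformed."""
    try:
        return int(text)
    except ValueError:
        return None

ages = [to_int_or_none(r["age"]) for r in rows]

# Fill missing or malformed ages with the median of the valid ones.
median_age = statistics.median(a for a in ages if a is not None)
cleaned = [
    {"name": r["name"], "age": a if a is not None else median_age}
    for r, a in zip(rows, ages)
]
print(columns, cleaned)
```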

3. Model Planning:

In this stage, you determine the methods and techniques for drawing out the relationships between input variables. Model planning is performed using different statistical formulas and visualization tools. SQL Analysis Services, R, and Python are some of the tools used for this purpose.
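One statistical formula often used at this stage is the Pearson correlation coefficient, which measures how strongly two variables move together. The sketch below is a minimal version under assumed data: the variables `ad_spend`, `visits`, and `returns` are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical variables (invented for illustration).
ad_spend = [10, 20, 30, 40, 50]
visits = [110, 205, 290, 410, 495]
returns = [9, 3, 7, 2, 8]

r_visits = pearson(ad_spend, visits)    # close to 1: strong linear relation
r_returns = pearson(ad_spend, returns)  # close to 0: little linear relation
print(round(r_visits, 3), round(r_returns, 3))
```

A value near 1 or -1 suggests a variable worth carrying into the model, while a value near 0 suggests little linear relationship.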

4. Model Building:

In this step, the actual model building process starts. Here, the data scientist splits the dataset into training and testing sets. Techniques like association, classification, and clustering are applied to the training dataset. Once prepared, the model is tested against the “testing” dataset.
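The split-then-test workflow described above can be sketched with a very simple classifier. This example assumes a 70/30 split and a 1-nearest-neighbour rule, both chosen only for illustration. The data points and labels are invented.

```python
import random

# Hypothetical labelled points: (feature_1, feature_2, label). Invented data.
data = [(1.0, 1.2, "A"), (0.8, 1.0, "A"), (1.2, 0.9, "A"), (0.9, 1.1, "A"),
        (1.1, 1.0, "A"), (5.0, 5.2, "B"), (4.8, 5.1, "B"), (5.2, 4.9, "B"),
        (4.9, 5.0, "B"), (5.1, 5.3, "B")]

rng = random.Random(42)  # fixed seed so the split is reproducible
shuffled = data[:]
rng.shuffle(shuffled)

# 70% of the rows go to training, the rest are held out for testing.
split = len(shuffled) * 7 // 10
train, test = shuffled[:split], shuffled[split:]

def predict(point, training_rows):
    """Classify a point with the label of its nearest training row (1-NN)."""
    def sq_dist(row):
        return (row[0] - point[0]) ** 2 + (row[1] - point[1]) ** 2
    return min(training_rows, key=sq_dist)[2]

# Evaluate only on rows the model never saw during training.
correct = sum(predict(row, train) == row[2] for row in test)
accuracy = correct / len(test)
print(accuracy)
```

Holding the test rows out of training is what makes the accuracy figure an honest estimate of how the model behaves on new data.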

5. Operationalize:

In this stage, you deliver the final baselined model with reports, code, and technical documents. The model is deployed into a real-time production environment after thorough testing.
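One small part of delivering a model is saving it in a form that another system can load. The sketch below assumes the model is just a set of parameters that fit in a JSON file. The parameter values, the file name, and the `predict` helper are hypothetical, and real deployments usually involve much more than this.

```python
import json
import os
import tempfile

# Hypothetical fitted model: parameters of a line y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 50.0, "version": "baseline-1"}

def predict(params, x):
    """Apply the stored linear model to a single input value."""
    return params["slope"] * x + params["intercept"]

# Persist the model so a separate production service could load it later.
path = os.path.join(tempfile.mkdtemp(), "model.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(model, f)

# Later, in the serving environment: load the parameters and predict.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

prediction = predict(loaded, 6)
print(prediction)
```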

6. Communicate Results:

In this stage, the key findings are communicated to all stakeholders. This helps you decide whether the project is a success or a failure, based on the results from the model.

Data science offers numerous job roles and opportunities. Below is a list of common data science job roles:

  • Data Scientist
  • Data Engineer
  • Data Analyst
  • Statistician
  • Data Architect
  • Data Admin
  • Business Analyst
  • Data/Analytics Manager

Thank you for reading. If this was helpful, remember to check Data Science East Africa on Twitter and LinkedIn for updates, more insights, and free boot camp opportunities.

Data Science East Africa : https://twitter.com/DataScience_Ea

Harun Mwenda Mbaabu

Software Engineer || Data Scientist || Building Data Science East Africa && Lux Tech Academy