Preface

Data science is transforming the way we solve problems, make decisions, and uncover insights from data. From generative AI chatbots—such as ChatGPT, DeepSeek, and Gemini—to personalized recommendations on Netflix and fraud detection in banking, data-driven techniques are revolutionizing industries. Whether you are new to data science or an experienced professional, Data Science Foundations and Machine Learning Using R provides an accessible, hands-on introduction to this exciting field—no prior analytics or programming experience required.

This book is designed for those who want to understand data science and machine learning, as well as develop data-driven solutions for real-world applications. It provides a structured and accessible introduction to the fundamental concepts, making it suitable for beginners while also equipping readers with the skills to apply machine learning algorithms effectively. Whether you are a student, a business professional, or a researcher, this book offers a practical and intuitive approach to learning data science using R. Having taught these topics myself, I have found this hands-on approach to be highly effective and well-received by students.

Unlike many theoretical texts, this book emphasizes a learning-by-doing approach. Each topic is introduced through step-by-step explanations, practical examples, and exercises—some worked out manually to build intuition, while others involve coding implementations in R. Readers will not only develop a solid theoretical foundation in machine learning and intelligent systems but also gain experience with widely used tools and techniques for applying these concepts in practice.

The remainder of this preface outlines the book’s structure, how it can be used in different learning environments, and the foundational knowledge needed to get the most out of it.

Why This Book?

Data science is a rapidly evolving field that integrates machine learning, statistical techniques, and computational tools to extract meaningful insights from data. This book provides a structured introduction to data analysis and machine learning using R, a powerful open-source programming language widely adopted for statistical computing, visualization, and predictive modeling.

Unlike many textbooks that assume prior programming experience, this book is designed to be intuitive and hands-on, engaging readers through real-world applications to accelerate the learning curve. Concepts are introduced step by step and immediately applied to real datasets, reinforcing both theoretical foundations and practical skills. Readers will not only learn key data science techniques but also gain hands-on experience in solving analytical problems.

Compared to proprietary software like SAS or SPSS, R offers a free, flexible, and highly extensible environment for data science. With its vast ecosystem of packages, R is a preferred choice for data mining, machine learning, and exploratory analysis across academia, industry, and research.

Who Should Read This Book?

This book is designed for anyone interested in learning data science and machine learning, particularly those new to the field. It is suitable for:

Business professionals seeking to leverage data for better decision-making,
Students and researchers applying data analysis in their studies or projects,
Beginners with no prior experience in programming or analytics,
Anyone interested in exploring data science and machine learning using R.

Whether used for self-study, in a classroom setting, or as a reference for professionals, this book provides a practical and structured approach to applying data science techniques in real-world scenarios.

What You Will Learn

This book provides a structured, hands-on introduction to data science and machine learning using R. Through practical examples and real-world datasets, you will:

Gain proficiency in R programming – Master the core syntax, data structures, and essential functions for data analysis.
Implement the data science workflow – Apply a structured approach to data cleaning, transformation, and exploration.
Uncover patterns through exploratory data analysis (EDA) – Use statistical and graphical techniques to identify insights.
Build predictive models – Develop machine learning models for classification, regression, and clustering.
Assess and refine models – Evaluate performance using key metrics and optimize algorithms for better accuracy.
Apply data science techniques in real-world scenarios – Work with datasets from marketing, finance, and other domains to solve practical problems.

By the end of this book, you will have a strong foundation in data science and machine learning, equipped with the skills to analyze complex datasets and develop effective data-driven solutions.

How This Book Is Structured

This book is designed as a hands-on guide, progressing from foundational concepts to more advanced machine learning techniques. Each chapter builds upon previous topics, ensuring a logical learning experience and reinforcing practical skills.

To illustrate concepts effectively, we use real-world datasets (see Table 0.1), which are available in the liver package. These datasets make it easy for readers to follow along with examples and exercises. Below is an overview of the book’s chapters:

Chapter 1 – Introduction to R, including installation and essential functions.
Chapter 2 – The foundations of data science and its methodology.
Chapter 3 – Techniques for cleaning and transforming data.
Chapter 4 – Exploratory Data Analysis (EDA) using visualization and summary statistics.
Chapter 5 – Fundamentals of statistical analysis and hypothesis testing.
Chapter 6 – Overview of machine learning models.
Chapter 7 – The k-Nearest Neighbors (k-NN) algorithm.
Chapter 8 – Model evaluation techniques and performance metrics.
Chapter 9 – The Naïve Bayes classifier for probabilistic modeling.
Chapter 10 – Linear regression for predictive modeling.
Chapter 11 – Decision trees and ensemble methods like Random Forests.
Chapter 12 – Introduction to neural networks.
Chapter 13 – Clustering techniques such as k-means.

Each chapter concludes with practical exercises that reinforce key concepts and provide hands-on experience with real-world datasets. These exercises are designed to help readers apply what they’ve learned, deepening their understanding of data science and machine learning.

How to Use This Book

This book is designed for self-study, classroom instruction, and professional learning. Readers can follow the chapters sequentially for a structured learning experience or refer to specific topics as needed.

To maximize learning and practical engagement:

Run the Code Examples – All examples are designed for interactive execution in R to reinforce key concepts.
Complete the Exercises – Practical exercises at the end of each chapter strengthen understanding and problem-solving skills.
Modify and Experiment – Tweaking code and testing variations enhance comprehension and adaptability.
Use It as a Reference – Once familiar with the fundamentals, this book serves as a valuable resource for real-world applications.

This book has been successfully used in data science courses at the University of Amsterdam and is well-suited for similar academic programs and professional training initiatives.

Datasets Used in This Book

This book incorporates real-world datasets to illustrate key data science and machine learning concepts. Table 0.1 provides an overview of the datasets used throughout the book, all of which are included in the liver package.

Table 0.1: List of datasets used for case studies in different chapters. Available in the R package liver.
Name	Description	Chapter
churn	Customer churn dataset.	Chapters 4, 6, 7, 8, 10
bank	Direct marketing data from a Portuguese bank.	Chapters 7, 12
adult	US Census data for income prediction.	Chapters 3, 11
risk	Credit risk dataset.	Chapter 9
marketing	Marketing campaign performance data.	Chapter 10
house	House price prediction dataset.	Chapter 10
diamonds	Diamond pricing dataset.	Chapter 3
cereal	Nutritional information for 77 breakfast cereals.	Chapter 13

These datasets were selected to provide hands-on exposure to real-world problems across domains such as marketing, finance, and predictive modeling. By working with these datasets, readers will develop the practical skills necessary to apply data science techniques effectively in professional settings.

The datasets listed in Table 0.1 are primarily used in case studies throughout the book. Additionally, the liver package contains over 15 datasets, providing readers with further opportunities for exploration and hands-on practice. While the main text focuses on a subset of these datasets, additional datasets are integrated into the exercises at the end of each chapter, allowing readers to extend their learning beyond the core examples presented in the book.

Using This Book for Teaching

This book is well-suited for introductory data science and machine learning courses or as a supplementary resource for analytics training. Its structured approach, hands-on exercises, and real-world case studies make it an effective learning tool for students at various levels.

To support structured learning, the book includes over 500 exercises, categorized into three levels:

Conceptual Exercises – Reinforce theoretical understanding and fundamental principles.
Applied Exercises – Require hands-on data analysis using real-world datasets.
Advanced Exercises – Explore complex applications and machine learning techniques, encouraging deeper engagement with the material.

Instructors using this book will have access to supplementary teaching materials, including an instructor’s manual, slides, and test banks. These resources provide a comprehensive framework for delivering data science and machine learning courses, whether in academic settings or professional training programs.

Welcome

1 The Basics for R