Computational Statistics

Chapter 1 - Introduction

Dr. Mehdi Maadooliat

Marquette University
MATH 4750 - Spring 2025

Overview of Statistical Computing

  • Computational approaches to solving statistical problems
  • Difference between statistical computing and computational statistics
  • Importance of Monte Carlo methods, optimization, and random number generation
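As a small preview of two of these ideas, the sketch below estimates pi by Monte Carlo simulation and minimizes a simple function numerically. The seed value, sample size, and objective function are arbitrary choices for illustration.

```r
set.seed(4750)  # arbitrary seed so the simulation is reproducible

# Monte Carlo: the fraction of uniform points in the unit square that fall
# inside the quarter circle u^2 + v^2 <= 1 estimates pi / 4
n_pts <- 100000
u <- runif(n_pts)
v <- runif(n_pts)
pi_hat <- 4 * mean(u^2 + v^2 <= 1)
pi_hat

# Optimization: numerically minimize a one-dimensional function on an interval
opt <- optimize(function(t) (t - 2)^2 + 1, interval = c(0, 5))
opt$minimum    # location of the minimum, close to 2
opt$objective  # function value there, close to 1
```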

Getting Started with R and RStudio

  • R and RStudio: Installation steps
  • Basic syntax in R
  • Common operations: Assignment, sequences, and arithmetic
x <- sqrt(2 * pi)
print(x)
[1] 2.506628
  • Assignment operator <- vs =
  • Console vs script execution in RStudio
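A short sketch of the operations listed above: sequences, vectorized arithmetic, and the two assignment operators. The variable and function names are illustrative only.

```r
# Sequences: `:` steps by 1, seq() allows any step, rep() repeats values
s1 <- 1:5                      # 1 2 3 4 5
s2 <- seq(0, 1, by = 0.25)     # 0.00 0.25 0.50 0.75 1.00
s3 <- rep(c(1, 2), times = 3)  # 1 2 1 2 1 2

# Arithmetic is vectorized: operations apply element by element
sq <- s1^2                     # 1 4 9 16 25
total <- sum(s1 * 2)           # 30

# `<-` and `=` both assign at the top level ...
a <- 5
b = 5

# ... but inside a function call, `=` only names an argument,
# while `<-` also creates a variable in the calling environment
f <- function(v) v^2
out1 <- f(v = 3)               # 9; the call itself creates no variable `v`
out2 <- f(w <- 4)              # 16; `w` now exists and equals 4
```

This difference is one reason many R users reserve `<-` for assignment and `=` for naming function arguments.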

R Help System and Distributions

  • help() and ? for accessing R documentation
  • Searching help topics with help.search() or ??, and listing available datasets with data()
  • Generating random variables and performing statistical tests in R
rnorm(100)
  [1] -0.614283392  0.362314144 -0.426585309 -0.099877549 -0.704261983
  [6]  0.375345066 -0.876744975  0.974284966  0.361530089  0.610527918
 [11] -0.639200344  1.128270862 -0.786435671  0.446588760  0.035015755
 [16]  0.388661249 -2.405671996  0.126799465  1.465539285 -1.723486774
 [21] -0.271741278 -0.075320140 -0.295650594  1.352586012 -0.984940803
 [26]  0.714869667 -1.158775711 -0.300157161  0.533037778 -1.560220212
 [31] -0.198796297  0.255556953  0.532473382 -0.611367302 -1.346839848
 [36]  1.923306693 -0.881336203 -1.577552360 -0.255025852  0.644805346
 [41]  0.087762678  0.146524663  1.026879001 -0.887934333 -0.978584572
 [46] -0.575728785 -0.654710694 -0.665435870 -0.656650752 -1.637254178
 [51] -0.137433261 -0.301031388 -0.344693968  1.195182454  1.109894126
 [56] -0.980553416 -0.070578986 -2.767024620  1.849400883  0.249359717
 [61] -0.217920999  0.461227598  0.521264212 -0.194657208 -0.456713338
 [66]  0.671987258 -1.832151934 -1.568633355  1.635227194  2.960119441
 [71]  0.467409291 -0.570968173 -0.178055901 -0.662760123  0.171546313
 [76] -0.067521306 -0.415675764  0.594005340 -0.346278893  0.973922732
 [81]  1.722803985  0.271547256 -0.042450192  0.089900657  0.547756550
 [86] -0.791733614  1.772325965 -1.134225953 -0.783710790  1.400260838
 [91] -0.173405277  0.158365773  0.094374710 -0.594017432  0.614984988
 [96]  0.053694329 -0.162635229 -0.812644375  0.459066468 -0.008546149
t.test(rnorm(10), rnorm(10))

    Welch Two Sample t-test

data:  rnorm(10) and rnorm(10)
t = -0.27225, df = 16.642, p-value = 0.7888
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.1900714  0.9184321
sample estimates:
 mean of x  mean of y 
-0.2039074 -0.0680878 
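Because rnorm() draws pseudo-random numbers, the output above changes on every run. A minimal sketch of making such results reproducible with set.seed(), and of extracting parts of a test result; the seed 4750 is arbitrary. The commented help commands are meant for interactive use.

```r
# Interactive help (not run here):
# ?rnorm                 # documentation for rnorm and related functions
# help.search("t test")  # search installed documentation by keyword
# data()                 # list datasets available in loaded packages

set.seed(4750)           # fix the state of the random number generator
z1 <- rnorm(5)
set.seed(4750)           # reset to the same state ...
z2 <- rnorm(5)           # ... and obtain exactly the same draws

# t.test() returns an "htest" object whose components can be extracted
set.seed(4750)
res <- t.test(rnorm(10), rnorm(10))
res$statistic            # t statistic
res$p.value              # two-sided p-value (Welch test by default)
res$conf.int             # 95% confidence interval for the difference in means
```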

Distributions and Statistical Tests

Probability Distributions

  • dnorm(), pnorm(), qnorm(): normal density, cumulative distribution function, and quantile function
  • Exploring statistical tests in R
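A brief sketch of R's d/p/q/r naming convention for the normal distribution. The numeric values in the comments are rounded, and the mean and standard deviation used for the random draws are arbitrary.

```r
dnorm(0)          # density of N(0, 1) at 0: 1/sqrt(2*pi), about 0.3989
pnorm(1.96)       # P(Z <= 1.96), about 0.975
qnorm(0.975)      # value with 97.5% of the mass below it, about 1.96

set.seed(1)
draws <- rnorm(1000, mean = 10, sd = 2)  # random draws from N(10, 2^2)

# pnorm() and qnorm() are inverses of each other
qnorm(pnorm(1.3)) # 1.3

# Other distributions use the same prefixes, e.g. dbinom(), pexp(), qt(), runif()
```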

Statistical Tests

  • t.test() for comparing means, chisq.test() for contingency tables and goodness of fit
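A sketch applying both tests. The t-test compares mean sepal length between two iris species; the 2x2 table of counts passed to chisq.test() is hypothetical, made up for illustration.

```r
# Two-sample t-test with the formula interface: response ~ two-level group
two_species <- droplevels(subset(iris, Species != "setosa"))
tt <- t.test(Sepal.Length ~ Species, data = two_species)
tt$p.value

# Chi-squared test of independence on a hypothetical 2x2 contingency table
counts <- matrix(c(20, 30, 25, 25), nrow = 2,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("yes", "no")))
ct <- chisq.test(counts)  # Yates' continuity correction is applied for 2x2 tables
ct$parameter              # degrees of freedom: (2 - 1) * (2 - 1) = 1
ct$expected               # expected counts under independence
```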

Functions in R

Defining Functions

  • function(arglist) expr
  • Return values and default arguments
sumdice <- function(n) {
  k <- sample(1:6, size=n, replace=TRUE)
  return(sum(k))
}
sumdice(2)
[1] 4
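The sketch below illustrates default arguments and implicit return values with a hypothetical variant, sumdice2(), whose extra argument sides is an assumption added for illustration. It then uses the function in a small Monte Carlo experiment.

```r
# Hypothetical extension of sumdice(): default arguments and an implicit return
sumdice2 <- function(n = 2, sides = 6) {
  k <- sample.int(sides, size = n, replace = TRUE)  # n rolls of a fair die
  sum(k)  # the value of the last expression is returned
}

set.seed(4750)
sumdice2()        # uses the defaults: two six-sided dice
sumdice2(n = 3)   # overrides n, keeps sides = 6

# Monte Carlo estimate of the expected sum of two dice (exact value: 7)
sims <- replicate(10000, sumdice2())
mean(sims)
```

Writing return(sum(k)) as in sumdice() is equivalent; the explicit form is mainly useful for returning early.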

Data Structures: Arrays, Data Frames, and Lists

Data Structures in R

  • Arrays, matrices, and data frames
# Creating vectors and matrices
x <- c(1, 2, 3, 4)
matrix_x <- matrix(x, nrow=2, ncol=2)
print(matrix_x)
     [,1] [,2]
[1,]    1    3
[2,]    2    4
  • Example: Iris data set
data(iris)
summary(iris)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
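The matrix and the iris data frame above can be indexed by position, by name, or by a logical condition. A brief sketch; the group means are computed with tapply().

```r
x <- c(1, 2, 3, 4)
matrix_x <- matrix(x, nrow = 2, ncol = 2)  # filled column by column
matrix_x[1, 2]          # row 1, column 2: 3
t(matrix_x)             # transpose
matrix_x %*% matrix_x   # matrix product (not element-wise)

data(iris)
dim(iris)                                  # 150 rows, 5 columns
iris[1:3, c("Sepal.Length", "Species")]    # first three rows, two columns
setosa <- iris[iris$Species == "setosa", ] # rows meeting a condition

# Mean sepal length within each species
means <- tapply(iris$Sepal.Length, iris$Species, mean)
means
```

Because each species has 50 rows, the average of the three group means equals the overall mean of 5.843 reported by summary().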
                
                
                

Table: Comparison of Data Structures

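Type         Definition                                   Example
----------   ------------------------------------------   --------------------
Vector       1D sequence of elements of one data type     x <- c(1, 2, 3)
Matrix       2D array; all elements share one data type   matrix(1:4, nrow=2)
Data Frame   2D table; columns may differ in data type    iris, via data(iris)
List         Collection of objects of any type            list(a=1, b="hello")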
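A short sketch of how these structures behave; the object names are illustrative. It also shows a general array, which extends a matrix to more than two dimensions.

```r
vec <- c(1, 2, 3)                     # numeric vector
mixed <- c(1, "a")                    # mixing types coerces everything to character
mat <- matrix(1:4, nrow = 2)          # integer matrix
arr <- array(1:24, dim = c(2, 3, 4))  # 3D array
df <- data.frame(id = 1:3, grp = c("a", "b", "a"),
                 stringsAsFactors = FALSE)
lst <- list(a = 1, b = "hello", m = mat)  # a list can hold any objects, even a matrix

class(mixed)        # "character"
sapply(df, class)   # each column keeps its own type
lst$b               # extract a list element by name
lst[["m"]][2, 2]    # index into the matrix stored in the list
str(lst)            # compact overview of the structure
```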

Graphics in R

Basic Plots

  • plot(), hist(), boxplot()
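plot(iris)              # scatterplot matrix of all pairs of columns
# boxplot(iris[, 1:4])  # boxplots of the four numeric measurements
# hist(iris[, 1])       # histogram of Sepal.Length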
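A sketch of grouped boxplots and histograms with labeled axes (iris measurements are in centimeters). Drawing to pdf(NULL), a device that discards its output, is an assumption made so the example also runs non-interactively; in RStudio the plots would normally go to the Plots pane.

```r
data(iris)
pdf(NULL)  # null device: output is discarded
boxplot(Sepal.Length ~ Species, data = iris,
        main = "Sepal length by species", ylab = "Sepal length (cm)")
hist(iris$Sepal.Length, breaks = 15,
     main = "Histogram of sepal length", xlab = "Sepal length (cm)")
dev.off()

# The same functions can return the computed statistics without drawing
h <- hist(iris$Sepal.Length, plot = FALSE)
b <- boxplot(Sepal.Length ~ Species, data = iris, plot = FALSE)
h$counts  # number of observations in each bin
b$stats   # whisker ends, hinges, and median for each species (one column each)
```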

Introduction to ggplot2

  • Visualizing using ggplot2
library(ggplot2)
ggplot(iris, aes(x=Sepal.Length, y=Petal.Length)) + 
  geom_point() + 
  geom_smooth(method="lm", se=FALSE, color="blue") + 
  theme_minimal() + 
  labs(title="Sepal vs Petal Length in Iris Dataset", x="Sepal Length", y="Petal Length")

Workspace and Files - Markdown and knitr

Managing Files

  • Working directories and file input/output
  • Using scripts and automation
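A minimal sketch of file input/output and script automation. Files are written to tempdir() so the working directory is left unchanged; the file names are hypothetical.

```r
getwd()  # current working directory; setwd(path) would change it

# Write a data frame to CSV and read it back
csv_path <- file.path(tempdir(), "iris_copy.csv")
write.csv(iris, csv_path, row.names = FALSE)
iris_back <- read.csv(csv_path, stringsAsFactors = TRUE)
str(iris_back)

# Automation: save R commands to a script file, then run it with source()
script_path <- file.path(tempdir(), "analysis.R")
writeLines(c("y <- 1:10", "total_y <- sum(y)"), script_path)
source(script_path)  # runs the script in the current session
total_y              # 55
# From a terminal, the same script could be run with: Rscript analysis.R
```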

Dynamic Documents

  • Creating reports with R Markdown
  • Introduction to knitr package
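A minimal sketch of a dynamic document. The R code below writes a small R Markdown file whose single chunk summarizes iris, then knits it to Markdown with knitr::knit() if the knitr package is installed. The file names are hypothetical. Producing HTML would typically use rmarkdown::render(), which also requires Pandoc.

```r
fence <- strrep("`", 3)  # the three backticks that delimit an R Markdown chunk

# A minimal R Markdown document: YAML header, a sentence, and one R chunk
rmd_lines <- c(
  "---",
  "title: \"Iris summary\"",
  "output: html_document",
  "---",
  "",
  "The table below is computed when the document is knitted.",
  "",
  paste0(fence, "{r}"),
  "summary(iris)",
  fence
)
rmd_path <- file.path(tempdir(), "iris_report.Rmd")
writeLines(rmd_lines, rmd_path)

# knitr runs each chunk and weaves the code and its output into a Markdown file
if (requireNamespace("knitr", quietly = TRUE)) {
  md_path <- knitr::knit(rmd_path,
                         output = file.path(tempdir(), "iris_report.md"),
                         quiet = TRUE)
  cat(readLines(md_path), sep = "\n")
}
# rmarkdown::render(rmd_path) would go one step further and produce HTML
```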

Conclusion

  • Recap of key points: R setup, data structures, functions, and plotting
  • Next steps: Exploring probability distributions in R