6. Probability and Statistics

Mechatronics engineering involves designing, building, and maintaining complex systems that integrate mechanical, electrical, and computer components. Engineers in this field must analyze uncertainties, make inferences from data, and design robust systems. This article covers probability distributions, statistical hypothesis testing, and regression analysis, focusing on their applications in mechatronics engineering. It also provides examples of statistical analysis in Python using NumPy, SciPy, and scikit-learn.

6.1. Probability Distributions

Probability distributions are essential for understanding the likelihood of different outcomes in a random process. In mechatronics, they can be used to model uncertainties in sensor measurements, component reliability, and system performance. This section will focus on two common probability distributions used in engineering: the normal (Gaussian) distribution and the Poisson distribution.

6.1.1. Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution defined by two parameters: the mean (\(\mu\)) and the standard deviation (\(\sigma\)). The probability density function (PDF) of the normal distribution is given by:

\[f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2 \sigma^2}}\]

In mechatronics, the normal distribution can be used to represent noise in sensor measurements, variations in manufacturing tolerances, or fluctuations in control signals. Engineers can also invoke the central limit theorem, which states that the suitably normalized sum of a large number of independent random variables with finite variance tends toward a normal distribution. This is why the combined effect of many small, independent error sources is often well approximated as Gaussian.
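
As a rough illustration of both ideas, the following sketch evaluates a Gaussian noise model with scipy.stats.norm and then checks the central limit theorem numerically. The sensor values (a 5.0 V signal with 0.1 V noise), the sample sizes, and the random seed are assumptions chosen only for the example.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sensor model: true value 5.0 V with Gaussian noise, sigma = 0.1 V
mu, sigma = 5.0, 0.1

# Probability that one reading lies within two standard deviations of the mean
p_within_2sigma = norm.cdf(mu + 2 * sigma, loc=mu, scale=sigma) - norm.cdf(
    mu - 2 * sigma, loc=mu, scale=sigma
)
print("P(|x - mu| < 2 sigma):", p_within_2sigma)

# Central limit theorem illustration: average 30 uniform(0, 1) draws, 10,000 times
rng = np.random.default_rng(seed=0)
sample_means = rng.uniform(0.0, 1.0, size=(10_000, 30)).mean(axis=1)

# Theory for the averages: mean 0.5, standard deviation sqrt((1/12) / 30)
print("Mean of sample means:", sample_means.mean())
print("Std of sample means:", sample_means.std(ddof=1))
print("CLT prediction for std:", np.sqrt(1.0 / 12.0 / 30.0))
```

Although each individual draw is uniform rather than Gaussian, the averages cluster around 0.5 with a spread close to the predicted value.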

6.1.2. Poisson Distribution

The Poisson distribution is a discrete probability distribution that models the number of events occurring in a fixed interval of time or space, assuming the events occur independently and at a constant average rate. It has a single parameter \(\lambda\), which is the expected number of events in the interval. The probability mass function (PMF) for observing \(k = 0, 1, 2, \ldots\) events is given by:

\[P(k) = \frac{\lambda^k e^{-\lambda}}{k!}\]

In mechatronics, the Poisson distribution can be used to model the number of failures of a component in a given time period, the number of incoming requests in a communication network, or the number of particles detected by a sensor in a specific time interval.
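
The PMF can be evaluated directly from the formula and cross-checked against scipy.stats.poisson, as in the sketch below. The rate of 2 events per interval and the helper name poisson_pmf are assumptions made for illustration.

```python
import math
from scipy.stats import poisson

lam = 2.0  # hypothetical expected number of events per interval


def poisson_pmf(k, lam):
    """Evaluate the Poisson PMF directly from the formula."""
    return lam**k * math.exp(-lam) / math.factorial(k)


# Compare the hand-written formula with SciPy for the first few counts
for k in range(5):
    print(k, poisson_pmf(k, lam), poisson.pmf(k, lam))

# Cumulative quantities are often more useful in reliability work
p_at_most_1 = poisson.cdf(1, lam)   # P(K <= 1)
p_more_than_4 = poisson.sf(4, lam)  # P(K > 4)
print("P(K <= 1):", p_at_most_1)
print("P(K > 4):", p_more_than_4)
```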

6.2. Statistical Hypothesis Testing

Statistical hypothesis testing is a method used to determine if there is enough evidence to reject a null hypothesis in favor of an alternative hypothesis. In mechatronics, hypothesis testing can be applied to compare the performance of different algorithms, sensors, or components. This section will focus on two types of hypothesis tests: the t-test and the chi-square test.

6.2.1. T-test

The t-test is used to compare the means of two independent samples, such as the accuracy of two different sensors. The t-statistic is calculated as:

\[t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\]

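where \(\bar{x}_1\) and \(\bar{x}_2\) are the sample means, \(s_1^2\) and \(s_2^2\) are the sample variances, and \(n_1\) and \(n_2\) are the sample sizes. This form does not assume equal variances in the two groups and is known as Welch's t-test. The t-statistic is then compared to a critical value from the t-distribution, for the appropriate degrees of freedom and chosen significance level, to determine if the null hypothesis of equal means can be rejected.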
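
The statistic above can be computed by hand with NumPy and compared with SciPy, as in this sketch. The two arrays are made-up readings, and equal_var=False is passed so that ttest_ind uses the unequal-variance form of the formula.

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical readings from two sensors (sample sizes need not match)
a = np.array([10.1, 9.8, 10.3, 10.0, 9.9, 10.2])
b = np.array([10.4, 10.6, 10.1, 10.5, 10.7])

# t-statistic from the formula, using sample variances (ddof=1)
standard_error = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_manual = (a.mean() - b.mean()) / standard_error

# SciPy's unequal-variance (Welch) t-test should give the same statistic
t_scipy, p_value = ttest_ind(a, b, equal_var=False)

print("t (manual):", t_manual)
print("t (SciPy): ", t_scipy)
print("p-value:   ", p_value)
```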

6.2.2. Chi-square Test

The chi-square test is used to determine if there is a significant relationship between two categorical variables, such as the type of motor and the occurrence of failures. The chi-square statistic is calculated as:

\[\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}\]

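where \(O_i\) is the observed frequency and \(E_i\) is the expected frequency under the null hypothesis in cell \(i\), and \(n\) is the number of cells summed over (for a test of independence, every cell of the contingency table). The chi-square statistic is then compared to a critical value from the chi-square distribution, with \((r - 1)(c - 1)\) degrees of freedom for a table with \(r\) rows and \(c\) columns, to determine if the null hypothesis can be rejected.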
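
A minimal sketch of a test of independence, assuming a made-up 2x2 table of motor type against failure outcome, can use scipy.stats.chi2_contingency. Setting correction=False disables the Yates continuity correction so that the result matches the plain formula above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts: rows are motor types A and B, columns are failed / not failed
observed = np.array([[12, 88],
                     [25, 75]])

# correction=False turns off the Yates correction applied to 2x2 tables by default
chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)

# Same statistic computed directly from the formula
chi2_manual = ((observed - expected) ** 2 / expected).sum()

print("Expected counts under independence:\n", expected)
print("chi-square (SciPy): ", chi2)
print("chi-square (manual):", chi2_manual)
print("degrees of freedom: ", dof)
print("p-value:            ", p_value)
```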

6.3. Regression Analysis

Regression analysis is a statistical technique used to model the relationship between a dependent variable and one or more independent variables. In mechatronics, this can be applied to predict the remaining useful life of a motor, estimate system performance, or optimize control parameters. This section will focus on two types of regression analysis: linear regression and logistic regression.

6.3.1. Linear Regression

Linear regression models the relationship between two variables as a straight line. The equation for a linear regression model is:

\[y = \beta_0 + \beta_1 x\]

where \(y\) is the dependent variable, \(x\) is the independent variable, and \(\beta_0\) (the intercept) and \(\beta_1\) (the slope) are the model parameters, typically estimated by least squares.

Using Python, you can perform linear regression with the scikit-learn library as follows:

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data
x = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([0, 1, 1.5, 2.5, 3.5, 4.5])

# Perform linear regression
model = LinearRegression().fit(x, y)

# Model parameters
beta_0 = model.intercept_
beta_1 = model.coef_[0]

print("Model parameters:", beta_0, beta_1)
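
For a single independent variable, the least-squares estimates also have a simple closed form, which can serve as a cross-check on the library result. The sketch below reuses the sample data above and compares the closed form with np.polyfit; it is an illustration rather than a replacement for the scikit-learn code.

```python
import numpy as np

# Same sample data as above, kept one-dimensional here
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([0, 1, 1.5, 2.5, 3.5, 4.5])

# Closed-form least-squares estimates for a straight-line fit
beta_1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta_0 = y.mean() - beta_1 * x.mean()

# np.polyfit returns coefficients from the highest degree down: slope, then intercept
slope_check, intercept_check = np.polyfit(x, y, deg=1)

print("Closed form:", beta_0, beta_1)
print("np.polyfit: ", intercept_check, slope_check)
```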

6.3.2. Logistic Regression

Logistic regression is used to model the probability of a binary outcome, such as a component failure or successful communication. The logistic function, also known as the sigmoid function, is given by:

\[p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}\]

where \(p(x)\) is the probability of the binary outcome, \(x\) is the independent variable, and \(\beta_0\) and \(\beta_1\) are the model parameters.

Using Python, you can perform logistic regression with the scikit-learn library as follows:

import numpy as np
from sklearn.linear_model import LogisticRegression

# Sample data
x = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

# Perform logistic regression
model = LogisticRegression().fit(x, y)

# Model parameters
beta_0 = model.intercept_[0]
beta_1 = model.coef_[0][0]

print("Model parameters:", beta_0, beta_1)
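
Once fitted, the model can be turned into a probability for a new input either with predict_proba or by plugging the parameters into the logistic function. The sketch below does both for a hypothetical query point of 2.5, reusing the sample data above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same sample data as above
x = np.array([0, 1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(x, y)
beta_0 = model.intercept_[0]
beta_1 = model.coef_[0][0]

x_new = 2.5  # hypothetical query point, midway between the two classes

# Probability of class 1 from scikit-learn (column 1 of predict_proba)
p_sklearn = model.predict_proba(np.array([[x_new]]))[0, 1]

# Same probability from the logistic function and the fitted parameters
p_manual = 1.0 / (1.0 + np.exp(-(beta_0 + beta_1 * x_new)))

print("predict_proba:    ", p_sklearn)
print("logistic function:", p_manual)
```

Note that scikit-learn's LogisticRegression applies L2 regularization by default, so the fitted parameters are shrunk relative to an unregularized maximum-likelihood fit.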

6.4. Exercises

Example 1

Suppose you have a dataset of motor temperatures and corresponding motor lifetimes. How would you use linear regression to estimate the remaining useful life of a motor given its current temperature?

Solution:

In this example, we will perform linear regression with the motor temperatures as the independent variable and the motor lifetimes as the dependent variable. Then, we will use the fitted model to predict the remaining useful life given a new motor temperature.

import numpy as np
from sklearn.linear_model import LinearRegression

# Sample data (replace with actual data)
motor_temperatures = np.array([50, 60, 70, 80, 90]).reshape(-1, 1)
motor_lifetimes = np.array([10000, 9000, 8000, 7000, 6000])

# Perform linear regression
model = LinearRegression().fit(motor_temperatures, motor_lifetimes)

# Predict remaining useful life for a new motor temperature
new_motor_temperature = 75
predicted_lifetime = model.predict(np.array([[new_motor_temperature]]))
print("Predicted remaining useful life:", predicted_lifetime[0])

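This code uses the LinearRegression class from the scikit-learn library to fit a linear model to the dataset. It then predicts the remaining useful life of a motor with a temperature of 75 (in the same units as the training data) using the predict method of the fitted model. Because the placeholder data is exactly linear, with the lifetime dropping by 100 for every degree, the prediction is 7500.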

Example 2

You have collected data from two different sensors measuring the same physical quantity. Perform a t-test to determine if there is a significant difference between the means of the two sensors.

Solution:

In this example, we will use the ttest_ind function from the scipy.stats module to perform a t-test on the collected data from the two sensors.

import numpy as np
from scipy.stats import ttest_ind

# Sample data (replace with actual data)
sensor1_data = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
sensor2_data = np.array([1.3, 1.4, 1.5, 1.6, 1.7])

# Perform the t-test
t_stat, p_value = ttest_ind(sensor1_data, sensor2_data)
print("t-statistic:", t_stat)
print("p-value:", p_value)

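This code uses the ttest_ind function to perform a two-sample t-test on the data from the two sensors. By default, ttest_ind assumes equal variances in both groups; passing equal_var=False gives the unequal-variance form shown in Section 6.2.1. The t-statistic and p-value are printed, which can be used to determine if there is a significant difference between the means of the two sensors.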
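
To turn the p-value into a decision, it is compared with a significance level fixed in advance. The sketch below assumes the common choice of 0.05, which is a convention rather than something prescribed by the exercise.

```python
import numpy as np
from scipy.stats import ttest_ind

# Same placeholder readings as in the example above
sensor1_data = np.array([1.1, 1.2, 1.3, 1.4, 1.5])
sensor2_data = np.array([1.3, 1.4, 1.5, 1.6, 1.7])

alpha = 0.05  # assumed significance level; choose it before looking at the data

t_stat, p_value = ttest_ind(sensor1_data, sensor2_data)

if p_value < alpha:
    decision = "reject the null hypothesis of equal means"
else:
    decision = "fail to reject the null hypothesis of equal means"

print(f"t = {t_stat:.3f}, p = {p_value:.4f}: {decision}")
```

With the placeholder data, the t-statistic is about -2.0 on 8 degrees of freedom and the p-value falls between 0.05 and 0.10, so the difference would not be declared significant at the 5% level.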

Example 3

Given the following data points, fit a linear regression model and find the model parameters (\(\beta_0\) and \(\beta_1\)):

x = [2, 4, 6, 8, 10]
y = [1.5, 3.1, 4.8, 6.6, 8.5]

Solution:

In this example, we will use the LinearRegression class from the scikit-learn library to fit a linear model to the dataset and extract the model parameters (\(\beta_0\) and \(\beta_1\)).

import numpy as np
from sklearn.linear_model import LinearRegression

# Given data points
x = np.array([2, 4, 6, 8, 10]).reshape(-1, 1)
y = np.array([1.5, 3.1, 4.8, 6.6, 8.5])

# Perform linear regression
model = LinearRegression().fit(x, y)

# Extract model parameters
beta_0 = model.intercept_
beta_1 = model.coef_[0]
print("Model parameters: beta_0 =", beta_0, ", beta_1 =", beta_1)

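This code uses the LinearRegression class to fit a linear model to the given dataset. It then extracts the model parameters \(\beta_0\) (intercept) and \(\beta_1\) (slope) and prints them. For this data, the least-squares fit gives \(\beta_0 \approx -0.35\) and \(\beta_1 \approx 0.875\).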

Example 4

A mechatronics system experiences random component failures over time. The number of failures in a given month follows a Poisson distribution with \(\lambda = 3\). Calculate the probability of observing exactly 2 failures in a month.

Solution:

In this example, we will calculate the probability of observing exactly 2 failures in a month using the poisson.pmf function from the scipy.stats module, which evaluates the Poisson PMF given in Section 6.1.2.

from scipy.stats import poisson

lambda_ = 3
k = 2

P = poisson.pmf(k, lambda_)
print("Probability of observing exactly 2 failures in a month:", P)

This code uses the poisson.pmf function to calculate the probability of observing exactly 2 failures in a month, given that the number of failures follows a Poisson distribution with \(\lambda = 3\).
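
As a hand check of the same result, substituting \(\lambda = 3\) and \(k = 2\) into the PMF from Section 6.1.2 gives:

\[P(2) = \frac{3^2 e^{-3}}{2!} = 4.5 \, e^{-3} \approx 0.224\]

so roughly 22% of months would be expected to have exactly two failures.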


