Introduction to Basic Statistics: Understanding the Fundamentals

Statistics is the lifeblood of all data-driven processes. Population censuses, marketing analytics, Netflix recommendations, Amazon’s Alexa, stock market predictions – statistical analysis is the one thread that ties all these diverse applications together. While the subject has been around for as long as people have worked with numbers, the astounding proliferation of data science, analytics, and AI in recent times has put statistics back in the spotlight.

The essential and all-pervasive nature of the subject has made statistics a crucial aspect of high school mathematics. And if you wish to pursue a career path where data analysis is critical, you have to develop a certain level of mastery over statistics and its myriad branches. It isn’t easy, however, as the subject demands dedicated effort, sharp problem-solving skills, and clear-cut knowledge of elementary mathematics.

Kickstart your journey towards statistics mastery with this write-up from the statistics assignment help experts of MyAssignmentHelp Australia. It offers a concise but insightful overview of rudimentary statistics.

Fundamentals of Statistics & its Partner, Probability

You don’t need us to tell you how statistics and probability intertwine. Statistics is all about investigating data, and probability helps to tackle all the uncertainties, errors, limitations, etc., intrinsic to the data. Thus, it is quite natural to start by examining the basic concepts of probability.

- Probability concerns itself with the outcome of trials and experiments.

A sample space contains all the different outcomes of a trial, and an event is any collection of those outcomes. Probability measures are real-valued functions that assign a probability to each event in the sample space. The probability of the entire sample space is always 1.

Probability measures are also known as probability distributions. The probability of any event lies between 0 and 1.

The union of multiple events leads to a compound event that occurs when one or more of the constituent events occur.

The intersection of two or more events leads to a compound event that occurs only when all the constituents occur simultaneously.

Mutually exclusive events can never occur together, while in the case of independent trials or events, the outcome of one does not affect the other.
Permutations of a set are all the ways its elements can be arranged. The order of the elements is important.

Here’s the formula for the number of permutations of r elements chosen from a set of n:

P(n, r) = n! / (n – r)!

Combinations are similar to permutations. They count the different ways r elements can be selected from a set of n. The order of selection is NOT important.

The formula is

C(n, r) = n! / [r! * (n – r)!]
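As a quick check of the counting rules for permutations and combinations, here is a minimal Python sketch. The values n = 5 and r = 3 and the letters "ABCDE" are illustrative assumptions, not taken from the text; `math.perm` and `math.comb` are standard-library functions (Python 3.8+).

```python
import math
from itertools import combinations, permutations

n, r = 5, 3  # illustrative values: pick 3 elements out of 5

# Ordered arrangements: n! / (n - r)!
n_perm = math.perm(n, r)

# Unordered selections: n! / (r! * (n - r)!)
n_comb = math.comb(n, r)

# Cross-check both counts by brute-force enumeration
items = "ABCDE"
assert n_perm == len(list(permutations(items, r)))
assert n_comb == len(list(combinations(items, r)))

print(n_perm, n_comb)  # 60 10
```

Each unordered selection of r elements can be arranged in r! different orders, which is why the permutation count is r! times the combination count.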

The probability of multiple events occurring in tandem is a crucial aspect of data analysis. Conditional probability is an equally crucial measure. Conditional probability and Bayes’s Theorem are two exceptionally powerful concepts in statistics and probability, and they are the foundation of numerous AI techniques.

Their importance demands a closer look.

Conditional Probability & Bayes Theorem

You may already know about unions and intersections. The union of two or more events is the event that at least one of them occurs. The intersection of two or more events is the event that all of them occur together or simultaneously.

Say there are two events/outcomes, A and B, that can occur when experimenting, and we repeat the experiment N times. Let the probability of A occurring be P(A) and that of B be P(B). P(A∩B) denotes the probability of A and B occurring jointly/simultaneously. P(A∪B), or P(A union B), denotes the probability of A or B (or both) occurring.

- Now, if A and B are independent, that is, the probability of A does not depend on B, then P(A∩B) = P(A) * P(B) and P(A∪B) = P(A) + P(B) – P(A) * P(B).

If A and B are mutually exclusive, then they cannot occur jointly, so P(A∩B) = 0. But they can have a union; that is, one or the other can occur. So, here, P(A∪B) = P(A) + P(B).
If A and B are not mutually exclusive, then P(A∪B) = P(A) + P(B) – P(A∩B).
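The union and intersection rules can be checked on a small example. The sketch below assumes a single roll of a fair six-sided die; the events A, B, C, and D and the helper `prob` are hypothetical choices made for illustration, not part of the original text.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event (a set of outcomes) when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3
C = {1, 3, 5}  # the roll is odd (mutually exclusive with A)
D = {1, 2}     # the roll is at most 2 (independent of A)

# General addition rule: P(A∪B) = P(A) + P(B) - P(A∩B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Mutually exclusive events: the intersection has probability 0
assert prob(A & C) == 0
assert prob(A | C) == prob(A) + prob(C)

# Independent events: P(A∩D) = P(A) * P(D)
assert prob(A & D) == prob(A) * prob(D)
assert prob(A | D) == prob(A) + prob(D) - prob(A) * prob(D)

print(prob(A | B), prob(A | C), prob(A | D))  # 2/3 1 2/3
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested directly without floating-point tolerance.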

Conditional probability denotes the probability of an event occurring given that another event has occurred. So, as per the above example, provided that P(A) > 0, the probability of B occurring, given that A has already occurred, is given as

P(B|A) = P(A∩B) / P(A)= Joint probability of A and B / Probability of A

From the above expression, we obtain a clearer idea about the intersection or joint probability of events. The joint probability of A and B is the product of their conditional probability and P(A), that is, P(B|A) * P(A) = P(A∩B)

If A and B are independent, then P(B|A) = P(B).
For non-mutually exclusive events, P(AUB) = P(A) + P(B) – [P(A) * P(B|A)]. Can you tell me why?
Remember, if A and B are mutually exclusive, then P(B|A) = 0, as in such cases, one event cannot occur if the other one occurs.
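To make the conditional probability formula concrete, here is a small sketch that again assumes one roll of a fair die. The events and the helpers `prob` and `cond_prob` are hypothetical names introduced for illustration.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (assumed example)

def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def cond_prob(b, a):
    """P(B|A) = P(A∩B) / P(A); only defined when P(A) > 0."""
    if prob(a) == 0:
        raise ValueError("P(A) must be positive")
    return prob(a & b) / prob(a)

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3
C = {1, 3, 5}  # the roll is odd (mutually exclusive with A)
D = {1, 2}     # the roll is at most 2 (independent of A)

p_b_given_a = cond_prob(B, A)                # (1/3) / (1/2) = 2/3
assert p_b_given_a * prob(A) == prob(A & B)  # product rule
assert cond_prob(D, A) == prob(D)            # independence: P(D|A) = P(D)
assert cond_prob(C, A) == 0                  # mutually exclusive: P(C|A) = 0

print(p_b_given_a)  # 2/3
```

Knowing that the roll is even raises the probability of "greater than 3" from 1/2 to 2/3, because two of the three even outcomes (4 and 6) are greater than 3.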

Bayes’ theorem builds upon the definition of conditional probability. The formula for Bayes’ theorem derives from conditional probability and is given as

P(B|A) = [P(A|B) * P(B)]/ P(A)

As P(A|B) = P(A∩B) / P(B), we can see that simplifying the above equation yields P(B|A) = P(A∩B) / P(A).
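A worked example may help here. The sketch below applies Bayes' theorem to a made-up screening scenario: B is "has the condition" and A is "tests positive". The numbers (1% prevalence, 95% true-positive rate, 5% false-positive rate) are assumptions chosen for illustration only. P(A) is computed by summing over the two cases B and not-B (the law of total probability), a step the text above does not cover.

```python
from fractions import Fraction

def bayes(p_a_given_b, p_b, p_a):
    """Bayes' theorem: P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

# Hypothetical inputs, not taken from the article
p_b = Fraction(1, 100)             # P(B): prevalence of the condition
p_a_given_b = Fraction(95, 100)    # P(A|B): positive test given the condition
p_a_given_not_b = Fraction(5, 100) # P(A|not B): false-positive rate

# P(A) = P(A|B) * P(B) + P(A|not B) * P(not B)
p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)

posterior = bayes(p_a_given_b, p_b, p_a)
print(posterior, round(float(posterior), 3))  # 19/118 0.161
```

Even with a fairly accurate test, the posterior probability is only about 16% under these assumed numbers, because the condition is rare and false positives outnumber true positives.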


We wrap up this write-up with a glimpse of some more fundamental concepts & terms in statistics.

Essential Statistical Terms and Definitions

Discrete data take distinct, countable values, while continuous data can take any value within a range. Nominal or categorical data belong to specific categories but do not have any intrinsic ordering or ranking. Ordinal data possess some natural or predetermined order.
Random errors occur due to chance and showcase no specific pattern. Systematic errors come with certain observable patterns, and their causes can be identified.
Sample statistics are used to estimate the parameters of a large population. Sampling can be probability or non-probability-based. Non-probability sampling data can be prone to sampling bias.
Examples of non-probability sampling methods are volunteer samples, convenience samples, and quota sampling. Examples of probability sampling methods are simple random sampling, systematic sampling, and cluster sampling.
Sample data should be representative of an entire population. That’s why it’s necessary to eliminate any bias from sampled data. Bias leads to systematic errors and creeps into data during the selection and retention of the subjects of study, as well as due to flaws in the data collation process.
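The probability sampling methods named above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical population of 100 numbered subjects, a sample size of 10, and clusters of 10 consecutive IDs; real studies define their sampling frames and clusters very differently.

```python
import random

rng = random.Random(42)           # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical subject IDs 1..100
n = 10                            # desired sample size

# Simple random sampling: every subject has the same chance of selection
simple_sample = rng.sample(population, n)

# Systematic sampling: random start, then every k-th subject
k = len(population) // n
start = rng.randrange(k)
systematic_sample = population[start::k]

# Cluster sampling: split into groups, then select whole groups at random
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
chosen_clusters = rng.sample(clusters, 2)
cluster_sample = [subject for cluster in chosen_clusters for subject in cluster]

print(simple_sample)
print(systematic_sample)
print(cluster_sample)
```

In all three cases the selection is driven by a random draw rather than by who is convenient or who volunteers, which is what separates these methods from non-probability sampling.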

Different kinds of biases exist, each impacting data and inferences in its own way. Some of the most common are selection bias, volunteer bias, nonresponse bias, informative censoring, interview bias, recall bias, detection bias, social desirability bias, response bias, and conscious bias.

Measuring the central tendency of data is a common statistical task. The three key measures are the mean or average, the median, and the mode.
The mean is the average of a dataset. The formula is given as

x̄ = (x₁ + x₂ + … + xₙ) / n, where n is the number of values in the dataset

Outliers can heavily affect the value of the mean.

The median is the middle value in an ordered dataset. It can be a better measure of central tendency than the mean when outliers affect the mean substantially.
Mode is the most frequently occurring value in a dataset.
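The effect of an outlier on the three measures of central tendency can be seen in a short sketch using Python's standard `statistics` module. The dataset is made up for illustration.

```python
import statistics

data = [2, 3, 3, 4, 5, 6, 40]  # hypothetical dataset; 40 is an outlier

mean = statistics.mean(data)      # 63 / 7 = 9
median = statistics.median(data)  # middle of the sorted values = 4
mode = statistics.mode(data)      # most frequent value = 3

# Dropping the outlier moves the mean far more than the median
trimmed = data[:-1]
trimmed_mean = statistics.mean(trimmed)      # 23 / 6, about 3.83
trimmed_median = statistics.median(trimmed)  # (3 + 4) / 2 = 3.5

print(mean, median, mode)
print(round(trimmed_mean, 2), trimmed_median)
```

A single extreme value pulls the mean from about 3.83 up to 9, while the median only moves from 3.5 to 4.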
Range, variance, and standard deviation are vital measures for estimating dispersion in datasets.
The range is the simplest dispersion measure; it denotes the difference between the highest and lowest values in a set.
Variance is the most common way to measure dispersion in a continuous dataset. It is the average squared difference of the values in a dataset from the mean.

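For a population, the variance formula is

σ² = Σ (xᵢ – µ)² / N, where µ is the population mean and N is the population size

And for a sample, it’s

s² = Σ (xᵢ – x̄)² / (n – 1), where x̄ is the sample mean and n is the sample size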

The standard deviation is calculated by taking the square root of variance for both populations and samples.
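The dispersion measures can be computed by hand and compared against Python's standard `statistics` module. This sketch uses a made-up dataset and assumes the usual conventions: division by N for a population and by n – 1 for a sample.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical dataset

value_range = max(data) - min(data)  # 9 - 2 = 7

mean = sum(data) / len(data)                       # 40 / 8 = 5.0
squared_diffs = [(x - mean) ** 2 for x in data]    # these sum to 32.0

pop_variance = sum(squared_diffs) / len(data)           # 32 / 8 = 4.0
sample_variance = sum(squared_diffs) / (len(data) - 1)  # 32 / 7, about 4.57

pop_sd = math.sqrt(pop_variance)        # 2.0
sample_sd = math.sqrt(sample_variance)  # about 2.14

# The standard library implements the same two conventions
assert math.isclose(pop_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(value_range, pop_variance, round(sample_variance, 2))
```

The sample variance is slightly larger than the population variance for the same values, because its divisor is n – 1 rather than N.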

And that about wraps up this write-up. We hope it helps you brush up on your stat fundamentals. Know that mastery of statistics and probability doesn’t come easy. You need to study & practice a lot and work through concepts & problems with all you have got. Work hard, and if need be, get some expert statistics assignment help from MyAssignmentHelp.expert, a leading academic service provider in Australia.

All the best!

Dennis T
