How to Perform Stratified Sampling in Excel

Stratified sampling is a method of sampling in which you divide your population into different groups (strata), and then randomly select samples from each group. This ensures that your sample represents different segments of your population and can lead to more accurate results. In this guide, we'll show you how to perform stratified sampling in Excel.

Preparing Your Data

  1. Organize your data: Make sure your data is in a single Excel sheet, with each row representing an individual data point and columns representing different variables. Ideally, your data should be sorted based on the variable you want to stratify.
  2. Create a column for your strata: Add a new column to your dataset to indicate the stratum each row belongs to. You can use the IF function or other formulas to assign strata based on the values in the other columns.
  3. Count the number of data points in each stratum: You can use the COUNTIF function to count the number of data points in each stratum. This will help you later when determining the number of samples to select from each stratum.

Performing Stratified Sampling

  1. Determine the sample size for each stratum: Decide the total sample size you want, and then allocate the sample size to each stratum proportionally based on the number of data points in each stratum. You can use the following formula:Sample size for stratum i = (Number of data points in stratum i / Total number of data points) * Total sample size
  2. Use the RAND function to assign random numbers to each row: Add a new column to your dataset and use the RAND function to assign a random number between 0 and 1 to each row. This will be used to randomly select samples from each stratum.
  3. Sort your data by the stratum column and the random number column: First, sort your data based on the stratum column, so that all data points in the same stratum are grouped together. Then, sort the data within each stratum based on the random number column in ascending order.
  4. Select the top samples from each stratum: For each stratum, select the top rows based on the allocated sample size in step 1. These rows will be your stratified random sample.

Example

Let's assume we have a dataset of 100 students with their test scores and we want to perform stratified sampling based on their grade level (Freshman, Sophomore, Junior, Senior).

Preparing Your Data

  1. Organize your data: Assume the dataset is in columns A and B, with column A being the student's grade level and column B being the test score.
  2. Create a column for your strata: In this case, column A already represents the strata (grade level).
  3. Count the number of data points in each stratum: Use the COUNTIF function in separate cells to calculate the number of students in each grade level:Freshman: =COUNTIF(A:A, "Freshman") Sophomore: =COUNTIF(A:A, "Sophomore") Junior: =COUNTIF(A:A, "Junior") Senior: =COUNTIF(A:A, "Senior")

Performing Stratified Sampling

  1. Determine the sample size for each stratum: Let's say we want a total sample size of 20 students. Calculate the sample size for each stratum using the formula:Freshman: (Number of Freshman / 100) * 20 Sophomore: (Number of Sophomore / 100) * 20 Junior: (Number of Junior / 100) * 20 Senior: (Number of Senior / 100) * 20
  2. Use the RAND function to assign random numbers to each row: Add a new column C and use the formula =RAND() for each row.
  3. Sort your data by the stratum column and the random number column: First, sort the data based on column A (grade level) and then sort within each stratum based on column C (random number).
  4. Select the top samples from each stratum: Based on the sample sizes calculated in step 1, select the top rows from each stratum. These rows represent your stratified random sample.
Did you find this useful?