[Useful TIP] Missing Data | Types, Explanation, & Imputation

  • 2025-03-12 15:19:00

 

Missing data, or missing values, occur when you don’t have data stored for certain variables or participants. Data can go missing due to incomplete data entry, equipment malfunctions, lost files, and many other reasons.

In any dataset, there are usually some missing data. In quantitative research, missing values appear as blank cells in your spreadsheet.
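
A first step is usually to count how many values are missing for each variable. The sketch below assumes a small, made-up survey table in which a blank cell has been read in as `None`; the column names and values are illustrative only.

```python
# Hypothetical survey rows; None marks a blank cell.
rows = [
    {"age": 34, "gender": "F", "spending": 250.0},
    {"age": None, "gender": "M", "spending": 120.0},
    {"age": 22, "gender": None, "spending": None},
    {"age": 58, "gender": "F", "spending": 400.0},
]

def missing_counts(rows):
    """Return {variable: number of rows where the value is None}."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            # bool is an int subclass, so True adds 1 and False adds 0.
            counts[name] = counts.get(name, 0) + (value is None)
    return counts

counts = missing_counts(rows)
print(counts)
```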

 

Types of missing data

Missing data are errors because your data don’t represent the true values of what you set out to measure.

The reason for the missing data is important to consider, because it helps you determine the type of missing data and what you need to do about it.

There are three main types of missing data.

 

Example: Research project

You collect data on end-of-year holiday spending patterns. You survey adults on how much they spend annually on gifts for family and friends in dollar amounts.

Missing completely at random

When data are missing completely at random (MCAR), the probability of any particular value being missing from your dataset is unrelated to anything else.

The missing values are randomly distributed, so they can come from anywhere in the whole distribution of your values. These MCAR data are also unrelated to other unobserved variables.

Example: MCAR data

You note that there are a few missing values in your holiday spending dataset. Some people started answering your survey but dropped out or skipped a question.

 

However, you note that you have data points from a wide distribution, ranging from low to high values.

Therefore, you conclude that the missing values aren’t related to any specific holiday spending amount range.

Data are often considered MCAR if they seem unrelated to specific values or other variables. In practice, it’s hard to meet this assumption because “true randomness” is rare.

Data that are missing due to equipment malfunctions or lost samples are typically treated as MCAR, because such failures are usually unrelated to the values being measured.

 

Missing at random

Data missing at random (MAR) are not actually missing at random; this term is a bit of a misnomer.

This type of missing data systematically differs from the data you’ve collected, but it can be fully accounted for by other observed variables.

The likelihood of a data point being missing is related to another observed variable but not to the specific value of that data point itself.

Example: MAR data

You repeat your data collection with a new group. You notice that there are more missing values for adults aged 18–25 than for other age groups.

 

But looking at the observed data for adults aged 18–25, you notice that the values are widely spread. It’s unlikely that the missing data are missing because of the specific values themselves.

Instead, some younger adults may be less inclined to reveal their holiday spending amounts for unrelated reasons (e.g., more protective of their privacy).
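
One way to spot a pattern like this is to compare the share of missing answers across groups of an observed variable. The following is a minimal sketch with invented records and age bands. A higher rate in one group is consistent with MAR, but the observed data alone can't rule out MNAR.

```python
from collections import defaultdict

# Hypothetical records: (age_group, spending); None marks a missing answer.
records = [
    ("18-25", 150.0), ("18-25", None), ("18-25", None), ("18-25", 900.0),
    ("26-40", 300.0), ("26-40", 220.0), ("26-40", None), ("26-40", 510.0),
    ("41+", 410.0), ("41+", 80.0), ("41+", 640.0), ("41+", 275.0),
]

def missing_rate_by_group(records):
    """Return {group: fraction of that group's answers that are missing}."""
    totals = defaultdict(int)
    missing = defaultdict(int)
    for group, value in records:
        totals[group] += 1
        if value is None:
            missing[group] += 1
    return {group: missing[group] / totals[group] for group in totals}

rates = missing_rate_by_group(records)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} missing")
```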

Missing not at random

Data missing not at random (MNAR) are missing for reasons related to the values themselves.

Example: MNAR data

In the new dataset, you also notice that there are fewer low values. Some participants with low incomes avoid reporting their holiday spending amounts because they are low.

This type of missing data is important to look for because you may lack data from key subgroups within your sample. Your sample may not end up being representative of your population.
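
To see how the three mechanisms differ, it can help to simulate them. The sketch below builds an invented population in which spending rises with age (an assumption made only so the MAR case has something to show), drops answers under each mechanism, and compares the mean of what remains with the true mean. The probabilities and thresholds are arbitrary.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of (age, spending) pairs; not real data.
people = []
for _ in range(20000):
    age = random.randint(18, 70)
    spending = max(0.0, random.gauss(200 + 5 * (age - 18), 100))
    people.append((age, spending))

def observed_mean(people, p_missing):
    """Mean spending among people whose answer was not dropped."""
    kept = [s for age, s in people if random.random() >= p_missing(age, s)]
    return statistics.fmean(kept)

true_mean = statistics.fmean(s for _, s in people)
# MCAR: everyone has the same chance of a missing answer.
mcar = observed_mean(people, lambda age, s: 0.3)
# MAR: the chance depends only on age, which is observed.
mar = observed_mean(people, lambda age, s: 0.6 if age <= 25 else 0.1)
# MNAR: the chance depends on the spending value itself.
mnar = observed_mean(people, lambda age, s: 0.6 if s < 200 else 0.1)

for label, value in [("true", true_mean), ("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{label:>4}: {value:.1f}")
```

In this setup the MCAR mean stays close to the true mean, while the MNAR mean is pushed upward because low values are dropped more often. The raw MAR mean also moves, because younger adults spend less in this made-up population. That shift is the kind that can be accounted for with the observed age variable, which is not possible in the MNAR case.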

 

Attrition bias

In longitudinal studies, attrition bias can be a form of MNAR data. Attrition bias means that some participants are more likely to drop out than others.

For example, in long-term medical studies, some participants may drop out because they become more and more unwell as the study continues. Their data are MNAR because their health outcomes are worse, so your final dataset may only include healthy individuals, and you miss out on important data.

 

Are missing data problematic?

Missing data are problematic because, depending on the type, they can sometimes cause sampling bias. This means your results may not be generalizable outside of your study because your data come from an unrepresentative sample.

In practice, you can often consider two types of missing data ignorable because the chance of a value being missing doesn’t depend on the missing value itself:

  • MCAR data
  • MAR data

For these two data types, the likelihood of a data point being missing has nothing to do with the value itself (for MAR data, once you account for the related observed variables). So it’s unlikely that your missing values differ from comparable observed values in a way you can’t account for.

On the flip side, you have a biased dataset if the missing data systematically differ from your observed data. Data that are MNAR are called non-ignorable for this reason.

 

How to prevent missing data

Missing data often come from attrition bias, nonresponse, or poorly designed research protocols. When designing your study, it’s good practice to make it easy for your participants to provide data.

Here are some tips to help you minimize missing data:

  • Limit the number of follow-ups
  • Minimize the amount of data collected
  • Make data collection forms user friendly
  • Use data validation techniques
  • Offer incentives

After you’ve collected data, it’s important to store them carefully, with multiple backups.

 

How to deal with missing values

To tidy up your data, your options usually include accepting, removing, or recreating the missing data.

You should consider how to deal with each case of missing data based on your assessment of why the data are missing.

  • Are these data missing for random or non-random reasons?
  • Are the data missing because they represent zero or null values?
  • Was the question or measure poorly designed?

Your data can be accepted, or left as is, if they’re MCAR or MAR. However, MNAR data may need more complex treatment.

 

Acceptance

The most conservative option involves accepting your missing data: you simply leave these cells blank.

It’s best to do this when you believe you’re dealing with MCAR or MAR values. When you have a small sample, you’ll want to conserve as much data as possible because any data removal can affect your statistical power.

You might also recode all missing values with labels of “N/A” (short for “not applicable”) to make them consistent throughout your dataset.

These actions help you retain data from as many research subjects as possible with few or no changes.
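
Recoding missing values to one consistent label could look like the sketch below. The set of spellings treated as "missing" is an assumption about what a raw export might contain, so adjust it to your own data. Note that a literal "0" is kept, because a zero can be a real answer rather than a missing one.

```python
# Assumed spellings that mean "no value" in a raw export.
MISSING_MARKERS = {"", "na", "n/a", "null", "none", "-"}

def recode_missing(value, label="N/A"):
    """Return `label` for blank-like entries and the original value otherwise."""
    if value is None:
        return label
    if isinstance(value, str) and value.strip().lower() in MISSING_MARKERS:
        return label
    return value

raw = ["120", "", "n/a", None, " NA ", "0", "85"]
recoded = [recode_missing(v) for v in raw]
print(recoded)
```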

Deletion

You can remove missing data from statistical analyses using listwise or pairwise deletion.

Listwise deletion

Listwise deletion means deleting data from all cases (participants) who have data missing for any variable in your dataset. You’ll have a dataset that’s complete for all participants included in it.

A downside of this technique is that you may end up with a much smaller and/or a biased sample to work with. If significant amounts of data are missing from some variables or measures in particular, the participants who provide those data might significantly differ from those who don’t.

Your sample could be biased because it doesn’t adequately represent the population.

Example: Listwise deletion

You decide to remove all participants with missing data from your survey dataset. This reduces your sample from 114 to 77 participants.

 

You notice that most of the participants with missing data left a specific question about their opinions unanswered. Many of those participants were also women, so your sample now mainly consists of men.
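
As a rough illustration, listwise deletion keeps only the cases that are complete on every variable. The rows below are invented and much smaller than the survey in the example. Comparing the makeup of the sample before and after deletion is a quick check for the kind of shift described above.

```python
# Hypothetical survey rows; None marks a missing answer.
rows = [
    {"id": 1, "gender": "F", "age": 34, "opinion": 4},
    {"id": 2, "gender": "F", "age": 29, "opinion": None},
    {"id": 3, "gender": "M", "age": None, "opinion": 2},
    {"id": 4, "gender": "M", "age": 41, "opinion": 5},
    {"id": 5, "gender": "F", "age": 52, "opinion": None},
]

def listwise_delete(rows):
    """Keep only cases with no missing value on any variable."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete = listwise_delete(rows)
print(len(rows), "->", len(complete), "participants")
print("before:", [row["gender"] for row in rows])
print("after: ", [row["gender"] for row in complete])
```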

Pairwise deletion

Pairwise deletion lets you keep more of your data by excluding only the missing data points from each analysis. All other available data from those cases are still used.

It also means that you have an uneven sample size for each of your variables. But it’s helpful when you have a small sample or a large proportion of missing values for some variables.

When you perform analyses with multiple variables, such as a correlation, only cases (participants) with complete data for every variable in that analysis are included.

Example: Pairwise deletion

You decide to only remove missing values, while retaining the other data points for these participants. This does not reduce your overall sample size.

 

  • 12 people didn’t answer a question about their gender, reducing the sample size from 114 to 102 participants for the variable “gender.”
  • 3 people didn’t answer a question about their age, reducing the sample size from 114 to 111 participants for the variable “age.”

You are able to retain more values this way, but the sample size now differs across variables.
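
The per-variable sample sizes can be worked out as in the sketch below. It builds a stand-in dataset with the same counts as the example (114 participants, 12 without gender, 3 without age). Which rows are blank, and the two-row overlap between them, are assumptions made for illustration.

```python
# Hypothetical dataset: 114 participants with placeholder values.
rows = [{"gender": "F" if i % 2 else "M", "age": 20 + i % 50} for i in range(114)]
for i in range(12):          # 12 participants skipped the gender question
    rows[i]["gender"] = None
for i in range(10, 13):      # 3 participants skipped the age question
    rows[i]["age"] = None

def n_available(rows, *variables):
    """Count cases with observed values on every listed variable."""
    return sum(all(row[v] is not None for v in variables) for row in rows)

n_gender = n_available(rows, "gender")
n_age = n_available(rows, "age")
n_both = n_available(rows, "gender", "age")
print(n_gender, n_age, n_both)
```

An analysis of gender alone uses 102 cases and an analysis of age alone uses 111 (114 minus 3). An analysis that needs both variables uses only the cases observed on both, which is 101 under the overlap assumed here.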

Imputation

Imputation means replacing a missing value with another value based on a reasonable estimate. You use other data to recreate the missing value for a more complete dataset.

You can choose from several imputation methods.

The easiest method of imputation involves replacing missing values with the mean or median value for that variable.
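
A minimal sketch of mean and median imputation, using invented spending values with `None` for the blanks. The function name and the `strategy` argument are illustrative, not part of any library.

```python
import statistics

def impute(values, strategy="mean"):
    """Replace None with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

spending = [100.0, 200.0, None, 300.0, 1400.0, None]
mean_filled = impute(spending, "mean")
median_filled = impute(spending, "median")
print(mean_filled)
print(median_filled)
```

With one large value in the data, the mean (500) and the median (250) give quite different fill values, so the choice between them matters when the variable is skewed.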

 

Hot-deck imputation

In hot-deck imputation, you replace each missing value with an existing value from a similar case or participant within your dataset. For each case with missing values, the missing value is replaced by a value from a so-called “donor” that’s similar to that case based on data for other variables.

Example: Hot-deck imputation

In a survey, you ask participants to answer questions about how they rate a new shopping app from 1 to 5. You notice that two participants skipped Question 3, so these cells are empty.

 

You sort the data based on other variables and search for participants who responded similarly to other questions compared to your participants with missing values.

You take the answer to Question 3 from a donor and use it to fill in the blank cell for each missing value.
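
The donor search can be sketched as a nearest-neighbor lookup. The version below is one simple variant, not the only way to do hot-deck imputation. It assumes numeric ratings, measures similarity as the total absolute difference on the other questions, and assumes those other questions have no missing values. The data and field names are made up.

```python
def hot_deck(rows, target, match_on):
    """Fill missing `target` values from the most similar complete case."""
    donors = [r for r in rows if r[target] is not None]
    filled = []
    for row in rows:
        if row[target] is not None:
            filled.append(dict(row))
            continue
        # The closest donor has the smallest total difference on the other answers.
        donor = min(donors, key=lambda d: sum(abs(d[q] - row[q]) for q in match_on))
        filled.append({**row, target: donor[target]})
    return filled

# Hypothetical app ratings from 1 to 5; None marks a skipped Question 3.
ratings = [
    {"id": 1, "q1": 5, "q2": 4, "q3": 5},
    {"id": 2, "q1": 2, "q2": 1, "q3": 2},
    {"id": 3, "q1": 5, "q2": 5, "q3": None},
    {"id": 4, "q1": 1, "q2": 2, "q3": None},
    {"id": 5, "q1": 3, "q2": 3, "q3": 3},
]
completed = hot_deck(ratings, target="q3", match_on=["q1", "q2"])
for row in completed:
    print(row)
```

Here participant 3 takes the Question 3 answer from participant 1, and participant 4 takes it from participant 2. The original rows are left unchanged, so you can still tell which values were imputed.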

Cold-deck imputation

Alternatively, in cold-deck imputation, you replace missing values with existing values from similar cases from other datasets. The new values come from an unrelated sample.

Example: Cold-deck imputation

Instead of replacing the missing values with answers from participants from the same sample, you open a different dataset from a coworker. They conducted a similar survey but used a different sample.

 

You search for participants who responded similarly to other questions compared to your participants with missing values.

You take the answer to Question 3 from the other dataset and use it to fill in the blank cell for each missing value.

 

Use imputation carefully

Imputation is a complicated task because you have to weigh the pros and cons.

Although you retain all of your data, this method can create research bias and lead to inaccurate results. You can never know for sure whether the replaced value accurately reflects what would have been observed or answered. That’s why it’s best to apply imputation with caution.

 

 

Frequently asked questions about missing data

What are missing data?

Missing data, or missing values, occur when you don’t have data stored for certain variables or participants.

In any dataset, there are usually some missing data. In quantitative research, missing values appear as blank cells in your spreadsheet.

Why are missing data important?

Missing data are important because, depending on the type, they can sometimes bias your results. This means your results may not be generalizable outside of your study because your data come from an unrepresentative sample.

How do I deal with missing data?

To tidy up your missing data, your options usually include accepting, removing, or recreating the missing data.

  • Acceptance: You leave your data as is
  • Listwise or pairwise deletion: You remove either all cases (participants) with any missing data (listwise) or only the missing data points (pairwise) from analyses
  • Imputation: You use other data to fill in the missing data

What are the types of missing data?

There are three main types of missing data.

Missing completely at random (MCAR) data are randomly distributed across the variable and unrelated to other variables.

Missing at random (MAR) data are not randomly distributed but they are accounted for by other observed variables.

Missing not at random (MNAR) data systematically differ from the observed values.

 

 

 

Bhandari, P. (2023, June 21). Missing Data | Types, Explanation, & Imputation. Scribbr. Retrieved July 22, 2024, from https://www.scribbr.com/statistics/missing-data/
