How to Clean Data in Excel: A Beginner's Guide

Learn essential data cleaning techniques in Excel, including removing duplicates, handling missing values, and maintaining consistent formatting.
Updated Jul 2024  · 15 min read

In this tutorial, you’ll learn how to clean your data in Excel and prepare it for analysis. We'll cover essential techniques such as removing duplicates, handling missing values, and standardizing formatting. By the end, you'll be equipped with practical skills to ensure your dataset is accurate and ready for deeper analysis.

For a more in-depth exploration, consider taking the comprehensive Data Preparation in Excel companion course, which dives into advanced techniques and best practices for optimizing your data cleaning workflow.

What Are the Components of Clean Data?

Ensuring high data quality involves several key components, including accuracy, completeness, consistency, uniformity, and validity. These components are essential for reliable analysis and decision-making. Let's take a look at each.

  • Accuracy: Accuracy means your data correctly reflects the real-world values it represents. Accurate data has numerically correct values, typo-free text, and precise dates.
  • Completeness: Complete data contains all the information the analysis needs. Missing key details can skew results, so fill in the gaps or account for them in your analysis. Common ways to handle missing data are to impute values based on other observations, replace missing values with placeholders, or remove incomplete records.
  • Consistency: Consistency means the same piece of data matches across different datasets and time periods. Clean data uses the same formats and units of measurement throughout.
  • Uniformity: Uniformity, or standardization, means that all values in a field follow one coherent format and structure, with a single data type per column. This includes using the same units of measure, the same date format, and the same category labels.
  • Validity: Validity means the data values fall within the predetermined acceptable range and conform to the expected patterns. For example, if a valid age entry should be between 0 and 120 years, validation rules and checks will be applied to ensure that the data meets this criterion. This prevents outliers and erroneous entries from skewing the results.
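
To make the validity check concrete, here is a minimal Python sketch of the age rule described above. The function name `find_invalid_ages` and the sample values are hypothetical, and the choice to treat missing or non-numeric entries as invalid is an assumption, not something Excel does on its own.

```python
def find_invalid_ages(ages, low=0, high=120):
    """Return (row_index, value) pairs for ages outside [low, high]."""
    invalid = []
    for i, value in enumerate(ages):
        # Assumption: missing or non-numeric entries also count as invalid
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not (low <= value <= high):
            invalid.append((i, value))
    return invalid


sample_ages = [34, -2, 120, 135, None, 57]
print(find_invalid_ages(sample_ages))
```

The bounds are inclusive, so 0 and 120 both pass; only the rows that break the rule are reported, which leaves the decision to fix or remove them to you.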

How to Clean Data in Excel

Cleaning data in Excel means refining raw data so it is ready for analysis. Unlike data validation, which is a specific feature on Excel’s Data tab, data cleaning is a general term that covers a broader range of tools and techniques. The following sections cover each of these areas:

  • Basic Cleaning: Basic cleaning involves addressing common issues like extra spaces, blank cells, and spelling errors to ensure a clean and consistent dataset.
  • Error Handling and Validation: Error handling and validation focus on identifying and correcting errors and ensuring that your data maintains its integrity by removing duplicates.
  • Text Operations: Text operations involve manipulating and formatting text data to ensure it meets your requirements, including concatenation and case changes.
  • Data Transformation: Data transformation techniques are used to reorganize and reshape your data for better analysis, including parsing text and using tools like Flash Fill.
  • Numerical and Date Fixes: Numerical and date fixes involve correcting and standardizing numerical data and dates to ensure they are accurate and formatted consistently.
  • Advanced Data Management: Advanced data management includes reconciling and combining datasets to create a comprehensive and cohesive dataset for analysis.

Basic Data Cleaning in Excel

Let's take a look at basic cleaning, which involves addressing common issues like removing extra spaces.

Get rid of extra spaces

Extra spaces, whether leading, trailing, or repeated between words, are problematic because values that look identical will not match, which leads to inconsistencies in data analysis and formatting issues.

There are two main ways to get rid of extra spaces in Excel. Let’s review both.

Get rid of extra spaces using Find and Replace

Here are the steps to get rid of extra spaces.

  • Select the range of cells where you want to remove extra spaces.

  • Use the Ctrl + H shortcut to open the Find and Replace dialog box.

  • In the Find what field, press the spacebar twice to enter two spaces.

  • In the Replace with field, press the spacebar once to enter a single space.

  • Click on Replace All.

  • Repeat the process until no more double spaces are found.

The Find and Replace dialog box. Source: Image by Author

Get rid of extra spaces using TRIM

  • Create a new column for the cleaned data.

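  • Enter the formula =TRIM(cell_with_extra_spaces), for example =TRIM(A2), in the first cell of the new column. TRIM removes leading and trailing spaces and reduces repeated spaces between words to a single space.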

  • Double-click the bottom right corner of the cell to apply the formula to the rest of the rows.

  • Copy the cleaned data and paste it as values to remove the formulas.
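
The two approaches above behave slightly differently, which a short Python sketch can illustrate. The function names are hypothetical, and `excel_like_trim` only approximates TRIM for ordinary space characters. Note that replacing double spaces with single ones can leave one leading or trailing space behind, while the TRIM-style version removes it.

```python
import re


def collapse_double_spaces(text):
    """Mimic repeated Find and Replace: swap two spaces for one until none remain."""
    while "  " in text:
        text = text.replace("  ", " ")
    return text


def excel_like_trim(text):
    """Approximate TRIM: collapse inner runs of spaces, drop leading/trailing ones."""
    return re.sub(r" {2,}", " ", text).strip(" ")


raw = "  Acme    Corp  "
print(repr(collapse_double_spaces(raw)))  # a single space remains at each end
print(repr(excel_like_trim(raw)))         # no spaces remain at either end
```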

Select and treat all blank cells

Blank cells can cause problems by breaking formulas, leading to errors in calculations and producing inaccurate analysis results. Here’s how to handle this issue.

  • Highlight the range of cells where you want to find and treat blank cells.

  • Press Ctrl + G to open the Go To dialog box.

  • Click on the “Special…” button. This opens the Go To Special dialog box.

  • Select the Blanks option and click OK. Now, you will see that all the blank cells in the chosen range are selected.

  • Type a value or formula, then press Ctrl + Enter to fill all the selected blank cells at once.

Select and treat blank cells. Source: Image by Author
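
As a rough illustration of the same idea outside Excel, the sketch below fills blanks in a column with a placeholder. The function name `fill_blanks` and the "N/A" placeholder are hypothetical choices, and the sketch assumes a blank is either `None` or an empty string.

```python
def fill_blanks(column, placeholder="N/A"):
    """Replace blank cells (None or empty string) with a placeholder value."""
    return [placeholder if cell is None or cell == "" else cell for cell in column]


regions = ["North", "", None, "South", 0]
print(fill_blanks(regions))
```

A zero is kept as-is because it is a real value rather than a blank, which mirrors how Go To Special treats cells that contain 0.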

Spell checking

Misspelled words can make data look unprofessional and difficult to understand, so it's important to correct them.

  • Select the range you want to check, or select a single cell to check the entire worksheet.

  • Go to the Review tab on the Ribbon.

  • Click the Spelling button in the Proofing group.

  • The Spelling dialog box will open, showing the first detected misspelled word and suggestions for correction.

  • Review and correct the misspelled words as needed.

Spell checking. Source: Image by Author

Error Handling and Validation in Excel

Let's take a look at error handling, which involves addressing common issues like highlighting errors or removing duplicates.

Highlight errors

Errors in your data can lead to incorrect results and poor decision-making, so it's crucial to identify and address them. Let’s see how to tackle this problem.

  • Select the range of cells you want to check for errors.

  • Go to the Home tab on the Ribbon.

  • In the Styles group, click on Conditional Formatting.

  • Select New Rule from the dropdown menu.

  • Choose Use a formula to determine which cells to format.

  • Enter the formula =ISERROR(cell) in the Format values where this formula is true field, where cell is the top-left cell of your selected range (for example, =ISERROR(A1)).

  • Click the Format… button to choose your desired formatting options.

  • Click OK after selecting your formatting options.

  • Click OK again in the New Formatting Rule dialog box to apply the rule.

Using conditional formatting. Source: Image by Author
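
If the data has been exported from Excel as text (for example, to a CSV file), error values typically appear as literal strings such as #N/A. The sketch below, under that assumption, locates them in the same spirit as the ISERROR rule. The function name and sample rows are hypothetical.

```python
# The standard Excel error values that ISERROR reports as errors
EXCEL_ERRORS = {"#N/A", "#VALUE!", "#REF!", "#DIV/0!", "#NUM!", "#NAME?", "#NULL!"}


def find_error_cells(rows):
    """Return (row, column) positions whose value is an Excel error string."""
    return [
        (r, c)
        for r, row in enumerate(rows)
        for c, cell in enumerate(row)
        if cell in EXCEL_ERRORS
    ]


rows = [["Widget", 10, "#DIV/0!"], ["Gadget", "#N/A", 4.5]]
print(find_error_cells(rows))
```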

Remove duplicates

Duplicate entries can distort analysis and insights, so removing them ensures data accuracy.

  • Select the worksheet or range of cells from which you want to remove duplicates.

  • Go to the Data tab on the Ribbon.

  • Click on Remove Duplicates in the Data Tools group.

  • In the Remove Duplicates dialog box, select the column or columns to check for duplicates.

  • Click OK. Excel will display a message showing how many duplicate values were removed and how many unique values remain.

Remove duplicate rows

Duplicate rows can clutter your data and affect the reliability of your analysis. Here’s how to manage this.

  • Select the range from where you want to remove the duplicate rows.

  • Go to the Data tab on the Ribbon and click on Advanced in the Sort & Filter group.

  • In the Advanced Filter dialog box, select Copy to another location.

  • Set the List range to your selected range.

  • Set the Copy to field to the cell where you want the unique rows to be copied to.

  • Check the Unique records only box and then click OK.
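
The logic behind both duplicate-removal methods can be sketched in a few lines of Python. The function name `remove_duplicates` and the `key_columns` parameter are hypothetical. The sketch keeps the first occurrence of each row and compares values exactly, including letter case, which may differ from how Excel compares text.

```python
def remove_duplicates(rows, key_columns=None):
    """Keep the first occurrence of each row.

    If key_columns is given, only those column positions are compared;
    otherwise the whole row must match to count as a duplicate.
    """
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row) if key_columns is None else tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [("Ann", "NY"), ("Bob", "LA"), ("Ann", "NY"), ("Ann", "SF")]
print(remove_duplicates(rows))                   # whole-row comparison
print(remove_duplicates(rows, key_columns=[0]))  # compare the first column only
```

Comparing on a single column removes more rows than comparing whole rows, so it is worth deciding which columns define a duplicate before running either Excel tool.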

Text Operations in Excel

Let's take a look at text operations like concatenating strings.

Concatenate

Combining multiple text strings into one cell can help create more meaningful and organized data by merging information from different sources into a single, cohesive format.

  • Select the cell where you want the concatenated result to appear.
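  • Type =CONCATENATE( and list the cells or text you want to join, separated by commas, then close the parenthesis. For example, =CONCATENATE(A2, " ", B2).
  • Alternatively, skip the function and join the values with the & operator, as in =A2 & " " & B2.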
  • Press Enter to see the result.

Concatenating cells in Excel. Source: Image by Author

Changing the case of text

Uniform text casing can improve readability and maintain consistency in your data. Let’s see how to adjust this.

  • Create a new column for the converted text.

  • Enter one of these formulas in a cell: =UPPER(A2), =LOWER(A2), or =PROPER(A2), where A2 stands for the cell that holds the original text.

  • Press Enter to apply the formula.
  • Drag the fill handle to apply the formula to other cells if needed.

Changing the case of text in Excel. Source: Image by Author
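
For comparison, here is a small Python sketch of concatenation and case changes. The sample names are made up, and `str.title()` is only a close analogue of PROPER, so results may differ for some inputs.

```python
first, last = "ada", "LOVELACE"

# Joining with a space, similar in spirit to =A2 & " " & B2
full_name = first + " " + last

print(full_name.upper())  # analogue of UPPER
print(full_name.lower())  # analogue of LOWER
print(full_name.title())  # rough analogue of PROPER
```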

Removing nonprinting characters from text

Nonprinting characters, such as tabs and line breaks, can disrupt data processing and make it difficult to analyze or visualize data correctly.

  • Identify the nonprinting characters you need to remove.

  • Select the cell containing the text with nonprinting characters.

  • In a new cell, use the following function: =CLEAN(text). CLEAN removes the first 32 nonprinting characters in the ASCII set (codes 0 to 31).

  • To remove extra spaces and nonprinting characters, use: =TRIM(CLEAN(text)).

  • Copy and paste the cleaned text to replace the original.

Using the clean function in Excel. Source: Image by Author
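
The combination =TRIM(CLEAN(text)) can be approximated in Python as follows. The function names are hypothetical, and the sketch assumes CLEAN's documented behavior of removing characters with codes 0 to 31. Other invisible characters, such as the nonbreaking space (code 160), are left untouched.

```python
def excel_like_clean(text):
    """Drop control characters with codes 0-31 (tabs, line breaks, and so on)."""
    return "".join(ch for ch in text if ord(ch) > 31)


def clean_and_trim(text):
    """Approximate =TRIM(CLEAN(text)): clean first, then tidy up the spaces."""
    cleaned = excel_like_clean(text)
    return " ".join(part for part in cleaned.split(" ") if part)


raw = "  Total\tSales\n  2024 "
print(repr(clean_and_trim(raw)))
```

Because the tab is deleted rather than replaced with a space, the words on either side of it are joined together, so check the result when control characters were acting as separators.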

Data Transformation in Excel

Data transformation techniques are used to reorganize and reshape your data for better analysis, including parsing text and using tools like Flash Fill.

Data parsing from text to column

Separating text data into individual columns makes it easier to analyze specific components and ensures that each piece of information is distinctly categorized. This is the opposite of concatenation and is useful for breaking down complex data into manageable parts. Here’s how to go about it.

  • Select the data you want to split.

  • Go to the Data tab and click Text to Columns.

  • Choose Delimited or Fixed Width and click Next.

  • For Delimited, check the delimiters your data uses and click Next.

  • For Fixed Width, set column breaks in the Data Preview window and click Next.

  • Choose the data format for each column.

  • Click Finish.

Using text to columns for data parsing. Source: Image by Author
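
The two Text to Columns modes map onto two simple operations, sketched here in Python. The function names and sample strings are hypothetical, and stripping the spaces around each piece is a convenience added in this sketch rather than something the wizard is guaranteed to do.

```python
def split_delimited(value, delimiter=","):
    """Delimited mode: split wherever the delimiter appears."""
    return [part.strip() for part in value.split(delimiter)]


def split_fixed_width(value, breaks):
    """Fixed Width mode: split at the given character positions.

    breaks=[3, 8] yields the slices [0:3], [3:8], and [8:].
    """
    edges = [0, *breaks, len(value)]
    return [value[start:end].strip() for start, end in zip(edges, edges[1:])]


print(split_delimited("Lovelace, Ada, London"))
print(split_fixed_width("NY 10001Albany", [3, 8]))
```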

Flash fill

Flash Fill automatically fills in values based on patterns it detects in your data. Here’s how you can use Flash Fill in Excel:

  • Enter data following a pattern in a cell next to your existing data.

  • Provide another example in the next cell to help Excel detect the pattern.

  • Select the cell with the example.

  • Go to the Data tab on the Ribbon.

  • Click on Flash Fill in the Data Tools group.

  • Excel will automatically fill the remaining cells based on the detected pattern.

Merging and splitting columns

Merging and splitting columns helps organize data in a way that best suits your analysis needs. Follow these steps to merge cells:

  • Select the cells you want to merge. Keep in mind that merging keeps only the upper-left value and discards the rest, so use concatenation if you need to combine the contents.
  • Go to the Home tab, and click on the Merge & Center drop-down menu in the Alignment group.
  • Select your preferred merge option.

Follow these steps to split columns:

  • Select the column you want to split.
  • Go to the Data tab and click Text to Columns in the Data Tools group.
  • Choose Delimited or Fixed Width based on how your data is structured.
  • Select the split data's destination and click Finish.

Transforming and re-arranging columns and rows

Rearranging data can help present it in a more logical and accessible format. To transpose data, turning rows into columns and columns into rows:

  • Highlight the data you want to transform (including headers if necessary).

  • Right-click the selection and choose Copy or press Ctrl+C.

  • Select the cell where the transformed data begins.

  • Right-click the destination cell, choose Paste Special and then select Transpose.

  • You can also use Ctrl+Alt+V to open the Paste Special dialog box, then check the Transpose option and click OK.
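
Transposing is easy to picture with a small Python sketch. The function name `transpose` and the sample table are hypothetical, and the sketch assumes every row has the same number of cells (shorter rows would cause `zip` to drop the extra cells).

```python
def transpose(rows):
    """Turn rows into columns and columns into rows."""
    return [list(column) for column in zip(*rows)]


table = [
    ["Month", "Jan", "Feb"],
    ["Sales", 100, 120],
]
for row in transpose(table):
    print(row)
```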

To rearrange columns/rows:

  • Click the column header to select the entire column you want to move.

  • Right-click and choose Cut or press Ctrl+X.

  • Select the column where you want to move the cut column, right-click the column header, and choose Insert Cut Cells.

  • Similarly, to move a row, select the row header, cut the row, then right-click the row header where you want it to go and choose Insert Cut Cells.

Numerical and Date Fixes in Excel

Numerical and date fixes involve correcting and standardizing numerical data and dates to ensure they are accurate and formatted consistently.

Fixing numbers and number signs

Incorrect number formats can cause problems by leading to misinterpretations and calculation errors, as well as data sorting and comparison issues.

  • Select the cells containing the numbers you need to fix.
  • Go to the Home tab, click the Number drop-down menu in the Number group, and select the appropriate number format (e.g., General, Number, Currency).
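
When numbers arrive as text with unusual sign conventions, the conversion logic might look like the Python sketch below. The function name `to_number` is hypothetical, and the sketch assumes commas are thousands separators, a period is the decimal mark, and negatives may be written with a trailing minus sign or in parentheses.

```python
def to_number(text):
    """Convert text such as '1,234.50', '500-', or '(75)' to a float."""
    s = text.strip().replace(",", "")  # assumption: commas are thousands separators
    negative = False
    if s.endswith("-"):
        # Trailing minus sign, e.g. '500-'
        negative, s = True, s[:-1]
    elif s.startswith("(") and s.endswith(")"):
        # Accounting-style negative, e.g. '(75)'
        negative, s = True, s[1:-1]
    value = float(s)  # raises ValueError if the text is not numeric
    return -value if negative else value


for raw in ["1,234.50", "500-", "(75)", "-12"]:
    print(raw, "->", to_number(raw))
```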

Fixing dates and time

Properly formatted dates are crucial for accurate time-based analysis and reporting.

  • Highlight the cells containing the dates.

  • Go to the Home tab.

  • Click the Number Format drop-down menu and select Short Date or Long Date.
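
Standardizing dates that were entered in mixed formats can be sketched in Python as follows. The list of formats and the function name `to_iso_date` are hypothetical, and the sketch assumes slash-separated dates are day-first and month names are in English. Adjust both assumptions to match your data.

```python
from datetime import datetime

# Assumption: these are the only formats present in the data
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def to_iso_date(text, formats=KNOWN_FORMATS):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    return None


for raw in ["2024-07-15", "15/07/2024", "July 15, 2024", "not a date"]:
    print(raw, "->", to_iso_date(raw))
```

Returning None for unrecognized entries, rather than guessing, makes it easy to list the values that still need manual attention.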

Advanced Data Management in Excel

Advanced data management includes reconciling and combining datasets to create a comprehensive and cohesive dataset for analysis.

Reconciling table data by joining or matching

Joining or matching data from different tables ensures comprehensive and cohesive analysis.

Using VLOOKUP to match data:

  • Ensure both tables are accessible and share a common key column. With VLOOKUP, the key must be the leftmost column of the lookup table.

  • Choose the cell where you want to display the matched data.

  • Use the following function: =VLOOKUP(lookup_value, table_array, col_index_num, FALSE). The final argument, FALSE, requests an exact match.

  • Drag the fill handle to copy the formula to other cells as needed.

Using INDEX and MATCH for more flexibility:

  • Choose the cell where you want to display the matched data.

  • Use the combination of these functions: =INDEX(array, MATCH(lookup_value, lookup_array, 0))

  • Drag the fill handle to copy the formula to other cells as needed.
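
Both lookup approaches perform an exact-match search and return a value from the matching row. A minimal Python sketch of that idea is shown below. The function name `exact_lookup`, its parameters, and the sample price table are hypothetical, and returning the string "#N/A" for a missing key is a choice made here to echo what Excel displays.

```python
def exact_lookup(lookup_value, table, key_col, return_col, default="#N/A"):
    """Return the return_col value from the first row whose key_col matches."""
    for row in table:
        if row[key_col] == lookup_value:
            return row[return_col]
    return default


prices = [
    ("P-01", "Widget", 2.5),
    ("P-02", "Gadget", 4.0),
]
print(exact_lookup("P-02", prices, key_col=0, return_col=2))
print(exact_lookup("P-99", prices, key_col=0, return_col=2))
# Like INDEX and MATCH, the key column does not have to be the first one
print(exact_lookup("Widget", prices, key_col=1, return_col=0))
```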

Final Thoughts

There are multiple data cleaning functions in Excel that allow you to clean and validate data to meet the set standards. These can help you reduce errors and improve the quality of your datasets. For a deeper dive into mastering Excel's capabilities, consider enrolling in our Introduction to Excel course.

Beyond this, whether you are scrubbing the data to remove duplicate fields or standardizing the formats of the data entries, Excel has all the tools you need to make this process easier. If you're looking to enhance your data preparation skills, our Data Preparation in Excel course offers comprehensive coverage on this topic.

To further enhance your analytical skills, you might find our Data Analysis in Excel course particularly useful. This course dives into the intricacies of analyzing data post-cleaning. Additionally, if you are interested in how clean data supports financial forecasting, our Financial Modeling in Excel course could be of great interest.

If you're looking to expand your data cleaning skills, consider Power Query, which is a flexible and effective tool built into Excel and Power BI that allows you to import and transform data seamlessly. Consider also exploring Data Cleaning in Python or Cleaning Data in R. These courses offer robust techniques and best practices for data cleaning using popular programming languages.


Author
Laiba Siddiqui

I'm a content strategist who loves simplifying complex topics. I’ve helped companies like Splunk, Hackernoon, and Tiiny Host create engaging and informative content for their audiences.

Frequently Asked Questions

What are the five steps of the data cleaning process?

The five steps are removing duplicates, handling missing data, correcting structural errors, filtering outliers, and validating data.

What is the difference between data cleaning and data transformation?

Cleaning data involves identifying and correcting errors and inaccuracies within the dataset to ensure reliability. In data transformation, you convert data from one format or structure to another to make it suitable for analysis. So data cleaning ensures that datasets are accurate, whereas transformation converts cleaned data into the necessary format for analysis or reporting.
