BIBB
M
•

Power Query Text Cleaning Function

Oscar MartĂ­nez
Oscar MartĂ­nez Data, BI and AI | Operational Lead | Power BI & Azure Expert | Governance-Driven Strategy | Product Owner

Exploring Power Query Text Cleaning Function: Unlocking Efficient Text Manipulation in Power Query

As data analysts, we often spend numerous hours manually cleaning up messy text data. Throughout my career, I’ve refined a specific Power Query text cleaning function that streamlines the cleaning of text columns in your datasets. This function offers a robust solution for conducting multiple text cleaning operations with ease, reducing the necessity for manual character replacement, unwanted element removal, or individual trimming.

Introduction to the Text.Clean Function

The Text.CleanString function is designed to streamline text manipulation, allowing users to replace characters, swap words, eliminate specific characters, and even replace entire cell contents based on user-defined parameters. This function is versatile and can be adapted to various text cleaning needs in Power Query.

Power Query Text Cleaning Function UI

Power Query Text Cleaning Function UI

How the Function Works

The function operates through several steps, each tailored to handle different aspects of text cleaning:

  1. Character Replacement: The function allows for the replacement of individual characters based on a user-defined list. This is particularly useful for handling special characters, accents, or symbols that need standardization.

  2. Word Replacement: It also supports the replacement of whole words, which is particularly useful for correcting common typographical errors or standardizing terminology across your dataset.

  3. Character Removal: Unwanted characters can be removed entirely from the text. This includes punctuation, special symbols, or any characters that don’t belong in your cleaned dataset.

  4. Word Removal: Words that are not needed are removed from the text. This can help eliminate filler words, unnecessary terms, or specific words that need to be filtered out.

  5. Full Cell Content Replacement: For more comprehensive modifications, the function can replace the entire contents of a cell with new text. This is useful for standardizing entries that have multiple variations but should all represent the same value.

Text.CleanString Function Code

View Complete M Function Code

Using the Text.Clean Function

To utilize the Text.CleanString function, follow these instructions:

1. Copy the Function

Copy and rename the provided function code into a new query in Power Query. This code defines the function and its parameters, which include the text to be cleaned and optional parameters for characters to replace, words to replace, characters to remove, words to remove, and full-cell content replacements.

2. Apply the Function

Apply the function to your dataset’s text or column(s) that require cleaning. The function will process each row in the column according to the specified parameters, ensuring your data is cleaned consistently and accurately.

Example

After copying the code and renaming it “CleanString,” we added a new custom function column and filled in the parameters:

  • Replace characters: “ü” for “u”
  • Replace words: “Zuerich” for “Zurich” and “Uni” for “University”
  • Remove characters: ”,”
  • Remove Words: null
  • Replace full cell contents: “UZH” for “University of Zurich”

Example Usage of the Text Cleaning Function

Example Usage of the Text Cleaning Function

Conclusion

The Text.CleanString function is a powerful tool for any data professional looking to improve the quality of text data. With its flexible parameter settings and comprehensive cleaning capabilities, it significantly reduces the effort required to clean data. Try it out in your next data project and experience the efficiency it brings to your workflow!