You have an excel document with a column filled with addresses in it but you are only interested in the postcodes and are not interested in the rest of the address. You could extract the postcodes with some clever use of the built in functions. For example you could use the cell formula:
=MID(A1,FIND(“zzz”,SUBSTITUTE(A1,” “,”zzz”,SUMPRODUCT(1*((MID(A1,ROW(INDIRECT(“1:”&LEN(A1))),1))=” “))-1))+1,LEN(A1))
While this works perfectly well, it is not easily readable and it has limited reuse value….
A better way to solve this problem is to use regular expressions. Regular expressions are implemented in most programming languages and enable you to define a pattern of text. You can use the patterns to search for matches, carry out replacements or extract the matches from the source text.
Regular expressions are included in VBA but are not switched on by default. Follow the steps below to create an example of using regular expressions.
- Create a blank workbook in Excel.
- Open up the VBA editor by clicking ALT + F11 in excel.
- Select Tools -> References from the menu
- Check Microsoft VBScript Regular Expressions 5.5
- Add a module to your VBA project
- Insert some sample functions that utilise regular expressions from GitHub into the module.
You can now use the functions GetFirstPostcode, GetFirstEmailAddress, GetAllPostcodes and GetAllEmailAddresses within cell formulas to extract structured text from source text. These are demonstrated in the sample spreadsheet.
Within the code the two most important functions are:
- GetFirstMatch – uses the regular expression object to get the value of the first match in the source text which it returns to the caller.
- GetAllMatches – uses the regular expression object to get all the matches and concatenates them into a new line delimited string. This can then be directly returned to the caller.
These two functions are called by the functions GetFirstPostcode, GetFirstEmailAddress, GetAllPostcodes and GetAllEmailAddresses which define the regular expressions to use in the searches. There is much documentation already available on regular expressions so anyone not familiar with them should start at www.regular-expressions.info These functions are called themselves within the cell formulas on Sheet1.
Using regular expressions in Excel requires some work but it does make your formulas much more readable. Also, once you have mastered the technique you can use it for any text extraction and you can benefit from the many libraries of regular expressions that are available online.
Some key points to consider when deciding whether to use them:
- time to learn
- they are more suitable to complicated text extraction – there are many excel string functions such as mid which may be sufficient for very basic patterns
- any excel spreadsheet that uses these functions will need to have macros enabled