Does Hive have a String split function?
Hadoop Problem Overview
I am looking for a built-in string split function in Hive. For example, if the string is:
A|B|C|D|E
Then I want to have a function like:
array<string> split(string input, char delimiter)
So that I get back:
[A,B,C,D,E]
Does such a built-in split function exist in Hive?
I can only see regexp_extract and regexp_replace. I would love to see indexOf() and split() string functions.
Hadoop Solutions
Solution 1 - Hadoop
Hive does have a split function, and it is based on regular expressions. It is not listed in the tutorial, but it is listed in the language manual on the wiki:
split(string str, string pat)
Split str around pat (pat is a regular expression)
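As a quick sketch of that signature, here is a split on a delimiter that has no special meaning in a regular expression (a comma). The literal input is my own, and the SELECT without a FROM clause is an assumption about your Hive version; older versions may need a FROM against some table.

```sql
-- Split on a plain comma; no escaping is needed because ',' is not a regex metacharacter.
SELECT split('A,B,C,D,E', ',');      -- ["A","B","C","D","E"]

-- The result is an array<string>, so elements can be picked by zero-based index.
SELECT split('A,B,C,D,E', ',')[1];   -- B
```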
In your case, the delimiter "|" has a special meaning as a regular expression, so it should be referred to as "\\|".
Solution 2 - Hadoop
Another interesting use case for split in Hive is when, for example, a column ipname in the table has a value "abc11.def.ghft.com" and you want to pull "abc11" out:
SELECT split(ipname,'[\.]')[0] FROM tablename;
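The dot is wrapped in brackets because an unescaped '.' in a regular expression matches any character, so it cannot serve as the delimiter as-is. A small sketch with the value inlined as a literal instead of the ipname column (again assuming a Hive version that allows SELECT without FROM); inside brackets, '[.]' and '[\.]' should behave the same:

```sql
-- First dot-separated component of the host name.
SELECT split('abc11.def.ghft.com', '[.]')[0];      -- abc11

-- Number of components after splitting on literal dots.
SELECT size(split('abc11.def.ghft.com', '[.]'));   -- 4
```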
Solution 3 - Hadoop
Just a clarification on the answer given by Bkkbrad (the "\\|" suggestion).
I tried it and it did not work for me.
For example,
split('aa|bb','\\|')
produced:
["","a","a","|","b","b",""]
But,
split('aa|bb','[|]')
produced the desired result:
["aa","bb"]
Including the metacharacter '|' inside the square brackets causes it to be interpreted literally, as intended, rather than as a metacharacter.
For more on how character classes behave in regular expressions, see: http://www.regular-expressions.info/charclass.html
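The same bracket approach should carry over to other common delimiters that are regex metacharacters, including the string from the original question. This is only a sketch with inlined literals of my own, assuming SELECT without FROM is available:

```sql
-- Each metacharacter is taken literally inside a character class.
SELECT split('aa|bb', '[|]');       -- ["aa","bb"]
SELECT split('aa+bb', '[+]');       -- ["aa","bb"]
SELECT split('aa*bb', '[*]');       -- ["aa","bb"]

-- The string from the question.
SELECT split('A|B|C|D|E', '[|]');   -- ["A","B","C","D","E"]
```

Note that a few characters (for example ^, ] and \) can still be special inside brackets, so this is not a universal escape.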