That said, the ARRAY method is massively faster than splitting to a table. Keep in mind the tried and true method of string replacement outside of needing to anchor the string is still the fastest and most efficient native method however clunky it may be, length(str) - replace(str, replacement) Next I found the following as being the fastest native method, length(str) - regexp_replace(str, replacement, g) ![]() Under these constraints, I found that the procedural method with plperl was the fastest. This method turned out to the be the fastest overall, CREATE LANGUAGE plperlĬREATE FUNCTION count_occurances(inputStr text, regex text)įollowing the same format of data, performance implications can be obtained with CREATE TABLE fooĪRRAY(SELECT trunc(random()*100+1)::int % 100 FROM generate_series(1,5000) AS t(x)), SELECT count(*)ĬROSS JOIN LATERAL regexp_matches(data, '\m2\M', 'g') Here we get away from splitting and instead use the \m, and \M anchors for word-boundaries. SELECT id, xĬROSS JOIN LATERAL regexp_split_to_table(data, ' ')Īnd, from there you can run regular SQL on it. In this method you're using GROUP BY and count(). Using a TABLE Splitting into a table with regexp_split_to_table CROSS JOIN LATERAL string_to_array(data, '2') AS t(x) Īlternatively, we can use string_to_array to separate something that's space-delimited and then count the matches, SELECT id, array_length(array_positions(x, '2'), 1)ĬROSS JOIN LATERAL string_to_array(data, ' ') AS t(x) SELECT array_length(x, 1) - 1ĬROSS JOIN LATERAL regexp_split_to_array(data, '\m2\M') AS t(x) Here we have to subtract one match splits a string into two fragments, and thus the occurrences is one less than the fragment counts: this xyx split on y, produces and we want the length to be 1 corresponding to the occurrences of y. SELECT length(data) - length(regexp_replace(data, '\m42\M', '', 'g')) That reduces to a no-op, but if we we're search for something that was longer than one character, it'd be required. This is why we explicitly include / length('2'). SELECT length(data) - length(regexp_replace(data, '\m2\M', '', 'g'))īecause we're not splitting on simple spaces (' ') though we could with more complexity, we may also want to accommodate sub-strings of different lengths like in this question. We can remedy that by using regexp_replace to anchor the substring. As an example, the above replaces the 2 in 329. This method isn't applicable here because without an anchor, we can't be sure if we're replacing the substring inside of something-space-delimited. Most RDBMS's provide some method to calculate substring occurrences like this, SELECT length(data) - length(replace(data, '2', '')) Syntax: substring (string from for ) Parameters PostgreSQL Version: 9.![]() Possible solutions String Using length and regexp_replace The PostgreSQL substring function is used to extract a string containing a specific number of characters from a particular position of a given string. SLOW Something that converts from a string to a table.Something that converts from a string to an array.FASTEST was the pl/perl method which I placed last on this list because it requires pl/perl, and is likely not needed for most workloads.įAST A string function, such as one the pattern explained below length(str) - length(*replace(str, replaceStr)).
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |