Is it safe to let a user type a regex as a search input?
78 votes
I was in a mall a few days ago and I searched for a shop on an indication panel.
Out of curiosity, I tried a search with (.+) and was a bit surprised to get the list of all the shops in the mall.
I've read a bit about evil regexes but it seems that this kind of attack can only happen when the attacker has both control of the entry to search and the search input (the regex).
Can we consider the mall indication panel safe from DOS considering that the attacker only has control of the search input? (Leaving aside the possibility that a shop might be called some weird name like aaaaaaaaaaaa.)
denial-of-service regex
24
If the user can enter a regex, and there's an interpreted language in use, I wouldn't be worried about DOS; I'd be worried about code injection.
– gowenfawr
2 days ago
79
I would not expect a mall map to be designed for sophisticated users that might use regexes. Therefore, if regexes work, it suggests the application is sort of blindly passing the input string in. That's usually a place to try various forms of code and SQL injection. It's that little voice saying "I bet they didn't do that by design..." that makes the antenna perk up. This is a Comment, not an Answer, because (for me) there's not enough info here to say anything more accurate than that.
– gowenfawr
2 days ago
11
Despite the security concerns I would love to perform RegEx filtration in indication panels of huge shopping malls!
– Daniel
2 days ago
21
Did you test any regex that should get matches to determine it actually used regex? If I were to design a mall search I would list all shops if the search result was empty. Either the user is trying to have fun (like you) and the result would not matter or the user isn't good at using the search functionality and they should see something that might be of use to them.
– Bent
2 days ago
27
It is also possible that the search field just ignores any punctuation, and is programmed to return all shops for an essentially empty query.
– jpa
2 days ago
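If regex search on the panel is not an intended feature, one way to sidestep the whole question is to treat whatever the user types as plain text. A minimal Python sketch of that idea, assuming a hypothetical in-memory list of shop names (SHOPS and find_shops are made up for illustration and say nothing about how the real panel works):

```python
import re

# Hypothetical shop directory, for illustration only.
SHOPS = ["Apple Store", "Zara", "H&M", "C++ Books (Level 2)"]

def find_shops(query, shops=SHOPS):
    """Case-insensitive substring search that treats the query as plain text."""
    query = query.strip()
    if not query:
        return []
    # re.escape neutralises every regex metacharacter in the user's input,
    # so "(.+)" is searched for literally instead of matching everything.
    literal = re.compile(re.escape(query), re.IGNORECASE)
    return [name for name in shops if literal.search(name)]

print(find_shops("(.+)"))
print(find_shops("c++"))
```

With this approach the regex engine only ever sees a literal pattern, so neither the match-everything query nor a deliberately slow pattern can be expressed by the user.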
asked 2 days ago by Xavier59 (edited 2 days ago)
5 Answers
69 votes, accepted
In terms of the risk of code execution, I would compare accepting user-supplied regular expressions to parsing most sorts of structured user input, such as date strings or markdown. Regular expressions are much more complex than date strings or markdown (although safely producing HTML from untrusted markdown has its own risks) and so leave more room for exploitation, but the basic principle is the same: exploitation involves finding unexpected side effects of the parsing/compilation/matching process.
Most regex libraries are mature and part of the standard library in many languages, which is a pretty good (but not certain) indicator that they are free of major issues leading to code execution.
That is to say, it does increase your attack surface, but it's not unreasonable to make the measured decision to accept that relatively minor risk.
Denial of service attacks are a little trickier. I think most regular expression libraries are designed with performance in mind but do not count mitigation of intentionally slow input among their core design goals. The appropriateness of accepting user supplied regular expressions from the DoS perspective is more library dependent.
For example, the .NET regex library accepts a timeout which could be used to mitigate DoS attacks.
RE2 guarantees execution in time linear to input size which may be acceptable if you know your search corpus falls within some reasonable size limit.
In situations where availability is absolutely critical, or where you're trying to minimize your attack surface as much as possible, it makes sense to avoid accepting user regex; otherwise I think accepting it is a defensible practice.
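For a runtime whose regex library has no timeout option, the same effect can be approximated from the outside. Python's standard re module is one such case, so the sketch below runs the match in a child process and lets subprocess.run kill it when the time budget is exceeded. The helper name search_with_timeout, the exit codes 10 and 11, and the default budget are all assumptions made for this illustration.

```python
import subprocess
import sys

# Script executed in the child interpreter: pattern and text arrive via argv.
# Exit code 10 means "matched", 11 means "no match" (arbitrary values chosen
# here so they cannot be confused with the exit code 1 of an uncaught error).
_CHILD = """
import re, sys
sys.exit(10 if re.search(sys.argv[1], sys.argv[2]) else 11)
"""

def search_with_timeout(pattern, text, timeout_s=1.0):
    """Return True/False for match/no match, or None if the time budget
    ran out or the pattern could not be evaluated."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, text],
            timeout=timeout_s,            # child is killed when this expires
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        return None
    if proc.returncode == 10:
        return True
    if proc.returncode == 11:
        return False
    return None  # e.g. invalid pattern raised re.error in the child

print(search_with_timeout("Star.*", "Starbucks"))
print(search_with_timeout("(a+)+$", "a" * 40 + "!", timeout_s=0.5))
```

Starting a process per search is expensive, so this is only a sketch of the principle: the work is isolated in something the operating system can terminate, which a thread in the same process usually is not.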
6
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
– Bob
2 days ago
6
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
– Boris the Spider
2 days ago
1
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
– JimmyJames
2 days ago
3
@Nat it relies on cooperative multitasking - i.e. it will cancel(true) the task, which will interrupt() the Thread - if the task is interruptible then this may work, most likely it won't however.
– Boris the Spider
2 days ago
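The exchange above is about Java, but the same limitation is easy to observe in other runtimes. A small Python analogue, assuming a stand-in task (slow_task, which just sleeps and never checks for cancellation) in place of a real regex match: waiting with a timeout gives up on the result, yet the worker keeps running and cannot be cancelled.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def slow_task(seconds):
    """Stand-in for a long-running match; it never checks for cancellation."""
    time.sleep(seconds)
    return "finished"

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_task, 1.0)

try:
    future.result(timeout=0.1)   # stop waiting after 0.1 s
    timed_out = False
except FutureTimeout:
    timed_out = True

# cancel() only succeeds for tasks that have not started yet.
cancelled = future.cancel()
# The worker thread carries on regardless and eventually delivers its result.
late_result = future.result()
executor.shutdown()

print(timed_out, cancelled, late_result)
```

So a timeout on the waiting side protects the caller's latency, but not the CPU that the abandoned task keeps consuming.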
4
Here is an example of a regular expression which takes exponential execution times on Java: (0*)*A
– Philipp
2 days ago
13 votes
The main threat in accepting regular expressions will be in your regex execution engine rather than accepting regex itself. I'd expect the threat to be very, very low in any well implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to the engine. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.
Overall, all regex is designed to do is look for patterns within a value. As long as proper security is followed on the values you check against, there is no reason the engine itself should have any access to modify values. I'd classify it as generally pretty safe.
That said, I'd also only provide it in situations where it made reasonable sense to do so. Regex is complex, potentially time consuming to run, and used in the wrong places could have some undesirable impacts on an application outside of a security context, but in the right use case they are hugely powerful and immensely valuable. (I'm a software architect who refactors hundreds of thousands of lines of code regularly using regex.)
13
This doesn't cover DoS attacks via, for example, catastrophic backtracking.
– Boris the Spider
2 days ago
4
@boris I didn't consider that a security threat, as handling expensive regexes in a non-interfering manner is necessary even in normal usage. People are going to write excessively complex regexes plenty often without it being an attack. Rational timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying that the security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
– AJ Henderson
2 days ago
1
People have crashed servers with regular expressions, and I personally know of one site that had hundreds of thousands of users getting crashed with that kind of a construct. Can't agree with such damage being minimal, as it took them some time to get it back online.
– eis
yesterday
@eis did they exploit the regex engine, or were performance safeguards not properly configured, so that a series of runaway regexes took down the server as it tried to evaluate them? I said the risk of exploitation of the engine is low. Slow-running queries, even in a DoS sense, are a performance concern, as legitimate queries could also take down the server without proper performance safeguards.
– AJ Henderson
yesterday
@AJHenderson you're right in that it's the latter, not about exploiting the engine. However even without any exploit I think the end user impact might be something else than minimal, even if the regex won't modify any values.
– eis
yesterday
7 votes
As the other answers have pointed out, the most likely attack vector is the regex engine itself.
While you would assume that these engines are mature, robust and thoroughly tested, vulnerabilities have been found in them in the past:
CVE-2010-1792 Arbitrary Code Execution in Apple Safari and iOS.
Quote from the patch notes:
"A memory corruption issue exists in WebKit's handling of regular expressions. Visiting a maliciously crafted website may lead to an unexpected application termination or arbitrary code execution."
But of course, the argument of a possibly flawed library holds for everything, even user-provided JPEG files.
The other aspect, albeit not inherently technical, is the (.+) case you mentioned: should the product allow arbitrary data retrieval?
7 votes
The problem is that many regex engines "backtrack". When you have a repetition operator (e.g. + or *) in your regex, the engine will try to match it against as much of the input string as possible. If the rest of the match later fails, it will backtrack and try matching the repetition against a smaller part of the input string.
Multiple repetition operators can lead to nested backtracking, and this can make the time to evaluate the regex blow up massively, especially if the repetition operators are nested.
https://www.regular-expressions.info/catastrophic.html
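The growth is easy to observe empirically. The following Python sketch times a failed match of a nested repetition against slightly longer inputs each round. The pattern (a+)+$ and the input sizes are choices made for this illustration, and the absolute timings depend on the machine and the engine, so no expected numbers are given.

```python
import re
import time

# Nested repetition: the inner a+ and the outer (...)+ can divide a run of
# "a" characters between them in many different ways.
PATTERN = re.compile(r"(a+)+$")

def time_failed_match(n):
    """Match against n 'a' characters followed by '!', which forces failure."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    result = PATTERN.match(subject)
    return result, time.perf_counter() - start

timings = [(n, *time_failed_match(n)) for n in range(10, 21, 2)]
for n, result, seconds in timings:
    print(f"n={n:2d} matched={result is not None} seconds={seconds:.4f}")
```

On a backtracking engine such as Python's re, each step of two extra characters should take roughly four times as long as the one before, which is why inputs only a little longer than these become impractical.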
2 votes
No, ReDoS does not require the attacker to craft unnatural search results.
The basic idea of ReDoS is that you have a sub-expression that can match in multiple ways and matches almost everywhere in the searched string except the end, and you iterate that sub-expression to get catastrophic backtracking. So for example if your shop description is "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.", you can just use something like ([^q]|[^q][^q])+ (or more complex constructs with e.g. lookaheads).
Whether that's a problem depends - as other answers have explained, you can just limit the time available to the regex engine.
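To get a feel for the numbers, one can count how many ways the iterated sub-expression can cover the text before the first "q" using pieces of one or two characters. This is a rough Python sketch: it assumes the overall match has to fail (for instance because the expression is anchored at the end), so that a naive backtracking engine may end up trying all of these combinations, and real engines may prune some of them. The helper name ways_to_cover is made up for the illustration.

```python
TEXT = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do "
        "eiusmod tempor incididunt ut labore et dolore magna aliqua.")

def ways_to_cover(n):
    """Number of ways to cover n characters with pieces of length 1 or 2.

    Each piece corresponds to one iteration of ([^q]|[^q][^q]), so this
    satisfies ways(n) = ways(n-1) + ways(n-2) with ways(0) = ways(1) = 1.
    """
    prev, cur = 1, 1
    for _ in range(n - 1):
        prev, cur = cur, prev + cur
    return cur

# Length of the run of non-"q" characters at the start of the description.
prefix_len = TEXT.index("q")
print(prefix_len, ways_to_cover(prefix_len))
```

Even for this single ordinary sentence the count is astronomically large, which is the point of the answer: the searched data does not need to look unusual for the pattern to be expensive.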
I would mention that there are regexp implementations that do not backtrack, and those avoid this problem.
– Taemyr
9 hours ago
RE2 is already mentioned in another answer. It's not really an implementation though, it's a safe subset of the language - so you'd lose features compared to something like PCRE (arguably features that no one cares about in a product search form).
– Tgr
5 hours ago
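To make the non-backtracking idea concrete, here is a toy Python matcher that tracks the set of all states the pattern could be in after each input character, instead of trying one alternative at a time. It is only a sketch of the general technique and not how RE2 itself is implemented: it supports just literals, ".", "|", grouping and the postfix operators "*", "+" and "?", and all names in it (parse, compile_nfa, full_match) are invented for the example.

```python
def parse(pattern):
    """Recursive-descent parser producing nested tuples."""
    pos = 0

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        node = concatenation()
        while peek() == "|":
            pos += 1
            node = ("alt", node, concatenation())
        return node

    def concatenation():
        node = ("empty",)
        while peek() is not None and peek() not in "|)":
            node = ("cat", node, repetition())
        return node

    def repetition():
        nonlocal pos
        node = atom()
        while peek() is not None and peek() in "*+?":
            node = (pattern[pos], node)
            pos += 1
        return node

    def atom():
        nonlocal pos
        ch = peek()
        if ch == "(":
            pos += 1
            node = alternation()
            if peek() != ")":
                raise ValueError("missing ')'")
            pos += 1
            return node
        if ch is None or ch in "*+?":
            raise ValueError("nothing to repeat")
        pos += 1
        return ("any",) if ch == "." else ("char", ch)

    tree = alternation()
    if pos != len(pattern):
        raise ValueError("unbalanced ')'")
    return tree

def compile_nfa(tree):
    """Build an NFA: eps[s] lists epsilon targets, step[s] is (char, target)."""
    eps, step = [], []

    def new_state():
        eps.append([])
        step.append(None)
        return len(eps) - 1

    def build(node):
        kind = node[0]
        start, end = new_state(), new_state()
        if kind == "empty":
            eps[start].append(end)
        elif kind == "char":
            step[start] = (node[1], end)
        elif kind == "any":
            step[start] = (None, end)      # None stands for "any character"
        elif kind == "cat":
            a0, a1 = build(node[1])
            b0, b1 = build(node[2])
            eps[start].append(a0); eps[a1].append(b0); eps[b1].append(end)
        elif kind == "alt":
            a0, a1 = build(node[1])
            b0, b1 = build(node[2])
            eps[start] += [a0, b0]; eps[a1].append(end); eps[b1].append(end)
        else:                              # "*", "+" or "?"
            a0, a1 = build(node[1])
            eps[start].append(a0); eps[a1].append(end)
            if kind in "*?":
                eps[start].append(end)     # zero occurrences allowed
            if kind in "*+":
                eps[a1].append(a0)         # further occurrences allowed
        return start, end

    start, accept = build(tree)
    return eps, step, start, accept

def closure(eps, states):
    """All states reachable from `states` through epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        for target in eps[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def full_match(pattern, text):
    """True if the whole text matches; never revisits an input character."""
    eps, step, start, accept = compile_nfa(parse(pattern))
    current = closure(eps, {start})
    for ch in text:
        moved = {step[s][1] for s in current
                 if step[s] is not None and step[s][0] in (None, ch)}
        current = closure(eps, moved)
    return accept in current

print(full_match("(a+)+", "a" * 40 + "!"))
print(full_match("(0*)*A", "0" * 50))
```

Because each input character is processed once against a state set whose size is bounded by the pattern, the running time grows with len(text) times len(pattern) rather than exponentially. The price, as the comment above notes, is that features such as backreferences do not fit this model.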
Accepted answer: answered 2 days ago by Ryan Jenkins (edited yesterday by Jan Doggen)
6
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
– Bob
2 days ago
6
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
– Boris the Spider
2 days ago
1
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
– JimmyJames
2 days ago
3
@Nat it relies on cooperative multitasking - i.e. it willcancel(true)
the task, which willinterrupt()
theThread
- if the task is interruptible then this may work, most likely it won't however.
– Boris the Spider
2 days ago
4
Here is an example of a regular expression which takes exponential execution times on Java:(0*)*A
– Philipp
2 days ago
 |Â
show 2 more comments
6
Yes, a timeout is the first thing that comes to mind for mitigating a DoS. Even ignoring library support, it's fairly trivial in most languages/frameworks to spin off the search to a background thread, and have a timeout against that thread.
– Bob
2 days ago
6
@Bob that's trivial yes, but stopping the background task is not. For example in a language like Java there is no way to forcibly terminate a thread, so even if your timeout had expired you would not be able to do anything about it.
– Boris the Spider
2 days ago
1
Ages ago when I became aware of regex and moved beyond the basics to start getting fancy, I was able to create some really horrifically slow regex patterns. A lot of this depends on the regex engine but if you are working with one that supports backreferences, lookaheads/lookbehind and/or greedy quantifiers, it's not too hard to bog things down. Of course the length of the strings you are searching makes a big difference. Multi-line regex on large documents can really be a dog.
– JimmyJames
2 days ago
3
@Nat it relies on cooperative multitasking - i.e. it willcancel(true)
the task, which willinterrupt()
theThread
- if the task is interruptible then this may work, most likely it won't however.
– Boris the Spider
2 days ago
4
Here is an example of a regular expression which takes exponential execution times on Java:(0*)*A
– Philipp
2 days ago
up vote
13
down vote
The main threat in accepting regular expressions lies in your regex execution engine rather than in accepting regex itself. I'd expect the threat to be very, very low in any well-implemented engine. The engine shouldn't need access to any privileged system resources and should only need to run logic on input provided directly to the engine. This means that even if someone finds an exploit in the interpreter, the damage that can be done should be minimal.
Overall, all regex is designed to do is look for patterns within a value. As long as proper security is followed on the values you check against, there is no reason the engine itself should have any access to modify values. I'd classify it as generally pretty safe.
That said, I'd also only provide it in situations where it made reasonable sense to do so. Regex is complex, potentially time-consuming to run, and used in the wrong places could have some undesirable impacts on an application outside of a security context, but in the right use case it is hugely powerful and immensely valuable. (I'm a software architect who regularly refactors hundreds of thousands of lines of code using regex.)
13
This doesn't cover DoS attacks via, for example, catastrophic backtracking.
– Boris the Spider
2 days ago
4
@boris I didn't consider that a security threat, because handling expensive regexes in a non-interfering manner is necessary even in normal usage. People will quite often write excessively complex regexes without it being an attack. Rational timeouts are a necessary design decision for performance reasons, not just security. It would be a bit like saying that a security risk of adding a complex report is that people may DoS your site by running the report. That's a performance concern, not a security one.
– AJ Henderson
2 days ago
1
People have crashed servers with regular expressions, and I personally know of one site that had hundreds of thousands of users getting crashed with that kind of a construct. Can't agree with such damage being minimal, as it took them some time to get it back online.
– eis
yesterday
@eis did they exploit the regex engine, or were performance safeguards not properly configured, so that a series of runaway regexes took down the server as it tried to evaluate them? I said the risk of exploitation of the engine is low. Slow-running queries, even in a DoS sense, are a performance concern, as legitimate queries could also take down the server without proper performance safeguards.
– AJ Henderson
yesterday
@AJHenderson you're right in that it's the latter, not about exploiting the engine. However, even without any exploit, I think the end-user impact might be something other than minimal, even if the regex won't modify any values.
– eis
yesterday
answered 2 days ago
AJ Henderson
up vote
7
down vote
As the other answers have pointed out, the attack vector would most likely be the regex engine.
While you would assume that these engines are quite mature, robust and thoroughly tested, vulnerabilities have been found in the past:
CVE-2010-1792 Arbitrary Code Execution in Apple Safari and iOS.
Quote from the Patch notes:
A memory corruption issue exists in WebKit's handling
of regular expressions. Visiting a maliciously crafted website may
lead to an unexpected application termination or arbitrary code
execution.
But of course, the argument of a possibly flawed library holds for everything - even user-provided JPEG files.
The other aspect, albeit not inherently technical, would be the (.+) case you mentioned: Should the product allow arbitrary data retrieval?
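If regex search is not an intended feature of the panel, one option is to treat whatever the user types as literal text. The following is a minimal Python sketch of that idea, not a description of how the mall panel works: the shop list `SHOPS` and the helper `search_shops` are hypothetical names invented for illustration, and returning nothing for an empty query is just one possible design choice.

```python
import re

# Hypothetical shop directory; the real panel's data source is unknown.
SHOPS = ["Book Nook", "Coffee (.+) Co", "Shoe Barn"]


def search_shops(query: str, shops: list[str] = SHOPS) -> list[str]:
    """Return the shops whose name contains the query as literal text."""
    if not query.strip():
        return []  # design choice for this sketch: an empty query lists nothing
    # re.escape neutralises metacharacters, so "(.+)" means those four characters.
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [name for name in shops if pattern.search(name)]


print(search_shops("(.+)"))  # ['Coffee (.+) Co']
print(search_shops("shoe"))  # ['Shoe Barn']
```

With this approach the query (.+) only finds a shop that literally has those characters in its name, so it can no longer be used to dump the whole directory.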
edited 2 days ago
Xavier59
answered 2 days ago
PhilLab
up vote
7
down vote
The problem is that most regex engines "backtrack". When you have a repetition operator (e.g. + or *) in your regex, the regex engine will try to match it against as much of the input string as possible. If the match later fails, it will backtrack and try matching your repetition against a smaller part of the input string.
Multiple repetition operators can lead to nested backtracking, and this can make the time to evaluate the regex blow up massively, especially if the repetition operators are nested.
https://www.regular-expressions.info/catastrophic.html
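To make the blow-up concrete, here is a small Python sketch that models how a naive backtracking matcher would handle the pattern (0+)+A, a close relative of the (0*)*A example from the comments. It is a hand-written model for this one pattern, not the code of any real regex engine, and the function name `backtracking_attempts` is made up for illustration. It counts how often the matcher reaches the final A and has to give up.

```python
def backtracking_attempts(text: str) -> tuple[bool, int]:
    """Model matching (0+)+A at the start of text; return (matched, attempts)."""
    attempts = 0

    def match_from(pos: int) -> bool:
        nonlocal attempts
        # Inner 0+ is greedy: first grab as many zeros as possible.
        end = pos
        while end < len(text) and text[end] == "0":
            end += 1
        # Backtrack by giving the zeros back one at a time (at least one stays).
        for stop in range(end, pos, -1):
            # Outer + is greedy too: try another round of the group first.
            if match_from(stop):
                return True
            # Otherwise try to finish the match with the literal 'A'.
            attempts += 1
            if stop < len(text) and text[stop] == "A":
                return True
        return False

    return match_from(0), attempts


for n in (5, 10, 15):
    print(n, *backtracking_attempts("0" * n))
# 5 False 31
# 10 False 1023
# 15 False 32767
```

In this model a failing input of n zeros costs 2^n - 1 attempts, so every extra character doubles the work, while a matching input such as 000A succeeds on the first attempt.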
answered 2 days ago
Peter Green
up vote
2
down vote
No, ReDoS does not require the attacker to craft unnatural search results.
The basic idea of ReDoS is that you have a sub-expression that can match in multiple ways and matches almost everywhere in the searched string except the end, and you iterate that sub-expression to get catastrophic backtracking. So for example if your shop description is "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.", you can just use something like ([^q]|[^q][^q])+ (or more complex constructs with e.g. lookaheads).
Whether that's a problem depends - as other answers have explained, you can just limit the time available to the regex engine.
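One way to limit the time in practice is to run the match in a separate process, because a process can be killed from outside, whereas a thread stuck inside the regex engine generally cannot. The Python sketch below is one possible arrangement under that assumption and is not taken from any of the answers; the helper name `search_with_timeout` and the JSON hand-off are choices made for this example.

```python
import json
import subprocess
import sys

# Code run by the child interpreter: read a JSON job from stdin, print the result.
_CHILD = (
    "import json, re, sys\n"
    "job = json.load(sys.stdin)\n"
    "print(json.dumps(re.search(job['pattern'], job['text']) is not None))\n"
)


def search_with_timeout(pattern: str, text: str, seconds: float = 1.0):
    """Return True/False for a match, or None if the time limit was hit
    or the child rejected the pattern."""
    job = json.dumps({"pattern": pattern, "text": text})
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=job, capture_output=True, text=True, timeout=seconds,
        )
    except subprocess.TimeoutExpired:
        return None  # subprocess.run has already killed the child
    if done.returncode != 0:
        return None  # e.g. re.error for an invalid pattern
    return json.loads(done.stdout)


print(search_with_timeout("Shoe", "Shoe Barn", seconds=10.0))
# Expected to give up on CPython, whose re module backtracks:
print(search_with_timeout("(0+)+A", "0" * 64, seconds=1.0))
```

Starting a process per query is expensive, so a real kiosk would more likely keep a small pool of workers and restart any worker that exceeds its limit. The point here is only that the runaway match is actually stopped.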
I would mention that there are regexp implementations that do not backtrack, and those avoid this problem.
– Taemyr
9 hours ago
RE2 is already mentioned in another answer. It's not really an implementation though, it's a safe subset of the language - so you'd lose features compared to something like PCRE (arguably features that no one cares about in a product search form).
– Tgr
5 hours ago
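As a rough illustration of what "no backtracking" means, the Python sketch below matches the pattern (0+)+A by tracking the set of automaton states that are still possible after each character, instead of trying one path at a time. The automaton is built by hand for this single pattern and checks the whole input from its first character; it does not use or imitate the API of RE2 or any other library, and the names `TRANSITIONS` and `nfa_fullmatch` are invented for the example.

```python
# Hand-built NFA for (0+)+A, matched against the whole input.
# State 0: nothing read yet; state 1: at least one '0' read; state 2: accepted.
TRANSITIONS = {
    (0, "0"): {1},
    (1, "0"): {1},
    (1, "A"): {2},
}
ACCEPTING = {2}


def nfa_fullmatch(text: str) -> tuple[bool, int]:
    """Return (matched, steps), where steps counts transition lookups."""
    states = {0}
    steps = 0
    for ch in text:
        next_states = set()
        for state in states:
            steps += 1
            next_states |= TRANSITIONS.get((state, ch), set())
        states = next_states
        if not states:
            break  # no live state left, so no match is possible
    return bool(states & ACCEPTING), steps


print(nfa_fullmatch("000A"))      # (True, 4)
print(nfa_fullmatch("0" * 1000))  # (False, 1000)
```

The work is bounded by the number of states times the length of the input, so a long run of zeros that never reaches an A costs one lookup per character rather than an exponential number of retries. The trade-off, as the comment above notes, is that features such as backreferences do not fit this model.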
answered 10 hours ago
Tgr