You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.

Home » Issues » Bug #1602

Bug #1602: Degraded performance on large trees

Kind	bug
Product	Command-T
When	2010-07-11T06:38:05Z
Status	open
Reporter	Greg Hurrell
Tags	no tags

Description

Ticket #1598 describes the new recursive "best score" algorithm that was recently implemented, but it is a naïve implementation with no explicit optimization attempted.

This ticket is for recording ideas for improving the performance under the new algorithm:

Idea 1: do a full "pre-match" before trying recursion

Complete a full pass to confirm we definitely have a match before doing another recursive pass. In this way we don't waste time recursing through possibilities that will never end up being a match.

Note that this "pre-match" pass can be ultra-quick (no scoring, just a blind left-to-right match, and can probably skip most of the dot-file logic too).

So it is spending more time up-front doing work that might end up saving us a lot more work otherwise. It remains to be seen whether the overall effect will be a net gain.

For pathological cases it is easy to see that it will bring a benefit; eg. given search string "ax" and path aaaaaaaaaaaaaaaaaaaaax, we end up matching at every single position because we recurse to see if we can find a better score (and incidentally, we don't). If we do the same with a path like aaaaaaaaaaaaaaaaaaaaay we will never find the desired x and will have done the same amount of futile recursion, for nothing.

Depending on the number and composition of paths in a directory, and the nature of the search string, this kind of futile recursion could either be very frequent or not common at all.

Idea 2: do depth-first rather than breadth-first recursion

This is a variation of idea 1, with the idea being to avoid wasting time searching the "match space" of paths which will never actually end up being complete matches.

In this variant, instead of recursing in order to bump our search one char to the right and seek a better match, we recurse in order to try the next character in the search string.

So, given a search string like "abcd", we:

scan until we find an "a"

* recurse to find "b"
 * recurse to find "c"
  * recurse to find "d": at this point we have a guaranteed match, so we return
 * recursion returned a match, bump one char right and see if we can find a better one
  * recurse looking for better match
 * if recursion returned a better match, record it, keep bumping right until we run out of room
* recursion returned its best match, bump right and check for better scores etc

recursion returned its best match, bump right and check for better scores etc

So given a really long search string we are still going to potentially recurse to a fairly low depth (20 character search string means up to 20 levels of recursion), and this is not going to be as fast as an inline scan for a match, but it will saved us from doing lots of futile recursion on strings that will never match.

Idea 3: use a heuristic to determine when to give up early

I'm doubtful that this one will work, to be honest, but I'll mention it anyway. Given the example search string of "ax" and path aaaaaaaaaaaaaaaaax, it is fairly easy to see that the first match for a is going to be the best one that we'll ever get and trying the others is pointless.

It actually wouldn't be too hard to do an inline check for this kind of thing; ie: find a, and then keep scanning anyway to see if there is a later a with a higher score.

The problem with this approach is that we can only really "look ahead" one character at a time and that won't help us to discern between scorings like this:

a..............articles
^               ^^^^^^^
a..............articles
               ^^^^^^^^

Here if we search for "articles", we'll see the first a and when we look for a better one we'll find that the second a has a lower score. If we decide not to pursue that alternative path because that one character has a lower score, then we prematurely discard the possibility of matching a high-scoring run of "articles", which may actually give us a better score.

Comments

Add a comment

Comments are now closed for this issue.

Bug #1602: Degraded performance on large trees

Description

Idea 1: do a full "pre-match" before trying recursion

Idea 2: do depth-first rather than breadth-first recursion

Idea 3: use a heuristic to determine when to give up early

Comments

Add a comment

Menu