You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.

Home » Issues » Feature request #2130

Feature request #2130: Add "safety valve" to recursive matching algorithm

Kind	feature request
Product	Command-T
When	Created 2013-12-14T14:59:08Z, updated 2013-12-16T15:25:54Z
Status	open
Reporter	Greg Hurrell
Tags	no tags

Description

On a really large project (eg. Chrome) matching against half a million files can be very slow.

It might be worth adding a "safety value" that short-circuits recursion after some threshold is crossed. This would reduce the quality of the scores, but it would help speed considerably.

Perhaps should be configurable.

Comments

Greg Hurrell 2013-12-16T15:18:34Z
Just did a quick test of the possible gains here:

1.6 release (ie. standard recursion)
```
                        user     system      total        real
pathological        0.010000   0.000000   0.010000 (  0.005239)
command-t           0.430000   0.000000   0.430000 (  0.433271)
chromium (subset)   1.440000   0.010000   1.450000 (  0.576339)
chromium (whole)    4.990000   0.020000   5.010000 (  1.943098)
```
With recursion disabled
```
                        user     system      total        real
pathological        0.000000   0.000000   0.000000 (  0.000615)
command-t           0.340000   0.010000   0.350000 (  0.336757)
chromium (subset)   0.470000   0.010000   0.480000 (  0.292511)
chromium (whole)    1.610000   0.010000   1.620000 (  0.961244)
```
Analysis

Oddly, the numbers with recursion disabled were wildly variable, suggesting perhaps some additional, unpredictable cache pressure when running without recursion.

Obviously, the pathological case is much, much faster in this way. I'm surprised that the "chromium (whole)" case didn't see a bigger benefit, but perhaps I shouldn't be so surprised; the prevalence of matches that can be scored in multiple ways is probably pretty low (and the cases that are common, like matching against "e", are probably pretty cheap to recurse on).

So, based on these numbers, I am not sure whether this angle is worth pursuing, although I might try a simple safety valve threshold of, say, 3, and run the numbers again to see if I can get scores of comparable quality at a reduced efficiency cost.

Greg Hurrell 2013-12-16T15:25:54Z

Numbers with a depth limit of 3 (ie. recurse 3 times):
                        user     system      total        real
pathological        0.000000   0.000000   0.000000 (  0.000906)
command-t           0.440000   0.000000   0.440000 (  0.439680)
chromium (subset)   1.080000   0.020000   1.100000 (  0.465615)
chromium (whole)    3.820000   0.010000   3.830000 (  1.590747)
Depth limit of 2 (ie. recurse 2 times):
                        user     system      total        real
pathological        0.000000   0.000000   0.000000 (  0.000687)
command-t           0.420000   0.000000   0.420000 (  0.428034)
chromium (subset)   0.860000   0.020000   0.880000 (  0.404738)
chromium (whole)    3.110000   0.010000   3.120000 (  1.396657)
Depth limit of 1 (ie. recurse once only):
                        user     system      total        real
pathological        0.000000   0.000000   0.000000 (  0.000465)
command-t           0.390000   0.000000   0.390000 (  0.389400)
chromium (subset)   0.670000   0.020000   0.690000 (  0.366292)
chromium (whole)    2.320000   0.010000   2.330000 (  1.141482)
Depth limit of 0 (no recursion):
                        user     system      total        real
pathological        0.000000   0.000000   0.000000 (  0.000466)
command-t           0.330000   0.000000   0.330000 (  0.329658)
chromium (subset)   0.480000   0.020000   0.500000 (  0.289310)
chromium (whole)    1.620000   0.010000   1.630000 (  0.958717)
For the first three, the specs pass; with recursion disabled, we have one failing spec. Alas, I don't think I really have enough spec coverage around the ranking to get a truly realistic sense of how this is affecting match quality (and the benchmarks, too, might be a little sparse in terms of exposing the cases where these changes would really make a difference).

I'm going to stash this change on a branch of now and leave it sitting there.

Add a comment

Comments are now closed for this issue.

Feature request #2130: Add "safety valve" to recursive matching algorithm

Description

Comments

1.6 release (ie. standard recursion)

With recursion disabled

Analysis

Add a comment

Menu