≡

wincent.dev

  • Products
  • Blog
  • Wiki
  • Issues
You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.
Home » Issues » Feature request #2130

Feature request #2130: Add "safety valve" to recursive matching algorithm

Kind feature request
Product Command-T
When Created 2013-12-14T14:59:08Z, updated 2013-12-16T15:25:54Z
Status open
Reporter Greg Hurrell
Tags no tags

Description

On a really large project (eg. Chrome) matching against half a million files can be very slow.

It might be worth adding a "safety value" that short-circuits recursion after some threshold is crossed. This would reduce the quality of the scores, but it would help speed considerably.

Perhaps should be configurable.

Comments

  1. Greg Hurrell 2013-12-16T15:18:34Z

    Just did a quick test of the possible gains here:

    1.6 release (ie. standard recursion)

                            user     system      total        real
    pathological        0.010000   0.000000   0.010000 (  0.005239)
    command-t           0.430000   0.000000   0.430000 (  0.433271)
    chromium (subset)   1.440000   0.010000   1.450000 (  0.576339)
    chromium (whole)    4.990000   0.020000   5.010000 (  1.943098)

    With recursion disabled

                            user     system      total        real
    pathological        0.000000   0.000000   0.000000 (  0.000615)
    command-t           0.340000   0.010000   0.350000 (  0.336757)
    chromium (subset)   0.470000   0.010000   0.480000 (  0.292511)
    chromium (whole)    1.610000   0.010000   1.620000 (  0.961244)

    Analysis

    Oddly, the numbers with recursion disabled were wildly variable, suggesting perhaps some additional, unpredictable cache pressure when running without recursion.

    Obviously, the pathological case is much, much faster in this way. I'm surprised that the "chromium (whole)" case didn't see a bigger benefit, but perhaps I shouldn't be so surprised; the prevalence of matches that can be scored in multiple ways is probably pretty low (and the cases that are common, like matching against "e", are probably pretty cheap to recurse on).

    So, based on these numbers, I am not sure whether this angle is worth pursuing, although I might try a simple safety valve threshold of, say, 3, and run the numbers again to see if I can get scores of comparable quality at a reduced efficiency cost.

  2. Greg Hurrell 2013-12-16T15:25:54Z

    Numbers with a depth limit of 3 (ie. recurse 3 times):

                            user     system      total        real
    pathological        0.000000   0.000000   0.000000 (  0.000906)
    command-t           0.440000   0.000000   0.440000 (  0.439680)
    chromium (subset)   1.080000   0.020000   1.100000 (  0.465615)
    chromium (whole)    3.820000   0.010000   3.830000 (  1.590747)

    Depth limit of 2 (ie. recurse 2 times):

                            user     system      total        real
    pathological        0.000000   0.000000   0.000000 (  0.000687)
    command-t           0.420000   0.000000   0.420000 (  0.428034)
    chromium (subset)   0.860000   0.020000   0.880000 (  0.404738)
    chromium (whole)    3.110000   0.010000   3.120000 (  1.396657)

    Depth limit of 1 (ie. recurse once only):

                            user     system      total        real
    pathological        0.000000   0.000000   0.000000 (  0.000465)
    command-t           0.390000   0.000000   0.390000 (  0.389400)
    chromium (subset)   0.670000   0.020000   0.690000 (  0.366292)
    chromium (whole)    2.320000   0.010000   2.330000 (  1.141482)

    Depth limit of 0 (no recursion):

                            user     system      total        real
    pathological        0.000000   0.000000   0.000000 (  0.000466)
    command-t           0.330000   0.000000   0.330000 (  0.329658)
    chromium (subset)   0.480000   0.020000   0.500000 (  0.289310)
    chromium (whole)    1.620000   0.010000   1.630000 (  0.958717)

    For the first three, the specs pass; with recursion disabled, we have one failing spec. Alas, I don't think I really have enough spec coverage around the ranking to get a truly realistic sense of how this is affecting match quality (and the benchmarks, too, might be a little sparse in terms of exposing the cases where these changes would really make a difference).

    I'm going to stash this change on a branch of now and leave it sitting there.

Add a comment

Comments are now closed for this issue.

  • contact
  • legal

Menu

  • Blog
  • Wiki
  • Issues
  • Snippets