
Bug #2094: Matching algorithm is inefficient

Kind bug
Product Command-T
When Created 2013-06-19T09:03:03Z, updated 2013-12-25T18:43:27Z
Status closed
Reporter dzhioev
Tags no tags

Description

Hi. I've recently tried the Command-T plugin and it looks good. But I'm working on a huge project (Chrome), and Command-T was very slow on it. So I changed my 'wildignore' settings, and it started working well in many cases. But it was often still extremely slow. So I looked a little into the C sources, and found that the matching algorithm is very inefficient in some cases. The reason is that matching considers all possible matches, and that number can be VERY large even for relatively short filenames. For example, matching the filename "aaaa..aaaa" (40 'a's) against the pattern "aaaa...aaaa" (20 'a's) gives us 137846528820 variants, which is unacceptable.
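
To make the blow-up concrete, here is a minimal counter that enumerates subsequence matches in the same exhaustive way; this is a hypothetical sketch for illustration, not the actual Command-T source:

    #include <stdio.h>
    #include <string.h>

    /* Count every distinct way 'pat' matches 'str' as a subsequence.
     * A scorer that visits each match variant does at least this much
     * work. */
    static unsigned long long count_matches(const char *str, const char *pat)
    {
        if (*pat == '\0')
            return 1;                     /* pattern exhausted: one variant */
        if (*str == '\0')
            return 0;                     /* string exhausted first: dead end */
        unsigned long long n = count_matches(str + 1, pat);   /* skip char */
        if (*str == *pat)
            n += count_matches(str + 1, pat + 1);             /* consume both */
        return n;
    }

    int main(void)
    {
        char str[41], pat[11];
        memset(str, 'a', 40); str[40] = '\0';   /* 40 'a's */
        memset(pat, 'a', 10); pat[10] = '\0';   /* 10 'a's */
        /* Prints 847660528 (= C(40,10)) after a noticeable pause; with
         * 20 'a's the count is C(40,20) = 137846528820 and the
         * recursion effectively never returns. */
        printf("%llu\n", count_matches(str, pat));
        return 0;
    }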

Comments

  1. anonymous 2013-06-19T09:52:33Z

    Actually, it looks like I've found an algorithm with complexity O(s^2 * p), where s and p are the lengths of the string and the pattern respectively. Is it OK if I implement it?

  2. Greg Hurrell 2013-06-20T03:52:48Z

    Early versions of Command-T had a linear matching/scoring algorithm which was much faster.

    The current matching algorithm provides much more intuitive ordering of the search results, but it approaches quadratic runtime in the worst case.

    As it is written in C, even this quadratic runtime is fast enough for small-ish projects (of the order of 10k or 20k files). I'd love to find a faster algorithm that still provides scoring as good as or close to the quadratic one, or failing that, have some kind of self-tuning behavior which switches to a cheaper strategy as the search space crosses some threshold.
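
    As a toy illustration of that last idea, the dispatch could be as simple as the following sketch; score_cheap, score_full, and the budget are invented names for this example, not anything in Command-T:

        #include <stddef.h>

        /* Hypothetical scorers: a full quadratic one and a cheap linear
         * fallback; neither exists under these names in Command-T. */
        double score_full(const char *str, size_t slen,
                          const char *pat, size_t plen);
        double score_cheap(const char *str, size_t slen,
                           const char *pat, size_t plen);

        #define SCORE_BUDGET 10000 /* arbitrary threshold */

        /* Self-tuning dispatch: switch to the cheaper strategy once the
         * per-candidate search space crosses the budget. */
        double score(const char *str, size_t slen,
                     const char *pat, size_t plen)
        {
            if (slen * plen > SCORE_BUDGET)
                return score_cheap(str, slen, pat, plen);
            return score_full(str, slen, pat, plen);
        }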

  3. dzhioev 2013-06-20T12:11:23Z

    The problem is that the current algorithm is not quadratic at all. You can verify this with my example:

    1) Create an empty dir.
    2) Create a file "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa" in it.
    3) Run Vim in this dir and start Command-T.
    4) Type several 'a's into the search string.

    With every 'a' the search slows down greatly; in fact, you can't wait for it to finish after ~10 'a's. In this case the complexity of the algorithm is O(s! / (p! * (s - p)!)) (the number of combinations, http://en.wikipedia.org/wiki/Combination ). This happens because the algorithm memorizes nothing during its work.
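
    That count can be checked directly; a quick sketch that evaluates s! / (p! * (s - p)!) exactly in 64-bit arithmetic:

        #include <stdio.h>

        /* C(s, p) = s! / (p! * (s - p)!), built up one factor at a time;
         * after step i the running product equals C(s - p + i, i), so
         * every intermediate value is an exact integer. */
        static unsigned long long binomial(unsigned s, unsigned p)
        {
            unsigned long long n = 1;
            for (unsigned i = 1; i <= p; ++i)
                n = n * (s - p + i) / i;
            return n;
        }

        int main(void)
        {
            printf("%llu\n", binomial(40, 20)); /* prints 137846528820 */
            return 0;
        }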

  4. dzhioev 2013-06-20T12:13:23Z

    BTW, is there any way to subscribe to updates on this thread?

  5. Greg Hurrell 2013-09-15T07:33:03Z

    I've got a patch which adds memoization; will push once I've done some more testing. It leads to a noticeable improvement using the "pathological" test case that you describe above.

    Thanks for bringing this issue up. It had totally escaped me that the way the recursion was implemented was leading to unnecessary re-work in these cases.

    (To answer your question about subscribing: there is an Atom feed; no email notifications yet, though.)
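
    To illustrate what memoization buys here, a sketch of the toy counter from the description above with results cached by (string position, pattern position); the real patch memoizes Command-T's scoring recursion, which tracks scores rather than just counts. The table bounds the work at O(s * p) subproblems, so the 40-'a' case returns instantly:

        #include <stdio.h>
        #include <string.h>

        #define MAX_LEN 64

        static unsigned long long memo[MAX_LEN + 1][MAX_LEN + 1];
        static unsigned char      seen[MAX_LEN + 1][MAX_LEN + 1];

        /* Same recursion as the naive counter, but each (i, j) pair is
         * computed once; memo/seen start zeroed and are valid for a
         * single query. */
        static unsigned long long count_memo(const char *str, unsigned i,
                                             const char *pat, unsigned j)
        {
            if (pat[j] == '\0') return 1;
            if (str[i] == '\0') return 0;
            if (seen[i][j])     return memo[i][j];
            unsigned long long n = count_memo(str, i + 1, pat, j);
            if (str[i] == pat[j])
                n += count_memo(str, i + 1, pat, j + 1);
            seen[i][j] = 1;
            return memo[i][j] = n;
        }

        int main(void)
        {
            char str[41], pat[21];
            memset(str, 'a', 40); str[40] = '\0';
            memset(pat, 'a', 20); pat[20] = '\0';
            printf("%llu\n", count_memo(str, 0, pat, 0)); /* 137846528820 */
            return 0;
        }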

  6. Greg Hurrell 2013-12-25T18:40:47Z

    I'm going to close this ticket now, as the memoization that's been merged seems to be working fairly well (since then, I've also added threading to map-reduce the problem). I did some testing to see how close the recursive algorithm is to a non-recursive algorithm, or one in which the recursion depth is limited, and there's not much of a delta any more (see ticket #2130).
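
    For illustration, the map-reduce idea amounts to scoring disjoint slices of the candidate list on worker threads and then merging; a bare-bones pthreads sketch with invented names (match_score stands in for the scorer; none of this is the actual Command-T code):

        #include <pthread.h>
        #include <stddef.h>

        double match_score(const char *path, const char *pattern); /* assumed */

        typedef struct {
            const char **paths;    /* this worker's slice of candidates */
            size_t       count;
            const char  *pattern;
            double      *scores;   /* output, parallel to paths */
        } slice_t;

        /* "Map": score one contiguous slice of the candidates. */
        static void *score_slice(void *arg)
        {
            slice_t *s = arg;
            for (size_t i = 0; i < s->count; ++i)
                s->scores[i] = match_score(s->paths[i], s->pattern);
            return NULL;
        }

        /* "Reduce": the slices write into one shared scores array, so
         * after joining, the caller just sorts paths by score.
         * Assumes nthreads <= 8. */
        void score_all(const char **paths, size_t n, const char *pattern,
                       double *scores, size_t nthreads)
        {
            pthread_t tids[8];
            slice_t   slices[8];
            size_t    chunk = (n + nthreads - 1) / nthreads;
            for (size_t t = 0; t < nthreads; ++t) {
                size_t lo = t * chunk < n ? t * chunk : n;
                size_t hi = lo + chunk < n ? lo + chunk : n;
                slices[t] = (slice_t){ paths + lo, hi - lo, pattern,
                                       scores + lo };
                pthread_create(&tids[t], NULL, score_slice, &slices[t]);
            }
            for (size_t t = 0; t < nthreads; ++t)
                pthread_join(tids[t], NULL);
        }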

    Thanks for bringing the issue to my attention. Things are much better now. I'm still interested in making them even better, but I'll have to get creative for that.

  7. Greg Hurrell 2013-12-25T18:40:49Z

    Status changed:

    • From: new
    • To: closed
  8. Greg Hurrell 2013-12-25T18:43:27Z

    See also feature request #2113, which is a placeholder for the possibility of memoizing more things.
