Friday, May 8, 2009

Why Won't You Sort (By Number)?


As mentioned yesterday, there are intermittent problems with one of the recipe search scenarios. Specifically, the scenario for "Sorting (name, date, preparation time, number of ingredients)" fails. But only sometimes. And only when other scenarios are run at the same time.

When run by itself, the scenario always passes:
cstrom@jaynestown:~/repos/eee-code$ cucumber -n features/recipe_search.feature \
> -s "Sorting (name, date, preparation time, number of ingredients)"
Feature: Search for recipes

So that I can find one recipe among many
As a web user
I want to be able search recipes

Scenario: Sorting (name, date, preparation time, number of ingredients)
Given 50 "delicious" recipes with ascending names, dates, preparation times, and number of ingredients
And a 0.5 second wait to allow the search index to be updated
When I search for "delicious"
Then I should see 20 results
When I click the "Name" column header
Then the results should be ordered by name in ascending order
When I click the "Name" column header
Then the results should be ordered by name in descending order
When I click the next page
Then I should see page 2
And the results should be ordered by name in descending order
When I click the "Date" column header
Then I should see page 1
And the results should be ordered by date in descending order
When I click the next page
Then I should see page 2
When I click the "Date" column header
Then the results should be ordered by date in ascending order
And I should see page 1
When I click the "Prep" column header
Then the results should be ordered by preparation time in ascending order
When I click the "Ingredients" column header
Then the results should be ordered by the number of ingredients in ascending order

1 scenario
23 passed steps

When the entire recipe_search.feature is run, the preparation time steps sometimes fail:
When I click the "Prep" column header
Then the results should be ordered by preparation time in ascending order
expected following output to contain a <tr:nth-child(3) .prep>2</tr:nth-child(3) .prep> tag:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<table>
<tr>
<th>
<a href="/recipes/search?q=delicious&sort=sort_title" id="sort-by-name">Name</a>
</th>
<th>
<a href="/recipes/search?q=delicious&sort=sort_date&order=desc" id="sort-by-date">Date</a>
</th>
<th>
<a href="/recipes/search?q=delicious&sort=sort_prep&order=desc" id="sort-by-prep">Prep</a>
</th>
<th>
<a href="/recipes/search?q=delicious&sort=sort_ingredient" id="sort-by-ingredients">Ingredients</a>
</th>
</tr>
<tr class="row0">
<td>
<a href="/recipes/id-1-delicious">delicious recipe 1</a>
</td>
<td>
<span class="date">2008-04-29</span>
</td>
<td>
<span class="prep">1</span>
</td>
<td>
<span class="ingredients">ingredient 1, ingredient 2, ingredient 3, ingredient 4, ingredient 5, ingredient 6, ingredient 7, ingredient 8, ingredient 9, ingredient 10, ingredient 11, ingredient 12, ingredient 13, ingredient 14, ingredient 15, ingredient 16, ingredient 17, ingredient 18, ingredient 19, ingredient 20, ingredient 21, ingredient 22, ingredient 23, ingredient 24, ingredient 25, ingredient 26, ingredient 27, ingredient 28, ingredient 29, ingredient 30, ingredient 31, ingredient 32, ingredient 33, ingredient 34, ingredient 35, ingredient 36, ingredient 37, ingredient 38, ingredient 39, ingredient 40, ingredient 41, ingredient 42, ingredient 43, ingredient 44, ingredient 45, ingredient 46, ingredient 47, ingredient 48, ingredient 49, ingredient 50</span>
</td>
</tr>
<tr class="row1">
<td>
<a href="/recipes/id-10-delicious">delicious recipe 10</a>
</td>
<td>
<span class="date">2008-05-08</span>
</td>
<td>
<span class="prep">10</span>
</td>
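
For some reason, couchdb-lucene (or, more specifically, Lucene itself) sometimes treats the preparation time as a string instead of an integer. The output bears this out: the row after prep time 1 shows prep time 10, which is where a string sort puts it, rather than the expected 2.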
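
To see why a string sort puts 10 right after 1, here is a small JavaScript comparison of the two orderings. It assumes, based on the step text and the output above, that the 50 fixture recipes have prep times 1 through 50.

```javascript
// Prep times for recipes 1..50 (assumed to match the scenario's fixtures).
const prepTimes = [];
for (let i = 1; i <= 50; i++) {
  prepTimes.push(i);
}

// Sorting the values as strings compares them character by character,
// so "10" through "19" all land before "2".
const asStrings = prepTimes.map(String).sort();

// Sorting numerically needs an explicit comparator.
const asNumbers = prepTimes.slice().sort((a, b) => a - b);

console.log(asStrings.slice(0, 3)); // [ '1', '10', '11' ]
console.log(asNumbers.slice(0, 3)); // [ 1, 2, 3 ]
```

The third entry of the string ordering is 11 and the second is 10, which matches the failing page: its second data row (the `tr:nth-child(3)` in the error) shows 10 where 2 was expected.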

As a sanity check, I go to the Lucene documentation and find this description of the AUTO sort type:
Guess type of sort based on field contents. A regular expression is used to look at the first term indexed for the field and determine if it represents an integer number, a floating point number, or just arbitrary string characters.

For some reason, this RegExp heuristic is not working. Sometimes.
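
The documented heuristic can be sketched in a few lines. This is a hypothetical approximation, not Lucene's code: the function name `guessSortType` and the exact regular expressions are assumptions. What it illustrates is that the decision is driven by a single term, so whatever that one term looks like determines how the whole field is sorted. It does not pin down why the guess goes wrong in this case.

```javascript
// Hypothetical sketch of an "auto" sort-type guess. NOT Lucene's source;
// it only mirrors the documented idea of testing the first indexed term
// against regular expressions.
function guessSortType(firstTerm) {
  if (/^-?\d+$/.test(firstTerm)) {
    return "int";
  }
  if (/^-?\d*\.\d+$/.test(firstTerm)) {
    return "float";
  }
  return "string";
}

// One term decides the type; the result is then applied to every
// document's value for that field.
console.log(guessSortType("1"));   // int
console.log(guessSortType("1.5")); // float
console.log(guessSortType("abc")); // string
```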

I am not qualified to work through the couchdb-lucene and Lucene code in enough detail to get to the root of this problem any time soon. Rather than hope that it starts working by accident, I resolve to eliminate Lucene's type guessing as a variable. Instead of storing prep time as an integer (2, 10, 120), I will store it as a zero-padded string (00002, 00010, 00120). Because every padded value has the same number of digits, string comparison always puts them in numeric order ("00002" < "00010" < "00120").
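
The padding trick works because, when every string has the same length, a character-by-character comparison meets the first differing digit at the same place value in both strings. A quick check, using `String.prototype.padStart` (ES2017, so a modern stand-in rather than anything available in 2009) inside a hypothetical `pad` helper, also shows where the trick stops working: values that need more digits than the chosen width.

```javascript
// Hypothetical helper: left-pad a non-negative integer with zeros.
function pad(n, width) {
  return String(n).padStart(width, "0");
}

const values = [120, 2, 10, 45, 7];

// Numeric order of the raw values.
const numeric = values.slice().sort((a, b) => a - b);

// Lexicographic (default) order of the zero-padded strings.
const padded = values.map((n) => pad(n, 5)).sort();

console.log(numeric); // [ 2, 7, 10, 45, 120 ]
console.log(padded);  // [ '00002', '00007', '00010', '00045', '00120' ]

// The guarantee only holds while every value fits in the width:
// padStart never truncates, so "100000" stays six characters long and
// sorts before "99999" because '1' < '9'.
console.log([pad(99999, 5), pad(100000, 5)].sort()); // [ '100000', '99999' ]
```

With a width of 5, anything up to 99999 is safe, which is presumably plenty for a preparation time or an ingredient count.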

To do zero-padding in JavaScript, something like this will do the trick:
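  function zero_pad(i, width) {
    // Convert to a string, then prepend "0" until the result is at
    // least `width` characters long. Longer values are left as-is.
    var ret = i + "";
    while (ret.length < width) {
      ret = "0" + ret;
    }
    return ret;
  }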
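
zero_pad assumes its input is a non-negative integer that fits in the target width. Outside that range it quietly produces strings that sort incorrectly: zero_pad(-5, 5) returns "000-5", zero_pad(123456, 5) returns "123456" unchanged, and a missing prep_time becomes the string "undefined". A stricter variant could reject such values instead. This is a sketch under the assumption that failing loudly is preferable; the name `zero_pad_strict` is hypothetical, it assumes prep_time is stored as a number (a string such as "10" would need `Number(...)` first), and how couchdb-lucene reacts to an exception thrown from an index function is not covered here.

```javascript
// Hypothetical stricter variant of zero_pad (not from the post). It only
// accepts non-negative integers that fit in `width` digits, because those
// are the only inputs whose padded strings sort in numeric order.
function zero_pad_strict(value, width) {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError("expected a non-negative integer, got " + value);
  }
  const digits = String(value);
  if (digits.length > width) {
    throw new RangeError(value + " does not fit in " + width + " digits");
  }
  // Prepend exactly enough zeros to reach `width` characters.
  return "0".repeat(width - digits.length) + digits;
}

console.log(zero_pad_strict(2, 5));   // 00002
console.log(zero_pad_strict(120, 5)); // 00120
```

Whether to throw, skip the field, or substitute a default is a judgment call; throwing at least makes bad data show up during indexing rather than as a mysteriously ordered results page.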
To index with zero-padding, I use zero_pad thusly:
  ret.field('sort_prep', zero_pad(doc['prep_time'], 5), 'yes', 'not_analyzed');
Oddly enough, the sort_ingredient sort never failed the way the sort_prep one did. Even so, I am not going to rely on coincidence here, so I use zero_pad for sort_ingredient as well:
  ret.field('sort_ingredient', zero_pad(ingredient_count, 5), 'yes', 'not_analyzed');
With that, I have all scenarios passing. Reliably.
(commit)
