Engineering: How to Tell a Better Story with Path Search

We always knew we wanted to build Path Search, but needed the timing to be just right. As our users create more and more moments, the ability to go back to their previous moments becomes more and more important. We didn’t want to build an “explore” or recommendation feature, but rather a way for users to find and tell their stories. This is how we built it.

The Search Engine: Elasticsearch
For a search engine today, there are basically two options: build your own or choose one of the existing solutions built on top of Lucene (http://lucene.apache.org/). We looked at the existing Lucene-based solutions, such as Elasticsearch, Zoie, Solr, and Bobo. We ended up choosing Elasticsearch for its architecture (document-oriented, REST API, JSON format, distributed), its features, and its strong user base and community (http://elasticsearch-users.115913.n3.nabble.com/). Choosing an existing search engine rather than building our own freed us to devote resources to building the actual search product, along with other new happenings at Path (stay tuned!).

Elasticsearch is feature-rich, including the ability to do geo distance filtering (a “nearby” feature on mobile is killer), facets (which power our search suggestions), and routing (which allows us to shard by user). It lets us live-index our moments, giving users near real-time search. It is also designed to scale out easily by adding more nodes to the cluster. The number of shards does need careful thought up front, because it cannot be changed once the index is created. Shay Banon, the creator of Elasticsearch, was invaluable in helping us choose the right number.
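
To see why the shard count is fixed up front, it helps to look at how routing picks a shard. Conceptually, Elasticsearch hashes the routing value and takes it modulo the number of primary shards. The sketch below is only an illustration of that idea: `zlib.crc32` stands in for Elasticsearch's internal hash function, and the user IDs and shard counts are made up.

```python
import zlib


def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """Pick a shard conceptually the way routing does: hash(routing) % shards.

    zlib.crc32 is a stand-in for Elasticsearch's internal hash (assumption).
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def moved_users(user_ids, old_shards: int, new_shards: int):
    """Return the users whose shard would change if the shard count changed."""
    return [u for u in user_ids
            if shard_for(u, old_shards) != shard_for(u, new_shards)]


# Hypothetical user IDs used as routing values, so that all of one user's
# moments land on the same shard.
users = [f"user-{i}" for i in range(1000)]
print(len(moved_users(users, 5, 6)), "of", len(users), "users would move")
```

Because the modulus changes with the shard count, most routing values would map to a different shard afterwards, so existing documents could no longer be found where they were written without a full reindex.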

Search Suggestions
Path Search is different from the popular web search engines. We needed a way to “teach” our users what was possible to search for on Path. Rather than having just an empty white bar staring at our users, we wanted to present them with possible suggestions. These search suggestions would give users a quick and easy way to perform searches without typing a single letter. Each suggestion also had to be backed by actual moments for that particular user. This is where we leverage Elasticsearch’s facets. We run facet queries to check whether certain searches (does this user have any photo moments? or happy moments?) are satisfied, and if so we store that particular search suggestion in our database.
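
As a rough sketch of that check, the snippet below builds one search body with a filter facet per candidate suggestion and keeps the suggestions whose facet count is above zero. The field names (`user_id`, `type`, `emotion`), the candidate list, and the hand-written response are assumptions for illustration, not Path's actual schema. Facets were removed from later Elasticsearch releases in favor of aggregations, so on a current cluster the same idea would use a filter aggregation instead.

```python
import json


def build_suggestion_request(user_id, candidates):
    """Build one search body with a filter facet per candidate suggestion.

    `candidates` maps a suggestion label to a filter. Field names such as
    `user_id`, `type` and `emotion` are hypothetical.
    """
    return {
        "size": 0,  # only the facet counts are needed, not the hits
        "query": {"term": {"user_id": user_id}},
        "facets": {label: {"filter": flt} for label, flt in candidates.items()},
    }


def satisfied_suggestions(response):
    """Keep the labels whose facet matched at least one moment."""
    return sorted(label for label, facet in response["facets"].items()
                  if facet["count"] > 0)


candidates = {
    "photos": {"term": {"type": "photo"}},
    "happy": {"term": {"emotion": "happy"}},
}
body = build_suggestion_request("user-42", candidates)
print(json.dumps(body, indent=2))

# Shape of a facet response, hand-written here for illustration.
fake_response = {"facets": {"photos": {"_type": "filter", "count": 12},
                            "happy": {"_type": "filter", "count": 0}}}
print(satisfied_suggestions(fake_response))
```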


Using Elasticsearch’s filter facets, we can do more than just single term search suggestions (such as “happy” or “photos”). We can actually do complex queries with nested booleans to narrow focus to searches on date + time range, or emotion + full text query, etc. This allows us to offer “search suggestion stories”. Search suggestion stories are a more powerful form of search suggestions that give users a more human feel, such as “Listening to music on the weekend” and “watching movies with friends”.
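
A “story” suggestion of this kind can be expressed as a filter facet whose filter is a nested boolean. The following is a minimal sketch under assumed field names (`type`, `day_of_week`, `with_friends`); the real mapping is not described in this post.

```python
def term(field, value):
    """A single-term filter on one field."""
    return {"term": {field: value}}


def all_of(*filters):
    """Boolean AND: every nested filter must match."""
    return {"bool": {"must": list(filters)}}


def any_of(*filters):
    """Boolean OR: at least one nested filter should match."""
    return {"bool": {"should": list(filters)}}


def story_facet(*filters):
    """Wrap a combination of filters as a filter facet."""
    return {"filter": all_of(*filters)}


# Hypothetical fields: `type`, `day_of_week`, `with_friends`.
weekend = any_of(term("day_of_week", "saturday"), term("day_of_week", "sunday"))

stories = {
    "Listening to music on the weekend":
        story_facet(term("type", "music"), weekend),
    "Watching movies with friends":
        story_facet(term("type", "movie"), {"exists": {"field": "with_friends"}}),
}

for label, facet in stories.items():
    print(label, "->", facet)
```

Each entry can be sent as one named facet; a non-zero count means the story has moments behind it for that user.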

Query Parsing
A straightforward solution would be to take the user’s provided search query and run it against all searchable fields (friends’ names, places, holidays, moment types, etc.) and ascertain the relevant moments. Since our data is highly structured, we can do better by separating the query into two parts: keywords and full text search. We can find and parse out all keywords and match them to their relevant fields, then do a full text query against fields where it is impossible to know all possibilities (such as comments and places). This filtering nicely fits with the overall product design of finding a story (collection of moments) as opposed to finding the “best” moment.
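
The split between keywords and full text can be sketched as follows. This is an illustrative toy, not our production parser: the keyword table, field names, and stop word list are all invented for the example, and plural forms are listed explicitly as a stand-in for real stemming.

```python
import re

# Hypothetical keyword vocabulary: token -> (field, value).
KEYWORDS = {
    "photo": ("type", "photo"),
    "photos": ("type", "photo"),
    "music": ("type", "music"),
    "happy": ("emotion", "happy"),
    "christmas": ("holiday", "christmas"),
}

# A tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "with", "and"}


def parse_query(text):
    """Split a query into structured filters and leftover full text."""
    tokens = re.findall(r"\w+", text.lower())
    filters, free_text = [], []
    for token in tokens:
        if token in STOP_WORDS:
            continue  # stop words carry no search signal
        if token in KEYWORDS:
            field, value = KEYWORDS[token]
            filters.append({"term": {field: value}})
        else:
            free_text.append(token)  # searched against comments, places, etc.
    return filters, " ".join(free_text)


filters, free_text = parse_query("Happy photos at the Golden Gate Bridge")
print(filters)
print(free_text)
```

Here "happy" and "photos" become exact filters on known fields, while "golden gate bridge" is left for a full text query against open-ended fields.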

We handle the standard stop words and stemming, but decided to stay away from the rabbit hole of true Natural Language Processing.

Autocomplete
For our autocomplete service, we wanted something that was simple and fast. We didn’t need a solution with a ton of features, just one that did this simple service well. We also needed a service that could handle all 19 of our languages, which means it has to be Unicode-aware and ensure that the stored data and search keys are encoded correctly. Lastly, we needed something that works today but could be easily enhanced to scale/shard in the near future. We decided to build our own (https://github.com/jayridge/autocomplete).
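
To illustrate why Unicode handling matters for prefix matching, here is a small in-memory sketch. It is not the code of the project linked above; the class name, the choice of NFKC normalization plus case folding, and the sample terms are assumptions made for the example.

```python
import bisect
import unicodedata


def normalize(text: str) -> str:
    """Normalize so that equivalent Unicode spellings produce the same key."""
    return unicodedata.normalize("NFKC", text).casefold()


class PrefixIndex:
    """Sorted list of (normalized key, original term) pairs."""

    def __init__(self):
        self._entries = []

    def add(self, term: str) -> None:
        bisect.insort(self._entries, (normalize(term), term))

    def complete(self, prefix: str, limit: int = 5):
        key = normalize(prefix)
        # (key,) sorts before every (key + anything, term) entry.
        start = bisect.bisect_left(self._entries, (key,))
        results = []
        for norm, original in self._entries[start:]:
            if not norm.startswith(key) or len(results) >= limit:
                break
            results.append(original)
        return results


index = PrefixIndex()
# "E\u0301lodie" spells the accented letter as E + combining acute accent.
for term in ["E\u0301lodie", "Elena", "élan", "Tokyo", "東京タワー"]:
    index.add(term)

print(index.complete("ÉL"))
print(index.complete("東京"))
```

Without the normalization step, the decomposed spelling of "Élodie" would not match a prefix typed with a precomposed character, even though the two look identical on screen.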

For more information and tips on searching on Path, click here.