Like pigs hunting down a truffle

There are countless ways to find a good place to eat in a city. Yelp, Trip Advisor, and Google Reviews have that market near cornered.

The problem is, there are also countless reviews on those sites. Well, not literally countless. More in the range of 120 million, in the case of Yelp.

That makes finding a nice Italian place for dinner an undertaking in itself. This is further complicated by the fact that the great restaurant you do find is usually also one of the most popular, making the task of getting a reservation all the more difficult.

My team and I at Upstatement all enjoy a good night out at a solid restaurant, and all shared this same frustration. So for Tank Week (which is the name for a few days we take ever summer to work on internal projects), we decided to try to solve this problem. One of our team members, Matt, happened to be talking about his upcoming trip to Spain and how he was going to go on a guided truffle hunt. Kim, another team member, had not heard of a truffle pig before, so we described how people train pigs to sniff out truffles — hidden gems of the fungi world — and dig them up.

And so, we just happened upon a perfect name for our proposed application: A service which hunts out and finds the hidden gems of the restaurant world. Instead of being yet another review restaurant to slog through, Truffle Pig would select a restaurant for you based on your preferred interval, setup a reservation, send you an invitation, and even make transportation and babysitting plans if needed.

The Challenge

While the other team members tackled business logistics, branding, and presentation, I dove into the technical challenges of such a feat. For a comparison, we kept referring to Stitch Fix, a subscription service that sends out a box of clothing personalized to someone’s fashion requirements. It’s more than just putting in your sizes and color preferences though: They use machine learning) as well as a good dose of human review to send people clothes they will enjoy. We figured a similar approach would work great for Truffle Pig as well.

Also like Stitch Fix, we wanted to go beyond the basic requirements for personalization. Instead of just asking for someone’s location and their favorite cuisine, we were going to delve much deeper. Some of this type of information is surfaced in metadata on sites like Yelp, such as kid-friendliness or parking availability. We wanted to go deeper than just a binary answer though. For example, a restaurant might be kid friendly, but does mean unruly toddlers will be running through the aisles? Or does it mean you can bring your eight year old without getting dirty looks from couples out for a romantic evening?

We could go even further with personalization too, such as what kind of beer menu a bar has, how are the live acts, or how vegetarian friendly is it. The challenge was how to find out this type of information — especially starting from scratch.

Creating a Profile

We’d start with collecting some basic information about the diner. While this data would be the type you could filter with using existing services, it would help us with drastically narrowing down the list of potential restaurants. For example, while someone may normally enjoy fancy restaurants but doesn’t mind the occasional diner, they might be much more restrictive about the cost of the meal. Other questions along these lines might be dietary restrictions, or how far of a drive it is.

Next, the profile would start to get into more personalized questions, such as the aforementioned questions about kid friendliness, attire, or the importance of a good draft beer selection.

Some of these questions are binary (i.e., avoiding meat means a BBQ restaurant is a no go). This is something Yelp is good at. Other options are more nebulous though. When asking “Is parking important?”, if we give the user choices of “Nope, Somewhat, Very, or Required”, how do we weigh a response of “Somewhat?”

The end goal would be to look at our data set of responses for a restaurant, analyze them for ratings and comments about parking, give that a score, and then weight that score lighter or heavier depending on the user’s importance.

Populating the Initial Data Set

In theory, Truffle Pig might one day have a deeply populated database of tens of thousands of restaurant, each with a vast array of data about them. But what could it start with?

Luckily, Yelp has an excellent API as well as a generous dataset available for download. We’d start with that so that we could at least show real-life results, even if this would only be the more superficial layer of data (i.e. name, cuisine, address) to start with.

Selecting a Restaurant

We now have the dataset and user’s profile. Let’s see what selecting a restaurant would involve.

It starts with choosing a cuisine. We know any major ones to avoid based on the profile (i.e. “no fish” would nix any sushi places), so this can just be a random selection. After that cuisine is chosen it would be removed from the pool so that it wasn’t selected again next week. We could add an option to prefer certain cuisines too, allowing diners to discover new restaurants while sticking to more comfortable foods.

Next up is limiting the search results to a location. We would start by limiting all results to within a certain mileage from their home, based on factors like if their profile indicates they have a car but prefer walking. In that case, we might include a restaurant that is 20 minutes away (based on Google Map’s API response for travel time at 7 PM on a Friday drive), but give higher preference to a place that’s just a few block walk.

Another easy filter is the price. If the user indicated a hard cap of $100 for two people, we can exclude that highly reviewed but super expensive tapas place that just opened.

After some of these simple exclusions, we’re left with a list of restaurants any of which the user should have an enjoyable meal at. This is where some of the more advanced selection criteria would come into play.

For this example, let’s say that this user really loves craft beer. This is not a simple “yes/no” for whether a restaurant has a good craft beer list. This is something we’d have to delve more into our machine learning for. We’d look at user reviews mentioning craft beer and determine a positive or negative association with each mention, along with a weighting based on the confidence of that assessment. We’d end up with a number we could associate with the strength of the craft beer offerings at these establishments.

This number, along with dozens of others for the user’s other preferences, would then all be added together (along with their respective weights) to give each restaurant a final score. It’s then just a matter of selecting that restaurant at the top of the list as our prized truffle, and set up the user with a dinner there.

Challenge

The biggest challenge of this project would be getting an accurate analysis of the restaurant data set, so that we could be sure our algorithm was smartly understanding what reviewers were talking about, and parsing those reviews in a way that we could assign scores to various attributes of that restaurant. In theory, we would start with a team who would look at the algorithm’s results and manually help train it.

Oink!

Our team was very excited about this concept, and wish that something like it existed so that we would have our dinner plans all set for the weekend. There are some very big problems to tackle with such an ambitious project, including those outside of the technical ones, but we felt confident they were all ones that could be overcome to deliver a truly personalized restaurant experience.

Jon Heller