Finding Food with an Inverted Index

2025-03-07

Blog Cover

Background

Ohio State University has a lot of students (60k+). Students often get hungry, so OSU has a lot of dining halls, each with a lot of different food options. Finding a dining hall to eat at can be tricky, especially if a student has dietary restrictions or nutritional goals (bulking or losing weight). This blog post goes over how I made OhioState.food to make it easier to browse OSU food options and how I used an inverse index to make it fast.

Project Structure

OhioState.food Diagram

OhioState.food is a Nextjs deployed using Cloudflare Pages. It uses Cloudflares's Nextjs adapter to work with Cloudflare and has the drawback of only supporting the edge runtime for Nextjs (Cloudflare Workers acts as the server). Cloudflare D1 is used for persistence (essentially a hosted sqlite database).

The data is sourced from Nutrislice's API, which has menu information for every dining hall on OSU campus. Every morning, a cron job on Github Actions is used to run a script that updates the database with the latest menu items using the "populate" endpoint on the site. Once the data is uploaded, I use Nextjs Server Actions to query the database with the Drizzle ORM for type safety.

Food Searching

The easiest way to choose a dining hall to eat at is to start with the food you want to eat and work backwards. Lets start with a simple query that queries the "food" table in my database based on food name, dietary restrictions, and nutritional information.

-- Simplified Query

-- Get all columns from foods table, aliased as f
SELECT * FROM foods AS f
-- Filter foods where name contains the search query
WHERE f.name LIKE '%{query}%'
-- Filter by dietary restrictions
AND f.dietary_info == {dietary_filter} 
-- Filter by nutritional requirements
AND f.nutrition == {nutrition_filter} 

This query takes around 10-15 seconds to run on the foods table with 1k+ foods. I found this to be slow, especially since I wanted to display food results on the site live as the user was typing.

The traditional way to speed up a SQL query is to add indexes to frequently read columns in a table. This slightly slows down write speed and increases table storage size, but speeds up queries significantly as the database no longer needs to do a linear scan on the entire table. Dietary and nutrition information are easy to index on as they only need equality and range comparison for queries. However, indexing on full text search with conditions f.name LIKE '%{query}%' requires a bit more effort than simply adding a built in sqlite index. This is where inverted indexes come in handy.

Inverted Indexes

Inverted indexes are a special data structure made to speed up full text searching. They work by creating a mapping of tokens (processed words) to documents. I'll demonstrate how they work by showing what happens when a food document is added to an already existing inverted index.

Step 1: Get Tokens From Document

"Seasoned Fries" getting tokenized

The first step to adding a document to the index is to get a list of all words in the document. This is called tokenization. Lemmatization or stemming is also typically done here to ensure words like "steak" and "steaks" are map to the same documents.

Step 2: Construct or Add to Inverted Index

"Seasoned Fries" added to inverted index

Once the tokens are retrieved from the document, a check is done to see if there is an entry in the inverted index for that token. If there is, a reference to the document (typically its ID) is added to the structure. If there is not, a new entry in the index is made with the document reference as its initial value.

Step 3: Querying the Index

Querying Seasoned Fries"

There are many ways to handle full text queries with an inverted index, but a simple way is to tokenize the query the same way the document was tokenized, get relevant documents using the inverted index, and ranking them on relevancy using an algorithm like BM25. There are many possible optimizations here, such as using data structures like tries to enable searching on prefixes or by tokenizing the index on trigrams to allow substring matching. Overall though any of these techniques should make full text queries drastically faster than a linear scan with sting comparisons.

Implementing an inverted index and effectively using it for full text search is a lot of work. Luckily, there are many ready to use database extensions that allow you to leverage these structures without the hassle. FTS5 is a sqlite extension that handles full text search with inverted indexes for you.

FTS5 works by allowing you to create virtual tables (essentially code that pretends to be a table) backed by an efficient full text search engine (which uses inverted indexes). It also includes additional features like proximity and phrase search. I've attached the FTS5 setup from OhioState.food below.

-- Create FTS5 virtual table on name field in food table
CREATE VIRTUAL TABLE IF NOT EXISTS foods_fts 
USING fts5(name, content=foods, content_rowid=id);

-- Create triggers to keep FTS table in sync
CREATE TRIGGER IF NOT EXISTS foods_ai AFTER INSERT ON foods BEGIN
    INSERT INTO foods_fts(rowid, name)
    VALUES (new.id, new.name);
END;

CREATE TRIGGER IF NOT EXISTS foods_ad AFTER DELETE ON foods BEGIN
    INSERT INTO foods_fts(foods_fts, rowid, name)
    VALUES('delete', old.id, old.name);
END;

CREATE TRIGGER IF NOT EXISTS foods_au AFTER UPDATE ON foods BEGIN
    INSERT INTO foods_fts(foods_fts, rowid, name)
    VALUES('delete', old.id, old.name);
    INSERT INTO foods_fts(rowid, name) 
    VALUES (new.id, new.name);
END;

-- Populate FTS table with existing data
INSERT OR IGNORE INTO foods_fts(rowid, name)
SELECT id, name FROM foods; 

The virtual table can now act as a full text search index that can be used to match a search query for a food item much faster than before. For more details on how FTS5 works, check out this blog. Below is an example of a SQL query that uses the table to do a search on food name, with dietary and nutrition restrictions.

-- Simplified FTS Query

-- Get all columns from foods table
SELECT f.* FROM foods AS f
-- Join with the FTS virtual table
JOIN foods_fts AS fts ON f.id = fts.rowid
-- Use FTS MATCH instead of LIKE for faster search
WHERE fts MATCH '{query}'
-- Filter by dietary restrictions
AND f.dietary_info == {dietary_filter}
-- Filter by nutritional requirements
AND f.nutrition == {nutrition_filter}
-- FTS tables automatically provide ranking
ORDER BY rank;

Our new query now has a full text index on food name along with the existing indexes on dietary nutrition information, bringing down our search delay to <1 second1. Try it live here!

OhioState.food Search Page

Conclusion

In this post we

  • Walked through the design of a food search app for Ohio State Dining halls.
  • Explored how inverted indexes can be used to speed up full text search.
  • Implemented the FTS5 extension into a sqlite database.

The next time you are trying to design a system with full text search, remember that you will likely need a specialized system like an inverted index if you expect any scale at all.

Footnotes

  1. Measured against a local sqlite database. ↩︎