Food Ontology
Improving product categorization with Natural Language Processing for LATAM's largest food delivery service, operating in 9+ countries
About the client
PedidosYa is part of Delivery Hero and is the market leader for food delivery in LATAM. It operates in multiple countries in the region and is expanding by acquiring competitors such as Glovo.
The goal
The client has a vast and diverse food catalog covering every restaurant in the region that offers its services through the PedidosYa application.
We were tasked with improving the existing categorization and extracting further structured information that would allow PedidosYa to improve its search results, recommendations and business decisions.
It was a typical natural language processing problem.
The Data
In this case the data was vast but unlabeled, and labelling the entire dataset was not an option because of time constraints. Natural language complexities were abundant, since the data was entered by small restaurant owners, and language variations from country to country only made things worse.
What is a sandwich in some locations is an emparedado in others, and while some ice cream shops sell by the kilogram, others sell by the litre. Although French fries are usually a side dish, when sold alone they can be a plate to share, and what is called Peruvian cuisine in Argentina is just a typical dish in Peru.
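A minimal sketch of how regional terms and sales units like these might be normalized before classification. The lexicon `SYNONYMS`, the table `UNIT_FACTORS` and both functions are hypothetical illustrations, not the client's actual rules. Mass and volume are deliberately kept apart (grams vs millilitres), since a kilogram of ice cream cannot be converted to litres without knowing its density.

```python
import unicodedata
from typing import Dict, Tuple

# Hypothetical regional lexicon: folded (lowercase, accent-free) term -> canonical term.
SYNONYMS: Dict[str, str] = {
    "emparedado": "sandwich",
    "sanguche": "sandwich",
    "papas fritas": "french fries",
    "papas a la francesa": "french fries",
}

# Hypothetical unit table: raw unit -> (base unit, multiplier). Mass and volume stay separate.
UNIT_FACTORS: Dict[str, Tuple[str, float]] = {
    "kg": ("g", 1000.0), "kilo": ("g", 1000.0), "g": ("g", 1.0), "gr": ("g", 1.0),
    "l": ("ml", 1000.0), "lt": ("ml", 1000.0), "litro": ("ml", 1000.0),
    "litre": ("ml", 1000.0), "ml": ("ml", 1.0),
}

def normalize_term(text: str) -> str:
    """Lowercase, strip accents, collapse whitespace, then map regional synonyms."""
    folded = unicodedata.normalize("NFKD", text.lower())
    folded = "".join(ch for ch in folded if not unicodedata.combining(ch))
    folded = " ".join(folded.split())
    return SYNONYMS.get(folded, folded)

def normalize_quantity(amount: float, unit: str) -> Tuple[float, str]:
    """Convert an amount to its base unit; raises KeyError for unknown units."""
    base_unit, factor = UNIT_FACTORS[unit.strip().lower()]
    return amount * factor, base_unit

print(normalize_term("Emparedado"), normalize_quantity(1.5, "kg"))
```

With this shape, "Sándwich" and "Emparedado" both end up as `sandwich`, and 1.5 kg becomes 1500 g, so items from different countries can be compared on the same terms.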
Our Solution
Alongside PedidosYa's team, we designed and built a data processing pipeline that combines natural language processing with classifiers across multiple stages. The accuracy of the pipeline components exceeded 94%.
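To illustrate the idea of chaining stages, here is a minimal sketch in which each stage takes a record and returns an enriched copy. The stage names, the split regex and the keyword table `CATEGORY_KEYWORDS` are hypothetical. In particular, the keyword lookup only stands in for the trained classifiers and is not how the production components decided anything.

```python
import re
from functools import reduce
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[Record], Record]

# Hypothetical keyword table standing in for a trained classifier.
CATEGORY_KEYWORDS: Dict[str, str] = {
    "hamburg": "burgers", "burger": "burgers",
    "papas": "sides", "fries": "sides",
    "helado": "desserts", "ice cream": "desserts",
}

def split_components(record: Record) -> Record:
    """Stage 1: split an item name into components on connectors like 'con', 'y', 'and', '+'."""
    name = str(record["name"]).lower()
    parts = re.split(r"\s+(?:y|con|and|with)\s+|\s*\+\s*", name)
    return {**record, "components": [p.strip() for p in parts if p.strip()]}

def classify_components(record: Record) -> Record:
    """Stage 2: assign one category per component (first matching keyword wins)."""
    labels = [
        next((cat for kw, cat in CATEGORY_KEYWORDS.items() if kw in comp), "unknown")
        for comp in record["components"]
    ]
    return {**record, "categories": labels}

def run_pipeline(record: Record, stages: List[Stage]) -> Record:
    """Feed the output of each stage into the next one."""
    return reduce(lambda acc, stage: stage(acc), stages, record)

result = run_pipeline({"name": "Hamburguesa con papas"}, [split_components, classify_components])
print(result["components"], result["categories"])
```

Because every stage has the same signature, a rule-based stage can later be swapped for a model-backed one without touching the rest of the chain.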
Embeddings were used to extract tags from individual components, while XGBoost and CatBoost were used at different stages to detect multiple components within a single item and to classify each of them correctly.
Dependency parsing and part-of-speech tagging allowed us to extract complex information about quantities and item descriptions.
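One common way to turn embeddings into tags is to compare a component's vector with a prototype vector per tag by cosine similarity. The sketch below assumes exactly that approach. The 3-dimensional `EMBEDDINGS` and `TAG_PROTOTYPES` are made-up toy values, whereas a real system would use trained embeddings with hundreds of dimensions. The 0.7 threshold is likewise an arbitrary assumption.

```python
import math
from typing import Dict, List, Optional, Sequence

# Toy vectors (assumption): real embeddings would come from a trained model.
EMBEDDINGS: Dict[str, List[float]] = {
    "hamburguesa": [0.9, 0.1, 0.0],
    "burger": [0.85, 0.15, 0.05],
    "papas": [0.1, 0.9, 0.1],
    "fries": [0.15, 0.85, 0.05],
    "helado": [0.0, 0.1, 0.95],
}
# One hypothetical prototype vector per tag.
TAG_PROTOTYPES: Dict[str, List[float]] = {
    "burgers": [1.0, 0.0, 0.0],
    "sides": [0.0, 1.0, 0.0],
    "desserts": [0.0, 0.0, 1.0],
}

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def embed(text: str) -> Optional[List[float]]:
    """Average the vectors of known tokens; None if no token is in the vocabulary."""
    vectors = [EMBEDDINGS[t] for t in text.lower().split() if t in EMBEDDINGS]
    if not vectors:
        return None
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def tag_component(text: str, threshold: float = 0.7) -> Optional[str]:
    """Return the closest tag, or None if nothing is similar enough."""
    vector = embed(text)
    if vector is None:
        return None
    tag, score = max(
        ((name, cosine(vector, proto)) for name, proto in TAG_PROTOTYPES.items()),
        key=lambda pair: pair[1],
    )
    return tag if score >= threshold else None

print(tag_component("papas fritas"))
```

Here "fritas" is out of vocabulary and is simply skipped, so "papas fritas" is tagged from "papas" alone. Unknown items such as "pizza" return `None` instead of being forced into a tag.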
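The sketch below shows how quantities can be read off a dependency parse by following `nummod` arcs from a number to the noun it modifies. The parses are written by hand in Universal Dependencies style (UPOS tags and relations such as `nummod`, `nmod`, `case`) rather than produced by a real parser. The `Token` class and the unit table `UNITS` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Token:
    text: str
    pos: str   # Universal POS tag, e.g. NOUN, NUM, ADP
    dep: str   # Universal Dependencies relation, e.g. nummod, nmod, case, root
    head: int  # index of the syntactic head (the root points to itself)

# Hypothetical unit table: surface form -> normalized unit.
UNITS = {"kg": "kg", "kilo": "kg", "g": "g", "l": "l", "litro": "l", "litros": "l"}

def _item_for_unit(tokens: List[Token], unit_idx: int) -> Token:
    """Find the food noun a unit refers to: its head ('Helado de 1 kg') or its nmod child ('1 kg de helado')."""
    unit_tok = tokens[unit_idx]
    if unit_tok.dep == "nmod":
        return tokens[unit_tok.head]
    for i, tok in enumerate(tokens):
        if i != unit_idx and tok.head == unit_idx and tok.dep == "nmod":
            return tok
    return unit_tok

def extract_quantities(tokens: List[Token]) -> List[Tuple[str, float, str]]:
    """Return (item, amount, unit) triples; bare counts get the unit 'unit'."""
    results = []
    for tok in tokens:
        if tok.pos != "NUM" or tok.dep != "nummod":
            continue
        amount = float(tok.text.replace(",", "."))  # accept the decimal comma used in Spanish
        head = tokens[tok.head]
        unit: Optional[str] = UNITS.get(head.text.lower())
        if unit is not None:
            results.append((_item_for_unit(tokens, tok.head).text, amount, unit))
        else:
            results.append((head.text, amount, "unit"))
    return results

# "Helado de 1,5 kg" and "2 hamburguesas con papas", parsed by hand.
helado = [Token("Helado", "NOUN", "root", 0), Token("de", "ADP", "case", 3),
          Token("1,5", "NUM", "nummod", 3), Token("kg", "NOUN", "nmod", 0)]
combo = [Token("2", "NUM", "nummod", 1), Token("hamburguesas", "NOUN", "root", 1),
         Token("con", "ADP", "case", 3), Token("papas", "NOUN", "nmod", 1)]
print(extract_quantities(helado), extract_quantities(combo))
```

Working on syntactic relations rather than on word order means "Helado de 1,5 kg" and "1,5 kg de helado" resolve to the same triple, provided the parser attaches the words the same way.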

We built the data pipeline in Python on Google Cloud Dataproc. It supports multiple inputs and outputs, so it can adapt to changing architecture definitions and make future experimentation easier.

The system can read from and write to Google BigQuery and MongoDB, among other sources.
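One way to support multiple interchangeable inputs and outputs is to hide each backend behind a small read/write interface. The sketch below assumes that design. `Source`, `Sink`, `ListSource`, `ListSink`, `JsonLinesSink` and `run` are hypothetical names, and in-memory backends stand in for BigQuery or MongoDB adapters, which would implement the same `read()`/`write()` methods.

```python
import io
import json
from typing import Callable, Dict, Iterable, Iterator, List, Protocol, TextIO

Record = Dict[str, object]

class Source(Protocol):
    def read(self) -> Iterator[Record]: ...

class Sink(Protocol):
    def write(self, records: Iterable[Record]) -> None: ...

class ListSource:
    """In-memory source; a database-backed reader would expose the same read()."""
    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def read(self) -> Iterator[Record]:
        return iter(self._records)

class ListSink:
    """Collects records in memory, handy for tests."""
    def __init__(self) -> None:
        self.records: List[Record] = []

    def write(self, records: Iterable[Record]) -> None:
        self.records.extend(records)

class JsonLinesSink:
    """Writes one JSON object per line to any text stream."""
    def __init__(self, stream: TextIO) -> None:
        self._stream = stream

    def write(self, records: Iterable[Record]) -> None:
        for record in records:
            self._stream.write(json.dumps(record, ensure_ascii=False) + "\n")

def run(source: Source, transform: Callable[[Record], Record], sinks: List[Sink]) -> None:
    """Read once, transform every record, and fan the results out to all sinks."""
    processed = [transform(record) for record in source.read()]
    for sink in sinks:
        sink.write(processed)

buffer = io.StringIO()
memory = ListSink()
run(ListSource([{"name": "Emparedado"}]),
    lambda r: {**r, "name_norm": str(r["name"]).lower()},
    [JsonLinesSink(buffer), memory])
print(buffer.getvalue())
```

Adding a new storage backend then means writing one class with `read()` or `write()`, while the processing stages stay unchanged.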

Do you want to know more? Contact us.