Case study
Food Ontology
Improving product categorization using Natural Language Processing
for LATAM's largest food delivery service, operating in 9+ countries
PedidosYa is part of Delivery Hero and is the market leader for food delivery in LATAM. It operates in multiple countries in the region and is expanding by acquiring competitors such as Glovo.
The Goal
The client has a vast and diverse food catalog covering all the restaurants in the region that offer their services through the PedidosYa application.
We were given the task of improving the existing categorization and extracting further structured information that would allow PedidosYa to improve its search results, recommendations and decisions.
It was a typical natural language processing problem.
The Data
In this case the data was vast but unlabeled, and labeling the entire dataset was not possible because of time constraints. Natural language complexities were abundant, since the data was entered by small restaurant owners, and language variations from country to country only made things worse.
What was a sandwich in some locations was an emparedado in others, and while some ice cream shops sell by the kilogram, others sell by the litre. Although french fries are usually a side dish, when sold alone they can be a plate you share with others, and what is called Peruvian cuisine in Argentina is just a typical plate in Peru.
Our Solution
Alongside PedidosYa's great team, we designed and built a data processing pipeline that applies natural language processing combined with classifiers in multiple stages. The accuracy of the pipeline components exceeded 94%.
Embeddings were used to extract tags from individual components, and XGBoost and CatBoost were used at different stages to detect multiple components in a single item and to classify each of them correctly.
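To make the two-stage idea concrete, here is a minimal sketch that uses only the Python standard library. It is not the production pipeline: a naive delimiter split stands in for the gradient-boosted component detector, and the three-dimensional vectors, tag names and the functions `split_components`, `tag_component` and `tag_item` are all hypothetical, chosen only to illustrate nearest-neighbour tag lookup over embeddings.

```python
import math
import re

# Hypothetical toy "embeddings"; a real system would use learned word vectors.
TOY_EMBEDDINGS = {
    "hamburguesa": (0.9, 0.1, 0.0),
    "burger": (0.85, 0.15, 0.0),
    "papas": (0.1, 0.9, 0.0),
    "fritas": (0.1, 0.85, 0.05),
    "helado": (0.0, 0.1, 0.9),
}

# Hypothetical prototype vector for each tag we want to extract.
TAG_VECTORS = {
    "burger": (1.0, 0.0, 0.0),
    "fries": (0.0, 1.0, 0.0),
    "ice_cream": (0.0, 0.0, 1.0),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_components(item_name):
    """Stage 1 stand-in: split an item on 'con', 'y', '+' or ','.

    Assumption: the real pipeline used trained classifiers for this step.
    """
    parts = re.split(r"\s+(?:con|y)\s+|\s*[+,]\s*", item_name.lower())
    return [p.strip() for p in parts if p.strip()]


def embed(text):
    """Average the vectors of known tokens; None if no token is known."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.split() if t in TOY_EMBEDDINGS]
    if not vectors:
        return None
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))


def tag_component(component, threshold=0.5):
    """Stage 2: return the closest tag, or None below the similarity threshold."""
    vec = embed(component)
    if vec is None:
        return None
    scored = ((tag, cosine(vec, proto)) for tag, proto in TAG_VECTORS.items())
    best_tag, best_score = max(scored, key=lambda pair: pair[1])
    return best_tag if best_score >= threshold else None


def tag_item(item_name):
    """Split an item into components and tag each one."""
    return [(c, tag_component(c)) for c in split_components(item_name)]
```

Calling `tag_item("Hamburguesa con papas fritas")` splits the item into two components and tags them `burger` and `fries`; a component with no known tokens, such as `pizza` here, gets `None` rather than a forced guess.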

We built the data pipeline in Python on Google’s Dataproc. It supports multiple inputs and outputs so that it can adapt to changing architecture definitions and facilitate future experimentation.

The system can read from and write to Google’s BigQuery or MongoDB, among others.
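One way to get interchangeable inputs and outputs is to hide each data store behind a small interface. The sketch below is an assumption about how such a design could look, not the code that was delivered: `Source`, `Sink`, `Pipeline` and the in-memory classes are hypothetical names, and a BigQuery or MongoDB adapter would simply be another class exposing the same `read` or `write` method.

```python
from typing import Callable, Iterable, Iterator, Protocol

Record = dict  # one catalog row, e.g. {"name": "Papas Fritas"}


class Source(Protocol):
    def read(self) -> Iterator[Record]: ...


class Sink(Protocol):
    def write(self, records: Iterable[Record]) -> None: ...


class InMemorySource:
    """Stand-in for a real reader (hypothetical; no external service needed)."""

    def __init__(self, records: Iterable[Record]) -> None:
        self._records = list(records)

    def read(self) -> Iterator[Record]:
        yield from self._records


class InMemorySink:
    """Stand-in for a real writer; keeps everything it receives in a list."""

    def __init__(self) -> None:
        self.records: list = []

    def write(self, records: Iterable[Record]) -> None:
        self.records.extend(records)


class Pipeline:
    """Reads from every source, applies each stage in order, writes to every sink."""

    def __init__(
        self,
        sources: Iterable[Source],
        stages: Iterable[Callable[[Record], Record]],
        sinks: Iterable[Sink],
    ) -> None:
        self.sources = list(sources)
        self.stages = list(stages)
        self.sinks = list(sinks)

    def run(self) -> int:
        records = [r for source in self.sources for r in source.read()]
        for stage in self.stages:
            records = [stage(r) for r in records]
        for sink in self.sinks:
            sink.write(records)
        return len(records)


def normalize_name(record: Record) -> Record:
    """Example stage: trim and lowercase the item name without mutating the input."""
    return {**record, "name": record["name"].strip().lower()}


if __name__ == "__main__":
    sink = InMemorySink()
    pipeline = Pipeline(
        sources=[InMemorySource([{"name": "  Papas Fritas "}])],
        stages=[normalize_name],
        sinks=[sink],
    )
    pipeline.run()
    print(sink.records)
```

Because stages only see plain records, swapping a data store or adding an experimental stage does not require touching the rest of the pipeline.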
