Cooking is an art – most of us millennials would agree with that, wouldn’t we? Those pots and pans of burnt pasta and inedible rice bear testimony to how delicate an art it is. It takes patience, time, practice and oodles of skill to blend all the ingredients together in the perfect proportions to create that tasty dish. It takes years of practice for a person to become a professional chef. And then, in walk the neural networks, which are learning how to cook from nothing more than an image of our all-time favorite dinner (and breakfast, and lunch, and snack, and everything in between!) – pizza.
A new deep learning study, published on the popular open-access archive arXiv.org and titled “How to make a pizza: Learning a compositional layer-based GAN model”, explores how machine learning can be used to transform a single image of a pizza into a step-by-step guide on how to create that pizza. This is the PizzaGAN project, described as ‘an experiment in how to teach a machine to make a pizza by recognizing aspects of cooking, such as adding and subtracting ingredients or cooking the dish.’
The abstract of the study reads as follows:
“A food recipe is an ordered set of instructions for preparing a particular dish. From a visual perspective, every instruction step can be seen as a way to change the visual appearance of the dish by adding extra objects (e.g. adding an ingredient) or changing the appearance of the existing ones (e.g. cooking the dish). In this paper, we aim to teach a machine how to make a pizza by building a generative model that mirrors this step-by-step procedure. To do so, we learn composable module operations which are able to either add or remove a particular ingredient. Each operator is designed as a Generative Adversarial Network (GAN). Given only weak image-level supervision, the operators are trained to generate a visual layer that needs to be added or removed from the existing image. The proposed model is able to decompose an image into an ordered sequence of layers by applying sequentially in the right order the corresponding removing modules. Experimental results on synthetic and real pizza images demonstrate that our proposed model is able to:
- Segment pizza toppings in a weakly-supervised fashion
- Remove them by revealing what is occluded underneath them (i.e. inpainting), and
- Infer the ordering of the toppings without any depth ordering supervision.”
To put it in a nutshell, the Generative Adversarial Network (GAN) deep learning model is trained to recognize the different steps and objects involved in making a pizza. This lets it look at a single image of a pizza, dissect it into its different objects and layers, and thereby recreate the whole pizza, step by step.
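The composable add/remove idea above can be illustrated with a toy sketch. To be clear, this is not the authors’ code: in the paper each operator is a trained GAN generator acting on pixels, while here we model a pizza simply as an ordered list of layer names, so that the composition logic is easy to follow.

```python
# Toy illustration of PizzaGAN's composable operators: each ingredient
# has an "add" module and a "remove" module that act as inverses.
# In the paper these are GAN generators; here a pizza is just an
# ordered list of layer names.

def add_topping(layers, topping):
    """The 'add' operator: paints a new visual layer on top."""
    return layers + [topping]

def remove_topping(layers, topping):
    """The 'remove' operator: deletes a layer and 'inpaints' what it
    occluded (here, trivially, the layers underneath it)."""
    result = list(layers)
    result.remove(topping)
    return result

# Build a pizza step by step, then undo the last step.
pizza = ["dough"]
pizza = add_topping(pizza, "sauce")
pizza = add_topping(pizza, "cheese")
pizza = add_topping(pizza, "pepperoni")
print(pizza)                               # ['dough', 'sauce', 'cheese', 'pepperoni']
print(remove_topping(pizza, "pepperoni"))  # ['dough', 'sauce', 'cheese']
```

Applying the remove operators one after another, in the right order, is exactly how the model peels an image back into an ordered sequence of layers.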
Now, making a pizza involves multiple steps – kneading the dough, rolling it out into the pizza base, applying the sauce, adding the cheese, adding or removing various toppings, and so on. With the completion of each of these steps, the appearance of the pizza changes. The visuals after each of these steps are fed into the neural network. Once this is done, the machine begins recognizing and connecting each of the steps to the finished product.
In its first stage, the pizza dataset contained approximately 5,500 images, with just about every image being synthetic and created in a clip-art style. According to the team, doing this saved valuable time and made it easier to separate the toppings from the base, while improving the results they got from the neural networks.
Once these synthetic images were added to the pizza dataset, the research team progressed to feeding the GAN an additional 9,213 ‘real’ pizza images procured from the web. The final dataset covered 12 toppings, including arugula, broccoli, corn, basil, mushrooms, tomatoes, bacon, olives and pepperoni. Images of each of these were fed to the model.
Now, when a test image is given to this model, it begins by detecting the toppings that appear on the pizza and classifying each of them. After that, the model predicts the order in which the toppings were added, based on how they appear in the image, from top to bottom.
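The inference loop just described can be sketched in a few lines. Again, this is a hedged toy version, not the authors’ implementation: real PizzaGAN infers the ordering by applying GAN remove-modules to the pixels, whereas this sketch stands in a made-up per-topping visibility score and assumes the least-occluded topping is the current top layer.

```python
# Toy sketch of ordering inference: repeatedly "remove" the topmost
# topping until none remain. The visibility scores are invented for
# illustration; a fully visible topping is assumed to sit on top.

def infer_topping_order(visibility):
    """Return detected toppings ordered from top layer to bottom layer."""
    remaining = dict(visibility)
    order = []
    while remaining:
        # Treat the least-occluded topping as the current top layer,
        # record it, then peel it off and repeat on what is left.
        top = max(remaining, key=remaining.get)
        order.append(top)
        del remaining[top]
    return order

detected = {"olives": 0.95, "cheese": 0.40, "mushrooms": 0.70}
print(infer_topping_order(detected))  # ['olives', 'mushrooms', 'cheese']
```

Reversing that output gives the step-by-step recipe: cheese first, then mushrooms, then olives.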
Thus, when PizzaGAN was shown an image of a pizza, it was able to predict and output a step-by-step guide for creating that pizza.
So, Input: Image of a pizza
Output: Step-by-step guide on how to make that pizza
So far, PizzaGAN has been highly successful, delivering strong accuracy in the results it produces. Unsurprisingly, the highest accuracy was observed on the dataset of synthetic, clip-art-style images.
Currently, the model has only been tested on pizzas, but its applications could extend to other layered food items, such as burgers, sandwiches and salads. The research team also hopes to extend the model to domains beyond food, such as digital fashion shopping assistants, where one of the key operations is virtually combining different layers of clothing.
You could create such amazing models too – we are sure you would love to. Your name could be associated with some of the coolest projects the world has ever seen. But for that, you need to know machine learning like the back of your hand, and the first step in that direction is getting trained and certified in the field. Cognixia, the world’s leading digital talent transformation company, strives to deliver the most up-to-date machine learning training programs to individuals and corporate workforces. To know more about our training programs in the field of machine learning, reach out to us today!