AI Zalando

Zalando

Name of Nominated Company (or Person): Anthony Brew

Nominated Award:
Best Application of AI in a Large Enterprise and Intelligent Automation – Best Use of RPA & Cognitive

Website of Company:
www.zalando.ie

Founded in 2008 in Berlin, Zalando is Europe’s leading online fashion platform. We connect customers, brands and partners in 23 countries.

We are actively working to become the operating system for fashion. To achieve this, we have built the infrastructure that brings together a variety of players in the growing digital fashion market, creating a whole new ecosystem that connects customers, brands and partners. This is made possible by our strong expertise in fashion, technology, AI and convenience, which allows us to offer our customers an unlimited range and convenient services designed to suit their requirements.

The Zalando shop is the core of our platform. This is where customers can find exactly the clothes they are looking for: from leading international brands, through “fast fashion”, to Zalando’s private labels. From the outset, we have been obsessed with serving our customers; for example, we were the first European retailer to offer free shipping and returns within a 100-day window. Customers of the Zalando fashion store can choose from a huge assortment of more than 950,000 products from more than 4,500 brands. For Zalando, this obsessive focus on customers guides our mindset and actions.

In Q2 of 2021 we had 45 million active customers, who after 560 million visits drove more than 66 million orders. Each order contains multiple fashion items from an assortment of over one million products. Being customer centric at this scale means that we need to preserve the personal touch with thoughtfully created user experiences and scalable processes. It also means we strive to meet customers where they are: we allow design, technology and AI to bridge the communication gap technology creates. Our ability to do this at scale is the reason that Zalando has successfully moved from selling flip-flops in 2008 to being the starting point for fashion.

Reason for Nomination:

Zalando’s success stems from its obsession with serving customers. As we have scaled, this has become more challenging. Behind our digital experience we run multiple processes that deliver innovative customer obsession at scale.

We process ~66 million multi-item orders each year. Around 50% of items are returned because, when customers try them, they do not suit. Most returns are due to items not matching customers’ preferences; however, about 100K monthly complaints indicate issues with the quality of products (e.g. loose seam, broken zipper). A small portion of these complaints indicates a broader safety issue (e.g. broken heel) requiring manual lab-based investigations to ensure the safety of all customers. It is imperative to prioritize these cases, act fast and remove unsafe items from sale.

All complaints are initially processed by our front-line customer care team. When they identify a safety issue, they escalate the complaint to the product safety inspection team. However, the rarity of complaints requiring escalation creates a challenge: the front-line team does not exercise the escalation muscle regularly enough. As a result, front-line escalations have low precision and, more worryingly, may miss cases. This human escalation process is an industry and compliance standard.

Leveraging and extending Flair (https://github.com/flairNLP/flair), the state-of-the-art open-source NLP framework created at Zalando, we systematized a human-in-the-loop escalation system to augment the front-line escalation process. We constrained the system to attain high recall (>95%) and optimized it so that it would not overload the safety inspection team. Our monitoring shows precision ~10x higher than the human escalation process, while it forwards ~5x more cases for inspection. This helps to ensure customer safety while significantly reducing our compliance risk.
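As an illustration of tuning under a recall constraint, the sketch below picks the highest score threshold that still reaches a target recall, which is one simple way to forward as few cases as possible without dropping below that target. It is a minimal sketch and not the production logic: the function name pick_threshold, the score convention (higher means more likely a safety issue) and the toy data are all assumptions made for the example.

```python
import numpy as np


def pick_threshold(scores, labels, min_recall=0.95):
    """Return (threshold, recall, precision) for the highest threshold
    whose recall is still >= min_recall.

    scores: model scores, higher = more likely a safety issue (assumed).
    labels: 1 for confirmed safety issues, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    positives = int(labels.sum())
    if positives == 0:
        raise ValueError("need at least one positive example")

    # Candidate thresholds are the observed scores, tried from high to low,
    # so the first one that satisfies the constraint forwards the fewest cases.
    for t in np.sort(np.unique(scores))[::-1]:
        flagged = scores >= t
        true_pos = int((flagged & (labels == 1)).sum())
        recall = true_pos / positives
        if recall >= min_recall:
            precision = true_pos / int(flagged.sum())
            return float(t), recall, precision

    # Unreachable: the lowest threshold flags everything, so recall is 1.0.
    raise RuntimeError("no threshold satisfied the recall constraint")


# Hypothetical validation data, for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 0, 1, 0, 1, 0]
threshold, recall, precision = pick_threshold(toy_scores, toy_labels)
```

In practice the threshold would be chosen on held-out data and re-checked whenever the model is retrained.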

The development of this system has also led to significant advances in the state-of-the-art for machine learning and NLP.

State-of-the-art advances:

1. Zero-shot text classification: While product safety issues fall into several clear taxonomies (e.g. sharp objects or chemical smells), new problems that we have not yet encountered can arise. For example, mould was found in a line of popular sneakers, and we needed to react fast. Zero-shot learning enables us to rapidly deploy a classifier on new classes with little or no training data. This work was published at COLING 2020 (https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf).

2. Label noise: The data we receive is text-based, so human judgement on what should and should not be investigated varies. Label noise has a direct effect on how well our system performs. To address this, we modified the classification task to build on a pre-training task that uses a siamese neural network to learn whether two examples come from the same product quality and product safety subclasses; we call the result task-specific embeddings. The classification performance of this model is on par with multilingual BERT-based models, with the added benefit that similar items can be rapidly indexed and retrieved for presentation using the learnt embeddings stored in FAISS. This enables us to systematize “cleaning” the annotated data when the algorithm observes (potential) human error.

3. Noisy negatives: Our data is heavily skewed (very few items are potentially dangerous). We needed to build a high-recall system, and we noticed at first that our precision, while good, still produced more cases than the inspection agents could handle. We developed a system that samples the un-inspected feedback to include data points that are probably negative but have not been labelled as such because they were never seen by the product safety team (hence the name “noisy negatives”). By augmenting our training set with these examples we observed a 7.5x decrease in the volume of cases forwarded for inspection while retaining high recall.

4. Parallel corpus: We observed that multilingual language models outperform monolingual models trained on data translated into a single target language (e.g. English). However, we faced a cold-start problem when rolling out to all 28 languages, particularly in less active markets. To bootstrap, we augment each language’s starting dataset by translating from all 28 existing languages into that language, and over time we fade out the translated data as we come to rely fully on the new language’s labelled data.
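The fade-out of translated data described in item 4 can be pictured with a small sketch. The linear schedule, the n_target parameter and the function names below are assumptions chosen for illustration; the text above does not state which schedule is actually used.

```python
import random


def translated_share(n_native, n_target):
    """Share of the translated pool to keep (assumed linear fade).

    Returns 1.0 when there are no native labelled examples and 0.0 once
    n_target native examples have been collected.
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    return max(0.0, 1.0 - n_native / n_target)


def build_training_set(native, translated, n_target, seed=0):
    """Combine all native examples with a shrinking random sample of
    machine-translated examples for the same language."""
    share = translated_share(len(native), n_target)
    k = round(share * len(translated))
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return list(native) + rng.sample(list(translated), k)


# Hypothetical data: 250 native complaints, 100 translated ones.
native_examples = [f"native-{i}" for i in range(250)]
translated_examples = [f"translated-{i}" for i in range(100)]
training_set = build_training_set(native_examples, translated_examples, n_target=1000)
```

With 250 of the assumed 1,000 target native examples available, three quarters of the translated pool is still kept; once the target is reached, only native data remains.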

This is a live system, working in real time. As new customer feedback arrives, it continually and automatically retrains. Analytical and technical performance is monitored, and the system is fully GDPR compliant. It now serves all Zalando customers in 23 countries, who speak 18 different languages, helping us do what we do best: obsessing about serving them at scale.

Additional Information:

In the last year we have submitted three publications (one published and two in review) on the back of this work. More detailed summaries of each follow.

1. State-of-the-art approaches for text classification leverage a transformer architecture with a linear layer on top that outputs a class distribution for a given prediction problem. While effective, this approach suffers from conceptual limitations that affect its utility in few-shot or zero-shot transfer learning scenarios. First, the number of classes to predict needs to be pre-defined. In a transfer learning setting, in which new classes are added to an already trained classifier, all information contained in a linear layer is therefore discarded, and a new layer is trained from scratch. Second, this approach only learns the semantics of classes implicitly from training examples, as opposed to leveraging the explicit semantic information provided by the natural language names of the classes.

For instance, a classifier trained to predict the topics of news articles might have classes like “business” or “sports” that themselves carry semantic information. Extending a classifier to predict a new class named “politics” with only a handful of training examples would benefit from both leveraging the semantic information in the name of a new class and using the information contained in the already trained linear layer. This paper presents a novel formulation of text classification that addresses these limitations. It imbues the notion of the task at hand into the transformer model itself by factoring arbitrary classification problems into a generic binary classification problem.
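The factoring into a generic binary problem can be sketched as follows: every (label name, text) pair becomes one yes/no example, so a new class only adds new pairs and never a new output layer. This is a minimal sketch under stated assumptions: the "[SEP]" separator, the function names and the toy_score stand-in (a trivial keyword check used in place of a fine-tuned transformer) are illustrative and are not the Flair implementation.

```python
SEP = " [SEP] "  # assumed separator between label name and text


def to_binary_examples(text, true_label, candidate_labels):
    """Turn one multi-class example into one binary example per label.

    Each item is ("<label> [SEP] <text>", is_match).
    """
    return [(f"{label}{SEP}{text}", label == true_label)
            for label in candidate_labels]


def predict(text, candidate_labels, score_pair):
    """Return the label whose "<label> [SEP] <text>" pair scores highest.

    score_pair is any callable mapping the joined string to a match score;
    in a real system this would be the shared binary transformer.
    """
    return max(candidate_labels,
               key=lambda label: score_pair(f"{label}{SEP}{text}"))


def toy_score(pair):
    """Placeholder scorer: 1.0 if the label name occurs in the text."""
    label, text = pair.split(SEP, 1)
    return 1.0 if label.lower() in text.lower() else 0.0


# Adding "politics" needs no architectural change, only a new label name.
labels = ["business", "sports", "politics"]
examples = to_binary_examples("Shares rallied after the earnings call",
                              "business", labels)
guess = predict("A new politics bill passed today", labels, toy_score)
```

Because the label name is part of the input, the same binary model can be queried with class names it has never been trained on, which is what makes the zero-shot and few-shot settings possible.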

Our first publication, TARS, has been added to the Flair open-source library (https://github.com/flairNLP/flair/blob/master/resources/docs/TUTORIAL_10_TRAINING_ZERO_SHOT_MODEL.md).

This has attracted significant interest in the wider NLP community (e.g. towardsdatascience (https://towardsdatascience.com/zero-and-few-shot-learning-c08e145dc4ed), nlp.town (https://nlp.town/blog/zero-shot-classification/)) and has been cited by other commercial vendors, showing its applicability to many use cases (e.g. by Amazon (https://assets.amazon.science/10/72/e3dcf5174fcdb724a51b492c1fc4/enhancement-and-analysis-of-tars-few-show-learning-model-for-product-attribute-extraction-from-unstructured-texts.pdf)).

2. Current state-of-the-art approaches to text classification typically leverage BERT-style Transformer models with a softmax classifier, jointly fine-tuned to predict class labels of a target task. In our second academic submission, which is currently under review, we instead propose an alternative training objective in which we learn task-specific embeddings of text: our proposed objective learns embeddings such that all texts that share the same target class label should be close together in the embedding space, while all others should be far apart. This allows us to replace the softmax classifier with a more interpretable k-nearest-neighbor classification approach, which is useful for many industrial applications such as the one presented in this submission.
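To make the k-nearest-neighbor idea concrete, the sketch below classifies a text by majority vote among its nearest labelled embeddings and reuses the same lookup to flag examples whose label disagrees with their neighbours, which is one way to surface potential annotation errors. It is a hedged illustration only: it uses brute-force NumPy search rather than FAISS, two-dimensional toy vectors instead of learnt embeddings, and hypothetical function names.

```python
from collections import Counter

import numpy as np


def knn_predict(query, embeddings, labels, k=3):
    """Majority label among the k nearest embeddings (Euclidean distance).

    Returns the predicted label and the neighbour indices, which can be
    shown to a reviewer as the evidence behind the prediction.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    dists = np.linalg.norm(embeddings - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest.tolist()


def flag_suspect_labels(embeddings, labels, k=3):
    """Indices whose label differs from the vote of their k nearest
    neighbours (leave-one-out), as candidates for re-annotation."""
    embeddings = np.asarray(embeddings, dtype=float)
    suspects = []
    for i in range(len(labels)):
        others = [j for j in range(len(labels)) if j != i]
        pred, _ = knn_predict(embeddings[i], embeddings[others],
                              [labels[j] for j in others], k)
        if pred != labels[i]:
            suspects.append(i)
    return suspects


# Toy embeddings: two tight clusters, with index 3 deliberately mislabelled.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
toy_labels = ["safety", "safety", "safety", "quality",
              "quality", "quality", "quality"]
suspect_indices = flag_suspect_labels(toy_embeddings, toy_labels)
```

Returning the neighbour indices alongside the prediction is what makes the approach interpretable: a reviewer can inspect the retrieved complaints rather than a bare score.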

Our final academic submission is a system paper that highlights the use of these techniques in concert.