Interview: Yves Mulkers

Yves Mulkers
Data and Analytics Strategist,
7wData

Columbia Road: You’ve been working in the field of data for years. Would you like to share any concrete recent development areas that you’ve been working on?

Yves: On a really concrete level, just recently I’ve been evaluating data catalogues on the market; it takes a lot of time after the business analysis work to find the right data and identify it across systems. Data catalogues help you find where the data is coming from and how it goes from one hub to another, helping you to find, verify and understand it. Some data catalogues do this with machine learning, so they look at the data, help you identify it and then build that into the model. If you have a business understanding and, say, you’re looking for a client, in one system it might be listed as a client ID and in another as a client number. The data catalogue can look at the data to find patterns and see where there are matches, recognising that client IDs and client numbers are the same thing. This is very helpful because otherwise I have to write queries to identify the data myself, and it’s quite technical to confirm that columns with different labels in different systems actually contain the same data.
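The column-matching idea Yves describes can be sketched in a few lines. This is a minimal illustration, not how any particular data catalogue works: it compares the overlap of the values in two columns (Jaccard similarity) rather than their names, so "client_id" and "client_number" are matched by content. The column names and data are invented.

```python
# Hypothetical sketch: detect that differently named columns in two
# systems hold the same key by comparing their values, not their labels.

def value_overlap(col_a, col_b):
    """Jaccard similarity between the distinct values of two columns."""
    a, b = set(col_a), set(col_b)
    return len(a & b) / len(a | b)

# Toy data standing in for two operational systems.
crm = {"client_id": ["C001", "C002", "C003", "C004"]}
billing = {
    "client_number": ["C002", "C003", "C004", "C005"],
    "invoice_no": ["INV-1", "INV-2", "INV-3", "INV-4"],
}

matches = []
for name_a, values_a in crm.items():
    for name_b, values_b in billing.items():
        score = value_overlap(values_a, values_b)
        if score > 0.5:  # arbitrary threshold for this sketch
            matches.append((name_a, name_b, score))

print(matches)  # client_id matches client_number with overlap 3/5 = 0.6
```

Real catalogue tooling layers profiling, data types and learned models on top of this kind of signal, but the core intuition is the same.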

CR: Do you have any success stories to share where data catalogues have been implemented and utilised?

Yves: There have certainly been some successful implementations of data catalogues in the utility industry. The trouble is humans are very negative — we see the bad in everything. So if you read articles, 80% of data catalogue implementations fail and 80% of machine learning and AI projects fail. But there have definitely been lessons learned, and you need to look at failures from a lessons-learned perspective; in the case of data catalogues, for example, maybe they threw everything from the operational systems into the data catalogue and then manually started to map and identify. So there is a lesson to learn there in starting with a business use case and taking only that data and putting it into your system, analysing it and growing from there. It’s not a big bang approach, but it allows you to go forward step by step and continuously show the value of what you’ve been building.

CR: When it comes to AI, there are some really interesting use cases with global players like Amazon, and maybe Zalando in Europe. But if your turnover is only 1–2 billion dollars a year, what are the use cases for machine learning for digital sales? I haven’t seen too many for smaller companies.

Yves: For smaller shops it’s the typical market basket analysis and these types of algorithms that help. It’s the returning visitor data and the intent data that shows what someone’s been looking at. That’s popping up more and more in ecommerce — you’re looking to buy a fridge, so you look at some on a website, then they combine that data and enrich it with other data and realise “hey, this person is really looking for a fridge!” Then they know to put adverts everywhere — wherever you go online. Now it’s very likely you’ll buy that fridge. A lot of companies are now focusing on intent data for ecommerce because you can find hot leads, so to speak. In B2B it’s much harder of course, because people don’t tend to surf around and vendors don’t have that connection in B2B. The hidden gem is all the digital markers you leave behind where you can identify that a company is very likely to make a new technology investment. It’s not a perfect science, but it gives you some idea of who to approach in the B2B space so you can reach out if there’s an interest.
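The market basket analysis Yves mentions can be illustrated with a short, self-contained example. This is a toy sketch with invented baskets, not a production recommender: it counts how often pairs of items are bought together and derives the confidence of a simple association rule.

```python
# Minimal market basket analysis: co-occurrence counts and rule confidence.
from collections import Counter
from itertools import combinations

# Invented transactions for illustration.
baskets = [
    {"fridge", "power_strip"},
    {"fridge", "power_strip", "cleaner"},
    {"fridge", "cleaner"},
    {"power_strip"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(frozenset(p) for p in combinations(sorted(basket), 2))

# Confidence of the rule "fridge -> power_strip":
# of the baskets containing a fridge, what share also contain a power strip?
conf = pair_counts[frozenset({"fridge", "power_strip"})] / item_counts["fridge"]
print(round(conf, 2))  # 2 of the 3 fridge baskets also contain a power strip
```

Libraries such as mlxtend offer full Apriori-style rule mining, but for a smaller shop even this kind of counting over order history surfaces useful cross-sell signals.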

CR: And B2B companies aren’t really doing that. It would be possible to track not only individuals, but also how many people from one company are visiting different touchpoints. There’s definitely potential there.

Yves: I know which companies are visiting my website from tracking data and this helps me ensure that the message I have on my website addresses my ideal client. A lot of people aren’t aware you can do that. When I see a company has visited I sometimes reach out on LinkedIn and start a conversation — it’s a warm lead rather than just spamming people. There is a lot of MarTech knowledge that can be used, and if you combine that with other kinds of knowledge from finance and technology, you can optimise on all levels. Adding data from your website to your customer profiles can help you to optimise your sales. Think about your complete business in a holistic way and not just your marketing pipeline – I find it fascinating to build up different knowledge from all parts of a business.

CR: A typical challenge that our clients face is combining product data from very different markets and product lines into one aggregated database. Do you have any case studies that show how they can succeed in this process?

Yves: I’ve experienced two cases like this that immediately spring to mind. The first was a retailer who sold a lot of consumer products. They have a distribution hub in Sweden and various sales organisations all over Europe and the project we worked on was supply chain optimisation. The challenge was that they were using siloed systems and every physical product had a different commercial name depending on which country’s database we were looking at. In order to optimise the complete supply chain, including distribution and physical shops, we had to understand the different product names to confirm that we were talking about the same product.

It seems simple but it’s not just a matter of translation — sometimes you need to package your products in different quantities, for example, and this is important because it can be a legal issue. Back in the day, aggregating or aligning products was mostly a technical matter: you had a product identified in your enterprise resource planning (ERP) system in a certain way. But if you look at a product in two different ERP systems you need to ask if it’s really the same type of product and, if not, whether you can look at it in the same way when optimising the supply chain. It’s both a technical challenge and a business challenge: understanding what the product is, but also how you handle, store or ship it — there are various ways to say whether it can be categorised as the same product.

CR: I’ve experienced a similar problem with a client, where product lines have different names in different countries, but the key challenge with ecommerce is they still need to get everything in the same webshop. It’s important to have an overall understanding of the data, which might have different codes and names in different countries. You make a good point that it’s not just about the data IDs, it’s about the whole supply chain including the means of delivery and packaging choices. People often think this is something to figure out at the end of a webshop project, but it can really cause problems if it hasn’t been identified beforehand.

Yves: Yes, and that's if you're talking about products within your own company. If you take a company like Amazon, it’s another step further. And if you want to have the complete supply chain from producer to distribution and so on, it becomes even more complicated. I've attended some conferences where a unique global product identifier was discussed, and I thought, okay, it seems simple, but apparently there are 100 different standards for product identifiers out there. Which one do you choose? It's a big challenge, and that’s just the ID! And then of course, there are an endless number of other attributes to consider. If you look at the packaging, that’s even more complex. With boxed products for example, how many go in each box? In one country it might be 12, in another 24. It’s a big challenge to standardise the data across countries.

CR: How did you go about solving these issues in the project you mentioned?

Yves: I wasn’t involved in that particular part of the standardisation, instead we were confronted with different things in different systems. It was on the master data management (MDM) track where they were standardising the products and saying that these five IDs, say, would become this one ID in the end. We had a preparation list of around 100 products where we could see what the final, global unique identifier for each product would be. This meant we could design our data model to work with both the old and new IDs, and we found a way for the matching and calculating to work as long as we had both sets. As the old IDs were gradually phased out, the new ones came into place. You need to find a way of working with the two systems because you’ll always have a transition period with one system that’s already been migrated to the new product IDs and others that haven’t yet. We needed to anticipate the change on a technical level, while cooperating closely with the people working on the MDM track.
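The dual-ID transition approach can be sketched as a simple resolver. This is a hypothetical illustration with invented IDs, assuming a preparation list that maps many legacy, country-specific IDs to one new global identifier: any ID, old or new, resolves to the same canonical product, so matching keeps working throughout the migration.

```python
# Hypothetical sketch of working with old and new product IDs at once.
# Mapping derived from the MDM preparation list: many old IDs -> one
# global ID. All identifiers here are invented for illustration.
OLD_TO_GLOBAL = {
    "SE-1001": "GLB-42",
    "DE-7734": "GLB-42",
    "FR-0310": "GLB-42",
}

def canonical_id(product_id):
    """Resolve either a legacy ID or a new global ID to the global ID."""
    if product_id in OLD_TO_GLOBAL:  # legacy ID from a not-yet-migrated system
        return OLD_TO_GLOBAL[product_id]
    return product_id                # already a global ID

# Records from a migrated and a non-migrated system match on the same key.
assert canonical_id("SE-1001") == canonical_id("GLB-42") == "GLB-42"
```

Joining all systems through `canonical_id` means migrated and non-migrated sources can be compared during the entire transition period, and the mapping table simply shrinks as the old IDs are retired.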

CR: You mentioned a second case example. What was it?

Yves: Yes, it was an identification of medicinal products (IDMP) project for a pharmaceutical company. In Europe there is often a shortage of pharmaceutical products on the market, so the idea was to standardise products across all pharmaceutical companies to improve availability. There are basically two sorts of products, vaccines and drugs, which are treated differently in terms of how they’re used. We were focusing on vaccines — the vaccine goes into a serum and then into a plunger, and understanding the packaging was almost a project in itself! We had a data model but the biggest complexity that slowed the project down was that there were 7,000 stakeholders around the table trying to drive the project. That’s pharmaceutical companies, lobby groups, legal stakeholders and government representatives — there are so many people with different interests.

We had a good team who were weeding out discrepancies in the data model, but there were still so many errors. So, you learn that there’s always more to do and discuss. The approach was to work for six months, design the model and then throw it out for everyone to comment on. This allows initiatives to be crowd-created. One thing I learned from this project is that if you get something thrown to you to comment on, don’t be judgemental that it’s wrong, just start with it and let it evolve but keep it moving forward. There are so many discussions to be had before projects finally move forward, especially standardisation projects.

CR: That’s an interesting story, and the learning points from it will apply to many other cases. The key is facilitating cooperation and discussion between people who have different opinions and who are coming at things from different angles. Do you have any best practices or tips for how to make such a collaboration go smoothly?

Yves: One thing I learned on the project was that if you have a good vision you can be in the driver’s seat. In the IDMP project there was too much discussion without anything moving forward, so I decided to take on the responsibility and accountability for getting things moving. The other thing I learned was that the data needed to be standardised.

The challenge with pharmaceutical products is that all the complexity is in unstructured data in documents, for example all the possible side effects if you take a medicine. I suggested doing text mining and running a semantic model on it, but I found that the ontology doesn’t yet exist to define everything in the documents. So in the end we hired a bunch of students for fortnightly periods and they just read through all the documents and manually highlighted all the entities and concepts, and we extracted them in a physical, manual way. This gave us our dictionary and context for the text mining, and we had a solid result, which was better than building and training a model that would have taken more time.
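The dictionary-driven approach Yves describes can be sketched briefly. This is an illustrative example, not the project's actual tooling: the entities the students highlighted become a lookup dictionary, and documents are scanned for those known terms instead of training a model. All terms and labels are invented.

```python
# Hypothetical sketch: dictionary-based entity extraction built from
# manually highlighted documents, instead of a trained semantic model.
import re

# Dictionary assembled from the manual highlighting (invented entries).
ENTITY_DICT = {
    "headache": "side_effect",
    "nausea": "side_effect",
    "saline solution": "excipient",
}

# Longest terms first, so multi-word entities match before their parts.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(ENTITY_DICT, key=len, reverse=True)),
    re.IGNORECASE,
)

def extract_entities(text):
    """Return (matched term, entity type) pairs found in the text."""
    return [
        (m.group(0), ENTITY_DICT[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

doc = "May cause headache or nausea; diluted in saline solution."
print(extract_entities(doc))
```

The trade-off is exactly the one discussed above: a hand-built dictionary is cheap and transparent, and it can later seed a trained model if automation becomes worthwhile.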

CR: When it comes to huge amounts of data, I think using students is a really underrated method! Too often companies think they need to automate things even for a one-time analysis, but thousands of lines of product data is not that complex to go through with someone who’s motivated.

Yves: It’s a trade-off and it’s pragmatic — we want to get results and if you build a model it can be more expensive than students or cheap labour. Websites like Amazon Mechanical Turk let you say what you need, and maybe 1,000 people will pick up your task. This allows you to do something manually at first, and if you see that it’s working you can automate the process as you go, but you’ve already reduced the time to market. Twenty years ago I thought that everything could be automated, but actually there are times when human beings are better and faster at completing a task.

CR: I see three key learning points from this discussion, the first being that every data project is really a business project — business should lead it and you should get an end-to-end understanding of the data you’re looking at. The second is the importance of getting people together to facilitate discussion, achieve mutual understanding and then iterate forward. And the third key learning point is to think before you automate — find out which processes you can do manually, experiment with those and only automate after that.

Yves: I agree, and related to that final point, the importance of finding pain points and focusing on fixing those with automation. Once you’ve been through the process you’ll understand it, and then it’s easier to automate in the correct way — it’s very important to find the right steps and to understand why you’re taking these steps. You can scale automation, make the process faster and remove human error, but it’s important to find the right situations in which to apply automation.


YVES MULKERS is a data strategist at 7wData who has been working for over 20 years on IT-related matters, initially on the technical aspects of data and integration and more recently on data strategy and developing data-related business opportunities. Yves is widely recognised as a top-10 influencer in big data, AI, the cloud and digital transformation and he uses his knowledge and experience of operational environments and emerging technologies and capabilities to help companies achieve their data-related goals.