What is needed to run Process Mining?

Got raw data? That’s just the first ingredient! Process Mining loves clean data, so explore, prep, and season your event data to perfection. It’s the secret sauce for cooking up powerful process insights!

What data is needed to get the ball rolling?

Imagine you’re running a lemonade stand, but you have a terrible memory! To understand how well your stand is doing, you decide to track some basic info:

  • Customer ID (CaseID): This is like a number you give each customer. It lets you know it’s the same person coming back for more lemonade (or maybe complaining about a sour batch!).
  • Action Taken (Activity): This is what happened! Did you “Take Order,” “Prepare Lemonade,” or maybe “Resolve Angry Customer Complaint” (hopefully not too often!).
  • Action Time (Timestamp): This is when each action happened. Knowing the order of your actions is crucial!

With just these three pieces of data, Process Mining can be like a tiny spy at your stand. It can see the basic flow of customers, identify any bottlenecks (maybe you’re slow at making lemonade!), and even tell you if some customers are grumpy more often than others (time to improve your recipe!).

Here’s an example of what the data might look like in a table:

Customer ID (CaseID) | Action Time (Timestamp) | Action Taken (Activity)
1 | 10:00 AM | Take Order
1 | 10:02 AM | Prepare Lemonade
1 | 10:05 AM | Serve Customer
2 | 10:03 AM | Take Order
2 | 10:10 AM | Resolve Angry Customer Complaint (yikes!)
2 | 10:12 AM | Prepare Lemonade
2 | 10:15 AM | Serve Customer (hopefully happier this time!)

This may seem like very little information, but it’s enough for Process Mining to start asking questions and uncovering some basic insights into your lemonade stand’s efficiency!
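To make that concrete, here is a minimal sketch in plain Python of what our tiny spy does with the table above. It groups the events by CaseID, sorts them by timestamp, and measures the minutes between consecutive steps. The function names (`build_traces`, `step_durations`) are made up for this illustration, and the cheeky comments from the table are left out of the activity names.

```python
from collections import defaultdict
from datetime import datetime

# Event log from the table above: (case_id, timestamp, activity)
events = [
    (1, "10:00 AM", "Take Order"),
    (1, "10:02 AM", "Prepare Lemonade"),
    (1, "10:05 AM", "Serve Customer"),
    (2, "10:03 AM", "Take Order"),
    (2, "10:10 AM", "Resolve Angry Customer Complaint"),
    (2, "10:12 AM", "Prepare Lemonade"),
    (2, "10:15 AM", "Serve Customer"),
]

def parse_time(text):
    """Turn a clock time like '10:02 AM' into a datetime we can subtract."""
    return datetime.strptime(text, "%I:%M %p")

def build_traces(events):
    """Group events by case and sort each case by timestamp."""
    cases = defaultdict(list)
    for case_id, ts, activity in events:
        cases[case_id].append((parse_time(ts), activity))
    return {case_id: sorted(evts) for case_id, evts in cases.items()}

def step_durations(trace):
    """Minutes elapsed between each pair of consecutive activities."""
    return [
        (a, b, (t2 - t1).total_seconds() / 60)
        for (t1, a), (t2, b) in zip(trace, trace[1:])
    ]

traces = build_traces(events)
for case_id, trace in traces.items():
    print(case_id, [activity for _, activity in trace])
    for a, b, minutes in step_durations(trace):
        print(f"  {a} -> {b}: {minutes:.0f} min")
```

Even on seven rows, the longest gap stands out: customer 2 waited 7 minutes between ordering and the complaint being handled, which is the kind of bottleneck hint the three columns make possible.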

1: The Case of the Missing Lemonade Logs

Our lemonade stand was a smash hit! Customers loved our secret recipe (mostly), and business was booming. But with great success came a new challenge: we were drowning in customers. Lines were long, tempers were flaring, and worst of all, we had no idea why!

Remember that tiny spy we hired (Process Mining)? Turns out it can’t work miracles. It needs good intel, and all we had were a few scribbles on a napkin. Here’s where things got messy:

  • The Data Detective: Our first problem was finding all the juicy details. Customer orders were scattered across sticky notes, loose receipts, and even a crumpled napkin in our pocket (gross!). It was like a detective story, piecing together information from all these random sources (databases, flat files, message logs, you name it!).
  • Speaking the Same Language: Even when we found the data, it wasn’t always clear. Some notes said “Customer Happy!” while others just had a frowny face. We needed a translator (data standardization) to make sure the spy understood what each scribble meant.
  • Asking the Right Questions: Finally, we had to figure out what we actually wanted to know. Were the lines long because people were slow at ordering, or maybe we were taking too long to make the lemonade? Asking the right questions helped us focus our data collection (different views on the data).
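The "translator" idea can be sketched in a few lines of Python. The mapping below is purely hypothetical (we are guessing which scribble means which activity); the point is only that every free-form note ends up as one agreed activity name, and anything unrecognized is set aside for a human to look at instead of being guessed.

```python
# Hypothetical mapping from napkin scribbles to standard activity names.
# Both the scribbles and their meanings are assumptions for illustration.
STANDARD_ACTIVITIES = {
    "customer happy!": "Serve Customer",
    ":(": "Resolve Angry Customer Complaint",
    "order taken": "Take Order",
    "took order": "Take Order",
    "made lemonade": "Prepare Lemonade",
}

def standardize(note):
    """Return the standard activity for a note, or None if it is unknown."""
    # Lowercase and collapse whitespace so " Took  ORDER " matches "took order".
    key = " ".join(note.strip().lower().split())
    return STANDARD_ACTIVITIES.get(key)

def standardize_all(notes):
    """Split notes into standardized activities and notes needing review."""
    clean, unknown = [], []
    for note in notes:
        label = standardize(note)
        if label is None:
            unknown.append(note)
        else:
            clean.append(label)
    return clean, unknown

clean, unknown = standardize_all(["Customer Happy!", " took  ORDER ", ":(", "???"])
print(clean)
print(unknown)
```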

It turns out, cleaning up this data mess was a whole new adventure. But in the next chapter, you’ll see how, with a little detective work and some help from our data-loving spy, we were able to optimize our lemonade stand and become the envy of the neighborhood!

2: The Great Data Dig

Our lemonade stand was a smash hit, but the lines were a nightmare! We knew we needed our data spy (Process Mining) to help us out, but first, we had to get it some decent intel. That meant a deep dive into the world of data extraction – basically, finding all the hidden clues about our customers and turning them into something our spy could understand.

Here’s what we discovered:

  • Treasure Hunt: Sometimes, the data was like buried treasure – hidden in dusty corners of our systems (web pages, emails, PDFs). We had to become data archaeologists, digging through old files and using fancy tools (screen scraping) to unearth the information we needed.
  • Lost in Translation: Even when we found the data, it wasn’t always clear. Some clues were scribbled on napkins (unstructured data) and others were locked away in a secret code (missing metadata). We needed a translator (data standardization) to decipher it all.
  • Focus is key: With so many data sources (thousands of tables!), it was tempting to just grab everything. But just like you wouldn’t try every flavor combination at the ice cream shop, we needed to focus on the questions we wanted answered. Did customers take too long to order, or were we the bottleneck making lemonade? Focusing on these key questions helped us prioritize which data to extract.
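As a rough sketch of the digging, suppose some of our clues live in free-text lines. The line format below (`<time> customer <id>: <activity>`) is an assumption invented for this example, not a real export format. A regular expression pulls out the three fields we care about and sets aside everything else.

```python
import re

# Assumed line format: "<time> customer <id>: <activity>"
LINE_PATTERN = re.compile(
    r"(?P<time>\d{1,2}:\d{2}\s*[AP]M)\s+customer\s+(?P<case>\d+):\s*(?P<activity>.+)",
    re.IGNORECASE,
)

def extract_events(lines):
    """Return (case_id, timestamp, activity) tuples plus the lines we skipped."""
    events, skipped = [], []
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:
            events.append(
                (int(match["case"]), match["time"].upper(), match["activity"].strip())
            )
        else:
            skipped.append(line)  # Not an event; keep it for manual review.
    return events, skipped

raw_lines = [
    "10:00 AM customer 1: Take Order",
    "note to self: buy more lemons",
    "10:03 am customer 2: Take Order",
]
events, skipped = extract_events(raw_lines)
print(events)
print(skipped)
```

Keeping the skipped lines around, instead of silently dropping them, makes it easier to spot when the assumed format misses real events.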

It wasn’t easy, but with a little elbow grease and a healthy dose of curiosity, we managed to unearth a treasure trove of data. In the next chapter, we’ll see how we cleaned up this mess and finally got our data spy working for us!

3: The Data Detox

We had a mountain of data, thanks to our heroic extraction efforts (see Chapter 2). But hold on to your hats, because this data was a mixed bag – some useful customer info, some random scribbles, and a whole lot of stuff we just didn’t need. It was time for a data detox!

Filtering became our new best friend. Think of it like sorting through a messy toolbox. We started with big picture stuff (coarse-grained scoping) when we extracted the data. Now, it was time to get detailed (fine-grained scoping).

Here’s how we tackled the filtering challenge:

  • Focus on the Stars: Imagine the most frequent customer orders as the shiny new tools in our toolbox. We decided to focus on the top 10 most common activities (ordering, waiting, receiving lemonade) to keep things manageable for our data spy. The rest could wait in the back of the shed (for now).
  • Iteration is Key: Filtering wasn’t a one-time thing. As our data spy started analyzing the clean data, it pointed us towards new areas to focus on. It was like a detective following leads, constantly refining our filter based on new insights.
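A minimal sketch of the "focus on the stars" filter, assuming events are stored as (case_id, timestamp, activity) tuples. The function name `keep_top_activities` and the sample rows are made up for this example; the cutoff is a parameter precisely because filtering is iterative and you will want to rerun it with different values.

```python
from collections import Counter

def keep_top_activities(events, top_n=10):
    """Keep only events whose activity is among the top_n most frequent."""
    counts = Counter(activity for _, _, activity in events)
    keep = {activity for activity, _ in counts.most_common(top_n)}
    return [event for event in events if event[2] in keep]

# Hypothetical sample log: (case_id, timestamp, activity)
sample = [
    (1, "10:00 AM", "Take Order"),
    (1, "10:02 AM", "Prepare Lemonade"),
    (2, "10:03 AM", "Take Order"),
    (2, "10:04 AM", "Refill Ice"),
    (2, "10:06 AM", "Prepare Lemonade"),
    (3, "10:07 AM", "Take Order"),
]

# Use a small cutoff here so the effect is visible on six rows.
filtered = keep_top_activities(sample, top_n=2)
print(filtered)
```

With `top_n=2`, the one-off "Refill Ice" event is dropped and the five "Take Order" and "Prepare Lemonade" events remain.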

With the data sparkling clean (well, mostly clean), it was finally time to unleash the real power of our data spy (Process Mining) in the next chapter! We’d explore different techniques like discovery, conformance, and enhancement to diagnose our lemonade stand’s problems and become the most efficient lemonade operation on the block!

4: The Data Makeover

Our data detox (Chapter 3) did wonders, but there was still one crucial step before unleashing our data spy (Process Mining): the data makeover! Imagine a customer walking into our stand with a crumpled dollar bill. We wouldn’t reject them, but it would be a lot easier to handle if the bill was crisp and clean. That’s the idea behind data cleaning.

Here’s what we needed to do:

  • Case Closed: A process is like a customer’s journey – it has a beginning, middle, and end. We needed to connect all the events related to a single customer (case) – their order, wait time, and finally, receiving their lemonade. Think of it as organizing all the receipts for a single customer visit.
  • Speaking Process: Our data wasn’t always talking the process language. Activities needed to be clearly defined as status changes for each customer’s journey (case). For example, “Customer Happy!” wasn’t specific enough. We needed a clear status like “Lemonade Delivered.”
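Both points can be sketched together: collect each customer's events into one case, then check that the journey has a proper beginning and end. The start and end statuses ("Take Order" and "Lemonade Delivered") and the use of minutes since opening as timestamps are assumptions for this illustration.

```python
from collections import defaultdict

# Assumed first and last status of a complete customer journey.
START, END = "Take Order", "Lemonade Delivered"

def split_cases(events):
    """Group events into cases and separate complete from incomplete ones.

    events: iterable of (case_id, minute, activity), where minute is the
    number of minutes since the stand opened (an assumption for simplicity).
    """
    cases = defaultdict(list)
    for case_id, _minute, activity in sorted(events, key=lambda e: (e[0], e[1])):
        cases[case_id].append(activity)
    complete = {c: acts for c, acts in cases.items()
                if acts[0] == START and acts[-1] == END}
    incomplete = {c: acts for c, acts in cases.items() if c not in complete}
    return complete, incomplete

# Hypothetical events, deliberately out of order.
events = [
    (1, 5, "Lemonade Delivered"),
    (1, 0, "Take Order"),
    (1, 2, "Prepare Lemonade"),
    (2, 3, "Take Order"),
    (2, 9, "Prepare Lemonade"),  # Customer 2 never got their lemonade.
]

complete, incomplete = split_cases(events)
print("complete:", complete)
print("incomplete:", incomplete)
```

Incomplete cases are not necessarily errors (the customer may still be waiting), so this sketch only separates them rather than deleting them.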

It wasn’t the most glamorous part of the adventure, but with a little data wrangling and some clear thinking, we finally had a sparkling clean dataset! With this transformed data in hand, our data spy could finally uncover the secrets behind our long lines and help us turn our lemonade stand into a bubbling beacon of efficiency (and deliciousness)!