Where to Source and Structure Data for Process Mining
Harness process mining to enhance workflows by extracting and properly structuring data from existing systems, ensuring accurate analysis.
Extracting Data from SAP for Purchase-to-Pay (P2P) Process Mining
To analyze the Purchase-to-Pay process with process mining, you need to extract a high-quality event log from SAP. Below are the typical steps and structures involved.
Before extracting data, clarify the process steps you want to include. A standard P2P process consists of:
You’ll usually pull data from the following tables:
Process Step | SAP Table | Description |
---|---|---|
Purchase Requisition | EBAN | Purchase Requisition Header |
Purchase Order | EKKO / EKPO | PO Header (EKKO) & Items (EKPO) |
Goods Receipt | MKPF / MSEG | Material Document Header & Line Items |
Invoice Receipt | BKPF / BSEG | Accounting Document Header & Line Items |
Payment | PAYR / REGUH | Payment Document Tables |
The following fields are typically required for process mining. You can adjust based on your specific needs.
Case ID | PO Number / Document No. | Groups all events in a single process |
---|---|---|
Activity | Derived (e.g., “PO Created”) | Name of process step |
Timestamp | AEDAT, BUDAT, CPUDT | When the step occurred |
User/Agent | ERNAM, USNAM, etc. | Who executed the action |
Amount | WRBTR, DMBTR | Value associated with the step |
Vendor / Material | LIFNR, MATNR | Partner or material info |
SAP ECC (ABAP stack)
SAP S/4HANA (HANA stack)
Minimum columns needed for process mining
Column | Example | Example |
---|---|---|
Case ID | Unique identifier for the process instance | Purchase Order ID |
Activity | Name of the process step | ”Create PO”, “Post Invoice” |
Timestamp | Date and time of the event | 2025-04-17 10:22:12 |
Optional columns to enrich the event log
Column Name | Description |
---|---|
User | User who performed the action |
Amount | Monetary value associated with the event |
Vendor / Material | Vendor or material involved in the event |
Company Code | Company code associated with the event |
Currency | Currency of the transaction |
Document Type | Type of document (e.g., PO, GR, Invoice) |
Other Fields | Amount, Vendor, Currency, etc |
tip
Unpivot your data to turn multiple date fields into one row per activity if needed.
Once you have the raw data, you may need to transform it into a format suitable for process mining. This often involves:
Tools you can use:
Data cleaning and preparation is a crucial step in the data analysis process. It involves identifying and correcting errors, inconsistencies, and inaccuracies in the data to ensure its quality and reliability.
" Example ABAP code to extract Purchase-to-Pay event log data
REPORT z_p2p_event_log_export.
TABLES: ekko, ekpo, mseg, mkpf, rbkp, rseg.
DATA: BEGIN OF event_log OCCURS 0,
case_id TYPE ekko-ebeln,
activity TYPE char40,
timestamp TYPE sy-datum,
username TYPE sy-uname,
amount TYPE ekpo-netwr,
END OF event_log.
START-OF-SELECTION.
" 1. Extract Purchase Order creation
SELECT ebeln aedat ernam netwr
INTO (event_log-case_id, event_log-timestamp, event_log-username, event_log-amount)
FROM ekko
WHERE aedat >= '20240101'.
event_log-activity = 'Create Purchase Order'.
APPEND event_log.
ENDSELECT.
" 2. Extract Goods Receipt (MIGO)
SELECT mblnr budat usnam ebeln
INTO (event_log-case_id, event_log-timestamp, event_log-username, event_log-amount)
FROM mseg
INNER JOIN mkpf ON mseg~mblnr = mkpf~mblnr
WHERE bwart = '101' AND budat >= '20240101'. " GR document type
event_log-activity = 'Goods Receipt'.
APPEND event_log.
ENDSELECT.
" 3. Extract Invoice Posting (MIRO)
SELECT ebeln bldat usnam wrbtr
INTO (event_log-case_id, event_log-timestamp, event_log-username, event_log-amount)
FROM rseg
INNER JOIN rbkp ON rseg~belnr = rbkp~belnr
WHERE bldat >= '20240101'.
event_log-activity = 'Invoice Posted'.
APPEND event_log.
ENDSELECT.
" 4. Export results to file (optional, if permissions allow)
DATA: filename TYPE string VALUE '/usr/sap/tmp/p2p_event_log.csv',
file TYPE string,
rc TYPE i.
OPEN DATASET filename FOR OUTPUT IN TEXT MODE ENCODING DEFAULT.
LOOP AT event_log.
CONCATENATE event_log-case_id
event_log-activity
event_log-timestamp
event_log-username
event_log-amount
INTO file SEPARATED BY ';'.
TRANSFER file TO filename.
ENDLOOP.
CLOSE DATASET filename.
WRITE: / 'Event log exported successfully.'.
Section | Purpose |
---|---|
ekko | Header table for PO creation |
mseg + mkpf | For Goods Receipt (GR) entries |
rseg + rbkp | For Invoice Receipt (IR) entries |
OPEN DATASET | Writes the event log as a CSV file to the server (check permissions) |
bwart = ‘101’ | Filter for GR movement types |
You can use SQL queries to extract data from the relevant tables. Below are some example queries for each step of the P2P process. Adjust the queries based on your specific requirements and SAP version.
Here’s a simple example
-- Purchase Order creation
SELECT
EKKO.EBELN AS CaseID,
'PO Created' AS Activity,
EKKO.AEDAT AS Timestamp,
EKKO.ERNAM AS User,
EKKO.LIFNR AS Vendor,
EKKO.WAERS AS Currency,
EKKO.BUKRS AS CompanyCode
FROM EKKO
WHERE EKKO.BSART = 'NB'; -- Standard PO type
-- Goods Receipt
SELECT
MSEG.EBELN AS CaseID,
'Goods Receipt' AS Activity,
MKPF.BUDAT AS Timestamp,
MKPF.USNAM AS User,
MSEG.MATNR AS Material,
MSEG.MENGE AS Quantity
FROM MSEG
JOIN MKPF ON MSEG.MBLNR = MKPF.MBLNR
WHERE MSEG.BEWTP = 'R'; -- Receipt type
-- Invoice Receipt
SELECT
BSEG.BELNR AS CaseID,
'Invoice Posted' AS Activity,
BKPF.BUDAT AS Timestamp,
BKPF.USNAM AS User,
BSEG.WRBTR AS Amount
FROM BSEG
JOIN BKPF ON BSEG.BELNR = BKPF.BELNR
WHERE BKPF.BLART = 'RE'; -- Invoice type
You can unify these logs into a single event log table for process mining:
-- Combine event logs (simplified UNION)
SELECT * FROM (
SELECT 'PO Created' AS Activity, EBELN AS CaseID, AEDAT AS Timestamp FROM EKKO
UNION
SELECT 'Goods Receipt', EBELN, BUDAT FROM MSEG JOIN MKPF ON MSEG.MBLNR = MKPF.MBLNR
UNION
SELECT 'Invoice Posted', BELNR, BUDAT FROM BKPF
) AS EventLog
ORDER BY CaseID, Timestamp;
Once you have a unified structure with at least: • Case ID • Activity • Timestamp
Pro Tip
• Normalize timestamps to the same timezone. • Make sure your case IDs are consistent across all events. • Clean the data to remove test or cancelled documents. • Use filters like company code, document type, or fiscal year to narrow the scope.