Python Web Scraping in SAP Business Objects CMC: Get the List of Schedules

Luca Liu - Dec 12 '23 - - Dev Community

Introduction

Recently, I faced a mission: organizing all SAP Business Objects schedules into an Excel file. The manual process was tediousโ€”copying each schedule name and details and pasting them into Excel. With hundreds of schedules, it meant endless copy-pasting.

I scoured the internet for solutions and found that the primary solution is Query Builder. However, exporting the Power Builder results from the web into an Excel file is still challenging.

I also tried the SAP Business Objects API, it could fetch all schedules, but the number of recurring schedules is not the same as what we saw in SAP BO CMC.

So, I turned to Python, and it works! If you're grappling with the same challenge, my code might be the solution you're seeking.

Importing Libraries

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time
Enter fullscreen mode Exit fullscreen mode

Here, itโ€™s importing necessary libraries. pandas for data manipulation, selenium for web scraping, BeautifulSoup for HTML parsing, and time for adding pauses between operations.

Setting up SAP BO Connection Details

bo_cmc = 'http://localhost:port/BOE/CMC'
bo_user_name = 'username'
bo_password = 'password'
Enter fullscreen mode Exit fullscreen mode

It defines the SAP BusinessObjects Central Management Console (CMC) URL and login credentials.

Function to Get Soup (HTML Content)

def get_soup(driver, xpath):
    element = driver.find_element(By.XPATH, xpath)
    soup = BeautifulSoup(element.get_attribute("innerHTML"), 'lxml')
    soup.prettify()
    return soup
Enter fullscreen mode Exit fullscreen mode

This function uses Selenium to locate an HTML element using XPath, extracts its inner HTML, and then creates a BeautifulSoup object for parsing.

Functions to Convert HTML to List and DataFrame

def bo_instances_html_to_list(soup):
    n = int(len(soup.select('tr td div')) / 16)
    values = [[] * n for _ in range(n)]
    for i in range(n):
        for j in range(16):
            values[i].append(soup.select('tr td div')[i * 16 + j].text)
    return values

def bo_values_to_df(values):
    df = pd.DataFrame(values)
    df.columns = ['', 'title', 'type', 'status', 'location', 'owner', 'completion_time', 'next_run_time',
                  'submission_time',
                  'start_time', 'duration', 'recurrence', 'expiry', 'server', 'error', 'title2']
    return df

def bo_instance_to_df(soup):
    values = bo_instances_html_to_list(soup)
    df = bo_values_to_df(values)
    return df
Enter fullscreen mode Exit fullscreen mode

These functions convert HTML content of SAP BO instances into a list and then into a Pandas DataFrame.

Selenium Automation Function

def bo_selenium_to_df():
    options = webdriver.ChromeOptions()
    prefs = {'profile.default_content_settings.popups': 0}
    options.add_experimental_option('prefs', prefs)
    options.add_experimental_option("detach", True)
    driver = webdriver.Chrome(options=options)
    driver.get(bo_cmc)
    driver.switch_to.frame("servletBridgeIframe")
    time.sleep(3)

    username_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:USERNAME"]')
    password_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:PASSWORD"]')

    time.sleep(1)
    username_input.send_keys(bo_user_name)
    time.sleep(1)
    password_input.send_keys(bo_password)
    time.sleep(1)

    login_button = driver.find_element(By.XPATH, '//*[@id="_id2:logon:logonButton"]')
    time.sleep(1)
    login_button.click()

    time.sleep(5)
    html_text = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(2)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(2)

    instance_manager_href = driver.find_element(By.XPATH, '//*[@id="manageList"]/li[1]/a[2]')
    instance_manager_href.click()
    time.sleep(1)

    # instancemanager
    ## switch to innerContent iframe
    html_page = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(1)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(1)

    frames = []

    soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
    n = int(len(soup.select('tr td div')) / 16)
    df_start = bo_instance_to_df(soup=soup)
    frames.append(df_start)

    # In SAP BusinessObjects (SAP BO), schedules default to displaying 50 schedules per page.
    while n == 50:
        next_page_button = driver.find_element(By.XPATH, '//*[@id="UniversalRepositoryExplorer_goForwardButton"]')
        next_page_button.click()
        time.sleep(2)
        soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
        df = bo_instance_to_df(soup=soup)
        frames.append(df)
        n = int(len(soup.select('tr td div')) / 16)

    df = pd.concat(frames, axis=0)
    driver.close()
    drop_column = ['', 'completion_time', 'start_time', 'duration', 'server', 'error', 'title2']

    for column in drop_column:
        df = df.drop(column, axis=1)

    df['index'] = range(1, len(df) + 1)
    df = df[['index', 'title', 'type', 'status', 'location', 'owner', 'next_run_time',
             'submission_time', 'recurrence', 'expiry']]
    return df
Enter fullscreen mode Exit fullscreen mode

This function uses Selenium to automate the process of logging into SAP BO, navigating to the schedule manager, extracting schedule details, and finally converting them into a Pandas DataFrame.

Executing the Selenium Function

df = bo_selenium_to_df()
Enter fullscreen mode Exit fullscreen mode

This line calls the Selenium function and stores the resulting DataFrame in the variable df.


Explore more

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.

๐Ÿš€ Connect with me on LinkedIn

๐ŸŽƒ Connect with me on X

๐ŸŒ Connect with me on Instagram

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .