GBase 8a Migration Plan Based on Netezza (2) - GBase 8a Replacement Project Cases

Cong Li - Jul 29 - Dev Community

1. Big Data Platform Project

1.1 Project Overview

Project Background

Since 2010, this project has been building the technological foundation for big data analysis and has independently developed an off-site foreign exchange inspection system. The system includes nearly 200 analysis indicators and covers the parties involved in foreign exchange: banks, non-bank financial institutions, enterprises, and individuals. It monitors all foreign exchange transactions, including trade in goods and direct investment, together with their counterparties and handling banks. Through big data analysis, the system identifies suspicious and illegal transactions in massive amounts of foreign exchange data, uncovers clues to significant cross-regional foreign exchange violations, and traces the overall path of the funds involved, quickly and accurately pinpointing entities that violate foreign exchange laws and regulations.

A special inspection of foreign exchange compliance was conducted at several banks, including Industrial and Commercial Bank of China, Agricultural Bank of China, Bank of China, China Construction Bank, Bank of Communications, CITIC Bank, and China Merchants Bank. It focused on cross-market, cross-industry, and cross-border arbitrage, and on the use of financial derivative transactions conducted by banks on behalf of customers, as well as innovative foreign exchange products, to evade or violate regulatory requirements.

Requirements Analysis

Banks are the main entities involved in foreign exchange transactions, and this project has established a regular inspection mechanism based on big data analysis of banks. In recent years, building on a thorough understanding of the systems and database structures behind banks' foreign exchange businesses, a bank data analysis team was formed to extract bank data quarterly and conduct off-site analysis and monitoring of banks' foreign exchange business. The goal is to understand banks' overall positions in foreign exchange settlements, cross-border capital transactions, and other foreign exchange operations. The key challenge is choosing suitable software and hardware products to build a big data platform that can analyze massive amounts of data effectively.

1.2 Solution

The GBase 8a MPP Cluster product stood out among many alternatives and was selected as the core product for this project's big data platform. It has been implemented in the off-site inspection project for banks, where it handles foreign exchange data and bank data and stores about 10 years of full historical data. Future expansion of GBase products will follow business needs and will support more complex projects. In this phase, the main task of GBase 8a MPP Cluster is to handle data transmitted from external banks, store and manage internal data, and provide timely and effective query and analysis support for the upper application layer. This phase uses six high-performance Sugon servers to deploy a six-node GBase 8a MPP Cluster, with three of the nodes acting as coordinator (management) nodes, all six acting as data nodes, and one file server.
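The node counts above only add up if coordinator and data roles share servers. The following minimal sketch models that reading. The host names and the `Node` type are hypothetical and are not taken from the project or from any GBase tooling, and the file server is left out because the source does not say how it relates to the six servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host: str              # hypothetical host name
    is_coordinator: bool   # runs the coordinator (management) role
    is_data: bool = True   # every node in this phase also stores data


def build_cluster(hosts, coordinator_count):
    """Mark the first `coordinator_count` hosts as coordinators; all hosts hold data."""
    return [Node(host, index < coordinator_count) for index, host in enumerate(hosts)]


hosts = [f"node{i}" for i in range(1, 7)]   # six servers, names are placeholders
cluster = build_cluster(hosts, coordinator_count=3)

coordinators = [n.host for n in cluster if n.is_coordinator]
data_nodes = [n.host for n in cluster if n.is_data]

print("coordinators:", coordinators)
print("data nodes:  ", data_nodes)
```

In this reading, three servers carry both roles and the other three carry only the data role.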

1.3 Application Effectiveness

  • Performance Improvement: The GBase 8a MPP Cluster product fully meets actual business needs, enabling efficient queries of massive amounts of data and greatly enhancing the performance of the foreign exchange business big data platform.
  • High-Quality Service Assurance: The all-in-one product service provides comprehensive assurance for users.

2. Data Warehouse Project for a Commercial Bank

2.1 Project Overview

Project Background

The bank has accumulated a wealth of business data, and the total volume is growing rapidly. The original Netezza data warehouse system has hit usage bottlenecks and can no longer meet the bank's growing data output needs, so new ideas, methods, and technologies are needed to solve these problems step by step. Given the trend in the banking industry toward MPP databases on open x86 hardware for structured data processing platforms and analysis applications, this project adopts an MPP database on x86 servers to replace the original Netezza data warehouse appliance.

Key Issues

The original data warehouse system uses the Netezza data warehouse appliance, with approximately 25TB of available raw data capacity. The replacement aims to enhance data processing capacity, improve batch data processing efficiency, and strengthen the horizontal scalability of the database. The construction also emphasizes building an information security system to improve overall data security.

The main issues with the original data warehouse system are:

  • The original database capacity has reached its limit with increasing business data.
  • The original data warehouse experiences downtime, posing certain security risks.
  • The original Netezza data warehouse is past its maintenance period, and technical support responses are slow.

Construction Requirements

The MPP data warehouse platform needs to meet the following requirements:

  • Low Hardware Cost: Fully utilize x86 architecture PC servers without the need for expensive Unix servers and disk arrays.
  • High Scalability and Reliability: Support online expansion and contraction of cluster nodes; backup and disaster recovery capabilities without data loss.
  • Standards Compliance: Comply with the SQL-92 standard and provide JDBC and ODBC interfaces.
  • Platform Support: Run on x86 servers and Linux.
  • Technical Advancement: Meet the current development needs of data warehouses and big data, with suitably advanced features.

2.2 Solution

The replacement data warehouse uses GBase 8a MPP Cluster for unified storage, management, information sharing, and data resource services over massive amounts of data, and it serves as the foundation for application systems. It establishes separate subject areas for different businesses and builds a complete architecture covering data collection, loading, storage, analysis, and application display.

Overall System Architecture

  • Data Source Layer: Various existing business systems of the bank.
  • Extraction and Loading Layer: ETL tools extract massive amounts of data from the source systems, then transform and load it.
  • Storage Management Layer: Built using GBase 8a MPP Cluster. After cleaning, the data is distributed to each node according to certain rules, establishing the main data warehouse and data marts. The scale of each mart varies with the business area it serves.
  • Analysis and Display Layer: The bank uses third-party analysis and mining tools to extract data from the data warehouse or data marts for further analysis and load it into corresponding business modules.
  • Application Portal Layer: The bank's internal or external systems organize the required data through middleware and present it via a portal website.
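The Storage Management Layer above distributes cleaned data to nodes "according to certain rules", without naming the rules. One common rule in MPP databases is hash distribution on a chosen column. The sketch below illustrates that general idea only: the hash function, the `account_id` column, and the sample rows are assumptions for illustration and do not reproduce GBase 8a's actual distribution algorithm.

```python
import hashlib


def node_for_key(key: str, node_count: int) -> int:
    """Map a distribution-key value to a node index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(rows, key_field, node_count):
    """Group rows by the node that owns their distribution-key value."""
    shards = {index: [] for index in range(node_count)}
    for row in rows:
        shards[node_for_key(str(row[key_field]), node_count)].append(row)
    return shards


# Hypothetical sample data; four nodes mirrors the 4-node clusters in this case.
rows = [{"account_id": f"ACC{i:05d}", "amount": i * 10} for i in range(1000)]
shards = distribute(rows, key_field="account_id", node_count=4)

sizes = {node: len(node_rows) for node, node_rows in shards.items()}
print(sizes)
```

Because the same key always maps to the same node, rows that share a key value land together, which lets joins and aggregations on that key run locally on each node. A key with many distinct values tends to spread rows more evenly.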

2.3 Application Effectiveness

Implementation Results

This commercial bank deployed two 4-node GBase 8a MPP Clusters in an active-active configuration. Once jobs complete on the primary cluster, synchronization tools automatically propagate the results to the standby cluster. The migration of approximately 25TB of business data from Netezza to GBase 8a MPP Cluster has been completed, and the system has been running stably for over 400 days.

Benefits and Value

  • Dynamic Expansion: The system is highly scalable and supports dynamic cluster expansion, with performance improving linearly as nodes are added.
  • Data Migration: Delivered a comprehensive solution for low-risk migration from third-party databases to GBase 8a MPP Cluster. The migration process is standardized and simple.
  • High Availability: The active-active synchronization mechanism ensures that data in the primary and standby clusters is fully consistent after the daily cluster-level batch synchronization. If the primary cluster suffers a failure that cannot be restored quickly, services can be switched swiftly to the standby cluster, which keeps both data and services available.
  • Cost-Effective: GBase 8a MPP Cluster runs on low-cost x86 PC servers, offering high performance at low cost.
  • Visualization and Easy Maintenance: Convenient and user-friendly cluster visualization management tools with comprehensive functions facilitate operations and maintenance staff in managing and maintaining the cluster, greatly enhancing productivity.
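The High Availability point above depends on the primary and standby clusters holding identical data after each daily synchronization. A minimal sketch of how such a check could look: compare a row count and a content hash per table. This is not the project's synchronization tool. The in-memory SQLite databases stand in for the two clusters, and the `txn` table and helper names are hypothetical.

```python
import hashlib
import sqlite3


def table_fingerprint(conn, table):
    """Return (row_count, sha256) for a table, reading rows in a stable order."""
    # The table name comes from a trusted, hard-coded list in this sketch.
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY 1")
    hasher = hashlib.sha256()
    count = 0
    for row in cursor:
        hasher.update(repr(row).encode("utf-8"))
        count += 1
    return count, hasher.hexdigest()


def compare_clusters(primary, standby, tables):
    """List the tables whose fingerprints differ between the two connections."""
    return [t for t in tables
            if table_fingerprint(primary, t) != table_fingerprint(standby, t)]


def make_db(rows):
    """Build an in-memory stand-in for one cluster with a single demo table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE txn (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO txn VALUES (?, ?)", rows)
    conn.commit()
    return conn


primary = make_db([(1, 100), (2, 250), (3, 75)])
standby_ok = make_db([(1, 100), (2, 250), (3, 75)])
standby_stale = make_db([(1, 100), (2, 250)])      # one row missing

print(compare_clusters(primary, standby_ok, ["txn"]))     # prints []
print(compare_clusters(primary, standby_stale, ["txn"]))  # prints ['txn']
```

On a real MPP cluster, pulling every row to a client would be too slow for 25TB. A practical check would push counts and aggregates down to the database and compare only the results.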