UPDATE: you can also use YugabyteDB Voyager to move from Oracle to YugabyteDB
Old DBAs have stories about Oracle providing a "SQL*Loader" without any "SQL*Unloader" because Larry Ellison didn't want his customers to move out. This has changed: there's an easy way to export to CSV with a simple set sqlformat csv
in SQLcl. Follow Jeff Smith blog to know more about it.
Here is an example. I wanted to move some sample data from Oracle to YugabyteDB to compare the size. I have an always free Automonous Database, which includes the SSB sample schema. There is a LINEORDER table which is a few hundreds of GB. I'll get the DDL with dbms_metadata
. The only change I had to do was sub(" NUMBER,"," NUMERIC,")
and I disabled constraints, and collation clauses.
Of course, there are professional tools to convert an Oracle schema to PostgreSQL. The good old ora2pg, or AWS SCT which is also great to assess the level of changes required by a migration. But for something quick, I'm good with awk
π
Then the export is easy with set sqlformat csv
and the few settings to output only data like feedback off pagesize 0 long 999999999 verify off
. I pipe all that to awk
which builds the \copy
command that takes these CSV lines as-is. I like to do little steps and then build 10000 lines COPY commands with (NR-data)%10000
, data
being set at the beginning of the COPY command. Sending them in parallel would be easy, but I may not need it because YugabyteDB is multithreaded.
Here is the script I use - I have my Autonomous Database wallet in TNS_ADMIN, SQLcl installed in my home (an Oracle free tier ARM on which I also run my YugabyteDB lab).
{
TNS_ADMIN=/home/opc/wallet_oci_fra ~/sqlcl/bin/sql -s demo/"bsky-social-baena-3gpea"@o21c_tp @ /dev/stdin SSB LINEORDER <<SQL
set feedback off pagesize 0 long 999999999 verify off
whenever sqlerror exit failure
begin
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SEGMENT_ATTRIBUTES', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'STORAGE', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'CONSTRAINTS', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'REF_CONSTRAINTS', false);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'SQLTERMINATOR', true);
dbms_metadata.set_transform_param(dbms_metadata.session_transform, 'COLLATION_CLAUSE', 'NEVER');
end;
/
set sqlformat default
select dbms_metadata.get_ddl('TABLE','&2','&1') from dual ;
set sqlformat csv
select * from "&1"."&2" ;
SQL
} | awk '
/^ *CREATE TABLE /{
table=$0 ; sub(/^ *CREATE TABLE/,"",table)
print "drop table if exists "table";"
schema=table ; sub(/\"[.]\".*/,"\"",schema)
print "create schema if not exists "schema";"
}
/^"/{
data=NR-1
print "\\copy "table" from stdin with csv header"
}
data<1{
sub(" NUMBER,"," numeric,")
}
{print}
data>0 && (NR-data)%1000000==0{
print "\\."
print "\\copy "table" from stdin with csv"
}
END{
print "\\."
}
'
The output can directly be piped to psql
π
Here is my screen when starting the load:
It is a lab, measuring elapsed time makes not sense, but I looked at rows_inserted
statistics to verify that all is distributed to the 3 nodes of my distributed SQL database. Even with a single client session, the load is distributed on all the cluster.
This works the same for PostgreSQL because it is the same API: YugabyteDB uses PostgreSQL on top of the distributed storage.
All the component in this test are free and easy to use:
- The VM is on Oracle Cloud Free tier (ARM), The Oracle Database is a free Autonomous Database π https://www.oracle.com/cloud/free/
- PostgreSQL is open source and free π https://www.postgresql.org
- YugabyteDB is open source and free π https://www.yugabyte.com