Migrating DATE columns from Oracle to Redshift
Oracle’s DATE type is both a date and a time, down to the second. Redshift gives you a choice between a true DATE, which is only a date with no time component, and a TIMESTAMP, which goes down to the microsecond.
When migrating an Oracle database to Redshift, you can simply convert all the DATE columns to TIMESTAMP. (Keeping them as DATE columns will work syntactically, but can lose information.) For the Oracle columns which are truly dates, though, a TIMESTAMP column would be wider than it needs to be, which breaks Redshift best practices.
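For example, the Redshift DDL for a table with one column of each kind might look like the following (the table and column names are hypothetical):

create table orders (
    order_id   bigint,
    order_dt   date,      -- 4 bytes: a pure date, no time of day
    created_at timestamp  -- 8 bytes: date plus time, to the microsecond
);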
So it’s good to introspect the actual column data and determine which columns should remain DATE and which should be replaced with TIMESTAMP. (It’s better to design your Oracle schema to limit DATE types to columns where time doesn’t matter, to avoid this issue in the first place.)
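The core of the check is a single predicate: trunc() strips the time portion of an Oracle DATE, so any row where the value differs from its truncation carries a real time of day. For one table and column it looks like this (some_table and some_col are placeholders):

select count(*)
from some_table
where some_col <> trunc(some_col);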
I wrote a script in PL/SQL to introspect data to determine which DATE columns are actually datetimes / timestamps and which are really dates. You could use this to improve your Oracle schema even if you’re not migrating to Redshift, to change DATE columns to TIMESTAMP where you care about time. That would make the schema more self-descriptive - a column like event_dttm DATE should be an anti-pattern!
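If I were designing that schema from scratch, the type would match what the name promises - something like this (a sketch with made-up column names):

create table events (
    event_id   number,
    event_dttm timestamp,  -- time matters, and the type says so
    event_dt   date        -- date-only by name and by type
);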
There are at least two things that I don’t like about this script and would do better if I were spending more time on it – see if you can spot them:
declare
  -- every DATE column in the target schema
  cursor c is (
    select * from all_tab_columns
    where owner = 'YOUR_SCHEMA' and data_type = 'DATE'
  );
  x int;
begin
  for r in c
  loop
    -- does any row in this column have a time component other than midnight?
    execute immediate
      'select case when exists ' ||
      ' (select 1 from ' || r.owner || '.' || r.table_name ||
      ' where ' || r.column_name || ' <> trunc(' || r.column_name || ')' ||
      ' ) then 1 else 0 end as foo from dual'
      into x;
    if x > 0 then
      dbms_output.put_line(r.table_name || '.' || r.column_name || ' should be timestamp');
    else
      dbms_output.put_line(r.table_name || '.' || r.column_name || ' is OK as a date');
    end if;
  end loop;
end;
/
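To actually see the results, enable server output in SQL*Plus or SQLcl before running the block:

set serveroutput on size unlimited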
What I would do better next time
- Be more intelligent about inferring intent; even if col <> trunc(col) for a given date column, maybe all the time components are still identical (perhaps due to migrating from another environment in a different timezone).
- Group columns together to scan each table only once, if there are multiple date columns on the same table.
- Make this a table-valued function so you can SELECT the output as discrete fields rather than parsing it from dbms_output; that would also make it reusable and un-hard-code the schema name. (See the sketch after this list.)
- Do it in Spark instead of PL/SQL, so we can do parallel scans on each table and generate dynamic SQL more cleanly; although that might actually perform worse when taking into account pulling the data over the wire and needing a Spark cluster to run it on (or spilling to disk if running locally and/or without enough memory).
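As a sketch of the table-valued-function idea, the same check can be repackaged as a pipelined function whose output you can query with a plain SELECT; the type and function names here are made up:

-- hypothetical names; same core check as the script above
create or replace type date_col_result as object (
  table_name  varchar2(128),
  column_name varchar2(128),
  has_time    number(1)
);
/
create or replace type date_col_results as table of date_col_result;
/
create or replace function check_date_columns(p_owner in varchar2)
  return date_col_results pipelined
as
  x int;
begin
  for r in (select owner, table_name, column_name
            from all_tab_columns
            where owner = p_owner and data_type = 'DATE')
  loop
    execute immediate
      'select case when exists ' ||
      ' (select 1 from ' || r.owner || '.' || r.table_name ||
      ' where ' || r.column_name || ' <> trunc(' || r.column_name || ')' ||
      ' ) then 1 else 0 end from dual'
      into x;
    pipe row (date_col_result(r.table_name, r.column_name, x));
  end loop;
  return;
end;
/

-- the schema name is now a parameter and the output is queryable:
select * from table(check_date_columns('YOUR_SCHEMA')) where has_time = 1;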