Querying An AS400 Using SQL Server 2019 Data Virtualization

 

For those who have taken an interest in the latest SQL Server release, SQL Server 2019, you may have noticed a new feature called Data Virtualization. Perhaps you’ve seen a presentation about it, or maybe you’ve read about it in Bob Ward’s book entitled “SQL Server 2019 Revealed.”

I myself was curious about this feature and so I set out to see if I could use it to connect to an AS400 iSeries machine. Tom Willwerth, my friend at StraightPathSQL, and I worked on this process together. This post is primarily about walking you through the steps we recently took to get this configured.

SQL Server Polybase and SQL Server 2019 Data Virtualization

Before I dive into that, why was I interested in this feature? What did I hope to gain? Well, first of all, there was definitely the motivation of wondering, “Can we get this to work?” Secondly, and more practically, the promise of SQL Server 2019 Data Virtualization is to make other data sources available without using a Linked Server and without the time it takes to develop an ETL process to move the data. On a related note, you can cut out the time it takes for an ETL job to actually move the data somewhere like a data warehouse or flattened tables for reporting. Third, the Polybase feature has a built in engine that can provide query performance not available via Linked Servers. Fourth, I wanted to provide a way for developers to query data in the AS400 without having to learn the different syntax required by the AS400 iSeries. Fifth, query writers can also join the external able to local SQL Server data.

How To Set Up SQL Server 2019 Data Virtualiztion

Now let’s move forward with the how of this process. I started by installing SQL Server 2019, and in that process there are some steps to select for installing the Polybase feature. If you’re not familiar with that, it came out in SQL Server 2016 and is the foundation of data virtualization. During the install process, SQL Server suggested a port range that it would use as part of configuring Polybase. I wasn’t sure if SQL Server knew that the suggested port range was open or not. As a result, I actually stopped to download the SysInternals tool called TCPView and run it on the server to confirm that the suggested port range was open.

After the SQL Server install completed, I ran the code below in SSMS to enable the Polybase feature.

exec sp_configure 'Show Advanced Options', 1
reconfigure
EXEC sp_configure 'polybase enabled', 1
reconfigure
EXEC sp_configure 'polybase enabled'

 

The next thing I did was create a database master key. This has to exist in order to encrypt the credential that will be stored in SQL Server later on in the process.

CREATE MASTER KEY WITH ENCRYPTION BY PASSWORD = 'S0meStr0ngP@55w0rd1234xYz!'

 

I then set my test database to compatibility level 150 for SQL Server 2019 and made my database scoped credential, like so.

ALTER DATABASE MyTestSQLDatabase SET COMPATIBILITY_LEVEL = 150

CREATE DATABASE SCOPED CREDENTIAL MySQLCred WITH IDENTITY = 'MYAS400CRED', SECRET = 'S0m3Str0ngP@55w0rd';

The value for IDENTITY corresponds to a user in the source database, in this case, the AS400. SECRET, of course, is the password for the user name stored in IDENTITY.

I believe after this Tom joined the process and we began together by installing version 1.8 of the 64 bit iSeries Client Access driver for Windows onto the SQL Server instance I was testing from. Going into the details of that are a bit beyond the scope of this post so I won’t describe the process here.

Creating an external data source was the next part of the journey. Syntax for that is here.

After a lot of wrangling, searching, and looking here for an AS400 connection string example, Tom and I arrived at this code sample that we ran successfully. Obviously I’ve changed some details to anonymize this, but you get the idea.

CREATE EXTERNAL DATA SOURCE MYAS400DEV WITH (LOCATION = 'odbc://MYAS400DEV.LeeMarkum.com', CONNECTION_OPTIONS = 'Driver={iSeries Access ODBC Driver}; System = MYAS400DEV'; ServerNode = MYAS400DEV.LeeMarkum.com', PUSHDOWN = ON, CREDENTIAL = MySQLCred);

Next up we took the optional step of creating a schema called EXT for holding our external tables. Afterward, we moved on to the business of actually making our first external table. This was very much a multi-step, trial and error process.

Here is the process we eventually found that worked:

  • Get SQL definition from AS400 via the iSystem Navigator, minus trigger definitions, and paste into SSMS
  • Remove any index or key definitions. Without this change SQL Server throws an error
  • Remove any defined defaults. Without this change SQL Server throws an error
  • Change the CHAR data types to NCHAR. Without this change SQL Server throws an error
  • Replace DB2 schema name with EXT, our previously created SQL Server schema.
  • Change CREATE TABLE  to CREATE EXTERNAL TABLE
  • Add this at the bottom: WITH ( LOCATION=’MYAS400DEV.SomeSchemaName.DB2TABLENAMEHERE’, DATA_SOURCE= MYDEVAS400)

This left us with code like the below:

CREATE EXTERNAL TABLE EXT.Customer 
(CUST# NUMERIC(7,0)NOT NULL,
CUSTNAME NCHAR(30) NOT NULL,
CUSTADDRS NCHAR(50) NOT NULL,
CUSTCITY NCHAR(30) NOT NULL,
CUSTSTATE NCHAR(2) NOT NULL,
CUSTZIP NCHAR(2) NOT NULL
)
WITH ( LOCATION='MYAS400DEV.SomeSchemaName.DB2TABLENAMEHERE', DATA_SOURCE= MYDEVAS400)

As part of this process, we had to keep the same column names as what existed in the AS400. Otherwise, an error was thrown, and this makes sense given that we’re looking at the AS400 table in the WITH (LOCATION = ”, data_source = ) part of the syntax.

After this step there is the optional step of creating statistics on columns of the virtual table. You might want to do this, for instance, on columns that might be involved in joins as a performance enhancement. For the purposes of this work, I skipped this step for the time being because Tom and I just wanted to see some query results from all this work! We could return later to this item to try to tweak performance.

The moment of truth had finally arrived. We loaded up a query against our EXT.Customer table in SQL Server 2019 and, to our delight, we received the expected result set back using standard T-SQL.

I’m not gonna lie, this was a fair amount of work and took some time to set up. However, I’m hoping this blog post, and others like it, will save you some pain and time.

Next Steps To Take

  1. Get a copy of SQL Server 2019 Developer Edition and install it and the Polybase feature. Then enable Polybase.
  2. Identify a data source in your environment that you would like to use data virtualization to query. It could be Oracle, Microsoft Access, PostGres, MySQL or any number of things.
  3. Look through the available documentation on the connection strings for the data source you’ve chosen and figure out what the connection string should look like.
  4. Look through the rest of the walk-through in this article and give the feature a try.
  5. If you would like help with anything in this post, or with something else related to SQL Server, reach out to me here, or on Twitter, and I’ll be glad to offer assistance.
  • Hi Lee,

    do you see any adoption of Polybase?
    I’m looking forward to test it against a DB2 but I’m concerned that Polybase will not be maintained in the long term.

    Thank you

    • Lee says:

      Hi Francesco, I’m not sure about adoption rates for polybase. I do think that Microsoft is going to keep investing in the technology. Polybase was introduced in SQL Server 2014 and I believe had some enhancements on 2017 and some very significant enhancements in SQL Server 2019. With the announcement of SQL Server 2022 at MS Ignite, I’m interested to see what information about Polybase comes out over the next few weeks and months.

  • Thank you,

    I’m new to Polybase. Does it means that the table is actually on SQL Server or is that just a temp table that get’s filled up every time we query the AS400?

    And then how is it purged?

    • Lee says:

      Hi Francesco. Data Virtualization creates a table on the SQL Server instance that has the same columns and data types as the table on the target, in this case, the AS400. When a person queries this virtual table in SQL Server, a driver is used to pass that query on to the AS400 and retrieve results. The table on the SQL Server table is not populated with data.

  • […] Lee Markum proves you can read data from an AS400 via PolyBase: […]

  • >