Connect Teradata QueryGrid to Azure HDInsight
Many Teradata customers are interested in integrating Vantage with Microsoft Azure first party services. This guide will help you connect Teradata QueryGrid to Azure HDInsight.
Many Teradata customers are interested in integrating Teradata Vantage with Microsoft Azure first party services. This Getting Started Guide will help you connect Teradata QueryGrid to Azure HDInsight service.
Although this approach has been implemented and tested internally, it is offered on an as-is basis. Teradata QueryGrid does support Azure HDInsight and it is highly encouraged to work with Teradata to determine the optimal configuration and architecture. This paper approach does not replace official product documentation.
We encourage your feedback. We want to understand what you found useful and how we can improve this guide. Please send your feedback to rupal.shah@teradata.com.
Disclaimer: This guide includes content from both Microsoft and Teradata product documentation.
This is a diagram of the workflow.
Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.
Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.
Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.
For more information see documentation.
You will need the following accounts, and systems:
Step 1: Requirements
Before deploying a Teradata ecosystem, your Azure subscription must have sufficient level of permissions assigned with Contributor role at the subscription level and sufficient quota limits.
3. Refresh Subnets to view newly added HDISubnet
4.Configure Storage blade settings – Choose Primary storage account select (New) <HDI Cluster name> and leave other properties default (blank) values and click Next: Security + networking
5. Configure Security + networking blade settings – Choose Virtual network vnet-teradata and the Subnet you created in step 3 (e.g., HDISubnet) and click Next: Configuration + pricing
6. Configure Configuration + pricing blade settings – Change Worker node Number of nodes from 4 to 2 for this article and click Review + create
7. Click Create once validation passes. Expect deployment to take 15-20 minutes.
Create Public IP addresses
4. Refresh Inbound security rules to view new port entry
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.
e. Add QueryGrid Manager in the Monitored Systems portlet:First, install a QueryGrid Root Certificate
Next, add QueryGrid Manager in the Monitored Systems portlet
Note: This article will use the QueryGrid software version deployed in the solution template. We will not cover uploading the latest version.
You will need to add the private IP addresses of your Nodes to the respective Data Source Systems created in the previous step. Teradata nodes to the TD Data Source System and HDInsight (Hadoop) nodes (head and worker nodes) to the HDI Data Source System. This will enable the tdqg-node package to auto install across all nodes.
For more information see Teradata® QueryGrid™ Installation and User Guide.
CREATE AUTHORIZATION td_server_db.hiveservice AS DEFINER TRUSTED USER 'hive' PASSWORD 'hive' ;
An authorization object is created in the td_server_db database. Using the DEFINER clause make the authorization available globally to all users.
GRANT CREATE SERVER ON td_server_db TO dbc;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB TO dbc;
CREATE FOREIGN SERVER target_server_name
EXTERNAL SECURITY DEFINER TRUSTED target_server_auth
USING
LINK('linkname')
VERSION ('version')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;
For example, where Teradata Database is the initiating system and HDInsight cluster is the target/remote system using the QueryGrid Link definition called TD2HDI.
CREATE FOREIGN SERVER remotehdi
EXTERNAL SECURITY DEFINER TRUSTED hiveservice
USING
LINK('TD2HDI')
VERSION('active')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;
For more information see Teradata® QueryGrid™ Installation and User Guide.
3. Join data from HDInsight and Vantage
4. Create a View
5. Show Foreign Server - Allows you to see a server object definition that contains the name value pairs that the associated table operators use to connect to the foreign serverSHOW FOREIGN SERVER remotehdi
6. Drop Foreign Server - In addition to deleting the server object and its associated information from the dictionary tables, all dependent entries on the associated table operators are deleted.DROP FOREIGN SERVER TD_SERVER_DB.remotehdi;
Delete Resource Group
Although this approach has been implemented and tested internally, it is offered on an as-is basis. Teradata QueryGrid does support Azure HDInsight and it is highly encouraged to work with Teradata to determine the optimal configuration and architecture. This paper approach does not replace official product documentation.
We encourage your feedback. We want to understand what you found useful and how we can improve this guide. Please send your feedback to rupal.shah@teradata.com.
Disclaimer: This guide includes content from both Microsoft and Teradata product documentation.
Overview
This article describes the minimal steps to connect Teradata QueryGrid to a HDInsight (Hadoop) cluster and query it from Teradata Vantage on Azure. It will highlight deployment requirements for Teradata (DIY) on Azure, Teradata Viewpoint, Teradata QueryGrid and Azure HDInsight. We will use Teradata Viewpoint to configure these services to connect seamlessly between platforms. Finally, we will create a foreign server connection and leverage QueryGrid fabric to execute SQL from Vantage to a HDInsight cluster.This is a diagram of the workflow.
About Azure HDInsight
Azure HDInsight is a Big Data service from Microsoft that brings 100% Apache Hadoop and other popular Big Data solutions to the cloud. A modern, cloud-based data platform that manages data of any type. Whether your data is structured or unstructured, and of any size, HDInsight makes it possible for you to gain the full value of Big Data.
With HDInsight, you can seamlessly process data of all types through Microsoft’s modern data platform. Our platform provides simplicity, ease of management, and an open Enterprise-ready Big Data solution. HDInsight provides a platform for all of your Big Data needs including Batch, Interactive, No SQL and Streaming. It also comes with a strong eco-system of tools and developer environment.
Supported cluster types include: Hadoop (Hive), HBase, Storm, Spark, Kafka, Interactive Hive (LLAP), and ML Services.
For more information see documentation.About Teradata Vantage
Vantage is the platform for Pervasive Data Intelligence; delivering real-time intelligent answers to users and systems across all parts of an organization. It leverages 100 percent of a business’s data, regardless of scale, volume, or complexity. Running Teradata Database software on Azure is similar to running a Teradata Database in your on-site data center.Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.
Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.
Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.
For more information see documentation.
About Teradata QueryGrid™
Teradata QueryGrid 2.x is a data analytics fabric that provides seamless, high-performing data access, processing, and movement across one or more data sources. Teradata QueryGrid supports the following connectors:- Teradata Database
- Hive
- Spark SQL
- Oracle (only as a target connector)
- Starburst Enterprise Presto
Prerequisites
You are expected to be familiar with Teradata Vantage, Teradata QueryGrid and Azure HDInsightYou will need the following accounts, and systems:
- Azure Pay-as-you-go subscription
- Teradata Vantage (DIY) on Azure instance (version 16.20) or later.
Procedure
Once you have met the prerequisites, follow these steps:- Requirements
- Deploy Teradata (DIY) on Azure with Teradata Viewpoint and Teradata QueryGrid Manager
- Create subnet for Azure HDInsight cluster
- Create Azure HDInsight cluster
- Create public IP address and inbound port
- Configure Teradata QueryGrid Manager
- Configure Teradata Viewpoint
- Configure QueryGrid in the Viewpoint QueryGrid Portlet
- Configuring and using QueryGrid connectors
- Load Data to HDInsight
- Query HDInsight data from Vantage
- Cleanup
Step 1: Requirements
Before deploying a Teradata ecosystem, your Azure subscription must have sufficient level of permissions assigned with Contributor role at the subscription level and sufficient quota limits.
- Logon to Azure portal, click on Subscription icon for quick check. Otherwise, on the left pane click on Cost Management + Billing > Cost Management > Azure subscriptions
- Next, check if your subscription has sufficient quota limits (recommended: 128 cores). On your subscription page, Click Usage + quotas, filter on Select a Provider (e.g., Microsoft Compute) and All locations (e.g., deployment region). If nothing shows up, change Show only items with usage to Show All.
- Click Request Increase if quota limit is not sufficient, click Quota type and choose Compute-VM (cores-vCPUs) subscription limit increases and follow instructions to submit request.
Step 2: Deploy Teradata on Azure (DIY)
A solution template allows you to deploy multiple Teradata products simultaneously. At deployment, you can create a new resource group or use an existing, empty resource group that contains the VNet, VMs, storage accounts, network security groups, and so on.- In the portal, click Create a resource sign located in the upper left and search for Teradata. A list of Teradata products appears under the Results pane.
- Select the Teradata Vantage (DIY) solution template and click Create
- Configure Basic blade settings – Enter new Resource group click Create new, choose Region, enter Password for VM and click Next
- Configure Database blade settings – Enter DBC Password, choose Database Tier: Enterprise and Number of nodes: 2 and click Next
- Configure Viewpoint blade settings – Choose Yes and enter Password and Viewpoint Image version: Multiple Systems and click Next through to QueryGrid Manager blade
- Configure QueryGrid Manager blade settings – Choose Yes and click Next through to General Settings (blade11)
- Configure General Settings – Choose defaults and click Next
- In Review + create blade, click Create after validation.
Note: Unlike Teradata Vantage (DIY) Developer tier, Enterprise tier software is not free and Azure infrastructure for VM, Virtual Network and Storage cost exists.
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.Step 3: Create Subnet for Azure HDInsight Cluster
Before we deploy HDInsight, we need to create a subnet for HDInsight cluster in the Teradata ecosystem virtual network.- In the portal, open the Teradata resource group or click on Go to resource group
- Click on vnet-teradataSubnets and click + Subnet and add subnet name (e.g., HDISubnet) and leave options at their default values and click OK
3. Refresh Subnets to view newly added HDISubnet
Step 4: Create Azure HDInsight cluster
Next, create HDInsight (Hadoop) cluster with 2 headnodes and 2 workernodes.- In the portal, click Create a resource sign and search for HDInsight
- Select the Azure HDInsight solution template and click Create
- Configure Basics blade settings – Enter new Resource group click Create new, enter Cluster name, choose Region (same as Teradata deployment), select Cluster type and version (e.g., Hadoop 2.7.3/HDI 3.6) and enter Cluster login password and click Next: Storage
4.Configure Storage blade settings – Choose Primary storage account select (New) <HDI Cluster name> and leave other properties default (blank) values and click Next: Security + networking
5. Configure Security + networking blade settings – Choose Virtual network vnet-teradata and the Subnet you created in step 3 (e.g., HDISubnet) and click Next: Configuration + pricing
6. Configure Configuration + pricing blade settings – Change Worker node Number of nodes from 4 to 2 for this article and click Review + create
7. Click Create once validation passes. Expect deployment to take 15-20 minutes.
Step 5: Create public IP address and inbound port
For outside clients to gain access to Teradata Database and VMs you need to create a public IP address to a NIC interface and create an Inbound security rule for port 1025. By default, Teradata on Azure and Teradata products (Teradata Viewpoint and Teradata QueryGrid Manager) does not deploy with a public IP address or inbound port to access the database.Create Public IP addresses
- In the portal, open the Teradata on Azure resource group and click on a NIC interface: database-nic00
- Click on IP configurations and click on an ip Name
- Set Public IP address to Associate
- Under the Choose public IP address dropdown, click Create new
- In the Add a public IP address prompt enter Name and change Assignment to Static
- Click OK and Save
4. Refresh Inbound security rules to view new port entry
Step 6: Configure Teradata QueryGrid Manager
After deploying a QueryGrid Manager VM, you need to logon to the VM to add a user and set a password for the Viewpoint account to access the QueryGrid Manager VM. You will need the QueryGrid Manager VM username and password created during deployment.- Log on to the QueryGrid Manager VM using PuTTY and public IP address created in the previous step.
- Switch to the root user environment.
- Add user ‘tdqgm’ to the QueryGrid Manager user group to prevent permission errors.
- Set the default viewpoint password
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.
Step 7: Configure Teradata Viewpoint
Before we can configure QueryGrid, we first need to add Teradata and QueryGrid Manager systems to the Viewpoint Monitored Systems portlet.- From your browser, open the Viewpoint portal logon screen using public IP address created in step 4 for Viewpoint VM.
- Log in to the Viewpoint portal as admin and password created during VM deployment.
- Click admin portlet (gear icon) and click on Monitored Systems portlet.
For more information see Teradata Vantage™ on Azure (DIY) Installation and Administration Guide.
e. Add QueryGrid Manager in the Monitored Systems portlet:First, install a QueryGrid Root Certificate
- Click Select Portlet dropdown
- Click the Certificates.
- From the Setup list, click Certificate Authority.
- Click Install Certificate.
- Type an alias (e.g., cert) for the certificate, up to 30 characters.
- Select the A trusted SSL-enabled service option
- Type the private-IP address for hostname of a Teradata QueryGrid Manager instance (e.g.10.0.0.9)
- Type 9443 as the port number.
- Click Install
Next, add QueryGrid Manager in the Monitored Systems portlet
- Click Select Portlet dropdown
- Click Monitored Systems.
- Click + next to Systems and select Add QueryGrid.
- Under General System Details, enter a system nickname (e.g., QGM), up to 8 characters.
- Select the Enable system check box.
- Enter the private IP address for Host ID of the Teradata QueryGrid Manager.
- Under Login, enter the credentials viewpoint and the reset password performed in step 6 to enable Viewpoint to access QueryGrid Manager.
- Click Apply and Close Admin portlet
Step 8: Configure QueryGrid in the Viewpoint QueryGrid Portlet
In this section, we will add the QueryGrid portlet to Viewpoint and configure QueryGrid to access the HDInsight cluster. This will include adding a Data Center, Data Source Systems for Teradata and HDInsight cluster and add appropriate Teradata and HDInsight nodes and QueryGrid software to each Data Source System. Next, define a Fabric and add Connectors for Teradata and HDInsight. Finally, create a Link to define Teradata (as the initiating source) and HDInsight (as the target).Note: This article will use the QueryGrid software version deployed in the solution template. We will not cover uploading the latest version.
- Add QueryGrid portlet to the Viewpoint Portal
- In the portal, click Add Content.
- Locate and click the QueryGrid portlet.
- Click Add.
- Under Fabric Components, select Data Centers.
- (Optional) Click Edit on default Data Center created during QueryGrid Manager deployment to rename and click Save
- Under Fabric Components, select Systems.
- Click + next to Systems.
- At Add System, enter system name (e.g.TD) for Teradata Data Source System
- Select Data center
- Select Node software version deployed from solution template
- Resource Allocation, set Max memory per node to 1GB
- Click Save
- Repeat to add HDInsight Data Source System (e.g., HDI) with same properties
You will need to add the private IP addresses of your Nodes to the respective Data Source Systems created in the previous step. Teradata nodes to the TD Data Source System and HDInsight (Hadoop) nodes (head and worker nodes) to the HDI Data Source System. This will enable the tdqg-node package to auto install across all nodes.
- Under Fabric Components, select Systems.
- Select the Data Source System you want to add nodes to. (e.g., TD)
- On the Nodes tab, click + next to Nodes.
For more information see Teradata® QueryGrid™ Installation and User Guide.
Step 9: Configuring and Using Connectors
For this article and simple demonstration, use any Teradata client tool to create an authorization object, grant privileges and create a foreign server connection to our HDInsight cluster.- Log on as an Administrator, such as dbc, to the initiating Teradata Database system, and create an authorization object for the target server, for example:
CREATE AUTHORIZATION td_server_db.hiveservice AS DEFINER TRUSTED USER 'hive' PASSWORD 'hive' ;
An authorization object is created in the td_server_db database. Using the DEFINER clause make the authorization available globally to all users.
- Grant the CREATE SERVER and EXECUTE FUNCTION privileges on the td_server_db database to the Administrator user, for example:
GRANT CREATE SERVER ON td_server_db TO dbc;
GRANT EXECUTE FUNCTION ON TD_SYSFNLIB TO dbc;
- Create the foreign server connection to HDInsight (Hadoop) cluster
CREATE FOREIGN SERVER target_server_name
EXTERNAL SECURITY DEFINER TRUSTED target_server_auth
USING
LINK('linkname')
VERSION ('version')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;
For example, where Teradata Database is the initiating system and HDInsight cluster is the target/remote system using the QueryGrid Link definition called TD2HDI.
CREATE FOREIGN SERVER remotehdi
EXTERNAL SECURITY DEFINER TRUSTED hiveservice
USING
LINK('TD2HDI')
VERSION('active')
DO IMPORT WITH TD_SYSFNLIB.QGInitiatorImport,
DO EXPORT WITH TD_SYSFNLIB.QGInitiatorExport;
For more information see Teradata® QueryGrid™ Installation and User Guide.
Step 10: Load Data to HDInsight
Before we can query our HDInsight (Hadoop) cluster, we need to create a table and some data.- In the portal, open HDInsight resource group and click on publicIpheadnode- for the public IP address for your HDI headnode0 VM.
- Log on as sshuser and password using Putty
- Change to root user: sudo su –
- At the prompt use Beeline (Hive server2 command line interface), type:
- create table t_test(id int);
- insert into t_test values(100);
- insert into t_test values(101);
Step 11: Query HDInsight Data from Vantage
Finally, use any Teradata client tool to query HDInsight data.- Log on as dbc to the initiating Teradata Database system using the public IP address
- Insert and Select data from HDInsight
- insert into t_test@remotehdi values(1);
- select * from t_test@remotehdi;
3. Join data from HDInsight and Vantage
- create table td_server_db.localtbl (c1 int)
- insert into td_server_db.localtbl values (200)
- insert into td_server_db.localtbl values (201)
- insert into td_server_db.localtbl values (2)
- select * from t_test@remotehdi union all select * from td_server_db.localtbl
4. Create a View
- create view td_server_db.t_test_v as select * from t_test@remotehdi;
- select * from td_server_db.t_test_v;
5. Show Foreign Server - Allows you to see a server object definition that contains the name value pairs that the associated table operators use to connect to the foreign serverSHOW FOREIGN SERVER remotehdi
6. Drop Foreign Server - In addition to deleting the server object and its associated information from the dictionary tables, all dependent entries on the associated table operators are deleted.DROP FOREIGN SERVER TD_SERVER_DB.remotehdi;
Step 12: Cleanup
To avoid incurring charges to your Azure subscription for the resources used, follow these steps.Delete Resource Group
- Logon in Azure portal, click on Resource groups in left pane.
- Click on Resource group and click on Delete Resource Group in menu
- In Delete Resource group pane type in resource group name to verify deletion.
- Click Delete
- Repeat to delete other Resource Groups
Bleiben Sie auf dem Laufenden
Abonnieren Sie den Blog von Teradata, um wöchentliche Einblicke zu erhalten