
Announcing Teradata VantageCloud Lake on Google Cloud

This integration provides the trusted data foundation necessary for robust AI and analytics initiatives.

Daniel Herrera
June 25, 2024 · 4 min read

As data professionals working on data pipelines, analytics, or artificial intelligence and machine learning (AI/ML) models, our main goal is to build solutions and tools that help our organizations to better serve customers. We work with data from many sources in different formats, often with inconsistencies. Our job is to pull it all together by cleaning and analyzing data and creating models. Taking the models we've developed and making sure they work well in production and can handle large-scale use isn’t easy. 

Simplifying data integration and processing, ensuring easy access to suitable tools for development tasks, and facilitating seamless integration of cutting-edge technologies into our solutions are all factors that enhance our productivity and make our work more enjoyable. 

Teradata VantageCloud Lake, the complete cloud analytics and data platform for AI, is now integrated with Google Cloud. VantageCloud Lake is the most performant data engine for processing both structured and semi-structured data across any data landscape. It includes ClearScape Analytics™, a comprehensive set of tools for data processing and analytics. For those working within the Google ecosystem, this integration brings our powerful lakehouse closer to tools like Vertex AI, Google’s ML development environment, and enables more seamless experimentation with the Gemini family of large language models (LLMs).

Efficient pipelines in a unified and trusted data ecosystem

Everything starts and ends with data. For data engineers, VantageCloud Lake, which is designed to work with object storage, helps keep uncontrolled data growth manageable: structured and semi-structured data stored with any cloud service provider (CSP) can be managed through the lakehouse environment.

This data can easily be processed and transformed to build the datasets that data scientists and analysts rely on to build models and analyses. These datasets remain in the secure environment of the lakehouse and, subject to networking policies and virtual private cloud (VPC) service controls of specific organizations, are accessible from Google’s tools, like Vertex AI. 

VantageCloud Lake also enables dynamic management of compute resources for more efficient pipelines.

VantageCloud Lake compute groups management console

  • Compute groups can be configured with different scheduled profiles, which can be suspended or resumed on command, or scheduled to manage ingestion and transformation pipelines and other heavy batch loads.  

VantageCloud Lake compute profile management console

  • Managing extract, load, and transform (ELT) pipelines is a breeze with VantageCloud Lake, thanks to its integration with tools like Apache Airflow for workflow scheduling, Airbyte for data ingestion, and dbt for data transformations.

Airflow Directed Acyclic Graph (DAG) employing the Teradata connector
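
The ordering that a workflow scheduler enforces for an ELT pipeline can be sketched without any scheduler installed. The minimal example below uses only Python's standard `graphlib` module. The task names are hypothetical, and the code does not use the Airflow or Teradata connector APIs; it only illustrates how dependencies in a DAG determine a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical ELT tasks: each key runs only after every task in its set finishes.
PIPELINE = {
    "ingest_raw_orders": set(),          # e.g. an ingestion sync
    "ingest_raw_customers": set(),
    "load_staging": {"ingest_raw_orders", "ingest_raw_customers"},
    "transform_models": {"load_staging"},  # e.g. SQL transformations
    "publish_feature_table": {"transform_models"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid run order; raises graphlib.CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step, task in enumerate(execution_order(PIPELINE), start=1):
        print(step, task)
```

In a real deployment, each key would map to a scheduler task, and the dependency sets would map to the edges declared in the DAG definition.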

Accelerated model development and deployment, plus easier lifecycle management 

For data scientists, model development, deployment, and lifecycle management are easier with VantageCloud Lake, thanks to ClearScape Analytics, Bring Your Own Model (BYOM) technology for model interchange management, and ModelOps for model lifecycle management. These tools can now be more easily integrated with Vertex AI and the Gemini family of LLMs.

ClearScape Analytics for end-to-end model development

  • ClearScape Analytics is a comprehensive suite of analytics tools, featuring an extensive collection of in-database analytics functions tailored for AI/ML needs.
  • Data exploration, data cleaning, feature engineering, and model training can be performed in database, significantly reducing compute costs through the VantageCloud Lake environment. 

Data Exploration

  • Descriptive Statistics: td_columnsummary, td_categoricalsummary, td_univariatestatistics, td_getrowswithoutmissingvalues, td_whichmin, td_whichmax, td_histogram, td_qqnorm
  • Statistical Tests: td_anova, td_ztest, td_ftest, td_chisq
  • Path and Pattern: npath, attribution, sessionize
  • Text Analytics: td_textparser, ngramsplitter, td_wordembeddings, td_naivebayestextclassifiertrainer, naivebayestextclassifierpredict, td_sentimentextractor, td_tfidf

Data Preparation

  • Handling Outliers: td_getfutilecolumns, td_outlierfilterfit, td_outlierfiltertransform, td_oneclasssvm, td_oneclasssvmpredict
  • Handling Missing Values: td_getrowswithmissingvalues, td_simpleimputefit, td_simpleimputetransform
  • Parsing Data: td_traintestsplit, td_convertto, td_numapply, td_strapply, td_fillrowid, td_roundcolumns, antiselect, pack, unpack, stringsimilarity, td_unpivoting, td_pivoting

Feature Engineering

  • Categorical Variable Transform: td_ordinalencodingfit, td_ordinalencodingtransform, td_onehotencodingfit, td_onehotencodingtransform
  • Continuous Variable Transform: td_nonlinearcombinefit, td_nonlinearcombinetransform, td_scalefit, td_scaletransform, td_functionfit, td_functiontransform, td_polynomialfeaturefit, td_polynomialfeaturetransform, td_rownormalizefit, td_rownormalizetransform, td_bincodefit, td_bincodetransform, movingaverage
  • Dimensionality Reduction: td_randomprojectionmincomponents, td_randomprojectionfit, td_randomprojectiontransform

Model Training and Forecasting

  • Model Training: td_glm, td_decisionforest, td_kmeans, td_svm, td_knn, td_xgboost, td_oneclasssvm, td_vectordistance

Model Evaluation and Selection

  • Model Scoring: td_glmpredict, td_kmeanspredict, td_decisionforestpredict, td_svmpredict, td_xgboostpredict, td_oneclasssvmpredict
  • Model Evaluation: td_regressionevaluator, td_classificationevaluator, td_silhouette, td_roc

1. Machine learning (ML) functions 
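
To give a feel for how these in-database functions are invoked, here is a minimal sketch that builds the SQL for two of them: a column summary for exploration and a train/test split for preparation. The table and column names are hypothetical, and the clause names (`InputTable`, `TargetColumns`, `IDColumn`, `TrainSize`, `TestSize`, `Seed`) are assumptions that should be checked against the function reference for your release. The code only produces SQL text; running it requires a database connection that is not shown here.

```python
import re

# Accept only plain [database.]object identifiers, to keep generated SQL safe.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def _check(name: str) -> str:
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def column_summary_sql(table: str, columns: list[str]) -> str:
    """Build a TD_ColumnSummary call (clause names are assumptions)."""
    targets = ", ".join(f"'{_check(c)}'" for c in columns)
    return (
        "SELECT * FROM TD_ColumnSummary (\n"
        f"  ON {_check(table)} AS InputTable\n"
        f"  USING TargetColumns ({targets})\n"
        ") AS dt;"
    )


def train_test_split_sql(table: str, id_column: str,
                         train_size: float = 0.8, seed: int = 42) -> str:
    """Build a TD_TrainTestSplit call (clause names are assumptions)."""
    if not 0.0 < train_size < 1.0:
        raise ValueError("train_size must be between 0 and 1")
    test_size = round(1.0 - train_size, 4)
    return (
        "SELECT * FROM TD_TrainTestSplit (\n"
        f"  ON {_check(table)} AS InputTable\n"
        f"  USING IDColumn ('{_check(id_column)}')\n"
        f"  TrainSize ({train_size}) TestSize ({test_size}) Seed ({int(seed)})\n"
        ") AS dt;"
    )


if __name__ == "__main__":
    print(column_summary_sql("demo.tickets", ["priority", "resolution_hours"]))
    print(train_test_split_sql("demo.tickets", "ticket_id"))
```

Because both statements run inside the database, only the summary rows or the split result would travel back to the client.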

Vertex AI, Compute Engine, and BYOM


ClearScape Analytics in-database analytics functions, BYOM, and Vertex AI-powered AI/ML pipeline
 

  • For ML techniques not available through in-database processing or requiring specialized compute, data preparation can still be performed in database, leveraging the VantageCloud Lake environment, while training can be deferred to Google tools like Vertex AI and Compute Engine.

Teradata Jupyter extensions integrated with Vertex AI notebooks

 

Gemini integrated with Vertex AI
 

  • In these scenarios, models can be exported through PMML, ONNX, or other supported standards and deployed in database using BYOM technology. 
  • Models imported through BYOM can be used to easily score data in database, facilitating the use of models in production with the performance at scale of VantageCloud. 
  • Once models are deployed, their lifecycles can be tracked at scale with ModelOps technology. View the video below to learn more.
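
The scoring step described in the bullets above can be sketched as a small SQL builder. This is an illustration under stated assumptions, not a definitive reference: the function names `PMMLPredict` and `ONNXPredict`, the `mldb` install database, the `model_id` column, and the `DIMENSION` and `Accumulate` clauses reflect commonly shown BYOM usage and should be verified against the BYOM documentation for your environment. All table names and the model identifier are hypothetical.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")
_MODEL_ID = re.compile(r"[A-Za-z0-9_\-]+")

# Assumed mapping from model interchange format to BYOM scoring function.
_PREDICT_FUNCTIONS = {"pmml": "PMMLPredict", "onnx": "ONNXPredict"}


def byom_score_sql(input_table: str, model_table: str, model_id: str,
                   id_column: str, fmt: str = "pmml",
                   byom_db: str = "mldb") -> str:
    """Build a scoring query for a model previously stored in model_table."""
    for name in (input_table, model_table, id_column, byom_db):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not _MODEL_ID.fullmatch(model_id):
        raise ValueError(f"unsafe model id: {model_id!r}")
    func = _PREDICT_FUNCTIONS[fmt]  # KeyError for unsupported formats
    return (
        f"SELECT * FROM {byom_db}.{func} (\n"
        f"  ON {input_table}\n"
        f"  ON (SELECT * FROM {model_table} WHERE model_id = '{model_id}') DIMENSION\n"
        f"  USING Accumulate ('{id_column}')\n"
        ") AS td;"
    )


if __name__ == "__main__":
    print(byom_score_sql("demo.ticket_features", "demo.byom_models",
                         "ticket_clf_v1", "ticket_id"))
```

The point of the pattern is that the model travels to the data: the exported model sits in a table, and scoring is expressed as a query over the rows to be scored.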

Experiment and innovate with generative AI 

Google’s catalog of generative AI tools, such as the Gemini family of LLMs, is easy to integrate with data in your data lakehouse to deliver innovative solutions. For example, you can expedite customer service ticket resolution and improve customer service processes by leveraging LLMs to classify and analyze tickets. View the demo to see it in action.
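
As a rough illustration of the ticket classification idea, the sketch below builds a classification prompt and normalizes the model's reply to a fixed label set. The categories are hypothetical, and `generate` stands for any text-generation call (for example, a Gemini model reached through Vertex AI). No real SDK call is shown, and `fake_generate` is a keyword-based stand-in used only so the example runs on its own.

```python
from typing import Callable

LABELS = ("billing", "technical", "account", "other")  # hypothetical categories


def build_prompt(ticket_text: str) -> str:
    """Compose a single-label classification prompt for one ticket."""
    return (
        "Classify the customer service ticket into exactly one category.\n"
        f"Categories: {', '.join(LABELS)}\n"
        "Answer with the category name only.\n\n"
        f"Ticket: {ticket_text.strip()}"
    )


def classify_ticket(ticket_text: str, generate: Callable[[str], str]) -> str:
    """Send the prompt to `generate` and map its reply onto a known label."""
    reply = generate(build_prompt(ticket_text)).strip().lower()
    for label in LABELS:
        if label in reply:
            return label
    return "other"  # fall back when the reply names no known category


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call; simple keyword rule for demonstration."""
    ticket = prompt.rsplit("Ticket:", 1)[1].lower()
    return "Billing" if "invoice" in ticket else "Technical"


if __name__ == "__main__":
    print(classify_ticket("My invoice shows a double charge.", fake_generate))
```

Normalizing the reply against a closed label set keeps downstream analysis stable even when the model answers with extra words or different casing.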

Conclusion

The integration of VantageCloud Lake with Google Cloud provides the trusted data foundation necessary for robust AI and analytics initiatives for data engineers, data scientists, and organizations in the Google Cloud ecosystem. Whether you’re a data engineer looking to streamline your pipelines or a data scientist aiming to maximize the efficiency of your ML models, VantageCloud Lake offers the tools and capabilities to meet your needs. 

Learn more or request a demo of VantageCloud Lake on Google Cloud


Feedback and questions

We value your insights and questions. Feel free to connect with me on LinkedIn and explore the wealth of helpful resources available on the Teradata Developer Portal, Teradata Developer Community, and Teradata YouTube.


About Daniel Herrera

Daniel Herrera is a builder and problem-solver fueled by the opportunity to create tools that aid individuals in extracting valuable insights from data. As a technical product manager, Daniel specialized in data ingestion and extract, transform, and load (ETL) for enterprise applications. He’s actively contributed as a developer, developer advocate, and open-source contributor in the data engineering space. Certified as a Cloud Solutions Architect in Microsoft Azure, his proficiency extends to programming languages including SQL, Python, JavaScript, and Solidity.
