
AWS Glue features

AWS Glue features fall into three major categories:

  • Discover and organize data

  • Transform, prepare, and clean data for analysis

  • Build and monitor data pipelines

Discover and organize data

  • Unify and search across multiple data stores – Store, index, and search across multiple data sources and sinks by cataloging all your data in AWS.

  • Automatically discover data – Use AWS Glue crawlers to automatically infer schema information and integrate it into your AWS Glue Data Catalog.

  • Manage schemas and permissions – Validate and control access to your databases and tables.

  • Connect to a wide variety of data sources – Tap into multiple data sources, both on premises and on AWS, using AWS Glue connections to build your data lake.
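As a rough illustration of the crawler item above, here is a minimal sketch of how a crawler could be defined and started from Python. It assumes a boto3 Glue client is passed in. The crawler name, role ARN, database name, and S3 path are hypothetical placeholders, and the helper functions are made up for this example rather than being part of any AWS library.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the Glue CreateCrawler API call.

    The crawler scans s3_path, infers schemas, and writes the resulting
    tables into the given Data Catalog database.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, request):
    """Create the crawler, start it, and return the crawler name."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
    return request["Name"]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    request = build_crawler_request(
        name="sales-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
        database="sales_db",
        s3_path="s3://example-bucket/sales/",
    )
    print(request)
    # With AWS credentials configured, this could be sent to AWS:
    #   import boto3
    #   create_and_start_crawler(boto3.client("glue"), request)
```

Building the request separately from sending it keeps the definition easy to inspect and test without touching an AWS account.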

Transform, prepare, and clean data for analysis

  • Visually transform data with a drag-and-drop interface – Define your ETL process in the drag-and-drop job editor and automatically generate the code to extract, transform, and load your data.

  • Build complex ETL pipelines with simple job scheduling – Invoke AWS Glue jobs on a schedule, on demand, or based on an event.

  • Clean and transform streaming data in transit – Enable continuous data consumption, and clean and transform it in transit. This makes it available for analysis in seconds in your target data store.

  • Deduplicate and cleanse data with built-in machine learning – Clean and prepare your data for analysis without becoming a machine learning expert by using the FindMatches feature. This feature deduplicates and finds records that are imperfect matches for each other.

  • Built-in job notebooks – AWS Glue job notebooks provide serverless notebooks with minimal setup in AWS Glue so you can get started quickly.

  • Edit, debug, and test ETL code – With AWS Glue interactive sessions, you can interactively explore and prepare data. You can explore, experiment on, and process data interactively using the IDE or notebook of your choice.

  • Define, detect, and remediate sensitive data – AWS Glue sensitive data detection lets you define, identify, and process sensitive data in your data pipeline and in your data lake.
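To make the job scheduling item above more concrete, the following sketch builds the arguments for two kinds of Glue triggers: one that runs a job on a cron schedule and one that only fires on demand. The trigger and job names are hypothetical, and the helper functions are assumptions made for this example. The dictionaries are meant to be passed to the boto3 Glue client's `create_trigger` call.

```python
def build_scheduled_trigger(name, job_name, cron_fields):
    """Arguments for a trigger that starts job_name on a cron schedule.

    cron_fields is the six-field expression Glue expects inside cron(...),
    for example "0 2 * * ? *" for 02:00 UTC every day.
    """
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron_fields})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,  # activate the trigger as soon as it exists
    }


def build_on_demand_trigger(name, job_name):
    """Arguments for a trigger that fires only when started explicitly."""
    return {
        "Name": name,
        "Type": "ON_DEMAND",
        "Actions": [{"JobName": job_name}],
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    nightly = build_scheduled_trigger("nightly-etl", "clean-sales-job", "0 2 * * ? *")
    manual = build_on_demand_trigger("manual-etl", "clean-sales-job")
    print(nightly["Schedule"])
    print(manual["Type"])
    # With AWS credentials configured:
    #   import boto3
    #   glue = boto3.client("glue")
    #   glue.create_trigger(**nightly)
    #   glue.create_trigger(**manual)
```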

Build and monitor data pipelines

  • Automatically scale based on workload – Dynamically scale resources up and down based on workload. This assigns workers to jobs only when needed.

  • Automate jobs with event-based triggers – Start crawlers or AWS Glue jobs with event-based triggers, and design a chain of dependent jobs and crawlers.

  • Run and monitor jobs – Run AWS Glue jobs with your choice of engine, Spark or Ray. Monitor them with automated monitoring tools, AWS Glue job run insights, and AWS CloudTrail. Improve your monitoring of Spark-backed jobs with the Apache Spark UI.

  • Define workflows for ETL and integration activities – Group multiple crawlers, jobs, and triggers into a single workflow that you can run and track as one unit.
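The chain of dependent jobs and crawlers described above can be sketched as a small workflow: an on-demand trigger starts a crawler, and a conditional trigger starts a job once that crawler succeeds. This is only a sketch under stated assumptions. The workflow, crawler, and job names are hypothetical, the helper functions are invented for this example, and the crawler and job are assumed to exist already. The operation names and arguments follow the boto3 Glue client (`create_workflow`, `create_trigger`, `start_workflow_run`).

```python
def build_workflow_steps(workflow, crawler, job):
    """Return (operation, kwargs) pairs that define a crawler -> job chain."""
    return [
        ("create_workflow", {"Name": workflow}),
        # Entry point: starting the workflow fires this trigger.
        ("create_trigger", {
            "Name": f"{workflow}-start",
            "WorkflowName": workflow,
            "Type": "ON_DEMAND",
            "Actions": [{"CrawlerName": crawler}],
        }),
        # Dependent step: run the job only after the crawler succeeds.
        ("create_trigger", {
            "Name": f"{workflow}-after-crawl",
            "WorkflowName": workflow,
            "Type": "CONDITIONAL",
            "StartOnCreation": True,
            "Predicate": {
                "Conditions": [{
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }]
            },
            "Actions": [{"JobName": job}],
        }),
    ]


def apply_steps(glue_client, steps):
    """Send each step to the Glue client in order; returns the step count."""
    for operation, kwargs in steps:
        getattr(glue_client, operation)(**kwargs)
    return len(steps)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    steps = build_workflow_steps("sales-pipeline", "sales-crawler", "clean-sales-job")
    for operation, kwargs in steps:
        print(operation, kwargs["Name"])
    # With AWS credentials configured:
    #   import boto3
    #   glue = boto3.client("glue")
    #   apply_steps(glue, steps)
    #   glue.start_workflow_run(Name="sales-pipeline")
```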

The following video focuses on how AWS Glue simplifies collecting, cleaning, and unifying data for businesses.