This lecture talks about:
1) What is Amazon Athena?
2) What Athena helps to analyze: Unstructured, Semi-structured, Columinar formats such as Apache Parquet, Apache ORC.
3) Presto: Athena uses presto to run SQL queries
Lab: Querying VPC Flow Logs in S3 using Athena
This lab helps you create below resources in AWS:
1) 2 S3 buckets
2) 2 VPC's, 2 Subnets, 2 EC2 instances, 2 Internet Gateways
3) create a new database in Athena
create database vpcdb;
4) external table creation syntax:
CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
version int,
account string,
interfaceid string,
sourceaddress string,
destinationaddress string,
sourceport int,
destinationport int,
protocol int,
numpackets int,
numbytes bigint,
starttime int,
endtime int,
action string,
logstatus string,
vpcid string,
subnetid string,
instanceid string,
tcpflags int,
type string,
pktsrcaddr string,
pktdstaddr string,
region string,
azid string,
sublocationtype string,
sublocationid string,
pktsrcawsservice string,
pktdstawsservice string,
flowdirection string,
trafficpath string
)
PARTITIONED BY (`date` date)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://athenaworkshop1/AWSLogs/650992375745/vpcflowlogs/ap-southeast-1/'
TBLPROPERTIES ("skip.header.line.count"="1");
5) Partition creation syntax:
ALTER TABLE vpc_flow_logs
ADD PARTITION (`date`='2021-08-07')
location 's3://athenaworkshop1/AWSLogs/650992375745/vpcflowlogs/ap-southeast-1/2021/08/07';
6) Load all partitions
MSCK REPAIR TABLE vpc_flow_logs;
7) Query data
select * from vpc_flow_logs;
select distinct interfaceid from vpc_flow_logs;
select count(*) from vpc_flow_logs;
Lab reference: [ Ссылка ]
****
Ещё видео!