Introduction
In the year 2150, Earth's resources have been depleted, and humanity has established a thriving metropolis on Mars, known as Martropolis. As an environmental protection officer, your mission is to ensure the sustainability of this futuristic city by analyzing and optimizing resource utilization. One of your primary responsibilities is to leverage the power of Hadoop and Hive to process and analyze vast amounts of environmental data, which will guide your decision-making process.
Your objective is to explore the Hive database, investigate its structure, and gain insights into the data it contains. By mastering the art of describing tables in Hive, you will unlock the secrets hidden within the data, enabling you to make informed decisions that will shape the future of Martropolis and safeguard its delicate ecosystem.
Connect to Hive and List Available Databases
In this step, you will learn how to connect to the Hive environment and list the available databases.
First, ensure you are logged in as the hadoop user by running the following command in the terminal:
su - hadoop
Now, launch the Hive shell by executing the following command:
hive
Once you're in the Hive shell, you can use the SHOW DATABASES command to list all available databases.
SHOW DATABASES;
This command will display a list of databases, including the default database.
Example output:
hive> SHOW DATABASES;
OK
default
martropolis
Time taken: 0.528 seconds, Fetched: 2 row(s)
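If you later script this step rather than typing it interactively, you may want to separate the result rows from the shell's status lines. The following is a minimal Python sketch, assuming the output has the same shape as the example above (an echoed prompt line, an OK line, the rows, and a "Time taken" footer). The function name result_rows is hypothetical and is not part of Hive.

```python
def result_rows(cli_output):
    """Return only the data rows from Hive shell output shaped like the example above."""
    rows = []
    for raw in cli_output.splitlines():
        line = raw.strip()
        # Skip blank lines, the echoed prompt, the OK status, and the timing footer.
        if (not line or line.startswith("hive>") or line == "OK"
                or line.startswith("Time taken:")):
            continue
        rows.append(line)
    return rows


sample = """hive> SHOW DATABASES;
OK
default
martropolis
Time taken: 0.528 seconds, Fetched: 2 row(s)"""

databases = result_rows(sample)
print(databases)  # ['default', 'martropolis']
```

The same filter applies unchanged to the SHOW TABLES and SHOW PARTITIONS outputs in this lab, since they share this layout.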
Switch to the 'martropolis' Database
In this step, you will switch to the martropolis database, which contains the tables relevant to your mission.
USE martropolis;
After executing this command, you will be working within the martropolis database.
Tip: The martropolis database was created automatically as a sample database for this lab.
List Tables in the 'martropolis' Database
Now that you're in the martropolis database, you can list all the tables it contains using the SHOW TABLES command.
SHOW TABLES;
This command will display a list of tables available in the martropolis database.
Example output:
hive> SHOW TABLES;
OK
sensor_data
Time taken: 0.028 seconds, Fetched: 1 row(s)
Describe the Structure of a Table
To understand the structure of a table, you can use the DESCRIBE command followed by the table name.
DESCRIBE sensor_data;
This command lists each column's name, data type, and comment. For a partitioned table such as sensor_data, it also lists the partition columns in a separate "# Partition Information" section.
Example output:
hive> DESCRIBE sensor_data;
OK
sensor_id int
sensor_name string
reading double
dt string
# Partition Information
# col_name data_type comment
dt string
Time taken: 0.154 seconds, Fetched: 8 row(s)
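To make the two parts of this output concrete, here is a minimal Python sketch that splits DESCRIBE rows into the full column list and the partition columns. It assumes the rows look like the example above, with whitespace-separated name and type and a "# Partition Information" marker. The function name parse_describe is hypothetical.

```python
def parse_describe(rows):
    """Split DESCRIBE rows into (columns, partition_columns) lists of (name, type)."""
    columns, partitions = [], []
    in_partition_section = False
    for raw in rows:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# Partition Information"):
            in_partition_section = True
            continue
        if line.startswith("#"):
            continue  # header row such as "# col_name data_type comment"
        name, data_type = line.split()[:2]
        target = partitions if in_partition_section else columns
        target.append((name, data_type))
    return columns, partitions


rows = [
    "sensor_id int",
    "sensor_name string",
    "reading double",
    "dt string",
    "# Partition Information",
    "# col_name data_type comment",
    "dt string",
]
columns, partitions = parse_describe(rows)
print(columns)
print(partitions)  # [('dt', 'string')]
```

Note that dt appears in both lists: Hive shows a partition column among the regular columns and again in the partition section.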
Explore Table Properties
In addition to the table structure, you can also explore the properties of a table using the DESCRIBE EXTENDED command.
DESCRIBE EXTENDED sensor_data;
This command will provide more detailed information about the table, including its properties, such as the table type, input and output formats, location, and any other relevant metadata.
Example output:
hive> DESCRIBE EXTENDED sensor_data;
OK
sensor_id int
sensor_name string
reading double
dt string
# Partition Information
# col_name data_type comment
dt string
Detailed Table Information Table(tableName:sensor_data, dbName:martropolis, owner:hadoop, createTime:1711106250, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:sensor_id, type:int, comment:null), FieldSchema(name:sensor_name, type:string, comment:null), FieldSchema(name:reading, type:double, comment:null), FieldSchema(name:dt, type:string, comment:null)], location:hdfs://localhost:9000/user/hive/warehouse/martropolis.db/sensor_data, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[FieldSchema(name:dt, type:string, comment:null)], parameters:{totalSize=49, numRows=2, rawDataSize=47, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\"}, numFiles=1, numPartitions=1, transient_lastDdlTime=1711106250, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.367 seconds, Fetched: 10 row(s)
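The "Detailed Table Information" line is dense, so it can help to pull out just the fields you care about. The following Python sketch does this with a regular expression. It assumes the key:value layout shown in the example above, which is a debug-style dump rather than a documented format, so treat it as an illustration only. The function name table_fields is hypothetical, and the sample string is a shortened excerpt of the example output.

```python
import re


def table_fields(detail, keys=("tableName", "dbName", "owner", "location",
                               "inputFormat", "tableType")):
    """Extract simple key:value fields from a 'Detailed Table Information' string."""
    found = {}
    for key in keys:
        # In the example, each value runs up to the next comma or closing parenthesis.
        match = re.search(r"\b" + re.escape(key) + r":([^,)]+)", detail)
        if match:
            found[key] = match.group(1)
    return found


# Shortened excerpt of the example output above.
detail = (
    "Table(tableName:sensor_data, dbName:martropolis, owner:hadoop, "
    "location:hdfs://localhost:9000/user/hive/warehouse/martropolis.db/sensor_data, "
    "inputFormat:org.apache.hadoop.mapred.TextInputFormat, "
    "tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)"
)

fields = table_fields(detail)
print(fields["location"])
print(fields["tableType"])  # MANAGED_TABLE
```

For reading by eye, Hive's DESCRIBE FORMATTED command presents similar metadata in a multi-line layout that is easier to scan.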
Analyze Table Partitions (Optional)
If your tables are partitioned, you can use the SHOW PARTITIONS command to view the partitions of a specific table.
SHOW PARTITIONS sensor_data;
This command lists each partition of the specified table as a column=value pair, such as dt=2023-05-01.
Example output:
hive> SHOW PARTITIONS sensor_data;
OK
dt=2023-05-01
Time taken: 0.099 seconds, Fetched: 1 row(s)
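Each partition row has a simple key=value form, and tables partitioned on several columns show the pairs joined with slashes. As a minimal Python sketch, a row can be turned into a dictionary as follows. The function name parse_partition is hypothetical, and the two-level value in the second call is an invented illustration, not data from this lab.

```python
def parse_partition(spec):
    """Turn 'k1=v1/k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    # Split on the first '=' only, so values containing '=' stay intact.
    return dict(part.split("=", 1) for part in spec.split("/"))


print(parse_partition("dt=2023-05-01"))  # {'dt': '2023-05-01'}

# Hypothetical two-level partition, for illustration only.
print(parse_partition("dt=2023-05-01/zone=north"))
```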
Summary
In this lab, you learned how to navigate the Hive environment, switch between databases, list tables, and describe the structure and properties of tables. By mastering these fundamental skills, you have taken the first step towards unlocking the valuable insights hidden within the environmental data of Martropolis.
Through hands-on experience, you gained a deeper understanding of the SHOW DATABASES, USE, SHOW TABLES, DESCRIBE, DESCRIBE EXTENDED, and SHOW PARTITIONS commands. These commands are essential tools for exploring and understanding the organization of data in Hive, enabling you to make informed decisions that will shape the future of Martropolis and safeguard its delicate ecosystem.