Updated on 2022-06-01 GMT+08:00

Application Development Overview

Hive

Hive is an open-source data warehouse framework built on Hadoop. It stores structured data and provides basic data analysis services through the Hive Query Language (HiveQL), an SQL-like language. Hive converts HiveQL statements into MapReduce or Spark jobs to query and analyze massive amounts of data stored in Hadoop clusters.

Hive provides the following functions:

  • Extracts, transforms, and loads (ETL) data using HiveQL.
  • Analyzes massive amounts of structured data using HiveQL.
  • Supports various data storage formats, such as JSON, CSV, TEXTFILE, RCFILE, ORCFILE, and SEQUENCEFILE, as well as custom extensions.
  • Provides multiple client connection modes and supports JDBC APIs.
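
The functions listed above can be sketched with a small JDBC client. The following is a minimal illustration, not a reference implementation. It assumes an unsecured HiveServer2 endpoint reachable on port 10000 and the Apache Hive JDBC driver on the classpath. The class name HiveJdbcSketch, the tables raw_logs and error_logs, their columns, and the user name hive are all hypothetical. A cluster with Kerberos authentication or ZooKeeper-based service discovery would need a different connection URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class HiveJdbcSketch {

    // Builds a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database.
    // Assumption: plain (unsecured) connection without extra session parameters.
    public static String buildUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // HiveQL for a small ETL flow: a comma-delimited TEXTFILE staging table,
    // an ORC target table, and a statement that loads filtered rows into it.
    // Table and column names are hypothetical.
    public static List<String> etlStatements() {
        return Arrays.asList(
            "CREATE TABLE IF NOT EXISTS raw_logs (ts STRING, log_level STRING, msg STRING) "
                + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE",
            "CREATE TABLE IF NOT EXISTS error_logs (ts STRING, msg STRING) STORED AS ORC",
            "INSERT OVERWRITE TABLE error_logs "
                + "SELECT ts, msg FROM raw_logs WHERE log_level = 'ERROR'");
    }

    // A simple analysis query: number of log records per level.
    public static final String ANALYSIS_QUERY =
        "SELECT log_level, COUNT(*) AS cnt FROM raw_logs GROUP BY log_level";

    // Runs the ETL statements, then prints one line per result row of the analysis query.
    public static void run(String url, String user) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, user, "");
             Statement stmt = conn.createStatement()) {
            for (String sql : etlStatements()) {
                stmt.execute(sql);
            }
            try (ResultSet rs = stmt.executeQuery(ANALYSIS_QUERY)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            // No host supplied: only show the URL pattern and the HiveQL that would run.
            System.out.println(buildUrl("<hiveserver2-host>", 10000, "default"));
            for (String sql : etlStatements()) {
                System.out.println(sql);
            }
            System.out.println(ANALYSIS_QUERY);
            return;
        }
        String user = args.length > 1 ? args[1] : "hive"; // hypothetical default user
        run(buildUrl(args[0], 10000, "default"), user);
    }
}
```

Run without arguments, the program only prints the connection URL pattern and the HiveQL statements. Run with a HiveServer2 host name (and optionally a user name), it executes the statements over JDBC. Storing the target table as ORC is one possible choice. Any of the storage formats listed above could be substituted in the STORED AS clause.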

Hive is suited to offline analysis of massive data (such as log analysis and cluster status analysis), large-scale data mining (such as user behavior analysis, interest region analysis, and region display), and other scenarios.