However, the map-reduce programming model is very low level and requires developers to write custom programs which are hard to maintain and reuse.
Hive is an open-source data warehousing solution built on top of Hadoop.
Hive supports queries expressed in HiveQL, a SQL-like declarative language; these queries are compiled into map-reduce jobs that are executed on Hadoop.
Data Model
Data in Hive is organized into:
- Tables - These are analogous to tables in relational databases. Each table has a corresponding HDFS directory, and the data in the table is serialized and stored in files within that directory.
- Partitions - Each table can have one or more partitions which determine the distribution of data within sub-directories of the table directory.
- Buckets - Data in each partition may in turn be divided into buckets based on the hash of a column in the table.
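The table, partition and bucket hierarchy above can be sketched as a path computation. This is only an illustration under stated assumptions: the warehouse root, the `col=value` sub-directory naming, the bucket file name and the CRC32 hash are choices made for this sketch (Hive uses its own hash function), and `page_views`, `ds`, `country` and `user_42` are made-up names.

```python
import zlib

WAREHOUSE_ROOT = "/user/hive/warehouse"  # assumed warehouse root, for illustration


def bucket_for(value: str, num_buckets: int) -> int:
    """Pick a bucket from a hash of the bucketing column value.

    CRC32 is a stand-in here; Hive's own hash function is different.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets


def data_path(table: str, partition: dict, bucket_value: str, num_buckets: int) -> str:
    """Build table directory / partition sub-directories / bucket file for one row."""
    # One sub-directory per partition column, in the order given.
    parts = [f"{col}={val}" for col, val in partition.items()]
    bucket = bucket_for(bucket_value, num_buckets)
    # Hypothetical bucket file name, chosen only for this sketch.
    return "/".join([WAREHOUSE_ROOT, table, *parts, f"bucket_{bucket:05d}"])


path = data_path("page_views", {"ds": "2008-09-01", "country": "US"}, "user_42", 32)
print(path)
```

Because the bucket depends only on the hash of the column value, rows with the same value always land in the same bucket of a given partition.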
Query Language
Hive provides a SQL-like query language called HiveQL which supports select, project, join, aggregate, union all and sub-queries in the from clause.
HiveQL is also very extensible. It supports user-defined column transformation (UDF) and aggregation (UDAF) functions implemented in Java.
Users can also embed custom map-reduce scripts written in any language using a simple row-based streaming interface.
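A custom script for such a row-based streaming interface might look roughly like the sketch below. It assumes that rows arrive one per line with tab-separated fields and leave the same way; the input columns `(user_id, url)` and the domain-extraction logic are hypothetical and chosen only to have something to transform.

```python
import io


def transform(stream_in, stream_out):
    """Read tab-separated rows from stream_in, write tab-separated rows to stream_out.

    Hypothetical example: input rows are (user_id, url),
    output rows are (user_id, domain).
    """
    for line in stream_in:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        user_id, url = line.split("\t")
        # Strip the scheme if present, then keep everything before the first slash.
        domain = url.split("://", 1)[-1].split("/", 1)[0]
        stream_out.write(f"{user_id}\t{domain}\n")


# Demo with in-memory streams. Used as a real streaming script, the file would
# instead import sys and call transform(sys.stdin, sys.stdout).
demo_in = io.StringIO("u1\thttp://example.com/a/b\nu2\thttps://example.org/\n")
demo_out = io.StringIO()
transform(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Keeping the logic in a function that takes two streams makes the script easy to test without a cluster.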
Hive Architecture
- External Interfaces - Hive provides both user interfaces, such as the command line (CLI) and a web UI, and application programming interfaces (APIs).
- The Metastore is the system catalog.
- The Driver manages the life cycle of a HiveQL statement during compilation, optimization and execution.
- The Compiler is invoked by the driver upon receiving a HiveQL statement.
Metastore
- Database - A namespace for tables.
- Table - The metadata for a table contains the list of columns and their types, the owner, and storage and SerDe information.
- Partition - Each partition can have its own columns, SerDe and storage information.
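The three kinds of metastore objects can be pictured as plain records. The sketch below is not the metastore's actual schema: every class and field name is hypothetical, and the rule that a partition without its own columns falls back to the table's columns is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StorageInfo:
    location: str      # e.g. an HDFS directory
    serde: str         # name of the SerDe used to (de)serialize rows
    file_format: str   # hypothetical field, e.g. "TextFile"


@dataclass
class Partition:
    values: dict
    # None means "inherit from the table" (an assumption of this sketch).
    columns: Optional[list] = None
    storage: Optional[StorageInfo] = None


@dataclass
class Table:
    name: str
    owner: str
    columns: list          # list of (column name, type) pairs
    storage: StorageInfo
    partitions: list = field(default_factory=list)

    def effective_columns(self, partition: Partition) -> list:
        """Columns of a partition, falling back to the table's columns."""
        return partition.columns if partition.columns is not None else self.columns


@dataclass
class Database:
    name: str
    tables: dict = field(default_factory=dict)   # table name -> Table


db = Database("default")
views = Table(
    name="page_views",
    owner="alice",
    columns=[("user_id", "string"), ("url", "string")],
    storage=StorageInfo("/warehouse/page_views", "ExampleSerDe", "TextFile"),
)
views.partitions.append(Partition({"ds": "2008-09-01"}))
views.partitions.append(
    Partition({"ds": "2008-09-02"}, columns=views.columns + [("referrer", "string")])
)
db.tables[views.name] = views

for p in views.partitions:
    print(p.values, [name for name, _ in views.effective_columns(p)])
```

Letting a partition carry its own columns and storage means older partitions can stay readable after the table definition changes.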
Compiler
- The Parser transforms a query string to a parse tree representation.
- The Logical Plan Generator converts the internal query representation to a logical plan, which consists of a tree of logical operators.
- The Optimizer performs multiple passes over the logical plan.
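To make the idea of a logical operator tree and repeated optimizer passes concrete, here is a toy sketch. It is not Hive's optimizer: the `Op` node type, the three operator kinds and the two rewrite rules (dropping empty filters, merging adjacent filters) are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Op:
    kind: str                      # "scan", "filter" or "select"
    arg: tuple                     # table name, predicates, or column names
    child: Optional["Op"] = None   # the operator that feeds this one


def merge_filters(op):
    """One rewrite pass: collapse a filter that sits directly above another filter."""
    if op is None:
        return None
    child = merge_filters(op.child)
    if op.kind == "filter" and child is not None and child.kind == "filter":
        return Op("filter", child.arg + op.arg, child.child)
    return Op(op.kind, op.arg, child)


def drop_empty_filters(op):
    """Another pass: remove filters that carry no predicates."""
    if op is None:
        return None
    child = drop_empty_filters(op.child)
    if op.kind == "filter" and not op.arg:
        return child
    return Op(op.kind, op.arg, child)


def optimize(plan, passes=(drop_empty_filters, merge_filters)):
    """Apply every pass in turn, repeating until the plan stops changing."""
    while True:
        new_plan = plan
        for rewrite in passes:
            new_plan = rewrite(new_plan)
        if new_plan == plan:
            return plan
        plan = new_plan


# select url <- filter country <- empty filter <- filter ds <- scan page_views
plan = Op("select", ("url",),
          Op("filter", ("country = 'US'",),
             Op("filter", (),
                Op("filter", ("ds = '2008-09-01'",),
                   Op("scan", ("page_views",))))))

optimized = optimize(plan)
print(optimized)
```

Here the merge rule only becomes applicable after the empty filter has been removed, which is one reason an optimizer may need more than a single pass.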