Select
[TOC]
Select Syntax
[WITH CommonTableExpression (, CommonTableExpression)*] (Note: Only available starting with Hive 0.13.0)
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]
select 可以是union query的一部分,也可以是另一个query的subquery
table_reference 标识了query的输入。 可以是一个严格意义上的table, 一个view,一个join查询或者一个子查询
table 名和column 名大小写相同
简单查询
SELECT * FROM t1
从hive0.13.0开始,from可选(如: select 1+1)获取当前的数据库
SELECT current_database()
当指定数据库之后,可以通过指定在table名前家database名 或者使用 USE 来查询不同的database
use database_name; select query_specifications; use default;
WHERE Clause
where是boolean表达式,支持数值操作和UDF
SELECT * FROM sales WHERE amount > 10 AND region = "US"
ALL and DISTINCT Clauses
All 和 DISTINCT 选项用于指明重复的数据是否需要返回。 默认为ALL,
hive> SELECT col1, col2 FROM t1
1 3
1 3
1 4
2 5
hive> SELECT DISTINCT col1, col2 FROM t1
1 3
1 4
2 5
hive> SELECT DISTINCT col1 FROM t1
1
2
Partition Based Queries
table创建分区之后,查询可以在每个分区上运行, 而且只会扫描查询中涉及到的分区。
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31'
也可以在join的on表达式中使用分区
SELECT page_views.*
FROM page_views JOIN dim_users
ON (page_views.user_id = dim_users.id AND page_views.date >= '2008-03-01' AND page_views.date <= '2008-03-31')
HAVING Clause
从hive0.7.0中开始支持hiving
SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 10
与之等同的子查询如下
SELECT col1 FROM (SELECT col1, SUM(col2) AS col2sum FROM t1 GROUP BY col1) t2 WHERE t2.col2sum > 10
LIMIT Clause
决定查询返回的条数, 返回的条数是随机的。
SELECT * FROM t1 LIMIT 5
top K 查询
SET mapred.reduce.tasks = 1 SELECT * FROM sales SORT BY amount DESC LIMIT 5
REGEX Column Specification
在hive0.13.0之前的版本,默认支持column上的正在表达式, 在0.13.0之后,需要配置资源文件 hive.support.quoted.identifiers
为 none
SELECT `(ds|hr)?+.+` FROM sales
Last updated
Was this helpful?