James Pirz
2015-07-30 00:03:17 UTC
Hi,
I am using Hive 1.2, and I am trying to run some queries based on TPCH
schema. My original query is:
SELECT N_NAME, AVERAGE(C_ACCTBAL)
FROM customer JOIN nation
on C_NATIONKEY=N_NATIONKEY
GROUP BY N_NAME;
for which I get:
FAILED: SemanticException [Error 10025]: Line 1:15 Expression not in GROUP
BY key 'C_ACCTBAL'
It does not really make sense, as I am running an aggregation on an
attribute which is not part of the group-by clause, which makes sure that
each group eventually gets one single value for the output. In Hive's
language manual we see that:
( https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy )
" ⊠When using group by clause, the select statement can only include
columns included in the group by clause. Of course, you can have as many
aggregation functions (e.g. count) in the select statement as well."
and the example there is similar to what I have.
I even simplified the query, and dropped the join, but it did not make a
difference:
SELECT C_NATIONKEY, AVERAGE(C_ACCTBAL)
FROM customer
GROUP BY C_NATIONKEY;
FAILED: SemanticException [Error 10025]: Line 1:20 Expression not in GROUP
BY key 'C_ACCTBAL'
Can you please let me know if I am missing something here and this behavior
is expected or not ?
In case you need it, the schema for the tables looks like:
hive> describe customer;
OK
c_custkey int
c_name string
c_address string
c_phone string
c_acctbal double
c_mktsegment string
c_comment string
c_nationkey int
hive> describe nation;
OK
n_nationkey int
n_name string
n_regionkey int
n_comment string
Thanks.
I am using Hive 1.2, and I am trying to run some queries based on TPCH
schema. My original query is:
SELECT N_NAME, AVERAGE(C_ACCTBAL)
FROM customer JOIN nation
on C_NATIONKEY=N_NATIONKEY
GROUP BY N_NAME;
for which I get:
FAILED: SemanticException [Error 10025]: Line 1:15 Expression not in GROUP
BY key 'C_ACCTBAL'
It does not really make sense, as I am running an aggregation on an
attribute which is not part of the group-by clause, which makes sure that
each group eventually gets one single value for the output. In Hive's
language manual we see that:
( https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy )
" ⊠When using group by clause, the select statement can only include
columns included in the group by clause. Of course, you can have as many
aggregation functions (e.g. count) in the select statement as well."
and the example there is similar to what I have.
I even simplified the query, and dropped the join, but it did not make a
difference:
SELECT C_NATIONKEY, AVERAGE(C_ACCTBAL)
FROM customer
GROUP BY C_NATIONKEY;
FAILED: SemanticException [Error 10025]: Line 1:20 Expression not in GROUP
BY key 'C_ACCTBAL'
Can you please let me know if I am missing something here and this behavior
is expected or not ?
In case you need it, the schema for the tables looks like:
hive> describe customer;
OK
c_custkey int
c_name string
c_address string
c_phone string
c_acctbal double
c_mktsegment string
c_comment string
c_nationkey int
hive> describe nation;
OK
n_nationkey int
n_name string
n_regionkey int
n_comment string
Thanks.