Discussion:
current_date function in hive
Ayazur Rehman
2015-06-02 14:19:29 UTC
Permalink
Hi everyone,

I am trying to schedule a hive query using Oozie, to perform aggregation on
a table on data of a particular day and save the results in another table
whenever every 24 hours.

the schema of my table is something like (tablename - currenttable)
id string
cdatetime timestamp
average int
locations array<string>
color string

And currently the query that I perform manually everyday is something like

insert into table lotable select id, lv, cdatetime, color, count(color)
from currenttable lateral view explode(locations) lvtable as lv where
to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;

So, in order to automate the process I want to use a date function that
would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I get
the following error
FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
function 'current_date'

Could you please help me with the syntax.
--
Thanking You,
Ayaz
DU DU
2015-06-02 14:24:52 UTC
Permalink
You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
Post by Ayazur Rehman
Hi everyone,
I am trying to schedule a hive query using Oozie, to perform aggregation
on a table on data of a particular day and save the results in another
table whenever every 24 hours.
the schema of my table is something like (tablename - currenttable)
id string
cdatetime timestamp
average int
locations array<string>
color string
And currently the query that I perform manually everyday is something like
insert into table lotable select id, lv, cdatetime, color, count(color)
from currenttable lateral view explode(locations) lvtable as lv where
to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
So, in order to automate the process I want to use a date function that
would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I
get the following error
FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
function 'current_date'
Could you please help me with the syntax.
--
Thanking You,
Ayaz
--
Thanks,
Will
Gopal Vijayaraghavan
2015-06-02 17:30:02 UTC
Permalink
Hi,
Post by DU DU
You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
That would be a very bad idea for query correctness.

The current_date UDF was introduced because of that particular
anti-pattern.

The unix_timestamp() is evaluated when a row is read, so each row gets a
slightly different timestamp when doing ETLs & I have seen it give
completely incorrect results when used near midnight (i.e query kicked off
at 11:59, executed at 00:01 next day).

Plus map-reduce falure handling mandates (logically) identical outputs on
map task retries or speculation.

I¹ve recently pulled out the current_timestamp() impl out into a reusable
UDF
recently so that I can use that in hive-1.0.x installs as well.

https://github.com/t3rmin4t0r/current-timestamp

You can use that UDF JAR with hive-1.0+, the constant folding will replace
the value in-place during compilation.

Cheers,
Gopal
Ayazur Rehman
2015-06-03 17:57:23 UTC
Permalink
Hi everyone,

Thanks for the responses. My problem is solved now.
I was not using the proper version of hive to be able to use the
current_date function.
Thanks Gopal for the UDF link.
I beleive I will have to add the udf to the hiveserver2 in order to be able
to run the query using Oozie.
Post by Gopal Vijayaraghavan
Hi,
Post by DU DU
You may try to_date(FROM_UNIXTIME(UNIX_TIMESTAMP()))
That would be a very bad idea for query correctness.
The current_date UDF was introduced because of that particular
anti-pattern.
The unix_timestamp() is evaluated when a row is read, so each row gets a
slightly different timestamp when doing ETLs & I have seen it give
completely incorrect results when used near midnight (i.e query kicked off
at 11:59, executed at 00:01 next day).
Plus map-reduce falure handling mandates (logically) identical outputs on
map task retries or speculation.
I¹ve recently pulled out the current_timestamp() impl out into a reusable
UDF
recently so that I can use that in hive-1.0.x installs as well.
https://github.com/t3rmin4t0r/current-timestamp
You can use that UDF JAR with hive-1.0+, the constant folding will replace
the value in-place during compilation.
Cheers,
Gopal
--
Thanking You,
Ayazur Rehman
+1 720 917 8540
Bhagwan S. Soni
2015-06-02 14:29:00 UTC
Permalink
Use "from_unixtime(unix_timestamp())". This might help you to get what you
want.
You may have to split date and time because this function will returns
TIMESTAMP.
Post by Ayazur Rehman
Hi everyone,
I am trying to schedule a hive query using Oozie, to perform aggregation
on a table on data of a particular day and save the results in another
table whenever every 24 hours.
the schema of my table is something like (tablename - currenttable)
id string
cdatetime timestamp
average int
locations array<string>
color string
And currently the query that I perform manually everyday is something like
insert into table lotable select id, lv, cdatetime, color, count(color)
from currenttable lateral view explode(locations) lvtable as lv where
to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;
So, in order to automate the process I want to use a date function that
would let hive aggregate on the data of the previous day.
I tried using current_date function but I can't get the syntax right. I
get the following error
FAILED: SemanticException [Error 10011]: Line 1:47 Invalid
function 'current_date'
Could you please help me with the syntax.
--
Thanking You,
Ayaz
Mich Talebzadeh
2015-06-02 17:26:39 UTC
Permalink
How about using individual data functions on crdatetime (in my case op_time)



Something like



INSERT OVERWRITE TABLE tmp

SELECT rs.op_type, rs.year, rs.Month, rs.Day, rs.Total_Rows

FROM

(

SELECT

op_type

, YEAR(op_time) As Year

, MONTH(op_time) AS Month

, DAY(op_time) AS Day

, count(*) AS Total_Rows

FROM t

GROUP BY

op_type

, YEAR(op_time)

, MONTH(op_time)

, DAY(op_time)

) rs





HTH



Mich Talebzadeh



http://talebzadehmich.wordpress.com



Author of the books "A Practitioner’s Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7.

co-author "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4

Publications due shortly:

Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and Coherence Cache

Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly



NOTE: The information in this email is proprietary and confidential. This message is for the designated recipient only, if you are not the intended recipient, you should destroy it immediately. Any information in this message shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries or their employees, unless expressly so stated. It is the responsibility of the recipient to ensure that this email is virus free, therefore neither Peridale Ltd, its subsidiaries nor their employees accept any responsibility.



From: Ayazur Rehman [mailto:***@gmail.com]
Sent: 02 June 2015 15:19
To: ***@hive.apache.org
Subject: current_date function in hive



Hi everyone,



I am trying to schedule a hive query using Oozie, to perform aggregation on a table on data of a particular day and save the results in another table whenever every 24 hours.



the schema of my table is something like (tablename - currenttable)

id string

cdatetime timestamp

average int

locations array<string>

color string



And currently the query that I perform manually everyday is something like



insert into table lotable select id, lv, cdatetime, color, count(color) from currenttable lateral view explode(locations) lvtable as lv where to_date(cdatetime)='2015-06-01' group by cdatetime, color, lv, id;



So, in order to automate the process I want to use a date function that would let hive aggregate on the data of the previous day.

I tried using current_date function but I can't get the syntax right. I get the following error

FAILED: SemanticException [Error 10011]: Line 1:47 Invalid function 'current_date'



Could you please help me with the syntax.
--
Thanking You,

Ayaz
Continue reading on narkive:
Loading...