ROW_NUMBER() equivalent in Hive

Discussion:

kumar mr

2013-02-21 07:33:16 UTC

Hi,

This is Kumar, and this is my first question in this group.

I have a requirement to implement ROW_NUMBER() from Teradata in Hive where partitioning happens on multiple columns along with multiple column ordering.
It can be easily implemented in Hadoop MR, but I have to do in Hive. By doing in UDF can assign same rank to grouping key considering dataset is small, but ordering need to be done in prior step.
Can we do this in lot simpler way?

Thanks in advance.

Regards,
Kumar

Owen O'Malley

2013-02-21 16:08:01 UTC

Permalink

What are the semantics for ROW_NUMBER? Is it a global row number? Per a
partition? Per a bucket?

-- Owen

Post by kumar mr
Hi,
This is Kumar, and this is my first question in this group.
I have a requirement to implement ROW_NUMBER() from Teradata in Hive
where partitioning happens on multiple columns along with multiple column
ordering.
It can be easily implemented in Hadoop MR, but I have to do in Hive. By
doing in UDF can assign same rank to grouping key considering dataset is
small, but ordering need to be done in prior step.
Can we do this in lot simpler way?
Thanks in advance.
Regards,
Kumar

Ashutosh Chauhan

2013-02-21 16:17:35 UTC

Permalink

Kumar,

If you are willing to be on bleeding edge, this and many other partitioning
and windowing functionality some of us are developing in a branch over at:
https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
Check out this branch, build hive and than you can have row_number()
functionality. Look in
ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
or so example queries demonstrating various capabilities which we have
already working (including row_number).
We hope to have this branch merged in trunk soon.

Hope it helps,
Ashutosh

Stephen Boesch

2013-02-21 20:17:23 UTC

Permalink

Hi Ashutosh,
I am interested / reviewing your windowing feature. Can you be more
specific about which (a) tests and (b) src files constitute your additions
(there are lots of files there ;) )

thanks

stephen boesch

Post by kumar mr
Kumar,
If you are willing to be on bleeding edge, this and many other
partitioning and windowing functionality some of us are developing in a
https://svn.apache.org/repos/asf/hive/branches/ptf-windowing
Check out this branch, build hive and than you can have row_number()
functionality. Look in
ql/src/test/queries/clientpositive/ptf_general_queries.q which has about 60
or so example queries demonstrating various capabilities which we have
already working (including row_number).
We hope to have this branch merged in trunk soon.
Hope it helps,
Ashutosh

Ashutosh Chauhan

2013-02-22 01:44:10 UTC

Permalink

Hi Stephen,

As I indicated in my previous email, check out file ql/src/test/queries/
clientpositive/ptf_general_queries.q it has plenty of example queries
demonstrating the functionality which is available. If you are interested
in hive src changes which has enabled this feature.. you may want to start
by looking at a patch attached on HIVE-896 which was the starting point for
this work. That jira also has links with other jira which we did /are doing
on top of that patch.

Hope it helps,
Ashutosh

Post by Stephen Boesch
Hi Ashutosh,
I am interested / reviewing your windowing feature. Can you be more
specific about which (a) tests and (b) src files constitute your additions
(there are lots of files there ;) )
thanks
stephen boesch

kumar mr

2013-02-21 19:12:45 UTC

Permalink

Owen,

it's for entire table. the sample TD query looks like below,

SELECT
columnA
,columnB
, columnC
, columnD
, columnX
,ROW_NUMBER() OVER (PARTITION BY columnA, columnB, columnC ORDER BY columnX DESC, columnY DESC) AS rank
FROM table a

Regards,
Kumar

-----Original Message-----
From: Owen O'Malley <omalley-1oDqGaOF3Lkdnm+***@public.gmane.org>
To: user <user-***@public.gmane.org>
Sent: Thu, Feb 21, 2013 8:08 am
Subject: Re: ROW_NUMBER() equivalent in Hive

What are the semantics for ROW_NUMBER? Is it a global row number? Per a partition? Per a bucket?

-- Owen

On Wed, Feb 20, 2013 at 11:33 PM, kumar mr <kumar.mr-***@public.gmane.org> wrote:

Hi,

This is Kumar, and this is my first question in this group.

I have a requirement to implement ROW_NUMBER() from Teradata in Hive where partitioning happens on multiple columns along with multiple column ordering.
It can be easily implemented in Hadoop MR, but I have to do in Hive. By doing in UDF can assign same rank to grouping key considering dataset is small, but ordering need to be done in prior step.
Can we do this in lot simpler way?

Thanks in advance.

Regards,
Kumar

Continue reading on narkive:

Search results for 'ROW_NUMBER() equivalent in Hive' (Questions and Answers)

187

replies

What are some tips for people trying to quit smoking?

started 2012-11-12 12:50:20 UTC

current events

replies

what is chupacabra?

started 2007-03-24 19:53:20 UTC

zoology

replies

can you explain how acid rain can cause damage to the environment?

started 2013-06-16 09:26:41 UTC

botany

replies

I want to know fun random facts!!!?

started 2008-02-11 14:46:22 UTC

entertainment & music