Discussion:
msck repair table and hive v2.1.0
Stephen Sprague
2016-07-14 04:29:53 UTC
hey guys,
i'm using hive version 2.1.0 and i can't seem to get msck repair table to
work. no matter what i try i get the ol' NPE. I've set the log level to
'DEBUG' but i'm still not seeing any smoking gun.

would anyone here have any pointers or suggestions to figure out what's
going wrong?

thanks,
Stephen.



hive> create external table foo (a int) partitioned by (date_key bigint)
location 'hdfs:/tmp/foo';
OK
Time taken: 3.359 seconds

hive> msck repair table foo;
FAILED: Execution Error, return code 1 from
org.apache.hadoop.hive.ql.exec.DDLTask


from the log...

2016-07-14T04:08:02,431 DEBUG [MSCK-GetPaths-1]: httpclient.RestStorageService (:()) - Found 13 objects in one batch
2016-07-14T04:08:02,431 DEBUG [MSCK-GetPaths-1]: httpclient.RestStorageService (:()) - Found 0 common prefixes in one batch
2016-07-14T04:08:02,433 ERROR [main]: metadata.HiveMetaStoreChecker (:()) - java.lang.NullPointerException
2016-07-14T04:08:02,434 WARN [main]: exec.DDLTask (:()) - Failed to run metacheck:
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:444)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:448)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
        at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Stephen Sprague
2016-07-15 01:28:46 UTC
in the meantime, given my tables are in s3, i've written a utility that does
an 'aws s3 ls' on the bucket and folder in question, converts the folder names
to partition syntax, and then issues my own 'alter table ... add partition'
for each partition.
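
for illustration, a minimal sketch of that kind of utility (the bucket, prefix,
table name and s3 filesystem scheme below are placeholders, not the actual setup):

#!/bin/bash
# List the partition folders under the table's S3 prefix, turn each folder
# name (e.g. date_key=20160714/) into a partition spec, and add it explicitly.
BUCKET=my-bucket         # placeholder
PREFIX=warehouse/foo     # placeholder
TABLE=foo                # placeholder

aws s3 ls "s3://${BUCKET}/${PREFIX}/" | awk '$1 == "PRE" {print $2}' | sed 's#/$##' |
while read dir; do
  # string partition columns would need quotes around the value
  hive -e "ALTER TABLE ${TABLE} ADD IF NOT EXISTS PARTITION (${dir%%=*}=${dir#*=})
           LOCATION 's3n://${BUCKET}/${PREFIX}/${dir}'"   # scheme depends on the cluster config
done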


so essentially it does what msck repair table does, just in a non-portable
way. oh well. gotta do what ya gotta do.
Rajesh Balamohan
2016-07-15 01:55:16 UTC
Hi Stephen,

Can you try turning off the multi-threaded approach by setting
"hive.mv.files.thread=0"? You mentioned that your tables are in s3, but the
external table you created points to HDFS. Was that intentional?
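
(presumably the setting can be applied per session from the CLI before
re-running the repair; a minimal example, using the table from your post:)

hive> set hive.mv.files.thread=0;
hive> msck repair table foo;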

~Rajesh.B
Stephen Sprague
2016-07-15 03:26:46 UTC
Hi Rajesh,
sure, i'll give that setting a try. thanks.

re: s3 vs. hdfs: indeed. i figured i'd eliminate the s3 angle when posting
here, given that msck repair table failed in both cases. but yeah, my real
use case is s3.

ok, just tried that setting and got a slightly different stack trace, but
the end result was still the NPE.

it's a strange one.

Cheers,
Stephen.

2016-07-15T03:13:08,102 DEBUG [main]: parse.ParseDriver (:()) - Parse Completed
2016-07-15T03:13:08,119 INFO [main]: ql.Driver (:()) - Semantic Analysis Completed
2016-07-15T03:13:08,119 INFO [main]: ql.Driver (:()) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2016-07-15T03:13:08,119 INFO [main]: metadata.Hive (:()) - Dumping metastore api call timing information for : compilation phase
2016-07-15T03:13:08,119 DEBUG [main]: metadata.Hive (:()) - Total time spent in each metastore function (ms): {isCompatibleWith_(HiveConf, )=0, getTable_(String, String, )=16, flushCache_()=0}
2016-07-15T03:13:08,119 INFO [main]: ql.Driver (:()) - Completed compiling command(queryId=ubuntu_20160715031308_bdf29227-ee7e-417f-834d-dae397d4eb9b); Time taken: 0.018 seconds
2016-07-15T03:13:08,119 INFO [main]: ql.Driver (:()) - Executing command(queryId=ubuntu_20160715031308_bdf29227-ee7e-417f-834d-dae397d4eb9b): msck repair table foo
2016-07-15T03:13:08,119 INFO [main]: ql.Driver (:()) - Starting task [Stage-0:DDL] in serial mode
2016-07-15T03:13:08,138 DEBUG [main]: ipc.Client (:()) - The ping interval is 60000 ms.
2016-07-15T03:13:08,138 DEBUG [main]: ipc.Client (:()) - Connecting to /10.12.15.12:8020
2016-07-15T03:13:08,140 DEBUG [IPC Parameter Sending Thread #3]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu sending #35
2016-07-15T03:13:08,138 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu: starting, having connections 1
2016-07-15T03:13:08,140 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu got value #35
2016-07-15T03:13:08,144 DEBUG [main]: ipc.ProtobufRpcEngine (:()) - Call: getFileInfo took 7ms
2016-07-15T03:13:08,144 DEBUG [main]: metadata.HiveMetaStoreChecker (:()) - *Not-using threaded version of MSCK-GetPaths*
2016-07-15T03:13:08,144 DEBUG [IPC Parameter Sending Thread #3]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu sending #36
2016-07-15T03:13:08,145 DEBUG [IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu]: ipc.Client (:()) - IPC Client (1990733619) connection to /10.12.15.12:8020 from ubuntu got value #36
2016-07-15T03:13:08,145 DEBUG [main]: ipc.ProtobufRpcEngine (:()) - Call: getListing took 1ms
2016-07-15T03:13:08,146 ERROR [main]: exec.DDLTask (:()) - java.lang.NullPointerException
        at java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1011)
        at java.util.concurrent.ConcurrentHashMap.put(ConcurrentHashMap.java:1006)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:409)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.getAllLeafDirs(HiveMetaStoreChecker.java:388)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:309)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:285)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:230)
        at org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:109)
        at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1814)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:403)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)


table ddl:

CREATE EXTERNAL TABLE `foo`(
  `a` int)
PARTITIONED BY (
  `date_key` bigint)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://10.12.15.12:8020/tmp/foo'
TBLPROPERTIES (
  'transient_lastDdlTime'='1468469502')
