Hive: how to show all partitions of a table?

Hadoop Hive

Hadoop Problem Overview

I have a table with 1000+ partitions.

"Show partitions" command only lists a small number of partitions.

How can i show all partitions?

Update:

I found "show partitions" command only lists exactly 500 partitions.
"select ... where ..." only processes the 500 partitions!

Hadoop Solutions

Solution 1 - Hadoop

CLI has some limit when ouput is displayed. I suggest to export output into local file:

$hive -e 'show partitions table;' > partitions

Solution 2 - Hadoop

hive> show partitions table_name;

Solution 3 - Hadoop

Okay, I'm writing this answer by extending wmky's answer above & also, assuming that you've configured mysql for your metastore instead of derby.

select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='<table_name>');

The above query gives you all possible values of the partition columns.

Example:

hive> desc clicks_fact;
OK
time                	timestamp           	                    
..        	                    
day                 	date                	                    
file_date           	varchar(8)          	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
	 	 
day                 	date                	                    
file_date           	varchar(8)          	                    
Time taken: 1.075 seconds, Fetched: 28 row(s)

I'm going to fetch the values of partition columns.

mysql> select PART_NAME FROM PARTITIONS WHERE TBL_ID=(SELECT TBL_ID FROM TBLS WHERE TBL_NAME='clicks_fact');
+-----------------------------------+
| PART_NAME                         |
+-----------------------------------+
| day=2016-08-16/file_date=20160816 |
| day=2016-08-17/file_date=20160816 |
....
....
| day=2017-09-09/file_date=20170909 |
| day=2017-09-08/file_date=20170909 |
| day=2017-09-09/file_date=20170910 |
| day=2017-09-10/file_date=20170910 |
+-----------------------------------+

1216 rows in set (0.00 sec)

Returns all partition columns.

Note: JOIN table DBS ON DB_ID when there is a DB involved (i.e, when, multiple DB's have same table_name)

Solution 4 - Hadoop

You can see Hive MetaStore tables,Partitions information in table of "PARTITIONS". You could use "TBLS" join "Partition" to query special table partitions.

Solution 5 - Hadoop

Yet another option is to communicate with Hive Metastore via Thrift protocol.
If you write code in python, you may benefit from hmsclient library:

Hive cli:

hive> create table test_table_with_partitions(f1 string, f2 int) partitioned by (dt string);
OK
Time taken: 0.127 seconds

hive> alter table test_table_with_partitions add partition(dt=20210504) partition(dt=20210505);
OK
Time taken: 0.152 seconds

Python cli:

>>> from hmsclient import hmsclient
>>> client = hmsclient.HMSClient(host='hive.metastore.location', port=9083)
>>> with client as c:
...    all_partitions = c.get_partitions(db_name='default',
...                                      tbl_name='test_table_with_partitions', 
...                                      max_parts=24 * 365 * 3)
...
>>> print([{'dt': part.values[0]} for part in all_partitions])
[{'dt': '20210504'}, {'dt': '20210505'}]

NB: max_parts is a parameter than cannot be greater than 32767 (java short max value).

If you have Airflow installed together with apache.hive extra, you create hmsclient quite easy:

hive_hook = HiveMetastoreHook()
with hive_hook.metastore as hive_client:
    ... your code goes here ...

This seems a more valid way of communication with Hive Metastore than accessing DB directly (and database engine agnostic BTW).

Content Type	Original Author	Original Content on Stackoverflow
Question	Kevin Leo	View Question on Stackoverflow
Solution 1 - Hadoop	www	View Answer on Stackoverflow
Solution 2 - Hadoop	Sri	View Answer on Stackoverflow
Solution 3 - Hadoop	Sridhar	View Answer on Stackoverflow
Solution 4 - Hadoop	youchuikai	View Answer on Stackoverflow
Solution 5 - Hadoop	GoodDok	View Answer on Stackoverflow