hive doesn't change parquet schema

This is for the Hadoop ecosystem, such as HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Sqoop2, Avro, Solr, HCatalog, Impala, Oozie, ZooKeeper, and Hadoop distributions like Cloudera, Hortonworks, etc.
mike123
Posts: 68
Joined: Fri Sep 15, 2017 1:44 am

hive doesn't change parquet schema

Post by mike123 » Thu Dec 21, 2017 6:46 pm

Unable to query Hive parquet table after altering column type

I have an external table stored as Parquet files.
In Hive, querying it works:

Code:

hive> select * from tablename limit 5;


But in Impala, the same query fails:

Code:

impala-shell> select * from tablename limit 5;

ERROR: File 'hdfs://localhost.com:8020/orclmigration/tablename/year=2017/month=9/day=7/000005_0' has an incompatible type with the table schema for column 'star_singer'.  Expected type: INT64.  Actual type: BYTE_ARRAY


tomwaugh

Re: hive doesn't change parquet schema

Post by tomwaugh » Thu Dec 21, 2017 6:51 pm

It looks like it works in Hive but not in Impala.

An ALTER TABLE statement like the one below only changes the table metadata; the underlying files on HDFS remain unchanged. Since the Parquet schema is embedded in each data file, the files have no idea that the metadata changed: they still store star_singer as BYTE_ARRAY (string), while the table now declares it as bigint (INT64). Hence the error: Expected type: INT64. Actual type: BYTE_ARRAY

ALTER TABLE tablename CHANGE star_singer star_singer bigint;
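If the data files are meant to stay as they are, a minimal alternative sketch is to make the metadata match the files again and convert the value at query time instead. It assumes star_singer was originally declared as string (Hive writes strings to Parquet as BYTE_ARRAY); adjust if the original type was different.

```sql
-- In Hive: assumes the column's original type was string, matching the
-- BYTE_ARRAY values actually stored in the Parquet files.
ALTER TABLE tablename CHANGE star_singer star_singer string;

-- In impala-shell: reload table metadata that was changed outside Impala.
INVALIDATE METADATA tablename;

-- Convert at query time; values that are not valid integers become NULL.
SELECT CAST(star_singer AS BIGINT) AS star_singer FROM tablename LIMIT 5;
```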

Guest

Re: hive doesn't change parquet schema

Post by Guest » Thu Dec 21, 2017 6:55 pm

I think you should perform the following steps:

1) Create a temporary table from the original table.
2) Drop the original table and recreate it with the changed data type.
3) Insert the data from the temporary table into the original table.
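A minimal HiveQL sketch of these three steps. For illustration only, it assumes star_singer is the table's only data column, year/month/day are INT partition columns, tablename_tmp is a hypothetical scratch table name, and star_singer still has its original string type when step 1 runs. Because tablename is an external table, DROP TABLE leaves its files on HDFS, so the sketch deletes the old files explicitly before reloading; check the copy in tablename_tmp first.

```sql
-- Sketch only: assumes star_singer is the only data column and that
-- year/month/day are INT partitions. tablename_tmp is a hypothetical name.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- 1) Copy the current data into a temporary Parquet table.
CREATE TABLE tablename_tmp STORED AS PARQUET AS
SELECT * FROM tablename;

-- 2) Drop the original table and recreate it with the new column type.
--    DROP TABLE on an EXTERNAL table keeps the files, so remove the old
--    BYTE_ARRAY files explicitly (tablename_tmp now holds the only copy).
DROP TABLE tablename;
dfs -rm -r /orclmigration/tablename/*;
CREATE EXTERNAL TABLE tablename (star_singer BIGINT)
PARTITIONED BY (year INT, month INT, day INT)
STORED AS PARQUET
LOCATION '/orclmigration/tablename';

-- 3) Reload the data; the new files store star_singer as INT64.
--    Values that are not valid integers become NULL after the CAST.
INSERT OVERWRITE TABLE tablename PARTITION (year, month, day)
SELECT CAST(star_singer AS BIGINT), year, month, day
FROM tablename_tmp;

DROP TABLE tablename_tmp;
```

Afterwards, run INVALIDATE METADATA tablename; in impala-shell so Impala picks up the recreated table.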

mike123

Re: hive doesn't change parquet schema

Post by mike123 » Thu Dec 21, 2017 7:07 pm

Set the parameter below if you add or drop columns and run into issues. It only works in newer versions.
In CDH 5.10, I see adding and dropping columns work correctly without doing anything. Only altering a column's type may cause issues, because the existing data files on HDFS are not rewritten.

PARQUET_FALLBACK_SCHEMA_RESOLUTION=name
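A short sketch of setting this option for one impala-shell session. As far as I know from the Impala documentation (the option is listed as available from Impala 2.6 / CDH 5.8), PARQUET_FALLBACK_SCHEMA_RESOLUTION chooses whether Impala matches Parquet columns to table columns by position (the default) or by name. So it helps with added, dropped, or reordered columns, not with a changed type such as the INT64 vs BYTE_ARRAY mismatch above.

```sql
-- In impala-shell: match Parquet columns to table columns by name
-- instead of by position, for the current session only.
SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
SELECT * FROM tablename LIMIT 5;
```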



