HIVE_CURSOR_ERROR in Athena when reading parquet files written by pandas

In a recent project, a colleague asked me to look at a HIVE_CURSOR_ERROR in Athena that they weren’t able to get rid of. Since the error message was not incredibly helpful and the way this error appeared is not that uncommon, I thought writing this may help you, dear reader, and future me when I inevitably forget about it again.

zum Artikel gehen

Push-Down-Predicates in Parquet and how to use them to reduce IOPS while reading from S3

Working with datasets in pandas will almost inevitably bring you to the point where your dataset doesn’t fit into memory. Especially parquet is notorious for that since it’s so well compressed and tends to explode in size when read into a data

zum Artikel gehen

Solving Hive Partition Schema Mismatch Errors in Athena

Working with CSV files and Big Data tools such as AWS Glue and Athena can lead to interesting challenges. In this blog I will explain to you how to solve a particular problem that I encountered in a project - the HIVE_PARTITION_SCHEMA_MISMATCH.

zum Artikel gehen

Building QuickSight Datasets with CDK - Athena

In a previous blog post we built QuickSight Datasets by directly loading files from S3. In the wild the data in S3 is often already aggregated or even transformed in Athena. In this new blog post we see how to create a QuickSight Dataset directly relying

zum Artikel gehen

Automating Athena Queries with Python

Automating Athena Queries with Python Introduction Over the last few weeks I’ve been using Amazon Athena quite heavily. For those of you who haven’t encountered it, Athena basically lets you query data stored in various formats on S3 using SQL

zum Artikel gehen

Update your Style in Test Kitchen

It is surprising how many resources on the Internet are carrying on outdated or deprecated information - the Chef ecosystem is no exception to this. While outdated style in Ruby files has been detected via cookstyle for a while, Test Kitchen files still h

zum Artikel gehen