Friday, May 20, 2016

AdTech Storage on Apache HBase and Apache Phoenix


In this blog, let's talk about our current data platform, which uses Apache Phoenix over Apache HBase as persistent storage for adtech objects.


Why HBase and Phoenix?

Apache HBase is well known for its horizontal scalability and strong consistency for atomic row-level updates. In addition, HBase provides very fast (a few milliseconds to hundreds of milliseconds) query performance over very large data sets (e.g., tables with more than 10 billion rows) when paired with proper row key design.
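Why does row key design matter so much? HBase stores rows sorted lexicographically by row key, so a well-chosen composite key turns common lookups into cheap contiguous range scans. The following is a minimal Python sketch of this idea; the `advertiser_id|campaign_id` key layout and field names are illustrative assumptions, not the actual schema:

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite row keys of the form "<advertiser_id>|<campaign_id>".
# Because HBase keeps rows sorted by key, all rows for one advertiser are
# stored contiguously and can be fetched with a single range scan.
rows = sorted([
    ("adv001|cmp01", {"clicks": 10}),
    ("adv001|cmp02", {"clicks": 7}),
    ("adv002|cmp01", {"clicks": 3}),
])
keys = [k for k, _ in rows]

def range_scan(prefix):
    """Return all rows whose key starts with `prefix` (a prefix range scan)."""
    lo = bisect_left(keys, prefix)
    hi = bisect_right(keys, prefix + "\xff")  # upper bound just past the prefix
    return rows[lo:hi]
```

A query such as "all campaigns for advertiser adv001" becomes `range_scan("adv001|")`, touching only the contiguous slice of keys it needs, which is the access pattern HBase (and Phoenix on top of it) is optimized for.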

Even though HBase provides wonderful features, it still introduces considerable challenges and overhead for our developers when used as raw data storage. The low-level, verbose HBase API and byte-array-oriented schema design are difficult to master and use efficiently. With these challenges in mind, we evaluated a few open source solutions and chose Apache Phoenix.


Apache Phoenix strikes a good balance between supporting complex use cases and scaling horizontally and consistently over large data sets. It provides a JDBC abstraction layer on top of HBase. In addition, it provides richer schema support, secondary indexes (both global and local), a SQL-compliant query language, atomic auto-increment sequences, and many other features we are still exploring.


With Apache Phoenix and HBase chosen as the storage layer, we can provide a managed REST API layer, Data Services, to support a variety of API clients, such as Python, PHP, Angular, and JavaScript/Node.js. In addition to providing a REST layer over Phoenix, Data Services also streams changes to Apache Kafka topics (for audit and streaming analytics insights, etc.) and provides an authentication and authorization system, plus metadata discovery and caching facilities.

System Data Flow
[Diagram: marin-oltp.png — high-level system data flow]
In this blog, we discuss only the persistent storage system.
The persistent storage is layered on top of open source Apache Phoenix/HBase. The REST service API layer exposes the underlying high-performance storage to various services and the UI. For complex online analytics queries, we built a custom Presto-OLTP connector that streams the underlying Apache Phoenix data into Presto.


Presto is a high-performance interactive query engine that Facebook open sourced in 2013. For more details, please refer to the Presto project site.


To address some of the performance challenges we had with Apache Phoenix for complex queries, we built the Presto-Phoenix connector to leverage the Presto query engine.

Conclusion
Lessons and practices we learned while migrating our data platform to Apache Phoenix over HBase for storage:
  • Denormalize your Apache Phoenix data schema to avoid:
    • Joins and FK links between tables on the Phoenix side
    • Group By and other aggregate calculations (e.g., Sum, Avg) on the Phoenix side; instead, perform these queries from an analytics engine (e.g., Presto)
    • Subqueries
  • Design the schema around the primary keys, as Phoenix/HBase prefers range scans and skip scans over the underlying KV stores for performance
  • Design the schema with the immutability of the PK in mind. Once a row key is generated and inserted into HBase, it should always be treated as immutable
  • Choose a reasonable number of salt buckets based on table cardinality. For high-cardinality tables, use a salt bucket count appropriate for the underlying HBase cluster size (e.g., 20%-100% of the HBase region server count). However, avoid too many salt buckets for small tables (e.g., tables with fewer than a couple million records)
  • Avoid using foreign key references in the primary table. Instead, embed related objects (denormalized) into the primary table
  • Avoid filters on non-PK columns; instead, query solely on the leading PK column values
  • Don't build too many secondary Phoenix indexes over your data table; they slow down update throughput, which is critical for writes
  • If possible, use external indexing systems (e.g., SolrCloud, Elasticsearch) in conjunction with Phoenix/HBase for indexing non-PK columns in a table
  • Use Spark or other async job flows to periodically update these external index stores outside the HBase coprocessor, keeping the HBase process unblocked for low-latency writes and reads
  • Do large joins outside the Phoenix context (e.g., use Spark or Presto for joins)
  • Do aggregates cautiously, and run complex aggregates in an analytics framework outside of Phoenix (e.g., Spark, Presto)
  • If you are using Spark to write to Phoenix, do your writes through the Phoenix-Spark connector
  • If you need to write to Phoenix through JDBC, batch your writes into a few thousand rows (e.g., < 20,000 rows) per batch
  • If your batch writes are insertions rather than updates, try to cache and batch sequence ID generation
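To make the salting advice above concrete, here is a minimal Python sketch of what Phoenix-style salting does: a bucket byte derived from the row key's hash is prepended to the key, so monotonically increasing keys spread across region servers instead of hot-spotting one. The md5 hash and bucket count here are illustrative assumptions; Phoenix uses its own hash function internally, configured via the `SALT_BUCKETS` table option.

```python
import hashlib

SALT_BUCKETS = 8  # e.g., ~20%-100% of the HBase region server count

def salted_key(row_key: str) -> bytes:
    """Prepend a deterministic salt byte so keys spread across buckets."""
    digest = hashlib.md5(row_key.encode()).digest()
    salt = digest[0] % SALT_BUCKETS          # same key always maps to same bucket
    return bytes([salt]) + row_key.encode()  # salt byte + original key
```

Because the salt is derived from the key itself, point lookups remain cheap, but a range scan over the logical key order now fans out into one scan per bucket, which is why over-salting small tables hurts.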
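The JDBC batching guidance can be sketched as a simple chunker; the `BATCH_SIZE` value and the surrounding connection handling (committing once per chunk) are assumptions for illustration, not our actual code:

```python
BATCH_SIZE = 10_000  # keep each commit under ~20,000 rows, per the guidance above

def batched(rows, size=BATCH_SIZE):
    """Yield successive chunks of `rows`, each at most `size` long."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial chunk
```

With autocommit disabled, the write loop then executes one UPSERT per row and commits once per chunk, amortizing the flush cost without building an unbounded client-side buffer.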
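Finally, the sequence-caching tip amounts to fetching IDs in blocks rather than one at a time, so bulk insertions don't pay a server round trip per row (Phoenix supports this server-side via `CREATE SEQUENCE ... CACHE n`). A hedged Python sketch of the client-side pattern, where `fetch_next_block` is a hypothetical stand-in for reserving a block of sequence values:

```python
class CachedSequence:
    """Hand out IDs from a locally cached block, refetching only when empty."""

    def __init__(self, fetch_next_block, block_size=1000):
        self._fetch = fetch_next_block  # reserves `n` IDs, returns first value
        self._block_size = block_size
        self._next = 0
        self._limit = 0  # exclusive end of the cached block

    def next_id(self):
        if self._next >= self._limit:  # cache exhausted: reserve a new block
            self._next = self._fetch(self._block_size)
            self._limit = self._next + self._block_size
        value, self._next = self._next, self._next + 1
        return value
```

A batch of 10,000 insertions with `block_size=1000` then costs 10 sequence round trips instead of 10,000; the trade-off is that unused IDs in a reserved block are simply skipped, which is usually acceptable for surrogate keys.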