Apache Doris just "graduated": Why care about this SQL data warehouse
Published: 2023-07-11


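Doris is an open source, SQL-based massively parallel processing (MPP) analytical data warehouse that was under development at the Apache Incubator. Doris has now joined the ranks of top-level projects, which according to the Apache Software Foundation (ASF) means that “it has proven its ability to be properly self-governed.”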


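Doris, originally named Palo, was born at Chinese internet search giant Baidu as the data warehousing system for its advertising business. It was open sourced in 2017 and entered the Apache Incubator in 2018.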

Doris has roots in Apache Impala and Google Mesa

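According to the Apache Software Foundation, Doris is based on the integration of Google Mesa and Apache Impala, an open source MPP SQL query engine developed in 2012 and based on the underpinnings of Google F1.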




Other Doris features include columnar storage, parallel execution, vectorization technology, query optimization, ANSI SQL, and integration with the big data ecosystem through connectors for Apache Flink, Apache Hive, Apache Hudi, Apache Iceberg, Apache Spark, Elasticsearch, and other systems.





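“As data volumes keep growing, MPP databases have become the only practical way to process data fast enough or cheaply enough to meet organizations' needs,” said David Menninger, research director at Ventana Research.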



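Menninger sees considerable promise in Doris: although there are many MPP databases to choose from, some of them open source, there is not really an open source MPP alternative to MySQL.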

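“MySQL itself and MariaDB have been extended to support larger analytic workloads, but they were originally designed for transaction processing,” Menninger said, adding that Greenplum, an open source database based on PostgreSQL, and hyperscaler services such as Google BigQuery, Amazon Redshift, and Microsoft Synapse could be considered competitors to Doris.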

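In addition, Sanjeev Mohan, former research vice president for big data and analytics at Gartner, said that ClickHouse, Apache Druid, and Apache Pinot can also be considered competitors.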






In case you are wondering who “she” is and what school she went to, Doris is an open source, SQL-based massively parallel processing (MPP) analytical data warehouse that was under development at the Apache Incubator.

Last week, Doris achieved the status of top-level project, which according to the Apache Software Foundation (ASF) means that “it has proven its ability to be properly self-governed.”

The data warehouse was recently released in version 1.0, its eighth release while undergoing development at the incubator (along with six Connector releases). It has been built to support online analytical processing (OLAP) workloads, often used in data science scenarios.

Doris, originally known as Palo, was born inside Chinese internet search giant Baidu as a data warehousing system for its advertisement business before being open sourced in 2017 and entering the Apache Incubator in 2018.

Doris has roots in Apache Impala and Google Mesa

Doris, according to the Apache Software Foundation, is based on the integration of Google Mesa and Apache Impala, an open source MPP SQL query engine, developed in 2012 and based on the underpinnings of Google F1.

Mesa, which was designed to be a highly scalable analytic data warehousing system around 2014, was used to store critical measurement data related to Google’s Internet advertising business.

According to its developers, both at Baidu and at the Apache Incubator, Doris offers a simple design architecture while providing high availability, reliability, fault tolerance, and scalability.

“The simplicity (of developing, deploying and using) and meeting many data serving requirements in single system are the main features of Doris,” the Apache Software Foundation said in a statement, adding that the data warehouse supports multidimensional repor
