Apache Doris
Apache Doris is an open-source, high-performance, real-time analytical database based on a massively parallel processing (MPP) architecture. It is a project of the Apache Software Foundation.
Apache Doris can return query results over massive datasets with sub-second response times. It supports both high-concurrency point queries and high-throughput analytical workloads. Typical uses include report analysis, ad-hoc queries, unified data warehousing, and data lake query acceleration. Applications built on Apache Doris include user behavior analysis, A/B testing platforms, log retrieval and analysis, user profile analysis, and order analysis.
Doris is widely used in business OLAP applications to analyze massive data in real time. Apache Doris currently serves over 1,000 users worldwide, including technology companies such as Alibaba Cloud, Baidu, ByteDance (TikTok), JD.COM, Kwai, Meituan, MiHoYo, NetEase, Shopee, Tencent, and Xiaomi.
Developer(s) | Apache Software Foundation |
---|---|
Stable release | v1.2.1 (2023) |
Repository | github |
Written in | Java, C++ |
License | Apache License 2.0 |
Website | doris |
History
Originally known as Baidu PALO, Doris was developed at the Chinese search engine company Baidu as a data warehouse for its advertising business. It was open-sourced in 2017 and entered the Apache Incubator in 2018.
In June 2022, Apache Doris graduated from the Apache Incubator as a Top-Level Project. Within five years, guided by the Apache Way and supported by its incubator mentors, the Apache Doris community grew to more than 400 contributors, and its GitHub repository reached 6,700 stars.
Applications
Apache Doris is commonly used in the following scenarios:
Reporting Analysis
Analysis services such as real-time dashboards, reports for decision making, and high-concurrency user-facing reports.
Ad-Hoc Query
Analyst-oriented self-service analytics with irregular query patterns and high throughput requirements.
Unified Data Warehouse
Apache Doris allows users to build a unified data warehouse on a single platform instead of maintaining multiple software stacks.
Users can query data lakes built on Apache Hive, Apache Hudi, and Apache Iceberg, as well as object storage systems such as AWS S3.
Architecture
Apache Doris has a simple architecture with two types of processes: Frontend (FE) and Backend (BE).
FE
The Frontend (FE) handles user request access, query parsing and planning, metadata management, and node management.
BE
The Backend (BE) handles data storage and query plan execution.
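The division of labor between the two process types can be illustrated with a toy model. The sketch below is not Doris code: the class names `Frontend` and `Backend`, the fragment name `partial_sum_by_city`, and the sample data are all hypothetical, and planning is reduced to choosing a single fragment. It only shows the idea that the FE plans and merges while each BE stores a shard of the data and executes the plan on it.

```python
from collections import Counter


class Backend:
    """Toy BE (hypothetical): stores one shard of rows and executes a plan fragment on it."""

    def __init__(self, rows):
        self.rows = rows  # list of (city, amount) tuples held by this node

    def execute(self, fragment):
        # The only fragment this toy understands: a partial SUM grouped by city.
        assert fragment == "partial_sum_by_city"
        partial = Counter()
        for city, amount in self.rows:
            partial[city] += amount
        return partial


class Frontend:
    """Toy FE (hypothetical): tracks BE nodes, plans the query, merges partial results."""

    def __init__(self, backends):
        self.backends = backends  # node management, reduced to a plain list

    def query_sum_by_city(self):
        fragment = "partial_sum_by_city"  # "planning", reduced to picking a fragment name
        total = Counter()
        for be in self.backends:
            total.update(be.execute(fragment))  # Counter.update adds the partial sums
        return dict(total)


fe = Frontend([
    Backend([("Beijing", 10), ("Shanghai", 5)]),
    Backend([("Beijing", 7), ("Shenzhen", 3)]),
])
result = fe.query_sum_by_city()
print(result)
```

In this sketch the FE never touches raw rows; it only sees the partial aggregates returned by each BE, which mirrors the separation of planning from storage and execution described above.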
Features
MPP
Massively parallel processing (MPP) is the coordinated processing of a single program by two or more processors; spreading the work across processors can dramatically increase speed. Doris adopts the MPP model in its query engine to execute queries in parallel across nodes. It also supports distributed shuffle joins across multiple wide tables to handle complex queries.
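As a rough illustration of the shuffle join idea, the sketch below hash-partitions both inputs on the join key so that matching rows land on the same node, then joins each partition locally. It is a simplified model under stated assumptions, not the Doris implementation: the function names (`shuffle`, `local_hash_join`, `shuffle_join`) and the sample tables are hypothetical, and the "nodes" are processed one after another in a single process, whereas an MPP engine would run them in parallel on separate machines.

```python
from collections import defaultdict


def shuffle(rows, key_index, num_nodes):
    """Route each row to a node chosen by hashing its join key."""
    buckets = [[] for _ in range(num_nodes)]
    for row in rows:
        buckets[hash(row[key_index]) % num_nodes].append(row)
    return buckets


def local_hash_join(left_rows, right_rows):
    """Join two row lists on their first column using an in-memory hash table."""
    table = defaultdict(list)
    for row in right_rows:
        table[row[0]].append(row)
    # Emit the left row followed by the non-key columns of each matching right row.
    return [l + r[1:] for l in left_rows for r in table.get(l[0], [])]


def shuffle_join(left, right, num_nodes=3):
    """Partition both sides by join key, then join each partition independently."""
    left_parts = shuffle(left, 0, num_nodes)
    right_parts = shuffle(right, 0, num_nodes)
    joined = []
    for node in range(num_nodes):  # sequential here; parallel in a real MPP engine
        joined.extend(local_hash_join(left_parts[node], right_parts[node]))
    return joined


# Hypothetical sample data: (user_id, item) and (user_id, name).
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (3, "desk")]
users = [(1, "Ann"), (2, "Bob")]
joined = sorted(shuffle_join(orders, users))
print(joined)
```

Because both sides are partitioned with the same hash function, no node needs rows held by another node to complete its part of the join, which is what allows the partitions to be processed independently.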
Vectorized SQL Query Execution Engine
The Doris query engine is vectorized, with all in-memory structures laid out in a columnar format. This largely reduces virtual function calls, improves cache hit rates, and makes efficient use of SIMD instructions. In wide-table aggregation, the vectorized engine is typically 5 to 10 times faster than a non-vectorized engine.
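The difference between row-at-a-time and columnar batch processing can be sketched as follows. This is only an illustration of the data layout: the column names, the sample values, and the `batch_size` parameter are hypothetical, and pure Python does not use SIMD instructions, so the sketch does not reproduce the performance effect described above.

```python
from array import array

# Row-oriented layout: each record is a separate object with its own fields.
rows = [
    {"price": 2.0, "qty": 3},
    {"price": 1.5, "qty": 4},
    {"price": 4.0, "qty": 1},
]


def revenue_row_at_a_time(rows):
    """Evaluate price * qty one record at a time."""
    total = 0.0
    for row in rows:
        total += row["price"] * row["qty"]
    return total


# Columnar layout: each column is one contiguous typed array.
price = array("d", [2.0, 1.5, 4.0])  # 64-bit floats
qty = array("q", [3, 4, 1])          # 64-bit signed integers


def revenue_columnar(price, qty, batch_size=2):
    """Evaluate price * qty over fixed-size batches of column values."""
    total = 0.0
    for start in range(0, len(price), batch_size):
        p = price[start:start + batch_size]
        q = qty[start:start + batch_size]
        # A vectorized engine would apply one tight loop (or SIMD kernel) per batch.
        total += sum(a * b for a, b in zip(p, q))
    return total


row_total = revenue_row_at_a_time(rows)
col_total = revenue_columnar(price, qty)
print(row_total, col_total)
```

Both functions compute the same aggregate; the columnar version differs only in that each operator sees a batch of values of one type stored contiguously, which is the property a vectorized engine relies on.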
Performance Test
ClickBench is a benchmark for analytical database management systems. It represents typical workloads in traffic analysis, web analytics, machine-generated data, structured logs, and event data.
Test Results
Cold Run
Apache Doris ranked 2nd in the cold-run query performance test (instance: c6a.4xlarge, 500 GB gp2).
Hot Run
Apache Doris ranked 3rd in the hot-run query performance test (instance: c6a.4xlarge, 500 GB gp2).