Rust Magic: Polars vs Pandas Speed Test
date
Feb 17, 2023
slug
substitute-pandas-with-polars-a-dataframe-module-rewritten-in-rust
status
Published
summary
tags
Engineering
Python
Data Analysis
type
Post
Polars is a DataFrame library written in Rust. I had heard of it as an alternative to Pandas but had never actually used it. It describes itself as "blazingly fast DataFrames". Can you believe that?
In this article, I benchmark it against Pandas on my everyday setup, and it really is fast.
Test Results
The bar chart shows that, for common operations, Polars needs roughly one-quarter to one-third of the time Pandas takes (about 2.8x to 4x faster):
Detailed table:
Task | Pandas (s) | Polars (s) |
Import a 10 MB CSV file | 0.157 | 0.055 |
Column loops (groupby and sum per column) | 0.168 | 0.060 |
Concatenate three 10 MB DataFrames | 0.063 | 0.016 |
groupby() and sum() | 0.008 | 0.002 |
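To make the ratios behind the chart explicit, here is a small sketch that recomputes them from the table. It only uses the numbers in the table; the dictionary layout and the `speedups` helper are my own illustration, not part of the benchmark repository.

```python
# Timings in seconds, copied from the results table: (pandas, polars).
RESULTS = {
    "Import a 10 MB CSV file": (0.157, 0.055),
    "Column loops": (0.168, 0.060),
    "Concatenate three 10 MB DataFrames": (0.063, 0.016),
    "groupby() and sum()": (0.008, 0.002),
}


def speedups(results):
    """Return {task: (polars_time / pandas_time, pandas_time / polars_time)}."""
    out = {}
    for task, (pandas_s, polars_s) in results.items():
        out[task] = (polars_s / pandas_s, pandas_s / polars_s)
    return out


if __name__ == "__main__":
    for task, (ratio, speedup) in speedups(RESULTS).items():
        print(f"{task:<36} Polars needs {ratio:.0%} of the Pandas time ({speedup:.1f}x faster)")
```

On these numbers Polars needs about 25% to 36% of the Pandas time. The groupby timings are only a few milliseconds, so a single run of that row is noisy and its 4x figure is best read loosely.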
Test Method
Environment
- Apple Silicon M1 (2020, base model)
- macOS 13
- Jupyter Notebook in VS Code
- python==3.10.9
- pandas==1.5.3
- polars==0.16.6
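If you re-run the comparison, it helps to log the same environment details alongside the timings, since both libraries change quickly between versions. The sketch below collects them with the standard library only; `environment_report` is a hypothetical helper, not something from the benchmark repository.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def environment_report(packages=("pandas", "polars")) -> dict:
    """Collect interpreter, OS, CPU architecture and package versions."""
    report = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),  # typically 'arm64' for native Python on Apple Silicon
    }
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```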
Tasks
- Import a 10 MB CSV file with a custom separator (sep) and encoding, which is a very common task
- Concatenate repeated DataFrames into one
- Run simple statistical operations: groupby and sum
- Loop the groupby and sum operations over each column name
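The exact timing code lives in the repository and is not reproduced here. As a hedged sketch of how such a side-by-side comparison can be set up, the harness below times each task with `time.perf_counter`, makes one untimed warm-up call, and reports the median of several runs. `time_task` and `compare` are hypothetical helpers; the real benchmark may measure differently (for example with a single run or Jupyter's `%timeit`).

```python
import statistics
import time
from typing import Callable


def time_task(fn: Callable[[], object], repeat: int = 5, warmup: int = 1) -> dict:
    """Time a zero-argument callable.

    Runs `warmup` untimed calls first (to exclude one-off costs such as
    cache warm-up), then `repeat` timed calls, and returns the best and
    median wall-clock times in seconds.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {"best": min(samples), "median": statistics.median(samples), "runs": repeat}


def compare(tasks: dict, repeat: int = 5) -> dict:
    """Map each task name to (pandas median, polars median) in seconds.

    `tasks` maps a task name to a (pandas_fn, polars_fn) pair of callables.
    """
    return {
        name: (time_task(pd_fn, repeat)["median"], time_task(pl_fn, repeat)["median"])
        for name, (pd_fn, pl_fn) in tasks.items()
    }


if __name__ == "__main__":
    # Stand-in workload; in a real run each pair would be lambdas wrapping
    # the pandas and Polars calls for one task (e.g. reading the CSV file).
    data = list(range(1_000_000))
    tasks = {"sum a list": (lambda: sum(data), lambda: sum(data))}
    for name, (pd_s, pl_s) in compare(tasks).items():
        print(f"{name}: pandas {pd_s:.4f}s, polars {pl_s:.4f}s")
```

Reporting the median rather than a single run keeps one slow outlier (a garbage-collection pause, say) from skewing the millisecond-scale rows.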
Detailed test data and code: https://github.com/reycn/polars-pandas-bench
References
- Test data and code: https://github.com/reycn/polars-pandas-bench
- A larger-scale third-party benchmark (H2O.ai db-benchmark): https://h2oai.github.io/db-benchmark/
- Polars open source repository: https://github.com/pola-rs/polars