Tech | A Rust Magic: Polars vs Pandas Speed Test
date: Feb 17, 2023
slug: substitute-pandas-with-polars-a-dataframe-module-rewritten-in-rust
status: Published
summary: Polars outperforms Pandas significantly in speed tests for common data operations, completing tasks like importing CSV files and groupby/sum operations in a fraction of the time, demonstrating its efficiency on an Apple Silicon M1 environment.
tags: Engineering, Python, Data Analysis
type: Post
Polars is an alternative to Pandas that I had heard about but never actually used. Its own tagline promises "blazingly fast DataFrames". Can you believe that?
In this article, I tested it in my everyday environment, and it really is fast.
Test Results
The bar chart shows that Polars takes roughly a quarter to a third of the time Pandas needs for common operations:

Detailed table:
| Task | Pandas | Polars |
| --- | --- | --- |
| Import a 10 MB CSV file | 0.157s | 0.055s |
| Column loops | 0.168s | 0.060s |
| Concat three 10 MB dataframes | 0.063s | 0.016s |
| Groupby() and sum() | 0.008s | 0.002s |
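To make the ratios explicit, here is a small sketch that computes the speedup for each row. The timings are copied from the table; the `speedup` helper and the dictionary layout are my own illustration, not part of the original benchmark code.

```python
# Timings in seconds, copied from the table: (pandas, polars).
timings = {
    "Import a 10 MB CSV file": (0.157, 0.055),
    "Column loops": (0.168, 0.060),
    "Concat three 10 MB dataframes": (0.063, 0.016),
    "Groupby() and sum()": (0.008, 0.002),
}


def speedup(pandas_s, polars_s):
    """How many times faster Polars was than Pandas for one task."""
    return pandas_s / polars_s


if __name__ == "__main__":
    for task, (pandas_s, polars_s) in timings.items():
        share = polars_s / pandas_s  # Polars time as a share of the Pandas time
        print(f"{task}: {speedup(pandas_s, polars_s):.1f}x faster ({share:.0%} of the Pandas time)")
```

On these numbers the speedup ranges from about 2.8x to 4x. Keep in mind that the fastest task ran in only a few milliseconds, so its ratio is the most sensitive to measurement noise.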
Test Method
Environment
- Apple Silicon M1 (2020, the cheapest one)
- macOS 13
- Jupyter Notebook in VS Code
- Python==3.10.9
- pandas==1.5.3
- polars==0.16.6
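If you want to compare your own setup against these pins before rerunning the test, a quick check like the following may help. It is only a sketch: the `PINNED` dictionary and the `installed_versions` helper are names I made up for illustration, and it relies only on the standard library.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Versions listed in the environment above.
PINNED = {"pandas": "1.5.3", "polars": "0.16.6"}


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", platform.python_version(), "on", platform.machine())
    for name, installed in installed_versions(PINNED).items():
        print(f"{name}: installed={installed}, article used {PINNED[name]}")
```

Results on other hardware or newer library versions will likely differ from the table, so treat any mismatch here as a reason to read the numbers with some caution.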
Tasks
- Import a 10 MB CSV file with a custom separator (sep) and encoding, which is a very common task
- Concatenate three repeated 10 MB dataframes into one
- Run a simple groupby and sum aggregation
- Repeat the groupby and sum aggregation in a loop, once per column name
Detailed test data and code: https://github.com/reycn/polars-pandas-bench
References
- Test data and code: https://github.com/reycn/polars-pandas-bench
- Someone else's large-scale test: https://h2oai.github.io/db-benchmark/
- Polars open source repository: https://github.com/pola-rs/polars