Rust Magic: Polars vs. Pandas Speed Test

Date: Feb 17, 2023
Tags: Engineering, Python, Data Analysis
Polars is an alternative to Pandas that I had heard about but never actually used. It bills itself as a "blazingly fast DataFrames" library. Can you believe that?
In this article, I test it in my own everyday environment, and it really is fast.

Test Results

The bar chart shows that, for common operations, Polars takes roughly a quarter to a third of the time Pandas needs:
[Bar chart: Pandas vs. Polars run time for each task]
Detailed table:
| Task                          | Pandas | Polars |
|-------------------------------|--------|--------|
| Import a 10 MB CSV file       | 0.157s | 0.055s |
| Column loops                  | 0.168s | 0.060s |
| Concat three 10 MB dataframes | 0.063s | 0.016s |
| Groupby() and sum()           | 0.008s | 0.002s |
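
To make the ratios behind these numbers explicit, here is a small sketch that computes them from the table. The timings are copied from the table above; the `speedup` helper is just an illustrative name, not something from the benchmark code.

```python
# Timings in seconds, copied from the table: (pandas, polars).
timings = {
    "Import a 10 MB CSV file": (0.157, 0.055),
    "Column loops": (0.168, 0.060),
    "Concat three 10 MB dataframes": (0.063, 0.016),
    "Groupby() and sum()": (0.008, 0.002),
}


def speedup(pandas_s, polars_s):
    """How many times faster Polars was than Pandas for one task."""
    return pandas_s / polars_s


for task, (pandas_s, polars_s) in timings.items():
    share = polars_s / pandas_s  # fraction of the Pandas time that Polars needed
    print(f"{task}: {speedup(pandas_s, polars_s):.1f}x faster ({share:.0%} of the Pandas time)")
```

The speedups range from about 2.8x for the column loops to 4.0x for groupby and sum, so Polars needs between a quarter and a bit over a third of the Pandas time on this machine.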

Test Method

Environment

  • Apple Silicon M1 (2020, the cheapest one)
  • macOS 13
  • Jupyter Notebook in VS Code
  • Python==3.10.9
  • pandas==1.5.3
  • polars==0.16.6
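
If you want to compare your own setup against this list before rerunning the test, something like the following sketch should do. It only uses the standard library; `installed_version` is a helper name I made up for illustration.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Python", platform.python_version())
for package in ("pandas", "polars"):
    print(package, installed_version(package))
```

Version differences matter here: both libraries have changed a lot since these releases, so timings from newer versions may not match the table.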

Tasks

  1. Import a 10 MB CSV file with sep and encoding specified, which is a very common task
  2. Concatenate repeated DataFrames into one
  3. Simple statistical operations: groupby and sum
  4. Looped statistical operations: groupby and sum for each column name in turn
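
The four tasks above could be sketched roughly like this. It is a minimal illustration under assumptions, not the exact benchmark code (that lives in the repository linked in the references): the `path` and `key` arguments are placeholders for the CSV file and a column to group by, the function names are hypothetical, and taking the best of several runs is my assumption about how to time things. The Polars calls follow the 0.16.x API used in this test; newer Polars releases renamed `sep` to `separator` and `groupby` to `group_by`.

```python
import time


def time_task(task, repeats=5):
    """Return the best wall-clock time in seconds over `repeats` runs of `task`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        best = min(best, time.perf_counter() - start)
    return best


def pandas_tasks(path, key):
    """Build the four tasks for pandas 1.5.x as a dict of zero-argument callables."""
    import pandas as pd  # imported lazily so time_task needs only the standard library

    df = pd.read_csv(path, sep=",", encoding="utf-8")
    return {
        "import": lambda: pd.read_csv(path, sep=",", encoding="utf-8"),
        "concat": lambda: pd.concat([df, df, df], ignore_index=True),
        "groupby_sum": lambda: df.groupby(key).sum(numeric_only=True),
        "column_loop": lambda: [df.groupby(c).sum(numeric_only=True) for c in df.columns],
    }


def polars_tasks(path, key):
    """Build the same four tasks for polars 0.16.x."""
    import polars as pl  # imported lazily, same reason as above

    df = pl.read_csv(path, sep=",", encoding="utf8")
    return {
        "import": lambda: pl.read_csv(path, sep=",", encoding="utf8"),
        "concat": lambda: pl.concat([df, df, df]),
        "groupby_sum": lambda: df.groupby(key).sum(),
        "column_loop": lambda: [df.groupby(c).sum() for c in df.columns],
    }


# Example usage (placeholder file and column names):
# for name, task in pandas_tasks("data.csv", "key").items():
#     print("pandas", name, time_task(task))
# for name, task in polars_tasks("data.csv", "key").items():
#     print("polars", name, time_task(task))
```

Keeping each task as a zero-argument callable means the same `time_task` helper can time both libraries in exactly the same way.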

References

  1. Test dataset: https://www.kaggle.com/datasets/jessemostipak/hotel-booking-demand
  2. Test data and code: https://github.com/reycn/polars-pandas-bench
  3. Someone else's large-scale test: https://h2oai.github.io/db-benchmark/
  4. Polars open source repository: https://github.com/pola-rs/polars

© Rongxin 2021 - 2024