{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "daaf17e6",
   "metadata": {},
   "source": [
    "> Note: To use this notebook, you must first install the [Rust `excvr` jupyter kernel](https://github.com/google/evcxr/blob/main/evcxr_jupyter/README.md).  Also, note that `clone()` is used fairly often in the examples.  This is because we tend to create one dataset for multiple examples.  When this dataset is used, the rust ownership system will `move` that dataframe, which will make it unavailable to later examples.  By `clone`ing, we can keep using it over and over."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "86cbe61e-b4be-4103-9b5f-6a617d0836ec",
   "metadata": {},
   "outputs": [],
   "source": [
    ":dep polars = {version = \"0.23.2\", features = [\"lazy\", \"csv-file\", \"strings\", \"temporal\", \"dtype-duration\", \"dtype-categorical\", \"concat_str\", \"list\", \"list_eval\", \"rank\", \"lazy_regex\"]}\n",
    ":dep color-eyre = {version = \"0.6.2\"}\n",
    ":dep rand = {version = \"0.8.5\"}\n",
    ":dep reqwest = {version = \"0.11.11\", features = [\"blocking\"]}\n",
    "\n",
    "use color_eyre::{Result};\n",
    "use polars::prelude::*;"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26e5ebd3-2c57-49a2-8d64-789570c2e259",
   "metadata": {},
   "source": [
    "# Expressions\n",
    "\n",
    "`fn(Series) -> Series`\n",
    "\n",
    "* Lazily evaluated\n",
    "    * Can be optimized\n",
    "    * Gives the library writer context and informed decisions can be made\n",
    "* Embarrassingly parallel\n",
    "* Context dependent\n",
    "    * selection/projection -> `Series` = *COLUMN, LITERAL, or VALUE*\n",
    "    * aggregation -> `Series` = *GROUPS*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "792fcccb-7852-4bab-b307-0302b20efde6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 5)\n",
       "┌─────┬────────┬─────┬────────┬──────────┐\n",
       "│ A   ┆ fruits ┆ B   ┆ cars   ┆ optional │\n",
       "│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---      │\n",
       "│ i32 ┆ str    ┆ i32 ┆ str    ┆ i32      │\n",
       "╞═════╪════════╪═════╪════════╪══════════╡\n",
       "│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 28       │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 300      │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ null     │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 2        │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 5   ┆ banana ┆ 1   ┆ beetle ┆ -30      │\n",
       "└─────┴────────┴─────┴────────┴──────────┘"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "let df = df! [\n",
    "    \"A\"        => [1, 2, 3, 4, 5],\n",
    "    \"fruits\"   => [\"banana\", \"banana\", \"apple\", \"apple\", \"banana\"],\n",
    "    \"B\"        => [5, 4, 3, 2, 1],\n",
    "    \"cars\"     => [\"beetle\", \"audi\", \"beetle\", \"beetle\", \"beetle\"],\n",
    "    \"optional\" => [Some(28), Some(300), None, Some(2), Some(-30)],\n",
    "]?;\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c985c4a8-bead-43a1-8f5e-278bae129b77",
   "metadata": {},
   "source": [
    "# Selection context"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "bf82274b-a140-48f9-bfcf-43fd50c82e17",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 4)\n",
       "┌─────┬─────┬─────────┬────────┐\n",
       "│ A   ┆ B   ┆ literal ┆ fruits │\n",
       "│ --- ┆ --- ┆ ---     ┆ ---    │\n",
       "│ i32 ┆ i32 ┆ str     ┆ str    │\n",
       "╞═════╪═════╪═════════╪════════╡\n",
       "│ 1   ┆ 5   ┆ B       ┆ banana │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ 4   ┆ B       ┆ banana │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ 3   ┆ 3   ┆ B       ┆ apple  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ 4   ┆ 2   ┆ B       ┆ apple  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ 5   ┆ 1   ┆ B       ┆ banana │\n",
       "└─────┴─────┴─────────┴────────┘"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We can select by name\n",
    "// We'll be re-using the dataframe a bunch, so we'll clone copies as we go.\n",
    "df.clone().lazy().select([\n",
    "    col(\"A\"),\n",
    "    col(\"B\"),\n",
    "    lit(\"B\"),  // we must tell polars we mean the literal \"B\"\n",
    "    col(\"fruits\"),\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "04e8b82d-a8cd-4a32-bcdb-6acf5bbb5b11",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (1, 2)\n",
       "┌─────┬─────┐\n",
       "│ A   ┆ B   │\n",
       "│ --- ┆ --- │\n",
       "│ i32 ┆ i32 │\n",
       "╞═════╪═════╡\n",
       "│ 15  ┆ 15  │\n",
       "└─────┴─────┘"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// you can select columns with a regex if it starts with '^' and ends with '$'\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    col(\"^A|B$\").sum()\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "id": "a1497389-0415-4744-aeaf-0e57d067431b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (1, 2)\n",
       "┌─────┬─────┐\n",
       "│ A   ┆ B   │\n",
       "│ --- ┆ --- │\n",
       "│ i32 ┆ i32 │\n",
       "╞═════╪═════╡\n",
       "│ 15  ┆ 15  │\n",
       "└─────┴─────┘"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// you can select multiple columns by name\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    cols([\"A\", \"B\"]).sum()\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "e58ddb8e-8af2-467f-8254-9ffc44ff1472",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 10)\n",
       "┌─────┬────────┬─────┬────────┬─────┬────────────────┬───────────┬──────────────┬──────┐\n",
       "│ A   ┆ fruits ┆ B   ┆ cars   ┆ ... ┆ fruits_reverse ┆ B_reverse ┆ cars_reverse ┆ opti │\n",
       "│ --- ┆ ---    ┆ --- ┆ ---    ┆     ┆ ---            ┆ ---       ┆ ---          ┆ onal │\n",
       "│ i32 ┆ str    ┆ i32 ┆ str    ┆     ┆ str            ┆ i32       ┆ str          ┆ _rev │\n",
       "│     ┆        ┆     ┆        ┆     ┆                ┆           ┆              ┆ erse │\n",
       "│     ┆        ┆     ┆        ┆     ┆                ┆           ┆              ┆ ---  │\n",
       "│     ┆        ┆     ┆        ┆     ┆                ┆           ┆              ┆ i32  │\n",
       "╞═════╪════════╪═════╪════════╪═════╪════════════════╪═══════════╪══════════════╪══════╡\n",
       "│ 1   ┆ banana ┆ 5   ┆ beetle ┆ ... ┆ banana         ┆ 1         ┆ beetle       ┆ -30  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤\n",
       "│ 2   ┆ banana ┆ 4   ┆ audi   ┆ ... ┆ apple          ┆ 2         ┆ beetle       ┆ 2    │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤\n",
       "│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ ... ┆ apple          ┆ 3         ┆ beetle       ┆ null │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤\n",
       "│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ ... ┆ banana         ┆ 4         ┆ audi         ┆ 300  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤\n",
       "│ 5   ┆ banana ┆ 1   ┆ beetle ┆ ... ┆ banana         ┆ 5         ┆ beetle       ┆ 28   │\n",
       "└─────┴────────┴─────┴────────┴─────┴────────────────┴───────────┴──────────────┴──────┘"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We select everything in normal order\n",
    "// Then we select everything in reversed order\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    all(),\n",
    "    all().reverse().suffix(\"_reverse\")\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "e9fddfd3-c5cb-4ad8-880d-19103fcafaea",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 10)\n",
       "┌─────┬────────┬─────┬────────┬─────┬────────────┬───────┬──────────┬──────────────┐\n",
       "│ A   ┆ fruits ┆ B   ┆ cars   ┆ ... ┆ fruits_sum ┆ B_sum ┆ cars_sum ┆ optional_sum │\n",
       "│ --- ┆ ---    ┆ --- ┆ ---    ┆     ┆ ---        ┆ ---   ┆ ---      ┆ ---          │\n",
       "│ i32 ┆ str    ┆ i32 ┆ str    ┆     ┆ str        ┆ i32   ┆ str      ┆ i32          │\n",
       "╞═════╪════════╪═════╪════════╪═════╪════════════╪═══════╪══════════╪══════════════╡\n",
       "│ 1   ┆ banana ┆ 5   ┆ beetle ┆ ... ┆ null       ┆ 15    ┆ null     ┆ 300          │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ banana ┆ 4   ┆ audi   ┆ ... ┆ null       ┆ 15    ┆ null     ┆ 300          │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ ... ┆ null       ┆ 15    ┆ null     ┆ 300          │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ ... ┆ null       ┆ 15    ┆ null     ┆ 300          │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 5   ┆ banana ┆ 1   ┆ beetle ┆ ... ┆ null       ┆ 15    ┆ null     ┆ 300          │\n",
       "└─────┴────────┴─────┴────────┴─────┴────────────┴───────┴──────────┴──────────────┘"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// all expressions run in parallel\n",
    "// single valued `Series` are broadcasted to the shape of the `DataFrame`\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    all(),\n",
    "    all().sum().suffix(\"_sum\")\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "642b19e5-6a03-4932-95c4-863a34fe0a32",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 1)\n",
       "┌────────┐\n",
       "│ fruits │\n",
       "│ ---    │\n",
       "│ bool   │\n",
       "╞════════╡\n",
       "│ true   │\n",
       "├╌╌╌╌╌╌╌╌┤\n",
       "│ true   │\n",
       "├╌╌╌╌╌╌╌╌┤\n",
       "│ false  │\n",
       "├╌╌╌╌╌╌╌╌┤\n",
       "│ false  │\n",
       "├╌╌╌╌╌╌╌╌┤\n",
       "│ true   │\n",
       "└────────┘"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// there are `str` and `dt` namespaces for specialized functions\n",
    "\n",
    "let predicate = col(\"fruits\").str().contains(\"^b.*\");\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    predicate\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "b8f22042-3de8-4dae-af11-21b642fbe5a6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (3, 5)\n",
       "┌─────┬────────┬─────┬────────┬──────────┐\n",
       "│ A   ┆ fruits ┆ B   ┆ cars   ┆ optional │\n",
       "│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---      │\n",
       "│ i32 ┆ str    ┆ i32 ┆ str    ┆ i32      │\n",
       "╞═════╪════════╪═════╪════════╪══════════╡\n",
       "│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 28       │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 300      │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 5   ┆ banana ┆ 1   ┆ beetle ┆ -30      │\n",
       "└─────┴────────┴─────┴────────┴──────────┘"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// use the predicate to filter\n",
    "let predicate = col(\"fruits\").str().contains(\"^b.*\");\n",
    "df.clone().lazy().filter(predicate).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "92004db3-7aef-4da0-9607-a882f83bb754",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (1, 2)\n",
       "┌─────┬────────────────┐\n",
       "│ A   ┆ some_compute() │\n",
       "│ --- ┆ ---            │\n",
       "│ i32 ┆ i32            │\n",
       "╞═════╪════════════════╡\n",
       "│ 8   ┆ 165            │\n",
       "└─────┴────────────────┘"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// predicate expressions can be used to filter\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    col(\"A\").filter(col(\"fruits\").str().contains(\"^b.*\")).sum(),\n",
    "    (col(\"B\").filter(col(\"cars\").str().contains(\"^b.*\")).sum() * col(\"B\").sum()).alias(\"some_compute()\"),\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "7721728b-e05e-48fa-89d4-f9262344b766",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 1)\n",
       "┌──────────┐\n",
       "│ computed │\n",
       "│ ---      │\n",
       "│ f64      │\n",
       "╞══════════╡\n",
       "│ 0.002688 │\n",
       "├╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 0.004301 │\n",
       "├╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 0.004839 │\n",
       "├╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 0.004301 │\n",
       "├╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 0.002688 │\n",
       "└──────────┘"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We can do arithmetic on columns and (literal) values\n",
    "// can be evaluated to 1 without programmer knowing\n",
    "\n",
    "let some_var = 1;\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    ((col(\"A\") / lit(124.0) * col(\"B\")) / sum(\"B\") * lit(some_var)).alias(\"computed\")\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "73ddb6ff-a1b8-42c7-b3b5-65807f85a9c6",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 3)\n",
       "┌────────┬─────┬───────────────────┐\n",
       "│ fruits ┆ B   ┆ b when not banana │\n",
       "│ ---    ┆ --- ┆ ---               │\n",
       "│ str    ┆ i32 ┆ i32               │\n",
       "╞════════╪═════╪═══════════════════╡\n",
       "│ banana ┆ 5   ┆ -1                │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 4   ┆ -1                │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 3   ┆ -1                │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 2   ┆ -1                │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 1   ┆ -1                │\n",
       "└────────┴─────┴───────────────────┘"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We can combine columns by a predicate\n",
    "// This doesn't work.  It seems like the condition always evaluates to true\n",
    "df.clone().lazy().select([\n",
    "    col(\"fruits\"),\n",
    "    col(\"B\"),\n",
    "    when(col(\"fruits\") == lit(\"banana\")).then(col(\"B\")).otherwise(-1).alias(\"b when not banana\")\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "9ff6ded8-dcde-448b-a6f9-63d5adf73046",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 3)\n",
       "┌─────┬─────┬───────────┐\n",
       "│ A   ┆ B   ┆ fold      │\n",
       "│ --- ┆ --- ┆ ---       │\n",
       "│ i32 ┆ i32 ┆ str       │\n",
       "╞═════╪═════╪═══════════╡\n",
       "│ 1   ┆ 5   ┆ 1B25.00.5 │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ 4   ┆ 2B16.01.0 │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 3   ┆ 3   ┆ 3B9.01.5  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 4   ┆ 2   ┆ 4B4.02.0  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 5   ┆ 1   ┆ 5B1.02.5  │\n",
       "└─────┴─────┴───────────┘"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We can combine columns by a fold operation on column level\n",
    "\n",
    "df.clone().lazy().select([\n",
    "    col(\"A\"),\n",
    "    col(\"B\"),\n",
    "    fold_exprs(lit(0), |a, b| Ok(&a + &b), [\n",
    "        col(\"A\"),\n",
    "        lit(\"B\"),\n",
    "        col(\"B\").pow(lit(2)),\n",
    "        col(\"A\") / lit(2.0)\n",
    "    ]).alias(\"fold\")\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "id": "c21c7645-1b16-4372-87e5-c62261eaabdd",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 5)\n",
       "┌─────┬─────┬───────────┬───────────────────┬───────────────────┐\n",
       "│ idx ┆ A   ┆ A_shifted ┆ str_concat_1      ┆ str_concat_2      │\n",
       "│ --- ┆ --- ┆ ---       ┆ ---               ┆ ---               │\n",
       "│ i32 ┆ i32 ┆ i32       ┆ str               ┆ str               │\n",
       "╞═════╪═════╪═══════════╪═══════════════════╪═══════════════════╡\n",
       "│ 0   ┆ 1   ┆ null      ┆ 1banana5beetle28  ┆ 1banana5beetle28  │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 1   ┆ 2   ┆ 1         ┆ 2banana4audi300   ┆ 2banana4audi300   │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 2   ┆ 3   ┆ 2         ┆ null              ┆ null              │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 3   ┆ 4   ┆ 3         ┆ 4apple2beetle2    ┆ 4apple2beetle2    │\n",
       "├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ 4   ┆ 5   ┆ 4         ┆ 5banana1beetle-30 ┆ 5banana1beetle-30 │\n",
       "└─────┴─────┴───────────┴───────────────────┴───────────────────┘"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// even combine all\n",
    "use std::convert::TryInto;\n",
    "let height: i32 = df.height().try_into()?;\n",
    "df.clone().lazy().select([\n",
    "    range(0i32, height).alias(\"idx\"),\n",
    "    col(\"A\"),\n",
    "    col(\"A\").shift(1).alias(\"A_shifted\"),\n",
    "    concat_str([all()], \"\").alias(\"str_concat_1\"),  // prefer this\n",
    "    fold_exprs(col(\"A\"), |a, b| Ok(a + b), [all().exclude([\"A\"])]).alias(\"str_concat_2\"), // over this (accidentally O(n^2))\n",
    "]).collect()?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "294afe26-5a2d-4046-8ec2-6c75d4007889",
   "metadata": {},
   "source": [
    "# Aggregation context\n",
    "\n",
    "* expressions are applied over groups instead of columns\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "3d6a30eb-65c0-4290-98ec-e6b9594a887b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (2, 6)\n",
       "┌────────┬───────┬────────┬──────────────┬───────┬──────────────────────────────┐\n",
       "│ fruits ┆ B_sum ┆ B_sum2 ┆ fruits_first ┆ count ┆ cars                         │\n",
       "│ ---    ┆ ---   ┆ ---    ┆ ---          ┆ ---   ┆ ---                          │\n",
       "│ str    ┆ i32   ┆ i32    ┆ str          ┆ u32   ┆ list[str]                    │\n",
       "╞════════╪═══════╪════════╪══════════════╪═══════╪══════════════════════════════╡\n",
       "│ apple  ┆ 5     ┆ 5      ┆ apple        ┆ 2     ┆ [\"beetle\", \"beetle\"]         │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ 10     ┆ banana       ┆ 3     ┆ [\"beetle\", \"beetle\", \"audi\"] │\n",
       "└────────┴───────┴────────┴──────────────┴───────┴──────────────────────────────┘"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// we can still combine many expressions\n",
    "\n",
    "df.clone().lazy().sort(\"cars\", SortOptions::default()).groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").sum().alias(\"B_sum\"),\n",
    "        sum(\"B\").alias(\"B_sum2\"),  // syntactic sugar for the first\n",
    "        col(\"fruits\").first().alias(\"fruits_first\"),\n",
    "        col(\"A\").count().alias(\"count\"),\n",
    "        col(\"cars\").reverse()\n",
    "    ]).collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "id": "93d2efc6-23a3-45c6-8c0b-28c5984c8f65",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 6)\n",
       "┌────────┬───────┬────────┬──────────────┬───────┬────────┐\n",
       "│ fruits ┆ B_sum ┆ B_sum2 ┆ fruits_first ┆ count ┆ cars   │\n",
       "│ ---    ┆ ---   ┆ ---    ┆ ---          ┆ ---   ┆ ---    │\n",
       "│ str    ┆ i32   ┆ i32    ┆ str          ┆ u32   ┆ str    │\n",
       "╞════════╪═══════╪════════╪══════════════╪═══════╪════════╡\n",
       "│ banana ┆ 10    ┆ 10     ┆ banana       ┆ 3     ┆ beetle │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ 10     ┆ banana       ┆ 3     ┆ beetle │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ 10     ┆ banana       ┆ 3     ┆ audi   │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 5     ┆ 5      ┆ apple        ┆ 2     ┆ beetle │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 5     ┆ 5      ┆ apple        ┆ 2     ┆ beetle │\n",
       "└────────┴───────┴────────┴──────────────┴───────┴────────┘"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// We can explode the list column \"cars\"\n",
    "\n",
    "df.clone().lazy()\n",
    "    .sort(\"cars\", SortOptions { descending: false, nulls_last: false })\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").sum().alias(\"B_sum\"),\n",
    "        sum(\"B\").alias(\"B_sum2\"),  // syntactic sugar for the first\n",
    "        col(\"fruits\").first().alias(\"fruits_first\"),\n",
    "        col(\"A\").count().alias(\"count\"),\n",
    "        col(\"cars\").reverse()\n",
    "    ])\n",
    "    .explode([\"cars\"])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "id": "9675a2c7-49a1-45c0-80a6-018505e70701",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 5)\n",
       "┌────────┬───────┬──────────────┬───────┬───────────┐\n",
       "│ fruits ┆ B_sum ┆ fruits_first ┆ count ┆ B_shifted │\n",
       "│ ---    ┆ ---   ┆ ---          ┆ ---   ┆ ---       │\n",
       "│ str    ┆ i32   ┆ str          ┆ u32   ┆ i32       │\n",
       "╞════════╪═══════╪══════════════╪═══════╪═══════════╡\n",
       "│ apple  ┆ 5     ┆ apple        ┆ 2     ┆ null      │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 5     ┆ apple        ┆ 2     ┆ 3         │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ banana       ┆ 3     ┆ null      │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ banana       ┆ 3     ┆ 5         │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 10    ┆ banana       ┆ 3     ┆ 4         │\n",
       "└────────┴───────┴──────────────┴───────┴───────────┘"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.clone().lazy()\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").sum().alias(\"B_sum\"),\n",
    "        col(\"fruits\").first().alias(\"fruits_first\"),\n",
    "        count(),\n",
    "        col(\"B\").shift(1).alias(\"B_shifted\")\n",
    "    ])\n",
    "    .explode([\"B_shifted\"])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "5132aee4-8f71-4824-a7c2-e30a85cf5581",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (2, 3)\n",
       "┌────────┬──────────────┬───────────┐\n",
       "│ fruits ┆ shift_B      ┆ rev_B     │\n",
       "│ ---    ┆ ---          ┆ ---       │\n",
       "│ str    ┆ list[i32]    ┆ list[i32] │\n",
       "╞════════╪══════════════╪═══════════╡\n",
       "│ banana ┆ [null, 5, 4] ┆ [1, 4, 5] │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ [null, 3]    ┆ [2, 3]    │\n",
       "└────────┴──────────────┴───────────┘"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// we can also get the list of the groups\n",
    "\n",
    "df.clone().lazy()\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").shift(1).alias(\"shift_B\"),\n",
    "        col(\"B\").reverse().alias(\"rev_B\"),\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "id": "a04b9963-4706-4836-a443-658e40ef21a5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (2, 2)\n",
       "┌────────┬───────────┐\n",
       "│ fruits ┆ B         │\n",
       "│ ---    ┆ ---       │\n",
       "│ str    ┆ list[i32] │\n",
       "╞════════╪═══════════╡\n",
       "│ apple  ┆ [3, 2]    │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ [5, 4]    │\n",
       "└────────┴───────────┘"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// we can do predicates in the groupby as well\n",
    "\n",
    "df.clone().lazy()\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").filter(col(\"B\").gt(lit(1))).list().keep_name(),\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "id": "5ffd45b8-aa03-42d8-aa40-767d25a83b07",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (2, 2)\n",
       "┌────────┬─────┐\n",
       "│ fruits ┆ B   │\n",
       "│ ---    ┆ --- │\n",
       "│ str    ┆ f64 │\n",
       "╞════════╪═════╡\n",
       "│ apple  ┆ 2.5 │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤\n",
       "│ banana ┆ 4.5 │\n",
       "└────────┴─────┘"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// and sum only by the values where the predicates are true\n",
    "\n",
    "df.clone().lazy()\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").filter(col(\"B\").gt(lit(1))).mean(),\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "id": "7be46430-c86f-4295-9d6c-7352a65d8b1b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (2, 3)\n",
       "┌────────┬───────────┬─────────────┐\n",
       "│ fruits ┆ shifted   ┆ shifted_sum │\n",
       "│ ---    ┆ ---       ┆ ---         │\n",
       "│ str    ┆ list[i32] ┆ i32         │\n",
       "╞════════╪═══════════╪═════════════╡\n",
       "│ apple  ┆ [0, 3]    ┆ 3           │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ [0, 5, 4] ┆ 9           │\n",
       "└────────┴───────────┴─────────────┘"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// Another example\n",
    "\n",
    "df.clone().lazy()\n",
    "    .groupby([\"fruits\"])\n",
    "    .agg([\n",
    "        col(\"B\").shift_and_fill(1, 0).alias(\"shifted\"),\n",
    "        col(\"B\").shift_and_fill(1, 0).sum().alias(\"shifted_sum\"),\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4a6bfc86-a35e-4bc4-b851-624535b3a455",
   "metadata": {},
   "source": [
    "# Window functions!\n",
    "\n",
    "* Expression with superpowers.\n",
    "* Aggregation in selection context\n",
    "\n",
    "```rust\n",
    "col(\"foo\").aggregation_expression(..).over(\"column_used_to_group\")\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "id": "b5d0eef9-2f45-45d3-9ed7-b7262a00823c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 5)\n",
       "┌────────┬────────┬─────┬─────────────────┬───────────────┐\n",
       "│ fruits ┆ cars   ┆ B   ┆ B_sum_by_fruits ┆ B_sum_by_cars │\n",
       "│ ---    ┆ ---    ┆ --- ┆ ---             ┆ ---           │\n",
       "│ str    ┆ str    ┆ i32 ┆ i32             ┆ i32           │\n",
       "╞════════╪════════╪═════╪═════════════════╪═══════════════╡\n",
       "│ banana ┆ beetle ┆ 5   ┆ 10              ┆ 11            │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ audi   ┆ 4   ┆ 10              ┆ 4             │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ beetle ┆ 3   ┆ 5               ┆ 11            │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ beetle ┆ 2   ┆ 5               ┆ 11            │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ beetle ┆ 1   ┆ 10              ┆ 11            │\n",
       "└────────┴────────┴─────┴─────────────────┴───────────────┘"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// groupby 2 different columns\n",
    "\n",
    "df.clone().lazy()\n",
    "    .select([\n",
    "        col(\"fruits\"),\n",
    "        col(\"cars\"),\n",
    "        col(\"B\"),\n",
    "        col(\"B\").sum().over([\"fruits\"]).alias(\"B_sum_by_fruits\"),\n",
    "        col(\"B\").sum().over([\"cars\"]).alias(\"B_sum_by_cars\"),\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "id": "e7644393-fefb-45e1-ac85-931ae466dc91",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 3)\n",
       "┌────────┬─────┬──────────────────────┐\n",
       "│ fruits ┆ B   ┆ B_reversed_by_fruits │\n",
       "│ ---    ┆ --- ┆ ---                  │\n",
       "│ str    ┆ i32 ┆ i32                  │\n",
       "╞════════╪═════╪══════════════════════╡\n",
       "│ banana ┆ 5   ┆ 1                    │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 4   ┆ 4                    │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 3   ┆ 2                    │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 2   ┆ 3                    │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 1   ┆ 5                    │\n",
       "└────────┴─────┴──────────────────────┘"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// reverse B by groups and show the results in original DF\n",
    "\n",
    "df.clone().lazy()\n",
    "    .select([\n",
    "        col(\"fruits\"),\n",
    "        col(\"B\"),\n",
    "        col(\"B\").reverse().over([\"fruits\"]).alias(\"B_reversed_by_fruits\")\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "id": "fb71e41e-5e48-4237-a6f9-280d11efcfee",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "shape: (5, 3)\n",
       "┌────────┬─────┬─────────────────┐\n",
       "│ fruits ┆ B   ┆ lag_B_by_fruits │\n",
       "│ ---    ┆ --- ┆ ---             │\n",
       "│ str    ┆ i32 ┆ i32             │\n",
       "╞════════╪═════╪═════════════════╡\n",
       "│ banana ┆ 5   ┆ null            │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 4   ┆ 5               │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 3   ┆ null            │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ apple  ┆ 2   ┆ 3               │\n",
       "├╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤\n",
       "│ banana ┆ 1   ┆ 4               │\n",
       "└────────┴─────┴─────────────────┘"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "// Lag a column within \"fruits\"\n",
    "\n",
    "df.clone().lazy()\n",
    "    .select([\n",
    "        col(\"fruits\"),\n",
    "        col(\"B\"),\n",
    "        col(\"B\").shift(1).over([\"fruits\"]).alias(\"lag_B_by_fruits\")\n",
    "    ])\n",
    "    .collect()?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a4511749-24c9-41ba-97a0-5be282282585",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Rust",
   "language": "rust",
   "name": "rust"
  },
  "language_info": {
   "codemirror_mode": "rust",
   "file_extension": ".rs",
   "mimetype": "text/rust",
   "name": "rust",
   "pygment_lexer": "rust",
   "version": ""
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
