Managing pip Package Dependencies
Learn how to manage pip package dependencies in your flows.
Motivation
Your Python code may require pip package dependencies. How you manage these dependencies can affect the execution time of your flows.
If you install pip packages within beforeCommands, the packages will be downloaded and installed each time the task runs. This can significantly increase the duration of workflow executions. The following sections describe several ways to manage pip package dependencies efficiently in your flows.
Using a custom Docker image
Instead of using the base Python Docker image and installing dependencies through beforeCommands, you can create a custom Docker image that includes Python and all required pip packages. Since the dependencies are built into the image, they do not need to be downloaded and installed at runtime. This removes the installation overhead from each run, so task execution time is spent on your Python code rather than on package setup.
For example, if your Python script depends on pandas, you can use a container image that already includes it, such as ghcr.io/kestra-io/pydata:latest. This eliminates the need to install dependencies using beforeCommands:
id: docker_dependencies
namespace: company.team
tasks:
  - id: code
    type: io.kestra.plugin.scripts.python.Script
    taskRunner:
      type: io.kestra.plugin.scripts.runner.docker.Docker
    containerImage: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      df = pd.read_csv('https://huggingface.co/datasets/kestra/datasets/raw/main/csv/orders.csv')
      total_revenue = df['total'].sum()
Installing pip package dependencies at server startup
Another way to avoid installing dependencies during every execution is to preinstall them before starting the Kestra server. For a standalone Kestra server, you can run:
pip install requests pandas polars && ./kestra server standalone --worker-thread=16
If you are running Kestra with Docker, create a Dockerfile and install dependencies using the RUN command. Set the USER to root to allow package installation:
FROM kestra/kestra:latest
USER root
RUN pip install requests pandas polars
CMD ["server", "standalone"]
In your Docker Compose configuration, replace the image property of the kestra service with build: . so that Compose builds your custom Dockerfile instead of pulling the official image from Docker Hub. Also remove the command property, since the CMD instruction in your Dockerfile now handles it:
services:
  ...
  kestra:
    build: .
    ...
When you start Kestra using Docker Compose, the Python dependencies will already be included in the container.
With either installation method, you must run Python tasks using the Process Task Runner. It runs the code as a local process in the same environment as the Kestra server, so the code can access the packages installed there.
You can verify that the dependencies are installed with the following example:
id: list_dependencies
namespace: company.team
tasks:
  - id: check
    type: io.kestra.plugin.scripts.python.Commands
    taskRunner:
      type: io.kestra.plugin.core.runner.Process
    commands:
      - pip list
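As a complement to pip list, the check can also be done from Python itself, for example at the top of a script run with the Process Task Runner. The following is a minimal sketch, not part of Kestra: the helper name missing_packages is hypothetical, and the package list simply mirrors the pip install command shown earlier. It checks import names, which can differ from the names used with pip install for some packages.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be imported."""
    # find_spec returns None when no importable module with that name is found.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names taken from the pip install command above (assumed list).
required = ["requests", "pandas", "polars"]
missing = missing_packages(required)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the output lists any package, the task is likely not running in the environment where the packages were preinstalled, which usually points to the task runner configuration.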
Using cache files
In a WorkingDirectory task, you can create a virtual environment using the Process Task Runner, install all required pip dependencies, and cache the venv folder. This ensures the dependencies are reused in subsequent executions, eliminating the need for repeated installations. For more details, see the caching page.
The example below caches the venv folder for 24 hours (ttl: PT24H):
id: python_cached_dependencies
namespace: company.team
tasks:
  - id: working_dir
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: python_script
        type: io.kestra.plugin.scripts.python.Script
        taskRunner:
          type: io.kestra.plugin.core.runner.Process
        beforeCommands:
          - python -m venv venv
          - . venv/bin/activate
          - pip install pandas
        script: |
          import pandas as pd
          print(pd.__version__)
    cache:
      patterns:
        - venv/**
      ttl: PT24H
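To confirm that a script runs with the interpreter from the venv folder rather than the system Python, a small check can be added to the script body. This is a minimal sketch using only the standard library; the helper name in_virtualenv is hypothetical and not part of Kestra.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print(f"Interpreter: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtualenv()}")
```

If the check reports False, the packages installed into venv are probably not the ones being imported, and the activation step in beforeCommands is worth reviewing.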
By using one of these techniques, you can avoid reinstalling dependencies for each execution and reduce overall execution time.
