Virtual environments

Introduction

From the official Python documentation [Foundation, 2021]:

“Python applications will often use packages and modules that don’t come as part of the standard library. Applications will sometimes need a specific version of a library, because the application may require that a particular bug has been fixed or the application may be written using an obsolete version of the library’s interface.

This means it may not be possible for one Python installation to meet the requirements of every application. If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run.

The solution for this problem is to create a virtual environment, a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages. Different applications can then use different virtual environments. To resolve the earlier example of conflicting requirements, application A can have its own virtual environment with version 1.0 installed while application B has another virtual environment with version 2.0. If application B requires a library be upgraded to version 3.0, this will not affect application A’s environment.”

Throughout this book we will use a library called virtualenv to create virtual environments. It is assumed that Python and pip are already installed on the machine. To install virtualenv on Ubuntu, use the command:

pip install virtualenv==20.10.0

Official page of the project: https://virtualenv.pypa.io/en/latest/.

Global Python Interpreter

By default, the Python interpreter (or just Python) is installed in the directory:

/usr/local/bin

or

/usr/bin

For example, on my machine the full path to the Python interpreter is:

/usr/bin/python3.8

Every time the machine I am working on runs a Python script, it uses that interpreter. Note that the version of that interpreter is 3.8.
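To check which interpreter actually runs a piece of code, we can ask Python itself. The snippet below is a minimal sketch that uses only the standard sys module; the variable names are chosen for illustration.

```python
import sys

# Absolute path of the interpreter that is executing this code,
# for example /usr/bin/python3.8 for a system-wide installation.
interpreter_path = sys.executable

# Version of that interpreter as a "major.minor" string, for example "3.8".
interpreter_version = f"{sys.version_info.major}.{sys.version_info.minor}"

print(f"Interpreter: {interpreter_path}")
print(f"Version:     {interpreter_version}")
```

The printed path and version depend on the machine and on how the code was started.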

By default, the libraries are stored in:

/usr/lib/python3.8
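The library locations can also be read from inside Python. This sketch uses the standard sysconfig and sys modules; note that on some distributions (Debian and Ubuntu among them) packages installed with pip may end up in a directory that differs from the one reported here, so treat the result as a starting point rather than a definitive answer.

```python
import sys
import sysconfig

paths = sysconfig.get_paths()

# Directory that holds the standard library of the running interpreter.
stdlib_dir = paths["stdlib"]

# Directory where pure-Python third-party packages are installed.
site_packages_dir = paths["purelib"]

print("Standard library:", stdlib_dir)
print("Third-party packages:", site_packages_dir)

# sys.path is the ordered list of directories searched on every import.
for entry in sys.path:
    print("Search path entry:", entry)
```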

To get the full list of installed libraries use the command:

pip freeze

For the default interpreter, the command outputs a lot of installed packages:

...
pyarrow==3.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycairo==1.16.2
pycparser==2.20
pycups==1.9.73
pydata-google-auth==1.2.0
pydot==1.4.1
Pygments==2.9.0
PyGObject==3.36.0
PyJWT==1.7.1
pymacaroons==0.13.0
PyNaCl==1.3.0
pyOpenSSL==19.1.0
pyparsing==2.4.7
pyRFC3339==1.1
...

The number after == is the installed version of the package.
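A similar name==version listing can be produced from Python with the standard importlib.metadata module (available from Python 3.8). This is only a sketch of the idea, not a reimplementation of pip freeze; the function name installed_packages is made up for this example.

```python
from importlib import metadata  # part of the standard library since Python 3.8


def installed_packages():
    """Return sorted 'name==version' strings for the current interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip installations with incomplete metadata
            pins.add(f"{name}=={version}")
    return sorted(pins, key=str.lower)


for pin in installed_packages():
    print(pin)
```

The list reflects whichever interpreter runs the snippet, so it will differ between the global installation and a virtual environment.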

Ramen ratings dataset

The dataset is taken from https://www.kaggle.com/residentmario/ramen-ratings. It contains reviews of ramen products from around the world together with their ratings. The head of the data:

!head ramen-ratings/ramen-ratings.csv
Review #,Brand,Variety,Style,Country,Stars,Top Ten
2580,New Touch,T's Restaurant Tantanmen ,Cup,Japan,3.75,
2579,Just Way,Noodles Spicy Hot Sesame Spicy Hot Sesame Guan-miao Noodles,Pack,Taiwan,1,
2578,Nissin,Cup Noodles Chicken Vegetable,Cup,USA,2.25,
2577,Wei Lih,GGE Ramen Snack Tomato Flavor,Pack,Taiwan,2.75,
2576,Ching's Secret,Singapore Curry,Pack,India,3.75,
2575,Samyang Foods,Kimchi song Song Ramen,Pack,South Korea,4.75,
2574,Acecook,Spice Deli Tantan Men With Cilantro,Cup,Japan,4,
2573,Ikeda Shoku,Nabeyaki Kitsune Udon,Tray,Japan,3.75,
2572,Ripe'n'Dry,Hokkaido Soy Sauce Ramen,Pack,Japan,0.25,

The objective of the get_best_ramen.py script is to aggregate all the reviews around the world and create a ranking of the best ramen brands, regardless of Country, Style and Variety.

The project ramen-ratings/ has two files: the data file and the Python script.

├── get_best_ramen.py
└── ramen-ratings.csv
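The source of get_best_ramen.py is not reproduced here. To give an idea of the kind of aggregation involved, the sketch below ranks brands by their mean Stars value using only the standard library, whereas the actual script relies on pandas and numpy. The function name rank_brands is hypothetical, the sample rows are copied from the head of the data shown earlier, and skipping non-numeric ratings is a defensive assumption.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# A few rows copied from the head of ramen-ratings.csv.
SAMPLE_CSV = """Review #,Brand,Variety,Style,Country,Stars,Top Ten
2580,New Touch,T's Restaurant Tantanmen ,Cup,Japan,3.75,
2578,Nissin,Cup Noodles Chicken Vegetable,Cup,USA,2.25,
2574,Acecook,Spice Deli Tantan Men With Cilantro,Cup,Japan,4,
2572,Ripe'n'Dry,Hokkaido Soy Sauce Ramen,Pack,Japan,0.25,
"""


def rank_brands(csv_file):
    """Return (brand, mean rating) pairs, best brand first."""
    stars_by_brand = defaultdict(list)
    for row in csv.DictReader(csv_file):
        try:
            stars_by_brand[row["Brand"]].append(float(row["Stars"]))
        except ValueError:
            continue  # assumption: rows without a numeric rating are ignored
    ranking = [(brand, mean(stars)) for brand, stars in stars_by_brand.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


for brand, rating in rank_brands(io.StringIO(SAMPLE_CSV)):
    print(f"{brand:<12} {rating:.2f}")
```

To run the same function on the full file, pass an open file object instead of the in-memory sample, for example open("ramen-ratings.csv", newline="").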

Using the Global Interpreter

The command

python3 get_best_ramen.py 

will:

  1. Search for the default Python interpreter, which is /usr/bin/python3.8

  2. When loading all the libraries for the script (pandas, os and numpy), the command will use the default libraries in /usr/lib/python3.8/

  3. Run the script (compile it to bytecode, execute that bytecode with the interpreter and print the output in human readable form)
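In CPython, the reference implementation of Python, running a script means compiling the source text to bytecode and then executing that bytecode. The built-in functions compile and exec make these two steps visible. The snippet is a minimal sketch; the source string is made up for illustration.

```python
# A tiny "script" held in a string instead of a .py file.
source = "result = sum(range(5))\nprint('result =', result)"

# Step 1: translate the source text into a code object that holds bytecode.
code_object = compile(source, "<example>", "exec")

# Step 2: let the interpreter execute that bytecode in a fresh namespace.
namespace = {}
exec(code_object, namespace)  # prints "result = 10"
```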

!python3 ramen-ratings/get_best_ramen.py
Pandas version: 1.3.4
Numpy version: 1.21.3
Top 10 best ramen shops:
       Country                 Brand  Rating
0        Japan               Torishi     5.0
1     Malaysia                 Daddy     5.0
2        Japan              Takamori     5.0
3          USA  Komforte Chockolates     5.0
4        India             Patanjali     5.0
5  South Korea               Foodmon     5.0
6  South Korea            ChoripDong     5.0
7    Singapore           Prima Taste     5.0
8    Singapore                 Prima     5.0
9    Singapore               MyKuali     5.0

Top 10 worst ramen shops:
     Country            Brand  Rating
407   Taiwan        Nan Hsing    0.25
408    Japan           Mitoku    0.25
409      USA   Crystal Noodle    0.25
410      USA           Roland    0.00
411      USA       US Canning    0.00
412  Vietnam    Uni-President    0.00
413      USA  Dr. McDougall's    0.00
414   Taiwan            Tiger    0.00
415   Taiwan   Hsin Tung Yang    0.00
416      USA    One Dish Asia    0.00

As we can see, the script printed the 10 best and the 10 worst ramen brands.

Using virtualenv

Imagine that someone else wants to use the script get_best_ramen.py on their machine. If any package is missing (mainly pandas or numpy) or their Python version is too old, the script will not work.

This is a classic situation of “I don’t know, it worked on my machine”. In order to have a robust collaboration, we need to freeze both the Python interpreter and the packages it is using. That is the main use case of virtualenv!

To create an empty environment (with no extra packages) named “ramen_env” that uses Python 3.8, use the command:

virtualenv --python 3.8 ramen_env

The command will create a new directory in the ramen project:

├── get_best_ramen.py
├── ramen_env
│   ├── bin
│   ├── lib
│   └── pyvenv.cfg
└── ramen-ratings.csv

The two most important directories are ramen_env/bin, which holds the interpreter, and ramen_env/lib, which holds the libraries of the virtual environment.
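The third entry, pyvenv.cfg, is a small text file of key = value lines that describes the environment. The sketch below parses such a file. The sample content is illustrative only: the exact keys and values depend on the virtualenv version and on the machine, and the function name parse_pyvenv_cfg is made up for this example.

```python
# Illustrative content only; a real pyvenv.cfg may contain other keys and values.
SAMPLE_PYVENV_CFG = """home = /usr/bin
implementation = CPython
version_info = 3.8.10.final.0
include-system-site-packages = false
"""


def parse_pyvenv_cfg(text):
    """Parse 'key = value' lines into a dictionary."""
    settings = {}
    for line in text.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # ignore lines that contain no '=' sign
            settings[key.strip()] = value.strip()
    return settings


settings = parse_pyvenv_cfg(SAMPLE_PYVENV_CFG)
print("Base interpreter directory:", settings["home"])
print("Python version:", settings["version_info"])
```

For a real environment, the text can be read with pathlib.Path("ramen_env/pyvenv.cfg").read_text().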

To activate the environment use the command:

source ramen_env/bin/activate
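To confirm from Python that an environment is in use, we can compare sys.prefix with sys.base_prefix. This check works for environments created by virtualenv 20 or later and by the standard venv module; older virtualenv releases behaved differently, so take it as a sketch under that assumption. The function name is made up for this example.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment sys.prefix points at the environment directory,
    # while sys.base_prefix points at the installation it was created from.
    return sys.prefix != sys.base_prefix


print("Environment root:", sys.prefix)
print("Inside a virtual environment:", in_virtual_environment())
```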

The pip freeze command will now show no installed packages. It is strongly encouraged to keep a file that tracks the package versions; a common choice is requirements.txt. To install all the libraries listed in that file, use the command (be sure to activate the environment first!):

pip install -r requirements.txt

Now the command pip freeze will output 5 packages (some of them are installed automatically as dependencies of the requested packages):

numpy==1.21.4
pandas==1.3.0
python-dateutil==2.8.2
pytz==2021.3
six==1.16.0
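Pinned versions can also be compared with what is actually installed. The sketch below parses name==version lines and reports every pin that the current interpreter does not satisfy. The content of requirements.txt is not shown in this chapter, so the two pins used here are an assumption based on the pip freeze output above; parse_pins and check_pins are hypothetical helper names.

```python
from importlib import metadata  # part of the standard library since Python 3.8


def parse_pins(lines):
    """Turn 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Yield (name, wanted, installed) for every pin that is not satisfied."""
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            yield name, wanted, installed


# Assumed requirements.txt content, based on the pip freeze output.
requirements = ["numpy==1.21.4", "pandas==1.3.0"]
for name, wanted, installed in check_pins(parse_pins(requirements)):
    print(f"{name}: wanted {wanted}, found {installed}")
```

If nothing is printed, every pinned package is installed in exactly the requested version.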

Now to use the Python interpreter created by virtualenv and the specific packages of that environment, we can use the command:

<path-to-ramen-project>/ramen_env/bin/python <path-to-ramen-project>/get_best_ramen.py
!ramen-ratings/ramen_env/bin/python ramen-ratings/get_best_ramen.py
Pandas version: 1.3.0
Numpy version: 1.21.4
Top 10 best ramen shops:
       Country                 Brand  Rating
0        Japan               Torishi     5.0
1     Malaysia                 Daddy     5.0
2        Japan              Takamori     5.0
3          USA  Komforte Chockolates     5.0
4        India             Patanjali     5.0
5  South Korea               Foodmon     5.0
6  South Korea            ChoripDong     5.0
7    Singapore           Prima Taste     5.0
8    Singapore                 Prima     5.0
9    Singapore               MyKuali     5.0

Top 10 worst ramen shops:
     Country            Brand  Rating
407   Taiwan        Nan Hsing    0.25
408    Japan           Mitoku    0.25
409      USA   Crystal Noodle    0.25
410      USA           Roland    0.00
411      USA       US Canning    0.00
412  Vietnam    Uni-President    0.00
413      USA  Dr. McDougall's    0.00
414   Taiwan            Tiger    0.00
415   Taiwan   Hsin Tung Yang    0.00
416      USA    One Dish Asia    0.00

As we can see, the main difference is the printed package versions: the default Python installation uses Pandas 1.3.4 (versus 1.3.0 in the virtual environment) and Numpy 1.21.3 (versus 1.21.4).
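Which interpreter sits behind a given path can also be checked programmatically. The sketch below starts an interpreter as a child process and asks it for its own path and version. Here sys.executable stands in for a path such as ramen_env/bin/python so that the example runs anywhere; the function name report_interpreter is made up for this example.

```python
import subprocess
import sys


def report_interpreter(interpreter):
    """Ask the given interpreter for its own path and version."""
    snippet = "import sys; print(sys.executable); print(sys.version.split()[0])"
    completed = subprocess.run(
        [interpreter, "-c", snippet],
        capture_output=True,
        text=True,
        check=True,  # raise an error if the interpreter cannot be run
    )
    return completed.stdout.splitlines()


# Replace sys.executable with the path to an environment's interpreter
# to inspect that environment instead of the current one.
path, version = report_interpreter(sys.executable)
print("Interpreter:", path)
print("Version:", version)
```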

Summary

The term virtual environment sounds very fancy, but in reality it is just the location of a Python interpreter, a reference to its version and the path to its libraries.

To create, activate and populate the virtual environment with packages use the bash template:

virtualenv --python <python version> <name of environment>
source <name of environment>/bin/activate
pip install -r requirements.txt

To use the environment:

python <python script name>.py # If the environment is activated
<name of environment>/bin/python <python script name>.py # If the environment is NOT activated