Why Jupyter?¶
It's fun to use¶
I find alternative IDEs like Spyder and RStudio messy and unattractive. General-purpose IDEs like VS Code are missing features that scientists and data analysts might want. Jupyter has all the features you could want for exploratory programming and data analysis, is highly customizable, and isn't an eyesore to look at.
One of my favorite aspects of Jupyter is that it respects your screen space by minimizing clutter and letting you easily customize what is and isn't shown. This means your active work environment can occupy your entire screen rather than a third of it, as in Spyder or RStudio.
Widely used¶
Python is the most commonly used programming language for data science jobs, and Jupyter is one of its most popular IDEs. Jupyter is also commonly used on cloud computing platforms. Google Colab, for example, is built on Jupyter and integrates fully with it.
Multifunctionality¶
- Notebooks
- Scripts
- Markdown files with LaTeX and HTML integration
- Multiple languages
Integration with environments¶
Jupyter makes it easy to work with conda environments and encourages good habits for maintaining reproducible code.
Why not Jupyter?¶
- You don't like notebooks.
- You don't use Python. If you're exclusively an R user, stick with RStudio. It's more reliable and easier to get running for R.
- You're not doing data analysis. Jupyter is first and foremost for scientific computing. If you're doing something like software development, PyCharm or VS Code are probably better options.
Setting up Jupyter¶
Step 1: Install Conda (or Mamba)¶
Anaconda is the most popular way to interface with JupyterLab because of its easy-to-manage environments. This goes a long way toward encouraging best practices and will save you massive headaches in the long run.
However, the basic Anaconda distribution comes with a lot of software you will probably never use and, in my experience, it routinely fails to install new packages. I recommend installing Mamba, a lightweight and particularly fast reimplementation of the conda package manager. You get all of the features you would get with conda, but with significantly greater speed and reliability. If you already have a version of Anaconda on your computer, I recommend removing it with anaconda-clean and starting with a fresh install of Mamba.
Step 2: Set up an environment¶
Virtual environments create contained computing environments that can run independently and won’t interfere with anything else on your computer. They also make your work easier to reproduce. They are not required to run Jupyter, but you’ll regret it down the road if you don’t use them.
I like to have a “sandbox” virtual environment for things like small projects where reproducibility isn't a concern. It has most of what I need, it takes no effort to maintain, and if it ever breaks I can just delete it and make a new one. When I work on a larger project, I create a separate environment specific to that project. For now, let’s create a sandbox environment. In your terminal type:
This will create an environment named sandbox, with Python 3.11 and the listed packages. You can install additional packages later, but it's generally best practice to install them when you create the environment if you know what you need. This reduces the need to reconcile incompatible package versions in the future.
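As a rough illustration of the kind of command described above, the sketch below assembles a `mamba create` command and prints it so you can paste it into a terminal. The package list here is an assumption chosen for illustration, not a list taken from this guide, and `build_create_command` is a hypothetical helper; swap in whatever packages you actually need.

```python
import shlex

# Assumed values for illustration only: the package list is hypothetical.
ENV_NAME = "sandbox"
PACKAGES = ["python=3.11", "jupyterlab", "nb_conda_kernels", "numpy", "pandas", "matplotlib"]


def build_create_command(env_name, packages, tool="mamba"):
    """Return the argument list for creating a conda-style environment."""
    return [tool, "create", "--name", env_name, *packages]


command = build_create_command(ENV_NAME, PACKAGES)

# Print a shell-safe version of the command to copy into a terminal.
print(shlex.join(command))
```

Passing `tool="conda"` produces the equivalent conda command, since Mamba accepts the same `create` syntax.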
Step 3: Launching JupyterLab¶
If you have a full Anaconda installation you can launch Jupyter through the Anaconda GUI. However, I recommend launching it through the command line. This allows you to see exactly what the notebook is doing, which can be helpful when working with long processes. You will also use the command line to install packages and manage your environments anyway. To launch JupyterLab, activate your environment and then simply type 'jupyter lab'.
JupyterLab runs in your browser, by default at localhost:8888. If you accidentally close the tab running Jupyter, your instance keeps running and you can get back to it by typing localhost:8888 into the address bar.
Note that you can launch multiple instances of an environment and of JupyterLab.
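When several instances are running, later ones typically end up on a different port than 8888 (often 8889 and up), so it can be handy to check which ports are occupied. The following is a minimal sketch using only the standard library; `is_port_in_use` is a hypothetical helper, and it only tells you that something is listening on the port, not that it is Jupyter.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# Check the default JupyterLab port and the next one up.
for port in (8888, 8889):
    status = "in use" if is_port_in_use(port) else "free"
    print(f"localhost:{port} is {status}")
```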
JupyterLab Features¶
Themes¶
JupyterLab offers light and dark themes by default. You can also install more themes as extensions.
Navigating Files¶
The file navigation pane on the left is fully operational and is handy for looking up file paths without having to pull up another window. This is one of those small details that saves a lot of tedium.
JupyterLab is also equipped with a lightning-fast CSV viewer for when you need to inspect the data directly.
Plugins and Extensions¶
JupyterLab is highly customizable with plugins and extensions. I generally recommend going light on plugins, though, because they are often just one more thing that can break. However, I do recommend taking the time to set up some form of AI integration. I have two recommendations:
Jupyter-ai-core is the official AI extension for JupyterLab. It integrates directly into the IDE and can interface with your notebook. You will need an API key to set up this extension. You can install it from the plugins tab or with pip.
You can also use a Chrome extension called "Chat GPT - Jupyter - AI Assistant". This provides an interface with the free ChatGPT web app. It's free, but much more limited in its functionality.
The Launcher¶
The launcher is where you open one of JupyterLab's many tools. If you've installed nb_conda_kernels you'll be able to select the environment you want to use. Almost all of what you do in JupyterLab will be in one of three file types:
- Markdown Files
- Python (or R) scripts
- Notebooks
Markdown files can be rendered in real time so you can preview how edits look. You can also integrate HTML and LaTeX into markdown files.
Python and R files are ideal for simple programs or for when you're ready to consolidate your code into a single executable file. You can also right-click inside a script to open a console for it, so that you can execute individual lines of code from the file.
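If you pick an environment from the launcher, a quick check from a notebook or console can confirm which interpreter the kernel is actually using. This is a minimal sketch; `describe_environment` is a hypothetical helper. The `CONDA_DEFAULT_ENV` variable is set when a conda environment is activated, but it may be missing or may reflect the environment JupyterLab was launched from, so treat `sys.prefix` as the more reliable indicator.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter running this code."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Set by `conda activate`; may be absent outside an activated environment.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV", "(not set)"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

For a kernel running in the sandbox environment, you would expect the prefix path to end in the environment's name.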
Notebooks¶
Notebooks are the heart of JupyterLab and are where you will spend most of your time.
Cells¶
Notebooks consist of individual cells that can be one of three types:
- Markdown: For formatting text and rendering images. This text was written in a markdown cell.
- Code: For writing and running code.
- Raw: For unformatted text.
I find that I never use raw cells but they do have some advanced applications.
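Under the hood, a notebook file is JSON, and each cell records its type in a `cell_type` field. The sketch below builds a deliberately simplified notebook by hand in the nbformat 4 layout and counts its cell types. Real notebooks carry more metadata, and the cell contents here are made up for illustration.

```python
import json
from collections import Counter

# A hand-written, minimal notebook structure (nbformat 4 layout).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Notes\n", "Some *formatted* text."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["print('Hello World!')"],
        },
        {
            "cell_type": "raw",
            "metadata": {},
            "source": ["unformatted text"],
        },
    ],
}

# Round-trip through JSON text, as if reading an .ipynb file from disk.
loaded = json.loads(json.dumps(notebook, indent=1))

cell_types = Counter(cell["cell_type"] for cell in loaded["cells"])
for cell_type, count in cell_types.items():
    print(f"{cell_type}: {count}")
```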
Markdown Cells¶
Markdown cells are highly flexible and can be used to turn notebooks into a presentation tool or simply as a way to organize your work. Here are a few things you can do:
- organize notebook with a table of contents
- create lists and checklists
  - [ ] This is an incomplete checklist item
  - [x] This is a complete checklist item
- render images
- write math: $\int_0^\infty \frac{x^3}{e^x-1}\,dx = \frac{\pi^4}{15}$
What do I use markdown cells for?
- Notes and to do lists at the top of a notebook
- Headings and organization
- Notes and comments throughout the notebook.
Markdown cells are not a replacement for properly commented code.
Code Cells¶
# Code cells are the default cell in Jupyter. They work exactly as you expect.
# When you run a code cell, all of the code inside of it will be executed and the output will be printed below.
print("Hello World!")
# You do not necessarily need to call a print statement to display the output.
"Hello World!"
Hello World!
'Hello World!'
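The two outputs above differ because `print` writes the string itself, while the value of the last expression in a cell is shown in its representation form, which for a string includes the quotes. A minimal sketch of the difference in plain Python, using `repr` as a stand-in for the notebook's display of simple values:

```python
greeting = "Hello World!"

# What print() writes: the string's contents.
printed = str(greeting)

# What a notebook shows for a bare expression: for simple values like
# strings this matches repr(), which includes the quotes.
displayed = repr(greeting)

print(printed)
print(displayed)
```

Only the last expression in a cell is displayed this way, so use `print` when you want to see intermediate values.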
Magic Commands¶
Magic commands are shortcuts that provide additional functionality to the notebook. '%' denotes a line magic, and '%%' denotes a cell magic. Cell magic commands need to be placed at the top of a cell or they will return an error. There are a lot of them, but below are some of the more useful ones.
Install packages:
- %pip install packagename
- %conda install packagename
Navigate directories:
- %pwd
- %cd
Other languages:
- %%bash
- %%html
- %%latex
- %%python2
- %%python3
- %%javascript
- %%ruby
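Magics are IPython syntax, so they only work inside a notebook or IPython console. For the directory magics, the standard library offers rough equivalents that also work in ordinary scripts. The sketch below assumes nothing beyond the standard library and uses a temporary directory as a hypothetical destination.

```python
import os
import tempfile

# Roughly what %pwd reports: the current working directory.
start_dir = os.getcwd()

with tempfile.TemporaryDirectory() as scratch_dir:
    # Roughly what %cd <path> does: change the working directory.
    os.chdir(scratch_dir)
    inside_dir = os.getcwd()
    # Move back before the temporary directory is cleaned up.
    os.chdir(start_dir)

print("started in:", start_dir)
print("visited:", inside_dir)
```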
# you can also use magic commands to run a python script from a cell
%run example_script.py
         id10                                               text  Rubio  Paul  \
0  8848327098  We’ll never know the full extent to which Trum...      0     0
1  7413065131                   Jim Jordan Have you been tested?      0     0
2  8979047673  Rubio So you’re implying that the con man posi...      1     0
3  1750604534  COMING UP TODAY ON #AMJOY:\n\nGiuliani \n\n#SA...      0     0
4  8259143616  @MSNBC @NBCNews Biden’s attorney already calle...      0     0

   Cruz  Jordan  McCarthy  Scott  Scalise  Giuliani  ...  Scott_lab  \
0     0       0         0      0        0         0  ...        0.0
1     0       1         0      0        0         0  ...        0.0
2     0       0         0      0        0         0  ...        0.0
3     0       0         0      0        0         1  ...        0.0
4     0       0         0      0        0         1  ...        0.0

   Scalise_lab  Giuliani_lab  Ivanka_lab  Hawley_lab  McConnell_lab  Bush_lab  \
0          0.0           0.0        -1.0         0.0            0.0       0.0
1          0.0           0.0         0.0         0.0            0.0       0.0
2          0.0           0.0         0.0         0.0            0.0       0.0
3          0.0           0.0         0.0         0.0            0.0       0.0
4          0.0           1.0         0.0         0.0            0.0       0.0

   Reagan_lab  conel_lib  conel_con
0         0.0          1          0
1         0.0          0          1
2         0.0          1          0
3         0.0          0          0
4         0.0          0          1

[5 rows x 30 columns]
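After `%run` finishes, the script's variables are available in the notebook. Outside IPython, the standard library's `runpy.run_path` does something similar and returns the script's globals as a dictionary. The sketch below writes a small stand-in script to a temporary directory; its contents are hypothetical and unrelated to the `example_script.py` that produced the output above.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical stand-in for example_script.py, for illustration only.
SCRIPT_SOURCE = '''
numbers = [1, 2, 3]
total = sum(numbers)

if __name__ == "__main__":
    print("total:", total)
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "example_script.py"
    script_path.write_text(SCRIPT_SOURCE)

    # run_name="__main__" makes the script's main guard execute,
    # as it would when the file is run as a program.
    script_globals = runpy.run_path(str(script_path), run_name="__main__")

# Variables defined by the script are returned in a dictionary.
print(script_globals["total"])
```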
The %%time magic command is easily my most used command. Place it at the top of a cell to time how long the cell takes to execute. This is incredibly useful for benchmarking long executions.
%%time
from time import sleep
words = ['this', 'should', 'take', 'five', 'seconds']
for word in words:
    print(word)
    sleep(1)
this
should
take
five
seconds
CPU times: total: 0 ns
Wall time: 5 s
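The output reports two numbers because wall time and CPU time measure different things: sleeping takes real time but almost no processor time. The sketch below reproduces that distinction in plain Python with `time.perf_counter` and `time.process_time`. The `timed` and `slow_words` helpers are hypothetical, and the pauses are shortened so it runs in about half a second.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return result, wall_elapsed, cpu_elapsed


def slow_words(words, pause=0.1):
    """Print each word, sleeping between them; return the word count."""
    for word in words:
        print(word)
        time.sleep(pause)
    return len(words)


count, wall, cpu = timed(slow_words, ["this", "takes", "half", "a", "second"])
print(f"Wall time: {wall:.2f} s, CPU time: {cpu:.2f} s")
```

A cell that is slow in wall time but cheap in CPU time is usually waiting on something (sleep, disk, network) rather than computing.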
Keyboard Shortcuts¶
I cannot stress this enough: you will hate using Jupyter unless you get comfortable with the basic keyboard shortcuts. Jupyter is designed to keep your hands on the keyboard and is very cumbersome if you point and click. Luckily, most shortcuts are a single key press and will quickly feel natural. There are a lot of shortcuts, and you can bring up a list with Ctrl+Shift+H. However, these are the most essential ones and will get you moving quickly:
I also recommend you get comfortable with using the Home, End, Page Up, and Page Down keys.