Python

Basics

This cheat sheet gives beginners and intermediate users a quick guide to getting started with Python.

Accessing help and getting object types

# Everything after the hash symbol is ignored by Python
 
# Display the documentation for the max function
help(max) 
 
# Get the type of an object — this returns str
type('a')    
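As a small illustration of inspecting object types, the sketch below collects the type names of a few literal values and checks a type with isinstance. The variable names are only illustrative.

```python
# Collect the types of a few literal values
value_types = [type('a'), type(1), type(1.5), type([1, 2])]

# Each type object exposes its name through __name__
type_names = [t.__name__ for t in value_types]

# isinstance() is the usual way to test whether a value has a given type
is_text = isinstance('a', str)

print(type_names, is_text)
```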

Importing packages

Python packages are a collection of useful tools developed by the open-source community. They extend the capabilities of the Python language. To install a new package (for example, pandas), you can go to your command prompt and type in pip install pandas. Once a package is installed, you can import it as follows:

# Import a package without an alias
import pandas                
 
# Import a package with an alias
import pandas as pd          
 
# Import an object from a package
from pandas import DataFrame 
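The three import forms behave the same way for any module. The sketch below uses the standard-library math module in place of pandas (a substitution made only so the example runs without installing anything).

```python
import math                 # Import a module without an alias
import math as m            # Import the same module with an alias
from math import sqrt       # Import a single object from the module

# All three names refer to the same function
results = [math.sqrt(16), m.sqrt(16), sqrt(16)]
same_function = math.sqrt is m.sqrt is sqrt

print(results, same_function)
```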

The working directory

The working directory is the default location where Python reads and saves files, for example C:/file/path. The os module is used to get and set the working directory.

import os                          # Import the operating system package
os.getcwd()                        # Get the current directory
os.chdir("new/working/directory")  # Change the working directory to a new file path
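A minimal sketch of getting and changing the working directory follows. It assumes a temporary directory as a stand-in for a real project folder, and it switches back afterwards so the rest of a session is unaffected.

```python
import os
import tempfile

original_dir = os.getcwd()              # Remember where we started

# A throwaway directory stands in for a real target folder (assumption for this example)
with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                   # Change the working directory
    inside_tmp = os.path.samefile(os.getcwd(), tmp_dir)
    os.chdir(original_dir)              # Change back before the directory is removed

back_home = os.getcwd() == original_dir
print(inside_tmp, back_home)
```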

Operators

Arithmetic operators

10 + 2    # Add two numbers with +
10 - 2    # Subtract a number with -
4 * 6     # Multiply two numbers with *
22 / 7    # Divide a number by another with /
22 // 7   # Integer divide a number with //
3 ** 4    # Raise to the power with **
22 % 7    # Get the remainder after division with %
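Integer division and remainder are linked: the quotient times the divisor plus the remainder gives back the original number. The sketch below checks that relationship and shows that // rounds toward negative infinity for negative operands. Variable names are illustrative.

```python
a, b = 22, 7

quotient = a // b        # 3
remainder = a % b        # 1

# The two results always satisfy a == b * quotient + remainder
identity_holds = a == b * quotient + remainder

# // rounds down (toward negative infinity), so negative operands can surprise
neg_quotient = -22 // 7    # -4, not -3
neg_remainder = -22 % 7    # 6, so that 7 * -4 + 6 == -22

print(quotient, remainder, identity_holds, neg_quotient, neg_remainder)
```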

Assignment operators

a = 5         # Assign a value to a
x[0] = 1      # Change the value of an item in a list

Numeric comparison operators

3 == 3    # Test for equality with ==
3 != 3    # Test for inequality with !=
3 > 1     # Test greater than with >
3 >= 3    # Test greater than or equal to with >=
3 < 4     # Test less than with <
3 <= 4    # Test less than or equal to with <=

Logical operators

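not (2 == 2)            # Logical NOT with not
(1 != 1) and (1 < 1)    # Logical AND with and
(1 >= 1) or (1 < 1)     # Logical OR with or
(1 != 1) ^ (1 < 1)      # Logical XOR with ^ (for boolean operands)
# ~, & and | are bitwise operators; they act element-wise on NumPy arrays and pandas Series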
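For plain True/False values, Python's logical operators are the keywords not, and, and or. The symbols ~, & and | are bitwise operators, and ~ in particular does not negate a boolean. A short sketch of the difference:

```python
is_equal = (2 == 2)             # True

negated = not is_equal          # False: logical NOT

# True behaves like the integer 1, and bitwise inversion of 1 is -2, which is truthy.
# (Applying ~ directly to a bool is deprecated since Python 3.12.)
inverted = ~1

both = (1 != 1) and (1 < 1)     # False: both sides are False
either = (1 >= 1) or (1 < 1)    # True: the left side is True

print(negated, inverted, bool(inverted), both, either)
```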

Getting started with lists

A list is an ordered and changeable sequence of elements. It can hold integers, floats, strings, and any other objects, including a mix of types.

Creating lists

x = [1, 2, 3]    # Create lists with [], elements separated by commas

List functions and methods

sorted(x)            # Return a sorted copy of the list e.g., [1,2,3]
x.sort()             # Sort the list in place (modifies x, returns None)
list(reversed(x))    # Return a reversed copy of the list e.g., [3,2,1]
x.reverse()          # Reverse the list in place
x.count(2)           # Count the occurrences of the value 2 in the list
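The functions return new objects, while the methods change the list itself. The sketch below makes that difference visible with a small example list.

```python
x = [3, 1, 2]

sorted_copy = sorted(x)              # New sorted list; x is unchanged
reversed_copy = list(reversed(x))    # reversed() returns an iterator, so wrap it in list()
unchanged = x == [3, 1, 2]

sort_result = x.sort()               # Sorts x in place and returns None

print(sorted_copy, reversed_copy, unchanged, sort_result, x)
```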

Selecting list elements

Python lists are zero-indexed (the first element has index 0). For ranges, the first element is included but the last is not.

x = ['a', 'b', 'c', 'd', 'e']    # Define the list
x[0]        # Select the 0th element in the list
x[-1]       # Select the last element in the list
x[1:3]      # Select 1st (inclusive) to 3rd (exclusive)
x[2:]       # Select the 2nd to the end
x[:3]       # Select 0th to 3rd (exclusive)
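Because the end of a range is excluded, x[:n] and x[n:] split a list into two pieces with no overlap. The sketch below also shows a step value and negative indices, which are standard slice features not listed above.

```python
x = ['a', 'b', 'c', 'd', 'e']

first_three = x[:3]      # ['a', 'b', 'c']
rest = x[3:]             # ['d', 'e']
every_other = x[::2]     # ['a', 'c', 'e']; the third slice value is the step
last_two = x[-2:]        # ['d', 'e']; negative indices count from the end

# The two halves join back into the original list
rejoined = first_three + rest

print(first_three, rest, every_other, last_two)
```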

Concatenating lists

x = [1, 3, 6]
y = [10, 15, 21]
x + y       # Returns [1, 3, 6, 10, 15, 21]
3 * x       # Returns [1, 3, 6, 1, 3, 6, 1, 3, 6]
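The + operator builds a new list and leaves both inputs untouched. For comparison, this sketch also uses the list method extend, which adds the elements to an existing list in place.

```python
x = [1, 3, 6]
y = [10, 15, 21]

combined = x + y        # New list; x and y are unchanged
x_before = list(x)      # Copy of x taken before modifying it

x.extend(y)             # Appends the elements of y to x in place

print(combined, x_before, x)
```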

Getting started with dictionaries

A dictionary stores data as key-value pairs. Unlike lists, which are indexed by position, dictionaries are indexed by their keys, and each key must be unique.

Creating dictionaries

# Create a dictionary with {}
{'a': 1, 'b': 4, 'c': 9}    

Dictionary functions and methods

# Define the x dictionary
x = {'a': 1, 'b': 2, 'c': 3}   
 
# Get the keys of a dictionary, returns dict_keys(['a', 'b', 'c'])
x.keys()    
 
# Get the values of a dictionary, returns dict_values([1, 2, 3])
x.values()    
 
# Get a value from a dictionary by specifying the key
x['a']      
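A few more common dictionary operations, sketched on the same example dictionary: adding and updating keys, reading with a default through get, and listing the key-value pairs.

```python
x = {'a': 1, 'b': 2, 'c': 3}

x['d'] = 4                   # Add a new key-value pair
x['a'] = 10                  # Assigning to an existing key replaces its value

missing = x.get('z', 0)      # get() returns a default instead of raising KeyError
has_b = 'b' in x             # Membership tests check the keys
pairs = list(x.items())      # (key, value) tuples in insertion order

print(missing, has_b, pairs)
```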

NumPy arrays

NumPy is a Python package for scientific computing. It provides multidimensional array objects and efficient operations on them. To import NumPy, run import numpy as np.

Creating arrays

np.array([1, 2, 3])         # Convert a Python list to a NumPy array
np.arange(1, 5)             # Return a sequence from start (inclusive) to end (exclusive)
np.arange(1, 5, 2)          # Return a stepped sequence
np.repeat([1, 3, 6], 3)     # Repeat each element n times: [1, 1, 1, 3, 3, 3, 6, 6, 6]
np.tile([1, 3, 6], 3)       # Repeat the whole array n times: [1, 3, 6, 1, 3, 6, 1, 3, 6]
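np.repeat and np.tile are easy to confuse, so here is a short side-by-side sketch. It assumes NumPy is installed; the variable names are illustrative.

```python
import numpy as np

values = [1, 3, 6]

repeated = np.repeat(values, 3)    # Each element repeated: 1 1 1 3 3 3 6 6 6
tiled = np.tile(values, 3)         # Whole sequence repeated: 1 3 6 1 3 6 1 3 6
stepped = np.arange(1, 5, 2)       # Start at 1, stop before 5, step 2: 1 3

print(repeated, tiled, stepped)
```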

Math functions and methods

All functions take an array as the input.

np.log(x)            # Calculate natural logarithm
np.exp(x)            # Calculate exponential
np.max(x)            # Get maximum value
np.min(x)            # Get minimum value
np.sum(x)            # Calculate sum
np.mean(x)           # Calculate mean
np.quantile(x, q)    # Calculate qth quantile
np.round(x, n)       # Round to n decimal places
np.var(x)            # Calculate variance
np.std(x)            # Calculate standard deviation
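A worked sketch of several of these functions on a small array, assuming NumPy is installed. Note that np.var and np.std divide by the number of values by default; passing ddof=1 gives the sample versions.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

total = np.sum(x)                  # 10.0
average = np.mean(x)               # 2.5
median = np.quantile(x, 0.5)       # 2.5 (the 0.5 quantile is the median)
pop_var = np.var(x)                # 1.25, divides by n
sample_var = np.var(x, ddof=1)     # 5/3, divides by n - 1

# exp undoes the natural logarithm, up to floating-point rounding
round_trip = np.exp(np.log(x))

print(total, average, median, pop_var, sample_var)
```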

Getting started with characters and strings

Creating strings

"Ali Azlan"    # Create a string with double or single quotes
"He said, \"Ali Azlan\""    # Embed a quote in string with the escape character \
 
"""
A Frame of Data
Tidy, Mine, Analyze It
Now You Have Meaning
"""    # Create multi-line strings with triple quotes

Selecting string elements

s = "Jack and Jill"    # Example string (avoid naming a variable str, which hides the built-in type)
s[0]          # Get the character at a specific position
s[0:2]        # Get a substring from starting to ending index (exclusive)

Combining and splitting strings

"Data" + "Framed"     # Concatenate strings with +
3 * "data "           # Repeat strings with *
"beekeepers".split("e")    # Split a string on a delimiter

Mutate strings

s = "Jack and Jill"   # Use a name other than str, which is the built-in string type
s.upper()             # Convert a string to uppercase
s.lower()             # Convert a string to lowercase
s.title()             # Convert a string to title case
s.replace("J", "P")   # Replace matches of a substring
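Strings cannot be changed in place: each method returns a new string and leaves the original as it was. The sketch below shows that, along with splitting and joining. It uses s as the variable name so the built-in str type stays available.

```python
s = "Jack and Jill"

upper = s.upper()                  # 'JACK AND JILL'
replaced = s.replace("J", "P")     # 'Pack and Pill'
still_original = s                 # s itself is unchanged

# Methods can be chained because each one returns a new string
chained = s.lower().replace("jack", "jill").title()

# split() and join() are inverses when the same delimiter is used
words = "Tidy, Mine, Analyze It".split(", ")
rejoined = " | ".join(words)

# Adjacent delimiters produce empty strings in the result
pieces = "beekeepers".split("e")

print(upper, replaced, chained, words, rejoined, pieces)
```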

Getting started with DataFrames

pandas is a fast and powerful package for data analysis and manipulation in Python. To import the package, you can use import pandas as pd. A pandas DataFrame is a structure that holds two-dimensional data in rows and columns. A pandas Series is a structure that holds one-dimensional data, such as a single column of a DataFrame.

Creating DataFrames

# Create a dataframe from a dictionary
pd.DataFrame({
    'a': [1, 2, 3],
    'b': np.array([4, 4, 6]),
    'c': ['x', 'x', 'y']
})
 
# Create a dataframe from a list of dictionaries
pd.DataFrame([
    {'a': 1, 'b': 4, 'c': 'x'},
    {'a': 2, 'b': 4, 'c': 'x'},
    {'a': 3, 'b': 6, 'c': 'y'}
])
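The two constructors describe the same table from different directions: a dictionary lists the data column by column, while a list of dictionaries lists it row by row. A minimal sketch, assuming pandas is installed and using illustrative names:

```python
import pandas as pd

# Column-oriented: each key is a column name
from_columns = pd.DataFrame({
    'a': [1, 2, 3],
    'c': ['x', 'x', 'y'],
})

# Row-oriented: each dictionary is one row
from_rows = pd.DataFrame([
    {'a': 1, 'c': 'x'},
    {'a': 2, 'c': 'x'},
    {'a': 3, 'c': 'y'},
])

same = from_columns.equals(from_rows)    # True when values, labels and dtypes match
shape = from_columns.shape               # (rows, columns)

print(same, shape)
```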

Selecting DataFrame Elements

df.iloc[3]             # Select the row at position 3 (the fourth row, counting from 0)
df['col']              # Select one column by name
df[['col1', 'col2']]   # Select multiple columns by names
df.iloc[:, 2]          # Select the column at position 2
df.iloc[3, 2]          # Select the element at row position 3, column position 2
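The selections above can be tried on a small example. This sketch assumes pandas is installed; the DataFrame and its column names are made up for illustration. Positions passed to iloc count from 0, like list indices.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4],
    'b': [4, 4, 6, 8],
    'c': ['x', 'x', 'y', 'z'],
})

row = df.iloc[3]              # Row at position 3, returned as a Series
col = df['b']                 # One column, returned as a Series
sub = df[['a', 'c']]          # Several columns, returned as a DataFrame
third_col = df.iloc[:, 2]     # Column at position 2, which is 'c'
cell = df.iloc[3, 2]          # Single value at row position 3, column position 2

print(row['a'], col.tolist(), list(sub.columns), third_col.tolist(), cell)
```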

Manipulating DataFrames

pd.concat([df, df])                     # Concatenate DataFrames vertically
pd.concat([df,df], axis=1)              # Concatenate DataFrames horizontally
df.query('logical_condition')           # Get rows matching a condition
df.drop(columns=['col_name'])           # Drop columns by name
df.rename(columns={"oldname": "newname"})   # Rename columns
df.assign(temp_f=9/5 * df['temp_c'] + 32)  # Add a new column
df.mean(numeric_only=True)              # Calculate the mean of each numeric column
df.agg(aggregation_function)            # Get summary statistics by column
df.drop_duplicates()                    # Get unique rows
df.sort_values(by='col_name')           # Sort by values in a column
df.nlargest(n, 'col_name')             # Get rows with largest values in a column
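Most of these methods return a new DataFrame rather than changing df, so results need to be assigned to a name. The sketch below assumes pandas is installed and uses a made-up temperature table with hypothetical column names city and temp_c.

```python
import pandas as pd

df = pd.DataFrame({
    'city': ['A', 'B', 'C'],
    'temp_c': [0.0, 25.0, 100.0],
})

# assign() returns a copy with the extra column; df itself is unchanged
with_f = df.assign(temp_f=9 / 5 * df['temp_c'] + 32)

warm = with_f.query('temp_c > 20')          # Rows where the condition holds
hottest = with_f.nlargest(1, 'temp_c')      # Row with the largest temp_c

# Stacking df on itself creates duplicates, which drop_duplicates() removes
stacked = pd.concat([df, df], ignore_index=True)
unique_rows = stacked.drop_duplicates()

print(with_f)
print(len(warm), hottest['city'].tolist(), len(stacked), len(unique_rows))
```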
