How to extract a column from a list in Python


I’ve been working with data for a long time. However, I still sometimes need to google “How to extract rows/columns from a data frame in Python/R?” when I switch from one language environment to the other.

I am pretty sure that I have done this thousands of times, but it seems that my brain refuses to store the commands in memory.

You probably know the feeling if you need to work with R and Python simultaneously for data manipulation.

Therefore, I would like to summarize in this article the usage of R and Python in extracting rows/columns from a data frame and make a simple cheat sheet image for the people who need it.

Note that I will only use Pandas in Python and base functions in R so that the commands can be compared side by side. More comprehensive libraries, ‘dplyr’ for example, are not considered. And I am trying my best to keep the article short.

Let’s begin.

The toy dataset to use.

We will use a toy dataset of Allen Iverson’s game stats throughout the article. The dimensions and head of the data frame are shown below.

R output 1
Python output 1

Extract rows/columns by location.

First, let’s extract the rows from the data frame in both R and Python. In R, it is done by simple indexing, but in Python, it is done by .iloc. Let’s check the examples below.

which yields,

R output 2

which yields,

Python output 2

Please note that in the example of extracting a single row from the data frame, the output in R is still in the data frame format, but the output in Python is in the Pandas Series format. This is an essential difference between R and Python in extracting a single row from a data frame.
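To make the Series-versus-DataFrame point concrete on the Python side, here is a minimal sketch. The data frame is a stand-in with made-up placeholder numbers (not Iverson’s actual stats), and the column names `Pos`, `MP`, and `TS%` as well as the season labels are assumptions chosen for illustration.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

single = df.iloc[2]       # one row by position: a pandas Series
several = df.iloc[0:2]    # a slice of rows: a pandas DataFrame
as_frame = df.iloc[[2]]   # a one-element list keeps the DataFrame format

print(type(single).__name__)    # Series
print(type(as_frame).__name__)  # DataFrame
```

Passing a list such as `[2]` instead of the bare integer is a handy way to keep the result as a data frame when downstream code expects one.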

Similarly, we can extract columns from the data frame.

which yields,

R output 3

which yields,

Python output 3

When extracting a column with .iloc, we have to put a colon in the row position, followed by a comma and the column position, inside the square brackets, which is a big difference from extracting rows.
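A small sketch of column extraction by position in Pandas follows. As before, the data frame holds made-up placeholder numbers, and the column names and season labels are assumptions.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

# The colon selects every row; the value after the comma selects columns.
first_col = df.iloc[:, 0]        # single column: a Series
two_cols = df.iloc[:, [0, 2]]    # several columns: a DataFrame

print(first_col.name)            # Pos
print(list(two_cols.columns))    # ['Pos', 'TS%']
```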

In our dataset, the row index and column index of the data frame are the NBA season and Iverson’s stats, respectively. We can use those labels to extract specific rows/columns from the data frame.

For example, we are interested in the season 1999–2000.

which yields,

R output 4

which yields,

Python output 4

Please note again that in Python, the output is in Pandas Series format if we extract only one row/column, but it will be Pandas DataFrame format if we extract multiple rows/columns.
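Label-based extraction in Pandas can be sketched with `.loc`. The season labels (written here as `"1999-00"`) and the column names are assumptions about how the index is spelled, and the numbers are placeholders rather than real stats.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

season = df.loc["1999-00"]                  # one row label: a Series
minutes = df["MP"]                          # one column label: a Series
two_seasons = df.loc[["1998-99", "1999-00"]]  # list of labels: a DataFrame

print(type(season).__name__)       # Series
print(type(two_seasons).__name__)  # DataFrame
```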

When we are only interested in a subset of columns, we can also add the column index.

which yields,

R output 5

which yields,

Python output 5
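On the Python side, combining row labels with column labels might look like the sketch below. The labels and values are again illustrative assumptions, not the article’s real dataset.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

# Row labels go before the comma, column labels after it.
subset = df.loc[["1998-99", "1999-00"], ["MP", "TS%"]]
cell = df.loc["1999-00", "MP"]   # a single row and column gives a scalar

print(subset.shape)   # (2, 2)
print(cell)           # 2853
```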

In addition to extracting rows/columns by index, we can also subset based on conditions. For example, we want to extract the seasons in which Iverson played more than 3000 minutes.

which yields,

R output 6

which yields,

Python output 6

Of course, more complicated conditions can be passed to the square bracket, which only needs a True/False list whose length equals the number of rows in the data frame.
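The True/False list can be inspected directly. In this sketch the minutes-played column is assumed to be called `MP`, and the values are placeholders chosen so that two seasons pass the 3000-minute threshold.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

mask = df["MP"] > 3000   # Boolean Series, one True/False value per row
heavy = df[mask]         # keeps only the rows where the mask is True

print(mask.tolist())         # [True, True, False, False]
print(list(heavy.index))     # ['1996-97', '1997-98']
```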

For example, we want to extract the seasons in which Iverson’s true shooting percentage (TS%) is over 50%, minutes played is over 3000, and position (Pos) is either shooting guard (SG) or point guard (PG).

which yields,

R output 7

which yields,

Python output 7

We can apply any kind of boolean values in the “cond_” position.
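A sketch of the three-part condition in Pandas is given below. The names `cond_1`, `cond_2`, and `cond_3` are illustrative, the column names are assumptions, `TS%` is assumed to be stored as a fraction, and the numbers are placeholders rather than real stats.

```python
import pandas as pd

# Placeholder values for illustration only, not Iverson's actual stats.
df = pd.DataFrame(
    {
        "Pos": ["PG", "PG", "SG", "SG"],
        "MP": [3045, 3150, 1990, 2853],
        "TS%": [0.490, 0.535, 0.508, 0.496],
    },
    index=["1996-97", "1997-98", "1998-99", "1999-00"],
)

cond_1 = df["TS%"] > 0.50             # true shooting percentage over 50%
cond_2 = df["MP"] > 3000              # more than 3000 minutes played
cond_3 = df["Pos"].isin(["SG", "PG"]) # position is SG or PG

# Combine element-wise with & (and); use | for "or" and ~ for "not".
result = df[cond_1 & cond_2 & cond_3]

print(list(result.index))   # ['1997-98']
```

Note that Pandas needs the element-wise operators `&`, `|`, and `~` here; the Python keywords `and`, `or`, and `not` raise an error when applied to a whole Series.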


In this article, we will cover how to extract a particular column from a 1-D array of tuples in Python.

Example 



Input:  [[18.18,2.27,3.23],[36.43,34.24,6.6],[5.25,6.16,7.7],[7.37,28.8,8.9]]

Output: [3.23, 6.6 , 7.7 , 8.9 ]

Explanation: Extracting the 3rd column from 1D array of tuples.
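Before bringing in any library, the example above can be reproduced with a plain list comprehension. This is only a sketch of the same idea on a list of lists, not one of the two methods covered in this article; the names `rows` and `third_column` are illustrative.

```python
# The input from the example, as a plain Python list of lists.
rows = [[18.18, 2.27, 3.23], [36.43, 34.24, 6.6],
        [5.25, 6.16, 7.7], [7.37, 28.8, 8.9]]

# Take the element at index 2 (the 3rd column) from every row.
third_column = [row[2] for row in rows]

print(third_column)   # [3.23, 6.6, 7.7, 8.9]
```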

Method 1: Using Slice

As a first step, let us define a 1D array of tuples, each holding 3 elements. If we treat these 3 elements as 3 columns, we can use slicing to extract a particular column.

import numpy as np

arr = np.array([[18.18, 2.27, 3.23], [36.43, 34.24, 6.6],
                [5.25, 6.16, 7.7], [7.37, 28.8, 8.9]])

# Select every row (:) and only the third column (index 2)
arr[:, 2]

Output:

array([3.23, 6.6 , 7.7 , 8.9 ])

Method 2: Using the lambda function

In this example, we take a pandas data frame in which one column holds an array of tuples. We can select that column and apply a lambda function to pull a particular element out of each tuple.

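import pandas as pd

data = pd.DataFrame({'approval': [10, 20, 30, 40, 50],
                     'temperature': [[18.18, 2.27, 3.23], [36.43, 34.24, 6.6],
                                     [5.25, 6.16, 7.7], [7.37, 28.8, 8.9],
                                     [12, 23, 3]]})
print(data)

# Take the third element (index 2) of each list in the column
res = data['temperature'].apply(lambda x: x[2]).values
print('The output for extracting 3rd column from the array of tuples', res)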

Output:

   approval          temperature
0        10  [18.18, 2.27, 3.23]
1        20  [36.43, 34.24, 6.6]
2        30    [5.25, 6.16, 7.7]
3        40    [7.37, 28.8, 8.9]
4        50          [12, 23, 3]
The output for extracting 3rd column from the array of tuples [3.23 6.6 7.7 8.9 3. ]
