Loop over list of lists DataCamp
DataCamp course notes on iterators, list comprehensions and generators.
Iterators
For loop
We can use a for loop to loop over a list, a string, or a range object. We can iterate over these objects because they are iterables.
Iterables:
Iterator:
To sum up:
To create an iterator from an iterable, all we need to do is to use the function iter() and pass in the iterable.
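A minimal sketch of iter() and next() in action; the string and variable names are made up for illustration and are not from the original notes.

```python
# Illustrative data; any iterable (list, string, range) would work here.
word = "Data"

it = iter(word)      # create an iterator from the iterable

first = next(it)     # 'D'
second = next(it)    # 'a'
rest = list(it)      # consume whatever is left: ['t', 'a']

# Once the iterator is exhausted, next() raises StopIteration.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True
```

A for loop does the same thing behind the scenes: it calls iter() once and then next() repeatedly until StopIteration is raised.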
To iterate over the key-value pairs of a dictionary, call .items() in the for loop: for key, value in my_dict.items():
To iterate over file connections:
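Both cases can be sketched as follows. The dictionary contents and the temporary file are assumptions made only so that the example runs anywhere; they are not part of the course material.

```python
import os
import tempfile

# Hypothetical dictionary used only for illustration.
my_dict = {"jon": "snow", "ned": "stark"}

pairs = []
for key, value in my_dict.items():
    pairs.append((key, value))

# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    path = tmp.name

# A file connection is iterable: each call to next() returns one line.
with open(path) as file:
    it = iter(file)
    line1 = next(it)
    line2 = next(it)

os.remove(path)  # clean up the temporary file
```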
Using enumerate()
enumerate() is a function that takes any iterable as an argument, such as a list, and returns a special enumerate object, which consists of pairs containing the elements of the original iterable along with their index within the iterable. We can use the function list() to turn this enumerate object into a list of tuples (index, element), and print it to see what it contains.
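A short sketch of enumerate(), using an illustrative list that is not from the original notes. The optional start argument changes the first index.

```python
avengers = ["hawkeye", "iron man", "thor"]  # illustrative data

e = enumerate(avengers)   # an enumerate object, not yet a list
e_list = list(e)          # [(0, 'hawkeye'), (1, 'iron man'), (2, 'thor')]

# Unpack index and element directly in the for loop; start=10 shifts the index.
indexed = []
for index, value in enumerate(avengers, start=10):
    indexed.append((index, value))
```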
Using zip()
zip() accepts an arbitrary number of iterables and returns an iterator of tuples (list1_element1, list2_element1, list3_element1).
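A small sketch of zip() with two illustrative lists (the data is assumed, not taken from the course). Passing the zipped pairs back through zip() with the * operator undoes the pairing.

```python
avengers = ["hawkeye", "iron man", "thor"]  # illustrative data
names = ["barton", "stark", "odinson"]

z = zip(avengers, names)   # an iterator of tuples
z_list = list(z)           # materialise it to inspect the pairs

# "Unzip" by unpacking the pairs back into zip().
heroes, surnames = zip(*zip(avengers, names))
```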
Using iterators to load large files
When a file is too large to be held in memory, we can load the data in chunks. We perform the desired operations on one chunk, store the result, discard the chunk and then load the next chunk of data. An iterator is helpful in this case. We use the pandas function read_csv() and specify the chunk size with chunksize.
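A minimal sketch of chunked loading, assuming pandas is installed. An in-memory string stands in for a large CSV file, and the column name x is hypothetical.

```python
import io

import pandas as pd

# In-memory stand-in for a large CSV file; the column 'x' is hypothetical.
csv_data = io.StringIO("x\n1\n2\n3\n4\n5\n")

total = 0
n_chunks = 0

# With chunksize set, read_csv() returns an iterator of DataFrames
# instead of one large DataFrame.
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["x"].sum()  # work on one chunk and keep only the result
    n_chunks += 1              # the chunk itself is discarded on the next pass
```

Only one chunk of two rows is held in memory at a time, so the same pattern scales to files that do not fit in memory.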
Applying the trick to the Twitter case
List Comprehensions
A list comprehension collapses a for loop that builds a list into a single line. It creates lists from other lists, DataFrame columns, etc., and is more concise than the equivalent for loop since it takes only a single line of code.
Required Components:
Suppose we have a list of numbers and want to create a new list that is the same as the old one except that each number has 1 added to it. Instead of using a for loop over multiple lines, we can use a list comprehension to finish this operation in one line as follows:
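The add-one operation described above can be sketched both ways; the numbers are illustrative.

```python
nums = [12, 8, 21, 3, 16]  # illustrative data

# For loop version: several lines.
new_nums_loop = []
for num in nums:
    new_nums_loop.append(num + 1)

# List comprehension version: one line, same result.
new_nums = [num + 1 for num in nums]
```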
List comprehensions are not restricted to lists and can be used on any iterable.
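For instance, a comprehension can iterate over a range object or a string. The values here are chosen only for illustration.

```python
# Over a range object.
from_range = [num * 2 for num in range(5)]

# Over a string, one character at a time.
from_string = [char.upper() for char in "abc"]
```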
We can also replace nested loops with list comprehensions:
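A sketch of the nested-loop case with illustrative ranges. The for clauses in the comprehension appear in the same order as the nested loops.

```python
# Nested for loops.
pairs_loop = []
for num1 in range(0, 2):
    for num2 in range(6, 8):
        pairs_loop.append((num1, num2))

# Equivalent list comprehension.
pairs = [(num1, num2) for num1 in range(0, 2) for num2 in range(6, 8)]
```

The one-line form is shorter, though it can be harder to read once more than two loops are involved.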
To create a matrix by list comprehension:
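One possible sketch: a list comprehension whose output expression is itself a list comprehension, giving a list of rows. The 3 x 3 size is an arbitrary choice for illustration.

```python
# The inner comprehension builds one row; the outer one repeats it per row.
matrix = [[col for col in range(3)] for row in range(3)]

n_rows = len(matrix)
n_cols = len(matrix[0])
```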
Advanced Comprehensions
Conditionals on the iterable:
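A conditional on the iterable filters which elements are used at all. A small sketch with illustrative values:

```python
# Keep only even numbers, then square them.
even_squares = [num ** 2 for num in range(10) if num % 2 == 0]
```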
Conditionals on the output expression:
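A conditional on the output expression keeps every element but changes what is produced for it. A small sketch with illustrative values:

```python
# Square even numbers; replace odd numbers with 0.
squares_or_zero = [num ** 2 if num % 2 == 0 else 0 for num in range(10)]
```

Note that the result has as many elements as the iterable, unlike a filter placed after the for clause.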
Dictionary comprehensions to create dictionaries:
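A dictionary comprehension uses curly braces and a key: value output expression. A minimal sketch with illustrative values:

```python
# Map each number to its negative.
pos_neg = {num: -num for num in range(5)}
```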
Generators
A generator is very similar to a list comprehension, except that it does not construct a list and is not stored in memory. We can still iterate over the generator to produce the elements as required. This is very useful when you don't want to store the entire list in memory.
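A sketch of the difference, using illustrative values: square brackets build a list immediately, while parentheses create a generator that produces values only when asked.

```python
squares_list = [num ** 2 for num in range(5)]  # list: all values stored at once
squares_gen = (num ** 2 for num in range(5))   # generator: nothing computed yet

first = next(squares_gen)       # values are produced one at a time
second = next(squares_gen)
remaining = list(squares_gen)   # consume the rest
```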
Let's say we want to iterate through a large number of integers, from 0 to 10 ** 1000000:
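A sketch of why a generator helps here. The list version is left commented out because it would try to store every value; the generator version only produces the values we ask for.

```python
# A list comprehension would attempt to hold every value in memory:
# big_list = [num for num in range(10 ** 1000000)]

# A generator expression is created instantly and yields values on demand.
big_gen = (num for num in range(10 ** 1000000))

first_three = [next(big_gen) for _ in range(3)]
```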
We can also apply conditions to generators:
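The same conditional syntax used in list comprehensions works inside a generator expression. A small sketch with illustrative values:

```python
# Generator of even numbers only; nothing is computed until we iterate.
even_nums = (num for num in range(10) if num % 2 == 0)

evens = list(even_nums)
```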
Generator Functions
Generator functions are functions that, when called, produce generator objects. They yield a sequence of values instead of returning a single value. A generator function is defined just like any other function, except that it produces its values with yield instead of return.
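A minimal sketch of a generator function; the name num_sequence and its argument are chosen for illustration.

```python
import types


def num_sequence(n):
    """Generate the values 0, 1, ..., n - 1 one at a time."""
    i = 0
    while i < n:
        yield i   # hand back one value and pause here until the next request
        i += 1


result = num_sequence(5)  # calling the function returns a generator object
is_generator = isinstance(result, types.GeneratorType)
values = list(result)
```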
Another example:
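.items() and range() behave in a similarly lazy way when they are called: rather than building a list in memory, they return a dictionary view and a range object that produce values as we iterate over them (strictly speaking, these are lazy iterables rather than generators).
Case Study: World Bank data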
Turn dictionary into dataframe
Writing a generator to load data line by line
Use a generator to load a file line by line. If the data is streaming, meaning that more data is added to the dataset while you are processing it, the generator keeps reading and processing the file until all lines are exhausted.
Context manager:
The csv file 'world_dev_ind.csv' is in the current directory for your use. To begin, you need to open a connection to this file using what is known as a context manager. For example, the command with open('datacamp.csv') as datacamp binds the csv file 'datacamp.csv' as datacamp in the context manager. Here, the with statement sets up the context manager, whose purpose is to ensure that resources are properly allocated and released when opening a connection to a file.
Rough thoughts:
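One way to sketch the line-by-line generator described above. The function name read_large_file, the temporary file and its contents are all assumptions made so the example runs on its own; they stand in for the real 'world_dev_ind.csv'.

```python
import os
import tempfile


def read_large_file(file_object):
    """Yield the lines of an open file one at a time."""
    while True:
        line = file_object.readline()
        if not line:   # an empty string means the end of the file
            break
        yield line


# Hypothetical stand-in data for the real CSV file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("country,year\nA,2000\nB,2001\n")
    path = tmp.name

# The with statement closes the file once the block is finished.
with open(path) as file:
    gen_file = read_large_file(file)
    header = next(gen_file)   # first line only
    rows = list(gen_file)     # remaining lines, read one by one

os.remove(path)  # clean up the temporary file
```

Only one line is held in memory at a time inside the generator, which is what makes the approach suitable for very large files.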
Writing an iterator to load data in chunks