
Advanced Python Topics

So you have a solid working understanding of Python but would like to take your knowledge to a new level? I don’t think the plateau between intermediate and advanced tech topics is talked about enough. Perhaps because it’s hard to quantify what level different topics are, but perhaps also because some of the feelings around not being advanced enough could be due to imposter syndrome. This is my attempt to bridge the gap by giving readers a starting point. This list is not meant to be exhaustive, but to help get ideas flowing. Read on for some concepts I consider advanced in Python and some linked articles that I found helpful that go into much more detail.

Generators

Generators let you create a lazy iterable using the yield keyword within a function: values are produced one at a time, on demand, rather than all being stored in memory at once. The resulting generator can be iterated over like a list, which makes generators an excellent choice for reading large files and streams. For example:

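def read_csv_file(file_name):
    # Yield one raw line at a time; the with block closes the file when iteration ends
    with open(file_name, "r") as csv_file:
        for row in csv_file:
            yield row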

Note that unlike return, which ends the function call, yield pauses the function and saves its state, so the next value you request from the generator resumes right where it left off instead of starting anew. Click here for further reading.
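To make the "saved state" idea concrete, here is a minimal sketch. The countdown function is a made-up example of mine, not something from the linked reading; it just shows that each next() call resumes the function after the previous yield, and that calling the function again gives you a brand-new generator.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing after each value."""
    current = start
    while current > 0:
        yield current      # pause here and hand back one value
        current -= 1       # resume here on the next request


gen = countdown(3)
first = next(gen)           # runs the function up to the first yield
second = next(gen)          # picks up where it left off
remaining = list(gen)       # consumes whatever is left

# Calling the function again creates an independent generator
fresh = list(countdown(3))

print(first, second, remaining, fresh)
```

Once a generator is exhausted it stays empty, which is why the second countdown(3) call is needed to iterate again.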

Comprehensions

List comprehensions are not just a fancy way of writing a one-liner to create a list; they are also usually faster than building the same list with a for loop and repeated append calls. They still build the whole list in memory, though, so if memory is the concern, reach for a generator expression instead. They’re awesome for filtering and performing a simple operation on every item in a list. For example:

list_comprehension = [n + 1 for n in original_list]

Click here for further reading.

Also, look into dictionary, set, and generator comprehensions here.
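As a quick taste of those other forms, here is a small sketch. The sample values and variable names are my own and purely illustrative.

```python
original_list = [3, 8, 3, 15, 8, 42]

# Dictionary comprehension: map each value to its square
squares_by_value = {n: n * n for n in original_list}

# Set comprehension: duplicates are dropped automatically
unique_values = {n for n in original_list}

# List comprehension with a filter: keep only the even values
even_values = [n for n in original_list if n % 2 == 0]

# Generator expression: feeds sum() one item at a time, no intermediate list
total_plus_one = sum(n + 1 for n in original_list)

print(squares_by_value, unique_values, even_values, total_plus_one)
```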

Multiprocessing Versus Multithreading

Because of the Global Interpreter Lock in CPython (the standard Python interpreter), only one thread can execute Python bytecode at a time, so using the threading library for CPU-bound work only creates the illusion that threads are running in parallel. Threads are still useful for I/O-bound work, since the lock is released while a thread waits on the network or disk. The multiprocessing library, by contrast, starts separate processes, each with its own interpreter and memory, so they can run on different CPU cores in true parallel. Multiprocessing can be extremely useful for breaking down large data structures or other processing tasks into smaller ones that finish faster because they run at the same time. This is a bit of an oversimplification. This article provides more detail.
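Here is a rough sketch of what splitting CPU-bound work across processes can look like with multiprocessing.Pool. The count_primes and count_primes_parallel functions and the chosen limits are hypothetical examples of mine, and whether you see a speedup depends on your machine and on how heavy each task is.

```python
from multiprocessing import Pool


def count_primes(limit):
    """Count primes below `limit` with simple trial division (CPU-bound)."""
    count = 0
    for candidate in range(2, limit):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Run count_primes for each limit in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes, limits)


# The guard keeps worker processes from re-running this block on import
if __name__ == "__main__":
    limits = [20_000, 30_000, 40_000, 50_000]
    print(count_primes_parallel(limits))
```

The function handed to the pool has to be defined at module level so it can be sent to the workers, and for small tasks the cost of starting processes can outweigh the gain.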

Specific Applications of Python – Libraries, Frameworks, Etc.

This is a huge topic, so I am going to gloss over it. If you’re using Python for web-app development, you will likely need to know tools/frameworks like Django, Flask, Pyramid, Django Rest Framework, SQLAlchemy, etc. If, like me, you mainly use Python for Data Engineering, you’ll want to know libraries like:

  • Pandas
  • Matplotlib
  • NumPy
  • Seaborn
  • Boto3
  • Your data warehouse’s SQL connector

Memory Management

Python handles garbage collection and low-level memory management for you, so you don’t have to manage memory by hand (like you need to with C and C++). This is excellent for writing quick scripts and avoiding memory leaks, but it has some cons, like being more memory-intensive. Though Python handles this out of the box, you should still learn about it, if only for interviews. This is a great article that goes into much more detail.
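If you want to poke at this yourself, here is a small sketch that assumes CPython, where objects are freed by reference counting and a separate garbage collector cleans up reference cycles. The Node class is a made-up example, and other Python implementations may behave differently.

```python
import gc
import weakref


class Node:
    """Tiny illustrative class that can point at another Node."""

    def __init__(self, name):
        self.name = name
        self.partner = None


# Reference counting: the object is freed once its last reference is gone
node = Node("solo")
probe = weakref.ref(node)            # a weak reference does not keep it alive
del node
freed_by_refcount = probe() is None

# Reference cycle: two objects keep each other alive
gc.disable()                         # pause automatic collection for the demo
a, b = Node("a"), Node("b")
a.partner, b.partner = b, a
cycle_probe = weakref.ref(a)
del a, b
alive_before_collect = cycle_probe() is not None

gc.collect()                         # the cycle collector finds and frees the pair
freed_by_gc = cycle_probe() is None
gc.enable()

print(freed_by_refcount, alive_before_collect, freed_by_gc)
```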

Lambda Functions

Lambda functions are anonymous functions limited to a single expression, perfect for concise, on-the-fly operations passed to functions such as map, filter, and reduce, as well as streamlined data frame manipulation. Here’s an example:

filtered_list = list(filter(lambda val: (val > 100), original_list))

See this article for more info.
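Here is a slightly fuller sketch of the same idea, with sample data I made up for illustration. It shows a lambda passed to filter, map, reduce, and as a sort key.

```python
from functools import reduce

original_list = [50, 120, 300, 75]

# filter: keep values above 100
filtered_list = list(filter(lambda val: val > 100, original_list))

# map: apply the same operation to every value
doubled = list(map(lambda val: val * 2, original_list))

# reduce: fold the list into a single running total
total = reduce(lambda acc, val: acc + val, original_list, 0)

# key function: sort (name, count) pairs by the count
records = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(records, key=lambda record: record[1])

print(filtered_list, doubled, total, by_count)
```

For anything longer than one expression, a regular def with a name is usually easier to read and debug.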

Decorators

Decorators allow you to extend the logic of another function without changing that other function. See this article for more info. A common use-case is with @classmethod and @staticmethod. Decorators also promote code reusability and modular design, providing a versatile way to enhance functions with additional behavior. Here are some more use-cases.
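To show the mechanics, here is a minimal sketch of a hand-written decorator. The count_calls and add names are hypothetical examples of mine; functools.wraps is the standard-library helper that keeps the wrapped function's name and docstring intact.

```python
import functools


def count_calls(func):
    """Decorator that counts how many times func is called."""

    @functools.wraps(func)           # preserve func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1           # extra behavior added around the call
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls                         # same as: add = count_calls(add)
def add(a, b):
    """Return the sum of a and b."""
    return a + b


result = add(2, 3)
add(10, 20)
print(result, add.calls, add.__name__)
```

Notice that add itself never changed; the counting logic lives entirely in the decorator and can be reused on any other function.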

OOP, But Make it Pythonic

Object-oriented Programming is a way of structuring your code, often using classes. In Python, this approach emphasizes simplicity and readability, enabling developers to express complex ideas with clean and concise code. For more information on OOP in Python, see this article. For more info on code-organization paradigms in Python, see this article.
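As one possible illustration of Pythonic OOP, here is a short sketch using a dataclass, a property, and a classmethod. The Pipeline class and its fields are invented for this example and are not taken from the linked articles.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """Hypothetical data pipeline that tracks how many rows it has loaded."""

    name: str
    rows_loaded: int = 0

    def load(self, rows):
        """Record a batch of rows and return self so calls can be chained."""
        self.rows_loaded += len(rows)
        return self

    @property
    def is_empty(self):
        """Read like an attribute, computed on access."""
        return self.rows_loaded == 0

    @classmethod
    def from_config(cls, config):
        """Alternative constructor that builds a Pipeline from a dict."""
        return cls(name=config["name"])


pipeline = Pipeline.from_config({"name": "orders"})
was_empty = pipeline.is_empty
pipeline.load([1, 2, 3]).load([4])
print(pipeline, was_empty)
```

The dataclass decorator generates __init__, __repr__, and __eq__ for you, which is a big part of why the class stays so short.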

PEP 8

PEP 8 is the official style guide for Python code, and trust me when I say there are some people who make what’s “Pythonic” or not their entire personality. Jokes aside, learning about code standards can help you produce more consistent, readable code. The document is a bit intimidating, so I recommend having an idea of what topics are covered, but not reading every section thoroughly. Setting up a linter can also help tremendously. Embracing PEP 8 not only enhances the clarity of your code but also fosters collaborative development and facilitates code maintenance.

TDD, But Make it Pythonic

Test-Driven Development, where you write the tests before the code that makes them pass, is an important practice to know about. In my experience, teams are organized so differently that you may not even be responsible for writing unit tests. That said, you should absolutely know how to communicate with the quality engineers who are writing them. In data engineering, tests can take many different forms. In my daily work at my last position, I ran a lot of manual tests using mock source-data in our data warehouse and loaded it into mock tables to make sure it looked as expected. Previously, I also helped set up a system to perform load testing on data pipelines. I’d like to write an article on this topic soon, so stay tuned. In short, collaboration on testing practices ensures the robustness of your code and data processes, cultivating a culture of reliability and quality in your projects.
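If you have never written a unit test in Python, here is a bare-bones sketch using the built-in unittest module. The dedupe_rows function and its tests are hypothetical; in a test-first workflow you would write the test class before the function and watch it fail first.

```python
import unittest


def dedupe_rows(rows):
    """Drop duplicate rows while keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


class DedupeRowsTest(unittest.TestCase):
    def test_removes_duplicates_and_keeps_order(self):
        rows = [("a", 1), ("b", 2), ("a", 1)]
        self.assertEqual(dedupe_rows(rows), [("a", 1), ("b", 2)])

    def test_empty_input_returns_empty_list(self):
        self.assertEqual(dedupe_rows([]), [])


# Run the tests programmatically; a test runner would normally do this for you
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DedupeRowsTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```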

Conclusion

This is a very abridged list of topics with minimal information on each. Hopefully it helps give you an idea of some topics to look into. Of course, depending on how you plan to use Python, there will be myriad other topics that you’ll want to look into. Maybe the articles I linked will even inspire you to go down a Pythonic rabbit hole.

Here are some other topics that might interest you:

  • Efficiency of different data types under the hood
  • *args and **kwargs
  • Context management
  • Metaclasses
  • Asyncio
  • Sorting and searching algorithms
  • Itertools
  • Exception Handling
  • Magic Methods
  • Closures
  • Inheritance and Encapsulation
  • Machine learning libraries
  • Data Science libraries
  • Regular Expressions

If you’re not sure how to continue from here, I highly recommend using ChatGPT to curate a list of topics to study based on your goals and building on what you already know.

Remember, the Python ecosystem is vast, and there will always be new areas to explore and master. Don’t hesitate to engage with the vibrant Python community, participate in forums or local meetups and stay curious. Continuous learning is the key to becoming a proficient developer. Let me know if you’d like me to elaborate on any of these topics or to do another article like this!
