Announcing Learn Python the Hard Way's Next Edition

Announcing the new version of Learn Python the Hard Way, which will be entirely focused on Pre-Beginner Data Science and not web development.

By Zed A. Shaw

Did you know that when you sign a contract with a publisher you have to update your books? Neither did I! I'm mostly joking, but I've had enough demands and complaints from readers of Learn Python the Hard Way that it was time for an update. The problem was that I was too deep in JavaScript land to have bandwidth for it. Then last month my publisher started bothering me for updates as well, so now I'm on the hook for a new edition.

I was reluctant to work on anything new related to Python due to its stagnation in the web development space, but a few recent events have changed my mind: Codon and the popularity of Data Science.

Codon

I'm really excited about Codon and I'll be playing with it in the near future. I have a couple of fun projects in mind that specifically leverage Codon's abilities, and I'll hopefully have a few articles about Codon in practice. Mostly I'm interested in how Codon compiles Python, and its ability to interface with C fairly easily. It also seems to be really well designed, and apparently it can embed the CPython interpreter for those cases where you absolutely have to run Python.

Here's their example showing the @python decorator embedding the Python interpreter when you need it:

@python
def scipy_eigenvalues(i: List[List[float]]) -> List[float]:
    # Code within this block is executed by the Python interpreter,
    # so it must be valid Python code.
    import scipy.linalg
    import numpy as np
    data = np.array(i)
    eigenvalues, _ = scipy.linalg.eig(data)
    return list(eigenvalues)

print(scipy_eigenvalues([[1.0, 2.0], [3.0, 4.0]]))  # [-0.372281, 5.37228]

What's amazing about this design is that it's combined with a very easy C FFI due to Codon's use of LLVM as the backend:

from C import pow(float, float) -> float
pow(2.0, 2.0)  # 4.0

# Import and rename function
# cobj is a C pointer (void*, char*, etc.)
# None can be used to represent C's void
from C import puts(cobj) -> None as print_line
print_line("hello".c_str())  # prints "hello"; c_str() converts Codon str to C string

You can even inline the LLVM IR directly in your code for the rare cases when the compiler needs a little help:

@llvm
def popcnt(n: int) -> int:
    declare i64 @llvm.ctpop.i64(i64)
    %0 = call i64 @llvm.ctpop.i64(i64 %n)
    ret i64 %0

print(popcnt(42))  # 3

I have a few projects in mind that could use this in the future, but I will need to fully review it and I do have some reservations about its license. More on that later.

Python is Data Science Now

Codon is awesome, and it's definitely getting me interested in Python again, but the real winner in the Python world is Data Science. Right now AI, Data Science, and Machine Learning are hot, and they're the primary thing Python is being used for. I think most of the students who contact me wanting to learn Python are interested in the world of Data Science and not web development or "backend" programming. I think languages like Go, Rust, and JavaScript have largely supplanted Python for general systems programming, and there's some evidence from GitHub that shows this trend.

Here's a list of the top 30 Python projects on GitHub by stars. Do you notice something?

| Name | Stars | Category |
| --- | --- | --- |
| public-apis | 241137 | Scraping |
| system-design-primer | 220942 | Systems |
| awesome-python | 169124 | Education |
| TheAlgorithms/Python | 159025 | Education |
| Python-100-Days | 136201 | Education |
| Auto-GPT | 135364 | ML/DS |
| youtube-dl | 120574 | Scraping |
| transformers | 101767 | ML/DS |
| stable-diffusion-webui | 78293 | ML/DS |
| thefuck | 77571 | Systems |
| django | 71010 | Web |
| HelloGitHub | 69243 | Education |
| pytorch | 67235 | ML/DS |
| flask | 63061 | Web |
| home-assistant/core | 60641 | Systems |
| awesome-machine-learning | 58946 | ML/DS |
| keras | 58428 | ML/DS |
| fastapi | 58363 | Web |
| ansible | 57471 | Systems |
| scikit-learn | 54360 | ML/DS |
| cpython | 53404 | Python |
| manim | 51485 | Graphing |
| funNLP | 50741 | ML/DS |
| requests | 49698 | Scraping |
| face_recognition | 48357 | ML/DS |
| yt-dlp | 47975 | Scraping |
| PayloadsAllTheThings | 47941 | Security |
| you-get | 47412 | Scraping |
| scrapy | 47303 | Scraping |
| localstack | 47235 | Systems |

WARNING! If you attempt to query GitHub's API to determine popularity yourself it will return RANDOM RESULTS. Don't trust any metrics that use this as a measure of popularity unless the request was run every 5 minutes for a few hours to get enough samples to be sure you actually got every project in the top 30. I will have a blog post on this stupidity in the near future, but the results shown here are based on repeated queries.
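One way to combine repeated queries is to union every response and re-rank the result locally. The sketch below is only an illustration of that idea and is not the script used for this post: `merge_samples` is a hypothetical helper, it does no network access, and the two "samples" are an invented split of three rows from the list above, pretending each response dropped a different project.

```python
from typing import Dict, Iterable, List, Tuple

Repo = Tuple[str, int]  # (project name, star count)


def merge_samples(samples: Iterable[List[Repo]], top_n: int = 30) -> List[Repo]:
    """Union several incomplete 'top projects' responses and re-rank by stars."""
    best: Dict[str, int] = {}
    for sample in samples:
        for name, stars in sample:
            # Keep the highest star count seen for each project.
            best[name] = max(stars, best.get(name, 0))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Hypothetical responses: each one is missing a project the other one has.
sample_a = [("public-apis", 241137), ("awesome-python", 169124)]
sample_b = [("public-apis", 241137), ("system-design-primer", 220942)]

for name, stars in merge_samples([sample_a, sample_b], top_n=3):
    print(name, stars)
```

The more samples you feed in, the less likely it is that a project is missing from every one of them, which is the point of querying repeatedly.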

If we count the projects by their categories we have the following breakdown:

| Project Type | Count |
| --- | --- |
| ML/DS | 9 |
| Scraping | 6 |
| Systems | 5 |
| Education | 4 |
| Web | 3 |
| Security | 1 |
| Graphing | 1 |
| Python | 1 |
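If you want to check the tally yourself, a few lines of Python will do it. This is a minimal sketch that assumes the category labels are copied, in order, from the Category column of the project list above. It only counts them and doesn't query anything.

```python
from collections import Counter

# Category column of the project list above, in the same order as the list.
categories = [
    "Scraping", "Systems", "Education", "Education", "Education",
    "ML/DS", "Scraping", "ML/DS", "ML/DS", "Systems",
    "Web", "Education", "ML/DS", "Web", "Systems",
    "ML/DS", "ML/DS", "Web", "Systems", "ML/DS",
    "Python", "Graphing", "ML/DS", "Scraping", "ML/DS",
    "Scraping", "Security", "Scraping", "Scraping", "Systems",
]

counts = Counter(categories)

# Print the categories from most to least common.
for category, count in counts.most_common():
    print(category, count)
```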

It's almost entirely data science projects, especially if you consider things like Graphing and Scraping as primarily used in Data Science. If you do that then 80% of the most popular projects on GitHub are related to Data Science. This fits with the wild success of Data Science, AI, and Machine Learning in the last five years, and the relative lack of innovation in Python's other use cases such as web development and systems management.

Now, if you think this isn't a fair analysis of popularity, I want to stress that everyone is also quoting this as a measure of Python's general popularity. You aren't allowed to rave about Python climbing to the top of the GitHub stars chart and then balk at the suggestion that, actually, it's Data Science that's popular. Either stars are meaningless and Python's not popular, or stars are important and Python Data Science is popular.

The Master Plan

Learn Python the Hard Way has always been focused on Pre-Beginners in that it assumes nothing and aims at building the knowledge someone needs to eventually learn the topic. My approach is not to teach someone to be a master of the subject, but to teach them all the things other writers assume "beginners" already know. If you've ever read a book that starts with print("Hello World") then jumps to "a monad is just a monoid in the category of endofunctors" then my book teaches you what that author assumes you know.

Focusing on Data Science in my style means that I won't teach you the entire world of Data Science, since that's already covered by many more qualified people than me. My goal in the new Learn Python the Hard Way is to teach you everything about Python programming that those courses assume you already know. When you're done with my book you'll have the skills you need to then understand other books.

A secondary goal in the new book is to get you familiar with the basic tools used in Data Science, like Jupyter, Pandas, Anaconda, and low level topics like data munging, testing, and graphing. I won't go extremely deep into these topics, but having a familiarity with them will make other books easier to understand.

Finally, I'm going to target the new book at a secondary audience of people who are knowledgeable about Data Science but feel their Python skills are lacking. This would be anyone who has impostor syndrome when they write Python code and who wants to feel more confident in their basic Python knowledge. I want to "upgrade" people from strictly using Jupyter to creating full Python projects with automated testing for repeatable results, in addition to giving detailed explanations of basic Python topics.

The Outline Thus Far

I've submitted the following outline to my publisher, but I'll be changing this as I work through the exercises using Jupyter. Remember that the goal of this course is not to craft a grand master of Python Data Science, but to teach a Pre-Beginner the basics of Python most other books assume you have.

First I start off with the usual first set of exercises to get people into controlling a computer with language, but I'll be using Anaconda and Jupyter exclusively to get people started.

Then I move on to simple I/O but focused on how to use Jupyter to create the files and open them. It's at this point that I'll start "weaning" people off Jupyter and start making little scripts using a simple external text editor. This will help when they want to move their work into an external project to share, or start adding more traditional Python resources such as automated testing, deployment, and package sharing.
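To give a sense of scale, a "little script" at this stage might look something like the sketch below. It is only an illustration and not an exercise from the book: the file name `notes.txt` and the helper `write_then_read` are made up, and a temporary directory is used so that running it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from typing import List


def write_then_read(path: Path, lines: List[str]) -> List[str]:
    """Write each string on its own line, then read the file back."""
    with open(path, "w", encoding="utf-8") as out:
        for line in lines:
            out.write(line + "\n")
    with open(path, encoding="utf-8") as infile:
        # Strip the newline that was added when writing.
        return [line.rstrip("\n") for line in infile]


with tempfile.TemporaryDirectory() as tmp:
    result = write_then_read(Path(tmp) / "notes.txt", ["first line", "second line"])

print(result)  # ['first line', 'second line']
```

The same code runs unchanged in a Jupyter cell or saved as a .py file, which is what makes it a reasonable bridge between the two.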

It's at this point I can start introducing simple functional programming and data structures. There are some people who hang out on Stack Overflow yelling at beginners and insisting that you should start with OOP right away, but there's a significant problem with this belief:

You can construct all of Object Oriented Programming from just functions and dicts. You can't construct functions and dicts from objects and classes without first explaining functions and dicts.

With that in mind I'll teach functions and functional programming first so that later I can show them how to build their own Object Oriented System from first principles.
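Here is a minimal sketch of what "objects from functions and dicts" can mean. It is an illustration under stated assumptions rather than the exercise from the book: `make_counter` and `Counter` are hypothetical names, and the dict-of-closures layout is just one of several ways to do it.

```python
def make_counter(start=0):
    """Build an 'object' as a dict whose 'methods' are closures."""
    state = {"count": start}  # private data, reachable only through the closures

    def increment(amount=1):
        state["count"] += amount
        return state["count"]

    def value():
        return state["count"]

    # The dict plays the role of the object; its keys are the method names.
    return {"increment": increment, "value": value}


counter = make_counter(10)
counter["increment"]()
counter["increment"](5)
print(counter["value"]())  # 16


# The same thing written with Python's built-in class syntax.
class Counter:
    def __init__(self, start=0):
        self.count = start

    def increment(self, amount=1):
        self.count += amount
        return self.count

    def value(self):
        return self.count


built_in = Counter(10)
built_in.increment()
built_in.increment(5)
print(built_in.value())  # 16
```

Notice that the first version needs nothing beyond functions and a dict, while the class version can't be explained without first knowing what a function is.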

With functions covered I can then get deeper into strings and the basics of simple data types.

After learning an introductory level of these basic data structures, and the previous information on jumps and functions, it's time to get into boolean logic, loops, and if-statements. Once again, if you know about jumps, and you know about boolean tests, then you can understand if-statements. If you understand jumps and if-statements then you can figure out basic looping. After that it's a process of combining data structures with more advanced loops like for-loops.
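The progression from tests and jumps to loops can be sketched in a few lines. Python has no goto, so this example is an approximation: it assumes `break` and the implicit return to the top of a `while True:` stand in for the jumps, and the function names are made up for illustration.

```python
def total_with_for(numbers):
    """The 'advanced' loop: Python does the test and the jump for you."""
    total = 0
    for n in numbers:
        total += n
    return total


def total_with_while(numbers):
    """The same loop spelled out as a boolean test plus explicit jumps."""
    total = 0
    index = 0
    while True:
        if index >= len(numbers):  # boolean test
            break                  # jump out of the loop
        total += numbers[index]
        index += 1
        # Reaching the end of the body jumps back to the top.
    return total


print(total_with_for([3, 4, 5]))    # 12
print(total_with_while([3, 4, 5]))  # 12
```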

It's at this point that I've taught the fundamental parts of how programming works, so everything after this is either practicing those concepts or adding on concepts that use those fundamentals.

Object Oriented Programming is an example of something that's far easier to teach once someone knows about dict and functions, so we get into this here. In the past I tried to "sneak" in an understanding of OOP with a weird method, but my JavaScript course has taught me that it's easier to teach people how to build their own basic OOP system with dict and closures, then show how that "maps" to the built-in OOP of the language.

Once they reach this point they're probably ready to move off Jupyter and learn how to create a regular Python project with automated testing. This will cover more traditional developer tools, and I might throw in an exercise that has a CLI crash course right here rather than as an appendix.
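As a rough picture of what "automated testing for repeatable results" looks like, here is a small sketch. The post doesn't say which test framework the book will use, so the standard library's `unittest` is an assumption here, and `mean`, `MeanTest`, and `run_tests` are hypothetical names. In a real project the function and its tests would normally live in separate files.

```python
import unittest


def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


class MeanTest(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_empty_list_is_an_error(self):
        with self.assertRaises(ValueError):
            mean([])


def run_tests():
    """Run the tests above and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MeanTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    run_tests()
```

The value for a Data Scientist is that the check runs the same way every time, instead of depending on which notebook cells happened to be executed and in what order.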

Finally, this is a book about getting someone ready to study other Data Science books, so I'll spend the final exercises lightly touching on various data science topics such as data munging, DataFrames, graphing, and simple analysis. I might add in a bit of SQL but I'm not sure if I could cover enough SQL in a few exercises to be useful.

That's the plan so far. If you have feedback on this list of topics based on what you do as a Data Scientist then feel free to contact me @lzsthw on Twitter. My only warning is, if you're looking to get me to teach people that one thing you found annoying at your last job or to turn them into Python true believers, then don't bother. I don't indoctrinate people. I create independent learners who question what they learn and form their own opinions.

Price Increase and Upgrades

Inflation is kicking everyone's grapes and I'm no different, so the price on the finished course will be $59 going forward. However, I will offer an upgrade price for the difference if you already bought my previous version, and I'll give a free upgrade to anyone who buys (or has bought) the current version of Learn Python the Hard Way after April 2023.

This means if you bought it on or before April 30th, 2023 you can pay $20 for an upgrade. If you bought it on or after May 1, 2023 you'll get a free upgrade. You'll also get early access to the content as I work on it and access to my Discord for help and feedback, just like with Learn JavaScript the Hard Way.


Published May 11, 2023