PyCon AU 2018 -- Day Three

Following on from the Specialist Tracks Day at PyConAU 2018 and the First Main Conference Day at PyConAU 2018 came the Second Main Conference Day, followed by the Development Sprints.

The second day started with a keynote from Tracy Osborn, and then broke out into four rooms of 30-minute talk sessions for the day, and finished up with more lightning talks.

Keynote: Tracy Osborn

Tracy Osborn (@limedaring) is someone who grew up around tech, was persuaded "tech was not for her" in college, and then returned to tech through web development. Tracy is best known in the Python community for her "easy to get started in web development" book series, Hello Web.

Tracy grew up in the mountains of Northern California, in a family that was into tech -- her grandfather worked at IBM for much of his career, and an uncle worked in tech too -- in an area that was into technology: Northern California. So she was around computers basically from birth, and around the Web as it was starting to become popular. She even made her own websites, instead of writing school reports, back when that involved writing HTML by hand -- Tracy observed this was a shortcut to a good grade as the teachers were very impressed. She even made her own fan sites, including Tiryn Forest (hosted on Angelfire, and last updated in 1999!).

So naturally when Tracy went to college at Cal Poly she chose to study Computer Science. She had been doing it all her life, so it would be easy, right? Within the first hour of the first class, an introductory Java class, she was suddenly out of her depth and thinking she had missed something (as much as Java is a common teaching language, because of late 1990s history, it is not an easy language to get started with :-( ). She struggled on with Computer Science for much of her first year in college, doing well in the courses that involved design and less well in the courses involving studying Algorithms, until eventually a professor suggested "maybe computers are not for you" -- and then she quit Computer Science, and got an art degree in Graphic Design instead.

After college she worked as a front end web designer, avoiding JavaScript due to the trauma of Java classes (JavaScript is very different from Java, but was deliberately named to seem related back in the early days of the web; how unfortunate that naming backfired).

The rest of the keynote was the story of how Tracy found her way back into technology -- and ended up writing books about programming and web development. The short version is that she moved to Silicon Valley, everyone had a startup, and so she wanted to have a startup too.

Her startup was Wedding Invite Love, which has branched out into a number of related websites. Because her attempt to find a "technical co-founder" was unsuccessful, Tracy was drawn back into web development, this time with Django. She wrote ugly Python code -- but it worked. And in the process of running a startup, and seeing other startups' code, she learnt that working code was more important than beautiful code for a startup -- "end users don't see what is inside", and you can refactor later once you have learnt more. "Learn by doing," which is pretty much the only successful way to run a startup -- and the motto of her college.

After getting burnt out on running a startup she took a break. And for her break, she wrote a book: a better tutorial for starting with Django, based around a simple customisable tutorial site, without assuming any programming background. Inspired by A Book Apart, she tried to get them to buy her book -- and then No Starch Press. But royalties are complicated and self-publishing was becoming more common, so she ran a Kickstarter campaign, publicised it at PyCon 2014 and published the book herself. The success of that book led to more Kickstarter projects for more books, all published herself. And now she has helped many users learn programming, and design, despite being told computers were not for her.

Tracy also gave some advice for others wanting to follow their own startup / book path, including:

"Keep the marketing in mind when you build a product", eg talking to everyone at PyCon 2014 about her upcoming book helped make the Kickstarter a success (and it seems many of her books have been a success due to word of mouth in the Python community).
Projects always take longer than you think; build in a bigger buffer in your timeline than you think you'll need.
Writing her book in Google Docs, using Markdown, allowed people who were concerned about how long the book was taking to come out to review the content before publication -- and that feedback helped improve the book. (She laid the book out in InDesign, due to being familiar with it from her graphic design background.)
Mathematical perfection does not mean visual perfection, for instance the perceived colour can be affected by the amount of colour present, even with exactly the same colour used; and the bottom matte on art needs to be a little wider than the rest of the edges to avoid it appearing "too thin" (when the art was hung above eye level; but it is still done for effect now).

Earlier in her keynote Tracy also referenced her PyCon 2017 keynote: Anxiety, Self-Advocacy, and Promoting Yourself (video on YouTube), which seems worth going back and watching too.

Implementing a decorator for thread synchronisation

Graham Dumpleton, author of mod_wsgi (to link Apache to Python web applications), wanted to replicate the Java synchronized feature in Python, rather than needing to use lock objects directly.

His approach was to create a synchronized decorator, which can be applied to various Python features and automagically makes them synchronised, using a lock on an appropriate thing (eg, the object for a member function). The talk described how he evolved the design to make it more flexible, including how he used the wrapt module (which he wrote; documentation) to make the decorators more context-aware.

For anyone interested in the implementation details, the presentation slides contain lots of detail on the subtle edge cases the implementation had to handle. But for anyone who just wants to use it, the wrapt module includes his final synchronized decorator, usable with:

from wrapt import synchronized

@synchronized
def function(...):
    ...

whether the function is a top level function, an instance method or a class method.
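To give a feel for the basic idea, here is a minimal sketch of a lock-per-function decorator using only the standard library. It is not how wrapt implements synchronized (wrapt chooses the lock based on context, eg the instance for a method, and handles many more edge cases). The names simple_synchronized, Counter and run_workers are my own, purely for illustration.

```python
import functools
import threading


def simple_synchronized(func):
    """Hypothetical, simplified stand-in: one re-entrant lock per decorated function."""
    lock = threading.RLock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Only one thread at a time may run the wrapped function.
        with lock:
            return func(*args, **kwargs)

    return wrapper


class Counter:
    def __init__(self):
        self.value = 0

    @simple_synchronized
    def increment(self):
        # Read-modify-write, which is unsafe without a lock.
        current = self.value
        self.value = current + 1


def run_workers(counter, n_threads=8, n_iterations=1000):
    """Hammer the counter from several threads and return the final value."""

    def work():
        for _ in range(n_iterations):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

Note that this sketch shares one lock between all instances of Counter, which is coarser than locking per object as described in the talk.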

Reflections on the Creative Process - Illustrated with Watercolour Painting

Grace Nolan, who works on IT Security at Google, and is helping organise PurpleCon, also does "wet on wet" watercolour painting as a creative outlet. She spoke, illustrated with painting intermissions, about what wet on wet watercolour painting had taught her about being creative in the context of software engineering. Grace started out with interactive art, and ended up in programming because of Kiwicon (an IT security related conference).

"Wet on wet" watercolour painting involves painting onto pre-wet paper, with the result that the colours tend to bloom a lot, and slowly blend together in whatever way they want -- it is not entirely predictable.

Parallels with technology include:

The paper type matters -- textured paper tends to absorb the paint, and a more flat paper lets it sit on top ("know your hardware").
Laying the groundwork is important. Preparing the paper for painting is a lot like writing pseudocode before writing the real code.
Watercolour painting, and programming, can be stressful.
You start with an optimistic belief that you can do it, but it does not quite work out how you hoped -- in painting and technology. (With programming there is more "reputation risk" -- an echo of Tom Eastman's keynote the day before.)
When you end up stressed by a task, especially a programming task, take time away. Eg, slowly sip a glass of water, and be present to the experience of the water.
Accept the situation: you cannot fight against what the water wants to do when painting.
The main reason she gets stressed is that she does not know what is happening. Giving up is often a short term solution. But the problem may still be there later. Getting more information (eg, looking at logs, or research) allows her to keep going, which is more in line with her values. Then she can commit to that decision.
Learning the techniques of others can help improve your craft.
Watercolour "black" is usually made by blending different complementary colours together -- and you get a different "black" depending on which ones you choose.
Watercolour gets its vibrancy from the white paper beneath; one of the worst things you can do is overpaint. Knowing when to stop is important in painting, and in programming.

Grace finished with some key takeaways:

Reflect on how you work
Self soothe
Talk to others about how you feel
Know that your community of people are willing to help and support you.

The approach she described was based in Acceptance and Commitment Therapy. Grace also credited Chantal Jodin, a French artist (Google translation), as a key inspiration for her painting, including the piece she painted during her talk.

The video of the presentation is well worth watching, for the inspiring painting while presenting.

FP Demystified

Eugene Van den Bulke was "FP Curious" -- curious about Functional Programming -- so he went to Lambda Jam and came away enthusiastic. He recommended Eugenia Cheng's keynote on Category Theory and Life, from Lambda Jam 2018; Eugenia is the author of "The Art of Logic".

Eugene's aim was to port Brian Lonsdorf's class on Functional Programming in JavaScript (featuring claymation hedgehogs teaching FP in JavaScript) to Python. I think he succeeded in porting the code, but it felt like the 25-30 minute presentation format... did not help with demystifying a large complex topic!

Because of the speed of the presentation I struggled to take notes on everything covered -- it was a whirlwind tour of category theory, with examples in Python, presented from a Jupyter notebook which he filled in as he went. I suspect even watching it again one would need to pause repeatedly to take notes!

Some (hopefully not too inaccurate!) highlights:

A Box wraps a value; and a fold can extract the value from the box and apply a function to it. For instance in Python map can apply a function over a (set of) values. A Box is a functor -- something that can be mapped over. (The Box here is a custom implementation, rather than a Python built in.)
Currying translates a function taking multiple arguments into a collection of more specialised functions each taking one argument. This allows partial specialisation or pre-binding. In Python partial specialisation can be done with:
```
from functools import partial
foo = partial(FUNCTION, ARG)
```
or via returning a closure with the first argument bound.
An Applicative Functor has more structure than a plain Functor, but less than a Monad. They allow you to use apply() as well.
A lift makes a function usable with another type (eg, a wrapped type, like Box above).
Either is a type that allows storing two types of values, by convention a left value or a right value, such as a result of a function or an exception. An Option is a special case where the right type is None (so you can have a value or nothing, making the value optional, like NULLable columns in a database). A fold on an Either takes two functions (one for the left value type and one for the right value type).
A Monad is a design pattern that allows boxing and chaining function calls together, because each function call returns the same (boxed) type. (See also Crockford's Law that when you understand a monad you lose the ability to explain it to others :-) )
A SemiGroup adds a concat method, which combines the object with another of the same type; a Monoid is a SemiGroup that also has an identity ("empty") value.
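To make a few of those terms more concrete, here is a small sketch of a Box, an Either (as Left and Right) and partial application in Python. This is my own reconstruction from the descriptions above, not the code from the talk's notebook; all of the class and function names (Box, Left, Right, add, add_ten, parse_int) are assumptions for illustration.

```python
from functools import partial


class Box:
    """A minimal functor: wraps a value and can be mapped over."""

    def __init__(self, value):
        self.value = value

    def map(self, func):
        # Apply func to the wrapped value and re-wrap the result.
        return Box(func(self.value))

    def fold(self, func):
        # Apply func and return the bare (unwrapped) result.
        return func(self.value)


class Right:
    """The 'success' side of an Either: map applies the function."""

    def __init__(self, value):
        self.value = value

    def map(self, func):
        return Right(func(self.value))

    def fold(self, on_left, on_right):
        return on_right(self.value)


class Left:
    """The 'failure' side of an Either: map is skipped."""

    def __init__(self, value):
        self.value = value

    def map(self, func):
        return self

    def fold(self, on_left, on_right):
        return on_left(self.value)


def add(x, y):
    return x + y


# Partial application: pre-bind the first argument of add().
add_ten = partial(add, 10)


def parse_int(text):
    """Return Right(int) on success, Left(exception) on failure."""
    try:
        return Right(int(text))
    except ValueError as exc:
        return Left(exc)


boxed = Box("  42 ").map(str.strip).map(int).map(add_ten).fold(lambda v: v * 2)
good = parse_int("7").map(add_ten).fold(lambda err: "error", lambda v: v)
bad = parse_int("seven").map(add_ten).fold(lambda err: "error", lambda v: v)
```

The point of the Either is that the chain of map calls does not need any error checking; the failure simply passes through until the final fold decides what to do with it.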

It was a valiant attempt, but as noted above felt fairly rushed, particularly in an area like Functional Programming which suffers a lot from "obscure" terminology (borrowed from Mathematical Category Theory -- the terms are precise, and accurate, but obscure/complex for what they represent).

Perhaps reviewing the video alongside the Jupyter notebook might help make it clearer. Or watching it alongside the original claymation hedgehogs :-)

Task Queues in Python: A Celery Story

Celery is the default Task Queue interface used with Python. It provides a message queue of tasks, and a broker that distributes tasks to workers; both pieces have multiple alternative implementations (eg, Amazon SQS). It is commonly used in Python to get message passing concurrency (because Python has relatively poor support for in-process threading, due to its internal locking).
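As a rough illustration of the queue-plus-workers model (and nothing more), here is a toy in-process version using only the standard library. It does not use Celery's API at all, and real Celery workers are normally separate processes talking to an external broker; the function name run_tasks and the task format are my own assumptions.

```python
import queue
import threading


def run_tasks(tasks, n_workers=3):
    """Run (func, args) tasks on worker threads; return results in task order."""
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            item = task_queue.get()
            if item is None:
                # Sentinel value: no more work for this worker.
                task_queue.task_done()
                break
            task_id, func, args = item
            outcome = func(*args)
            with results_lock:
                results[task_id] = outcome
            task_queue.task_done()

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in workers:
        thread.start()

    # "Publish" the tasks, then one sentinel per worker.
    for task_id, (func, args) in enumerate(tasks):
        task_queue.put((task_id, func, args))
    for _ in workers:
        task_queue.put(None)

    task_queue.join()
    for thread in workers:
        thread.join()
    return [results[i] for i in range(len(tasks))]
```

For example, run_tasks([(pow, (2, 3)), (len, ("abc",))]) hands each call to whichever worker is free next.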

While Celery is simple to start with, there are lots of features and it can quickly get complex to configure if you have a more specialised use case. Out of the box Celery is tuned for short running tasks, and is not ideal for longer running tasks (minutes/hours/days). Unfortunately there are multiple places to configure Celery, and they interact in complex ways -- the presenter found they had to disable prefetching of tasks, in multiple places, to make their workload usable. And even then sometimes jobs got stuck in the queue while they still had capacity available.

There are other task queues available for Python including:

RQ -- Redis Queue -- which uses Redis as the queue storage. It is very simple and understandable.
Huey -- another little task queue, supporting Redis and SQLite. It has nice task chaining and distributed locking.
Dramatiq, a task queue supporting RabbitMQ and Redis. The presenter noted the documentation is not as good as they had hoped.
TaskTiger, another Redis based queue, with distributed locking support and more queueing primitives.

There is also Dask, which is not a task queue, but can be used for similar things; if you are already using Pandas it might be the best option.

The presenter switched to using RQ, which they are happy with. They chose it in part because they were already using Redis for caching in their application, so it did not introduce any more dependencies. (They are also using their own serialisation; the RQ default is JSON, which has some limitations on what can be serialised.)

How To Publish A Package On PyPI

The talk was renamed "Publishing (Perfect) Python Packages on PyPi" after submitting the abstract, as the presenter liked the alliteration :-)

There were two main approaches presented, the manual approach, and an almost entirely automated approach.

The manual approach involves using setuptools:

Create a new directory, with a src subdirectory
Write your module, put it in the src directory (putting the code in src avoids accidentally running uninstalled versions).

Write setup.py in the top directory, using setuptools:

from setuptools import setup
setup(name=...,
      version=...,
      description=...,
      py_modules=[...],
      package_dir={'': 'src'})

Then run:
```
python setup.py bdist_wheel
```
to create a build folder and an egg-info folder (eggs are an older Python distribution format); the wheel itself is written to the dist folder.
Test your installation locally, in a virtualenv:
```
virtualenv venv
. venv/bin/activate
pip install -e .
```
(-e so that it imports the thing you are editing, with links, which allows you to edit and retest that it works)
Add a .gitignore file; see gitignore.io to help create a useful .gitignore template for your language.
Add trove classifiers to help make your project findable by common search terms (these go in the classifiers=[...] argument to your setup() call).
Add a license, in LICENSE.
Create documentation in RST or Markdown (RST is common in Python). Write at least a README.md. Use Sphinx or ReadTheDocs to publish the documentation.
Consider using README.md as the long description in your package metadata, by getting setup.py to read README.md in from the file; PyPI now supports Markdown syntax.
Use pipenv for testing:
```
pipenv install -e .
pipenv install --dev pytest ...
pipenv shell
```
This creates a Pipfile file, and a Pipfile.lock which records the exact versions/hashes used. (The "lock" here is locking in the versions, not a traditional unix lock flag file.)
The recommendation is to use setup.py for production dependencies, and Pipfile for development dependencies. Keep the versions in the Pipfile as relaxed as possible; the lock file will record the known good/last used ones.
Build a source distribution:
```
pip install check-manifest
check-manifest
python setup.py sdist
```
This builds a source tarball. It wants a URL and author details in the metadata, and a manifest of files to include; use MANIFEST.in to add extra files. (check-manifest will create a manifest from the files checked into git.)
Upload your package with twine:
```
pipenv install --dev twine
twine upload dist/*
```
You need to create a PyPI account first. (Do not use setup to upload; it uses an insecure upload method.)
Other things to do: test against other Python versions, eg, with tox, which creates a virtualenv for each Python version you want to test and runs your tests in that environment. Use Travis to do automated testing if your code is on GitHub, eg to automatically check pull requests.
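Pulling the setup.py related steps above together (src layout, trove classifiers, README.md as the long description), the arguments might look something like the sketch below. The project name, version, description and module name are placeholders I made up, and build_setup_kwargs is a hypothetical helper used here only so the example can be run without building anything; a real setup.py would normally just call setup() directly.

```python
from pathlib import Path


def build_setup_kwargs(project_dir):
    """Collect the keyword arguments for setuptools.setup() (illustrative values)."""
    readme = Path(project_dir) / "README.md"
    return {
        "name": "helloworld",              # placeholder project name
        "version": "0.0.1",                # placeholder version
        "description": "Say hello!",       # placeholder description
        "py_modules": ["helloworld"],      # expects src/helloworld.py
        "package_dir": {"": "src"},        # code lives under src/
        "long_description": readme.read_text(encoding="utf-8"),
        "long_description_content_type": "text/markdown",
        "classifiers": [
            "Programming Language :: Python :: 3",
            "License :: OSI Approved :: MIT License",
            "Operating System :: OS Independent",
        ],
    }


# In a real setup.py this would be followed by something like:
#     from setuptools import setup
#     setup(**build_setup_kwargs(Path(__file__).parent))
```

The long_description_content_type value is what tells PyPI to render the description as Markdown rather than RST.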

The automated approach is to use cookiecutter, which will ask you a few questions and then give you a "best practice" template directory for your project with everything ready to go:

pip install cookiecutter
cookiecutter gh:ionelmc/cookiecutter-pylibrary  # or
cookiecutter gh:audreyr/cookiecutter-pypackage

It directly grabs the template out of GitHub.

The examples used in the talk are a useful reference, in addition to the Python Packaging Authority guides.

Watch out for the Safety Bandits!

Tennessee Leeuwenburg wanted to highlight two security related tools to help make your code more secure:

safety checks your installed dependencies for packages with known issues (eg, CVEs):
```
pip install safety
safety check
```
There is an insecure-package which you can install to make sure safety is finding issues.
pyup.io can automatically alert you to issues with dependencies, for free if your package is maintained in public.
bandit looks for common programming patterns that are known to be weak. It analyses the internal structure of your code, as a tree. (Suggestion: group code with the same trust level in the same directory hierarchy, so you can focus your most paranoid scans on the code most at risk, and reduce the noise on code that only works with "trusted" input.)
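To illustrate what "analysing the code as a tree" can look like, here is a toy checker built on the standard library ast module that flags calls to eval and exec. It is only a sketch of the general technique, not how bandit itself is implemented, and the names RISKY_CALLS and find_risky_calls are my own.

```python
import ast

# Hypothetical rule set: built-in calls we want to flag.
RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source):
    """Return sorted (line number, function name) pairs for risky calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only plain calls by name, eg eval(...), not obj.eval(...).
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_CALLS
        ):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "x = eval(user_input)\nprint(x)\nexec(code)\n"
findings = find_risky_calls(sample)
```

Because the source is parsed rather than run, the undefined names in the sample (user_input, code) do not matter.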

Lightning Talks

chunksof: a generator

Tim Heap wrote a generator which will break an iterable up into chunks, because there were many attempts on StackOverflow of varying correctness and lots of other approaches were not perfect:

slice() is good for lists, but breaks on generators.
itertools.islice() is good but in some situations may not terminate (eg, if the iterator is empty; need lookahead using itertools.tee() and contextlib.suppress()). Unfortunately tee means that the iterated elements are not garbage collected until the end, so you can run out of memory.
Another alternative uses itertools.chain(), which works, and allows garbage collection of everything but the first example.
A more elaborate alternative uses a yieldone() inner generator, which allows everything to be garbage collected, but does not allow consuming the items out of order.

The "perfect" version is apparently only available as a Tim Heap gist of chunksof.py. To see the examples it appears you have to watch the video, and freeze frame, as they do not appear to be anywhere else.
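For comparison, here is a simple chunking generator of the kind discussed, built on itertools.islice(). This is my own sketch and not Tim's chunksof.py: it builds each chunk as a list (so a whole chunk is held in memory at once), and the name chunks_of is an assumption.

```python
from itertools import islice


def chunks_of(iterable, size):
    """Yield successive lists of up to `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            # The iterator is exhausted (or was empty to begin with).
            return
        yield chunk
```

Materialising each chunk sidesteps the termination problem with an empty iterator, at the cost of not being lazy within a chunk.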

PyO3

Nicholle James is making an anthology of Python 2.7 fan fiction, to be made available online for free under a CC-BY-SA license, with a print edition at PyCon 2019. Pitches were due by 1 September, with work due by around first quarter 2019; look out for an online release by about mid 2019.

Tracking trucks in East Africa (Nairobi, Kenya)

Tisham Dhar of Lori Systems helps implement a system to track trucks in Africa. "Uko Wapi?!": Where is the truck?! They have a smartphone application, but there is only about 40% penetration of smartphones amongst truckers. So they are using Traccar, a Java app, to track vehicles using vehicle GPS tracking hardware that they reverse engineered (which sends data back via an M2M SIM).

Where the tracker does not send data back, they call the driver, ask "uko wapi?", get a place name from the driver, and then use Google Maps API to locate the truck, sanity checking that the location makes sense on the route that the truck is on.

Python Sphinx and Jira

Brett Swanson, of Cochlear Ltd, works in a heavily regulated industry (hearing implants), which means they need detailed software release reports. They used to build these by hand, which was slow and error prone; now they generate tables in Sphinx (using list tables, which were easy to generate) with a Python script that can pull in the Jira issues.
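As a sketch of why list tables are easy to generate, the function below turns a list of issue records into a reStructuredText list-table directive. It does not talk to Jira at all; the issue data, the field names (key, summary, status) and the function name issues_to_list_table are all made up for illustration.

```python
def issues_to_list_table(issues, title="Release issues"):
    """Render issue dicts as an RST list-table with a header row."""
    lines = [
        f".. list-table:: {title}",
        "   :header-rows: 1",
        "",
        "   * - Key",
        "     - Summary",
        "     - Status",
    ]
    for issue in issues:
        # One table row per issue, one cell per field.
        lines.append(f"   * - {issue['key']}")
        lines.append(f"     - {issue['summary']}")
        lines.append(f"     - {issue['status']}")
    return "\n".join(lines) + "\n"


# Hypothetical data, standing in for whatever the Jira query returns.
example_issues = [
    {"key": "PROJ-1", "summary": "Fix startup crash", "status": "Done"},
    {"key": "PROJ-2", "summary": "Update user manual", "status": "In Review"},
]
table = issues_to_list_table(example_issues)
```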

String encodings and how we got in this mess

Amber Brown (hawkie) covered a whirlwind history of text encoding including:

Baudot, a 5 bit encoding used by telegraphs, with multiple variations (ITA1, Murray, Western Union, ITA2), since they mostly did not need to interoperate.
BCD, a 6-bit punch card encoding, from the 1920s
EBCDIC, an 8-bit encoding from IBM in the 1960s (part of the System/360 project).
ASCII, a 7-bit American standardised encoding, also from the 1960s

Eventually it was realised none of these were enough to hold all the world's characters, which led to:

Shift JIS, an 8-bit/16-bit variable encoding for Japanese which includes ASCII as a subset
ISO 8859-x, a series of 8-bit encodings compatible with ASCII (but incompatible with each other)
UCS-2 and UTF-16, two 16-bit encodings to handle more characters
Unicode and UTF-8; Unicode is regularly extended to add more characters

Detection of which character set is in use, without metadata saying so, is difficult, which can lead to Mojibake -- strange results from decoding in the wrong character set encoding (originally named in Japan).
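Mojibake is easy to reproduce in Python by encoding text with one encoding and decoding it with another. This small example (my own, not from the talk) encodes "café" as UTF-8 and then wrongly decodes the bytes as ISO 8859-1 (Latin-1):

```python
original = "caf\u00e9"                     # "café"
utf8_bytes = original.encode("utf-8")      # é becomes two bytes: 0xC3 0xA9

# Decoding with the wrong encoding turns each byte into its own character.
mojibake = utf8_bytes.decode("latin-1")

# If no data was lost, reversing the wrong step recovers the text.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

Here the damage is reversible because Latin-1 maps every byte to a character; with other encoding pairs information can be lost for good.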

Building 3D physics simulations

Cormac Kikkert is a student who built a physics simulation in Python, using PyGame (rather than trying to use a 3D library directly). His approach is to divide the object into lines, then move those lines in the physics simulation, and redisplay; movement is with points and a velocity vector.

Python Bugs: Pentastomida

Libby Berrie gave her first talk, at her first PyCon, about Pentastomida, an actual bug which affects actual snakes. She is a front end developer who has a degree in bioinformatics.

Pentastomida is a very old parasite (dated to around 450 million years old), which primarily infects snakes. Eventually the snakes cough up the parasites, which last for ages and are eventually eaten by the prey of snakes... and get back into snakes.

Apparently Pentastomida can also affect humans, generating flu-like symptoms and cysts in some (but most human infections are asymptomatic).

Don't do this

Lewis Bobbermen, a student who works at Polymathian, wrote a decorator that allows using square brackets instead of parentheses -- by abusing __getitem__. He also implemented Currying by overriding __or__ and __call__, and f-strings (new in Python 3.6) in Python 2 by walking back up the stack frame.

This eventually led to his flip-flop operator, the first one submitted.
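In the spirit of the talk title, here is a rough guess at how the square bracket trick could work. This is my own reconstruction, not Lewis's code, and the names square_brackets, add and double are assumptions.

```python
import functools


class square_brackets:
    """Decorator (please don't): lets you call func[args] as well as func(args)."""

    def __init__(self, func):
        self._func = func
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        # Normal calls still work.
        return self._func(*args, **kwargs)

    def __getitem__(self, key):
        # func[a, b] arrives as the tuple (a, b); func[a] arrives as just a.
        args = key if isinstance(key, tuple) else (key,)
        return self._func(*args)


@square_brackets
def add(x, y):
    return x + y


@square_brackets
def double(x):
    return 2 * x
```

One of several reasons not to do this: a single tuple argument is indistinguishable from several separate arguments, and keyword arguments are not possible at all.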

Confessions of a goto user

Alen Pulford, a student who was part of the student showcase, likes the GOTO statement. He gave examples in bash, Arduino (C), and a Python example... which reopens the source code. He refined that to a version that figures out where it is called from, and a recursive definition.

Flip Flop Face Offerator

Merrin MacLeod returned to the stage with a followup to her Saturday lightning talk on the Flip Flop operator. In the intervening 24 hours things had gotten a little out of hand, and in addition to Lewis's implementation above (the first submitted) there were several more implementations. So they had a Face Off, with two judges and voting.

You need to watch the lightning talk or look at the collected implementations, as I cannot do justice to the judges' reactions!

MicroPython and Jupyter

Andrew Leech described jupyter-micropython-remote (source) which allows connecting a Jupyter notebook to MicroPython, which is very useful for debugging. It currently requires a daily build of MicroPython (or 1.9.5 when that is released), as the communication interface is via mprepl.

Controversial PEPs (of the past)

Nick Coghlan, a Python core developer, summarised some controversial Python PEPs of the past:

PEP 227 -- statically nested scopes (eg, functions in functions), available since Python 2.1
PEP 318 -- decorators, available since Python 2.4
PEP 308 -- conditional expressions, available since Python 2.5
PEP 340 -- anonymous block statements, rejected. But he noted PEP 342 (coroutines via generators), PEP 343 (the with statement), PEP 380 (delegating to a sub-generator) and PEP 492 (async/await) all came from ideas in PEP 340.

Many of these have since become accepted core parts of Python.
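It is a little hard now to imagine Python without the three accepted features in that list. A tiny example of my own (the function names are made up) that uses all of them:

```python
import functools


def make_multiplier(factor):
    def multiply(value):
        # PEP 227: the inner function can see `factor` from the enclosing scope.
        return value * factor

    return multiply


def shout(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout  # PEP 318: decorator syntax
def describe(number):
    # PEP 308: conditional expression
    return "even" if number % 2 == 0 else "odd"


triple = make_multiplier(3)
```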

Development Sprints

I was back on Monday and Tuesday for the Development Sprints, mostly because I was staying in Sydney for another conference at the end of the week -- so it was easy to go to two days of sprints, have one day off, and then go to the second conference.

For my sprint days I mostly worked on FuPy, MicroPython on FPGA, including updating to the current upstream MicroPython version; I posted a summary of work at the sprint to the FuPy Mailing List.