Fundamental Interconnectedness

This is the occasional blog of Ewen McNeill. It is also available on LiveJournal as ewen_mcneill, and Dreamwidth as ewen_mcneill_feed.

Modern iPhones (6s or later) take "Live Photos" by default: these are video recorded 1.5 seconds either side (3 seconds total) of when you take a still photo (see also a guide to Live Photos).

I suppose if you were mainly taking photos of people acting silly for the camera then this might be a fun feature to try out a few times, before putting it aside. But if you mainly photograph other things, or take photos as a record of slides, etc, it is an actively useless waste of storage space that can make the intended photo harder to see (eg due to flickering lighting).

Imagine, not entirely hypothetically, that you finally got an iPhone new enough to have this feature, long after the feature was introduced, and had forgotten about the feature entirely. Then took a bunch of photos of very still things (slides, screens, paper, etc) to record them, ignoring the "bulls eye" icon and "live" -- perhaps assuming they were related to focus or "showing live view with effects" or something. Only to realise a couple of hundred photos later that all your recent photos were bloated with this additional baggage you did not want, and visual flicker.

Obviously the first thing you would do is turn the feature off when taking a photo (tap the bulls eye icon, so it is not yellow and "Live" does not appear in yellow). Then you realise that by default every time you go back into the Camera application it is turned back on again. So you accidentally end up taking more "Live Photos" when distracted and just picking up your phone to record something. Which leads to a mix of "Still" and "Live Photos" that are even harder to clean up.

Eventually, after some searching, you realise it is possible to change the default as well:

  1. First go to iPhone Settings -> Camera -> Preserve Settings, and enable "Preserve Settings";

  2. Then go to the Camera application and click the bulls eye icon to turn off Live Photo (making sure "Live" is no longer displayed in yellow).

After that, the iPhone Camera application will actually remember your last setting, rather than passive-aggressively turning the feature on every time. So the unwanted bloat, and difficulty viewing the still photo you were trying to take, is at least no longer getting worse.

Then secondly you would want to "clean up" these "Live Photos", making them just still photos. After some hunting I found out that since iOS 9.3 it is possible to strip the video out of Live Photos on the iPhone, and reclaim the storage, but it is a multi-step process:

  • In the iOS Photos application, select one or more Live Photos (and only live photos). Batches of 10 seem to work; large batches, or batches including non-Live photos appear not to give the right options.

  • Choose Share (bottom left) -> Duplicate -> Duplicate as Still Photos, to create a new file with just the still photo.

  • Check that the photos duplicated to still photos correctly.

  • When you are happy, go back and delete the original Live Photos to mark them for deletion in 30 days.

  • Then go to "Recently Deleted" and delete the original Live Photos again to delete them now.

You can check your progress on cleaning this up by going to the automatic "Live Photos" smart album in the Photos application and seeing what is left. If you remove all the "Live Photos" then the "Live Photos" smart album will vanish. When you are done, copy the new still photos over to your computer (and force a backup of your iPhone) so you can discard the bloated "Live Photos" versions from your computer as well. It may be necessary to restart the iPhone and/or disconnect/reconnect from Image Capture.app to stop seeing the deleted "Live Photos" versions in the image list (which makes it confusing to know whether you are copying the right version or not).

The biggest risk here is deleting a "Live Photo" without duplicating it, due to the need to select photos multiple times. I would recommend making a backup to your computer first, and cross checking against that before permanently deleting the "Live Photos" from "Recently Deleted", at least for any photos you would miss if they were gone.

Note that if you accidentally include a non-Live Photo in the selection to duplicate, you will not get the final prompt to "Duplicate as Still Photos", and it will just duplicate everything as is, making the problem worse. If that happens, delete the duplicates, and then try again being more careful in identifying the Live photos (which in the "Select" mode have no visual distinction so you just have to remember; thanks Apple).

For a couple of hundred accidental live photos, processed in batches of 10 or so, this is merely frustrating busy work, but actually possible to do. The main catch is the duplicated photos will appear at the end of the "Camera Roll" (as the files are created more recently). This makes it easier to tell which is the original "Live Photo" and which is the "Duplicate as Still Photo", but harder to move back and forth between them (or find photos by the order in which they were taken) if a lot of time has passed between when the "Live Photos" were accidentally taken and the clean up effort. Particularly if you have accidentally intermixed "Live Photos" and taken-as-intended Still Photos (as the taken-as-intended Still Photos will be left behind in the timeline, and the fixed "Live Photos" will be added to the end of the Camera Roll).

Fortunately this is a one-off cleanup issue once the iPhone camera default settings are fixed. But user hostile defaults, plus delaying finding out what magic new features mean, leads to wasting half a day cleaning up the resulting mess. Thanks Apple.

For the record, other ways that do not seem to work / work reliably for everyone:

  • Editing the photo on the phone, and unselecting the bulls eye ("live"), will stop the photo displaying as "Live", but appears not to update the photo storage -- as you can "edit" again, and turn the "Live" flag back on again. Plus this creates a second file (IMG_Ennnn.JPG) for the edited version, adding to the storage problems. Several guides do still suggest this approach though, and I suspect it might hide them from the "Live Photos" folder in the iPhone photos album.

  • Deleting the IMG_nnnn.MOV companion file via Image Capture.app does not seem to work -- it vanishes from the list, but it is unclear if it is actually removed (eg, it does not appear in the "Recently Deleted" folder for further cleanup), unlike what some people report using the Photos application on MacOS. I did not pursue this further as it was not clear if it worked, and I do not use the MacOS Photos application.

  • The Lean iOS app supposedly allows bulk changes to photos to make them "not Live". But the reviews suggest while it claims to save storage space for them it did not (I am unclear exactly what it does; if it uses the "duplicate as still photo" approach, the storage may not be reclaimed until the original Live Photos are deleted...; if it just toggles the "Live Photo" tag, then no storage will be reclaimed as best I can tell). Since it was no longer a free App ($1.99) I did not try this, having found a manual solution which was merely fiddly and time consuming.

Constantly changing bytes in JPEGs

While investigating this I came across another somewhat related frustration: when copied with Image Capture.app, the JPEGs of photos captured by default with a modern iPhone Camera application will change every time they are copied, but only in a small range of bytes (about bytes 1663-1683 from memory). This means that the photos are no longer byte for byte identical, which breaks sha1sum / sha256sum style checking for identical copies (and hard linking those together to save space).

After some digging it appears this is caused by the default setting in the modern iPhone Camera being to store images in HEIF format -- the High Efficiency Image Format, used with HEVC, High Efficiency Video Coding (H.265) (see also HEIF iPhone Photos in iOS 11). HEIF is a container format, capable of storing multiple image frames and HEVC (H.265) video. I assume this video format is also being used as part of "Live Photos", at least by default.

The result is that when you copy "JPEG" images with Image Capture.app they appear to be synthesised on demand, so the JPEG files vary slightly. In particular it appears everything is created identically except a UUID value, which appears to be recreated per copy (or at least each time you connect Image Capture.app to the phone):

ewen@ashram:~$ exiftool Desktop/IMG_5260.JPG  >/tmp/desktop
ewen@ashram:~$ exiftool Pictures/IMG_5260.JPG  >/tmp/pictures
ewen@ashram:~/Desktop/apse$ diff -u /tmp/desktop /tmp/pictures | grep "^[-+]"
--- /tmp/desktop    2018-11-24 20:50:52.000000000 +1300
+++ /tmp/pictures   2018-11-25 10:14:27.000000000 +1300
-Directory                       : Desktop
+Directory                       : Pictures
-File Access Date/Time           : 2018:11:24 20:50:33+13:00
-File Inode Change Date/Time     : 2018:11:24 20:50:28+13:00
+File Access Date/Time           : 2018:11:24 20:35:26+13:00
+File Inode Change Date/Time     : 2018:11:15 13:11:39+13:00
-Content Identifier              : 0F3C5F38-B578-4CDF-84F6-50C89A5B5C10
+Content Identifier              : 51A6B6AD-8484-465D-913C-112A4CC97931
ewen@ashram:~/Desktop/apse$ 

This does not happen when the photos are saved in JPEG directly, as was done with the iPhone Camera application prior to iOS 11. (It is an unfortunate oversight really, as it appears to me that the "Content Identifier" UUID for the JPEG could have been stashed in the HEIF file somewhere, or derived from stable values, which would have resulted in reproducible bit for bit identical exported files; I would consider that a bug, but Apple possibly do not.)
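The "derived from stable values" idea above can be sketched in a few lines of Python (purely my own illustration of the concept, not how Apple generates Content Identifiers): hash the image payload and derive a UUID from that, so repeated exports of the same photo always get the same identifier.

```python
import hashlib
import uuid

# Hypothetical namespace UUID for this illustration; any fixed value works.
CONTENT_NAMESPACE = uuid.UUID("6ba7b810-9dad-11d1-80b4-00c04fd430c8")

def stable_content_identifier(image_bytes):
    """Derive a Content Identifier deterministically from the image data,
    so two exports of the same photo get bit for bit identical metadata."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return str(uuid.uuid5(CONTENT_NAMESPACE, digest)).upper()
```

With something like this, exporting the same photo twice would yield the same Content Identifier, and hence (all else being equal) byte-identical files.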

In iOS 11 you can change the format in which the iPhone Camera application stores photos via iPhone Settings -> Camera -> Formats where:

  • "High Efficiency" means store in HEIF/HEVC format

  • "Most Compatible" means store in JPEG format

For now, I have set mine back to "Most Compatible" as I value bit for bit identical files to reduce the storage requirements on my computer.

Eventually, when HEIF files can be copied and manipulated directly -- avoiding the constant data changes of synthesising JPEGs inexactly -- the HEIF format is probably a better choice (amongst other things it can store a greater bit depth -- 10 bit currently, with up to 16 bit allowed by the format -- than JPEG, and will typically provide better compression on higher resolution photos). See also the Apple WWDC 2017 presentation on High Efficiency Image File Format, the 500px blog post on HEIF, and the Nokia Technology site on HEIF. It appears Apple are, as is often the case, at the forefront of deploying the HEIF format.

I think if you change this setting and then use "Duplicate as Still Photo" (above, to resolve the "Live Photo" mess), the resulting files are being resaved as JPEG at that point. But I have not been able to completely verify that. (Certainly the "Duplicate as Still Photo" versions are not bit for bit identical with the JPEG from the "Live Photo" by any means, after having changed this setting, but it is unclear if that is due to the "Duplicate as Still Photo" feature or changing the default storage format.)

Posted Sun Nov 25 12:08:39 2018 Tags:

Following on from the Specialist Tracks Day at PyConAU 2018 and the First Main Conference Day at PyConAU 2018, was the Second Main Conference day, and then the Development Sprints.

The second day started with a keynote from Tracy Osborn, and then broke out into four rooms of 30-minute talk sessions for the day, and finished up with more lightning talks.

Keynote: Tracy Osborn

Tracy Osborn (@limedaring) is someone who grew up around tech, was persuaded "tech was not for her" in college, and then returned to tech through web development. Tracy is best known in the Python community for her "easy to get started in web development" book series, Hello Web.

Tracy grew up in the mountains of Northern California, in a family that was into tech -- her grandfather worked at IBM for much of his career, and an uncle worked in tech too -- in an area that was into technology: Northern California. So she was around computers basically from birth, and around the Web as it was starting to become popular. She even made her own websites, instead of writing school reports, back when that involved writing HTML by hand -- Tracy observed this was a shortcut to a good grade as the teachers were very impressed. She even made her own fan sites, including Tiryn Forest (hosted on Angelfire, and last updated in 1999!).

So naturally when Tracy went to college at Cal Poly she chose to study Computer Science. She had been doing it all her life, so it would be easy, right? Within the first hour of the first class, an introductory Java class, she was suddenly out of her depth and thinking she had missed something (as much as Java is a common teaching language, because of late 1990s history, it is not an easy language to get started with :-( ). She struggled on with Computer Science for much of her first year in college, doing well in the courses that involved design and less well in the courses involving studying Algorithms, until eventually a professor suggested "maybe computers are not for you" -- and then she quit Computer Science, and got an art degree in Graphic Design instead.

After college she worked as a front end web designer, avoiding JavaScript due to the trauma of Java classes (JavaScript is very different from Java, but was deliberately named to seem related back in the early days of the web; how unfortunate that naming backfired).

The rest of the keynote was the story of how Tracy found her way back into technology -- and ended up writing books about programming and web development. The short version is that she moved to Silicon Valley, everyone had a startup, and so she wanted to have a startup too.

Her startup was Wedding Invite Love, which has branched out into a number of related websites. Because her attempt to find a "technical co-founder" was unsuccessful, Tracy was drawn back into web development, this time with Django. She wrote ugly Python code -- but it worked. And in the process of running a startup, and seeing other startups' code, she learnt that working code was more important than beautiful code for a startup -- "end users don't see what is inside", and you can refactor later once you have learnt more. "Learn by doing," which is pretty much the only successful way to run a startup -- and the motto of her college.

After getting burnt out on running a startup she took a break. And for her break, she wrote a book: a better tutorial for starting with Django, based around a simple customisable tutorial site, without assuming any programming background. Inspired by A Book Apart, she tried to get them to buy her book -- and then No Starch Press. But royalties are complicated and self-publishing was becoming more common, so she ran a Kickstarter campaign, publicised it at PyCon 2014 and published the book herself. The success of that book led to more Kickstarter projects for more books, all published herself. And now she has helped many users learn programming, and design, despite being told computers were not for her.

Tracy also gave some advice for others wanting to follow their own startup / book path, including:

  • "Keep the marketing in mind when you build a product", eg talking to everyone at PyCon 2014 about her upcoming book helped make the Kickstarter a success (and it seems many of her books have been a success due to word of mouth in Python community).

  • Projects always take longer than you think; build in a bigger buffer in your timeline than you think you'll need.

  • Writing her book in Google Docs, using Markdown, allowed people, who were concerned with how long the book was taking to come out, to review the content before publication -- and that feedback helped improve the book. (She laid the book out in InDesign, due to being familiar with it from her graphic design background.)

  • Mathematical perfection does not mean visual perfection, for instance the perceived colour can be affected by the amount of colour present, even with exactly the same colour used; and the bottom matte on art needs to be a little wider than the rest of the edges to avoid it appearing "too thin" (when the art was hung above eye level; but it is still done for effect now).

Earlier in her keynote Tracy also referenced her PyCon 2017 keynote: Anxiety, Self-Advocacy, and Promoting Yourself (video on YouTube), which seems worth going back and watching too.

Implementing a decorator for thread synchronisation

Graham Dumpleton, author of mod_wsgi (to link Apache to Python web applications), wanted to replicate the Java synchronized feature in Python, rather than needing to use lock objects directly.

His approach was to create a synchronized decorator, which can be applied to various Python features and automagically makes them synchronised, using a lock on an appropriate thing (eg, the object for a member function). The talk described how he evolved the design to make it more flexible, including how he used the wrapt module (which he wrote; documentation) to make the decorators more context-aware.

For anyone interested in the implementation details, the presentation slides contain lots of detail on the subtle edge cases the implementation had to handle. But for anyone who just wants to use it, the wrapt module includes his final synchronized decorator, usable with:

from wrapt import synchronized

@synchronized
def function(...):
    ...

whether the function is a top level function, an instance method or a class method.
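The core idea can be sketched with the standard library for the simplest case (plain functions), though this is only a toy sketch of the pattern, not Graham's wrapt implementation, which also handles instance methods, class methods, classes and use as a context manager:

```python
import functools
import threading

def synchronized(func):
    """Toy sketch: serialise all calls to func with one per-function lock.
    (wrapt's synchronized is far more general -- it picks an appropriate
    lock for methods, classes, etc.)"""
    lock = threading.RLock()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with lock:  # only one thread may run func at a time
            return func(*args, **kwargs)
    return wrapper

@synchronized
def append_item(items, value):
    items.append(value)
```

The subtle part, which the talk explores, is choosing *what* to lock on when the decorator is applied to something other than a plain function.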

Reflections on the Creative Process - Illustrated with Watercolour Painting

Grace Nolan, who works on IT Security at Google, and is helping organise PurpleCon, also does "wet on wet" watercolour painting as a creative outlet. She spoke, illustrated with painting intermissions, about what wet on wet watercolour painting had taught her about being creative in the context of software engineering. Grace started out with interactive art, and ended up in programming because of Kiwicon (an IT security related conference).

"Wet on wet" watercolour painting involves painting onto pre-wet paper, with the result that the colours tend to bloom a lot, and slowly blend together in whatever way they want -- it is not entirely predictable.

Parallels with technology include:

  • The paper type matters -- textured paper tends to absorb the paint, and a more flat paper lets it sit on top ("know your hardware").

  • Laying the groundwork is important. Preparing the paper for painting is a lot like writing pseudocode before writing the real code.

  • Watercolour painting, and programming, can be stressful

  • You start with an optimistic belief that you can do it, but it does not quite work out how you hoped -- in painting and technology. (With programming there is more "reputation risk" -- an echo of Tom Eastman's keynote the day before.)

  • When you end up stressed by a task, especially a programming task, take time away. Eg, slowly sip a glass of water, and be present to the experience of the water.

  • Accept the situation: you cannot fight against what the water wants to do when painting.

  • The main reason she gets stressed is that she does not know what is happening. Giving up is often a short term solution. But the problem may still be there later. Getting more information (eg, looking at logs, or research) allows her to keep going, which is more in line with her values. Then she can commit to that decision.

  • Learning the techniques of others can help improve your craft.

  • Water colour "black" is usually made by blending different complementary colours together -- and you get a different "black" depending on which ones you choose.

  • Water colour gets its vibrancy from the white paper beneath; one of the worst things you can do is overpaint. Knowing when to stop is important in painting, and in programming.

Grace finished with some key takeaways:

  • Reflect on how you work

  • Self soothe

  • Talk to others about how you feel

  • Know that your community of people are willing to help and support you.

The approach she described was based in Acceptance and Commitment Therapy. Grace also credited Chantal Jodin, a French artist (Google translation), as a key inspiration to her painting, including the piece she painted during her talk.

The video of the presentation is well worth watching, for the inspiring painting while presenting.

FP Demystified

Eugene Van den Bulke was "FP Curious" -- curious about Functional Programming -- so he went to Lambda Jam and came away enthusiastic. He recommended Eugenia Cheng's keynote on Category Theory and Life, from Lambda Jam 2018; Eugenia is the author of "The Art of Logic".

Eugene's aim was to port Brian Lonsdorf's class on Functional Programming in Javascript (featuring claymation hedgehogs teaching FP in Javascript) to Python. I think he succeeded in porting the code, but it felt like the 25-30 minute presentation format... did not help with demystifying a large complex topic!

Because of the speed of the presentation I struggled to take notes on everything covered -- it was a whirlwind tour of category theory, with examples in Python, presented from a Jupyter notebook which he filled in as he went. I suspect even watching it again one would need to pause repeatedly to take notes!

Some (hopefully not too inaccurate!) highlights:

  • A Box wraps a value; and a fold can extract the value from the box and apply a function to it. For instance in Python map can apply a function over a (set of) values. A Box is a functor -- something that can be mapped over. (The Box here is a custom implementation, rather than a Python built in.)

  • Currying translates a function taking multiple arguments into a collection of more specialised functions each taking one argument. This allows partial specialisation or pre-binding. In Python partial specialisation can be done with:

    from functools import partial
    foo = partial(FUNCTION, ARG)
    

    or via returning a closure with the first argument bound.

  • An Applicative Functor has more structure than a plain Functor, but less than a Monad. They allow you to use apply() as well.

  • A lift makes a function usable with another type (eg, a wrapped type, like Box above).

  • Either is a type that allows storing two types of values, by convention a left value or a right value, such as a result of a function or an exception. An Option is a special case where the right type is None (so you can have a value or nothing, making the value optional, like NULLable columns in a database). A fold on an Either takes two functions (one for the left value type and one for the right value type).

  • A Monad is a design pattern that allows boxing and chaining function calls together, because each function call returns the same (boxed) type. (See also Crockford's Law that when you understand a monad you lose the ability to explain it to others :-) )

  • A SemiGroup adds a concat method that combines two objects of the same type; a Monoid additionally has an identity ("empty") element for that combining function.
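The Box / fold idea from the first bullet can be sketched in a few lines of Python (my own illustration, not code from Eugene's notebook):

```python
class Box:
    """Minimal functor sketch: wraps a value and maps functions over it."""

    def __init__(self, value):
        self.value = value

    def map(self, fn):
        # Apply fn *inside* the box, returning a new Box (stays chainable).
        return Box(fn(self.value))

    def fold(self, fn):
        # Apply fn and take the result *out* of the box.
        return fn(self.value)

# Chain transformations inside the box, then fold the result out:
result = Box(" 64 ").map(str.strip).map(int).fold(lambda n: n + 1)
```

Here `result` is 65: each `map` stays inside the Box, and the final `fold` unwraps it.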

It was a valiant attempt, but as noted above felt fairly rushed, particularly in an area like Functional Programming which suffers a lot from "obscure" terminology (borrowed from Mathematical Category Theory -- the terms are precise, and accurate, but obscure/complex for what they represent).

Perhaps reviewing the video along side the Jupyter notebook might help make it clearer. Or watching it along side the original claymation hedgehogs :-)

Task Queues in Python: A Celery Story

Celery is the default Task Queue interface used with Python. It provides a message queue of tasks, and a broker that distributes tasks to workers, both pieces of which have multiple alternative implementations (eg, Amazon SQS). It is commonly used in Python to get message passing concurrency (because Python has relatively poor support for in-process threading, due to its internal locking).

While Celery is simple to start with, there are lots of features and it can quickly get complex to configure if you have a more specialised use case. Out of the box Celery is tuned for short running tasks, and is not ideal for longer running tasks (minutes/hours/days). Unfortunately there are multiple places to configure Celery, and they interact in complex ways -- the presenter found they had to disable prefetching of tasks, in multiple places, to make their workload usable. And even then sometimes jobs got stuck in the queue while they still had capacity available.

There are other task queues available for Python including:

  • RQ -- Redis Queue -- which uses Redis as the queue storage. It is very simple and understandable.

  • Huey -- another little task queue, supporting Redis and SQLite. It has nice task chaining and distributed locking.

  • Dramatiq, a task queue supporting RabbitMQ and Redis. The presenter noted the documentation is not as good as they had hoped.

  • TaskTiger, another Redis based queue, with distributed locking support and more queueing primitives.

There is also Dask, which is not a task queue, but can be used for similar things; if you are already using Pandas it might be the best option.

The presenter switched to using RQ, which they are happy with. They chose it in part because they were already using Redis for caching in their application, so it did not introduce any more dependencies. (They are also using their own serialisation; the RQ default is JSON, which has some limitations on what can be serialised.)
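The queue-plus-workers pattern that Celery and RQ implement between processes can be sketched in-process with the standard library (a toy illustration only; real task queues add persistence, retries, result backends and serialisation on top):

```python
import queue
import threading

def worker(task_queue, results):
    """Pull (function, args) tasks off the queue until a None sentinel."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel: shut this worker down
            task_queue.task_done()
            return
        func, args = task
        results.append(func(*args))
        task_queue.task_done()

task_queue = queue.Queue()
results = []

# Start two workers, enqueue some tasks, then send one sentinel per worker.
workers = [threading.Thread(target=worker, args=(task_queue, results))
           for _ in range(2)]
for w in workers:
    w.start()
for n in range(5):
    task_queue.put((pow, (n, 2)))
for _ in workers:
    task_queue.put(None)
for w in workers:
    w.join()
```

The broker (here just `queue.Queue`) decouples producers from workers; with Celery or RQ the same shape spans processes and machines via Redis, RabbitMQ, SQS, etc.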

How To Publish A Package On PyPI

The talk was renamed "Publishing (Perfect) Python Packages on PyPi" after submitting the abstract, as the presenter liked the alliteration :-)

There were two main approaches presented, the manual approach, and an almost entirely automated approach.

The manual approach involves using setuptools:

  • Create a new directory, with a src subdirectory

  • Write your module, put it in the src directory (putting the code in src avoids accidentally running uninstalled versions).

  • Write setup.py in the top directory, using setuptools:

    from setuptools import setup
    setup(name=...,
          version=....,
          description=....,
          py_modules=[...],
          package_dir={'': 'src'})
    
  • Then run:

    python setup.py bdist_wheel
    

    to create a build folder, and an egg-info folder (eggs are tied to a specific Python version release)

  • Test your installation locally, in a virtualenv:

    virtualenv venv
    . venv/bin/activate
    pip install -e .
    

    (-e so that it imports the thing you are editing, with links, which allows you to edit and retest that it works)

  • Add a .gitignore file; see gitignore.io to help create a useful .gitignore template for your language.

  • Add trove classifiers to help make your project findable by common search terms; these go in the classifiers=[...] argument to your setup() call.

  • Add a license, in LICENSE.

  • Create documentation in RST or Markdown (RST is common in Python). Write at least a README.md. Use Sphinx or ReadTheDocs to publish the documentation.

  • Consider using the README.md as your long description in the documentation, getting your setup.py to read in README.md from a file; PyPI now supports Markdown syntax.

  • Use pipenv for testing:

    pipenv install -e .
    pipenv install --dev pytest ...
    pipenv shell
    

    This creates a Pipfile file, and a Pipfile.lock which records the exact versions/hashes used. (The "lock" here is locking in the versions, not a traditional unix lock flag file.)

  • Recommendation is to use setup.py for production dependencies, and Pipfile for development dependencies. Keep the versions in the Pipfile as relaxed as possible; the lock file will record the known good/last used ones.

  • Build a source distribution:

    pip install check-manifest
    check-manifest
    python setup.py sdist
    

    This builds a source tarball. It wants a URL and author details in the metadata, and a manifest of files to include; use MANIFEST.in to add extra files. (check-manifest will create a manifest from the files checked into git.)

  • Upload your package with twine:

    pip install twine
    twine upload dist/*
    

    You need to create a PyPI account first. (Do not use python setup.py upload; it uses an insecure upload method.)

  • Other things to do: test against other Python versions, eg, with tox, which creates a virtualenv for each Python version you want to test and runs your tests in that environment. Use Travis to do automated testing if your code is on GitHub, eg to automatically check pull requests.

The automated approach is to use cookiecutter, which will ask you a few questions and then give you a "best practice" template directory for your project with everything ready to go:

pip install cookiecutter
cookiecutter gh:audreyr/cookiecutter-pylibrary  # or
cookiecutter gh:audreyr/cookiecutter-pypackage

It directly grabs the template out of GitHub.

The examples used in the talk are a useful reference, in addition to the Python Packaging Authority guides.

Watch out for the Safety Bandits!

Tennessee Leeuwenburg wanted to highlight two security related tools to help make your code more secure:

  • safety checks your installed dependencies for packages with known issues (eg, CVEs):

    pip install safety
    safety check
    

    There is an insecure-package which you can install to make sure safety is finding issues.

  • pyup.io can automatically alert you to issues with dependencies, for free if your package is maintained in public.

  • bandit looks for common programming patterns that are known to be weak. It analyses the internal structure of your code, as a tree. (Suggestion: group code with the same trust level in the same directory hierarchy, so you can focus your most paranoid scans on the code most at risk, and reduce the noise on code that only works with "trusted" input.)

Lightning Talks

chunksof: a generator

Tim Heap wrote a generator which will break an iterable up into chunks, because there were many attempts on StackOverflow of varying correctness and lots of other approaches were not perfect:

  • slice() is good for lists, but breaks on generators.

  • itertools.islice() is good but in some situations may not terminate (eg, if the iterator is empty; it needs lookahead using itertools.tee() and contextlib.suppress()). Unfortunately tee means that the iterated elements are not garbage collected until the end, so you can run out of memory.

  • Another alternative uses itertools.chain(), which works, and allows garbage collection of everything but the first example.

  • A more elaborate alternative uses a yieldone() inner generator, which allows everything to be garbage collected, but does not allow consuming the items out of order.

The "perfect" version is apparently only available as a Tim Heap gist of chunksof.py. To see the examples it appears you have to watch the video, and freeze frame, as they do not appear to be anywhere else.
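For reference, the basic islice-based idea can be sketched like this (my own simplified version, which yields lists rather than sub-generators -- that sidesteps the termination and garbage-collection subtleties the talk covers, and is not Tim Heap's gist code):

```python
from itertools import islice

def chunksof(n, iterable):
    """Yield lists of up to n items; the final chunk may be shorter."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:  # iterator exhausted: terminate cleanly
            return
        yield chunk
```

Because each chunk is materialised as a list, this version terminates on empty input without needing tee() lookahead, at the cost of not being fully lazy within a chunk.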

PyO3

Nicholle James is making an anthology of Python 2.7 fan fiction, to be made online for free under a CC-BY-SA license, with print edition at PyCon 2019. Pitches were due by 1 September, with work due by around first quarter 2019; look out for an online release by about mid 2019.

Tracking trucks in East Africa (Nairobi, Kenya)

Tisham Dhar of Lori Systems helps implement a system to track trucks in Africa. "Uko Wapi?!": Where is the truck?! They have a smartphone application, but there is only about 40% penetration of smartphones amongst truckers. So they are using Traccar, a Java app, to track vehicles using vehicle GPS tracking hardware that they reverse engineered (which sends data back via an M2M SIM).

Where the tracker does not send data back, they call the driver, ask "uko wapi?", get a place name from the driver, and then use Google Maps API to locate the truck, sanity checking that the location makes sense on the route that the truck is on.

Python Sphinx and Jira

Brett Swanson, of Cochlear Ltd, works in a heavily regulated industry (hearing implants), which means they need detailed software release reports. They used to build these by hand, which was slow and error prone; now they generate tables in Sphinx (using list tables, which are easy to generate) with a Python script that pulls in the Jira issues.
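
Their script was not published, but emitting a Sphinx list table from rows of data is simple to script; a minimal sketch (the function name and layout are my own):

```python
def list_table(title, headers, rows):
    """Emit a Sphinx list-table directive from a header row plus data rows."""
    lines = [f".. list-table:: {title}", "   :header-rows: 1", ""]
    for row in [headers] + rows:
        # Each row starts with "* -" and continues with "  -" cells
        lines.append(f"   * - {row[0]}")
        lines.extend(f"     - {cell}" for cell in row[1:])
    return "\n".join(lines)

table = list_table("Issues", ["Key", "Summary"], [["J-1", "Fix bug"]])
```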

String encodings and how we got in this mess

Amber Brown (hawkie) covered a whirlwind history of text encoding including:

  • Baudot, a 5 bit encoding used by telegraphs, with multiple variations (ITA1, Murray, Western Union, ITA2), since they mostly did not need to interoperate.

  • BCD, a 6-bit punch card encoding, from the 1920s

  • EBCDIC, an 8-bit encoding from IBM in the 1960s (part of the System/360 project).

  • ASCII, a 7-bit American standardised encoding, also from the 1960s

Eventually it was realised that none of these were enough to hold all the world's characters, which led to:

  • Shift JIS, an 8-bit/16-bit variable encoding for Japanese which includes ASCII as a subset

  • ISO 8859-x, a series of 8-bit encodings compatible with ASCII (but incompatible with each other)

  • UCS-2 and UTF-16, two 16-bit encodings to handle more characters

  • Unicode and UTF-8; Unicode is regularly extended to add more characters

Detection of which character set is in use, without metadata saying so, is difficult, which can lead to Mojibake -- strange results from decoding in the wrong character set encoding (originally named in Japan).
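
Python makes mojibake easy to demonstrate:

```python
# Bytes encoded in one character set, decoded as another, give mojibake.
data = "café".encode("utf-8")   # b'caf\xc3\xa9'
wrong = data.decode("latin-1")  # 'cafÃ©' -- the classic mojibake
right = data.decode("utf-8")    # 'café' -- correct with the right encoding
```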

Building 3D physics simulations

Cormac Kikkert is a student who built a physics simulation in Python, using PyGame (rather than trying to use a 3D library directly). His approach is to divide the object into lines, then move those lines in the physics simulation, and redisplay; movement is with points and a velocity vector.

Python Bugs: Pentastomida

Libby Berrie gave her first talk, at her first PyCon, about Pentastomida, an actual bug which affects actual snakes. She is a front end developer who has a degree in bioinformatics.

Pentastomida is a very old parasite (dated to around 450 million years old), which primarily infects snakes. Eventually the snakes cough up the parasites, which last for ages and are eventually eaten by the prey of snakes... and get back into snakes.

Apparently Pentastomida can also affect humans, generating flu-like symptoms and cysts in some (but most human infections are asymptomatic).

Don't do this

Lewis Bobbermen, a student who works at Polymathian, wrote a decorator that allows using square brackets instead of parentheses -- by abusing __getitem__. He also implemented currying by overriding __or__ and __call__, and f-strings (new in Python 3.6) in Python 2 by walking back up the stack frame.

This eventually led to his flip-flop operator, the first one submitted.
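
His code was not published here, but the __getitem__ abuse can be sketched like this (my reconstruction; genuinely, don't do this):

```python
class Bracketed:
    """Wrap a function so obj[x] calls it -- square brackets as a call."""

    def __init__(self, func):
        self.func = func

    def __getitem__(self, arg):
        # Indexing syntax is routed to the wrapped function
        return self.func(arg)

@Bracketed
def double(x):
    return x * 2

result = double[21]  # calls double(21) via __getitem__
```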

Confessions of a goto user

Alen Pulford, a student who was part of the student showcase (https://2018.pycon-au.org/education-showcase), likes the GOTO statement. He gave examples in bash, Arduino (C), and a Python example... which reopens the source code. He refined that to a version that figures out where it is called from, and a recursive definition.

Flip Flop Face Offerator

Merrin MacLeod returned to the stage with a followup to her Saturday lightning talk on the Flip Flop operator. In the intervening 24 hours things had gotten a little out of hand, and in addition to Lewis's implementation above (the first submitted) there were several more implementations. So they had a Face Off, with two judges and voting.

You need to watch the lightning talk or look at the collected implementations, as I cannot do justice to the judges reactions!

MicroPython and Jupyter

Andrew Leech described jupyter-micropython-remote (source) which allows connecting a Jupyter notebook to MicroPython, which is very useful for debugging. It currently requires a daily build of MicroPython (or 1.9.5 when that is released), as the communication interface is via mprepl.

Controversial PEPs (of the past)

Nick Coghlan, a Python core developer, summarised some controversial Python PEPs of the past:

  • PEP 227 -- statically nested scopes (eg, functions in functions), available since Python 2.1

  • PEP 318 -- decorators, available since Python 2.4

  • PEP 308 -- conditional expressions, available since Python 2.4

  • PEP 340 -- anonymous block statements, rejected. But he noted PEP 342 (coroutines via generators), PEP 343 (the with statement), PEP 380 (delegating to a sub-generator) and PEP 492 (async/await) all came from ideas in PEP 340.

Many of these ideas have since become accepted core parts of Python.

Development Sprints

I was back on Monday and Tuesday for the Development Sprints, mostly because I was staying in Sydney for another conference at the end of the week -- so it was easy to go to two days of sprints, have one day off, and then go to the second conference.

For my sprint days I mostly worked on FuPy, MicroPython on FPGA, including updating to the current upstream MicroPython version; I posted a summary of work at the sprint to the FuPy Mailing List.

Posted Sun Sep 16 21:56:20 2018 Tags:

Following on from the Specialist Tracks Day at PyConAU 2018 were two full days of "main conference" programme. The first day had two invited speakers (morning and afternoon), as well as four rooms of 30-minute talk sessions and lightning talks.

Keynote: Annie Parker

Annie Parker is the founder of Techfugees Australia, a tech community response to the refugee crisis in various countries. They have run several meetup and Hackathon events around Australia, bringing together enthusiastic tech people, refugees and organisations working with recent arrivals to the country.

From these Hackathons have come several refugee focused startups, and other tech projects, including a job matchup site (many refugees are highly skilled, but their qualifications are not always recognised), and a website to help refugees find out about resources available to help them.

Annie brought along Shaqeaq, a refugee from Afghanistan/Iran, who came along to one of the Sydney Hackathons to talk about the issues refugees face, got enthusiastic about solving the problem and stayed to hack, and then ended up founding her own refugee-focused startup -- while she is still in high school.

I would highly recommend the video of this keynote if you want to be inspired and have not seen the talk already.

Describing Descriptors

The Python Descriptor Protocol is a way of binding behaviour to (class) attributes, so that you can override getters and setters. This facility exists in a lot of programming languages. Python has a few facilities for implementing overrides, including implementing the __setattr__(self, attribute, value) method on a class, and the @property decorator, but the Descriptor Protocol is the most flexible approach.

The core of the Descriptor Protocol is four methods:

  • __get__(self, instance, owner)

  • __set__(self, instance, value)

  • __delete__(self, instance)

  • __set_name__(self, owner, name) -- since 3.6, called when the property is created on the class (can be used, eg, to capture the name of the attribute to use later in error messages)

These methods override the behaviour of retrieval, storage, and deletion of the attribute.

There are two major categories of Descriptors:

  • Data Descriptors: implement __get__() and __set__()

  • Non-Data Descriptors: implement __get__() but not __set__()

Typically non-Data Descriptors would allow data retrieval, as if it was a property, with the results being calculated or retrieved from somewhere else. They can be used for @staticmethod type overrides, etc.

Data Descriptors will take precedence over the class's internal dictionary lookups; but non-Data Descriptors will only be used if there is no other definition of the attributes name in the class.

The Python WeakKeyDictionary can be useful as a storage location for data associated with the attribute as the WeakKeyDictionary itself will not keep the data around; but the reference within the class to the attribute will keep it around (so when the object with the attribute goes away, the entry in the WeakKeyDictionary can also go away).

The talk gives some examples for how these can be used together.
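
Putting the pieces above together, a minimal data descriptor might look like this (my own illustration, not code from the talk):

```python
from weakref import WeakKeyDictionary

class Positive:
    """A data descriptor that only accepts positive values."""

    def __init__(self):
        # Per-instance storage that does not keep instances alive
        self._values = WeakKeyDictionary()

    def __set_name__(self, owner, name):
        # Capture the attribute name for use in error messages (3.6+)
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # attribute accessed on the class itself
        return self._values[instance]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        self._values[instance] = value

class Account:
    balance = Positive()

a = Account()
a.balance = 10
```

When the Account instance is garbage collected, its entry in the WeakKeyDictionary goes away with it, as described above.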

End to End Energy Monitoring in Python

Tisham Dhar (@whatnick) described an energy metering platform using ESP8266/ESP32 (16MB of flash or more) and MicroPython to capture energy usage values and stream them out to a data collection platform for graphing (with Graphite). It is based around the MicroChip ATM90E26 measurement chip, with the CrowdSupply campaign for his monitoring kit launched during PyConAU 2017 and discussed in the talk by Joel Stanley presented at last year's PyCon AU (talk video). (Tisham originally started on Arduino, but outgrew the platform fairly quickly, which motivated the move to MicroPython -- helped by work by Joel Stanley getting MicroPython going on the board.)

To make the results available, the PicoWeb web server is run on the microcontroller (ESP32 is best, due to RAM requirements), and vue.js is used for the display. The data is also streamed out over raw TCP/IP sockets for management.

This solution can be used both for measuring consumption (eg, per device or per outlet), and for measuring energy generation (eg, solar power). It can monitor voltage, current, power usage, and power factor (basically the phase shift between the current and voltage sine waves).

The slides have lots of interesting detail, as well as some discussion of NILM -- Non Invasive Load Monitoring and some other power usage meters.

The Case of the Mysteriously High System Load

"Debugging is like being the detective in a crime movie where you are also the murderer." -- Filipe Fortes

Our case is set amongst the Geri embryo development machine, and its companion Geri Connect, which allows remote monitoring of the embryos. The machine takes time lapse photos, every 30 minutes, at 11 focus depths, of 6 chambers with up to 16 embryos from the same patient in each. These time lapse photos are then turned into short "development movies" for trained staff to review to check on the development progress, and identify the most viable embryo.

They started to run into performance issues, experiencing load averages of 9 to 30 on a 6 core system -- resulting in it getting further behind. Initially they had no metrics being recorded, and resorted to "hand parsing" the logs to get some metrics on how long steps were taking. Then they started adding observability by using a Python statsd library, from Etsy, to send out statistics from their application at key points, which gave them more visibility. They also used pg-statsd to get statistics out of PostgreSQL.

Sending metrics from Python can be as simple as:

from statsd import StatsClient
...
sclient = StatsClient(HOST, PORT)  # optionally prefix=...
sclient.timing('tag', TIME)        # submit timing result, eg ms
sclient.incr('tag')                # increment counter

As they investigated their metrics showed two main issues:

  • database updates were the slowest part -- because they were making a lot of attempts at updating something which actually could only be updated 1/176 times (once all videos of all focus stacks of all slots in all chambers had been processed). So they moved that to a background thread to happen periodically, rather than every single time a video was made.

  • Celery did not respond well to oversubscribing the CPU cores -- they got much better results by slightly undersubscribing the CPU cores (n-1), leading to a huge decrease in system load.

The presentation was very entertaining, and viewing the video is highly recommended.

Creepy, frivolous and beautiful art made with machines

J Rosenbaum presented a fabulous talk on the fantastically weird generated art coming out of machine learning and other algorithms, particularly when someone feeds them a very selective world view. You need to see their slides or view their presentation to really appreciate it -- it was an awesome presentation. (A content warning note -- there is some artistic nudity in the slides so you may not want to view this at work; and some of the machine learning generated art is very weird, and sometimes rather creepy.)

J's website has more information on their process, and more examples (although note that too contains artistic nudity including on the front page). Their work at the intersection of art, machine learning, and gender seems well worth following.

Context Managers: You can write your own

Daniel Porteous described how you can write your own Python Context Manager, and how to save yourself some effort by using the tools in contextlib.

Context managers are invoked by the Python "with" statement, and can be implemented by hand with a class that has the methods:

  • __enter__(self)

  • __exit__(self, type, value, traceback) -- the extra arguments are for exception handling, to give the context manager a chance to alter what happens when an exception is thrown within the context managed block.

The class can also optionally have an __init__ method, but Daniel cautioned against doing anything "expensive" (time/space intensive) in the __init__ method, as it may be done and never used.

contextlib makes this process simpler, with various decorators, including @contextmanager, which allows implementing your context manager as a simple function which has a yield at the point where caller code should run. It also has other helper methods that make implementation simpler.
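
A minimal sketch of both styles (my own example, not code from the talk):

```python
import time
from contextlib import contextmanager

class Timer:
    """A hand-written context manager that times the managed block."""

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # returning False lets any exception propagate

@contextmanager
def timer(results):
    # The contextlib equivalent: the caller's block runs at the yield
    start = time.perf_counter()
    try:
        yield
    finally:
        results.append(time.perf_counter() - start)

with Timer() as t:
    sum(range(1000))
```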

Refactoring Code with the Standard Library

John Reese described how Facebook have managed to repurpose lib2to3, originally intended to help port Python 2 code to Python 3, to do other automatic source refactoring (eg, adding or removing function arguments, inserting calls to UI translation routines, etc) in a safe manner. Because this refactoring happens in a parsed source tree, rather than on raw strings with a regex, it has more context to work with and is less at risk of changing the wrong thing or making changes in unexpected ways. And because lib2to3 has been maintained with all the language updates in Python 2 and Python 3, it does a much better job of parsing Python than trying to write a separate parser yourself.

Overall it looks like a good solution if you need to make changes to a large Python code base in a safe manner.
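
lib2to3's fixer API is more involved than fits here, but the core idea -- rewriting a parsed tree rather than pattern matching on strings -- can be illustrated with the standard library's ast module (ast.unparse needs Python 3.9+). This is a stand-in of my own, not Facebook's actual tooling:

```python
import ast

class RenameCalls(ast.NodeTransformer):
    """Rename calls to old_name into new_name, working on the parsed tree."""

    def visit_Call(self, node):
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id == "old_name":
            # Only call sites are renamed; a string "old_name" is untouched
            node.func.id = "new_name"
        return node

source = "result = old_name(1) + old_name(2)"
tree = RenameCalls().visit(ast.parse(source))
new_source = ast.unparse(tree)
```

A regex could not easily make the same call-site-only distinction.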

Keynote: Tom Eastman

Tom Eastman, a PyCon AU (and PyCon NZ) regular was invited to give the Saturday afternoon Keynote -- and took the role very seriously, including wearing a suit jacket to present, and delivering a well rehearsed presentation.

Tom's theme was learning -- really learning, rather than just "learning light" where you put yourself in an environment where maybe learning could happen but do not actually do the work.

Like most of us who were gifted as children, Tom found himself assuming that things were either easy, or you could not do them. That "effort is what you need when you're not talented". It was only much later that he came across Carol Dweck and her descriptions of the "fixed mindset", and recognised those beliefs in himself.

The contrast to a "fixed mindset" is a "growth mindset" -- believe that you can change with effort. The growth mindset allows for the opportunity to improve -- and expects there to be effort involved so does not give up when it is "hard", has a thirst for the challenge, and sees failures as minor setbacks. The right mindset can have a big difference in what you do.

Tom illustrated the difference by talking about the difference between his mindset in areas where he perceived himself to be experienced (eg, Python programming), and those where he perceived himself to be a beginner (eg, Information Security). In the areas he believed himself to be experienced, and he perceived others as thinking he was experienced, he felt his reputation was at stake -- and he always wanted to be his best. So it was either easy or "he did not have talent"; a "fixed mindset". In areas where he believed himself to be a beginner he was quite happy "not knowing" and making mistakes, messing up, because he was "just a beginner" (and if you listen to any of Tom's many talks about information security over the last 5 years or so you'll hear him start by saying he is not an expert... giving himself permission not to be perfect).

Real learning happens best with "effortful retrieval", particularly using your own words (eg, teaching others, or writing your own flash cards). Otherwise the learning is "like a backup system you've never run a test restore on" (ie, no idea if there's anything "stored"). He recommended a Pomodoro App for focused time management (25 minute chunks, with 5 minutes break). And giving yourself permission to be a beginner -- to make mistakes, to not know it all immediately. Find a place to be "strategically dumb". Accepting that it might be difficult at first. "If you're feeling comfortable maybe you're not actually learning."

In the course of his keynote, Tom also happened to mention cram learning Ruby, for a project, and coming across a weird operator... which he pointed out was very weird. That small side note, a lightning talk by someone Tom asked about the operator, and a subtle challenge on Twitter ended up creating a meme for PyConAU 2018 -- a Flip Floperator, about which there is more on day three of the conference.

Lightning Talks

Bex Dunn, Landscape Scientist at Digital Earth Australia

They have open satellite data, going back several years, of Australia, which gives them coverage over both space and time. By rendering that into an animated image, they can have a satellite image that changes over time as water flows into, eg, the Murray Darling basin and back out again.

The code is apparently somewhere on the GeoScience Australia GitHub page.

Tim Ansell, FuPy

Tim is not actually a robot -- he's actually all human. A constant bug generator. So he likes software. You can fix bugs in software after you write them. Whereas hardware is hard to patch. FPGAs make hardware into software.

This was all a pitch to get people to come hack on FuPy at the Sprints, and get hardware.

Merrin MacLeod, The Flip Floperator

Apparently Tom Eastman challenged her to talk about the Ruby "Flip Flop" operator because she is a Ruby programmer, so this followed on neatly from Tom's mention in passing in his keynote (immediately before the lightning talks). The ".." operator in Ruby (and Perl) flips on when the first expression is true, then stays on until the second expression is true, acting like a generalised range expression.

It turns out to be little used in Ruby, and is likely to be removed (or at least deprecated) around Ruby 2.6.
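
As a rough Python sketch of those semantics (my own approximation, not one of the competition entries):

```python
def flip_flop(items, start, stop):
    """Yield items from one satisfying `start` through one satisfying
    `stop`, inclusive -- roughly Ruby's `..` flip-flop as a generator."""
    on = False
    for item in items:
        if not on and start(item):
            on = True  # the first expression flips the state on
        if on:
            yield item
            if stop(item):
                on = False  # the second expression flips it off again
```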

Adam Jacquier-Parr, DRF Model Pusher

DRF Model Pusher is a REST framework to send real time updates out using Pusher. It is implemented for Django (as a Model), and allows sending up to 10kB messages through public and private channels (optionally with a background worker).

The package is on GitHub.

Claire Krause, GeoScience Australia

She works with satellite imagery, studying agriculture from space. One of the things they can do is monitor when storage dams are being filled (with water) and emptied. This allows them to detect whether the water use rules are being obeyed. Previously this was hand digitised information, but now they have an algorithm which can detect water and identify dams. (It creates a raster data set, which they then automatically trace to build a vector dataset of geospatial coordinates.)

They are in the process of moving to a dataset updated every 5 days, which will let them better watch what happens in near real time.

This appears to be part of Digital Earth Australia, who have an Open Data Cube, including an example of doing water detection.

Felicity Robson, student, Captcha Cracker

Felicity talked about attempting to crack CAPTCHAs, starting from the example of recognising Phoenix, her cat. In particular she was targeting the Google CAPTCHA system -- which currently asks about vehicle, storefront and street sign recognition. The aim was to use SIFT (scale-invariant feature transform) to recognise the images -- particularly street signs, which are relatively standardised. However Recaptcha (Google's CAPTCHA system) kept locking them out... for acting too robotically. (In later discussion with a Google engineer they pointed out that image recognition is one of the things that Google uses to identify a robot, others being patterns of behaviour, like trying to do things over and over again quickly... so there'll always be an arms race between CAPTCHAs and those who try to automatically break them.)

Nick Moore, Rocket Surgery

As a project for BuzzConf (held in Ballan, Vic), Nick and some others created a small MicroPython based telemetry system that could be used with water powered rockets to track their launch system. They used MQTT to stream the data back for real time display; it is based on ESP32 chips.

Benno Rice, Blockchain

"Waste electricity faster than anyone else". A repeat of a lightning talk at another conference, I believe.

Peter Lovett, PEP505

Peter Lovett is not a fan of PEP505, a proposal to add None-aware operators to Python in the form of "question assignments". While the PEP is from 2015, it appears to have gotten more discussion recently -- but still relatively few fans.

Philip James, Gratitude

Philip offered thanks to Guido van Rossum, the recently "retired" creator and BDFL of Python. And all the other Python core team and package maintainers.

As part of this he built the thanks package (on GitHub), which uses PyPI data to help find ways to fund Python development and make those projects sustainable.

ETA, 2018-08-29: Added Tom Eastman Keynote and lightning talks.

Posted Wed Aug 29 15:43:46 2018 Tags: