Memory-efficient processing of large XML documents requires the use of a streaming parser. This post gives an introduction to XML stream processing with the Haskell programming language, in particular to the streaming API of the xml-conduit package. It shows examples for reading, writing and transforming XML data in a conduit pipeline.
This post is based on a talk I gave at iPres 2022 (slides). It explains how to read a file format specification (in this example, TIFF) and based on that build a minimal binary file by hand.
Videos created by mobile phones are often rotated by means of embedded metadata tags. Playback software that respects these tags applies the correct rotation when rendering a video while software that doesn’t only displays the tilted, un-rotated (original) video. Let’s see if we can rotate a Matroska (mkv) video using metadata! (spoiler: not really, but maybe in the future)
A collection of random Bash commands/idioms/patterns I keep forgetting.
Inconsistent glyph width information is a common cause of PDF/A validation errors, but the details are not easy to understand. This text provides the necessary background knowledge and dissects an example file.
Last week I attended the “PDF Days Europe 2017” conference in Berlin. While a whole conference dedicated to one file format may sound funny, it was in fact quite interesting. Here’s my personal summary of the most important topic, the soon-to-be-published PDF 2.0 ISO standard.
When you write a GUI application and want to install it properly on a Linux system you will ask yourself where to store the application’s icon so that it is shown in the application menu of the desktop environment. To my surprise, I found this a bit confusing. In particular, I was interested in Debian and the Gnome 3 desktop because this is what I happen to use myself. Here are what seemed to be the most relevant sources to me.
Thumbs.db. Suppose you want to clean a large directory tree to keep just the actual image files. Here’s how.
Installing TLS CA root certificates in Linux is actually quite easy. Well, at least if you know where to put the certificate files … Unfortunately, different distributions keep their certificate stores in different places. Here is a short overview of installing root certificates in Debian and Red Hat Enterprise Linux/CentOS. Other distributions based on Debian or RHEL probably handle this similarly to one of the two approaches described here.
When editing metadata of single image files I usually use my graphical metadata editor Verso. But when it comes to working with lots of files en masse, like shifting the date of all images in a directory by two hours, nothing beats the command line. ExifTool is great for this. Here is a random collection of handy commands.
Just some quick notes on computing hash values (aka checksums) for one or more files on Linux and Windows.
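As a platform-independent illustration of the idea, here is a minimal Python sketch that computes a SHA-256 checksum of a file. It is not taken from the post, which covers Linux and Windows tools. The function name `file_sha256` and the chunk size are assumptions chosen for this example.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`.

    `file_sha256` and `chunk_size` are illustrative names, not part of any tool
    mentioned in the post.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_sha256("some-file.iso")` returns the digest as a hexadecimal string, which can then be compared with a published checksum.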
Usually the system-wide web proxy settings on Windows are configured via the (graphical) internet options panel of Internet Explorer or the Control Panel. However, sometimes it would be nice to do this with PowerShell.
If you try to download a file with Curl but keep getting wrong data, there may be a web proxy getting in your way that holds an outdated version of the file in its cache.
You may think you can do only one thing at a time in Bash. This notion seems plausible – you have exactly one command prompt, so how should you run different commands at the same time? However, this notion is wrong. Bash offers a set of built-in multitasking features called Job Control.
Regularly installing software updates is one of the most basic measures to keep a computer system safe. However, searching for and installing those updates is a tedious job that lends itself to some degree of automation. This article will show you how to configure automatic updates in Debian.
When was the last time you used the Caps Lock key on your keyboard? Unless you like being rude on the internet or write all SQL statements in uppercase, it’s probably been a while. That’s a pity because the key itself is located very conveniently right next to the home row. (This is where your hands are centered when touch typing: ‘asdfjkl;’ on US keyboards.) Only its function is rather useless.
To mount a Windows network share in a Linux system you will usually use the CIFS protocol. On Debian and RHEL/CentOS, the necessary tools are provided in the cifs-utils package.
The uptime of a Windows machine can be found on the “Performance” tab of the Windows Task Manager. To get the system startup time you have to resort to the command line.
The Debian “netinst” or network installer is a great way to download only the packages you need when installing a Debian system. However, as you might have guessed from the name, an internet connection is required during the installation. This might pose a problem if you find yourself installing a Debian system in a (corporate) Windows environment where internet access is restricted by a web proxy that uses Microsoft’s Active Directory for user authentication.
Suppose you are running a Debian system in a (corporate) Windows environment where internet access is restricted by a web proxy. In many cases the proxy will be configured to use Microsoft’s Active Directory for user authentication, just like the rest of the environment. If you don’t want to integrate your Debian system completely into the Active Directory structure but only need internet access, you have to tell the proxy your Windows/Active Directory user credentials. The following article will show you how this can be done.
Recently, I was writing a program in Perl that uses a GTK+ 3 GUI to display an image file. When I did something similar with Perl and GTK+ 2 a few years ago, there was a great image viewer widget available that provided all kinds of nice things like scaling and zooming out of the box. Sadly, I found nothing like this for GTK+ 3, so I had to resort to the basic GTK+ widgets to display image files.