Parabolic Logic

The world isn't flat. Software shouldn't be flat either.


Cryptography: Old and New (part 2)

Going public

The other good thing to come out of the ’70s was public key cryptography. This finally solved the problem of being able to communicate securely without first having to meet in order to establish a shared secret. The method is called the Diffie-Hellman key exchange, after the gentlemen responsible for its invention. It exploits the lopsided mathematics of finite fields, in which it’s straightforward to exponentiate an element (that is, raise a number to a power), but very difficult to conduct the opposite process, known as the discrete logarithm.

Thus field exponentiation is an example of a ‘one-way function’. The illustration shows an example of the exchange between Alice and Bob, who are fairly ubiquitous in cryptographic literature. The shared secret s = g^(ab) mod p can be calculated by both Alice and Bob. An onlooker, Oscar say, can see the public keys A and B, and the exchange parameters g and p, but these are of no help in deducing the shared secret s unless one of the secret keys a or b is also known. Once established, the shared secret s can be used as an ephemeral encryption key for a symmetric cipher, such as DES.
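The exchange can be sketched in a few lines of Python, with deliberately tiny toy numbers (a real exchange uses primes of 2,048 bits or more; the variable names are ours, not from any library):

```python
# Toy Diffie-Hellman exchange. The numbers are deliberately tiny;
# real systems use primes of 2,048 bits or more.
p = 23   # public prime modulus
g = 5    # public generator

a = 6    # Alice's secret key
b = 15   # Bob's secret key

A = pow(g, a, p)   # Alice's public key, g^a mod p
B = pow(g, b, p)   # Bob's public key, g^b mod p

# Each party combines its own secret with the other's public key.
s_alice = pow(B, a, p)   # (g^b)^a mod p
s_bob = pow(A, b, p)     # (g^a)^b mod p

assert s_alice == s_bob  # both sides arrive at s = g^(ab) mod p
print(s_alice)           # -> 2
```

Oscar sees p, g, A and B fly past, but recovering a from A = g^a mod p is exactly the discrete logarithm problem.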

The secret keys a and b could at this point be destroyed, which would ensure so-called perfect forward secrecy, but a proper public key infrastructure would require that private and public keys remain largely immutable. Further, public keys should be as well-advertised as possible, to reduce the chances that a man in the middle, say Mallory, could impersonate either party with a bogus public key: the key exchange provides confidentiality, but doesn’t of itself guarantee authenticity. To achieve the latter, one needs to be sure of which public keys belong to whom.

To do this in general, one requires a trusted third party, known as a Certificate Authority (CA), to act as a directory of keypair owners. Since public key cryptography is such a different animal from its private counterpart, one can use various bits of mathematical trickery to reduce the search space to one significantly smaller than that of a brute-force attack. This being so, the classic public key algorithms all have much longer keys. For example, the AES algorithm is considered secure with a 128-bit key, but people are already concerned that 1,024-bit RSA keys are no longer secure. The new-fangled Elliptic Curve cryptography, based again on discrete logarithms but in a more abstract algebraic space, offers shorter keys, but still of the order of twice the security parameter.

The security of all these public key systems rests on the supposed intractability of factoring integers and the discrete logarithm problem. While mathematicians have studied these problems extensively and come up with some good tricks for speeding up the process, both remain sufficiently time-consuming to solve as to still be considered secure – at least on conventional hardware. Up until 1992 cryptographic software was classified as a form of munitions in the US, and even after this date was governed by export restrictions. These precluded the export without licence of any software using a key length of more than 40 bits. This led to a lengthy criminal investigation of PGP creator Phil Zimmermann, which ultimately came to nothing.

Zimmermann came up with novel ways of circumventing these restrictions, including publishing the source code as a book, protected by the First Amendment. Netscape was forced to release a crippled ‘International Edition’ which permitted only 40-bit SSL keys, in contrast to its 128-bit US edition.

Are you Shor?

In 1994, Peter Shor announced an algorithm which could be run on a quantum computer which would enable it to (among other things) factor integers and compute discrete logarithms much faster than a classical computer. While no one has yet succeeded in building the right kind of quantum computer, there’s sufficient concern to give rise to a burgeoning field of study known as post-quantum cryptography. Perhaps a more practical concern is the problem of producing secure keys in the first place.

This relies on being able to produce a sufficiently random stream of bits, which computers are notoriously bad at. On Linux we have the /dev/random and /dev/urandom nodes (go on, run the cat command on them), which both harvest entropy gathered from (among other sources) keyboard and mouse input in order to augment a pseudorandom number generator (PRNG). This is why it’s good practice to make erratic mouse gestures and batter the keyboard when running, for example, the ssh-keygen command. A very early version of Netscape contained a weak PRNG that was seeded using the time of day and process IDs. Since an attacker could make educated guesses as to these variables, the supposedly randomly generated SSL keys could be broken. In 2008 sysadmins were sent into a widespread panic when it was revealed that Debian’s OpenSSL package was generating weak keys, and had been doing so for two years.
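The Netscape-style seeding flaw is easy to reproduce as a sketch. Python’s random module (a Mersenne Twister, not cryptographically secure in any case) stands in for the weak PRNG, and the 60-second search window is an assumption for illustration:

```python
import random
import time

# A PRNG seeded from a low-entropy source (the current time in whole
# seconds) is only as unpredictable as its seed.
seed = int(time.time())
random.seed(seed)
victim_key = random.getrandbits(128)   # supposedly secret session key

# An attacker who can guess roughly when the key was generated simply
# replays candidate seeds until the output matches.
recovered = None
for guess in range(seed - 60, seed + 61):
    random.seed(guess)
    if random.getrandbits(128) == victim_key:
        recovered = guess
        break

assert recovered == seed   # the "random" key falls in moments
```

With only 121 candidate seeds to try, the search is instantaneous; this is why keys should be generated from a well-seeded, cryptographically secure source.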

More recently, Ed Snowden revealed that the NSA paid RSA Security to use a generator called Dual EC DRBG as the default in its software. The constants the NSA recommends for initialising this generator are suspected to have been contrived in such a way as to provide a back door into the algorithm. Besides ciphers, an important concept is that of a hash function. This scrambles an input to a fixed-length output (so if the input is longer than the output there must be collisions) in a one-way manner. Hashed passwords in Linux are stored in /etc/shadow. Originally the MD5 hashing algorithm was used, but nowadays SHA-512 is becoming the standard. Often we hear news of hackers managing to obtain databases, which often contain hashed passwords.
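The fixed-length, one-way behaviour is easy to demonstrate with Python’s hashlib (note that /etc/shadow actually stores salted, iterated sha512-crypt strings rather than raw SHA-512 digests; the passwords below are made up):

```python
import hashlib

short = hashlib.sha512(b"hunter2").hexdigest()
long_ = hashlib.sha512(b"a" * 1_000_000).hexdigest()

# The digest is a fixed 512 bits (128 hex chars) whatever the input
# size, so sufficiently long inputs must collide somewhere.
assert len(short) == len(long_) == 128

# Changing one character of the input scrambles the whole digest.
assert hashlib.sha512(b"hunter3").hexdigest() != short
```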

If you are in possession of a large database, the popular John the Ripper password cracker is able to weed out any weak passwords in a matter of minutes. For research purposes we ran it on a real world database (which has several thousand users), and managed to get 2,500 passwords over the course of a few hours. Other tools such as oclHashcat can leverage GPU power as well, so database security is important, as is changing your password if it is compromised.

In sum, we have seen great changes in how we encrypt our secrets, but it’s important to see how we have been inspired by the past. Unfortunately, we make the same mistakes too – whenever security is breached, it is far more likely to be due to poor security practice than weaknesses in the cipher. Misconfigured servers, phishing attacks, and malicious or lazy operators are by far the greater problem.

Cryptography: Old and New (part 1)

For as long as there have been stories there have been secrets – words unspoken for tactical advantage or for fear of reprisal. Secrets often need to be sent afar, and their remaining secret en route is of paramount importance. So it was when Xerxes’ planned invasion of Greece was revealed by Demaratus (a Greek exile living in Persia, whose warning message was sent to Sparta hidden on an apparently blank wax tablet).

And so it is when you send your credit card details across the ether to pay for gadgets, snacks or socks. Most people will likely be familiar with a substitution cipher, in which one letter is replaced by another. The best-known of these is the Caesar cipher, in which each letter is replaced by one a fixed distance further down the alphabet, wrapping around when one runs out of letters. It is said that Julius Caesar used this method, replacing A with D, B with E, and so on, wrapping around with A replacing X, whereas his nephew Augustus favoured a shift of just one letter, in which A is replaced by B, B by C and so on, but with no wraparound, so that Z is replaced by the symbol AA.
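A minimal sketch of a Caesar shift, wrapping in the Julius style (the function name and messages are ours):

```python
# Caesar shift over a 26-letter alphabet, wrapping around at 'z'.
def caesar(text, shift):
    return ''.join(
        chr((ord(c) - 97 + shift) % 26 + 97) if c.isalpha() else c
        for c in text.lower())

print(caesar("attack at dawn", 3))   # -> dwwdfn dw gdzq

# Shifting back by the same amount decrypts.
assert caesar(caesar("attack at dawn", 3), -3) == "attack at dawn"

# ROT13 is the shift-13 special case: applying it twice is the identity.
assert caesar(caesar("spoiler", 13), 13) == "spoiler"
```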

The Kama Sutra also describes, among other rather more interesting tricks, the art of mlecchita-vikalpa (secret writing). It details a substitution cipher in which letters are paired and interchanged by a fixed random scheme, so that lovers can “conceal the details of their liaisons”.

An even older substitution system is Atbash, originally found in old (circa 500 BC) Hebrew texts. Here the first letter of the alphabet, aleph, is replaced by the last, tav; the second, beth, by the second to last, shin, and so on, effectively reversing the alphabet.

The Latin-alphabet equivalent is interchanging A and Z, B and Y, and so forth. The ROT13 system (a Caesar cipher with a shift of 13) is still used on some websites and newsgroups to obfuscate plot spoilers, punchlines or naughty words. These monoalphabetic substitution ciphers (MSCs) are not in any way cryptographically secure by today’s standards, but in their time they were likely effective enough – the highway bandits of Caesar’s time being likely illiterate, unlike the masterful wordsmiths of the modern internet.

These ciphers do contain a germ of the idea of the modern cryptographic key, though. Whether it’s the length of the shift in a Caesar cipher, the dimensions of the Scytale, or the pairings used in the Kama Sutra (no, not those pairings), knowledge of the method of encryption, together with the key, allows one to decipher the message. We have 26 possible keys (including the trivial zero-shift) for a Caesar cipher, whereas ROT13 and Atbash are essentially single-key systems. The Kama Sutra cipher has a fairly large keyspace – there are about 8 trillion (8 followed by 12 zeroes) unique ways of pairing the alphabet.

The general MSC has an astounding number of possible keys (26 factorial – about 4 followed by 26 zeroes, or a little more than 88 bits in modern binary terms), but size isn’t everything… The Arab polymath Al-Kindi, in a ninth-century manuscript titled On Deciphering Cryptographic Messages, gave the first description of breaking MSCs by frequency analysis – exploiting the fact that in an ‘average’ message, some letters will occur more frequently than others.

For example, in English the letter ‘e’ occurs with a relative frequency of about 13%, followed by ‘t’ with 9%, and so on. This is why Scrabble scoring is the way it is – the more common the letter, the less it scores. Other languages have different letters and frequencies, but the principle remains the same: replace the most frequently occurring letter in the ciphertext with the most frequently occurring letter in the language, then repeat for the next most frequent letter, and continue until you are able to fill in the blanks.
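The attack on a Caesar-shifted message can be sketched as follows; the sample sentence is invented, and we simply assume the most common ciphertext letter stands for ‘e’:

```python
from collections import Counter

# Al-Kindi's attack against a Caesar cipher: assume the most common
# ciphertext letter stands for 'e', the most common letter in English.
plaintext = "the essence of the scheme is that letter frequencies leak through"
shift = 3
ciphertext = ''.join(
    chr((ord(c) - 97 + shift) % 26 + 97) if c.isalpha() else c
    for c in plaintext)

# Most frequent ciphertext letter...
top, count = Counter(c for c in ciphertext if c.isalpha()).most_common(1)[0]

# ...presumed to be 'e' in disguise, which reveals the shift.
guessed_shift = (ord(top) - ord('e')) % 26
print(guessed_shift)   # -> 3
```

Real messages won’t always have ‘e’ on top, which is why the full attack works down the frequency table and fills in the blanks.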

The original message might not have exactly the same letter frequencies as the language, but provided it’s long enough it will at least be close enough that decryption will be possible with a little tweaking. The discovery of the 1586 Babington Plot (which sought to assassinate Queen Elizabeth I) led to Mary Queen of Scots and her co-conspirators being executed after their correspondence was decrypted by renowned codebreaker Thomas Phelippes. Letters between Mary and Babington had been encrypted by substitution using symbols mostly from the Greek alphabet, and Phelippes was able to forge an addendum to one of Mary’s letters requesting the identities of the co-conspirators.

Once they were thus incriminated, heads were off’d. A milestone in the history of cryptography was the invention of the so-called Vigenère cipher in 1553. This was actually the work of cryptologist Giovan Battista Bellaso, who built on the ideas of Trithemius and Alberti. Vigenère did in fact publish a stronger autokeying cipher in 1586, but history has misattributed this earlier cipher to him. The cipher is a polyalphabetic substitution cipher which uses a keyword to switch cipher alphabets after each letter. Each letter is encrypted by a Caesar cipher with shift determined by the corresponding letter of the keyword.

This (providing the keyword has more than one unique letter) thwarts traditional frequency analysis. The cipher was considered so strong that it was dubbed le chiffre indéchiffrable, and indecipherable it remained until work by Babbage and Kasiski in the mid-19th century. Their efforts centred on isolating the length of the key: once that is known, the ciphertext can be separated into as many chunks; each chunk will be encrypted by a different Caesar shift, which is easily dealt with by frequency analysis.
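A minimal Vigenère sketch (the keyword ‘lemon’ and the message are the traditional textbook example, not from the article):

```python
# Vigenère: each letter is Caesar-shifted by the corresponding letter
# of a repeating keyword; the key only advances on letters.
def vigenere(text, key, decrypt=False):
    out, i = [], 0
    for c in text.lower():
        if c.isalpha():
            k = ord(key[i % len(key)]) - 97
            out.append(chr((ord(c) - 97 + (-k if decrypt else k)) % 26 + 97))
            i += 1
        else:
            out.append(c)
    return ''.join(out)

ct = vigenere("attack at dawn", "lemon")
print(ct)   # -> lxfopv ef rnhr
assert vigenere(ct, "lemon", decrypt=True) == "attack at dawn"
```

Note how the two ‘a’s of ‘attack’ encrypt to different letters (l and o), which is exactly what defeats simple frequency counting.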

The ADFGX cipher described in the illustration was later augmented with the letter V to make the imaginatively titled ADFGVX cipher. In 1918, in a phenomenal tour de force, the French cryptanalyst Georges Painvin managed to decrypt an ADFGVX-encrypted message which revealed where the German forces were planning to attack Paris. Painvin lost 15kg of body weight over the course of this crypto-toil. One may wonder whether anyone can make a truly unbreakable cipher, and one may be shocked to learn that such a thing already exists.

This triptych shows another WWI example: the ADFGX cipher (these letters were chosen because they’re different in Morse code). The first plate is the fractionating key: it encodes each letter of our alphabet (sans the letter z because the LXF style guide doesn’t like it) into a bigram, so that our message ‘kernel panic’ encodes to XF GA DA GF GA AG DX GD GF FD FA (the space is ignored). In the second plate, we fit this message onto a grid below a second keyword, ‘LINUS’, which is our transposition key. In practice, a longer transposition key would have been used, and both keys would be changed according to a daily code book. We rearrange the columns by putting the second key in alphabetical order, and then read off the ciphertext column-wise. Thus our encoded message is FGGGA XAADF GFDF DAGD AGXF.
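The transposition step of this worked example can be checked in a few lines; the fractionated bigram stream is taken straight from the plates described above:

```python
# Columnar transposition, the second stage of the ADFGX example: write
# the fractionated stream under the key, sort the columns into
# alphabetical key order, then read the ciphertext off column by column.
def transpose(stream, key):
    cols = {k: [] for k in key}
    for i, ch in enumerate(stream):
        cols[key[i % len(key)]].append(ch)
    return ' '.join(''.join(cols[k]) for k in sorted(key))

# 'kernel panic' after fractionation, per the first plate.
fractionated = "XFGADAGFGAAGDXGDGFFDFA"
print(transpose(fractionated, "LINUS"))   # -> FGGGA XAADF GFDF DAGD AGXF
```

Sorting ‘LINUS’ gives the column order I, L, N, S, U, which is why the column under I (FGGGA) leads the ciphertext.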

That it has been patented since 1917 may leave one so utterly aghast as to impinge permanently on one’s health, but this is fact nonetheless. The chap responsible (for the patent at least) was Gilbert Vernam, and his invention is known as the One Time Pad. The trick is to ensure that there is as much key material as there is plaintext, that the key material is entirely random and perfectly secret, and no part of the key material is used more than once. In practical terms, though, Vernam’s system is largely useless.

Generating truly random material is difficult, as is distributing a huge amount of it in secret and ensuring its destruction post-use.

Enigmatic mathematics

Wartime cryptography relied heavily on codebooks which contained daily keys, and these had a bad habit of falling into enemy hands. Once such a breach occurred and news of it reached HQ, generals were faced with the tremendous logistical problem of alerting relevant personnel as to the breach and then manufacturing and distributing new key material. Long-range naval missions often failed to receive this, necessitating that messages be retransmitted using old keys. This exchange was sometimes intercepted, providing clues as to the new key.

During World War I, the decrypting of the Zimmermann telegram (which invited Mexico to ally with Germany) was instrumental to American involvement in the war. By World War II the Germans had upgraded the Enigma series of machines to present a sufficient cryptographic challenge to Bletchley Park. Polish researchers had broken the original design as early as 1932, and just prior to the outbreak of war they shared their intelligence with the British. Alan Turing designed the Bombe machine, which by 1940 was doing a fine job of breaking Jerry comms.

The Enigma machine, despite having a huge number of rotor, plugboard and stecker settings, had a weakness in that a letter was never encrypted to itself. This vastly reduced the amount of work that the Bombe and the computers (usually women with a good eye for detail and skill at crossword puzzles) had to do. After a letter was typed on the Enigma, the cipher alphabet was changed by the rotor mechanism, in a manner not dissimilar from the Vigenère cipher.

There were other layers of encryption too, but a lot of these were constant settings made redundant when Enigma machines were captured. By the end of the war there were around 200 Bombes in use throughout England. The Americans, being in a much better position for obtaining supplies, were able to design and build 125 much faster Bombes, and the Allies were able to farm out work to these remote behemoths via (encrypted) cable.

Turing’s genius notwithstanding, much of the Enigma traffic was decrypted thanks to sloppy operational security. Message keys could have been changed with every transmission but were not, or when they were the change was only slight and easily guessed. Numbers were often spelled out, so ‘einsing’ was a common technique – looking for occurrences that might decrypt to ‘eins’.

If numerals had been allowed, this technique would have failed. In the 1970s, two developments brought the cryptography game into the computer age. The first of these developments was the Data Encryption Standard, a block cipher based on work by Horst Feistel at IBM. Prior to its standardisation, it was slightly modified at the behest of the NSA. With no reasons being cited for these agency-mandated changes, suspicions were raised about a possible back door.

AES was introduced as a replacement for DES in 2001. To date it has defied all cryptanalytic efforts to find weaknesses. One reason for its selection was its relatively simple structure. There are four main layers, repeated over several rounds. With a bit of imagination, one can see echoes of the ADFGX cipher in the ShiftRows stage. The SubBytes stage is the only non-linear part of the cipher. Typically linear operations are much quicker to carry out, but without a non-linear stage a cipher will be trivial to break using the methods introduced by Matsui.

Two decades after DES was standardised, it emerged that the opposite was true: far from hiding a back door, the NSA’s changes had protected the cipher. Its original S-boxes were susceptible to a technique called ‘differential cryptanalysis’, which at the time (cryptography being considered a munition) was classified, and the agency’s modifications made them more resistant to it, although the NSA did also recommend a smaller 48-bit, as opposed to 64-bit, key size. Being the first publicly available modern cipher, DES became the subject of intense scrutiny and in many ways bootstrapped serious academic study of cryptography.

While the thousands of pages of journal articles on the subject provide all manner of theoretical attacks on DES, by far its most serious weakness is the short key size. IBM and the NSA eventually compromised on a nominal 64-bit key, but eight of these 64 bits were redundant checksum bits. At the time of its introduction this was probably sufficient, but in the early 1990s machinery was proposed that could brute-force a key within hours. In 1997 an Internet-wide project successfully cracked a DES key for the first time. In 1998, the Electronic Frontier Foundation built a device (for a princely $250,000) which successfully cracked a key in a little over two days.

Among the other attacks on DES it’s worth mentioning Matsui’s ‘linear cryptanalysis’. The attack involves building up approximations to parts of the cipher by finding modulo-2 linear expressions that hold with a probability significantly different from 0.5. By collecting a huge number (2^43) of plaintext-ciphertext pairs, one can deduce enough bits of the key that the remainder can be brute-forced.

Linear expressions can be found speedily thanks to the Walsh-Hadamard transform, and all modern ciphers are very careful to include a heavily nonlinear component to mitigate such attacks. In some ways one can look at Matsui’s work as an abstraction of basic letter frequency analysis, using characteristics of the cipher rather than the language, and 1s and 0s rather than characters.
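To get a feel for the bias hunting involved, one can tabulate agreement counts for a single 4-bit S-box (here the first row of DES’s first S-box, a common teaching example); a count far from 8 out of 16 is an exploitable linear approximation:

```python
# For mask pair (a, b), count inputs x where the parity of the masked
# input, x & a, agrees with the parity of the masked output, S[x] & b.
SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]

def parity(x):
    return bin(x).count('1') % 2

def agreements(a, b):
    return sum(parity(x & a) == parity(SBOX[x] & b) for x in range(16))

# Scan all non-trivial mask pairs for the largest deviation from 8;
# this is the raw material of a linear approximation.
bias, a, b = max((abs(agreements(a, b) - 8), a, b)
                 for a in range(1, 16) for b in range(1, 16))
print(bias, a, b)
```

Chaining such single-S-box approximations through the rounds of a full cipher is what produces the probability-biased expressions Matsui exploited.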

Encrypt your hard drive

There’s been a lot of talk in the past year or so about the security of your internet data, first with the Edward Snowden revelations and later with the Heartbleed bug in OpenSSL and Shellshock, the Bash vulnerability. There was also a lower-profile bug in GnuTLS discovered shortly before Heartbleed. As a result of all this, we’re paying more attention to the security of our data in transmission – but what about when we store it?

We’ve previously mentioned TrueCrypt, which is great for encrypting removable storage (as long as you use the right version – see p51 for details), especially because it’s available for Windows and Mac too. But if you’re really concerned you may want to encrypt your entire home directory, or even the whole hard drive. This is not just about protection from black hat hackers: what if you have personal, business or otherwise confidential information stored on your laptop and it’s lost on the train, left in a taxi or simply stolen? There are two popular types of encryption supported by the Linux kernel: dm-crypt and ecryptfs.

Running cryptsetup --help not only shows the commands you can use, but also displays a list of the available hashes and ciphers.

 

The latter is an encrypted filesystem that sits on top of a standard filesystem. If you mount the ‘lower’ filesystem, you’ll see all the files but their contents, and usually their names, are encrypted. It works at the directory level, and other directories on the same filesystem can be left unencrypted or encrypted separately. This is the method used by Ubuntu, among others, to encrypt users’ home directories.

The other method, which is the one we’ll look at here, is dm-crypt, and it works at a lower level, encrypting the block device that the filesystem is created on. A benchmark run by Phoronix showed better performance from a whole disk encrypted with dm-crypt than from ecryptfs on home directories.

A stack of blocks

Before we look at the encryption, it’s important to understand how block devices work. Block devices are the system’s interface to storage hardware, for example /dev/sda1. Underneath the block device is the hardware driver, such as a SATA driver, and then the hardware itself. The operating system then works with the block device to create a filesystem on it.

That is the usual view of block devices, but they can be much more. In particular, a block device can be an interface to another set of block devices – they can be stacked. You already do this: you have a filesystem on /dev/sda1 (a disk partition) that is a block device referencing /dev/sda (the whole disk). Technologies such as RAID and LVM (Logical Volume Management) also stack block devices.

You could have LVM on top of a RAID array which itself is stacked on the block devices of the individual disks or their partitions. Whole device encryption using dm-crypt works like this: it creates a block device on top of your storage medium which encrypts data as it is saved and decrypts it as it is read.

You then create a standard filesystem on top of the encrypted block device and it functions just the same as if it had been created on a normal disk partition. Many distros have an option to install to an encrypted disk, but here we’ll look at creating and working with dm-crypt devices directly to see how they work, as opposed to some black magic that’s been set up by the installer.

Dm-crypt uses the kernel’s device mapper subsystem (hence the name) to manage its block devices, and the kernel’s cryptographic routines to deal with the encryption. This is all handled by the kernel, but we need userspace software to create and manage dm-crypt devices, the standard tool being cryptsetup. It’s probably already installed on your distro – if not, it will definitely be in its main package repositories.

Encrypting something

Cryptsetup can create two types of encrypted devices: plain dm-crypt and LUKS. If you know you need to use plain dm-crypt, you already know far more about disk encryption than we’ll cover here, so we’ll only look at LUKS, which is the best choice for most uses.

Experimenting with filesystems, encrypted or otherwise, risks the data on the disk while you’re learning. All examples here use /dev/sdb, which we take to be an external or otherwise spare device – do not try things out on your system disk until you’re comfortable doing so.

All these commands need to be run as root, so log into a terminal as root with su, or prefix each command with sudo. Let’s start by creating an encrypted device:

cryptsetup luksFormat /dev/sdb1

This sets up an encrypted partition on /dev/sdb1 after prompting you for a passphrase.

You can open the encrypted device with:

cryptsetup luksOpen /dev/sdb1 name

This will ask for the passphrase and then create the device in /dev/mapper, using the name given on the command line. You can then use /dev/mapper/name as you would any disk block device:

mkfs.ext4 /dev/mapper/name
mount /dev/mapper/name /mnt/encrypted

The usual rules about passphrases apply: keep them long and varied, hard to guess but easy to remember. If you lose the passphrase, you lose the contents of the device.

Keeping the keys safe

A LUKS encrypted device contains eight key slots. Keys are another term for passphrases, so you can assign multiple passphrases to a device, which is useful if you maintain multiple systems and want to have a master passphrase that only you know.

When you use luksFormat, the passphrase you give is stored in slot 0. You can then add another with:

cryptsetup luksAddKey /dev/sdb1

You’ll be asked for an existing passphrase and then prompted for the new one.

A key can also be the contents of a file instead of a passphrase; the file can contain anything, but it’s usual to use random data:

dd if=/dev/urandom of=/path/to/keyfile bs=1k count=4
chmod 0400 /path/to/keyfile
cryptsetup luksAddKey /dev/sdb1 /path/to/keyfile
cryptsetup luksOpen --key-file /path/to/keyfile /dev/sdb1 name

It goes without saying that keyfiles should be stored securely, readable only by root and not stored on the encrypted device.

Personally, even if a volume is always unlocked by key file, I prefer to also set a very strong passphrase, recorded in a secure place, to guard against the key file ever becoming corrupted or otherwise inaccessible. Keys can also be changed or removed with the luksChangeKey and luksRemoveKey commands.

More options

So far, we’ve stuck with the default encryption choices, but cryptsetup accepts --hash and --cipher options. The former sets how the passphrases should be hashed, while the latter selects the encryption method.

Use cryptsetup luksDump to find out about a LUKS encrypted partition. There are also backup and restore commands to keep a copy of the LUKS information.

The defaults are usually more than sufficient, but you can see the available options by running:

cryptsetup --help

These options are needed only with luksFormat.

Once the encrypted device has been created, cryptsetup will automatically use the correct settings when opening it. It’s wise to stick with popular ciphers and hashes unless you have a very good reason for using something different. A less frequently used method is more likely to harbour unknown deficiencies simply because fewer people are using it, as happened recently with the Whirlpool hash implementation in the libgcrypt library used by cryptsetup.

Fixing the implementation caused problems for those with systems that were already using the broken hashes. Another reason for sticking to commonplace methods is portability. This doesn’t matter for an internal disk, but if you want to use an encrypted disk on another system, that system must also have the hashes and ciphers you used installed.

Scripting languages part 2

Community support

DevOps, cloud deployment, test-driven development and continuous integration – the demands on a sysadmin change and evolve, but the requirement to learn something new is constant. Everyone uses Bash to some extent, but you’ll need to learn Bash plus one other. Perl was the traditional Swiss Army chainsaw of Unix admins through the ’80s and ’90s, gradually losing ground to Python and then Ruby over the last decade or so.

Anyone who started work in the ‘90s or earlier will be comfortable with it, so finding someone to help with your scripts is often not a problem. However, the world doesn’t stand still, and many tech businesses have standardised on Python, which is used extensively at Google, for example.

Much of the software necessary for modern sysadmin work is Python-based, although the same can be said of Ruby. Ruby benefits from being the basis of Chef and Puppet, as well as Vagrant and Travis CI, meaning a little familiarity will be helpful anywhere that uses them for deployment.

The web frameworks and testing tools written in Ruby have popularised the language at many of the younger web companies. NewLISP has a much smaller community supporting it; there aren’t many ready-made solutions, and you may know no-one who uses it. The keenness of the online community goes some way to ameliorating this deficiency, but you have to ask: who will maintain your tools when you leave a company?

Programmability

Bash scripts tend to become unmanageable well before they reach 1,000 lines of code. Despite the language’s procedural nature, there have been attempts to make an object-orientated (OO) Bash.

There’s more than one library for that – CPAN is a useful resource for Perl.

We don’t recommend it; we think it’s better to modularise. Functional programming (FP) in Bash (http://bit.ly/BashFunsh) is also impractical. Perl’s bolted-on OO won’t be to everyone’s taste, but it does the job. Perl has fully functional closures and, despite syntactical issues, can be persuaded into FP – just don’t expect it to be pretty. For that you should wait for Perl 6.

Python is equally happy with imperative and OO styles, and also manages FP. Functions are first-class objects, but other functional features are lacking, even if its list comprehensions are very good. Mochi, the FP language (http://bit.ly/FPMochi), uses an interpreter written in Python 3. Ruby is designed as a pure OO language, perhaps the best since Smalltalk, and it too can be persuaded to support a functional style of programming.

But to get FP code out of Ruby, you’ll have to stray so far from best practices that you should really be using another language entirely. This brings us neatly to NewLISP, an elegant and powerful language with all the functional features at your fingertips. NewLISP offers a pseudo-OO implementation in the form of functional object-oriented programming (FOOP), though this doesn’t mean it can cut it for real OO programming.
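The Python side of the comparison can be glimpsed in a few lines – first-class functions, a closure and a list comprehension (the names here are illustrative):

```python
# First-class functions and closures in Python, plus a comprehension.
def make_adder(n):
    def add(x):        # closure capturing n
        return x + n
    return add

add5 = make_adder(5)   # functions can be returned and stored
squares = [x * x for x in range(6) if x % 2 == 0]

print(add5(37))    # -> 42
print(squares)     # -> [0, 4, 16]
```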

Extending the language

None of these scripting languages is as replete with classes as, say, Java, so you’ll need non-core libraries (or modules, as they are sometimes called) for many scripts. How comprehensive these are, and how easily they can be managed alongside your script, varies greatly.

Perl continues to impress with the mind-boggling choice to be found on CPAN, but its ‘there’s more than one way to do it’ approach can leave you easily overwhelmed. Less obvious is the magnitude of Bash extensions created to solve problems that are perhaps not best suited to any sh implementation.

We can’t help acknowledging Ruby’s power and charms.

Python has excellent library support, with rival choices weighed very carefully by the community before being included in the core language. The concern to “do the right thing” is evident in every decision, yet alternative solutions remain within easy reach. The full adoption of the pip package manager, with Python 3.4, has ensured parity with Ruby. RubyGems provides the gem distribution format for Ruby libraries and programs, and Bundler manages all of the gems for dependencies and correct versions. Your only problem will be finding the best guide through Ruby’s proliferation of libraries.

Read around carefully. NewLISP is not a large language, but it’s an expressive one, accomplishing much without the need of add-ons. The modules and libraries that do exist address key needs, such as database and web connectivity. There’s enough to make NewLISP a useful language for the admin, though it can’t match the other four choices here.

Network security

Penetration testing, and even forensic examination after an attack, will fall under the remit of the hard-pressed sysadmin in smaller organisations. There are enough ready-made tools available that you can roll everything you may need into a neat shell script, kept handy for different situations, but writing packet sniffers or tools for a forensic examination of your filesystem in Bash isn’t a serious option.

Perl has lost some security community mindshare since the early days of Metasploit, but the tools are still there, and are actively maintained by a large user group who aren’t about to jump ship to another language. Perl has tools like pWeb – a collection of tools for web application security and vulnerability testing – which is included in distros such as Kali and BackBox. Tools such as Wireshark are a powerful aid to inspecting packets, but sometimes you’ll need to throw together your own packet sniffer.

NewLISP has impressive networking features, even if it lacks the pen-testing tools of the others.

Python not only has Scapy, the packet manipulation library, but provides a socket library for you to easily read and write packets directly. Ruby’s blocks (write functions on-the-fly without naming them) and other features are great for writing asynchronous network code, and its rapid prototyping matches (and even beats) Python. But Ruby’s biggest boon is Metasploit, which is the most-used pen-testing software.
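As a sketch of the kind of thing Python’s standard socket library makes possible – assuming Linux (AF_PACKET is Linux-specific), root privileges, and an interface name such as eth0 – a minimal packet sniffer might look like this:

```python
# Minimal packet-sniffer sketch using only the standard library.
import socket
import struct

def parse_ethernet_header(frame):
    """Unpack the 14-byte Ethernet header: dst MAC, src MAC, ethertype."""
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst.hex(":"), src.hex(":"), ethertype

def sniff(interface="eth0", count=5):
    # ETH_P_ALL (0x0003) captures every protocol on the wire.
    # Raw AF_PACKET sockets need root and only exist on Linux.
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.ntohs(0x0003))
    s.bind((interface, 0))
    for _ in range(count):
        frame, _ = s.recvfrom(65535)
        dst, src, ethertype = parse_ethernet_header(frame)
        print("%s -> %s  ethertype 0x%04x" % (src, dst, ethertype))
```

For anything beyond toy examples, Scapy’s ready-made dissectors will save you reimplementing protocol parsing by hand.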

In terms of ready-rolled tools, you can mix and match as needed, but Perl, Python and Ruby all provide everything you need to quickly examine a network for weaknesses or compromises on the fly. It’s also worth noting that Python now features in more security-related job adverts.

Last, NewLISP isn’t well-known among penetration testers and grey hat hackers, but thanks to the networking built in to the language, a function call and a few arguments will create raw packets for pen testing. Once more, NewLISP has clear potential but suffers from its relatively tiny user base.

Web native scripts

Much of a sysadmin’s life has migrated to the web, so you’ll need a scripting language that has kept pace. We examined both the ease of writing our own code and the availability of existing solutions for everything from web interfaces to system stats. What’s noticeable about these languages is the difference in expressiveness and style used to produce similar results.

However, this is, once again, secondary to personal preference and local support for many admins. Ruby is quick and enjoyable; Python ‘feels right’ probably due to it being more human readable; newLISP is astonishingly powerful. But these observations remain partisan clichés without a supportive and maintainable environment to use and develop the code for your own networks.

  1. Bash

    While Bash will be no one’s first choice for a web programming language, it’s good to know that when your server doesn’t provide for your first choice you can fall back on it, thanks to bashlib. This is a shell script that makes CGI programming in the Bash shell somewhat more tolerable.
    Your script will be full of echo statements, interspersed with your commands to produce the desired output. Security considerations mean we wouldn’t recommend running this on the open Internet, but it’s worth bearing in mind that Bash works well as a prototyping language.
    It’s easy to fill a text file with comments describing the broad structure that you want, then fill in the gaps – testing snippets interactively and pasting into www.shellcheck.net to check your code as you go. You’ll soon be up and running with a proof of concept.

  2. newLISP

    Code Patterns, by NewLISP creator Lutz Mueller, is available on the www.newlisp.org website and has chapters on HTTPD and CGI, as well as TCP/IP and UDP communications. Add in the section on controlling applications, and you’ll have everything you need to get started.
    NewLISP’s built-in networking and simple (or absent) syntax make it surprisingly easy to generate HTML pages of results from, for instance, your monitoring scripts. For a ready-built framework, newLISP on Rockets – which uses Bootstrap, jQuery and SQLite – combines rapid application development with good performance. NewLISP on Rockets provides several functions, from (convert-json-to-list) via (twitter-search) to (display-post-box), which will help you add web functionality.
    We’re impressed but we remain concerned by the small size of the community and the intermittent pace of development.

  3. Perl 5

    Perl was the first web CGI scripting language and has more or less kept pace with the times. It certainly has the libraries, and enough examples to learn from, but with no dominant solution you’ll have to pick carefully. Catalyst, Dancer, and Mojolicious are all good web application frameworks.
    More likely you’ll find everything you need in CPAN. You can glue together a few of the libraries – many of which are already collected together in distros – to handle a pipeline of tasks, such as retrieving XML data, converting the data to PDF files and indexing it on a web page.
    Perl’s traditional CGI interface is still available, and despite better-performing alternatives abstracted through PSGI, you may find that use CGI; is all you need to web-enable your script. And remember: ‘there’s more than one way to do it’.

  4. Python

    Python’s Web Server Gateway Interface (WSGI), which was defined in PEP 333, abstracts away the web server interface, while WSGI libraries deal with session management, authentication and almost any other problem you’d wish to be tackled by middleware.
    Python also has plenty of full-stack web frameworks, such as Django, TurboGears and Pylons. As with Rails, though, for some purposes you may be better off bolting web functionality onto an existing script. Python’s template engines will save you from generating a mess of mixed HTML and Python.
    Python has many other advantages, from the Google App Engine cloud – which bundles its own Python interpreter and works with any WSGI-compatible web application framework, for testing scalable applications – to support for a clean style of metaprogramming.

  5. Ruby

    Don’t imagine for one moment that Rails is a panacea for most sysadmin problems. It’s not. And while Sinatra certainly makes it easy to roll out anything web-based in Ruby, even this is overkill for most purposes.
    That said, Rails does a good job of getting code up quickly, provided you don’t drown in all that magic generated code. Ruby is ideal for getting any script web-enabled, thanks to gems that are written by thoughtful people who have made sane decisions.
    Putting a web interface on our backup script, for example, was fun, but distracting, as we played with several gems, e.g. to export reports to Google spreadsheets. Tools like nanoc, which generates static HTML from HAML, and some of the reporting gems complement the language’s expressiveness, and make adding any functionality to scripts a breeze.
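To make the WSGI interface mentioned under Python above concrete, here is a minimal sketch of a complete WSGI application; the function name and served text are our own invention:

```python
# A WSGI application is just a callable that takes the CGI-style environ
# dict and a start_response callback, and returns an iterable of bytes.

def app(environ, start_response):
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# For local testing, the standard library's reference server will do:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Because the interface is this small, the same callable runs unchanged under the wsgiref reference server or a production server such as Gunicorn or uWSGI.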

Scripting languages part 1

Every admin loves time-saving shortcuts, and carries a selection of scripts from job to job, as well as inheriting new ones when arriving in post. The question any new admin asks is: which is the best language to learn? (Followed by: where’s the coffee?) Veterans of language wars will know that the best-language question rarely has a simple or definitive answer, but we thought it would be well worth comparing the most useful choices to make your Linux life easier. Most scripting languages have been around longer than you think.

For example, NewLISP was started on a Sun-4 workstation in 1991. They’ve borrowed from each other, and elsewhere, and accumulated a long legacy of obsolete libraries and workarounds. Perl’s Regular Expressions, for instance, are now found everywhere, and in some cases better implemented elsewhere.

So what matters most? How fast the script runs, or how quickly you can write it? In most cases, the latter. Once up and running, support is needed both from libraries or modules to extend the language into all areas of your work, and from a large enough community to support the language, help it keep up with trends, and even to innovate it. So, which scripting language should you learn to improve your Linux life this year?

How we tested…

Comparisons, they say, are invidious. This is certainly true for programming languages, where personality and local support are, at least, of equal import to criteria such as speed, and the level of support for different paradigms.

Given this, we’re presenting a mixture of facts, collective opinions and our own prejudices, but it’s a basis for further investigation.

The key to a scripting language’s usefulness to the sysadmin lies not just in how easily it helps solve problems, but in how many of the solutions have already been written, and are available to download and adapt, and preferably well-documented.

We tried to work across the range of versions installed on a typical network, but insisted on Python 3. Other than that, we’ve tried to stay in the context of working with what you’re likely to find on your network.

The learning curve

The key questions are: how easy is the language to pick up? Are the learning resources at least adequate? Even if these two questions are answered in the positive, they still need to be backed up by a helpful community to assist you in quickly producing something useful, and help maintain that initial enthusiasm as you hit inevitable problems.

To produce a backup script and test scripts in each of the languages, we started by browsing Stack Overflow. But downloading random code means no consistency between Posix (pure Bourne Shell) scripts, modern Bash, and legacy code that occasionally fails.

From MOOCs to the bookshop, Python learning resources are everywhere.

Fortunately, www.shellcheck.net is a great tool for checking the correctness of scripts, and teaches you best practice as it corrects them. The Linux Documentation Project’s (perhaps overly) comprehensive Advanced Bash-Scripting Guide (www.tldp.org/LDP/abs/html) is also excellent and will help you quickly gain confidence.

Perl’s online and built-in documentation is legendary, but we started by running through an exercise from the classic O’Reilly admin book, Running Linux, then leapfrogged the decades to No Starch’s recent Perl One-Liners by Peteris Krumins.

Those who eschew the book form should try http://perlmonks.org, a source of cumulative community wisdom. Recent efforts at getting youngsters learning through Code Club (www.codingclub.co.uk) and the rest of us through PyConUK education sprints and open data hackdays have shown Python to be easily picked up by anyone.

But out-of-date advice, such as the many ways of running subprocesses that persist for compatibility reasons, means careful reading is needed, and it’s yet another good reason for starting with Python 3, not Python 2.
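The subprocess point is a good example of why careful reading matters. As an illustration of the modern idiom (assuming Python 3.5+ for subprocess.run, and 3.7+ for capture_output):

```python
# subprocess.run is the current recommended way to run external commands,
# replacing older idioms such as os.system, os.popen and subprocess.call.
import subprocess

result = subprocess.run(
    ["echo", "hello"],        # argument list avoids shell-injection pitfalls
    capture_output=True,      # collect stdout/stderr instead of inheriting them
    text=True,                # decode output to str rather than bytes
    check=True,               # raise CalledProcessError on non-zero exit
)
print(result.stdout.strip())  # hello
```

Tutorials still demonstrating os.system or bare Popen calls are usually a sign of their age rather than of best practice.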

Head to www.python.org/about/gettingstarted for a large list of free guides and resources. Ruby is also an easy sell to learners, and before Rails, command-line apps were what it did best.

David B. Copeland’s book, Build Awesome Command Line Applications in Ruby will save you hours of wading through online documentation, but we were able to get up and running on our test scripts with a couple of web tutorials.

Last, we come to NewLISP: a challenge to programmers schooled only in non-LISP-family languages, but you’ll be amazed by what it manages to accomplish with just lists, functions and symbols. We dived right in with the code snippets page on http://newlisp.org, adapting them to build our backup script, and were rewarded with terse, powerful code that was easier to read than its equally compact Perl counterpart.

Version and compatibility

The question here is: have I got the right version? Let’s start with Bash. Every modern Linux distro ships with a version that will run your scripts and anyone else’s. Bash 4, with its associative arrays, coproc (two parallel processes communicating) and recursive matching through globbing (using ** to expand filenames), appeared six years ago.

Bash 4.2 added little and is four years old, and Bash 4.3’s changes were slight. Perl is still included in the core of most distros. The latest version is 5.20 (with 5.22 soon to appear), but many stable distros ship with 5.18. No matter: you’re only missing out on tiny improvements, and just about every script you’d want to write will be fine. The switch from Python 2 to 3 still catches out the unwary.

As the Unix shell dates back decades, you will find that recent Bash versions contain a few unexpected syntax changes.

Run Python 3 if you can, and check the documentation if you come unstuck. Python 3.3 is our baseline for Python 3 installs, and Python 3.4 didn’t add any new syntax features.
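A version guard at the top of a script makes that baseline explicit. This is just a sketch; the function name and message are our own:

```python
# Fail fast if the script is run under an interpreter older than our baseline.
import sys

REQUIRED = (3, 3)  # our assumed baseline, matching the article's advice

def check_version(required=REQUIRED):
    """Exit with an explanatory message if the interpreter is too old."""
    if sys.version_info < required:
        sys.exit("This script needs Python %d.%d or later; found %s"
                 % (required[0], required[1], sys.version.split()[0]))

check_version()
print("Python version OK")
```

A guard like this turns a confusing mid-run SyntaxError or missing-feature crash into a clear one-line complaint.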

Ruby version changes have caused enough problems that painless solutions have appeared: rvm enables you to run multiple versions of Ruby, and Bundler keeps track of the gems you need for each script. NewLISP’s stability and lack of third-party scripts is an advantage here. We can’t, however, guarantee every script will run on the latest versions.