System Administrator Articles – Alex on Linux

Python for bash replacement

Alexander Sandler — Sun, 26 Dec 2010 16:25:37 +0000

When I started learning Python, I was looking for a programming language that would replace BASH, AWK and SED. I am a C/C++ programmer, and as such I would be better off investing my time in studying C and C++. Instead, every time I needed a complex script, I opened a book on BASH and refreshed my knowledge. And since it is relatively easy to bump into the limits of what BASH can do, I usually opened an awk/sed book a few minutes later.

Actually, this is quite common. Once in a while I see my colleagues, just like myself, open a book on BASH. The problem is that because we don’t program in BASH regularly, the knowledge we gain fades over time. So the next time we need a script, we have to study the same BASH material all over again. And again, I am talking not only about BASH, but also about AWK and SED.

It is an utterly broken state of affairs and I wish there were a solution. Unfortunately, there is no solution yet. The good news is that with some effort one may emerge. I am talking about the Python programming language.

Needless to say, Python can do everything that BASH can do. However, it was never designed as a shell environment, and as such it is hardly convenient.

For instance, running external commands is easy. You use the subprocess module to run the command and there you go: you have both the return value and the output. But this is miles away from the convenience of simply typing the command and hitting enter.
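To make the comparison concrete, here is a minimal sketch of running an external command with the standard subprocess module and collecting both the exit status and the output. The helper name run is my own, and the child command is simply the Python interpreter printing a word, chosen so that the example does not depend on any particular system tool.

```python
import subprocess
import sys


def run(argv):
    """Run a command and return (exit status, captured standard output)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode, result.stdout


# Hypothetical command, used only for illustration.
status, output = run([sys.executable, "-c", "print('hello')"])
print(status, output.strip())
```

Even in this short form it takes an import, a function call and some unpacking to do what a shell does when you type a command name.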

Another issue is input/output redirection and piping. A regular shell does this with ease. Python can obviously do it as well, but not without the hassle of importing the right module, creating the right objects, and so on.
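The hassle is easy to see in a sketch. The following is roughly what a shell pipeline of the form first | second > file looks like with subprocess.Popen. The two commands are stand-ins that I made up for illustration (the Python interpreter acting as a producer and as a sorter), and pipe_to_file is a hypothetical helper name.

```python
import subprocess
import sys

# Two stand-in commands (assumptions for illustration): one prints two lines,
# the other sorts whatever arrives on its standard input.
PRODUCER = [sys.executable, "-c", "print('b'); print('a')"]
SORTER = [sys.executable, "-c",
          "import sys; sys.stdout.writelines(sorted(sys.stdin))"]


def pipe_to_file(first, second, path):
    """Rough equivalent of the shell pipeline: first | second > path."""
    with open(path, "wb") as out:
        p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
        p2 = subprocess.Popen(second, stdin=p1.stdout, stdout=out)
        p1.stdout.close()  # the read end of the pipe now belongs to p2 only
        p2.wait()
        p1.wait()
    return p1.returncode, p2.returncode
```

Calling pipe_to_file(PRODUCER, SORTER, "sorted.txt") leaves the sorted lines in sorted.txt. It works, but it is a dozen lines for what the shell expresses with two characters.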

On the other hand, things are not that bad. For instance, implementing a command line should be fairly easy in Python, because it has native support for completion (which it uses in its own command line interface).
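As a small sketch of what that support looks like, the standard readline module accepts a completion function that is called with the text typed so far and a match index. The word list and the helper names below are my own, and the key binding line assumes GNU readline (other line-editing back ends may need a different binding string).

```python
def make_completer(words):
    """Build a completion function in the form the readline module expects."""
    def complete(text, state):
        matches = sorted(w for w in words if w.startswith(text))
        return matches[state] if state < len(matches) else None
    return complete


def install(words):
    """Enable tab completion for input() prompts (needs the readline module)."""
    import readline
    readline.set_completer(make_completer(words))
    readline.parse_and_bind("tab: complete")


# Hypothetical command names, for illustration only.
completer = make_completer(["ls", "less", "mount", "mkdir"])
```

After calling install([...]) in an interactive program, pressing Tab at an input() prompt cycles through the matching words.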

There is a project that takes a few steps in the right direction. I am talking about IPython, of course. Unfortunately, I doubt that IPython will ever be able to replace BASH. That was never its goal. Even though the IPython developers implemented some features that make the IPython command line a little more down to earth, I doubt they will continue moving in this direction. Also, I must say that IPython evolves quite slowly.

What do you think? Will Python ever be able to replace BASH?

MSI-X – the right way to spread interrupt load

Alexander Sandler — Wed, 18 Nov 2009 07:46:11 +0000

When considering ways to spread interrupts from one device among multiple cores, I cannot help mentioning MSI-X. The thing is that MSI-X is actually the right way to do the job.

Interrupt affinity, which I discussed here and here, has a fundamental problem: inevitable CPU cache misses. To illustrate, think about what happens when your computer receives a packet from the network. The packet belongs to some connection. With interrupt affinity the packet would land on core X, while chances are that the previous packet on the same TCP connection landed on core Y (X ≠ Y).

Handling the packet would require the kernel to load the TCP connection object into X’s cache. But this is so inefficient. After all, the TCP connection object is already in Y’s cache. Wouldn’t it be better to handle the second packet on core Y as well?

This is the problem with interrupt affinity. On the one hand, we want to spread interrupts to even out the load on the cores. On the other hand, simple round robin isn’t enough. The little fella that decides where each interrupt goes should be able to look into the packet and, depending on which TCP connection it belongs to, send the interrupt to the core that handles all packets belonging to that connection.

Ideally, NICs should be able to:

Look into packets and identify connections.
Direct the interrupt to the core that handles the connection.

Apparently, this functionality is already here. Devices that support MSI-X do exactly this.

Meet MSI-X

MSI-X is an extension to MSI. MSI replaces the good old pin-based interrupt delivery mechanism.

Each IO-APIC chip (x86 permits up to 5) has 24 legs, each connected to one or more devices. When the IO-APIC receives an interrupt, it redirects the interrupt to one of the local APICs. Each local APIC is connected to a core, which receives the interrupt.

MSI provides a kind of protocol for interrupt delivery. Instead of raising a signal on a pin, a PCI device sends a message: a write to a special memory address, which is delivered to a local APIC as an interrupt without passing through the IO-APIC pins. Theoretically this means that each device can have a number of interrupt vectors. In practice, plain MSI is quite limited in this respect, but MSI-X is not.

Modern high-end network cards that support MSI-X implement multiple tx/rx queues. Each queue is tied to an interrupt vector, and each NIC has plenty of them. I checked Intel’s 82575 chipset. With the igb driver compiled properly, it has up to eight queues, four rx and four tx. Broadcom’s 5709 chipset provides eight queues (and eight interrupt vectors), each handling both rx and tx.

In kernel 2.6.24, kernel developers introduced a new member of struct sk_buff called queue_mapping. This member tells the NIC driver which queue to use when transmitting the packet.

Before transmitting a packet, the kernel decides which queue to use for it (net/core/dev.c:dev_queue_xmit()). It uses two techniques to do so. First, the kernel can ask the NIC driver to provide a queue number for the packet. This functionality, however, is optional in NIC drivers, and at the moment neither the Intel nor the Broadcom driver provides it. Otherwise, the kernel uses a simple hashing algorithm that produces a 16-bit number from the two IP addresses and (in the case of TCP or UDP) the two port numbers. All this happens in a function named simple_tx_hash() in net/core/dev.c.
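The idea behind the hashing step can be sketched in a few lines. To be clear, this is not the kernel’s algorithm: I use CRC32 as a stand-in hash and a plain modulo to pick the queue, and the function name pick_tx_queue is hypothetical. What the sketch does show is the property that matters here, namely that all packets of one connection map to the same queue.

```python
import socket
import struct
import zlib


def pick_tx_queue(src_ip, dst_ip, src_port, dst_port, num_queues):
    """Map a connection's addresses and ports to a queue index.

    This is NOT the kernel's hash; CRC32 is a stand-in chosen for brevity.
    """
    key = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
           + struct.pack("!HH", src_port, dst_port))
    hash16 = zlib.crc32(key) & 0xFFFF   # keep 16 bits, as the text describes
    return hash16 % num_queues


# Two packets of the same (made-up) connection end up on the same queue.
first = pick_tx_queue("192.168.1.10", "192.168.1.1", 40000, 80, 8)
second = pick_tx_queue("192.168.1.10", "192.168.1.1", 40000, 80, 8)
print(first == second)
```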

When receiving packets, things are even easier, because the NIC firmware and the driver decide which queue to use to hand the packet to the kernel.

Using this simple technique, the kernel and modern NICs can ensure that packets belonging to a certain connection land on a certain queue. Using interrupt affinity binding techniques, you can bind a certain interrupt vector to a certain core (by writing to smp_affinity, etc.). Thus you can spread interrupts among multiple cores and still avoid the cache misses described above.
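For the binding part, here is a minimal sketch of building the bitmask that smp_affinity expects and writing it to /proc/irq/<irq>/smp_affinity. The helper names are mine, the write needs root, and the sketch assumes a machine with fewer than 32 CPUs so that the mask fits in a single hexadecimal word.

```python
def affinity_mask(cpus):
    """Return the hexadecimal bitmask that selects the given CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")


def bind_irq(irq, cpus):
    """Write the mask to /proc/irq/<irq>/smp_affinity (requires root)."""
    with open("/proc/irq/%d/smp_affinity" % irq, "w") as f:
        f.write(affinity_mask(cpus))


print(affinity_mask([2]))      # mask with only CPU 2 selected
```

With one interrupt vector per queue, calling bind_irq once per vector with a different CPU gives each queue its own core.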

Why interrupt affinity with multiple cores is not such a good thing

Alexander Sandler — Thu, 17 Sep 2009 12:44:23 +0000

One of the features of the x86 architecture is the ability to spread interrupts evenly among multiple cores. The benefits of such a configuration seem obvious. Interrupts consume CPU time, and by spreading them across all cores we avoid bottlenecks.

I’ve written an article explaining this mechanism in greater detail. Still, let me briefly remind you how it works.

Every x86 motherboard has a chip called the IO-APIC. This is a device that controls interrupt delivery within your system. It knows how many CPUs are in your system and can direct various interrupts to various CPUs. It uses the so-called local APIC ID as the identifier of a processor.

It has two modes of operation. In one mode it sends interrupts from a certain device to a single, predefined core. This mode of operation is called fixed/physical mode. In the other mode, it can deliver interrupts from a certain device to multiple cores. The latter is called logical/lowest priority interrupt delivery mode.

When in logical mode, the IO-APIC can deliver interrupts to up to eight cores. The source of this limitation is the size of the bitmask register that tells which CPUs should receive the interrupts. The bitmask is only eight bits long.

When considering the round-robin type of interrupt delivery (the logical mode), I cannot stop thinking about how it degrades performance.

You see, spreading the burden of interrupt handling among multiple cores may remove some bottlenecks, but it creates a problem of its own.

Consider a network interface card, for example. Let’s say we have a TCP connection to some host out there. When a packet arrives, the network card issues an interrupt and the IO-APIC directs it to one of the cores. Next, the core handling the packet has to fetch the TCP connection object from memory into its cache.

The IO-APIC does not guarantee that the next packet belonging to the connection will be handled by the same core. So it is likely that two cores will have to work with the TCP connection object. Both of them will have to fetch its content into their caches. This causes cache coherency traffic and cache misses. And as you can learn from the article I’ve written on misaligned memory accesses, accessing memory that is not in the cache can take up to 30 times longer than accessing cached memory.

Moreover, assuming the TCP connection object is properly protected using synchronization techniques, one of the cores will inevitably have to wait for the other, adding unnecessary delay to packet processing.

My point is that round-robin style interrupt delivery can be quite nasty for performance. It is much better to deliver interrupts from a certain device to a given core.

Luckily, the smp_affinity interface, which I mentioned in my old article, allows you to bind interrupts from a certain device to a certain (single) core.

On some computers the IO-APIC does not support logical delivery mode. This can be because of a buggy BIOS or too many CPUs. On such computers physical interrupt delivery mode is the only thing that works, so binding a single interrupt to a single core is the only choice, and the only thing you can do is move the interrupt from one core to another.

To sum up, round-robin style interrupt delivery can:

Malfunction.
Cause performance degradation.

So when might it still be useful, you may ask? It depends on what you do with your computer. Usually, you don’t need round-robin style interrupt delivery. You only need it if you know that your computer receives lots of interrupts and you have real-time applications.

In this case the scheduler (which has no idea about interrupts) can schedule a thread that requires lots of CPU time to run on the core that serves interrupts. Since interrupts have higher priority, the thread will receive less CPU time. For a real-time application this may result in reduced responsiveness.

Even so, you can still assign all interrupts to one core and use thread affinity techniques to make sure that your application doesn’t use that core.
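As a sketch of the thread affinity part, Python on Linux exposes the scheduler affinity calls as os.sched_getaffinity() and os.sched_setaffinity(). The helper names below are my own, and the sketch assumes that you already know which core serves the interrupts.

```python
import os


def cpus_excluding(available, irq_cpu):
    """Return the CPUs from `available` with the interrupt-handling CPU removed."""
    remaining = set(available) - {irq_cpu}
    if not remaining:
        raise ValueError("no CPU left for the application")
    return remaining


def keep_off_irq_cpu(irq_cpu, pid=0):
    """Restrict a process (0 means the caller) to every CPU except irq_cpu.

    Linux only: relies on os.sched_getaffinity() and os.sched_setaffinity().
    """
    allowed = cpus_excluding(os.sched_getaffinity(pid), irq_cpu)
    os.sched_setaffinity(pid, allowed)
    return allowed
```

An application that calls keep_off_irq_cpu(0) at startup will never be scheduled on core 0, so interrupts bound to that core cannot steal its CPU time.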

Backup and restore your Linux installation

Alexander Sandler — Wed, 08 Apr 2009 14:18:40 +0000

Quick links

Backing up
Restoring a backed up disk
Restoring a backed up partition

Introduction

Backing up a Linux installation and restoring it is perhaps one of the most fundamental tasks that every system administrator has to deal with.

Here are some important points about the backup methods that we will discuss in this article.

This article is about backing up from the command line.
Incremental and periodic backups are good for some things. Yet when it comes to an operating system installation, the whole point of backing up your system is to save you the time of reinstalling it in case something goes wrong.
Hence, spending a couple of hours configuring incremental and periodic backups for your system, just to save an hour or two of reinstalling in case something goes wrong, seems rather irrational.
We will talk about how to save disk space and back up your system even if the hard disk that will hold the backup is smaller than the hard disk with the installation.

If these points speak to you, then you’re reading the right article.

Creating the backup

What to back up

The actual command that does the backup is quite simple. However, before backing up we have to decide what to back up. Here are our options.

We can back up the content of the entire hard disk.
We can back up the content of the Linux installation partition.

Which is right for you depends on the structure of the partition table on the hard disk that contains the Linux installation.

One common configuration (and my favorite) is to have only two partitions on the hard disk, one for the Linux installation and the other for swap. In this case it is probably wiser to back up your entire hard disk. True, the space occupied by the swap partition will be wasted, but on the other hand restoring a Linux installation from a full hard disk backup is much easier. Easier means fewer commands to execute to restore the installation, and this usually translates into a smaller chance of an unsuccessful restoration.

Another common configuration is to have several partitions on the hard disk. In this case, we may want to back up only the partition that contains the Linux installation. The command that does the backup is nearly the same in this case, but restoring the data will be more difficult. Chances are that you will not have any problem whatsoever, so don’t let those couple of extra commands scare you.

The bottom line is that it depends on how much information you want to back up. If your hard disk holds other data in addition to the Linux installation, it is probably wiser to back up only the Linux partition. In any case, I’ll show you how to back up both the entire hard disk and a single partition.

Figuring out device file

The second thing you have to figure out is the device file that represents the hard disk or partition that you want to back up. Usually, the mount command without any arguments gives you the answer. Have a look:

alex ~ -> mount
/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
/proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
varrun on /var/run type tmpfs (rw,nosuid,mode=0755)
varlock on /var/lock type tmpfs (rw,noexec,nosuid,nodev,mode=1777)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
alex ~ ->

The mount command, without any arguments, gives you a list of mounted file-systems and their corresponding device files. For each file-system, it tells you <device> on <mount point> type <file-system type>. Since we are backing up a Linux installation, we are looking for the device file that is mounted on the root directory (/). In my case this is /dev/sda1.
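If you prefer not to pick the line out by eye, the lookup is easy to script. The following sketch parses text in the format printed by mount and returns the device for a given mount point. The function name and the shortened sample output are mine.

```python
def device_mounted_on(mount_output, mount_point="/"):
    """Find the device that `mount` reports for the given mount point."""
    for line in mount_output.splitlines():
        # Each line looks like: <device> on <mount point> type <fs type> (<options>)
        device, sep, rest = line.partition(" on ")
        if not sep:
            continue
        where = rest.partition(" type ")[0]
        if where == mount_point:
            return device
    return None


# A shortened copy of the listing above, used as sample input.
SAMPLE = """/dev/sda1 on / type ext3 (rw,relatime,errors=remount-ro)
/proc on /proc type proc (rw,noexec,nosuid,nodev)
udev on /dev type tmpfs (rw,mode=0755)"""

print(device_mounted_on(SAMPLE))
```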

Note that device files with a number at the end usually represent a single partition. If you want to know the device file of the entire hard disk, strip the number off. For instance, /dev/sda1 represents the first partition on /dev/sda.

In case you’re still not sure which device file is the correct one, device files that start with /dev/sd usually stand for SCSI and USB disks. Device files that start with /dev/hd stand for IDE disks. And remember: to back up a Linux partition, use a device file whose name ends with a number. For the entire hard disk, use a device file whose name ends with a letter.
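The naming rule can be written down as a tiny sketch. It only covers the /dev/sdX and /dev/hdX naming described here; device naming schemes in which the whole-disk name itself ends with a digit would need different handling. Both function names are hypothetical.

```python
import re


def whole_disk(device):
    """Strip a trailing partition number: /dev/sda1 -> /dev/sda."""
    return re.sub(r"\d+$", "", device)


def is_partition(device):
    """True if the device name ends with a digit, as sd/hd partitions do."""
    return device[-1:].isdigit()


print(whole_disk("/dev/sda1"), is_partition("/dev/sda1"))
```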

Preparations

Before you make the actual backup, you have to make sure that the hard disk or partition you are backing up is not in use by any program. This is important because if your hard disk is in the middle of something while being backed up, the backup will be inconsistent and you will see data corruption when you restore it. Depending on what you do with your hard disk, the corruption can be quite serious.

The best way to avoid corruption is to boot from a Live Linux CD (Knoppix, or the Ubuntu Desktop installation CD). If you know that your hard disk is not heavily used, you may try to back up a live, mounted hard disk. In this case, try to bring down services that may use the hard disk while it’s being backed up.

If you’re backing up a mounted hard disk or partition, run sync before doing the backup. This flushes the data from memory buffers to disk (Linux keeps portions of information from hard disks in memory to speed up hard disk access), reducing the chance of data corruption even further.

Prepare the media that will contain the backup. This may be an NFS mount or another hard disk. Just make sure it is accessible.

Actual backup

One of the issues with backups is that you usually need as much disk space for the backup as the size of the hard disk or partition you’re backing up. Obviously, you can compress the backup, but normally you would first create the backup and only then compress it.

Luckily, there’s a way to back up and compress data on the fly, so that you only need as much free space as the compressed backup will occupy. This is how you do it:

alex ~ -> dd if=/dev/sda | bzip2 -c > /media/sdb/sda.bz2

Let’s try to understand how it works.

First we use dd to read data from /dev/sda. When you read from a device file, the Linux kernel returns the content of the actual disk. The same thing happens if you read from a device file that represents a partition, only this time Linux returns the content of the partition.

Usually we run dd with at least two arguments: if specifies the input file and of specifies the output file. If you omit one of them, dd uses standard input or standard output instead. Hence the dd command we use here sends the contents of the device to standard output.

The output of the dd command is sent via a pipe to the bzip2 command. bzip2 is a compression program. When called with the -c command line switch, it compresses whatever it reads from its standard input and sends the result to its standard output. This is why we redirect bzip2’s output to the /media/sdb/sda.bz2 file. This is the file that will contain the backup once the command finishes.

Note that the file must not be on the disk or partition that we are backing up. For that reason, I mounted an additional hard disk on the /media/sdb directory.
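For readers who think in Python rather than in pipes, the same streaming idea can be sketched with the standard bz2 module. This is not a replacement for the command above, just an illustration of why no intermediate uncompressed copy is needed: data is read in chunks and compressed as it is written. The function name backup and the chunk size are my own choices.

```python
import bz2
import shutil


def backup(source, archive, chunk_size=1024 * 1024):
    """Compress `source` into `archive` chunk by chunk, like dd | bzip2 -c."""
    with open(source, "rb") as src, bz2.open(archive, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Run as root, backup("/dev/sda", "/media/sdb/sda.bz2") would play the role of the pipeline. Only one chunk is held in memory at a time, so the archive is the only thing that consumes disk space.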

Restoring from backup

Preparations

First of all, you will need a Live Linux CD. A Knoppix CD or an Ubuntu LiveCD should do the job. You need a running Linux to restore the backup, but you cannot use the Linux installed on your hard drive because, well, you are going to overwrite it with the restored backup.

When restoring the Linux installation from backup, your life will be so much easier if you use the same hard disk and partition. Even if it is a different hard disk, try to replace the old one with the new one rather than adding it as an additional disk. The reason for this is device file allocation.

When the Linux kernel detects several hard disks in your computer, it allocates device files for them: /dev/sda for the first disk, /dev/sdb for the second disk, etc. When you install Linux on one of them, the installation refers to itself by its device file. But here is the catch. The name of the device file that represents a certain hard disk is position dependent. That is, if you swap the first and second disks in your computer, the disk that has been represented by sda until now will be represented by sdb, and the disk that has been represented by sdb will be represented by sda. Internally, however, the Linux installation on either of them would still refer to itself by the old name. As a result, you probably won’t be able to boot your system.

The bottom line is that you have to make sure that the hard disk you restore your Linux installation to is represented by the same device file as before. That is, if your Linux installation was on /dev/sda when you created the backup, make sure that you are restoring to the device that will be represented by /dev/sda when you boot your system.

Restoring

Boot from your Live Linux CD. Once there, mount the device that holds the backup. Figure out which device file will hold your Linux installation. Remember that if you add new disks, the device files that represent your old disks may shift.

Now to the actual command.

Restoring entire hard disk

This is actually the simplest case.

First, we start with mounting the disk that contains the backup. In my case this is /dev/sdb.

knoppix@Microknoppix:~$ mount /media/sdb

Next we restore the data.

knoppix@Microknoppix:~$ bzip2 -c -d /media/sdb/sda.bz2 | dd of=/dev/sda

I used a Knoppix CD to restore the system. The first command mounts /dev/sdb, the disk that contains the backup. The second command is the one that actually restores the Linux installation.

As with the backup command, it doesn’t use extra disk space. It extracts the archive that contains the installation and writes it to the disk, all on the fly. You are already familiar with bzip2’s -c command line switch: it tells bzip2 to write to standard output. The -d command line switch tells it to decompress the data. Because of -c, the decompressed data goes into the pipe, where dd picks it up and writes it to /dev/sda.

You can boot your restored system right after the above command finishes.
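The restore direction can be sketched in Python in the same streaming fashion, again only as an illustration of the on-the-fly idea and not as a substitute for the pipeline. The function name restore and the chunk size are my own.

```python
import bz2
import shutil


def restore(archive, target, chunk_size=1024 * 1024):
    """Decompress `archive` onto `target` chunk by chunk, like bzip2 -c -d | dd."""
    with bz2.open(archive, "rb") as src, open(target, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

Run as root from the Live CD, restore("/media/sdb/sda.bz2", "/dev/sda") would correspond to the command shown earlier. Everything on the target is overwritten, so double check the device name first.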

Restoring single partition

This is a little trickier. Things are very easy when restoring an entire hard disk. You don’t have to care about the partition table and the boot loader. Your backup contains everything you need for a happy and working Linux system. All you have to do is write the data to the disk and you are done.

When restoring a single partition, you have to take care of everything yourself. You have to create the partition table. You have to install the boot loader. However, despite these obvious drawbacks, backing up and then restoring a single partition has some very nice advantages compared to an entire disk backup.

First, backing up a single partition requires less disk space. Second, you have the option of restoring your installation to a larger partition. Let’s look at an example session that demonstrates data restoration, file-system resizing and boot loader installation.

Once again, we start with mounting the disk that contains the backup.

knoppix@Microknoppix:~$ mount /media/sdb

Demonstrating how to create a partition table is slightly out of the scope of this article, so let’s assume that we already have the partition table ready. We will restore the backup to /dev/sda1.

knoppix@Microknoppix:~$ bzip2 -c -d /media/sdb/sda1.bz2 | dd of=/dev/sda1

Now the data is where it should be. Let’s see how we can resize the file-system on the partition.

For this article, I experimented with a small VMware-based computer, so I use rather small disks. In this case, the partition I backed up is 8GB in size, but /dev/sda1 is 10GB. So after restoring the data to the partition, we have to resize the file-system on it to utilize the entire partition. Otherwise, the file-system would think that it is still 8GB in size.

knoppix@Microknoppix:~$ resize2fs /dev/sda1
resize2fs 1.41.3 (12-Oct-2008)
Please run 'e2fsck -f /dev/sda1' first.

knoppix@Microknoppix:~$ fsck -y -f /dev/sda1
.
.
.
knoppix@Microknoppix:~$ resize2fs /dev/sda1
resize2fs 1.41.3 (12-Oct-2008)
Resizing the filesystem on /dev/sda1 to 2409742 (4k) blocks.
The filesystem on /dev/sda1 is now 2409742 blocks long.

As you can see, resize2fs, the tool I used to resize the file-system, asked me to run e2fsck first. e2fsck is a tool that checks file-systems for errors and fixes them; it is similar to the chkdsk tool on Windows. I ran it through the fsck front end. Its output was rather long, so I skipped it. The errors it found are a result of me backing up a mounted file-system. Luckily, since I had stopped all processes that might access the disk when I backed it up, no data was corrupted and the errors found by e2fsck were superficial.

Once fsck was done, I could run resize2fs. Note that without a size argument, resize2fs resizes the file-system to the maximum available size, which is exactly what we wanted.
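As a quick sanity check, the block count printed by resize2fs can be converted to bytes. The two numbers below are taken from the session output (2409742 blocks of 4k each); the rest is plain arithmetic.

```python
BLOCK_SIZE = 4096        # "4k" blocks, as printed by resize2fs
BLOCKS = 2409742         # block count from the session above

size_bytes = BLOCKS * BLOCK_SIZE
size_gib = size_bytes / 2**30
print("%d bytes = %.2f GiB" % (size_bytes, size_gib))
```

The result is a little under 9.9 billion bytes, or roughly 9.2 GiB, which is in line with a partition of about 10GB.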

There’s one thing left to do: installing GRUB. In theory, with certain GRUB configurations you may skip this step and try to boot your restored Linux installation right away. However, in my experience, you had better do this step and make sure everything works no matter what configuration you have.

We start by mounting the newly restored file-system. To be able to mount the partition, we first have to create a temporary mount point directory.

knoppix@Microknoppix:~$ mkdir /tmp/sda1
knoppix@Microknoppix:~$ sudo mount /dev/sda1 /tmp/sda1/

And now we can install GRUB. This is how we do it.

knoppix@Microknoppix:~$ sudo grub-install --root-directory=/tmp/sda1 /dev/sda

This command installs GRUB on /dev/sda. The --root-directory command line switch tells grub-install to use kernel images and configuration from the specified directory. We want grub-install to use the kernel images and configuration from the Linux installation that we’ve just restored. This is why I specified /tmp/sda1 as the root directory.

Finally, we want to unmount the device and reboot. This is what we do:

knoppix@Microknoppix:~$ sudo umount /tmp/sda1
knoppix@Microknoppix:~$ sudo reboot

That’s it. Now, if everything goes well, your system should boot into the restored Linux installation. Just remember to remove the Live CD from the CD-ROM drive before you reboot.

SSH crash course

Alexander Sandler — Tue, 17 Mar 2009 17:07:06 +0000

About this article

I would like to do two things in this article. First, I would like to tell you about SSH: how to make it work, how to use public key cryptography to log in to a remote computer, and how to execute remote commands and copy files to and from a remote machine.

Second, I would like this document to serve as a sort of reference guide. For that reason, I provide a list of links to various places in the document that show you how to do actual stuff without too much talking around.

You can read the document from beginning to end, or you can jump right away to the place that you need.

Here are the jump-to links.

Jump to…

How to connect to remote host?
What is RSA/DSA host fingerprint?
How to handle changing RSA/DSA host fingerprint?
How to execute commands on remote computer?
How to copy files to/from remote computer?
How modern internet cryptography works
What are identity files?
How to generate identity files?
How to install identity files?
How to login to a computer without entering a password?
How to enable login as root via ssh?
How to enable X forwarding for single session?
How to enable X forwarding for all future sessions?
How to disable X forwarding?
Where to find more information?

Introduction

Some years ago, when I realized that telnet was out and SSH was in, I was mostly confused about SSH. I heard that it allows you to log in to a remote machine without a username and password. Yet when I heard people talking about all those cryptographic keys, I was determined not to get into it. It seemed too complicated for me. So I used it the same way I had used telnet before it: with a username and password.

What I really needed was a document that describes, in simple words, all those nifty things that you can do with SSH. A document that does not get into too many technical details, yet explains enough to turn SSH into a useful and handy tool instead of a hostile thing that only gurus know how to operate.

Years passed and as I myself learned how to use SSH, including some of its more advanced features, I decided to write such a document.

Although all the information I present in this document is available on the internet, I think I managed to assemble here the things that you need the most.

Part 1. Basics

Intro

I believe all Linux distributions today come with a command line SSH client called ssh. We should distinguish between two things here. First, there is SSH, the protocol. Then there is ssh, the program that speaks the SSH protocol. ssh is part of the larger OpenSSH suite, which is what most people have.

Connecting to remote host – simple case

The simplest case is when you want to connect to a remote computer using the same username you used to log in to your current account. For example, let’s say you are logged in on a computer named A and your username is alex. You want to connect to computer B. This is what you type:

$ ssh B

Instead of a hostname (B), you can of course use an IP address. Unless configured otherwise, ssh will ask you for the password of the user, alex in our case. Once you type in the correct password and hit enter, you will find yourself logged in to computer B as user alex.

There might be an additional step in between. ssh may ask you to confirm the authenticity of the remote host. Usually this happens the first time you connect. ssh will present you with something called an RSA/DSA host fingerprint and will ask whether you accept it, yes or no. For now answer yes, but make sure to read about this later in this section of the article.

Chances are you will not always use the same username when connecting from one host to another (not to mention that it is unlikely that your username is alex). To tell ssh which username to use when connecting to a remote computer, use the @ notation. Like this:

$ ssh alex@B

or like this:

$ ssh john@192.168.1.1

RSA/DSA host fingerprint

One of the most important features of the SSH protocol is security. Obviously this means encrypting the data that passes between your computer and a remote computer, but not only that. Another thing built into SSH is making sure that you are connecting to the right computer. To explain this, I have to introduce a villain.

The villain is a guy who tries to break into our computer, or perhaps sniff our traffic in an attempt to steal some valuable information. He can try to replace parts of the information that flows between two connected computers to trick us into exposing something valuable. He can even try to replace the remote computer with one that looks like the real thing, while we feed it valuable information. In short, the villain is a bad guy and we try to protect ourselves from him with SSH.

As I said, the villain can try to replace the remote computer. To make sure that this hasn’t happened, we want to know for sure which computer we’re connecting to. For this reason, during its installation, the OpenSSH suite creates a signature of the computer. This signature is called an RSA or DSA host fingerprint.

RSA and DSA are two public key algorithms. SSH supports both of them and either can be used. The differences between the two are not important here. On my Ubuntu system RSA is the default, but I guess it can be different elsewhere. For now, what is important is to know which method we’re using. I’ll tell you why a little later.

When you connect to a remote computer for the first time, its host fingerprint is saved on your machine. Actually, ssh first asks you whether the fingerprint it received from the remote computer is right. You can simply accept it, but you can also check and make sure that the fingerprint is right. If you have access to the remote computer by some means other than SSH, you can read its fingerprint there and compare it to the value ssh has shown you.

To read the host fingerprint, use the following commands:

$ ssh-keygen -l -f /etc/ssh/ssh_host_rsa_key.pub
2048 96:72:48:4f:69:70:45:b2:39:3d:55:75:78:52:ce:a7 /etc/ssh/ssh_host_rsa_key.pub (RSA)

$ ssh-keygen -l -f /etc/ssh/ssh_host_dsa_key.pub
1024 17:bd:cd:fb:09:82:9b:70:36:3f:b5:a4:4e:f4:84:d9 /etc/ssh/ssh_host_dsa_key.pub (DSA)

The first command returns the RSA fingerprint. The second command returns the DSA fingerprint. You can use these fingerprints to make sure that you’re connecting to the right computer.
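For the curious, a fingerprint in the colon-separated format shown above is an MD5 digest of the decoded public key data. The following sketch computes it from one line of a public key file. The function name is mine, and note that newer OpenSSH versions print a SHA256-based fingerprint by default, so this sketch covers only the older hexadecimal format used in this article.

```python
import base64
import hashlib


def md5_fingerprint(public_key_line):
    """Compute the colon-separated MD5 fingerprint of an OpenSSH public key line.

    The line is expected to look like: <key type> <base64 data> [comment]
    """
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.md5(blob).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

Passing the contents of /etc/ssh/ssh_host_rsa_key.pub to md5_fingerprint() should give the same sixteen hexadecimal pairs that ssh-keygen -l prints in that format.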

What happens when host fingerprint changes

First let’s try to understand why this may happen. The first option is the villain case that we’ve already mentioned. In this case we should go to the police, etc. But this is not the only case. The host fingerprint can change if, for instance, someone has reinstalled the operating system on the computer. Another option is that the computer’s IP address has changed, but we still connect to the old IP address, which by now is taken by another computer. In any of these cases the host fingerprint of the remote computer changes. Let’s see what ssh tells us when this happens.

$ ssh alex@192.168.1.1
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
96:72:48:4f:69:70:45:b2:39:3d:55:75:78:52:ce:a7.
Please contact your system administrator.
Add correct host key in /home/alex/.ssh/known_hosts to get rid of this message.
Offending key in /home/alex/.ssh/known_hosts:1
RSA host key for localhost has changed and you have requested strict checking.
Host key verification failed.

It is big and slightly overwhelming, but at least you won’t miss it.

How to handle expected host fingerprint change

Often you do something to a remote computer that causes its host fingerprint to change. For instance, you could have reinstalled the operating system on it. In this case it is totally normal for ssh to throw that big and scary error message at us. Yet it won’t let us connect to the remote system, and we want to fix this.

ssh saves host fingerprints on the local disk, in a file named known_hosts inside the .ssh directory of your home directory (/home/alex/.ssh/known_hosts in my case). What we want to do is delete a certain host from the file and thereby make ssh confirm the host fingerprint with us again, as if we were connecting for the first time.

To do this, we will use the following command:

$ ssh-keygen -R 192.168.1.1
/home/alex/.ssh/known_hosts updated.
Original contents retained as /home/alex/.ssh/known_hosts.old

Obviously, change 192.168.1.1 to the hostname or IP address of the remote computer you’re connecting to.

As you can see, the above command deletes the locally saved remote host fingerprint from the known_hosts file, but keeps a backup copy of the file (known_hosts.old) in case we need it. If you want to restore the known_hosts file, simply copy the backup file over the original one.

Also, note that the known_hosts file sits in your home directory. This means that if you log into a different user account, ssh will start saving remote host fingerprints from scratch.
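To make it clearer what ssh-keygen -R does to the file, here is a minimal Python sketch of the same idea. It is an illustration only, not how ssh-keygen is implemented. The function name remove_host and the sample entries are made up, and the sketch assumes plain (unhashed) known_hosts lines whose first field is a comma-separated list of host names and addresses.

```python
def remove_host(known_hosts_text, host):
    """Return (remaining_text, removed_count) with every entry for `host` dropped."""
    kept = []
    removed = 0
    for line in known_hosts_text.splitlines(keepends=True):
        fields = line.split()
        is_entry = bool(fields) and not line.lstrip().startswith("#")
        # The first field lists the names/addresses the key belongs to.
        if is_entry and host in fields[0].split(","):
            removed += 1
            continue
        kept.append(line)
    return "".join(kept), removed


# Made-up entries; the key material is shortened to "AAAA..." on purpose.
sample = (
    "192.168.1.1 ssh-rsa AAAA...\n"
    "example.org,10.0.0.5 ssh-rsa AAAA...\n"
)
remaining, count = remove_host(sample, "192.168.1.1")
print(count)        # 1
print(remaining)    # only the example.org line is left
```

For real work I would stick with ssh-keygen -R, since it also copes with hashed host names and makes the backup copy for you.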

Executing command on a remote computer

It is as simple as connecting to a remote computer. All you have to do is append the command you want to execute on the remote computer to the ssh command that we’ve used to connect to it. For example:

$ ssh alex@192.168.1.1 ls

This will run the ls command on computer 192.168.1.1, in alex’s home directory.

This way you can run almost any command on a remote computer. But keep one thing in mind. You may want to put the command that you want to run on the remote computer in single quotes, because the local shell does not interpret anything inside single quotes. Take a look at the following example:

$ ssh alex@192.168.1.1 echo "Hello World" > file.txt

This command obviously writes something to a file named file.txt, but on which computer? In this particular case, the local shell will interpret the > character and write the output of the ssh command to a file named file.txt on the local computer. But this is not what we wanted. So to make sure that the local shell does not interfere, we put the command in single quotes. Like this:

$ ssh alex@192.168.1.1 'echo "Hello World" > file.txt'

In case you want to put a single quote in the actual command, this is how you can do it:

$ ssh alex@192.168.1.1 'echo '"'"'Hello World'"'"' > file.txt'

To make the number of quotes easier to follow, let me split the command into several pieces.

'echo '
"'"
'Hello World'
"'"
' > file.txt'

Note that to put a single quote into the final command, I first close the single quote I’ve opened, then put a single quote between double quotes, and then open a new single-quoted piece.
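If you build such commands from a script, you don’t have to count quotes by hand. Python’s standard shlex.quote function produces the same close, double-quoted quote, reopen pattern shown above. Here is a small sketch; remote_command is a helper name I made up for the example.

```python
import shlex


def remote_command(user_host, command):
    """Build an ssh command line that runs `command` on the remote side."""
    # shlex.quote wraps the text in single quotes and turns every embedded
    # single quote into '"'"' so the local shell passes it through untouched.
    return "ssh {} {}".format(user_host, shlex.quote(command))


print(remote_command("alex@192.168.1.1", "echo 'Hello World' > file.txt"))
# ssh alex@192.168.1.1 'echo '"'"'Hello World'"'"' > file.txt'
```

The printed line is the very command from the example above, so the redirection to file.txt happens on the remote computer.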

Securely copying files to and from a remote computer

scp is part of the OpenSSH suite. It is the command used to copy files from and to a remote computer. In terms of security it works the same way as ssh, i.e. it is host fingerprint aware and will ask you to confirm a fingerprint when you access a host for the first time.

On the other hand, it works the same way as cp. With scp, as with regular cp, you copy a file from one place to another. Also, as with cp, you can copy several files into one location.

The syntax is the same as with cp. First you specify what to copy, then you specify where to copy it. The difference is that you can specify a remote host using a special notation.

scp uses the following notation to specify remote files and directories: [username@]hostname:path. I.e. first you type the username followed by @, then the hostname, and finally a colon followed by the file or directory name. As with ssh, the username part (@ included) is optional, and if omitted scp will use the username you’ve logged in with. Let’s see a few examples:

$ scp alex@alexandersandler.net:/home/alex/wav.wav .

Here I am copying a file named wav.wav, located in the home directory of user alex on the alexandersandler.net host, to my current directory. But wait a second, wouldn’t it be easier to do it this way:

$ scp alex@alexandersandler.net:~/wav.wav .

It appears that scp has absolutely no problem understanding ~ as the user’s home directory, exactly as the shell does.
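To make the remote notation described above more tangible, here is a small Python sketch that splits such an argument into its parts. It is a simplification for illustration, not scp’s real parser: parse_remote is a made-up name, and corner cases such as IPv6 addresses or local file names that contain a colon are ignored.

```python
def parse_remote(spec):
    """Split '[username@]hostname:path' into (username, hostname, path).

    username is None when the optional 'username@' part is missing.
    """
    location, colon, path = spec.partition(":")
    if not colon:
        raise ValueError("no colon found, so this names a local file")
    username, at, hostname = location.rpartition("@")
    return (username if at else None), hostname, path


print(parse_remote("alex@alexandersandler.net:~/wav.wav"))
# ('alex', 'alexandersandler.net', '~/wav.wav')
print(parse_remote("alexandersandler.net:/home/alex/wav.wav"))
# (None, 'alexandersandler.net', '/home/alex/wav.wav')
```

A None username here corresponds to the case where scp falls back to the name you are logged in with.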

Copying multiple files

$ scp alex@alexandersandler.net:~/works/project/* .

This will copy all files from the directory /home/alex/works/project/ on alexandersandler.net to my current directory. Note that scp has no problem with wildcards. Also note that, as with cp, the wildcard is not recursive, i.e. it will copy all files from the /home/alex/works/project/ directory, but will not copy its sub-directories.

Here’s another example.

$ scp ~/works/another_project/* alex@alexandersandler.net:~/tmp/

In this example, we copy all files from the ~/works/another_project/ directory on our computer to alexandersandler.net. Again, the operation is not recursive. But what if we want it to be recursive? Here comes the first difference between cp and scp. With cp you normally use the -R command line switch (GNU cp accepts -r as well). With scp you use the lowercase -r command line switch. Like this:

$ scp -r ~/works/another_project/* alex@alexandersandler.net:~/tmp/

Talking about command line switches, here’s another scp command line switch that I use a lot. -C tells scp to compress the data before sending it. Depending on the content of the files you’re transferring, this can make scp much faster. Here’s an example that demonstrates how to use it.

$ scp -C -r ~/works/last_project/* john@alexandersandler.net:~/works/last_project/

As I already mentioned, scp uses the same security mechanism as ssh. All transferred files are encrypted, and when you use scp with a host for the first time it will, like ssh, ask you to confirm the host fingerprint. Then it will ask for a password, exactly as ssh does.

Part 2. Encryption

Login without entering password?

Oh yes. Actually it’s quite simple. To make this work, we will have to create and install a so-called identity file. You may also hear it referred to as a public/private key or a certificate. Don’t let these terms scare you. It is very simple. Keep reading.

How modern cryptography works

What are keys

Computers use a pair of keys to encrypt the messages they send to each other. Each key is a sequence of numbers. It can contain thousands of numbers, or just a few of them.

These keys are tricky. Once you encrypt something with one key, you can only decrypt it with the other key of the pair. I.e. if you encrypt a message with key A, you can only decrypt it with key B. The keys are interchangeable in this respect: if you encrypt with key A, you can decrypt with key B, and if you encrypt with key B, you can decrypt with key A.

I won’t get into the details of how keys are generated and how computers encrypt messages with them. There are several ways of doing this, and they usually involve complex mathematical transformations. In case you’re still curious, remember those RSA and DSA things? These are algorithms that computers use to generate keys and encrypt messages using those keys. You can find more information about each of them on Wikipedia.
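If you’d like to see the "encrypt with one key, decrypt with the other" trick with your own eyes, here is a toy RSA-style sketch in Python. The numbers are tiny textbook values that I picked for the demonstration, and the names MODULUS, KEY_A, KEY_B and apply_key are mine. Real keys are hundreds of digits long and real software adds padding on top, so treat this strictly as an illustration.

```python
# Toy key pair built from the tiny primes 61 and 53. Real keys are far longer.
MODULUS = 61 * 53        # 3233, shared by both keys of the pair
KEY_A = 17               # one key of the pair
KEY_B = 2753             # its partner: (17 * 2753) % ((61 - 1) * (53 - 1)) == 1


def apply_key(number, key):
    """Encrypt or decrypt `number` (0 <= number < MODULUS) with `key`."""
    return pow(number, key, MODULUS)


message = 65
locked_with_a = apply_key(message, KEY_A)
locked_with_b = apply_key(message, KEY_B)

print(apply_key(locked_with_a, KEY_B) == message)   # True: A locks, B unlocks
print(apply_key(locked_with_b, KEY_A) == message)   # True: B locks, A unlocks
print(apply_key(locked_with_a, KEY_A) == message)   # False: A cannot undo A
```

Note that the same function does both the locking and the unlocking. Which key you call public and which private is only a matter of which one you give away.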

Public/private key

Remember that we need two keys to encrypt and decrypt messages. In computers, one of them is usually called the private key and the other the public key. This is because a computer sends its public key to whoever asks for it, making it publicly available. The private key, on the other hand, is kept secret. But why do computers give public keys away? Doesn’t it negate the whole idea of secrecy?

How actual encryption works

Remember the villain I mentioned? In addition to replacing the computer we’re connecting to, the villain can do two more things to steal information from us. First, the villain can try to decrypt the messages we send to another computer. Second, even if he is unable to decrypt the messages, he may try to intercept them and replace the data with his own. This can cause the remote computer to expose valuable information.

Let’s see how key cryptography prevents these two problems. Let’s say we have two computers, computer A and computer B. Each one of them has a pair of keys: a public key and a private key.

A secure session between them begins with a key exchange. Each computer sends its public key to the other computer. Then, to send private information from computer A to computer B, computer A does this:

1. Encrypts the message with his own private key.
2. Encrypts the result of step 1, with B’s public key.

The 2nd step makes sure that even if the villain intercepts the message, he won’t be able to decrypt it because he doesn’t have B’s private key. The message was encrypted with B’s public key, so decrypting it requires B’s private key.

The 1st step makes sure that the villain won’t be able to replace the message with his own. He can get B’s public key because, well, it is public, but he cannot encrypt a message with A’s private key because it is private to A. Computer B, on the other hand, has A’s public key, so it can decrypt messages encrypted with A’s private key.
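The two steps can be played out with toy numbers. The Python sketch below (it needs Python 3.8 or newer for the modular inverse) is only a model of the scheme described above, with made-up tiny primes and my own function names. It is not what a production protocol does byte for byte, and real keys are vastly larger.

```python
def make_keys(p, q, public_exponent):
    """Return (modulus, public_key, private_key) for a toy RSA-style pair."""
    modulus = p * q
    # Modular inverse of the public exponent (pow with -1 needs Python 3.8+).
    private_exponent = pow(public_exponent, -1, (p - 1) * (q - 1))
    return modulus, public_exponent, private_exponent


# Hypothetical toy keys. B's modulus is larger so that step 1's result fits in it.
N_A, PUB_A, PRIV_A = make_keys(61, 53, 17)
N_B, PUB_B, PRIV_B = make_keys(89, 97, 5)


def send_from_a(message):
    step1 = pow(message, PRIV_A, N_A)   # step 1: only A can produce this
    return pow(step1, PUB_B, N_B)       # step 2: only B can undo this


def receive_at_b(ciphertext):
    step1 = pow(ciphertext, PRIV_B, N_B)   # undo step 2 with B's private key
    return pow(step1, PUB_A, N_A)          # undo step 1 with A's public key


print(receive_at_b(send_from_a(65)))            # 65: the message arrives intact

# The villain knows B's public key but not A's private key, so he skips step 1.
forged = pow(65, PUB_B, N_B)
print(receive_at_b(forged) == 65)               # False: B gets garbage instead
```

The last two lines are the interesting part: a message that did not go through A’s private key does not survive B’s decoding.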

This scheme is indeed amazing, and it is what makes the Internet secure. But let’s get back to SSH.

SSH identities

In SSH, a pair of public/private keys is called an identity. The keys are usually kept in two files. The first file contains only the public key. The second file contains both the public and the private key. In case you’re wondering why SSH developers have chosen to keep the private and the public key together, it is just more convenient this way. We send the public key away, so we’d better have it separate and ready for use. The private key, on the other hand, is a secret key, and it is useless without the public key. So why not put the public key into the private key file as well, so that we can restore the public key from the private key file in the future? And what if we want to copy the private key? Handling one file is a lot easier.

How to generate identity files

The OpenSSH suite comes with a program named ssh-keygen. It generates identity files. This is how you run it.

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/alex/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/alex/.ssh/id_rsa.
Your public key has been saved in /home/alex/.ssh/id_rsa.pub.
The key fingerprint is:
77:d1:f8:78:31:03:ec:d9:e8:55:58:9a:d7:b2:4d:ef alex@alexandersandler.net
The key's randomart image is:
+--[ RSA 2048]----+
|           ..  o.|
|            .+.oo|
|           .o+O.+|
|            ++oX.|
|        S ..o.+ o|
|         . ... . |
|                E|
|                 |
|                 |
+-----------------+

As with the known_hosts file, identity files are kept in the .ssh directory of your home directory (/home/alex/.ssh in my case). The names of the files are id_rsa and id_rsa.pub for the private key and public key files respectively. These files hold an RSA identity. For DSA, the file names are id_dsa and id_dsa.pub. If you have a slightly older version of the OpenSSH suite, you may have files named identity and identity.pub. These are SSH protocol version 1 identity files.

Note that while generating identity files, we are asked for a password (ssh-keygen calls it a passphrase). This allows us to protect the session to a remote computer with both an identity file and a password. We’ll talk about this in a minute. Now, let’s see how to use your identity files.

How to install identity file

Before I explain how to install identity files on a remote computer, let me say a few words about why to do it. Once you have an identity file installed, ssh will no longer authenticate you with the remote account’s password. Instead it will use your identity files. Now, if you supplied a password when ssh-keygen asked you for one, ssh will still ask you for that password.

Using a password when generating identity files basically gives you an option to have stronger security than a regular username/password pair. This way, you’re protecting yourself with both the identity files (you’re the only one who has them) and a password.

If you specified a blank password, ssh will let you in without asking further questions. I.e. you can use identity files to log in to a remote computer without supplying a password, and if you’re using the same username on both computers, then even without supplying a username. Just type ssh followed by the host name and you’re in.

Note that when you’re installing identity files, you’re basically giving your public key to a remote host. Yet you keep your private key file secret and don’t give it to anyone. This is exactly the process I described in the How actual encryption works section of this article.

Now to the actual installation. Modern versions of OpenSSH come with a nifty script called ssh-copy-id. It copies your public key (id_rsa.pub, in the .ssh directory under your home directory on your current computer) to a remote computer. Let’s see it in action.

$ ssh-copy-id alex@192.168.1.1

ssh-copy-id will ask you for a password as ssh would, but once you’ve entered it, you will no longer have to do so again (unless your identity files are protected with a password). Note that ssh-copy-id has some requirements. When running it, you should already have identity files under your user account. Also, your identity shouldn’t already be installed on the host you’re connecting to for the user you’re connecting as. You can install your identity files under several different user accounts on the remote computer.

There is something that may not be entirely clear to you just yet. I am talking about the correlation between identity files and user accounts on both the local and the remote computer. To understand it, we should learn how to install identity files manually.

Installing identity files manually

ssh-copy-id may not be installed on your system. Or it may produce an error message, and we wouldn’t know where it came from. To address these issues we have to understand the identity file installation process.

Actually it’s quite simple. First of all, you have to understand that identity files are located in your user’s home directory. This means that even if you’ve installed alex’s identity on some remote computer, you won’t be able to log in to that computer when locally you’re logged in as john. This is because when ssh tries to authenticate you on a remote system, it looks for identity files (the private key) in your home directory.

Second, ssh keeps installed identities (public keys) in the authorized_keys file inside the .ssh directory of a home directory on the remote computer. Again, this is the home directory of the user account you’re trying to log into, and if you try logging into a different account, ssh will ignore the identities you’ve installed originally.

Identity file installation involves one simple step: copying the content of the id_rsa.pub (or id_dsa.pub) file on the local computer into the authorized_keys file on the remote computer. The content of a public key file is usually a single line of text. You should copy it as such. If you add a line break in the middle, ssh will not recognize this identity. Moreover, it may break other identities, so be very careful when modifying the authorized_keys file. Luckily, if you make a mistake, you can always fix it later.
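Since most of the trouble comes from broken lines, here is a careful version of that one step as a Python sketch. It is meant to run on the remote computer, under the account you want to log into. The function name install_public_key is made up, and the restrictive permissions are my own precaution (sshd is commonly configured to ignore key files that other users can write to), not something the steps above require.

```python
import os


def install_public_key(public_key_line, home_dir):
    """Append one public key to <home_dir>/.ssh/authorized_keys.

    Returns the path of the authorized_keys file.
    """
    key = public_key_line.strip()
    if not key or "\n" in key or "\r" in key:
        raise ValueError("the public key must be exactly one line of text")

    ssh_dir = os.path.join(home_dir, ".ssh")
    os.makedirs(ssh_dir, mode=0o700, exist_ok=True)
    path = os.path.join(ssh_dir, "authorized_keys")

    existing = ""
    if os.path.exists(path):
        with open(path) as handle:
            existing = handle.read()

    if key not in existing.splitlines():
        with open(path, "a") as handle:
            # Never glue the new key onto an unterminated last line.
            if existing and not existing.endswith("\n"):
                handle.write("\n")
            handle.write(key + "\n")

    os.chmod(path, 0o600)   # keep the file private to its owner
    return path
```

You would call it with the single line from id_rsa.pub and the home directory of the remote account, for example install_public_key(key_line, os.path.expanduser("~")). Calling it twice with the same key does not add a duplicate.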

Let’s have a look at a session that demonstrates the entire process of generating identity files and installing them on a remote computer manually. I am logged into a computer named alexandersandler.net as alex and am trying to log into 192.168.1.1, again as alex.

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/alex/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/alex/.ssh/id_rsa.
Your public key has been saved in /home/alex/.ssh/id_rsa.pub.
The key fingerprint is:
03:e3:7f:03:fa:e9:c6:01:85:12:f7:a4:38:36:19:cd alex@alexandersandler.net
The key's randomart image is:
+--[ RSA 2048]----+
|      .....      |
|      .+.o       |
|      +.E o      |
|     . * o .     |
|      B S .      |
|     . = * .     |
|      . + *      |
|       . o .     |
|                 |
+-----------------+
$ cat ~/.ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqgfxOyV0SiQrF+7qq9lOjPvJsacWagHo3LDnv
5n/ZWnZzvTXHk/gZNL2VoUqnaEuf4P/9apepvIVlLrwoUt6x2goGnErvchhn2Tf/MoHNHQ0px
10EYxYfcFfyRs1w/8i/uM1ySnnTv+fbjdKSFMJeqYKhsTeY06p2f7i+QpJVOMQ68ccaY10wj0
fP4wS6AR/6jXfCWeiOtRWZiZ1amf+w1HPIYxN5iLhDpcEK07eC/0GhBnqOcWgi9okHDxEY0nP
bKjsmnA7Lg4yBNCVbDIAx/zdMADTKtskH9gOrX+NJmLQSx4NEq802s6FP1YazaInhDQ9syQ2t
+HihmQPwCKETw== alex@alexandersandler.net

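This is our public key, or identity. (It is wrapped across several lines here only to fit the page. The actual key is a single line.) Let’s copy the key to the clipboard, connect to the remote computer (192.168.1.1 in our case) and install the identity.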

$ ssh alex@192.168.1.1
The authenticity of host '192.168.1.1 (192.168.1.1)' can't be established.
RSA key fingerprint is 96:72:48:4f:69:70:45:b2:39:3d:55:75:78:52:ce:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.1' (RSA) to the list of known hosts.
alex@192.168.1.1's password:
Linux 192.168.1.1 2.6.24.3 #1 SMP Thu Apr 10 11:20:13 EDT 2008 x86_64

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
You have new mail.
Last login: Mon Mar  9 10:16:50 2009 from alexandersandler.net

Note that this is the first time I am connecting to this computer, so I was asked to confirm the host fingerprint and asked for a password. Well, the truth is that this is not the first time I am connecting to this computer. I just made things look like it is the first time.

alex@localhost:~$ echo 'ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAqgfxOyV0SiQrF
+7qq9lOjPvJsacWagHo3LDnv5n/ZWnZzvTXHk/gZNL2VoUqnaEuf4P/9apepvIVlLrwoUt6x2
goGnErvchhn2Tf/MoHNHQ0px10EYxYfcFfyRs1w/8i/uM1ySnnTv+fbjdKSFMJeqYKhsTeY06
p2f7i+QpJVOMQ68ccaY10wj0fP4wS6AR/6jXfCWeiOtRWZiZ1amf+w1HPIYxN5iLhDpcEK07e
C/0GhBnqOcWgi9okHDxEY0nPbKjsmnA7Lg4yBNCVbDIAx/zdMADTKtskH9gOrX+NJmLQSx4NE
q802s6FP1YazaInhDQ9syQ2t+HihmQPwCKETw== alex@alexandersandler.net'
>> ~/.ssh/authorized_keys
alex@localhost:~$ exit

Now we have the key installed and it is time to see it in action. Let’s try connecting to the machine again.

$ ssh alex@192.168.1.1
Linux 192.168.1.1 2.6.24.3 #1 SMP Thu Apr 10 11:20:13 EDT 2008 x86_64

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

To access official Ubuntu documentation, please visit:
http://help.ubuntu.com/
You have new mail.
Last login: Mon Mar  9 10:16:50 2009 from alexandersandler.net
alex@localhost:~$

Note that ssh didn’t ask us to provide a password. It works!

Part 3. Advanced SSH

X forwarding

Introduction

You may or may not already know this, but X Windows server (the one that gives you a graphical user interface in Linux) allows you to present a UI for a program that runs on a remote computer. Actually, this is the reason why it is called X server. X Windows server is a server because it gives programs a way to present their user interfaces. X server serves programs. The actual programs, on the other hand, are X Windows clients. They use X Windows server to present themselves on screen.

You may be wondering what this is useful for, and I won’t blame you. The truth is that it becomes exceptionally handy in certain situations. Imagine yourself connecting to a certain computer using telnet or SSH. As you know, both telnet and SSH allow you to run a textual shell. Sometimes it suffices, sometimes it doesn’t. What if a text-only shell is not enough?

This is when the server function of the X Windows server comes in handy. If you have an X Windows server on the computer that you work at directly, you can tell programs on a remote computer to use the X Windows server on your local computer and present themselves there. I.e. you type xterm inside your telnet or SSH session while connected to a remote computer, and an xterm window appears right in front of you, on your local computer, even though the actual program runs on a remote computer miles away. Every command you type in that xterm is executed on the remote computer, but you work with it as if it were running on your computer.

Remote X Windows server configuration

You don’t want to allow just any program to present itself on your computer. If access to your X Windows server were completely open, someone could wait for the moment you run firefox and start a firefox of his own, on his own computer, but on your screen. This could make you think that you’re working with a program that runs locally, while it actually runs on a remote computer. When something like this happens, every piece of information you type into your browser is available to the villain, including your browsing history and even your passwords.

Obviously, access to the X Windows server has to be closed by default, and it is. Formerly, two programs controlled who can use your X Windows server: xauth and xhost. There are, however, two problems with them. First of all, using them is inconvenient. To present a remote application on your local X Windows server you had to run two commands with quite complex syntax every session. You could make the configuration persistent, but then you had to do it yourself. It was not done for you automatically.

Another problem is lack of security. Programs talk to the X Windows server over the X11 protocol (XDMCP, a related protocol, handles remote graphical logins). Neither of them is encrypted. As a result, someone who has access to your traffic could watch you browsing the internet.

How SSH fixes the situation

If you run ssh with the -Y command line switch, ssh will automatically configure X Windows forwarding to your local X Windows server. This means that when you run ssh with the -Y switch, every program with a UI that you start in that session will present itself on your local computer’s X Windows server rather than on the remote computer, and its traffic will travel inside the encrypted SSH connection. No need to run xhost and xauth. ssh does this for you.

What if -Y doesn’t work

Both the SSH client and the SSH server have an option to enable or disable X forwarding. Moreover, by default OpenSSH ships with X forwarding disabled, meaning that you won’t be able to use the -Y command line switch out of the box. Luckily, most Linux distributors enable this option in their OpenSSH configuration files. However, there is a slight chance that your Linux distribution has strict security settings and keeps X forwarding disabled.

If this is the case, we will have to enable X forwarding. To see how, read the next section of this article.

Configuration

In this section, I would like to cover a few of the most useful OpenSSH configuration options.

Allowing login as root

Perhaps one of the first things that I needed from SSH was to let me log in as root. Before you allow this, bear in mind that it is bad practice. Working as root is generally a bad idea, even if you are developing drivers for Linux, something that very much needs root access.

To allow root access, we have to tell the SSH server to accept such connections. To do this, we have to modify the SSH server’s configuration file and tell sshd (that’s the name of the SSH server program) to reread its configuration. The SSH server configuration file is /etc/ssh/sshd_config. Note that to modify the file you will need root access.

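The option that controls root access is called PermitRootLogin. To allow root login, make sure the file contains the line:

PermitRootLogin yes

If the file already has a PermitRootLogin line, edit that line rather than appending a second one at the end of the file, because sshd uses the first value it finds for each option.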
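If you prefer to script this change, the Python sketch below shows one way to do it. It rewrites an existing line in place and only adds a new line when none exists, since sshd takes the first value it finds for a keyword. The function name set_sshd_option and the sample text are made up. The sketch ignores Match blocks, Include files and the keyword=value spelling, and it works on text only, so writing the result back to /etc/ssh/sshd_config is left to you.

```python
def set_sshd_option(config_text, keyword, value):
    """Return config_text with `keyword` set to `value` exactly once."""
    new_line = "{} {}".format(keyword, value)
    lines = config_text.splitlines()
    for index, line in enumerate(lines):
        fields = line.split()
        # Keywords are case-insensitive; commented-out lines start with '#'.
        if fields and fields[0].lower() == keyword.lower():
            lines[index] = new_line
            break
    else:
        # Not present yet: put it first so it is the value sshd sees first.
        lines.insert(0, new_line)
    return "\n".join(lines) + "\n"


# Made-up sample configuration, for illustration only.
sample = "Port 22\n#PermitRootLogin prohibit-password\nPermitRootLogin no\n"
print(set_sshd_option(sample, "PermitRootLogin", "yes"))
```

In the sample, the commented-out line is left alone and the active PermitRootLogin no line becomes PermitRootLogin yes.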

Once done, we have to tell sshd to reload its configuration. Strictly speaking this is distribution dependent, but the command you have to run most likely looks like one of these:

/etc/init.d/sshd reload

/etc/init.d/ssh reload

The first variant works on OpenSuSE. The second works on Ubuntu.

X forwarding

We’ve seen how to enable X Windows forwarding with the -Y command line switch. But how about doing this automatically? This is doable. Another thing that you may want to do is disable X forwarding. Again, this is doable.

To enable X forwarding automatically, you have to modify the ssh client’s configuration file, /etc/ssh/ssh_config, and add the ForwardX11 and ForwardX11Trusted directives. Like this:

ForwardX11 yes
ForwardX11Trusted yes

No need to reload anything. Once you save the file, ssh will behave as if -Y were given on every new connection.

On the other hand, to prohibit clients from using -Y, we should remove the X11Forwarding directive from /etc/ssh/sshd_config on the server (or set it to no) and reload sshd. OpenSSH disables this option by default, but many Linux distributors enable it. So, if X forwarding works for you, this means that there’s an “X11Forwarding yes” line somewhere in /etc/ssh/sshd_config.

Other options

There are very nice manual pages for both the /etc/ssh/ssh_config and /etc/ssh/sshd_config files, named ssh_config and sshd_config respectively. Both manual pages document all the ssh and sshd options that are available.

Conclusion

I hope you’ve found this article useful. In case you have further questions, don’t hesitate to send them to me. My email is alexander.sandler@gmail.com.

32bit vs 64bit computers, the QA

Alexander Sandler — Tue, 26 Aug 2008 10:05:00 +0000

Introduction

Many wonder what the real difference between 64-bit and 32-bit computers is. Is paying a little extra for 64-bit support really worth it?

What does 64-bit support mean?

This can mean many things, actually. This article is dedicated to the recent (relatively, of course) addition of 64-bit support to AMD’s and Intel’s microprocessors. Processors supporting 64-bit calculations have existed for many years. However, these were industry-class processors, very expensive and powerful. A couple of years ago, first AMD and then Intel began selling 64-bit processors designed with the home user in mind. This historical change is the one I would like to review in this article.

What is the difference between 32-bit and 64-bit anyway?

Processors with 64-bit support are those that natively support operations on 64-bit numbers (you need lots of bits to accommodate large numbers). You may be a little confused by the fact that the calculators built into both Linux and Windows easily sum nearly any pair of numbers, including a pair of very big numbers. The thing is that 32-bit CPUs only simulate calculations with large numbers.

On a 32-bit computer, summing two 64-bit numbers takes several times longer than summing two 32-bit numbers, because the CPU has to process each number in 32-bit pieces. On a 64-bit computer, on the other hand, summing two 64-bit numbers takes the same time as summing two 32-bit numbers.
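To give a feeling for what "simulating" a 64-bit sum means, here is a Python sketch that adds two 64-bit numbers while only ever adding 32-bit halves, the way a 32-bit processor has to. Python itself has no such limit, so the masking is there purely to imitate 32-bit registers. The function name is made up.

```python
MASK32 = 0xFFFFFFFF   # the largest value a 32-bit register can hold


def add64_with_32bit_steps(a, b):
    """Add two 64-bit numbers using only 32-bit pieces plus a carry."""
    a_low, a_high = a & MASK32, (a >> 32) & MASK32
    b_low, b_high = b & MASK32, (b >> 32) & MASK32

    low = a_low + b_low
    carry = low >> 32                  # 1 if the low halves overflowed
    low &= MASK32

    high = (a_high + b_high + carry) & MASK32   # overflow past 64 bits is dropped
    return (high << 32) | low


print(add64_with_32bit_steps(0xFFFFFFFF, 1) == 0x100000000)            # True
print(add64_with_32bit_steps(123456789012, 987654321098))              # 1111111110110
```

Splitting, adding the low halves, carrying and adding the high halves is several operations where a 64-bit processor needs one.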

General purpose registers

Another, less obvious, difference between the two is that processors with 64-bit support have 8 additional general purpose registers. These are small pieces of memory that are built into the processor itself and help it do its job. The importance of this latter addition is often overlooked, yet this is the change that has brought a truly significant performance boost to 64-bit processors over their 32-bit predecessors.

What are additional registers good for?

Actually, the CPU does most of its operations using registers. Registers work as fast as the CPU itself, while RAM is much slower. So in terms of performance, it is better to do as many calculations as possible using registers. The problem, however, is that registers are very expensive to have.

Intel’s Itanium processors have 128 general purpose registers. Itaniums are industry-class processors and are very expensive.

8 additional registers is a significant addition that lets the CPU speed up some of its operations by a large margin.

So how good is 64-bit really?

With the 64-bit extension you get a performance boost of roughly 10% for free. This number differs from system to system, of course, but I think we can presume it is pretty much the average.

What about OS and software?

When you have some neat feature in your CPU, you need software that uses it. This is why there are 32-bit operating systems and 64-bit operating systems. The former know nothing about 64-bit calculations and the additional registers, while the latter use both of them all the time.

The same goes for regular software.

Does that mean that paying extra money for 64-bit Windows XP/Vista is worth it?

It’s a bit complex. You cannot run 64-bit software on a 32-bit operating system. So if you plan on running anything that uses 64-bit, you will need a 64-bit operating system.

Still, for most users the answer would be negative. The problem is that most software for Windows is 32-bit. 64-bit versions are rarely available. You can run 32-bit software on a 64-bit computer with a 64-bit operating system, but buying a 64-bit operating system is often useless because you may end up using 32-bit software all the time.

update: As Mr. rasmasyean kindly noted in the comments below, the amount of physical RAM that you plan to have in your computer is another consideration to take into account. 32-bit versions of both Windows XP and Windows Vista limit the amount of supported physical RAM to 4GB. In case you need more RAM, you will need a 64-bit version of Windows.
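The 4GB figure is not arbitrary. It is simply the number of different byte addresses that fit into 32 bits. Here is the arithmetic as a tiny Python sketch (the names are mine). It only shows where the number comes from. What a particular Windows edition actually supports is also a matter of product decisions.

```python
GIB = 2 ** 30   # bytes in one gigabyte, counted in binary units


def addressable_gib(address_bits):
    """How many gigabytes can be addressed with this many address bits."""
    return 2 ** address_bits // GIB


print(addressable_gib(32))   # 4
print(addressable_gib(64))   # 17179869184
```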

How about Linux?

Linux is a completely different story. For most software there is a 64-bit version. Even if there is none, you can try to compile it yourself, which is not that complicated. And there is no difference in price between a 64-bit and a 32-bit operating system. So go ahead, grab yourself a 64-bit Linux and enjoy its improved performance.

Swap vs. no swap

Alexander Sandler — Thu, 21 Aug 2008 09:43:25 +0000

Introduction

This short article deals with a simple question: how exactly does the lack of a swap partition affect Linux’s performance? What would happen if you turned the swap off?

The obvious

Memory leaks

The obvious price one has to pay for turning the swap off is the lack of so-called virtual memory. The term virtual memory refers to several different things, really. What I mean is that when a program leaks memory, the memory it has leaked remains allocated but unused for as long as the program is running.

When swap is on, by contrast, the operating system copies memory that is not in use to the swap partition, thereby freeing actual memory. So when one of the programs leaks memory, the memory being lost by the program finds its way to the hard drive instead of continuing to take up precious RAM.

I have never heard of a program that claims to have no memory leaks. Every program loses memory. It is a matter of time until it runs out of memory, and then the program will crash. Swap is not an ultimate solution for this problem because swap space can run out too, yet it significantly prolongs the life of the individual processes and of the overall system.

Hibernation

It is referred to as hibernation, and perhaps by a few other names. The bottom line is that this is a method to turn the computer off and on quickly.

Instead of shutting down all services and programs and then initializing them again, the operating system suspends execution of all processes, writes all used memory to swap space and then turns off the computer.

When you turn your computer back on, instead of starting a brand new session from scratch, the operating system restores memory from swap and resumes all programs that were running, without reinitializing them.

By the way, one of the reasons why the recommended swap space size is twice the size of the RAM in the computer is to let the operating system fit all your RAM into the swap partition, to allow hibernation of course.

Obviously, when you don’t have a swap partition, you cannot hibernate anymore.

The less obvious

I/O cache

Linux uses all available memory as an I/O cache, that is, to keep portions of the hard disk contents in memory and serve them straight from there, avoiding relatively long disk access. This is why, for instance, when you copy several large files from one place to another several times in a row, the second copy takes significantly less time than the first. This is also the reason why, in Linux, it takes a second to write a megabyte to a floppy disk, a device known for its slowness: the data is written to memory and not to the actual disk.

Also, disks are optimized to read and write large chunks of data. The average program, on the contrary, reads and writes small chunks of data. To speed up the process, Linux merges several small read and write requests into bigger ones, thus increasing disk performance. This, however, requires memory.

What many people don’t realize is that Linux uses all available memory and always tries to free even more memory for caching. When it sees a piece of memory that some program has claimed but doesn’t use very often, it copies it to the swap partition and uses the actual memory for caching. Once the program tries to do something with this piece of memory, Linux detects this and reads its contents from the swap partition back into real RAM.

By the way, this is the reason why a program that you minimize and restore after a while takes a few seconds to become fully functional: Linux has moved pieces of its memory to the swap partition, and now this memory has to be restored. Reading data from disk takes more time than accessing memory, hence the delay you see when you restore the program. You can change how hard the operating system tries to free memory for the cache by changing the value in /proc/sys/vm/swappiness. The value varies between 0 and 100. A value of 100 means that the operating system will not wait long before moving memory to swap. Note that whatever swappiness value you have, Linux tries to keep your system stable and will move pieces of memory to swap if it thinks it is necessary.
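
To make the swappiness knob a bit more tangible, here is a minimal sketch that reads the current value and prints a rough interpretation of it. The helper name describe_swappiness and the thresholds 10 and 90 are my own assumptions for illustration, the kernel itself draws no such lines.

```shell
#!/bin/sh
# Sketch: report the current swappiness and describe roughly what it means.
# describe_swappiness is a hypothetical helper, not a standard tool, and
# the thresholds below are arbitrary illustration values.

describe_swappiness() {
    value=$1
    if [ "$value" -le 10 ]; then
        echo "swappiness=$value: the kernel avoids swapping as long as it can"
    elif [ "$value" -ge 90 ]; then
        echo "swappiness=$value: the kernel swaps idle memory out eagerly"
    else
        echo "swappiness=$value: a middle ground between cache and program memory"
    fi
}

# The file exists on Linux only; skip quietly elsewhere.
if [ -r /proc/sys/vm/swappiness ]; then
    describe_swappiness "$(cat /proc/sys/vm/swappiness)"
fi

# To change the value until the next reboot (needs root):
#   echo 10 > /proc/sys/vm/swappiness
```

A value written this way is lost on reboot, so it is a safe way to experiment before making anything permanent.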

The real issue here is that this mechanism has to be well balanced. Linux is optimized to use all available memory for caching. When it doesn’t have enough memory, the time it takes to access data on disk rises! Since all programs leak memory, over time Linux will have less and less memory for caching, and disk access speed will drop.

Bottom line

Bottom line is that, without swap:

Your system will be less stable.
Your system will not be able to hibernate.
Disk access speed in your system will be slower compared to a system that has swap partition. Moreover, disk access speed will drop in the course of time.

In case you have further questions, please email me to alexander.sandler@gmail.com.

Few problems that you may encounter when booting Linux

Alexander Sandler — Sat, 26 Jul 2008 10:08:30 +0000

Introduction
Understanding the kernel installation
Kernel binary
Kernel version number
Kernel modules
Linux boot process explained
Initial ram-disk
Mounting root file-system
init and services
init levels
Services
Booting Linux with broken init scripts
Conclusion

Introduction

I thought I’d write about a few things that I do with the kernel when administering my Linux machines. There are plenty of “how to compile your kernel” guides out there, and I won’t write another one. Instead, I want to mention a few things that those guides often neglect.

Let me give you an example. One very common situation is when you try to boot your newly compiled kernel and it won’t boot. Or something went wrong and your network card isn’t working anymore. Are you familiar with these problems? If so, this article is for you.

Understanding the kernel installation

Let’s start with a simple overview of the structure of the Linux kernel. In this section we’re not going to dig into the kernel source tree. Instead, we’ll talk a bit about the kernel in its binary form.

Kernel binary

No matter what Linux distribution you have, the Linux kernel installed on your machine can be divided into two parts. First comes the kernel binary. It is a single file that usually resides in the /boot/ directory. The name of the file is usually vmlinuz followed by a version number, for instance vmlinuz-2.6.18.2-34-default. Some distributions have a symbolic link /boot/vmlinuz pointing to the real kernel file (still somewhere in /boot/). This, however, is not a necessity.

The kernel binary file is compressed. When I think about why, I can’t think of a reasonable explanation. Today disk space is so cheap that we can safely spare a few megabytes for an uncompressed version of the kernel. I guess it’s a backward compatibility thing. Anyway, face it: the kernel binary file is compressed.

In case you’re wondering whether you can simply decompress it, the answer is no. The problem is that the compressed portion of the kernel is prepended with some uncompressed data, and the size of that uncompressed portion varies. I’ve seen a few people manage to extract an uncompressed kernel image out of a compressed one, but it’s a fairly complex process.

Instead, it is common practice to place an uncompressed version of the kernel in the /boot/ directory: a file named vmlinux followed by the kernel version, with a /boot/vmlinux symbolic link pointing to it (usually residing in the same directory).

Kernel version number

You can have several kernels on one system. Each kernel has a version number. It might be a little long; version numbers like 2.6.21-8-default are common. Modern kernels start with 2.6. You can find the latest version of the kernel on the kernel.org web-site. 2.6 is actually the major version of the latest stable kernel. Kernel developers use odd numbers to indicate development versions of the kernel and even numbers to indicate stable versions. Hence, 2.6 is stable and 2.5 is a development version.

In case you’re wondering what version of the kernel you have, you can always check with the uname -r command, like this:

# uname -r
2.6.21-8-default

Kernel modules

The other part of the kernel resides in many smaller files under the /lib/modules/ directory. These are kernel modules.

Under /lib/modules/ there is a directory per kernel installed on your system. Each sub-directory is named after its kernel version, i.e. if you’re running kernel 2.6.21-8-default, you’ll find a directory named 2.6.21-8-default in /lib/modules/.

Every kernel directory under /lib/modules/ contains several files and directories. A few of them are important to remember. For instance, build is a symbolic link pointing to the directory where the kernel has been built (usually somewhere in /usr/src, although it can be anywhere). The kernel directory contains several sub-directories that contain the actual modules (or directories that contain modules; in any case, no module resides outside of kernel). Finally, there are several files whose names start with modules. These files are used by utilities such as modprobe to locate the right modules and resolve dependencies between them.
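
The layout described above can be checked with a few lines of shell. This is only a sketch: kernel_paths is a hypothetical helper of mine, and the paths follow the naming described in this article, so your distribution may place things differently.

```shell
#!/bin/sh
# Sketch: print where the pieces of a given kernel are expected to live.
# kernel_paths is a hypothetical helper; the paths follow the layout
# described in the text and may differ on your distribution.

kernel_paths() {
    version=$1
    echo "/boot/vmlinuz-$version"
    echo "/lib/modules/$version/kernel"
    echo "/lib/modules/$version/build"
}

# Check each expected path for the running kernel.
kernel_paths "$(uname -r)" | while read -r path; do
    if [ -e "$path" ]; then
        echo "found:   $path"
    else
        echo "missing: $path"
    fi
done
```

Passing a version string other than the output of uname -r lets you inspect a kernel that is installed but not currently running.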

Linux boot process explained

Now I would like to say a few words about how Linux boots. The overall boot process involves several complicated steps that eventually lead to the kernel.

First, the kernel is loaded into memory, decompressed and started. One of the first things the kernel does when it starts is load something called initrd. You may have met this name before. I would like to explain what it is.

Initial ram-disk

In case you didn’t know, you can create a small virtual hard drive in your RAM. After all, it does not matter what medium sits beneath a file-system. As long as you can read and write it, you can create a file-system on it, and thus mount it and work with it as if it were a real hard disk.

A RAM disk is exactly this: a virtual hard disk that resides in memory. You can format it and you can mount it.

When you create one, you usually create it empty. However, ram-disks are not necessarily empty. You can create a ram-disk, write some data to it, save the ram-disk image into a file and then mount it again. This is exactly what Linux developers did with the initial ram-disk.

The initial ram-disk, or initrd for short, is a file-system in a file. When the system boots, the kernel loads it from the hard drive into RAM. In case you want to know more about initrd, read my article about its internal structure, Opening and modifying the initrd.

Mounting root file-system

This is where most fresh kernel compilations fail. The next thing the kernel does after loading initrd and running the script inside it is try to mount the real root file-system. At this moment it needs a file-system driver, a disk driver and a proper device file configuration.

You need a file-system driver because the kernel has to understand the structure of the data on the disk. File-systems are usually implemented in one or several kernel modules, mostly one. One very common exception is the ext3 file-system. It is built of two modules, ext3.ko and jbd.ko.

Having the right disk driver can be more complicated. Generic IDE controller support is now built into the kernel in the vast majority of distributions. Yet SCSI controller support may be a bit of a problem. With SCSI disks you need three modules:

scsi_mod.ko – This module contains generic SCSI support.
sd_mod.ko – This module contains SCSI disks driver.
A kernel module with the driver for your SCSI controller.

Missing any of these will cause your system to fail to boot.
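
A quick way to see whether such modules exist for a kernel is to look for their files. The sketch below does that; missing_modules is a hypothetical helper name, and it only looks at file names, so a module that is reported missing may simply be built into the kernel. The same check can be pointed at an unpacked initrd tree instead of /lib/modules.

```shell
#!/bin/sh
# Sketch: report which of the given modules have no module file under a
# directory. missing_modules is a hypothetical helper name.

missing_modules() {
    dir=$1
    shift
    for name in "$@"; do
        # Look for <name>.ko anywhere below the directory; the trailing *
        # also accepts compressed modules such as <name>.ko.gz.
        if [ -z "$(find "$dir" -name "$name.ko*" 2>/dev/null)" ]; then
            echo "$name"
        fi
    done
}

# Example: check the generic SCSI pieces for the running kernel.
missing_modules "/lib/modules/$(uname -r)" scsi_mod sd_mod
```

An empty output means that a file was found for every module name given.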

Finally, the last thing required for your system to boot is device files. Each one represents a device, as you know. In particular, SCSI disks should have device files. Each partition on a disk should have a device file as well, and most importantly, device files should point to the right devices.

A common problem occurs when you introduce a new SCSI disk into your system. Deciding how to present a new SCSI disk to your system is up to the SCSI controller and the controller driver. You may find yourself in a situation where, before a change in RAID configuration, your root file-system was on /dev/sda, but after the change it is on /dev/sdb. This often occurs when you have a RAID with sparse LUNs and you fill in the gaps between SCSI LUNs.

Different distributions use different ways to create the required device files. In some Linux distributions, mkinitrd itself creates all needed device files. In others, initrd carries the udev daemon, which creates the needed files on the go.

In any case, a lack of appropriate device files can definitely be a reason for the system to stop operating properly.

init and services

The next thing the kernel does after mounting the real root file-system is run a program named init. This program becomes the first process that runs in your system. The executable file usually resides in the /sbin/ directory. The way this program works differs from distribution to distribution. Actually, this is one of the major differences between distributions.

One thing that remains unchanged is that there’s a manual page for init that tells how it works. On some systems it runs scripts located in /etc/init.d. On other systems it runs scripts from the /etc/event.d directory. Yet it will always run some scripts.

init levels

The scripts are divided into several categories. Such division is merely a compatibility issue: older Unix systems had them, so Linux has them too. Categories don’t have names. Instead, they are numbered from 0 to 6. Each category is responsible for a certain stage of the session. By session I mean everything that happens from the moment the machine boots until the moment it is shut down. During this period, the machine is always in one of the init levels, meaning it has executed the scripts that belong to that level.

For instance, normally you work in either init level 3 or level 5. Level 3 stands for a multi-user environment that supports networking, meaning that at level 3 the system has run all scripts needed to allow several users to log in and do some networking stuff. Level 5 includes everything that is in level 3 and also support for X windows. Hence when you work in a graphical environment, you are at level 5, and if you are working with the console, you are at level 3. By contrast, level 1 stands for a single-user, no-network environment. Level 0 is the halt level and level 6 is the reboot level; the system switches to one of them when it goes down.

Some levels are not in use. Different distributions use levels 2 and 4 in different manners, but most of the time they do nothing meaningful.
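
The level numbers are easier to remember with a small lookup at hand. The sketch below is my own illustration: describe_level is a hypothetical helper, and the meanings follow the RedHat-style convention used in this section, which is an assumption for other distributions.

```shell
#!/bin/sh
# Sketch: translate an init level number into the meaning described above.
# describe_level is a hypothetical helper; the meanings follow the
# RedHat-style convention and are an assumption for other distributions.

describe_level() {
    case $1 in
        0) echo "halt" ;;
        1) echo "single-user, no network" ;;
        3) echo "multi-user with networking, console" ;;
        5) echo "multi-user with networking and X" ;;
        6) echo "reboot" ;;
        2|4) echo "distribution specific" ;;
        *) echo "unknown" ;;
    esac
}

# On SysV-style systems the runlevel command prints the previous and the
# current level, for example "N 5". Not every system has it.
if command -v runlevel >/dev/null 2>&1; then
    current=$(runlevel 2>/dev/null | cut -d' ' -f2)
    echo "level $current: $(describe_level "$current")"
fi
```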

Services

By the way, in nearly all Linux distributions, init scripts start so-called services. Services are programs that run in the background and do some useful stuff, e.g. an SSH server usually runs as a service. Each service has a script that starts and stops it.

One of the things that may go wrong when booting your system is one of the service scripts (or the script that runs the service scripts). Perhaps you’ve changed something yourself. Or one of the system variables that affect the scripts may be wrong and cause a script to misbehave.

Booting Linux with broken init scripts

Luckily, there’s a simple way to boot the system even if you don’t have a rescue disk and cannot stop the boot process at level 1, before most of the init scripts are executed (in RedHat Linux descendants you can press I to stop the boot process at level 1). You can pass the init=/bin/bash argument to the kernel when booting the system. This causes the kernel to execute /bin/bash instead of init. If you do that, instead of the regular boot process you will find yourself in a shell. In this environment you probably won’t have the PATH variable set properly, so don’t be surprised when it cannot find vi. Help it by providing the full path to the executable you would like to run. You can use this limited shell to fix whatever went wrong with the scripts involved in the boot process.

Passing arguments to the kernel is easy. You can do this via the GRUB boot menu. Usually you select the configuration you would like to boot from the list and then press the ‘e’ key to edit it. Then you select the line starting with kernel, press ‘e’ again to edit it and append init=/bin/bash to the end. Then you just press enter a few times to boot the system.
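
To show what the edited line ends up looking like, here is a small sketch. with_init_override is a hypothetical helper and the sample kernel line is made up for illustration; your GRUB entry will have its own kernel file and root device. The commented commands at the end are what I would typically expect to need inside the rescue shell, they are not run by the sketch.

```shell
#!/bin/sh
# Sketch: show what the edited GRUB kernel line looks like.
# with_init_override is a hypothetical helper; the sample kernel line
# below is made up for illustration.

with_init_override() {
    line=$1
    case " $line " in
        *" init="*) echo "$line" ;;                # already overridden, keep as is
        *)          echo "$line init=/bin/bash" ;;
    esac
}

with_init_override "kernel /boot/vmlinuz-2.6.21-8-default root=/dev/sda1 ro"

# Once the shell is up, the steps usually look like this (not run here):
#   export PATH=/sbin:/bin:/usr/sbin:/usr/bin
#   mount -o remount,rw /     # the root file-system normally starts read-only
```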

Conclusion

There’s always room for surprise, of course. Also, there are so many different flavors of Linux that it is nearly impossible to describe every way to boot a faulty system. Yet I think I’ve covered most of the important and common stuff. In case you have problems, suggestions or just things to say about this article, please feel free to leave a comment or email me at alexander.sandler@gmail.com.

sed – the missing manual

Alexander Sandler — Wed, 16 Jul 2008 06:28:54 +0000

Introduction

sed is an exceptionally powerful and often overlooked tool. Its most common use is in scripts, where we want to replace the part of a string that matches a certain pattern. While this is the most common use, it is far from being its only use.

In this manual I’ll try to describe sed’s most useful features.

Why sed rules?

As with most Unix command line utilities, sed reads data from the standard input and writes the result to the standard output. One thing that I like about sed is that it doesn’t have many command line switches. Actually, I often use only one of them. We’ll talk about it a little later in this manual.

sed is special. Special because it turns some common and quite complex tasks into simple ones. For instance, what if you want to delete the last line in a file? I can think of at least four ways to achieve this, but sed does it with a single short command. And how about deleting lines 5 through 10? You can write a bash script that does the job, of course. Yet the same effect can be achieved with a single sed command.

And of course there is its most common use, which is to find and replace a pattern with some other pattern. Here it is absolutely irreplaceable. The latter is used so widely that I guess I have no choice but to start with it.

sed simple case – search and replace

Let’s say we have a file with numbers. Each number occupies a single line in the file. I’ll use this file as a sample input file in this article. Obviously, you’ll have your own input and you will do your own stuff with it, but for the sake of the demonstration, let’s assume our input file contains plain numbers. By the way, you can produce the file with

$ seq 20 > file

command. This writes the numbers from 1 to 20 into the file, one per line.

sed accepts its commands as a command line argument. It is common practice to place sed’s commands in single quotes, to tell the shell that it should not try to interpret and manipulate them.

sed commands are designated with single characters, followed by optional arguments. Let’s see a sample sed command. Since we’re talking about search and replace, this is the command we’ll see.

$ cat file | sed 's/<search pattern>/<replace pattern>/[optional switches]'

The search and replace command is in single quotes. The search pattern is a regular expression, meaning that . will be interpreted as any character, etc. The replace pattern is plain text, with a few special sequences that we will meet later. You may omit the optional switches at the end. Let’s see some examples of sed’s search and replace in action.

$ cat file | sed 's/1/x/'

This is perhaps the simplest case of search and replace. This particular command replaces the digit 1 with the character x. You’re right, not very handy on its own. Let’s look more closely at what it does to our file.

$ cat file | sed 's/1/x/'

This command replaces 1 with x in every line that contains it, affecting the number 1 and all numbers between 10 and 19. Well, this is not exactly true. This command changes 11 into x1, instead of changing it into xx. This is wrong, isn’t it?

The truth is that unless explicitly told to replace all occurrences of the search pattern with the replace pattern, sed replaces only the first occurrence in each line, i.e. 11 turns into x1 instead of xx. This is where the optional switches come in handy. You can use the g switch to tell sed to replace all occurrences of the search pattern in each line. This is how:

$ cat file | sed 's/1/x/g'

Unlike the previous command, this one turns 11 into xx.
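
The difference is easiest to see on a single value. A minimal sketch, feeding the number 11 to both forms of the command:

```shell
#!/bin/sh
# Sketch: the same substitution with and without the g switch.
first_only=$(echo 11 | sed 's/1/x/')    # only the first 1 is replaced
every_one=$(echo 11 | sed 's/1/x/g')    # every 1 in the line is replaced

echo "without g: $first_only"   # without g: x1
echo "with g:    $every_one"    # with g:    xx
```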

Now let’s see a few search and replace commands that involve regular expressions. sed supports common regular expression syntax out of the box.

Search and replace with regular expressions

There are plenty of resources describing regular expressions on the web and I assume that you possess some degree of knowledge in regular expression syntax. Yet, here is a wonderful resource on regular expressions that I myself often use.

$ cat file | sed 's/^1./-/'

This command replaces every two-digit number starting with 1 with a dash character, i.e. it replaces the numbers 10-19 with ‘-’. The ‘^’ character at the front of the search pattern indicates that the following characters must be the first characters in the line of text.

Here’s another, more complex example.

$ cat file | sed 's/^\(.\)$/1\1/'

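This example is more interesting because it demonstrates one of the more powerful features of regular expressions, grouping. The part of the search pattern between \( and \) forms a group, and \1 in the replace pattern stands for whatever the group matched. The sample sed command above places 1 before all one-character-long numbers, i.e. the numbers 1-9 turn into 11-19.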
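
Here is a minimal sketch of the same grouping command on a two-line input of my own, one line that matches the pattern and one that does not:

```shell
#!/bin/sh
# Sketch: \( \) captures the single character, \1 puts it back.
# The line "7" is exactly one character long, so it gets a leading 1;
# the line "12" is longer, does not match ^.$ and passes through as is.
padded=$(printf '7\n12\n' | sed 's/^\(.\)$/1\1/')
echo "$padded"
```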

sed’s other half

sed accepts several different commands. We’ve seen only one of them, but the rest are no less useful than search and replace.

In general, sed commands consist of two parts. The first part specifies which lines of input to manipulate. The second part specifies the actual manipulation. The manipulation may be the search and replace that we’ve already seen. Or it may simply delete one of the input lines or print it twice. Oh, and you can run several sed commands one after another for each input line.

It is true that search and replace is perhaps the most common sed command, but other commands, combined with the addressing features of sed, are no less useful.

Generic sed command

A generic sed command looks like this:

sed '[address] <command> [;[address] <command>]'

Note that I used square brackets for the address. That is because the address is optional (square brackets are a common designation of an optional argument). When we did search and replace, we didn’t specify any address.

You can specify several sed commands one after another, delimited with semicolons. sed applies the commands to every line of input, command by command, line by line.

The two additional sed commands that I would like to explain here are ‘d’ and ‘p’, the delete and print commands. The truth is that I rarely use any other sed command. Combined with sed’s powerful addressing, these two and perhaps search and replace are the commands that you will use the most.

sed addresses

The idea behind addressing in sed is to let you specify which lines of input you want to alter. This is a handy feature, because few other command line utilities let you modify a very specific line of the input with such ease. Throughout this section, I’ll use the ‘d’ command to demonstrate the use of addresses. This command deletes the line. So if we tell sed to address only the 10th line and apply the ‘d’ command to it, that line will be missing from the output.

Simple addressing, by line number

You can specify the number of the line that you want to alter. As you remember, each sed command consists of an address followed by a command. The space between them is optional, but it helps readability. Let’s see a few examples.

$ cat file | sed '10 d'

This command deletes the 10th line of the input file, meaning that we will see all numbers but the number 10.

In case you want to address the last line of the input file, you can use the $ sign as the address. The following example prints the numbers 1 through 19.

$ cat file | sed '$ d'

Complex addressing

You can specify a range of lines. To do this, you specify the first line of the range and the last one, with a comma in between.

$ cat file | sed '1,9 d'

This prints lines 10 through 20. The next example, on the other hand, prints lines 1 to 9.

$ cat file | sed '10,$ d'
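
The three address forms seen so far can be compared side by side. In this sketch I use seq 5 instead of the sample file, only to keep the output short:

```shell
#!/bin/sh
# Sketch: line-number addresses combined with the d command.
# seq 5 stands in for the sample file so that the output stays short.
no_third=$(seq 5 | sed '3 d')      # drop the 3rd line only
no_last=$(seq 5 | sed '$ d')       # drop the last line
tail_only=$(seq 5 | sed '1,3 d')   # drop a range of lines

# Unquoted on purpose, so that each result is printed on one line.
echo $no_third    # 1 2 4 5
echo $no_last     # 1 2 3 4
echo $tail_only   # 4 5
```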

Now here’s something neat. You can also tell sed that the addressed lines start at line X and repeat every Y lines. To do this, you specify X~Y as an address (this form is a GNU sed extension). Here’s an example.

$ cat file | sed '1~2 d'

This prints the even numbers of the 1-20 range. The following command, on the contrary, prints only the odd numbers.

$ cat file | sed '0~2 d'
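
A short sketch of the step address on a smaller input. It assumes GNU sed, since other sed implementations may not understand the X~Y form:

```shell
#!/bin/sh
# Sketch: first~step addresses. This form is a GNU sed extension,
# so other sed implementations may reject it.
evens=$(seq 6 | sed '1~2 d')   # delete lines 1, 3, 5
odds=$(seq 6 | sed '0~2 d')    # delete lines 2, 4, 6

# Unquoted on purpose, so that each result is printed on one line.
echo $evens   # 2 4 6
echo $odds    # 1 3 5
```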

Finally, we can specify addresses using regular expressions. You do this by placing a regular expression between two slash characters.

$ cat file | sed '/^1./ d'

This command deletes all lines that start with 1 followed by at least one more character, which in our file means the numbers 10-19. I.e. it prints the numbers 1-9 and 20. Here’s another regular expression example.

$ cat file | sed '// d'

Putting it all together

The last command that I would like to demonstrate is the ‘p’ command. As you could have guessed, it prints the input line as is. There’s one caveat in using it. sed prints the input line (or what may be left of it) anyway, so when you tell it to print the line, the line comes out twice. It is easy to overcome this problem with the -n command line switch. Remember I mentioned a command line switch that I use from time to time? This is it.

Let’s see a few examples of the ‘p’ command in action.

$ cat file | sed 'p'

As expected, this command will simply print every line of text twice. Let’s see something more complex.

$ cat file | sed -n '1,5 p'

In this command I used the addressing feature of sed. It will print all numbers from 1 to 5. Finally, let’s try to put it all together.

$ cat file | sed -n '/^.$/ d; /^2./ d; /^1./ s/1/2/; p'

What you see here is four sed commands in a row. The first one deletes all lines of text that are one character long. This deletes all numbers from 1 to 9. The next one deletes all two-digit numbers that start with 2. This deletes the number 20. The third one looks for all two-digit numbers starting with 1 and replaces the first 1 with 2. Finally, we use the ‘p’ command to print the result. The last ‘p’ command is needed only because we used the -n command line switch. In the end it prints the numbers 20 to 29.

Here’s another, slightly different example that does exactly the same: it prints the numbers 20 through 29.

$ cat file | sed '1,9 d; $ d; s/1/2/'
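
If you want to convince yourself that two sed scripts really produce the numbers 20 through 29, you can compare their output with the output of seq. A minimal sketch, using one script with regular expression addresses and one with line addresses:

```shell
#!/bin/sh
# Sketch: check that two sed scripts print the numbers 20 through 29.
# Neither substitution uses the g switch, because s/1/2/g would also
# turn 11 into 22.
expected=$(seq 20 29)
with_regex=$(seq 20 | sed -n '/^.$/ d; /^2./ d; /^1./ s/1/2/; p')
with_lines=$(seq 20 | sed '1,9 d; $ d; s/1/2/')

if [ "$with_regex" = "$expected" ]; then echo "regex addresses: ok"; fi
if [ "$with_lines" = "$expected" ]; then echo "line addresses:  ok"; fi
```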

This concludes this introductory article on sed. I hope you found it interesting. If you have further questions, drop me an email or leave a comment here.

Opening and modifying the initrd

Alexander Sandler — Sun, 01 Jun 2008 12:26:33 +0000

Introduction

Ever wondered what’s inside of the initrd file? This article tells you how to look into the initrd and even modify it.

Few words about initrd

Linux uses the initrd, or initial ram-disk, during the boot process. The Linux kernel is very modular, as you know. While the main kernel file contains only the most needed stuff, the rest of the kernel, drivers included, resides in separate files, the kernel modules.

It would be impossible to create a single kernel binary image that suits all the hardware configurations out there. Instead, the kernel supports the initrd. initrd is a virtual file-system that contains the drivers (kernel modules) needed to boot the system. For instance, SCSI controller drivers very often reside inside the initrd. The kernel needs a SCSI controller driver to boot the operating system, but it does not include it, nor can it read it from the hard disk (you’d need a driver for the hard disk, right?). This is where the initrd comes in very handy.

The BIOS routines that read the actual kernel from the disk into RAM do the same job for initrd. When the Linux kernel boots, long before trying to mount the real root file-system, it loads initrd into memory and makes it a temporary root file-system.

See how handy this is. initrd itself requires no drivers whatsoever, because the BIOS handles all the work of loading it into memory. On the other hand, it contains all the drivers Linux needs to boot. And you can easily rebuild it without changing the kernel.

After loading initrd into RAM, the kernel runs a script named init that resides in initrd’s root directory. The script contains commands that load all required kernel modules. Only after that does Linux try to mount the real root file-system.

Few words about history

The content of the initrd file and its format have changed significantly over the last couple of years. Something like four years ago, it was common practice to create a real RAM-disk with a fixed size, format it with the ext2 file-system and write some data to it.

To look into it, you had to decompress it with gzip and then mount it using a loopback device (mount -o loop).

Today things are totally different. The kernel configuration option that set the size of initrd is gone. It wasn’t really convenient, because your system was limited to a certain initrd size. Instead, the kernel adapts itself to initrd, no matter what its size is.

Back to the real thing

Like the kernel, initrd is compressed to save disk space. Unlike the kernel, it can be easily decompressed. The tool we’ll use to decompress it is nothing fancy: gzip, the same good old gzip that we use so often.

Before we begin, it is a good idea to create a directory to work in. After all, the internal structure of initrd is quite complex and we don’t want to mix the contents of the initrd with the contents of, let’s say, your home directory. So use mkdir and cd to create our clean environment. We’ll call this directory A. To make things even cleaner, create an additional directory inside it. This is directory B, which will hold the contents of the initrd. In the end, directory A should contain the initrd file and the empty directory B.

Let’s start decompressing. Enter directory A and copy the initrd that you would like to open into it. Then rename the file so that it has a .gz extension. The thing is that initrd is a gzip-compressed archive. Since gzip refuses to decompress a file that doesn’t have a .gz extension, we have to rename the file.

Next we have to decompress the file. gzip -d does the job for us. The next step is to open up the cpio archive. Yes, a modern initrd is a cpio archive. We can do that with cpio -i < <initrd file>, but first we enter directory B, so the file name has to start with double dots to indicate that the file is in the parent directory, the A directory.

sasha@sasha-linux:~/A$ cp /boot/initrd.img-2.6.24-16-generic .
sasha@sasha-linux:~/A$ mv initrd.img-2.6.24-16-generic initrd.img-2.6.24-16-generic.gz
sasha@sasha-linux:~/A$ gzip -d initrd.img-2.6.24-16-generic.gz
sasha@sasha-linux:~/A$ ls
B/  initrd.img-2.6.24-16-generic
sasha@sasha-linux:~/A$ cd B/
sasha@sasha-linux:~/A/B$ cpio -i < ../initrd.img-2.6.24-16-generic
42155 blocks
sasha@sasha-linux:~/A/B$ ls -F
bin/  conf/  etc/  init*  lib/  modules/  sbin/  scripts/  usr/  var/
sasha@sasha-linux:~/A/B$

In this example you can see me opening the default initial RAM-disk image from my Ubuntu 8.04 installation. We can see that the initrd opened up into a nice directory tree that resembles the structure of your root directory. At the heart of the initrd structure is the init script, which does most of the job of loading the right modules when the system boots.
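
The steps from the session above can be condensed into a small shell function. This is only a sketch under a few assumptions: the image is a gzip-compressed cpio archive, as on Ubuntu 8.04, and the function name unpack_initrd and its two arguments are hypothetical. It feeds the image to gzip through standard input, so in this variant renaming the file to .gz is not needed.

```shell
#!/bin/sh
# unpack_initrd IMAGE TARGET_DIR
# Hypothetical helper: extracts a gzip-compressed cpio initrd image
# into TARGET_DIR (directory B in this article).
unpack_initrd() {
    image=$1
    target=$2
    mkdir -p "$target" || return 1
    # gzip -dc decompresses stdin to stdout, so the file name suffix
    # does not matter. cpio -i extracts the archive; -d creates
    # leading directories when they are missing.
    gzip -dc < "$image" | ( cd "$target" && cpio -id )
}

# Example use:
#   unpack_initrd /boot/initrd.img-2.6.24-16-generic B
```

The original image stays untouched, because nothing is renamed or decompressed in place.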

The content of the init script differs from distribution to distribution. The main difference is in the approach. In some distributions, developers preferred to keep as many initializations as possible out of the initrd. In other distributions, developers didn’t care that much about keeping initrd small and fast. In general, both approaches have a place under the sun. The first approach is based on the fact that initrd is a limited environment, in contrast to Linux when it’s fully loaded. Thus, when Linux is fully loaded, you can do more complex stuff with less effort. The second approach, on the other hand, sees initrd as an environment that works faster than “big” Linux, so it uses initrd’s speed to do some initializations.

Ubuntu’s initrd image is based on the first approach. It uses a shell program named busybox, a shell environment originally designed for embedded systems and known for its small memory footprint and good performance. The initrd in OpenSuSE 10.2, on the other hand, uses the bash shell, the same shell that you use regularly. This is a clear example of the second approach.

Another interesting data point is that the init script in Ubuntu 8.04 is ~200 lines long, while in OpenSuSE 10.2 it is ~1000 lines long.

Changing it

Once you have it opened up, you can see what is inside of it and even make some modifications. As I already explained, the structure of the initial RAM disk changes from distribution to distribution. However, all distributions share a few common things. For instance, regardless of the distribution and the particular initrd format, the lib/modules/ directory always contains the kernel modules that initrd loads at boot time. You may swap one module with another without anyone even noticing.

The number of modules, their names, etc. are controlled via the init script in a distribution-dependent form. Therefore, no matter what distribution of Linux you have, the init script is the key to understanding how initrd works. Understand the init script, and you will have full control over your initrd, its contents and what it does.
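
To get a first impression of an unpacked initrd, something like the following sketch may help. It is not a tool from any distribution: the function name inspect_initrd is hypothetical, and it assumes that the tree has a text init script at its top level and that module files under lib/modules/ have names containing .ko. It prints the first line of init (which names the shell that runs it), the length of the script and the number of modules.

```shell
#!/bin/sh
# inspect_initrd TREE
# Hypothetical helper: prints a short summary of an unpacked initrd
# tree (directory B in this article).
inspect_initrd() {
    tree=$1
    # The first line of a script names its interpreter, for example
    # busybox sh or bash.
    printf 'interpreter: %s\n' "$(head -n 1 "$tree/init")"
    # Number of lines in the init script.
    printf 'init lines: %s\n' "$(wc -l < "$tree/init" | tr -d ' ')"
    # Count module files; '*.ko*' also matches compressed modules.
    printf 'modules: %s\n' \
        "$(find "$tree/lib/modules" -type f -name '*.ko*' 2>/dev/null | wc -l | tr -d ' ')"
}

# Example use:
#   inspect_initrd ~/A/B
```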

Packing it back

Assume you’re done playing around with the initrd contents and you want to pack it back. Here is what you do.

First you have to pack the cpio archive. Remember the B directory we created? This is where it comes in handy. We want to keep the contents of the initrd as clean as possible. The A-B separation allows us to keep the original initrd image out of the way when packing it back.

This is how we do that. First, we should enter the B directory. From there, run the following command:

find | cpio -H newc -o > ../new_initrd_file

This will create a new initrd file named new_initrd_file inside of directory A.

Next enter directory A and pack the cpio archive with gzip. Here’s the command that should do the job.

gzip -9 new_initrd_file

This will pack the initrd in new_initrd_file into the new_initrd_file.gz archive. Finally, rename the file to whatever you want to call it. Remember that getting rid of the .gz extension is common practice, although not a necessity.

This is what a complete session looks like on Ubuntu:

sasha@sasha-linux:~$ cd A/B/
sasha@sasha-linux:~/A/B$ find | cpio -H newc -o > ../new_initrd_image
42155 blocks
sasha@sasha-linux:~/A/B$ cd ../
sasha@sasha-linux:~/A$ gzip -9 new_initrd_image
sasha@sasha-linux:~/A$ ls
B  initrd.img-2.6.24-16-generic  new_initrd_image.gz
sasha@sasha-linux:~/A$ mv new_initrd_image.gz initrd.img-2.6.24-16-generic-modified
sasha@sasha-linux:~/A$ ls
B  initrd.img-2.6.24-16-generic  initrd.img-2.6.24-16-generic-modified
sasha@sasha-linux:~/A$
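
The packing steps can also be chained into a single pipeline. Below is a minimal sketch, assuming the same newc cpio format and gzip -9 compression as in the commands above. The function name pack_initrd and its arguments are hypothetical. The output file should live outside the packed directory, otherwise find would pick it up too.

```shell
#!/bin/sh
# pack_initrd SOURCE_DIR OUTPUT_FILE
# Hypothetical helper: packs SOURCE_DIR (directory B in this article)
# into a gzip-compressed cpio image written to OUTPUT_FILE.
pack_initrd() {
    source_dir=$1
    output=$2
    # The redirection is opened before the subshell changes directory,
    # so a relative OUTPUT_FILE is resolved against the caller's
    # working directory, not against SOURCE_DIR.
    ( cd "$source_dir" && find . | cpio -H newc -o | gzip -9 ) > "$output"
}

# Example use, from directory A:
#   pack_initrd B initrd.img-2.6.24-16-generic-modified
```

Because the result is written directly under its final name, no .gz file has to be renamed afterwards.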

Booting with the new initrd

Changing initrd is always a risky business. When playing with matters of this kind, mistakes are common and it is important to stay on the safe side. Adding a new GRUB configuration entry is not such a big deal, so by all means do so when trying to boot an initrd that you brewed five minutes ago. You’ll save yourself lots of time reinstalling distributions and poking around with different rescue systems to make your system boot again.
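
For illustration, here is a rough sketch of what such an extra entry could look like with GRUB legacy (the menu.lst format that Ubuntu 8.04 uses). Everything specific in it is an assumption: the function name grub_test_entry is made up, (hd0,0) and the root device are placeholders that you have to replace with the values from your existing, working entry, and the kernel and initrd paths are only examples.

```shell
#!/bin/sh
# grub_test_entry KERNEL INITRD ROOT_DEVICE
# Hypothetical helper: prints a GRUB legacy (menu.lst) stanza that
# boots KERNEL with the modified INITRD. It only prints text; it does
# not touch any GRUB file.
grub_test_entry() {
    kernel=$1    # e.g. /boot/vmlinuz-2.6.24-16-generic
    initrd=$2    # e.g. /boot/initrd.img-2.6.24-16-generic-modified
    rootdev=$3   # placeholder, e.g. /dev/sda1; copy yours from a working entry
    cat <<EOF
title   Test entry with modified initrd
root    (hd0,0)
kernel  $kernel root=$rootdev ro
initrd  $initrd
EOF
}

# Example use:
#   grub_test_entry /boot/vmlinuz-2.6.24-16-generic \
#       /boot/initrd.img-2.6.24-16-generic-modified /dev/sda1
```

Review the printed stanza and add it next to your existing entries by hand, leaving the original entry in place so that you can always boot the untouched initrd.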

Have fun!

