
So You Want To Be A Linux System Administrator?

Posted on | November 1, 2010 | 1 Comment

I was digging through Server Fault today looking for some thoughts on CARP (think HSRP, just free) and came across an honest and interesting question. The author wanted to know what skills and experience would make him more desirable in the world of UNIX-like operating systems. I began to answer his question and found myself writing an entire anthology, so I’m bringing it to my blog instead. A bit of background for those who don’t know me: I’m someone who hires programmers and system administrators. My business card says CTO and I’m not ‘part owner of a start-up’ (anyone who doesn’t get that joke wouldn’t appreciate it anyway), so I consider myself to be somewhat authoritative on the topic.

The first thing you need to do is stop thinking and saying the word ‘Linux’. I’m not going off on an idealistic tangent; I’m advising you to realize that you are working with a UNIX-like operating system. Linux is an awesome kernel, but your experience is rooted (no pun intended) mostly in user space. For now, get the word out of your head completely. Go meditate if you think it will help, I’ll wait. When you start thinking “UNIX that doesn’t suck”, continue reading. In a nutshell, if I’m going to hire you, it’s because I know you can figure out a solution to practically any problem that you might face. If I put you in front of an operating system that you have never seen, much less used, and ask you to do something with it, I’d like to be confident that you’ll be able to do it. I’d feel even better if I knew that you would (perhaps after a small learning curve) do it as efficiently as possible. This means that if I hire you, it is because you demonstrated, in front of me, a very good working knowledge of what every UNIX-like operating system has in common – that insanity we call the ‘UNIX’ way of doing things. Notice I did not say the ‘Foo OS’ way of doing things, or the ‘Bar OS’ way of doing things. I’m talking about having your head wrapped around the way that any Linux / UNIX / Minix / HURD / L4 (leaving others out for brevity) system actually works. This leads me to my list:

1 – Don’t get married to a single distribution

I’m planning to teach a three month (somewhat intensive) class on UNIX-like operating systems starting next year. My students won’t start out with Red Hat, or Debian, or Slackware, or anything else that you have ever heard of. They will start out with something using the Linux kernel accompanied by the basic core utilities, with a compiler and tool chain. Sure, yeah, cron and sysv init, a few editors and other (bare) creature comforts will be there, but not much else. Why would I do this to people? It is the only way to give people an understanding of what everything else abstracts into oblivion. Sure, I could just toss Ubuntu at them and count how many people marvel at the fact that Facebook works on Linux, too – but what would they do when they came across FreeBSD?

I see so many people going after any certificate that ends in “certified engineer” and consider that I could purchase a very nice home on what 300 applicants spent on a single certification alone. I have fired more incompetent people who have such a certificate than competent people that I’ve hired. As a hiring manager, certs are all but worthless to me. They show me only that you invested (at least) some money in your career. I want to see evidence of time spent on your career, even if you are just getting started. If I decide to switch operating systems tomorrow, are you still of any use to me?

Learn how pipes work. Learn how files work, and how they can work in pipes. Learn how programs tie that all together. Learn the characteristics of a good and bad process. Learn how the kernel manages memory. Study how POSIX brought an end to the UNIX wars, and learn what you can demand from any (nearly) POSIX-compliant system. If you know that, you can quickly become comfortable no matter what UNIX-like OS you might be using.
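If you have never poked at a pipe directly, here is a minimal sketch in Python (my choice of language for illustration, nothing more). It assumes only the standard `os` module; the function name `pipe_roundtrip` is made up for this example. The point is that a pipe is just a kernel buffer with two file descriptors, and that the reader only sees end-of-file once the write end is closed.

```python
import os


def pipe_roundtrip(message: bytes) -> bytes:
    """Write a small message into a pipe and read it back from the other end.

    Hypothetical helper for illustration. Keep the message small: writing
    more than the pipe buffer holds would block, because nothing is reading
    while we write in this single-process demo.
    """
    read_fd, write_fd = os.pipe()  # two file descriptors backed by one kernel buffer
    os.write(write_fd, message)    # same write() call you would use on a file
    os.close(write_fd)             # closing the write end is what produces EOF

    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:              # empty read means EOF: all writers are gone
            break
        chunks.append(chunk)
    os.close(read_fd)
    return b"".join(chunks)


if __name__ == "__main__":
    print(pipe_roundtrip(b"hello from the write end\n"))
```

Forget to close the write end and the read loop hangs forever, which is roughly the bug you will eventually meet in someone’s shell script.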

2 – System administrators are programmers, too

If you spend some time coming up with a clever way to make your shell do your work for you, congratulations, you just programmed a computer. Did you just use Perl to grab some of the contents of one file, combine it with some contents of another and dump it into some other file? You just wrote a program. Sure, you did not just write ‘sed’, but you told a computer to do something that it didn’t do before you started. If you are going to work on an operating system that was built by programmers for programmers, you had better be comfortable with thinking like a programmer.
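To make that concrete, here is a rough sketch of the “grab some of one file, combine it with another, dump it somewhere else” chore, written in Python rather than Perl. Everything specific here is an assumption for the sake of the example: the function name `merge_by_key`, and the idea that both input files hold one `key value` pair per line.

```python
from pathlib import Path


def merge_by_key(left: Path, right: Path, out: Path) -> int:
    """Join two 'key value' files on the key; write 'key left right' lines.

    Hypothetical file format: one 'key value' pair per line, blank lines
    and lines starting with '#' ignored. Returns the number of lines written.
    """
    def load(path: Path) -> dict:
        table = {}
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition(" ")
            table[key] = value.strip()
        return table

    left_table = load(left)
    right_table = load(right)
    # Only keys present in both files make it into the output.
    common = sorted(left_table.keys() & right_table.keys())
    out.write_text(
        "".join(f"{key} {left_table[key]} {right_table[key]}\n" for key in common)
    )
    return len(common)
```

It is a couple dozen lines and nobody will give you an award for it, but it is a program, and you wrote it.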

Every single tool that you will use is a work in progress. It has bugs and lots of them. If you are fortunate enough to watch one of those bugs completely ruin your day, you had better be prepared to write a useful, succinct and accurate bug report. Even if you are using an ‘enterprise’ operating system, the people who wrote the stuff that you are using are most likely not paid by anyone. Given the popularity and data center penetration of operating systems like CentOS, the people who will receive your bug report are most likely not paid, either. You need to be able to do at least the following:

  1. Eliminate yourself as the problem (if you don’t, trust me, they’ll do that for you)
  2. Articulate the problem
  3. Be prepared to test a fix, if indeed it was really a bug to begin with

This means you might need to manually build and upgrade something after checking out a beta version from source control, apply a patch, run the program through a debugger and lots of other things that you might think only programmers do. The amount of help that you’ll receive depends entirely on how much you help yourself at the beginning. A web server crashing inexplicably might be due to a memory leak. If you can spot that, you’ll get what you need (a server that does not crash) much faster than you would otherwise.

Additionally, not all distributions are created equal. You will indeed encounter a situation where you need to install something that your OS of choice simply does not package. When that happens, you’ll need to build it yourself. While many stable programs offer some kind of configuration script that tells you what is missing, it is still your job to figure out that you need libfoo in order for that program to compile and run. Sometimes, you might also need to build and install libfoo yourself. You need to be able to do that, at least in the real world :)

3 – Understand the fundamentals of networking

You spent half the day installing foo-os, you have the software RAID tweaked, you have the kernel tweaked and bam, you have no networking. You need to be able to realize that the network admin buggered up your VLAN and the OS picked a gateway that is off by one from the netmask (even though the OS is probably, in most cases, correct). Or, you can waste a few hours, sometimes days, before realizing that the problem is not on your end.
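One reading of a gateway that is “off by one” is a default gateway that lands just outside the subnet your netmask defines. That is a thirty second check if you know CIDR arithmetic. The sketch below uses Python’s standard `ipaddress` module; the function name `gateway_problems` and the sample addresses are hypothetical, and it assumes an ordinary subnet (not a /31 point-to-point link, where the network address is a usable host).

```python
import ipaddress


def gateway_problems(interface_cidr: str, gateway: str) -> list:
    """Return human-readable reasons a gateway cannot work for an interface.

    interface_cidr is the host address with prefix, e.g. '192.168.10.20/26'.
    An empty list means the gateway is at least plausible on paper.
    """
    iface = ipaddress.ip_interface(interface_cidr)
    gw = ipaddress.ip_address(gateway)
    net = iface.network
    problems = []

    if gw not in net:
        problems.append(f"{gw} is outside {net}")
    elif gw == net.network_address:
        problems.append(f"{gw} is the network address of {net}")
    elif gw == net.broadcast_address:
        problems.append(f"{gw} is the broadcast address of {net}")
    elif gw == iface.ip:
        problems.append(f"{gw} is this host's own address")
    return problems


if __name__ == "__main__":
    # A /26 starting at .0 ends at .63, so .65 belongs to the next subnet.
    for candidate in ("192.168.10.1", "192.168.10.63", "192.168.10.65"):
        print(candidate, gateway_problems("192.168.10.20/26", candidate))
```

A check like this tells you whether the numbers can possibly work. It says nothing about whether the VLAN is actually wired the way the paperwork claims.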

Your job is to make access to the computers that you maintain as convenient and secure as possible, to whomever is using them as intended. Oftentimes, this means the general public at large. This is going to mean proxies, VPNs, firewalls, bridges, virtual switches and a plethora of other things that will intimidate you until you have a working understanding of how TCP/IP actually works. No, you don’t need to be a BGP hero, but you need to be able to learn quickly. At the minimum, have a grasp on Classless Inter-Domain Routing (CIDR), how to read a packet dump, traceroute, the principles of common protocols (HTTP/POP/IMAP/SMTP/SIP/etc) and be able to say “this doesn’t look right …”

As a system administrator, be ready to catch the blame from clients, vendors, bosses, co-workers and everyone else. If there is one area where you need to be able to say “the problem is not on my end” authoritatively, it will be with strange network related issues.

This is one area where I actually do recommend some of the classes that go into obtaining a certification, as most do place a strong emphasis on this. You might be called upon to build a router or load balancer, set up a cluster virtual IP and many other things. For brevity, I can only recommend reading up (at length) on how packets actually get from point A to point B.

If a prospective employer says something like “IPX” or “Novell Virtual Terminal” or anything that sounds like token ring, run like hell. Focusing on TCP/IP is most decidedly not a bad idea.

4 – Security is seldom convenient

Following up on #3, you need to understand that there are lots of people eager to capitalize on any mistake that you will (not may, will) make. One of the hardest challenges that you’ll face in your life is differentiating benign but clueless behavior from malice and acting accordingly. Unless configured otherwise, most UNIX-like operating systems trust their users, and in the last decade this behavior has become problematic, to say the least.

You may land a job running a server farm that powers a new web app written by people who learned how to program by reading blogs and forums. Just like networking, when the worst happens, it will always (at least initially) be blamed on you. In reality (for the most part):

XKCD: Exploits Of A Mom

Be ready to be embarrassed, at least a few times. But make sure, whatever you do, that you don’t interfere with sales! Never, ever count on a programmer to adhere to best practices. Plan for the worst and minimize potential damage ahead of time. I am a programmer and I’m telling you that.
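For anyone who has not seen the comic, the joke is SQL injection: a student name that closes the query and drops the table. Here is a small sketch of the best practice you should not count on, using Python’s standard `sqlite3` module. The table layout and the `add_student` helper are invented for the example; the only real point is that a parameterized query treats hostile input as plain data.

```python
import sqlite3


def add_student(conn: sqlite3.Connection, name: str) -> None:
    """Insert a name using a bound parameter instead of string concatenation."""
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))


def demo() -> list:
    """Store a hostile-looking name and show that the table survives."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

    # Input in the spirit of the comic. Because it is bound as a parameter,
    # the database stores it as text and never parses it as SQL.
    hostile = "Robert'); DROP TABLE students;--"
    add_student(conn, hostile)

    rows = conn.execute("SELECT name FROM students").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

When you are the one who gets paged, knowing what the safe version looks like makes it much easier to say whose end the problem is on.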

5 – Be stingy with access

If you receive a request to place a new public key on a server, unless the use is obvious, your very first question should be ‘why??’.  The same thing goes for logins to anything else that you are held accountable for securing. In this case you have a few choices:

  • Piss everyone off
  • Use LDAP/AD with very granular roles

I highly recommend the second; you need to be able to pull access quickly. When I was a full-time server monkey, 85% of all access requests that I received were attempts to short circuit a company policy so that someone could get their work done without having to go through the appropriate channels (and reviews) that go into deploying production code. That’s right, the annoying sales guy to your left that can’t stop talking 10 decibels higher than everyone else while on the phone is going to attempt to manipulate you into going against company policy. Why? S/He is probably too stupid to understand why protocols are in place, in the first place. “Hey, can you punch a hole through the firewall so my i-bad works at Starbucks?? I’ll take care of you next week, I promise!”. That type of thing is common. You need to be a brick wall.

Carry a small digital recorder / player with you that repeats the following quote from 2001: “I’m sorry Dave, I’m afraid I can’t do that” and play it relentlessly. If your boss makes a similar request, CC the CTO in your reply with a resounding “NO!”. If the CTO doesn’t realize your plight, run like hell.

6 – Never stop learning / don’t succumb to vendor lock in

Your quest to master the machines does not stop when you earn a certificate, a new title or even an award from your peers. Every program you work with is subject to change, and you had better keep up with those changes while also looking for anything that makes your job easier than it was yesterday. There is no such thing as the ‘perfect’ network just like there is no such thing as someone who knows everything. While both are conceivable, neither lasts for more than a day.

Whatever you are doing right now, today, you could probably be doing better tomorrow. Never, ever forget that and never become a slave to a vendor. If what a vendor provides doesn’t give you what you need, your job is to make something that does and kick that vendor out as soon as possible. If a vendor won’t implement a feature you want, your job is to find one who will and realize that they are disposable.

I once worked for a company that was paying thousands of dollars per quarter to a vendor that supplied software that could only be useful on the Event Horizon. The reason? The vendor was a friend of the son of the senior sales guy. I’m not kidding. The senior sales guy went to college with the CEO. If you are held accountable for the performance or security of something, you must be within your rights to demand that it be replaced within a reasonable amount of time and cost. If that is not the case, run like hell.

Incidentally, did some new OS just creep up that is funded by millions? You had better get to know it.

I’m digressing quite a bit here, so let’s wrap it up:

  • I don’t, as a hiring manager, owe you a thing due to whatever certifications you may have. In fact, I’m free to ignore them because I’ve found that they are very easy to obtain. Why? The last person who had your job had the same thing and was completely incompetent.
  • How you demonstrate your knowledge in an interview means everything to me. Well rounded means hire, very well rounded means hire with more pay.
  • Your personality is very important. I need to know that you know how to say no, and how to scream “FIRE” when you see problems now or ahead. This is proof that I can stop doing your job and focus only on mine.
  • If you can’t cite a dozen blogs / QA sites / forums that you visit on a regular basis, you aren’t serious about your career. How the hell will you know when new CVEs need to be addressed? How do you possibly keep up on new stuff? This goes back to making sure I don’t need to do your job, too.
  • If you can’t demonstrate a track record of working well with others, why should I test you on my company?
  • You don’t need to be a router guru, but if you can’t understand basic networking your path of learning has failed you (and won’t fail us).
  • You need to have a healthy ego, but be able to take criticism. You’ll get it during the interview and it will never, ever stop. If I have failed to find a single issue in what you did last month it means that I’ve failed to promote you or failed to look hard enough. We’re both geeks and hackers, but you need to understand the relationship of my job and your job.

On another note, I’ll advertise the class location once it is finalized. I’m currently talking to a few institutions in Makati, Mandaluyong and Singapore regarding hosting it.


One Response to “So You Want To Be A Linux System Administrator?”

  1. Israel Martinez
    February 5th, 2014 @ 3:09 am

    Your blog is very informal and I enjoyed reading every little piece of it. However, I am not that new to the Unix world, but I have quite a bit of knowledge and no on the job experience. You stated that certifications are worthless, but how can someone like me be able to be even considered for a job if I have nothing to show for it or not much experience. I’m taking Linux classes, and a Red Hat system admin I and II so I can have some kind of chance to land a system admin job. What would you do in my position if you were eager to land a sys admin job?
