SUMMARY: a day in the life of a sysadmin.....
maccy at maccomms.co.uk
Mon Jan 21 16:02:44 EST 2002
Thanks to everyone who emailed, got some great responses!
Firstly, my thanks go to :-
sckhoo at tm.net.my, Jon, Scott McCool, Matthew P. Marino, Jim Winkle, Frank
Smith, Roy Culley, Ed Rolison, Brian Dunbar, Larye Parkins, Scott Buecker,
Craig Raskin, John Tan, Eric Bennett, Ric Anderson, Karl Vogel.
Thanks for your time guys. The aim of my request was to hear from others wrt
how you spend your working days. Maybe there was something that I
could add to my daily routine, or benefit from carrying out on a regular
basis. Some responses were pretty detailed, so I've put together some edited
highlights right here :-
http://www.usenix.org/sage/day/day.html (contributed by sckhoo at tm.net.my)
http://www.usenix.org/sage/sysadmins/ditl.pdf (contributed by Larye)
http://bofh.ntk.net/Bastard.html (Some light relief, this was funny!
Contributed by a few)
Jon: You can't be busy 100% or even 80% of the time performing chores on the
system, and you shouldn't have to (don't work for an employer who disagrees
with this) because when a system breaks you have to have the time needed
available in your schedule to fix the problem. For the most part my day is
spent researching information, experimenting with that information, or
writing scripts to automate things that I normally have to do or making
those scripts more robust.
Matthew: If it ain't broke, don't fix it.
Make sure your backups are always current within 1 day. Also, make sure your
backups are working. Mock data losses are invaluable for finding out when
your backup strategies have holes.
If you are running any web related services and therefore have routes between
your internal network and the internet, you should spend at least 20% of
your day on security related issues. No matter how well you think your
network is protected, you should always assume that things will slip. Look
for signs of intrusion and read all the advisories from cert.org that
you have time for. The important soft maintenance should be handled via cron
jobs: rotating logs, reporting quotas etc. I do a hard maintenance twice a
year: shut down, vacuum, polish etc. CAUTION!!! On servers that run 24/7 for
months on end, assume the server will not restart after shutting down. Make
sure your backups are very current. If it comes back up healthy, treat it as
a pleasant surprise.
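A minimal sketch of Matthew's "current within 1 day" rule, assuming each
backup run leaves behind a file whose modification time marks when it
finished. The function name and its `getmtime` parameter are hypothetical
and not taken from any response. It only checks freshness; it does not
replace a mock data loss, since a fresh backup can still be unrestorable.

```python
import os
import time

ONE_DAY = 24 * 60 * 60  # seconds


def backup_is_current(path, max_age=ONE_DAY, now=None, getmtime=os.path.getmtime):
    """Return True if the file at `path` was modified within `max_age` seconds."""
    if now is None:
        now = time.time()
    try:
        age = now - getmtime(path)
    except OSError:
        # A missing or unreadable backup counts as stale.
        return False
    # A modification time in the future (clock skew) is also treated as stale.
    return 0 <= age <= max_age
```

Run from cron, a False result would be the cue to send mail rather than
wait for someone to notice.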
Jim (who had a nice checklist):-
+ Software: Plan for, acquire (or write), install, configure,
  document, support, update, and patch (as needed) the following:
    o the UNIX operating system
    o all other user applications (about a dozen), and system
      administration tools (another dozen)
    o the web information server
+ Hardware: Plan hardware architecture, maintain and enhance
+ User: Provide user authorization and user assistance
+ Backups: Disk backups to tape and network backup
+ Security: Ensure a reasonable level of system security
+ Rates: Rate calculations and accounting
+ Marketing: Market this service
+ Other: Whatever else falls onto my plate related to the system
Frank : The "average day" varies greatly depending on whether you are the
admin for a server farm or a bunch of workstations, and whether your company
is proactive or reactive in nature. I would recommend reading 'The Practice
of System and Network Administration' by Tom Limoncelli and Christine Hogan.
Roy : My work is interrupt driven. When a problem comes up it gets my
immediate attention if possible. I do a lot of log checking as I am
responsible for implementing the company's security policy. There is never
enough time to do everything. I also read several newsgroups and subscribe
to work related mailing lists.
Ed: Been trying to figure out GNOME since, from Solaris 9 onwards, that's
going to be the 'basic' desktop environment.
Current projects that I am working on (in the last week or so) are setting up
a new DNS/NIS/DHCP server and migrating existing usage onto this new machine
as seamlessly as possible, and also performing a security check of a
'secure' network. (There's a network security certificate for some of our
internal confidential traffic). We're also in the process of deploying a
SAN - that's currently on hold for a 'high level' decision, but so far we've
had to size and spec our current disk usage, and our estimated growth for
the next 3 years. Again, migration plans for this have to be produced. I'm
also trying to source a compiler for an SGI machine - C++ on evaluation
pending actually buying it, and also looking at consolidating some servers
(we have quite a few boxes which were set up 5 years ago to just do 1
job, and are now virtually redundant). In a normal week, since I do second
line tech support, I'd expect a couple of helpdesk calls about things
(usually login scripts but sometimes printers, or simply things that haven't
been done). I also keep track of virus/vulnerability alerts from CERT,
Bugtraq and a bunch of other mailing lists.
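For the SAN sizing Ed mentions (current disk usage plus estimated growth
over the next 3 years), a rough sketch of the arithmetic, assuming growth
compounds at a steady annual rate. The function name and the figures in
the usage note are made up for illustration; the responses give no actual
numbers.

```python
def projected_usage(current_gb, annual_growth, years=3):
    """Return usage for year 0..years, compounding `annual_growth` each year.

    `annual_growth` is a fraction, so 0.5 means 50% growth per year.
    """
    return [current_gb * (1 + annual_growth) ** year for year in range(years + 1)]
```

With a hypothetical 1000 GB today and 50% growth a year,
`projected_usage(1000, 0.5)` gives the figure to spec against for each
year. Real growth is rarely this smooth, so treat the last value as a
floor to add headroom to rather than a target.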
What 'free' time I have is often spent on bigbrother (network monitor,
http://www.bb4.com) development - we use this as a network monitoring tool,
because it's free, and really really good at what it does. (And IMHO the
mailing list is top notch).
Brian: A 'good' sys-admin would reply "I'll never have time .. " <grin>.
My typical day consists of doing a bunch of little things to keep big
disasters from befalling. Reading email to keep up on bugs/patches.
Checking logs and server stats to keep ahead of near disaster. I've
embarked on a project to upgrade our network with the leftovers from a
closed business office. When all is said and done, I'll have 'old' Solaris
equipment in place of the ancient equipment now running the network. No
checklist, but I should have one. My highest priority is to keep the
network functioning/up. I spend most of my time on maint.
Scott : I spend 2 hours a day doing backups and preparing backups for
off-site storage. I spend 2 hours a day reading/learning new
technologies/protocols/scripting languages. I spend the remaining 4 hours
of my day in meetings and doing things that are specific to my
business (aviation). I may spend a week at a time installing new systems
and racking equipment if needed.
Eric : Typically regular checks and such are not necessary. If you can
script, you can automate most of these so you don't have to bother with the
tedium of going through them. It all depends really on the personal style
you've developed. Over the years that I've been working I've developed a
dictum of creating a framework around the systems that I "own": making
sure I'm aware of who uses them, the management personnel likely to be
notified if something goes wrong by the people who use them, and the tasks
that they need to undertake, and writing a bunch of scripts to look out for
typical things (too much memory, processor or disk usage, etc) as well as
application specific things: tablespace for Oracle databases, logfile size
for webservers, logfile analysis for most any application, and making sure
the necessary processes are always up and running, usually by a script that
runs every five minutes in cron (and a daemon that stays up permanently,
checking that cron doesn't die, and the aforementioned cron script checking
that the aforementioned daemon doesn't inadvertently fail).
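A rough sketch of two of the checks Eric describes: a disk usage threshold
and an "is this process still up" test of the kind the cron script and the
daemon could run on each other. The function names and the 90% threshold
are assumptions for illustration, not Eric's actual scripts.
`shutil.disk_usage` and `os.kill` are standard library calls; the signal 0
trick is Unix only.

```python
import os
import shutil


def disks_over_threshold(paths, max_used=0.90, usage=shutil.disk_usage):
    """Return (path, used_fraction) for each path fuller than `max_used`."""
    alerts = []
    for path in paths:
        total, used, _free = usage(path)
        fraction = used / total
        if fraction > max_used:
            alerts.append((path, fraction))
    return alerts


def pid_alive(pid):
    """Unix only: signal 0 delivers nothing, it only tests that the pid exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True
```

A cron entry running every five minutes could call `disks_over_threshold`
on the mount points it cares about and `pid_alive` on the pid recorded by
the daemon, mailing only when something is wrong.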
I keep a centralised database of services and platforms, network
architecture and such, and run a daily security check for changed file
permissions or newly created files in unauthorised or suspicious
locations. I keep an eye on bugtraq for the latest potential vulnerabilities,
and if something looks suspect I can check with the SQL service database and
see if any of the systems I own are affected. It also comes down to how much
you've got to administer. If you've just got a small network of 5-15 machines
then you might not want to bother with the SQL database of services and
versions; however, in my case, with upward of 200 machines in remote locations
and uptimes long enough for me to forget where they're located, much less
where their power switches are, records are essential.
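One way the daily check for changed permissions and new files could look,
as a sketch only: record the permission bits of every file under a
directory, then compare today's record with yesterday's. The function
names are hypothetical, and how the snapshots are stored between runs is
left out.

```python
import os
import stat


def snapshot(root):
    """Map each file path under `root` to its permission bits."""
    modes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are recorded, not followed.
            modes[path] = stat.S_IMODE(os.lstat(path).st_mode)
    return modes


def diff_snapshots(old, new):
    """Return (created, changed): paths new since `old`, and paths whose mode differs."""
    created = sorted(p for p in new if p not in old)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return created, changed
```

This only catches mode changes and new files. Content changes would need
checksums, which is closer to what dedicated integrity checkers do.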
This leaves a lot of free time in the day to look into ways that you can
expand the service that you're providing to your users, or to research new
technologies that may assist in your company's work. A junior really helps,
as they can make intelligent diagnoses of logfiles from syslogd and various
backup agents much more finely than you could program a script to do. I
don't have a junior in my current position, so I've had to write some tight
scripts that check specifically for successful backups, and still I do check
them manually maybe once a week or month. Backups are something that you
can't afford to lose just because a regexp was falsely returning true for
the past few weeks or so.
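To illustrate the "regexp falsely returning true" risk, a sketch of a
check that fails closed: it wants an exact success line at the end of the
log, a non-zero file count, and no line that looks like trouble. The log
format here ("backup completed: N files") is invented for the example;
a real backup agent will write something different.

```python
import re

# Hypothetical log format, not taken from any real backup agent.
SUCCESS = re.compile(r"backup completed: (\d+) files")
TROUBLE = re.compile(r"error|fail|abort", re.IGNORECASE)


def backup_succeeded(log_text):
    """Return True only if the log ends with a success line and shows no trouble."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    if not lines:
        return False  # an empty log is not a success
    if any(TROUBLE.search(line) for line in lines):
        return False  # may raise false alarms, which is the safer direction
    match = SUCCESS.fullmatch(lines[-1])
    return match is not None and int(match.group(1)) > 0
```

Using `fullmatch` on the last line, rather than searching anywhere in the
log, is the design choice that matters: a loose pattern can keep matching
after the agent's output has changed. The occasional manual check Eric
describes is still worth keeping.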
Ric :
* Read email containing interesting log events and system
  status generated by summary program run from cron.
* Investigate mail from COPS and other monitoring programs.
* On Monday, check output from weekend Level 1 backups in
* On the first of month, check level 0 backup logs in detail.
* Ongoing "stuff" and answering questions from co-workers.
Karl: I have one general checklist on a large legal pad; stuff gets
top-to-bottom, and deleted as it gets done. The only things that are
time-critical are things like planned reboots (after working hours) and
hardware maintenance (rare).
Security and keeping the systems up is priority. This generally means
mail; the systems will send mail to a program that pops a message window
up on my screen if something really barfs.
Since we don't wait until the last minute to replace aging systems, our
upgrades are low-pressure migrations to newer systems, and most of the
time the users don't notice.
sunmanagers mailing list
sunmanagers at sunmanagers.org