Employing bash scripting to solve Windows problems
September 19th, 2006 by Jake Scaltreto
When I first came on board with the company I'm with now, my duties were... shall we say, limited. One of my duties was to name and enable all the Ethernet ports in a new building and check them for connectivity. As the facility was large, this meant literally thousands of ports. Furthermore, each time I did a port I also had to create a problem ticket to track the work I did. I got very good at entering tickets, as you can imagine. Still, even at my best, I was only able to do about one ticket a minute. So I did a bit of out-of-the-box thinking and wrote a script that, given a list of ports, would automatically write my tickets at a rate of about one every twelve seconds. Mind you, it was a crude VBScript that interacted with the ticket software by simulating keystrokes, but it was still a leap forward in ticket writing for its time. I couldn't have anticipated the ramifications of writing that script. For one, the spike in tickets set off a management storm like you wouldn't imagine - I was praised as the ticket-writing boy wonder, and to this day I still hold the record for most tickets created in a one-month period. Secondly, and more importantly, it established me as the resident "script guy".
For the next two years I spent a good deal of my time writing scripts - mostly as side projects on top of my regular duties. Everything from VBScript to Perl to Bash to Batch - I was your guy if you needed something scripted. I would have to say that working on these little programming projects has been the main highlight of my employment, as it appeals to my problem-solving nature.
Recently, as in last week, I was tasked with coming up with a script that would scan a number of folders on a drive and determine which of them were unused. Sounds simple enough, right? I'm sure there are Windows utilities designed for just that. However, they wanted a completely hands-off utility that would generate a list on command. It didn't help that the drive in question has some folders with data in the terabytes. Yesterday I set out to tackle the problem.
I immediately decided to use Bash as my scripting language for this project. Now, the drive needing to be scanned was a network drive on a Windows machine. Not wanting to fiddle with Samba and the like, I fired up Cygwin as my operating environment. The task was to find out which directories were the least used. My initial instinct was to use a simple "du -s */" command to scan all the directories. However, this proved to be grossly inefficient, because it spent most of its time walking the large directory trees - which we weren't concerned with in the first place.
I was taken back to a lesson I learned years ago when writing the AI for Othello2001 (a Visual Basic reversi game I wrote years back). The problem I had then was that if I scanned every possible move on the board to determine if it was a good move or not, it would end up taking an incredibly long time - especially when I had it checking 2, 3 or 4 turns in. To solve this, I made it so that the computer would do an initial scan of all the moves, discarding those that were considered bad moves - no sense wasting time with moves that would ensure certain destruction. It would then scan each of the remaining moves one turn in - discarding the bad ones yet again. I could recursively scan and discard moves many turns in to come up with a move that was (hopefully) very good. This is oversimplified, but the point is there. In the end, I was able to scan 4 or 5 turns deep and come up with the best move within an acceptable margin of error - and in a tiny fraction of the time it would have taken to scan all of the possible moves to five turns.
So I applied a similar technique to my folder scanning project. First, I had to discard the folders that were obviously being used. The quickest way to do this was to count the number of subfolders each folder had - a folder containing 10 or more subfolders is clearly being used by someone for something, so it can be discarded without measuring its size. This increased the speed of scanning by quite a bit - but it still would have taken several days of continuous running to scan all the folders, because some folders did not contain many subfolders but still held a lot of data in the few subfolders they had.
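A minimal sketch of this first filter might look like the following. The original script is not reproduced here, so the function names and the MAX_SUBDIRS variable are hypothetical; only the threshold of 10 subfolders comes from the description above. It also assumes a find that supports -mindepth and -maxdepth (GNU find does).

```shell
#!/usr/bin/env bash
# Assumed threshold from the text: 10 or more subfolders means "in use".
MAX_SUBDIRS=10

# Count the immediate subdirectories of a folder (no recursion, so this
# stays cheap even when the folder holds a huge tree).
count_subdirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d | wc -l
}

# Succeed (exit status 0) when the folder has fewer than MAX_SUBDIRS
# subfolders, i.e. it is still a candidate for "unused".
is_candidate() {
    [ "$(count_subdirs "$1")" -lt "$MAX_SUBDIRS" ]
}

# Print the candidate folders among the top-level folders under a root.
list_candidates() {
    local root=$1 dir
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # skip the literal pattern if nothing matched
        if is_candidate "$dir"; then
            printf '%s\n' "${dir%/}"
        fi
    done
}
```

Calling list_candidates on the drive's mount point (for example, list_candidates /cygdrive/x, where the drive letter is a placeholder) would print only the folders worth a closer look.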
To tackle this issue, I came up with the idea to insert a timeout into the "du" scan. Essentially, if a folder took longer than two seconds to scan, it obviously wasn't empty. These two methods eliminated the scanning of large folders. I was then left to only scan the "good candidates" for unused folders.
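The post does not say how the timeout was implemented, so the following is only one possible sketch. It assumes the timeout command from GNU coreutils is available, which exits with status 124 when the time limit expires. The names quick_size, scan_root and SCAN_TIMEOUT are hypothetical; the two-second limit is the one mentioned above.

```shell
#!/usr/bin/env bash
# Assumed limit from the text: give up on a folder after two seconds.
SCAN_TIMEOUT=2

# Print the size of a folder in kilobytes. If du does not finish within
# SCAN_TIMEOUT seconds, print nothing and return timeout's status (124).
quick_size() {
    local out
    out=$(timeout "$SCAN_TIMEOUT" du -sk "$1" 2>/dev/null) || return
    # du prints "<size><TAB><path>"; keep only the size.
    printf '%s\n' "${out%%[[:space:]]*}"
}

# Report each top-level folder under a root with its size, or mark it
# as skipped when the scan could not finish in time.
scan_root() {
    local root=$1 dir size
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue   # skip the literal pattern if nothing matched
        if size=$(quick_size "$dir"); then
            printf '%s\t%s\n' "$size" "${dir%/}"
        else
            printf 'skipped\t%s\n' "${dir%/}"
        fi
    done
}
```

Folders reported as skipped are the ones too large to be empty; the rest come back with a size that can be sorted to find the smallest. Note that this sketch also marks a folder as skipped when du fails for another reason, such as a permissions error.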
Using this method, the full scan now takes around an hour to complete and does a very good job of identifying unused folders. I'm confident that the script would have been much more difficult or even impossible to write in Batch or VBScript. It's interesting to see how Linux tools can be employed to solve such a vast array of problems.