
Search tool

I had been recruited as a database admin at a large company. A department-wide IT system had recently been implemented, but many basic user needs had been overlooked, and I was looking to add value for our internal teams. At go-live there was no easy way to determine the status of demands and payments - a self-service tool that let anyone see the current status would be a great project with a big impact.

There were a number of challenges I had to overcome to make the product workable:

Problem 1: A large dataset

The dataset of demands and payments was 500,000 rows and growing by 15,000 per month. This required a scalable solution, or searches were going to be REALLY SLOW.

Problem 2: The data was spread across a number of databases

The search tool would need to span these databases and still return results quickly. I wanted to minimise cross-database references, and joins generally, at retrieval time, which meant designing a data model built for quick retrieval.

Problem 3: Users would expect to search by any number of columns

Users would need to search by any combination of columns, both as exact matches and with wildcards. I would need to experiment with indexing to get this performing well.

Problem 4: I didn't want to spend a long time on development

I didn't have the skills or the time to build something from scratch, so I would need to be creative in my choice of user interface.

The search engine

It was clear I needed a data model designed for performance. I achieved this by creating a dedicated database with a single table to hold the searchable data. A stored procedure ran daily, retrieving data from each source database and loading the table; during the load the data was transformed into the shape of the search results. Doing the calculations at load time makes each retrieval that much more efficient.
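
As a rough sketch of that nightly load (all names here - SearchDB, dbo.SearchResults and the source databases - are hypothetical stand-ins, not the real system), the idea is to rebuild one denormalised table with the joins and calculations done once per day:

    -- Hypothetical nightly load: rebuild the denormalised search table
    -- so no joins or calculations are needed at query time.
    CREATE PROCEDURE dbo.usp_LoadSearchResults
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Clear the previous day's snapshot.
        DELETE FROM dbo.SearchResults;

        INSERT INTO dbo.SearchResults
            (DemandRef, CustomerName, DemandDate, Amount, PaymentStatus, LastPaymentDate)
        SELECT
            d.DemandRef,
            c.CustomerName,
            d.DemandDate,
            d.Amount,
            -- Pre-compute the display value during the load.
            CASE WHEN p.PaidInFull = 1        THEN 'Paid'
                 WHEN p.PaymentID IS NOT NULL THEN 'Part paid'
                 ELSE 'Outstanding' END,
            p.LastPaymentDate
        FROM FinanceDB.dbo.Demands AS d
        JOIN CustomerDB.dbo.Customers AS c ON c.CustomerID = d.CustomerID
        LEFT JOIN PaymentsDB.dbo.PaymentSummary AS p ON p.DemandRef = d.DemandRef;
    END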

I experimented with a number of different index choices before settling on a combination of a full-text index and a clustered index, which allows very flexible word and pattern-matching searches. Every search was audited, so I could benchmark each index against real-world usage. A trick I employed was to sort the data as required (newest to oldest) when defining the clustered index: because the clustered index arranges the data on disk in that order, no sort is needed when the result set is being prepared - one less task at query time!
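
To illustrate (again with made-up object names, and assuming the Full-Text Search feature is installed), keying the clustered index on the date column in descending order stores the rows newest-first, while the full-text index covers the flexible word matching:

    -- Clustered index keyed newest-first: the table itself is stored in
    -- date-descending order, so "ORDER BY DemandDate DESC" needs no sort.
    CREATE CLUSTERED INDEX CIX_SearchResults_DemandDate
        ON dbo.SearchResults (DemandDate DESC);

    -- Full-text indexing needs a unique, single-column, non-nullable key index.
    CREATE UNIQUE INDEX UX_SearchResults_DemandRef
        ON dbo.SearchResults (DemandRef);

    CREATE FULLTEXT CATALOG SearchCatalog;

    -- Flexible word matching on the free-text column(s).
    CREATE FULLTEXT INDEX ON dbo.SearchResults (CustomerName)
        KEY INDEX UX_SearchResults_DemandRef
        ON SearchCatalog;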

The front-end

I needed a way for users to access the data. Ideally this would have been an intranet site that could communicate with the database server, but that was too daunting with the skills I had at the time. Instead I opted for Excel: it might seem like a strange choice, but it has lots of advantages:

  • It interfaces very well with SQL Server
  • Cell changes can trigger a stored procedure (held on the server for security) to retrieve data - see the sketch after this list
  • The list nature of Excel lends itself well to search results
  • Deploying the search tool is as simple as sending out a spreadsheet
  • No VBA, macros or logic were held on the client side - all processing took place on the server, which matters when managing the versions of the spreadsheet in circulation
  • I was able to use Windows authentication SSO credentials for seamless security
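
To give a flavour of the server side (a sketch with hypothetical names and columns, not the production code), the spreadsheet simply passed its cell values as parameters to a procedure along these lines: empty parameters are ignored, a trailing % gives the wildcard behaviour, and every call is written to the audit table:

    CREATE PROCEDURE dbo.usp_SearchDemands
        @DemandRef    varchar(20)   = NULL,
        @CustomerName nvarchar(100) = NULL,
        @Status       varchar(20)   = NULL
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Every search is logged; this audit table is where the usage figures come from.
        -- SUSER_SNAME() returns the Windows login under Windows authentication.
        INSERT INTO dbo.SearchAudit (SearchedAt, SearchedBy, DemandRef, CustomerName, Status)
        VALUES (GETDATE(), SUSER_SNAME(), @DemandRef, @CustomerName, @Status);

        SELECT DemandRef, CustomerName, DemandDate, Amount, PaymentStatus, LastPaymentDate
        FROM   dbo.SearchResults
        WHERE (@DemandRef    IS NULL OR DemandRef    LIKE @DemandRef)      -- 'ABC123' exact or 'ABC%' wildcard
          AND (@CustomerName IS NULL OR CustomerName LIKE @CustomerName)
          AND (@Status       IS NULL OR PaymentStatus = @Status)
        ORDER BY DemandDate DESC;  -- already the clustered index order, so no sort step
    END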

Implementation and results

I did the development work under the radar (the project hadn't been sanctioned by management) and I was very pleased with the feedback from users. More impressive were the numbers taken from the audit table:

  • Four weeks from launch, 950 searches had been made
  • By the end of the first year, 43,000 searches had been made
  • Two years from launch, 140,000+ searches had been made
  • 15 million lines of data had been returned to users
  • 225 users had made a search
  • Typical response time of 0.25 seconds or less
  • The client was still on version 1.1.2!

Not a bad result for work done on a hunch!