
What should have been a relatively easy job turned into a very complex one. How I did battle with SQL Server and (mostly) won.

By the way, if you're wondering about the title (and the subheadings below, if you've looked further on), then you obviously haven't read The Hunting Of The Snark by Lewis Carroll. If that's the case, I strongly recommend you do, partly because it's a wonderful piece of sophisticated nonsense, and mainly because it's far more entertaining than this article! It was subtitled "An Agony In Eight Fits", with each section being described as a Fit. That seemed appropriate for what happened here...

For the Snark/Carroll aficionados, it's worth noting that most of the illustrations on this page are from the classic Henry Holiday edition of The Hunting Of The Snark. There have been plenty more since, but his illustrations best fit Carroll's surreal poem.

Fit The First - Background

The project in question has a set of database tables that store information about support tickets raised by users. The desktop application that the support engineers use already had some searching facilities, but these were limited to things like open/close date, priority, status, etc.

I had a request to allow the user to enter some keywords, and we would show all support tickets that used the keywords in any of a number of fields. As SQL Server provides a full-text search facility, this seemed ideal for the task. Little did I know how complex it would be! I'm documenting it here, in case anyone needs to do something like this again. Hopefully this will save them a few hours of pain.

Fit The Second - The database structure

In order to avoid mucking around with the database that was under source control, I created a test database, with some tables that included only the relevant fields...

The main table is SupportTickets, which contains two fields to be indexed. Each ticket can have any number of SupportTicketNotes, and for various boring historical reasons, each of those has an associated Actions entry which contains the Details field that is to be included in the search. In addition, a ticket may have a ProblemsAndSolutions entry linked, in which case the two fields shown there are also to be included.
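In case it helps to visualise, the cut-down tables looked something like this. This is a sketch from memory: the column types and sizes are assumptions, and I've left out everything irrelevant...

create table Actions (
    ID int identity primary key,
    Details text -- declared as "text" for boring historical reasons, which bites us later
)

create table ProblemsAndSolutions (
    ID int identity primary key,
    Problem varchar(max),
    Solution varchar(max)
)

create table SupportTickets (
    ID int identity primary key,
    ShortSummary varchar(200), -- to be full-text indexed
    Description varchar(max), -- to be full-text indexed
    ProblemAndSolutionID int null -- optional link to ProblemsAndSolutions
)

create table SupportTicketNotes (
    ID int identity primary key,
    SupportTicketID int not null, -- a ticket can have any number of notes
    ActionID int not null, -- the associated Actions entry holds the note's Details
    Active bit not null default 1
)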

OK, all ready, I thought this would be a doddle. Ha, should have known better!

Fit The Third - Setting up the full-text catalogue

Creating full-text catalogues isn't too hard. You find the table you want to index, right-click it, and from the "Full-Text index" sub-menu, choose "Define Full-Text Index" (note the inconsistent capitalisation, naughty Microsoft!)...

When the wizard appears, click past the first step (which just checks you have a primary key defined, and allows you to choose which to use), and you'll see the fields in the table...

Check off the one(s) that you want to be indexed, and click Next. Click Next again to go past the change tracking step (you want to leave it as Automatically).

You now get to choose or create the full-text catalogue that will be used for the fields you chose. If this is the first time you've created a catalogue for this table, you'll want to check the "Create a new catalogue" checkbox and enter a name for the catalogue. I use the table name with FT appended, but I can't claim this is any standard.

Make sure to click the Insensitive radio button. You can leave the two dropdowns at the bottom as they are and click Next.

Click Next again to go past the population schedules step as we don't need it.

You will then be shown a summary of the catalogue, and if you click Finish, it will be created and populated for you. Depending on how much data is in the table, it may take a few minutes for the catalogue to be fully populated.
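Incidentally, if you'd rather not click through the wizard, the same catalogue and index can be created in T-SQL. Something like the following should be equivalent; the key index name here is an assumption, so use whatever your primary key's index is actually called...

create fulltext catalog SupportTicketsFT with accent_sensitivity = off

create fulltext index on SupportTickets (ShortSummary, Description)
    key index PK_SupportTickets -- the unique index on the ID column (name assumed)
    on SupportTicketsFT
    with change_tracking auto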

So now we are ready to query the data...

Fit The Fourth - Single-table queries

There are various ways of querying full-text catalogues, and you can easily search to find out the benefits of each one. I used freetexttable, as it has a very powerful algorithm for matching text, including variant word forms, proximity searching, ignoring of stop words, etc.

In order to search the data, you can use a query something like the following...

select ID, k.rank, ShortSummary, Description from SupportTickets st
inner join freetexttable(SupportTickets, (ShortSummary, Description), 'computer') as k on st.ID=k.[key]
order by rank desc

This uses freetexttable to create a full-text indexed view of the table, using the two columns we included in the catalogue. You can use any combination of indexed columns here, so you can query a subset of the fields if you wish. You can enter as many words as you like, and SQL Server will attempt to match as many as it can, taking frequency into account.

This gives results that look like this...

The first column is the ticket ID. The second column shows the rank, which is SQL Server's indication of how well the row matched the keyword(s) entered. Having searched around, I have come to the conclusion that the actual algorithm for calculating those ranks is based on some very deep Kabbalistic literature, and partially on how many buckets of goat's entrails you've sacrificed to the Great God Of Database Servers that day. Microsoft seem to be pretty quiet on how the algorithm works, leaving us rather in the dark. This is a little embarrassing when you demo the feature to the client, and they, knowing their data a lot better than you, query for some significant keyword and get the wrong results. You have to try and explain why your fab code, which you thought was working wonderfully well, came up with the wrong answers! Ho hum.

Note that, as you can see from the query, we are selecting from the SupportTickets table itself, so we can include any fields we like, not just those that were indexed.
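Incidentally, freetexttable has a stricter sibling, containstable, which takes an explicit query syntax (prefix matching, inflectional forms, proximity and so on) in exchange for you doing more of the work. I didn't need it here, but for comparison, the same query using it would look something like this...

select ID, k.rank, ShortSummary, Description from SupportTickets st
inner join containstable(SupportTickets, (ShortSummary, Description), 'formsof(inflectional, computer)') as k on st.ID=k.[key]
order by rank desc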

If you only want to do full-text searching on one table, then you can skip down to Fit The Ninth, and avoid four fits of madness. Otherwise, take a deep breath and read on.

At this stage, I was smiling to myself, thinking that the job was nearly over. All I needed to do was join the tables, and I'd be done. Ha, that'll teach me a lesson, eh?

Fit The Fifth - Joining the tables

Armed with the above query, I set about writing one that would use all of the tables, and give me results that searched the tickets, their notes and the associated problems and solutions.

I'll spare you the gory details, but it turns out that you can't do this. Well, you can write some pretty scary queries that involve joining various freetexttables, but it doesn't work the way you expect, and you do your head in trying to work out how to combine the ranks.

So, off you go back to your favourite search engine, and you find out that what you really need is a view...

Fit The Sixth - Creating a view for the query, and the discovery of stuff

Views are very useful in situations like this, as they enable you to write a complex query, wrap it in a view, and pretend it's a single table. They tend to be read-only, but that's not an issue here.

So, all I needed to do was write a query that joined the various tables, and wrap it in a view. Simple, eh?

Again, sparing you a few hours' worth of unpleasant details, the end result was that this wasn't so easy, as joining the tables resulted in multiple rows for each support ticket (ie one for each note). This made querying across all the fields impossible.

What I really needed was a way to pull all of the note information into one field, leaving me with one row per ticket, but with all of the note data present. If you've used the Linq Aggregate() method, you'll know what I mean.

After some searching, I came across the delightfully named stuff T-SQL function. When used in conjunction with its close friend for xml, it enables you to do just what I described above. The syntax is a bit hairy at first, but it works...

select st.ID, st.ShortSummary, st.Description, isnull(ps.Problem, '') Problem, isnull(ps.Solution, '') Solution,
    (select isnull(stuff((select cast(Details as varchar(max)) + ' ' from SupportTicketNotes stn
    inner join Actions a on a.ID=stn.ActionID where stn.SupportTicketID=st.ID for xml path('')), 1, 0, ''), '')) as Notes
from SupportTickets st
left join ProblemsAndSolutions ps on ps.ID=st.ProblemAndSolutionID

The first line of this is fairly simple, it's just selecting various fields. The second and third lines are where the clever stuff (sic) happens. I'm not going to give a full explanation of the stuff/xml syntax, partly because I don't fully understand it myself, and partly because it's dead dull and boring! If you're really interested, you can cure your insomnia by having a look at the MSDN pages about it. The query above is a little more complex than it needed to be, as the Details field of the Actions table was declared as type "text" which is incompatible with varchar, so I had to cast the one unto the other.
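If you want to see the trick in isolation, without the joins getting in the way, here's a minimal throwaway example that glues some words together into one string...

declare @Words table (Word varchar(20))
insert into @Words values ('hunting'), ('of'), ('the'), ('snark')
select stuff((select ' ' + Word from @Words for xml path('')), 1, 1, '')
-- gives: hunting of the snark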

Anyway, the end result was that the data returned by this query included a Notes column that was an aggregation of all of the ticket's note details. I wrapped this query in a view, tested it out and was happy (briefly).

Ah, I must have cracked it this time. Well, no...

Fit The Seventh - Of unique keys, unique clustered indexes, schema binding and left joins

So there I was, ready to create a full-text catalogue on my shiny new view. You would think they would have the decency to leave me happy, but no, it all went wrong again.

When I tried to create the catalogue, I got an error message telling me it couldn't be created, as I needed a unique key on the view. Well, I thought I had one. After all, the primary table in the view is the SupportTickets table, and that has a unique key, which was included in the query, so what's the problem? The problem is that SQL Server is the boss, and it doesn't recognise that field as a unique key.

OK, so off to Google I went, and discovered that all you need to do is create a unique clustered index for the primary key, and away you go. Or not...

I tried the following fabulous piece of SQL...

create unique clustered index IX_SupportTicketsWithExtraInfo on SupportTicketsWithExtraInfo(ID)

...where SupportTicketsWithExtraInfo was the snappy name of my view. I was somewhat annoyed to be told that the index couldn't be created, due to some weird error about schema binding. I can't remember the exact text, but it was something along those lines.

So back off to Google to find out what that was all about, and it seems that all you need to do is alter the view to include schema binding...

create view SupportTicketsWithExtraInfo with schemabinding
as
...rest of the SQL here...

You know what's coming, don't you? Another classic Microsoft "what the heck does that mean" error message...

Cannot schema bind view 'dbo.SupportTicketsWithExtraInfo' because name 'SupportTicketsWithExtraInfo' is invalid for schema binding. Names must be in two-part format and an object cannot reference itself.

Huh? It turns out that in order to use schema binding you have to use table names that consist of two parts, such as Furry.Ferrets and the like. Well, there was no way I was going to rename the tables, and having tried to use aliases and views to get around the problem, I realised that I wasn't going to be able to do it that way.
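With hindsight, "two-part format" just means schema-qualified names, so in principle no renaming is needed at all; the view only has to reference everything as dbo.Whatever, something like this sketch (minus the notes subquery, which I've left out for brevity)...

create view dbo.SupportTicketsWithExtraInfo with schemabinding
as
select st.ID, st.ShortSummary, st.Description, isnull(ps.Problem, '') Problem, isnull(ps.Solution, '') Solution
from dbo.SupportTickets st
left join dbo.ProblemsAndSolutions ps on ps.ID=st.ProblemAndSolutionID

Not that it would have saved me, as you're about to see.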

I don't remember the exact sequence of events at this point, as I think hysteria was setting in, but I eventually got to the stage where I had something that (theoretically at least) could be used in the full-text catalogue. Sadly, it wasn't to be.

The final insult in this sorry story was when I found out that you can't use full-text indexing on a view that uses a left or right join. No, I'm not making this up, honest!

I came across a forum post where someone tantalisingly mentioned a trick where you could simulate an inner join, which would have been just what I wanted, except that I couldn't find any hint of it anywhere.

So that was it. After all that, I was scuppered at the last stage. Back to the drawing board.

Note that if you want to create a full-text index on a view that doesn't include left or right joins, you will be OK, and can gleefully skip down to Fit The Ninth. As for me, I had to endure Fit The Eighth first.

Fit The Eighth - The summary table and the triggers

At this point, I had to rethink the whole thing from the beginning. This was quite a strain on the brain cell, and involved a strong cup of herbal brew, but I eventually came up with the earth-shatteringly obvious idea of creating a new table to hold summaries of the support ticket information, and creating a full-text catalogue against that.

Remember stuff and xml from Fit The Sixth? Well that had enabled me to aggregate the note data into a single row in the view. All I needed to do was use this to aggregate the note data into the summary table. It was messy, but I ended up with triggers on the main tables that looked like this...

create trigger SupportTicketNotes_Update on SupportTicketNotes
after update
as
-- Note: this assumes single-row updates; "inserted" can hold multiple rows,
-- in which case you'd need a set-based version that joins on inserted
declare @ID int
select @ID = (select SupportTicketID from inserted)
-- re-aggregate all of the ticket's active note details into a single string...
declare @Notes varchar(max)
select @Notes = (select isnull(stuff((select cast(Details as varchar(max)) + ' '
  from SupportTicketNotes stn inner join Actions a on a.ID=stn.ActionID
  where stn.Active=1 and stn.SupportTicketID=@ID for xml path('')), 1, 0, ''), ''))
-- ...and push it into the summary table
update SupportTicketsSummaries set Notes=@Notes where ID=@ID

With similar triggers on the other tables, I ended up with a SupportTicketsSummaries table that contained the fields I needed, and was a plain old single table. Hurray, back to Fit The Fourth!
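For completeness, the summary table itself was nothing exciting, just a plain denormalised copy of the searchable text, along these lines (column types assumed)...

create table SupportTicketsSummaries (
    ID int primary key, -- the same ID as the SupportTickets row it summarises
    ShortSummary varchar(200),
    Description varchar(max),
    Problem varchar(max),
    Solution varchar(max),
    Notes varchar(max)
)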

I created a full-text catalogue against the summary table, and was finally able to write a query that would query all the support ticket data...

select ID, k.rank, ShortSummary + ' ' + Description + ' ' + Problem + ' ' + Solution + ' ' + Notes Info
from SupportTicketsSummaries st
inner join freetexttable(SupportTicketsSummaries, (ShortSummary, Description, Problem, Solution, Notes), @SearchText) as k on st.ID=k.[key]
order by rank desc

I wrapped this up in a stored procedure, with a parameter for the search text, and breathed a sigh of relief. I'd done it, I was finally ready to import the stored procedure into the Entity Framework model, and get on with implementing the code.
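The stored procedure itself was nothing more than the query above with a parameter. A sketch (the name matches the function you'll see me calling later; the parameter size is a guess)...

create procedure SupportTicketsSearch
    @SearchText varchar(1000)
as
begin
    select ID, k.rank, ShortSummary + ' ' + Description + ' ' + Problem + ' ' + Solution + ' ' + Notes Info
    from SupportTicketsSummaries st
    inner join freetexttable(SupportTicketsSummaries, (ShortSummary, Description, Problem, Solution, Notes), @SearchText) as k on st.ID=k.[key]
    order by rank desc
end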

You know what's coming don't you? You should by now.

Fit The Ninth - Trying to find the stored procedure

I went into Visual Studio, opened the .edmx file, right-clicked on the designer surface and chose "Update model from database" as I'd done so many times before. I opened the Stored Procedures section, and scrolled down to my fab new stored procedure... that wasn't there! Huh? Wh'appen?

Back to SSMS to double-check. No, it really is there. Try updating the model again, no joy.

It turns out that you have to grant EXEC permissions on the stored procedure before you can see it in the Entity Framework model. I checked the connection string to see which user was being used to access the database, and granted EXEC permissions.
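The grant itself is a one-liner; the user name here is made up, so substitute whatever your connection string actually uses...

grant execute on dbo.SupportTicketsSearch to CrmAppUser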

After a few false starts, I managed to get past this hurdle, only to hit the next one...

Fit The Tenth - Where are my parameters?

One of the nice things about importing a stored procedure into Entity Framework is that the wizard will check what columns are returned, and either create or map a complex entity to match the results. Except it didn't. When I clicked the button to get the schema, I got the grammatically awful message “The selected stored procedure or function returns no columns.” I know they are American, but they could make some effort to write decent English! I know, it's probably decent American, but I wasn't in a mood to argue the point at this stage!

After a lot of searching, I didn't get very far. I found a lot of people reporting the same problem, but they were all using dynamic SQL or temporary tables. I wasn't using either, so didn't get any benefit from the suggestions.

I never did find an answer to this bit. I wondered if maybe the freetexttable was seen as a temporary table, or maybe it actually is a temporary table. I couldn't find an answer, but I did come across a wonderfully simple way around it.

The facility to detect the schema and generate a complex entity is a nice extra, but you don't actually have to use it. All you need to do is open the Model Browser, right-click on the Complex Types node, and add a new complex type whose properties match the columns returned by your stored procedure. Make sure the names match exactly.

Then start the function import wizard, choose complex type, and pick the one you just created from the drop-down list.

Fit The Eleventh - Using the new function in the code

This bit turned out to be surprisingly easy, once I had remembered to regenerate the entities. The following code was all that was needed to get the results of the full-text search...

List<SupportTicketsFullTextSearchResults> rankedIDs = ((SalesTrackerCRMEntities)getContext()).SupportTicketsSearch(searchText).ToList();

Armed with this, I was able to pull out the full tickets for the IDs returned by this query, and deliver them in ranked order.

As a final touch, I normalised the ranks, so that the highest one was always 100, and the others decreased, then used this to give the user an indication of the relative relevance of each ticket...
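If you'd rather do the normalisation in the stored procedure than in C#, a window function will do it in one pass. This isn't how I did it at the time, just a sketch of the idea, replacing the select inside the stored procedure...

select ID,
    k.rank * 100 / max(k.rank) over () as NormalisedRank,
    ShortSummary + ' ' + Description + ' ' + Problem + ' ' + Solution + ' ' + Notes Info
from SupportTicketsSummaries st
inner join freetexttable(SupportTicketsSummaries, (ShortSummary, Description, Problem, Solution, Notes), @SearchText) as k on st.ID=k.[key]
order by NormalisedRank desc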

I showed this to the client, and got a heart-warming enthusiastic response. Made it all worthwhile.

Well, almost!

Fit The Twelfth - Finally reached dry land, none the worse for my adventures

Reminiscent in many ways of the late Mr Ulysses, I had finally made it back to dry land in one piece.

Tuesday, February 16, 2016 5:00:00 PM (GMT Standard Time, UTC+00:00)
# Thursday, June 18, 2015

Having blogged recently about a stupid error message, I came across another one! Microsoft must employ people to craft these; they just couldn’t be there by accident. I reckon they are put there to amuse developers, and distract them from the fact that the software is misbehaving!

Anyway, I was using the SQL Server Import and Export Wizard, and it threw up the following error message:

Error 0xc004706b: Data Flow Task 3: "Destination 14 - EntityHistory" failed validation and returned validation status "VS_ISBROKEN"

I just love that last bit!

Thursday, June 18, 2015 9:16:00 PM (GMT Daylight Time, UTC+01:00)

And I’ve seen a few!

I was just trying to back up an SQL database, and got this truly wonderful error message...

Not much you can say to that is there?

Wednesday, March 18, 2015 3:14:00 AM (GMT Standard Time, UTC+00:00)

Geographical searches on postcode

I am currently working on an application where I want to have a search feature that allows people to search for businesses within a certain distance of their home (or anywhere else they care to choose).

I have some old UK postcode data knocking around, and was going to use that. For those not familiar with them, UK postcodes are made up of two parts, a major (also known as “outward”) part, and a minor (or “inward”) part. The major part is one or two letters followed by one or two digits, and the minor part is a digit, followed by two letters. Examples of valid postcode formats are M25 0LE, NW11 3ER and L2 3WE (no idea if these are genuine postcodes though).

Coupled with the postcodes are northings and eastings. These odd-sounding beasties are simply the number of metres north and east from a designated origin, which is somewhere west of Land's End. See the Ordnance Survey web site for more details. If you have any two postcodes, you can calculate the distance between them (as the crow flies) by a simple application of Pythagoras’ Theorem. My intention was to use all of this in my search code.
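To make that concrete, if you have a table of postcodes with their eastings and northings, the distance in metres is just the square root of the sum of the squared differences. All the names in this sketch are hypothetical...

select sqrt(square(a.Easting - b.Easting) + square(a.Northing - b.Northing)) as DistanceInMetres
from Postcodes a
cross join Postcodes b
where a.Postcode = 'M25 0LE' and b.Postcode = 'NW11 3ER'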

Whilst doing some ferreting around the web, looking for more up-to-date data, I found out that you can now get the UK postcode data absolutely free! When I last looked, which was some years ago admittedly, they charged an arm and a leg for this. Now all you need to do is order it from their downloads page, and you get sent a link to download the data. They have all sorts of interesting data sets there (including all sorts of maps, street views and so on), but the one I wanted was the Code-Point Open set. This was far more up-to-date than the data I had, and was a welcome discovery. Now all I had to do was import it, and write some calculation code.

Before I got any further though, I discovered that SQL Server 2008 has a new data type known as geography, which is used to represent real-world (ie represented on a round Earth) co-ordinates. It also has the geometry data type, which is a similar idea, but uses a Euclidean (ie flat) co-ordinate system. Given that I am only dealing with the UK, the curvature of the Earth isn’t significant in distance calculations, and so either data type would do.

However, converting northings and eastings to geography data isn’t as simple as you might hope, but I found a post on Adrian Hill’s blog, where he described how to convert OS data into geometry data, provided a C# console program that does just that, and then showed how easy it is to query the data for distances between points. In other words, exactly what I wanted!

I won’t describe the details here, because you can follow that link and read it for yourself, but basically all you need to do is get the Code-Point Open data, download the converter and away you go. Truth be told, it wasn’t quite that simple as the format of the data has changed slightly since he wrote the code, so I needed to make a small change. I left a comment on the post, and Adrian updated the code, so you shouldn’t need to worry about that. It doesn’t use hard-coded column numbers anymore, so should be future-proof, in case the OS people change the format again.

Testing the distance calculation, and an important point to improve performance

Once I had the data in SQL Server, I wanted to see how to use it. In the blog post, Adrian showed a simple piece of SQL that did a sample query. You can see the full code on his blog, but the main part of it was a call to the STDistance() method on the GeoLocation column, which does the distance calculation. He commented that when he tested it, he searched for postcodes within five miles of his home, and got 8354 rows returned in around 500ms. I was somewhat surprised when I tried it to discover that it took around 14 seconds to do a similar calculation! Not exactly what you’d call snappy, and certainly too slow for a (hopefully) busy application.

I was a bit disappointed with this, but decided that for my purposes, it would be accurate enough to do the calculation based on the major part of the postcode only. One of the things Adrian’s code did when importing the data to SQL Server was create rows where the minor part was blank, and the geography value was an average for the whole postcode major area. I adjusted my SQL to do this, and got results quickly enough that the status bar in SQL Server 2008 Management Studio reported the elapsed time as zero. OK, so the results won’t be quite as accurate, but at least they will be fast.

Whilst I was writing this blog post, I looked back at Adrian’s description again, and noticed one small bullet point that I had missed. When describing what his code did, he mentioned that it created a spatial index on the geography column. Huh? What on earth (pardon the pun) is a spatial index? No, I had no idea either, so off I went to MSDN, and found an article describing how to create one. I created the index, which took a few minutes (presumably because there were over 17 million rows in the table), and tried again. This time, I got the performance that Adrian reported. I don’t know if his code really should have created the spatial index, but it didn’t. However, once it was created, the execution speed was fast enough for me to use a full postcode search in my application.
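For the record, creating the index is a single statement. The index name here is my own invention; PostCodeData and GeoLocation are the table and column created by Adrian's importer...

create spatial index IX_PostCodeData_GeoLocation on PostCodeData(GeoLocation)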

Bringing the code into the Entity Framework model

So, all armed with working code, my next job was to code the search in my application. This seemed simple enough, just update the Entity Framework model with the new format of the postcodes table, write some Linq to do the query and we’re done… or not! Sadly, Entity Framework doesn’t support the geography data type, so it looked like I couldn’t use my new data! This was a let-down to say the least. Still, not to be put off, I went off and did some more research, and realised that it was time to learn how to use stored procedures with Entity Framework. I’d never done this before, simply because when I discovered Entity Framework, I was so excited by it that I gave up writing SQL altogether. All my data access code went into the repository classes, and was written in Linq.

Creating a stored procedure was pretty easy, and was based on Adrian’s sample SQL:

create procedure GetPostcodesWithinDistance
@OutwardCode varchar(4),
@InwardCode varchar(3),
@Miles int
as
begin
  declare @home geography
  select @home = GeoLocation from PostCodeData
      where OutwardCode = @OutwardCode and InwardCode = @InwardCode
  select OutwardCode, InwardCode from dbo.PostCodeData
  where GeoLocation.STDistance(@home) <= (@Miles * 1609)
    and InwardCode <> ''
end
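Testing it from Management Studio is then as simple as...

exec GetPostcodesWithinDistance 'M25', '0LE', 5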

Using stored procedures in Entity Framework

Having coded the stored procedure, I now had to bring it into the Entity Framework model, and work out how to use it. The first part seemed straightforward enough (doesn’t it always?). You just refresh your model, open the “Stored Procedures” node in the tree on the Add tab, select the stored procedure, click OK and you’re done. Or not. The slight problem was that my fantastic new stored procedure wasn’t actually listed in the tree on the Add tab. It was a fairly simple case of non-presence (Vic wasn’t there either).

After some frustration, someone pointed out to me that it was probably due to the database user not having execute permission on the stored procedure. Always gets me that one! Once I had granted execute permission, the stored procedure appeared (although Vic still wasn’t there), and I was able to bring it into the model. Then I right-clicked the design surface, chose Add –> Function Import and added the stored procedure as a function in the model context. Finally I had access to the code in my model, and could begin to write the search.

Just to ensure that I didn’t get too confident, Microsoft threw another curve-ball at me at this point. My first attempt at the query looked like this:

IEnumerable<Postcode> majors = from p in jbd.GetPostcodeWithinDistance("M7", 45)
                               select p;

IQueryable<Business> localBusinesses = from b in ObjSet
                                       where (from p in majors
                                              where p.Major == b.PostcodeMajor
                                              select p).Count() > 0
                                       select b;

The first query grabs all the postcodes within 45 miles of the average location of the major postcode M7. Bear in mind that this query was written before I discovered the spatial index, so it only uses the major part of the postcode. A later revision uses the full postcode. The variable jbd is the context, and ObjSet is an ObjectSet<Business> which is created by my base generic repository.

When this query was executed, I got a delightfully descriptive exception, “Unable to create a constant value of type 'JBD.Entities.Postcode'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.” Somewhat baffled, I turned to the Ultimate Source Of All Baffling Problems known as Google, and discovered that this wasn’t actually the best way to code the query, even if it had worked. I had forgotten about the Contains() extension method, which was written specifically for cases like this.

The revised code looked like this (with the first query also tweaked to select just the major part of each postcode as a string, so that majors is an IEnumerable<string>):

IQueryable<Business> businesses = from b in ObjSet
                                  where majors.Contains(b.PostcodeMajor)
                                  select b;

Somewhat neater, clearer and (more to the point) working!

So, with working code, I now had only one more hurdle to jump before I could claim this one had been cracked.

Copying geography or geometry data to another database

All of the above took place on my development machine. Now that I had working code, I wanted to put the data on the production server, so that the application could use it. This turned out to be harder than I thought. I tried the SQL Server Import/Export wizard, which usually works cleanly enough, but got an error telling me that it couldn’t convert geography data to geography data. Huh? Searching around for advice, I found someone who suggested trying to do the copy as NTEXT, NIMAGE, and various other data types, but none of these worked either.

After some more searching, I came across an article explaining that SQL Server has an XML file that contains definitions of how to convert between various data types. Unfortunately, Microsoft seem to have forgotten to include the geography and geometry data types in there! I found some code to copy and paste and, lo and behold, it gave the same error message! Ho hum.

After some more messing around and frustrated attempts, I mentioned the problem to a friend of mine who came up with the rather simple suggestion of backing up the development database and attaching it to the SQL Server instance on the production machine. I had thought of this, but didn’t want to do it as the production database has live data in it, which would be lost. He pointed out to me that if I attached it under a different name, I could then copy the data from the newly-attached database to the production one (which would be very easy now that they were both on the same instance of SQL Server), and then delete the copy of the development database. Thank you Yaakov!
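In T-SQL terms, the trick amounts to something like this. Every name and path here is hypothetical, and the logical file names in the with move clauses will be whatever your development database actually uses...

-- restore the development backup under a different name on the production instance
restore database PostcodesStaging
from disk = 'C:\Backups\DevDatabase.bak'
with move 'DevDatabase' to 'C:\Data\PostcodesStaging.mdf',
     move 'DevDatabase_log' to 'C:\Data\PostcodesStaging_log.ldf'

-- copy the postcode data across, now that both databases are on the same instance
insert into LiveDatabase.dbo.PostCodeData (OutwardCode, InwardCode, GeoLocation)
select OutwardCode, InwardCode, GeoLocation from PostcodesStaging.dbo.PostCodeData

-- and tidy up
drop database PostcodesStaging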

Thankfully, this all worked fine, and I finally have a working geographic search. As usual, it was a rough and frustrating ride, but I learned quite a few new things along the way.

Tuesday, September 20, 2011 2:12:00 PM (GMT Daylight Time, UTC+01:00)