Any ideas for a fast way to search a table with JSON stored in a memo field?
Below is a sample solution, but it is slower than I would like. On my laptop, searching 1,000 records takes just over 1.5 seconds.
The only other solution I've come up with is to add a bunch of fields to the table and do a normal search, but that has its drawbacks: besides being very sloppy, it loses the flexibility of storing the values in JSON.
DO wconnect
* DEBUG
* SUSPEND
lnStart = SECONDS()
o=CREATEOBJECT('myTest')
o.myQuery()
? SECONDS()-lnStart
***********************
***********************
DEFINE CLASS myTest as Custom
	oSerializer = .null.
	***********************
	***********************
	FUNCTION Init()
		this.oSerializer = CREATEOBJECT('wwJsonSerializer')
	ENDFUNC
	***********************
	***********************
	FUNCTION myQuery()
		SELECT * FROM 'c:\dataPath\Notes' WHERE this.UDF( oJson ) == .t. INTO CURSOR tQuery
	ENDFUNC
	***********************
	***********************
	FUNCTION UDF(lcJson)
		IF EMPTY(lcJson)
			RETURN .f.
		ENDIF
		TRY
			loJson = this.oSerializer.DeserializeJson(lcJson)
			llReturn = .f.
		CATCH
			llReturn = .t.
		ENDTRY
		IF llReturn
			RETURN llReturn
		ENDIF
		IF PEMSTATUS(loJson, 'VdrFk', 5)
			IF loJson.VdrFk = 150
				RETURN .t.
			ENDIF
		ENDIF
		RETURN .f.
	ENDFUNC
	***********************
	***********************
	FUNCTION Destroy()
		this.oSerializer = .null.
	ENDFUNC
	***********************
	***********************
ENDDEFINE
It depends on the proportion of records having JSON. You could eliminate the records without JSON faster using (I use 'm.' to retrieve variables faster):
IF EMPTY(m.lcJson) OR OCCURS('{', m.lcJson) # OCCURS('}', m.lcJson)
You could also do the same with fewer instructions, hence faster (note: VFP evaluates the operand after OR only if the first operand is .F., and the operand after AND only if the first operand is .T.):
&& replace
IF llReturn
	RETURN llReturn
ENDIF
IF PEMSTATUS(loJson, 'VdrFk', 5)
	IF loJson.VdrFk = 150
		RETURN .t.
	ENDIF
ENDIF
RETURN .f.
&& by
RETURN m.llReturn OR (PEMSTATUS(loJson, 'VdrFk', 5) AND loJson.VdrFk = 150)
Like Thierry says, I think you'd want to pre-query your data to make sure the field is not empty (or null) to avoid calling the UDF when there's nothing to do. In other words, let the SQL engine do as much of the work as possible to minimize the records you have to process with the UDF.
If you have 1,000 records that's probably fine, but if you have 10,000 the query will likely get much slower, so you have to test this. If you're looking for specific values in the JSON, perhaps you can pre-query the raw JSON for the string match and only then deserialize the document with the UDF.
+++ Rick ---
Thank you both for your suggestions. I was able to get the search time on the same 1,000 records down to just under 0.9xx seconds by reducing the number of qualifying records the UDF() processed. I did this by running the EMPTY() and OCCURS() checks before the TRY/CATCH around DeserializeJson() and returning false when the record did not conform.
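For reference, a sketch of what that revised UDF looks like with the pre-checks in place (names follow the earlier sample, and this version excludes malformed JSON rather than flagging it):

FUNCTION UDF(lcJson)
	* Cheap pre-checks: reject empty memos and strings whose braces
	* don't balance before paying for a full deserialize
	IF EMPTY(m.lcJson) OR OCCURS('{', m.lcJson) # OCCURS('}', m.lcJson)
		RETURN .f.
	ENDIF
	TRY
		loJson = this.oSerializer.DeserializeJson(m.lcJson)
	CATCH
		* Malformed JSON: record does not conform, so exclude it
		RETURN .f.
	ENDTRY
	RETURN PEMSTATUS(loJson, 'VdrFk', 5) AND loJson.VdrFk = 150
ENDFUNC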
Then I went back and took a second look at the data and realized most of the records were not being processed by DeserializeJson(); something like 100 out of the 1,000 actually got processed. So based on Rick's comment about performance, I appended 10,000 qualifying records to test the cycle time. The query took over 17 seconds; yikes, I need another solution!
I now think it is time to add new fields to the table so search times will again be sub-second.
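For what it's worth, a rough sketch of that extracted-field approach (assuming the table and names from the earlier sample, and assuming the populate step is re-run whenever the JSON changes):

USE ('c:\dataPath\Notes') EXCLUSIVE ALIAS Notes
* One-time schema change: promote the searched value to a real, indexable column
ALTER TABLE Notes ADD COLUMN VdrFk I
INDEX ON VdrFk TAG VdrFk

* Populate the new column from the JSON
loSer = CREATEOBJECT('wwJsonSerializer')
SCAN FOR !EMPTY(oJson)
	TRY
		loJson = loSer.DeserializeJson(Notes.oJson)
		IF PEMSTATUS(loJson, 'VdrFk', 5)
			REPLACE VdrFk WITH loJson.VdrFk IN Notes
		ENDIF
	CATCH
		* Skip malformed JSON
	ENDTRY
ENDSCAN

* The search can now use the index instead of a per-row UDF call
SELECT * FROM Notes WHERE VdrFk = 150 INTO CURSOR tQuery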
Thank you both again - Happy Holiday!
I think I'd probably do a query to pre-select the records into a cursor. If you can filter based on string content (i.e. you're matching some specific value), add that into the query and look for it in the JSON content as part of that initial query.
lcId = "12345"
*** Pre-filter query - just return what you need
select id, Json, Description from MyData ;
   where !EMPTY(Json) and Json like "%" + lcId + "%" ;
   into cursor TQuery
SCAN
	lcJson = TQuery.Json
	* ... do your actual processing
ENDSCAN
If you do this right you should end up with just a very small set of records to process in the SCAN loop.
The advantage of this is that you let FoxPro run an efficient query to get the preliminary result, which likely doesn't happen if you use a UDF() in the query.
You can play around with the search filter expression (ATC(), OCCURS(), etc.) to get it as optimized as possible. I think using native SQL rather than a FoxPro string function will be more efficient, because a Fox function call can't be optimized while native SQL clauses can be. OTOH, string searches over large blocks of text are pretty slow in FoxPro when dealing with large data sets, and there's no good way to optimize that either, AFAIK.
+++ Rick ---