Forum OpenACS Development: From database to JSON

Collapse
Posted by Antonio Pisano on
Hello everyone,

while working on some code I started pondering about an interface that would allow to generate a JSON structure from a database query "fast". By fast I mean that the experience should feel seamless as, for instance, using existing db_* api like db_list_of_list or db_multirow

db_multirow in particular is the kind of experience I am thinking about: it goes quickly from a query to a data structure, which is well established in the templating system and can be extended also via Tcl.

The core trick to achieve such an interface is relatively easy to implement via libraries such as json::write and I have in fact tried different approaches that I summarize:

Direct api: something like db_json that executes a query and returns the result as JSON. It can be implemented like db_multirow to allow extending the columns from the query via tcl in an extra code block. This looks a lot like reimplementing db_multirow, except the json part. The resulting json cannot be easily extended or modified without some reparsing.

Converting template::multirow: this reuses the many facilities offered by multirow and just converts to json later. It is an extra step, so not as seamless. It also uses roughly twice as memory to store the extra multirow and wastes cycles to compute it. However, we get all of the multirow facilities for free.

Converting a simple list of lists: one can obtain such a structure via e.g. db_list_of_lists and manipulate it in tcl easily. I like this approach, however, it makes it harder to "extend" existing user interfaces using template::list to return JSON, which was my original use case. Such pages normally already ship with a multirow to convert.

I have also tried extending template::list to return JSON similar to what it does for CSV, but I do not want to go there, as there are more corner cases one has to cover and a smaller chance for reuse. For 99% of cases converting the multirow would be sufficient IMO.

Do you have any suggestion about what the best approach would be? Do you have different approaches in mind?

Thanks a lot for your 2 cents

Antonio

Collapse
2: Re: From database to JSON (response to 1)
Posted by Brian Fenton on

Hi Antonio!

this is an interesting discussion. FYI, both Postgres and Oracle allow you to output JSON directly from the database - so that could be one option for you.

I normally use db_foreach to achieve this - here is a simple example pulling data from the users table. JSON can have a rich hierarchical structure so I like the flexibility that db_foreach allows me if I need to achieve more complex structures.

Here I'm using the util::json APIs to build the JSON - I know this has some limitations so you might not want to use it. You could also use rl_json - that is on my list of things to look at when I get a chance.

set sql "select user_id, username,  last_visit from users" 
set json_list_of_items [list]
db_foreach get_users $sql {
    lappend json_list_of_items [util::json::object::create [list user_id $user_id username $username last_visit $last_visit  ] ]
}
set json_array [util::json::array::create $json_list_of_items]
set response_list [util::json::gen $json_array]
doc_return 200 application/json $response_list
ad_script_abort

Brian

Collapse
3: Re: From database to JSON (response to 2)
Posted by Antonio Pisano on
Hi Brian!

My experiments so far do not look smarter than your example. I think this approach covers already most of the use cases fo SQL -> JSON. I agree with you that a complete solution may need to take into account nested JSON structures, which would increase the complexity of conversion.

There are actually quite a few ways to produce JSON in tcl, some we have mentioned, but e.g. tDOM or huddle can also do it.

I was wondering if OpenACS as a platform should provide some easier interface to produce JSON out of the box: so far we are strong in extracting data from the database and make it into markup, but modern UIs often fetch their data from JSON endpoints.

Do you think there is any value in providing some tool in OpenACS itself?

Collapse
4: Re: From database to JSON (response to 3)
Posted by Brian Fenton on
I guess it would depend on how you plan to generate the JSON. For example, last time I looked, on Windows we were unable to use rl_json, so choosing that could be a problem for us. But rl_json's template approach is very appealing, so that would be a pity not to use. I see also Neophytos has his tjson library, which I haven't looked at. OpenACS has always had a great history of staying close to the database, so maybe it is worth taking a look at how Postgres returns JSON and maybe integrating with that?

I think it's tricky to make it more simple than my example, which still keeping it powerful enough to handle complex cases. If you look at something like Symfony's JsonResponse class, it sounds great, but really they're just moving the pain point somewhere else (to making sure your data is already structured properly before calling JsonResponse) https://symfony.com/doc/current/controller.html#returning-json-response

just some random (probably not very helpful) thoughts
Brian
PS we discussed some of this previously here https://openacs.org/forums/message-view?message_id=3854601 and Malte raised a very interesting discussion here https://openacs.org/forums/message-view?message_id=5452649

Collapse
5: Re: From database to JSON (response to 4)
Posted by Neophytos Demetriou on
Hi Brian,

Regarding tjson library: it does not support Windows at the moment. It should not be too much work to get it to run on Windows but I don't have the setup yet - I will try and let u know. tjson works on Linux and MacOS at the moment.

Coming to the question at hand. I think it is the responsibility of the database or its driver to produce JSON or an intermediate format (because there are no types in TCL). This is what the AWS DynamoDB and the MongoDB drivers do (not 100% sure about the latter). They return an intermediate format that tjson can use in both cases. The reason this needs to happen in the database or its driver is because they are aware of the types already.

... I have to run, I will follow up later.

Collapse
7: Re: From database to JSON (response to 5)
Posted by Brian Fenton on
Hi Neophytos

good to see you still around! I accept your point about the responsibility of the database, and the importance of types - very important points. However there will be occasions when there is a need to maybe apply a TCL API to the data before sending it to the client - this may not be achievable from the database. Flexibility is key here I think.

Brian

Collapse
8: Re: From database to JSON (response to 7)
Posted by Neophytos Demetriou on
tjson supports manipulation commands even though there is no create command (I will add it). Here is how your example looks like with tjson with the existing commands:


set sql "select user_id, username,  last_visit from users"
set json_arr [::tjson::parse {[]}]
db_foreach get_users $sql {
   ::tjson::add_item_to_array $json_arr \
        [list M \
                [list \
                        user_id [list N $user_id] \
                        username [list S $username] \
                        last_visit [list S $last_visit]]]
}
doc_return 200 application/json [::tjson::to_pretty_json $json_arr]
ad_script_abort

And here is what you get:


[
  {
    "user_id": 0,
    "username": "guest",
    "last_visit": ""
  },
  {
    "user_id": 759,
    "username": "test",
    "last_visit": "2023-07-26 09:27:37.590777-04"
  }
]

M means it is an object, N means it is a number, and S means it is a string. tjson also supports the triple notation that is used in mongodb driver (documentation for that feature is still pending).

Collapse
9: Re: From database to JSON (response to 8)
Posted by Brian Fenton on
This looks very good. I particularly like the type support, which is badly missing in util::json
Collapse
10: Re: From database to JSON (response to 9)
Posted by Neophytos Demetriou on
You can see more object or array manipulation commands in this example.

I will add a create command though. Thanks for that.

Collapse
11: Re: From database to JSON (response to 9)
Posted by Antonio Pisano on
tDOM also offers explicity JSON type support, see e.g. jsonType at [1]

One advantage of using tDOM is that OpenACS already depends on it, so there is no extra library to load. Plus, cross-platform compatibility is already sorted out.

[1] http://tdom.org/index.html/doc/trunk/doc/domNode.html

Collapse
Posted by Neophytos Demetriou on
I see your point but it does not support arrays, e.g. [1,2,3] or ["a","b","c"]. tdom does not parse it into anything. I tried it before I built tjson.
Collapse
Posted by Antonio Pisano on
I have just tried this in a development shell:

dom parse -json {[1,2,3]} test
$test asJSON

-->

[1,2,3]

so it seems that the parsing has worked. Can you provide an example where tDOM fails to parse a valid JSON?

Ciao

Collapse
Posted by Neophytos Demetriou on
I stand corrected.
Collapse
Posted by Neophytos Demetriou on
Another way is to write the result to a csv and then exec python to convert it to json. The following line should work:

set result [exec -ignorestderr python3.10 -c "import csv, json, sys; print(json.dumps(\[dict(r) for r in csv.DictReader(sys.stdin)\]))" < /path/to/data.csv]
Collapse
Posted by Neophytos Demetriou on
Can you provide an example where tDOM fails to parse a valid JSON?

Please download this file: https://raw.githubusercontent.com/stripe/openapi/master/openapi/spec3.json

and then run:


package require tdom
set fp [open "spec3.json"]
set data [read $fp]
close $fp
puts [time "dom parse -json $data test" 1

I get: error "JSON syntax error" at position 15

tjson runs fine with it:


package require tjson
set fp [open "spec3.json"]
set data [read $fp]
close $fp
puts [time "::tjson::parse $data" 1]

=> 18316 microseconds for tjson

It's probably something that I am doing wrong. Please let me know if you have better luck with it.

Collapse
Posted by Antonio Pisano on

Hi Neophytos,

I have tried the following on my test vanilla instance:

set fp [open "/tmp/spec3.json" r]
set data [read $fp]
close $fp
lappend results [time {dom parse -json $data test} 1] [$test asJSON]

And the result was:

{23639 microseconds per iteration} {{"components":{"schemas":... (truncated)

The problem was the double quotes in

"dom parse -json $data test"

the $data JSON is expanded into the command string and won't be valid anymore.

Ciao

Antonio

Collapse
Posted by Rolf Ade on
Past EuroTcl 2024 Neophytos pointed me to this thread.

Antonio already pointed out that a valid test should read:

puts [time {dom parse -json $data test} 1]

Since the tjson code snipped had also the wrong double quotes I
wondered why tjson not also reported JSON error on that input.

There is a well-known JSON test-suite under
https://github.com/nst/JSONTestSuite/archive/refs/heads/master.zip

I adapted a small test script of mine (see below) and gave the current
tjson head code from github a try with 8.6 (it's currently not
migrated to Tcl 9 and seg faults, if I try to run it with that).

With Tcl 8.6 and 9 tDOM parses all valid JSON test data and raises
error on all invalid input data.

On the other hand tjson with 8.6 raises error on the valid JSON 

y_number_double_close_to_zero.json 

and does not raise error on the invalid data:

n_object_trailing_comment_slash_open.json
n_object_trailing_comment_slash_open_incomplete.json
n_string_invalid_unicode_escape.json
n_array_extra_close.json
n_string_unescaped_ctrl_char.json
n_structure_trailing_#.json
n_multidigit_number_then_00.json
n_structure_number_with_trailing_garbage.json
n_object_with_trailing_garbage.json
n_structure_close_unopened_array.json
n_number_-2..json
n_string_unescaped_newline.json
n_structure_whitespace_formfeed.json
n_number_0.e1.json
n_object_trailing_comment.json
n_number_2.e3.json
n_number_2.e+3.json
n_number_neg_real_without_int_part.json
n_structure_array_trailing_garbage.json
n_string_with_trailing_garbage.json
n_array_comma_after_close.json
n_number_2.e-3.json
n_number_-01.json
n_string_unescaped_tab.json
n_structure_object_followed_by_closing_object.json
n_number_with_leading_zero.json
n_structure_object_with_trailing_garbage.json
n_object_trailing_comment_open.json
n_structure_double_array.json
n_number_neg_int_starting_with_zero.json
n_number_real_without_fractional_part.json
n_structure_array_with_extra_array_close.json

Although, very probably this is not a shortcoming of the tjson code
but of the underlying interfaced C json parser.

package require tdom
load ./libtjson.so

set testdir JSONTestSuite-master/test_parsing

foreach type {i y n} {
    set r($type,nr) 0
    set r($type,readfailed) ""
    set r($type,failed) ""
    set r($type,ok) ""
    foreach file [glob $testdir/${type}_*] {
        incr r($type,nr)
        set filetail [file tail $file]
        set fd [open $file]
        if {[catch {
            set data [read $fd]
        }]} {
            close $fd
            lappend r($type,readfailed) $filetail
            continue
        }
        close $fd
        if {[catch {
            # for tdom use
            # set doc [dom parse -json -- $data]"
            set doc [tjson::parse $data]
        }]} {
            lappend r($type,failed) $filetail
        } else {
            lappend r($type,ok) $filetail
        }
    }
}

foreach {type report} {
    i {readfailed failed}
    y failed
    n ok
} {
    puts "type $type"
    puts "nr of tests $r($type,nr)"
    puts "readfailed [llength $r($type,readfailed)]"
    puts "failed [llength $r($type,failed)]"
    puts "ok [llength $r($type,ok)]"
    set details ""
    foreach toreport $report {
        set thisdetails [join $r($type,$toreport)]
        if {$thisdetails ne ""} {
            append details "$toreport: $thisdetails\n"
        }
    }
    if {$details ne ""} {
        puts "Details:"
        puts $details
    }
}

Collapse
Posted by Neophytos Demetriou on
Hi Rolf,

I will check the segfaults. Other than that, I have nothing else to comment. Was it linux, macos, or windows?

Collapse
Posted by Rolf Ade on
That was on linux.

If you haven't done migration effort for Tcl 9 it is the "expected" result, that even if the build with Tcl 9 comes to an end with warnings (as it was for me) the lib very shortly raise seg fault. So, if you haven't tackled migration to Tcl 9 then it would be of course great if you do but otherwise there isn't something to do wrt the seg fault.

Collapse
Posted by Neophytos Demetriou on
Hi Rolf, at my morning coffee and I cannot stop thinking of the segfaults you reported. I am using tcl9 both on linux and macos without any build warnings or segfaults. So, I kindly asked Konstantin to independently check and I will post again when he does.

Unless tcl9 is installed at /usr/local you need to provide TCL_LIBRARY_DIR and TCL_INCLUDE_DIR as discussed in the readme.md. I am not 100% sure you did that as I see you load the shared library with load instead of package require. If you were to provide more details, we can fix things, which is on us to make it work for everyone.

That said, I guess we should call up amazon to stop using cjson in their c++ sdk. And, by that I mean, I am glad tdom passes all tests but it is no reason for me to switch to tdom-json. I am using tdom for xml as you know already.

Furthermore, I finally figured out why arrays in tdom were not working for me. I was using xml manipulation commands to create the array without creating an "arraycontainer" node.

Collapse
Posted by Rolf Ade on
The seg faults was my fauilt - I was stupid, absently and careless to mix up a Tcl 9 shell with a Tcl 8.6 linked extension. And not to imediately realize that.
Sorry for the noise.
Collapse
Posted by Neophytos Demetriou on
Thanks for clarifying. I think tdom is great and I must admit I learnt a lot by studying its code. I also think the more extensions, the better for the TCL ecosystem.
Collapse
6: Re: From database to JSON (response to 4)
Posted by Antonio Pisano on
I see I have also "contributed" to the conversation in the past! Funny how we did not get much further from there more than 10 years ago 😊
Collapse
18: Re: From database to JSON (response to 1)
Posted by Gustaf Neumann on
Dear all,

to get data from the database straight to JSON, one should use the native support from the database.

JSON-Tcl processing contains two parts, parsing JSON and generating it. Parsing is for DB application just relevant, when the database returns JSON directly. Below are some figures from different implementations, including tdom, tjson, from the mongdb package in nsf (bson, supporting binary datatypes), and scripted Tcl

microseconds
tdom-parse 16,044.66
tjson-parse 11,543.77
bson-triple 51,567.74
tjson-typed 72,859.57
tjson-triple 72,118.74
json2dict 1,345,247.94

One can see here significant performance differences, which also depend on the result structure. While tdom and tjson return a handle to the internal structure, bson is transformed into a Tcl list structure containing triples of names, types and values, or json2dict returns a dict. One can also use tjson to produce the same triple notation (see "tjson-triple"). The differences between the C implemented versions are not huge (when comparing a similar richness of the output).

In order to generate JSON, from these notations, we see the following results:

microseconds
tdom-generate 9,714
tjson-generate 18,117
bson-generate 58,271

I've also looked at huddle (https://github.com/aplicacionamedida/huddle), using the json2huddle and "huddle jsondump" from there:

microseconds
huddle-parse 9,455,761
huddle-generate 4,394,388
Performance is not the primary reason to use huddle for this kind of application.

We can see, that the C-implemented versions are clear better (no big surprise). I am pretty sure, that rl_json performs similar. All of these have some advantages:

  • tdom is still very fast and established, it keeps the type information internally to achieve the appropriate JSON quoting.
  • tjson has in some areas advantages
  • bson has support for binary datatypes, but this is not important, when interfacing with PostgreSQL and Oracle). The triple notation makes it ease to provide types for the values on the Tcl level.
For OpenACS it is preferable to provide a common API to these backend packages, without losing much performance.

all the best
-gn

package require json
set jsonfile /usr/local/src/tjson/spec3.json ; set count 10
set fp [open $jsonfile r]; set JSON [read $fp]; close $fp
lappend results [list tdom-parse [time {dom parse -json $JSON test} $count]]]
lappend results [list tjson-parse [time {::tjson::parse $JSON} $count]]
lappend results [list bson-triple [time {::mongo::json::parse $JSON} $count]]
lappend results [list json-typed [time {set typed_spec [::tjson::json_to_typed $JSON]} $count]]
lappend results [list tjson-triple [time {set typed_spec [::tjson::json_to_typed $JSON]; ::tjson::typed_to_custom $typed_spec} $count]]
lappend results [list json2dict [time {::json::json2dict $JSON} $count]]

set T [dom parse -json $JSON test]
lappend results [list tdom-generate [time {$T asJSON} $count]]]

set TJ [::tjson::parse $JSON]
lappend results [list tjson-generate [time {::tjson::to_json $TJ} $count]]

set TRIPLE [::mongo::json::parse $JSON]
lappend results [list bson-generate [time {::mongo::json::generate $TRIPLE} $count]]

Collapse
19: Re: From database to JSON (response to 1)
Posted by Neophytos Demetriou on
Hi Gustaf,

Thanks for the thorough review. Just a minor correction: tjson parse had no trace var when you run the test. I added it this morning and tested both (tdom and tjson) again and they are equivalent.

@Brian, I added Windows support to tjson but I only tested it with TCL. I will check NaviServer next weekend.