Forum OpenACS Development: SEO friendly urls

Posted by Iuri Sampaio on
Hi there,

How would I switch from URLs such as https://evex.co//items/item-edit?item_id=65040

to name-based URLs such as https://evex.co/items/joeysirishpub?

items/ is a custom package that uses cr_items and cr_revisions.

2: Re: SEO friendly urls (response to 1)
Posted by Gustaf Neumann on

There are at least three approaches:

  • write a .vuh handler
  • register procs that do a redirect
  • register procs that do an internal redirect (the URL shown in the client browser does not change)

Assume you want to prettify http://YOURHOST/forums/forum-view?forum_id=1264


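# Redirect: the browser ends up showing /forums/forum-view?forum_id=1264
ns_register_proc GET /forums/myforum1 {
    ad_returnredirect /forums/forum-view?forum_id=1264
}

# Internal redirect: the browser keeps showing /forums/myforum2
ns_register_proc GET /forums/myforum2 {
    set form [rp_getform]
    ns_set update $form forum_id 1264
    rp_internal_redirect /packages/forums/www/forum-view
}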
3: Re: SEO friendly urls (response to 2)
Posted by Iuri Sampaio on
In that case, it would be necessary to register every item, i.e. one ns_register_proc per $item(title), in a foreach loop:

foreach item_id $item_ids {
    set title [content::item::get_title -item_id $item_id]
    # The body runs later, at request time, when the loop variable no
    # longer exists, so substitute the item_id at registration time.
    ns_register_proc GET /items/[util_text_to_url $title] [string map [list @ITEM_ID@ $item_id] {
        set form [rp_getform]
        ns_set update $form item_id @ITEM_ID@
        rp_internal_redirect /packages/evex-item/www/item-edit
    }]
}

Furthermore, a .vuh file would become a bottleneck, since it has to be processed on every page request.
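For reference, a plain-Tcl sketch of the title-to-URL step used above. OpenACS provides util_text_to_url for this; the slugify proc below is a hypothetical stand-in written for illustration only, not that proc's implementation (for example, it simply turns non-ASCII letters into hyphens). It also shows that different titles can produce the same URL segment, so any lookup by URL has to expect more than one match.

```tcl
# Hypothetical stand-in for util_text_to_url (NOT its real implementation):
# lowercase the text, drop apostrophes, collapse every other run of
# characters outside a-z/0-9 into one hyphen, then trim outer hyphens.
proc slugify {text} {
    set s [string tolower $text]
    set s [string map {' ""} $s]
    regsub -all {[^a-z0-9]+} $s - s
    return [string trim $s -]
}

puts [slugify "Joey's Irish Pub"]   ;# joeys-irish-pub
puts [slugify "Joeys Irish Pub!"]   ;# joeys-irish-pub as well: a collision
```

Two different titles yielding the same segment is exactly the "multiple matches" case that a URL lookup has to handle.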

4: Re: SEO friendly urls (response to 3)
Posted by Gustaf Neumann on
Yes, as the word "ns_register" implies, you have to register the names. A .vuh file is easier to maintain when items are added or deleted. I would not expect the .vuh file to become a bottleneck. Make some measurements!
5: Re: SEO friendly urls (response to 4)
Posted by Iuri Sampaio on
Yes, even though I was expecting something similar to https://openacs.org/test-doc/tutorial-vuh

It doesn't use ns_register at all. Why? Is it an old snippet?
I got it working fine using item_id directly. The problem arises when I try to tweak the algorithm to use $item(title) instead.

I'm still working on it.

# Transform requests of type: a/b
# into this internal request: A?c=b
# for example, note/495 > note-edit?item_id=495
# a: base name of this .vuh file
# b: from the request
# A: hard-coded
# c: hard-coded

set query [ad_conn url]
set request [string range $query [expr {[string last / $query] + 1}] end]
rp_form_put item_id $request
set internal_path "/packages/[ad_conn package_key]/www/note-edit"
rp_internal_redirect $internal_path

6: Re: SEO friendly urls (response to 5)
Posted by Iuri Sampaio on
Gustaf,

The foreach statement is the bottleneck I was trying to mention earlier.

set query [ad_conn url]
set request [string range $query [expr {[string last / $query] + 1}] end]

set item_ids [db_list select_items {
    SELECT item_id FROM cr_items WHERE content_type = 'ee_venue' OR content_type = 'ee_service'
}]

foreach item_id $item_ids {
    set title [util_text_to_url [content::item::get_title -item_id $item_id]]
    if {$title eq $request} {
        rp_form_put item_id $item_id
        break
    }
}

set internal_path "/packages/[ad_conn package_key]/www/item-edit"
rp_internal_redirect $internal_path

7: Re: SEO friendly urls (response to 6)
Posted by Antonio Pisano on
Dear Iuri,

I would not expect a loop through every cr_item to scale in the long term...

I suggest you have a look at the documentation and check whether one of these procs fits your needs:
https://openacs.org/api-doc/proc-view?proc=content::item::get_id&source_p=1
https://openacs.org/api-doc/proc-view?proc=content::item::get_id_by_name&source_p=1

Once you figure out how to make the lookup more efficient, your vuh should get you where you want.

Ciao

Antonio

8: Re: SEO friendly urls (response to 7)
Posted by Iuri Sampaio on
Hi Antonio,
Right on!
Instead of looping over all cr_items, the algorithm now searches for an item_id associated with the string_url passed.

That way, the lookup becomes specific.

So, as Gustaf said before...
Make some measurements!
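A minimal plain-Tcl sketch of such a measurement, using synthetic data and no database: it compares a linear scan over all items (the old loop) with a keyed lookup (the new, targeted query), using Tcl's built-in time command. The item names and sizes are made up; the numbers only illustrate that the scan grows with the number of items while the keyed lookup does not. Real measurements should be taken against the actual database, where the keyed lookup corresponds to a WHERE clause that can ideally be answered from an index.

```tcl
# Synthetic data only (no database): n fake items with slugs item-1..item-n,
# stored both as a flat list (scanned) and as a dict keyed by slug.
set n 50000
set slugs {}
set index [dict create]
for {set i 1} {$i <= $n} {incr i} {
    lappend slugs [list $i item-$i]
    dict set index item-$i $i
}

# Linear scan, like looping over every cr_item: cost grows with n.
proc scan_lookup {pairs wanted} {
    foreach pair $pairs {
        lassign $pair id slug
        if {$slug eq $wanted} {
            return $id
        }
    }
    return 0
}

# Keyed lookup, like a targeted WHERE clause: cost stays roughly constant.
proc index_lookup {index wanted} {
    if {[dict exists $index $wanted]} {
        return [dict get $index $wanted]
    }
    return 0
}

# Worst case for the scan: the item we want is the last one.
set wanted item-$n
puts "scan:  [time {scan_lookup $slugs $wanted} 10]"
puts "index: [time {index_lookup $index $wanted} 10]"
```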

Best wishes,

9: Re: SEO friendly urls (response to 8)
Posted by Gustaf Neumann on

Iuri,

although your sketch above might work for small instances, it does not scale in the general case. First fetching all items of certain types and then checking, item by item, whether each one has the right title is not the way to go. We have instances with more than 40 million items and 100 million revisions, where this approach would make pages take forever.

I am also not so sure whether "title" is a good attribute for the check (it might change between revisions). The "name" attribute of cr_items is more stable; by using it, you can use the API (content::item::get_id_by_name).

The following snippet might show you a way to go.

all the best
-g


set title [lindex [ad_conn urlv] end]
set item_ids [db_list select_items {
   select ci.item_id
   from cr_items ci, cr_revisions cr 
   where ci.live_revision = cr.revision_id 
   and cr.title = :title 
   and ci.content_type in ('ee_venue', 'ee_service')
}]
if {[llength $item_ids] == 1} {
   # handle case, where there is an exact match
} elseif {[llength $item_ids] == 0} {
   # handle case, where there is no match
} else {
   # handle case, where there are multiple matches
}
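One possible way to fill in the three branches above, sketched in plain Tcl. It assumes the query is extended to also select ci.content_type, so each match is an {item_id content_type} pair, and it breaks ties with a content-type preference order (e.g. venues before services). The proc name resolve_matches and the tie-break policy are illustrative assumptions, not OpenACS API; showing a choice page for ambiguous matches is equally valid.

```tcl
# matches: list of {item_id content_type} pairs (hypothetical: the query
#          above extended to also return ci.content_type)
# prefer:  content types in priority order, used only to break ties
# Returns a two-element list: {notfound {}}, {serve ID} or {ambiguous MATCHES}.
proc resolve_matches {matches prefer} {
    if {[llength $matches] == 0} {
        return [list notfound {}]
    }
    if {[llength $matches] == 1} {
        return [list serve [lindex $matches 0 0]]
    }
    # Several items share this title: take the first preferred type.
    foreach type $prefer {
        foreach m $matches {
            lassign $m item_id content_type
            if {$content_type eq $type} {
                return [list serve $item_id]
            }
        }
    }
    # None of the matches has a preferred type: let the caller decide.
    return [list ambiguous $matches]
}
```

In the .vuh, serve would continue with rp_form_put item_id and rp_internal_redirect as in the earlier snippets, and notfound could return a 404, e.g. with ns_returnnotfound.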
10: Re: SEO friendly urls (response to 9)
Posted by Iuri Sampaio on
Hi Gustaf,

As I've mentioned, the foreach statement is a scalability bottleneck.

Indeed, title isn't the best choice in the usual scenarios (where name is stable, free of special characters, and recommended).

However, title's behavior is precisely what I want for mutable, SEO-friendly pretty links. They're very dynamic.

Changes in item titles are reflected in the SEO keywords and meta tags in the page source.

Still, your code is too specific, and there's a bug when there are two items of different content_types with the same title.

Sometimes I think it's better to split the content types and check them separately, in separate db_* calls. But I still haven't thought this through.

By the way, I noticed you didn't use ns_register in your snippet. Isn't it mandatory?

set query [ad_conn url]

set request [string range $query [expr {[string last / $query] + 1}] end]
#rp_form_put item_id $request

ns_log Notice "REQUEST $request"
set item_id [db_string select_item_id {
    SELECT item_id FROM ee_venuesx WHERE url = :request
} -default 0]

if {$item_id eq 0} {
    set item_id [db_string select_item_id {
        SELECT item_id FROM ee_servicesx WHERE url = :request
    } -default 0]
}

if {$item_id ne 0} {
    rp_form_put item_id $item_id
}

set internal_path "/packages/[ad_conn package_key]/www/item-edit"
rp_internal_redirect $internal_path

11: Re: SEO friendly urls (response to 10)
Posted by Gustaf Neumann on
A few comments:

- the snippet above was intended for a .vuh handler
- "very dynamic": note that you get bad ratings for frequently changing URLs when other people link to your content and you then change the URL pointing to the same content (without a redirect)
- "title" has bad internationalization aspects as well (depending on the language settings, a link might or might not work, e.g. when a user is logged out or logged in)
- why is there a bug in the code snippet above? As indicated in the comments, your code has to handle the case where a lookup returns multiple (or no) results. The snippet's purpose was to show you a way out of the scalability problem of your previous approaches.
- why should ns_register be mandatory? Mandatory for what?
- don't split the URL manually, use the API (e.g. ad_conn urlv)
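Since changing a title changes its URL, one way to avoid the broken-link problem described above is to keep every URL segment an item has ever had and redirect old ones to the current one. A plain-Tcl sketch under stated assumptions: the two dicts stand in for hypothetical database tables (the current segment per item, and every past or present segment mapped to its item), and route_slug is an illustrative name, not an OpenACS API.

```tcl
# current: item_id -> current URL segment      (hypothetical table)
# history: any segment ever used -> item_id   (hypothetical table)
# Returns {notfound {}}, {redirect URL} or {serve ITEM_ID}.
proc route_slug {slug current history} {
    if {![dict exists $history $slug]} {
        return [list notfound {}]
    }
    set item_id [dict get $history $slug]
    set canonical [dict get $current $item_id]
    if {$slug ne $canonical} {
        # Retired segment: send the client to the current URL.
        return [list redirect /items/$canonical]
    }
    return [list serve $item_id]
}

set current [dict create 65040 joeys-irish-pub]
set history [dict create joeys-pub 65040 joeys-irish-pub 65040]
puts [route_slug joeys-pub $current $history]
```

In a .vuh handler, the redirect result would ideally be sent as a permanent (301) redirect, e.g. with ns_returnmoved, so that search engines treat the move as permanent; serve would continue with rp_form_put and rp_internal_redirect as before.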