Forum OpenACS Development: Re: SEO friendly urls

7: Re: SEO friendly urls (response to 6)
Posted by Antonio Pisano on
Dear Iuri,

I would not expect a loop through every cr_item to scale in the long term...

I suggest you have a look at the documentation and check whether one of these procs fits your needs:
https://openacs.org/api-doc/proc-view?proc=content::item::get_id&source_p=1
https://openacs.org/api-doc/proc-view?proc=content::item::get_id_by_name&source_p=1

Once you figure out how to make the lookup more efficient, your vuh should get you where you want.

Ciao

Antonio

8: Re: SEO friendly urls (response to 7)
Posted by Iuri Sampaio on
Hi Antonio,
Right on!
Instead of looping over all cr_items, the algorithm now searches for an item_id associated with the string_url passed.

That way access becomes specific.

So, as Gustaf said before...
Make some measurements!

Best wishes,

9: Re: SEO friendly urls (response to 8)
Posted by Gustaf Neumann on

Iuri,

Your sketch above might work for small instances, but it does not scale in the more general case. Fetching all items of certain types first and then checking on a per-item basis whether each item has the right title is not the way to go. We have instances with more than 40 million items and 100 million revisions, where this approach would lead to pages taking forever.

I am also not so sure whether the attribute "title" is a good attribute for checking (it might change between revisions). The attribute "name" of cr_items is more stable; by using it you can use the API (content::item::get_id_by_name).

The following snippet might show you a way to go.

all the best
-g

# The last URL segment is taken as the title to look up.
set title [lindex [ad_conn urlv] end]

# One query against the live revisions instead of a loop over all items.
set item_ids [db_list select_items {
   select ci.item_id
   from cr_items ci, cr_revisions cr
   where ci.live_revision = cr.revision_id
   and cr.title = :title
   and ci.content_type in ('ee_venue', 'ee_service')
}]
if {[llength $item_ids] == 1} {
   # handle case, where there is an exact match
} elseif {[llength $item_ids] == 0} {
   # handle case, where there is no match
} else {
   # handle case, where there are multiple matches
}
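
The three branches above could be folded into one small helper, so that the .vuh only has to react to the outcome. This is a minimal plain-Tcl sketch; the proc name classify_matches and the status words are made up for illustration and are not part of OpenACS.

```tcl
# Hypothetical helper (not an OpenACS proc): turn the list of item_ids
# returned by the lookup into a {status value} pair.
proc classify_matches {item_ids} {
    switch -- [llength $item_ids] {
        0       { return [list not_found ""] }
        1       { return [list found [lindex $item_ids 0]] }
        default { return [list ambiguous $item_ids] }
    }
}

# Example: react to the outcome of a lookup.
lassign [classify_matches {4711}] status value
# status is "found", value is the single item_id
```

In the handler, "found" would lead to the internal redirect, "not_found" to a 404 page, and "ambiguous" to whatever disambiguation the application prefers (for example a list of candidates).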
10: Re: SEO friendly urls (response to 9)
Posted by Iuri Sampaio on
Hi Gustaf,

As I've mentioned, the foreach statement is a bottleneck for scalability.

Indeed, title isn't the best choice for usual scenarios (i.e. where name is stable, free of special characters, and recommended).

However, title's behavior is precisely what I want for mutable, SEO-oriented pretty links. They're very dynamic.

Changes in item titles are reflected in the keywords and meta tags for SEO within the source code.

Still, your code is too specific, and there's a bug when there exist 2 items of different content_types with the same title.

Sometimes I think it's better to have the content_types split and checked in separate db_ calls. But I still haven't thought this through.

By the way, I've noticed you didn't write ns_register in your chunk of code. Aren't they mandatory?

# Take everything after the last slash as the requested pretty name.
set query [ad_conn url]

set request [string range $query [expr {[string last / $query] + 1}] end]
#rp_form_put item_id $request

ns_log Notice "REQUEST $request"

# Look for a venue first ...
set item_id [db_string select_item_id {
    SELECT item_id FROM ee_venuesx WHERE url = :request
} -default 0]

# ... and fall back to services when no venue matches.
if {$item_id eq 0} {
    set item_id [db_string select_item_id {
        SELECT item_id FROM ee_servicesx WHERE url = :request
    } -default 0]
}

# Pass the item_id on to the target page only when something was found.
if {$item_id ne 0} {
    rp_form_put item_id $item_id
}

set internal_path "/packages/[ad_conn package_key]/www/item-edit"
rp_internal_redirect $internal_path

11: Re: SEO friendly urls (response to 10)
Posted by Gustaf Neumann on
a few comments:

- the snippet above was intended for a .vuh handler
- "very dynamic": note that you get bad ratings on frequently changing URLs when other people link to your content and you change the URL pointing to the same content (without a redirect)
- "title" also has bad internationalization aspects (depending on the language settings, a link might work or not, ... when e.g. a user is logged out or logged in)
- why is there a bug in the code snippet above? As indicated in the comments, your code has to handle the case that a lookup returns multiple (or no) results. The code snippet had the purpose of showing you a way out of the scalability problem of your previous approaches.
- why should ns_return be mandatory? mandatory for what?
- don't use manual URL splitting; use the API instead
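
To illustrate the last point: the snippet earlier in the thread takes the last element of [ad_conn urlv], i.e. it works on a list of URL segments rather than on string offsets. The plain-Tcl sketch below only mimics that idea so it can run outside OpenACS; url_segments and both last_segment procs are hypothetical names, and how ad_conn urlv itself treats a trailing slash is not verified here.

```tcl
# Manual approach from the thread: everything after the last slash.
proc last_segment_manual {url} {
    return [string range $url [expr {[string last / $url] + 1}] end]
}

# Segment-based approach: split into non-empty path segments first.
# Plain-Tcl stand-in for a segment list, not the OpenACS implementation.
proc url_segments {url} {
    set segments {}
    foreach s [split $url /] {
        if {$s ne ""} { lappend segments $s }
    }
    return $segments
}

proc last_segment {url} {
    return [lindex [url_segments $url] end]
}

# A trailing slash makes the manual version return an empty string,
# while the segment-based version still yields the name.
set manual  [last_segment_manual "/venues/blue-note/"]
set segment [last_segment "/venues/blue-note/"]
```

In a real .vuh, [lindex [ad_conn urlv] end] would take the place of last_segment.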

12: Re: SEO friendly urls (response to 11)
Posted by Iuri Sampaio on
Hi Gustaf, Thanks for the tips.
I've amended the source code to use name instead of title (i.e. the name is set at the time the item gets inserted). That way, we don't harm SEO statistics.

One thing, though, is that, in the long term, a user may change the title of the item. The URL then differs from the actual title, causing a bad user/public experience (i.e. the public will browse the item's page and see that the URL is slightly, or very, different from the actual title).

But that's perfectionism, I'd say. So, I created a scheduled script to update the name of the item once in a while.
That way the SEO URL won't change every time a user changes the item's title, but only when the script runs.
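
Since the scheduled script still changes URLs from time to time, the earlier remark about redirects applies: old links only keep working if previous names are remembered. A minimal plain-Tcl sketch of that idea follows; resolve_slug and the two dicts are hypothetical, and in a real installation the mappings would live in the database rather than in memory.

```tcl
# Hypothetical resolver (not an OpenACS proc).
#   current_names: dict mapping the current name -> item_id
#   old_names:     dict mapping a former name   -> current name
proc resolve_slug {slug current_names old_names} {
    if {[dict exists $current_names $slug]} {
        # The name is current: serve the item directly.
        return [list serve [dict get $current_names $slug]]
    }
    if {[dict exists $old_names $slug]} {
        # The name was renamed: send the client to the new URL.
        return [list redirect [dict get $old_names $slug]]
    }
    return [list not_found ""]
}

set current_names [dict create blue-note-jazz-club 4711]
set old_names     [dict create blue-note blue-note-jazz-club]
set outcome [resolve_slug blue-note $current_names $old_names]
# outcome is {redirect blue-note-jazz-club}
```

The "redirect" outcome would typically be answered with a permanent redirect, so that search engines and external links move over to the new URL.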

Plus, I've made a few enhancements to util_text_to_url in order to replace accented Latin letters with their base letters, instead of the hyphen symbol "-".

Example:
set name [util_text_to_url [string map {á a à a â a ã a ç c é e è e ê e í i ó o õ o ô o ú u "´" "" "'" "" " " - "," -} [string tolower $title]]]
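
For readers who want to try the accent mapping outside OpenACS, here is a self-contained sketch. simple_slug is a simplified, hypothetical stand-in for util_text_to_url (whose real behavior may differ), and the map is written with \u escapes so the result does not depend on the encoding of the script file.

```tcl
# Map accented Latin letters to their base letters
# (á à â ã ç é è ê í ó õ ô ú, same set as in the example above).
set accent_map [list \
    \u00e1 a \u00e0 a \u00e2 a \u00e3 a \u00e7 c \
    \u00e9 e \u00e8 e \u00ea e \u00ed i \
    \u00f3 o \u00f5 o \u00f4 o \u00fa u]

# Hypothetical stand-in for util_text_to_url: lower-case the title,
# strip accents, then collapse everything else into single hyphens.
proc simple_slug {title} {
    global accent_map
    set s [string map $accent_map [string tolower $title]]
    # Replace each run of characters other than a-z and 0-9 with one hyphen.
    regsub -all {[^a-z0-9]+} $s - s
    return [string trim $s -]
}

set name [simple_slug "Caf\u00e9 S\u00e3o Jo\u00e3o"]
# name is "cafe-sao-joao"
```

Lower-casing before the map keeps the map small, since only the lower-case accented letters need to be listed.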

Best wishes,