Hi Iuri,
Hmm... each case is somewhat unique. Here is an overview of what we did with the ecds procs.
ecds-procs.tcl contains general procedures to help with much of that.
An example of how we imported products from a partner website is at:
ecommerce/www/admin/products/upload-vendor-imports
and
ecommerce/www/admin/products/vendor-imports-add-update
In either case, an admin feeds a batch of product references to the page. The code (using ecds_import_product_from_vendor_site) converts each product reference to a URL. ecds_get_url is then called to fetch the HTML page from the partner website.
Each partner vendor requires a different set of parsing routines, because (at least for us) no two vendors used the same standards. In reality, most didn't use any standard, so custom procedures were required for each case. An internal abbreviation for each vendor, stored in ecds_vendors.abbrev, was used as a unique reference. At a minimum, each vendor had the abbrev and title fields filled in the ecds_vendors table.
The HTML content would be parsed using these and other procs:
ecds_abbreviate
ecds_convert_html_list_to_tcl_list
ecds_convert_html_table_to_list
ecds_get_category_id_from_title
ecds_get_contents_from_tag
ecds_get_contents_from_tags_list
ecds_email_on_purchase_list
ecds_get_subcategory_id_from_title
ecds_get_subsubcategory_id_from_title
ecds_is_natural_number
ecds_keyword_search_update
ecds_remove_attributes_from_html
ecds_remove_from_list
ecds_remove_html
ecds_remove_tag_contents
ecds_sku_from_brand
ecds_webify
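As a rough illustration of the kind of cleanup these helpers perform, here is a minimal sketch of stripping markup from a fetched fragment. The proc name example_strip_tags is hypothetical and this is not the actual ecds_remove_html code; see ecds-procs.tcl for the real implementations and their signatures.

```tcl
# Hypothetical helper, not part of ecds-procs.tcl.
# Removes HTML tags from a fragment and normalizes whitespace.
proc example_strip_tags {html} {
    # Replace every tag with a space so adjacent words do not run together.
    regsub -all {<[^>]*>} $html { } text
    # Collapse runs of whitespace into a single space.
    regsub -all {\s+} $text { } text
    return [string trim $text]
}
```

For example, passing {<p>Price: <b>$10</b></p>} leaves only the text between the tags.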
Much of the parsing requires different code for each vendor, so many procs call procs that are unique to a vendor. The ecds_vendors.abbrev value is used in the proc name to make it unique. The following file has an example, where ecds_vendors.abbrev = 'ex'.
ecommerce/tcl/ecds-ex-procs.tcl
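The naming idea can be sketched roughly as follows. The proc names example_parse_title and example_parse_title_ex, and the assumption that the vendor puts the title in an h1 tag, are invented for illustration only; the real per-vendor procs are in ecds-ex-procs.tcl and may be organized differently.

```tcl
# Hypothetical vendor-specific parser for the vendor whose abbrev is "ex".
# Assumes (for illustration) that this vendor puts the title in the first <h1>.
proc example_parse_title_ex {html} {
    if {[regexp -nocase {<h1[^>]*?>(.*?)</h1>} $html -> title]} {
        return [string trim $title]
    }
    return ""
}

# Hypothetical generic entry point: builds the proc name from the abbrev
# and calls the vendor-specific proc if one has been defined.
proc example_parse_title {abbrev html} {
    set proc_name "example_parse_title_${abbrev}"
    if {[llength [info procs $proc_name]] == 0} {
        error "no title parser defined for vendor '$abbrev'"
    }
    return [$proc_name $html]
}
```

Adding a new vendor then means adding one more set of procs with that vendor's abbrev in the name, without touching the generic code.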
Once product information is generated, the ecommerce product data would be updated using:
ecds_update_ec_products_product
ecds_add_product_to_ec_products
ecds_update_ec_category_map
To get an image for the product, use:
ecds_get_image_from_url
Then import the image to the product directory:
ecds_import_image_to_ecommerce
You do not have to worry about retrieving a page from an external website multiple times if more than one product is represented on one page. ecds_get_url lets you set the local cache refresh period to a relative time compatible with Tcl's clock scan.
This is a brute-force paradigm designed to handle the most difficult cases, where everything is different.
If you are not importing product data to ecommerce and are just grabbing images for related presentation, then the task should be straightforward. However, this process caches a local copy on the hard disk in order not to clobber partner websites. The time delays make this process incompatible with instant Ajax-style requirements.
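To show what a relative refresh period means in practice, here is a minimal sketch of a freshness check built on clock scan. The proc name example_cache_is_fresh and its arguments are hypothetical; this is not how ecds_get_url is necessarily written, only an illustration of the idea.

```tcl
# Hypothetical helper, not part of ecds-procs.tcl.
# fetched_at and now are epoch seconds; refresh_period is a relative time
# such as "2 hours" or "1 day" that clock scan can interpret.
proc example_cache_is_fresh {fetched_at refresh_period now} {
    # clock scan applies the relative period to the -base time.
    set expires_at [clock scan $refresh_period -base $fetched_at]
    return [expr {$now < $expires_at}]
}
```

A caller would compare the cached file's modification time (for example from file mtime) against [clock seconds] and only fetch the page again when the check returns 0.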
Also, see the note at the top of ecommerce/tcl/ecds-procs.tcl. A few custom fields need to be defined via the ecommerce admin's "Add a custom field" page.
cheers,
Torben