Forum OpenACS Development: Comments on the new postal-address module (and kin)
There are a few issues I would like to bring up in considering how to implement an address schema.
My main reason for concern/alarm is finding yet another place in the system where addresses will be stored (e-commerce, users table, address-book come to mind). I also saw a thread recently from someone saying they are going to implement a contact system and ignore any other tables in the schema (sigh).
I'll focus on the US and assume that at least some of the issues will track internationally.
An address is a fairly permanent feature. Who or what actually resides there can change. People and businesses move. Even from an HR perspective, if I want to give my history of where I have lived, I think its best to map person -> from_date/to_date -> address. I can even send my addresses to a National Change of Address provider who will let me know if the person has moved http://www.usps.com/ncsc/addressservices/moveupdate/changeaddress.htm
If you are actually going to send things to those addresses, there are ways of saving lots of money if you store the address accurately (that is, in compliance with the local authority). The USPS has a very definite concept that there is only ONE correct delivery address for a given location. http://www.usps.com/ncsc/addressservices/addressqualityservices/addresscorrection.htm
An address exists at a point in space. I can get latitude/longitude ICBM coordinates for any location, why not store it with the addresses? Postgres even has a point datatype.
Marketers like to record additional data about an address - DMA, MSA, Arbitron Market (I worked for a radio conglomerate once upon a time), etc to help support demographics. If I buy lists for marketing, I'll get all that info - why not store it in the 'right' place - those are all based on the location.
For purposes of e-commerce, a map from e-commerce user to an address should include a field for 'delivery instructions' such as 'leave in garage'. This works nicely for anything that isn't brought to the door by FedEx/UPS/USPS (such as a florist). This doesn't have anything to do with your tables, but I thought I would mention it.
The main user registration should use your postal-address module and re-instate the Demographics search at /acs-admin/users/ - I've had clients ask to /spam everyone in California for example. When people do advertising, it would be valuable to see who registered in a certain DMA during the period surrounding the radio spot.
I'll quit at six - that's my lucky number. (But I'll start a followup post on telephone numbers ;)
Telecom-number and postal-address are services. They are not packages. They are analogous to the reference-* packages. People are free to use the underlying data model as they see fit.
Everything you describe can be accomodated with the HR-XML framework. The reason other packages use seperate data is historical; The original address book was almost useless beyond a personal list and was most definitly hardly internationalized. Plus, many package designers just added their own tables without regard for what was already in the toolkit (states and countries come to mind).
My contacts-lite package is just that. It was (like much of the code here) written for a client who was kind enough to let me release it to the OACS community. It does however show how to use the underlying services (although certainly not everything has been completed, I have other obligations and they don't involve these packages anymore except for my personal use).
As anyone who has followed my posts on reference data and contacts in general knows, I am very concerned with internationalized contact information. Modelling these packages after HR-XML fulfills this need.
Recipent is optional because many business processes store the intended recipient name separately from postal address. If your schema handles the recipient information elsewhere, avoid redundancy by not populating Recipient in PostalAddress elements. Care Of data should be part of the DeliveryAddress, not part of Recipient. When using care of information, it should be held in the AddressLine element to avoid problems when building the address.etc. Look at page 15 and you'll see what I'm talking about. The Postal Address Spec is a core data spec in the HR-XML suite - if you look at items like Credit Report, Background Check, Contact Method, etc - they all have references to Postal Address - it seems redundant and potentially dangerous to store the Recipient with the Address in all cases, which is indeed how the postal-address package is designed.
I think much better is to refactor the people out from the address table entirely and use a join table to point people to addresses in all instances of the system. This does not prevent using HR-XML for data interchange, which is I assume your primary purpose.
I totally agree with your comment that this is a service like a reference package. I would go even further and get into a discussion of datatypes - I think that "address" is a specific datatype - in the sense that there is a specific 'range' of values (in the functions sense of the word) that are legal - there are well-formed addresses that are valid and well-formed addresses that are not valid.
The real "function" comes into play when you pair the people with the addresses at a given point in time - given the domain of people,date what is the address? I personally like this kind of design. Ideally, with no speed or memory constraints, you design it like this: table person ( name varchar(whatever), address varchar(whatever) Check address in (1 main st charlotte nc 11111,2 main st charlotte nc 11111, 3 main st charlotte nc 11111) etc :) or the 'references addresses' equivalent of it, in which case you just use an address_id and point to the address table.
Or we can just use usca_p cookies and call it a day!
I'm that person - and no i did not say i would not use postal-address just that i have worked without it until now and will integrate it later - when postal address is "stable". Things like the ones you are mentioning are the types of issues that kept me from designing it with postal address built in - but i need to street it will be EASY to add it later... once things like this are sorted out.
Your concern about the necessity of persons related to postal-addresses is one of mine as well that i would have brought up later. I want a "contact" to also be assigned an address and they won't necessarily be a person in the system... so i agree with Alfred on this one. Having a seperate mapping table, where you map and address_id to a object_id would take care of this - because presumeably all services that would use postal-address would have objects that the address can be linked to. I also don't think it is necessary for a postal-address to even be an acs_object. We won't be needing to do permissions on individual addresses and there is no need (in my view) to clutter up those other systems. A simple unique address id created by a sequence should be enough.
Now, regarding alfreds points
- Fantasitic idea - i modeled mine after "deleted on this date" but yours is better.
- I agree with Jon that HR-XML takes care of htis problem - its a matter of correctly inputting data into the HR-XML standard - and this requires the appropriate fields to be filled out for US costomers - but it also allows people in other countries to satisfy their address requirements. I also based my contacts addresses on HR-XML before i knew postal-address existed as a package and independenly came up to the same conclusion as Jon on this one so i'll say i think its the right one.
- I think it makes more sense to store longitude/latitude with zip codes and reference it that way. or, do you know of reference data that says house number "1234" on street "X Avenue" is at thisxthat degrees? This is primarily a matter of having longitude and latitude data for every house (and we can't expect users to enter it correctly themselves).
- If postal-address is a service then this suggestion should be handled by some other package that references the postal-address service.
- Same point as above
- With the contacts system i am creating this will be possible - though i don't think it should be mandatory throughout openacs, but be something that can be activated via a parameter. Many online services don't care about a persons address.
I'm not sure what I ought to do about the situation at this point! It sounds like you and Jon are both willing to coordinate your efforts - I am willing to work on whatever needs working on also. I'm open to suggestions - but maybe if I built a ref-addresses? Jon's HR/XML and your contact packages would refer to it (eventually)?
And then on the admin side we link up demographics to user tables?
To your points above:
1. Thanks, and there are a few subtleties that still need to be incorporated to do date guessing and granularity (you might only know that someone moved in a certain year)... Also probably best to mark at least one of the join records between person (or business) and address as current. That will save expensive queries in most cases, because you will primarily deal with current addresses.
2. I am absolutely NOT refuting the structure of Jon's design for addresses based on HR-XML (or apparently yours) - the only thing I was urging was to decouple people and addresses. The original ACS had all that stuff in the user table and then they created address-book, ecommerce, etc, etc where addresses kept popping up. That's the only situation I'm trying to prevent here - let's build ONE place in the system where we deal with postal addresses - hopefully someone from China will build an address verification routine for Chinese addresses, etc etc ...
3. Longitude/Latitude can be purchased down to the address level. A company that has a true use for geocoding will have already done this. (I do this for several sites I work on). Someone ambitious could probably get the TIGER dataset and create it for OACS.
4. If you're saying that the "marketing" data should be stored separately, I don't really agree, unless you push back and argue that it's not consistent internationally.
Several of those fields are directly copied from the census bureau classification.
It's not really computable in any easy sense - better to just store it with the data.
5. You're saying that e-commerce should reference the postal-address/ref-address package? I agree.
6. Agree. But many do too :) Those are the ones I'm trying to help..
So once again - what next?
Unless someone can come up with a very compelling reason not to have addresses and telephone numbers acs_objects they are going to stay that way. Comparisons to the way "AD" did it aren't reasons.
The current model is designed to make all addresses and phone numbers objects for many reasons, including site wide search and auditability.
As Alfred suggested, I am planning on tcl procs to do address verification. This is not trivial though and the system now works the way it is. Also, none of this has any effect on the data model or HR-XML.
Everything you guys are talking about should be built on top of the telecom-number and postal-address services. They are only repositories of data, what you do with them is another package.
Marketing data has to be stored seperatly, that is another package.
Longitude/Latitude is already incorporated into the ref-zipcodes service. Please note that getting this data is country specific with many if not most countries not having it. It really has no place embedded in the core data model.
There will be some changes to persons:
Also, person__new et al need to be replaced. I already did this for PG and Oracle will be just as simple.
alter table persons add column formatted_name varchar(200); alter table persons add column preferred_given_name varchar(100); alter table persons add column middle_name varchar(100); -- need to drop the not null constraint, I believe PG 7.3 allows this.
If you do 'verification' in tcl procs the best you are going to do is confirm that an address 'looks' well-formed.
I make a strong distinction between well-formed and valid data. Well formed passes some initial regexp rules that you can do either client side or server side with a round trip. It's not VALID unless its actually accurate. It's just like the email registration process - OACS makes sure there is an @ sign and thats about it. If I want to ensure that I have VALID emails, I have the system send the email and make the new user click through from the email to fully register.
I am trying to get the community to incorporate at least the fundamentals of correct address processing - the USPS and Bundespost will both give good discounts for properly formatted and address verified mail.
If I can save .03 per mailing per piece, I save a good amount of money over time. If the company sends even a moderate amount of mail, its worth doing properly.
The added bonus to us as developers is we can charge our customers to validate the data through 3rd party address quality providers.
Longitude/Latitude at the ZIP code level? That is based on the centroid and can be many miles away from the address. I can get it to the street address level - its used for proximity, routing, etc. It's all from TIGER anyway. If a particular user doesn't need it - they can leave it blank, but if I send out my list to get geocoded, I'd like to have a consistent place to put it.
And when you say marketing data - remember that once you wipe off the polish - that is also just TIGER data from the census bureau. Classifying an address as falling into a specific DMA is a US government distinction, not some arbitrary marketing decision. That doesn't work across all lands, but I'm willing to bet that a lot of other countries over time will do it similarly.
I would also argue that you might want to add timezone to the addresses - it's more of a warehouse mentality - put the facts in with the record itself. If you want to call someone who is registered in your system (customer support for example), there are only certain hours where you're allowed to do so.
Whee! Another can of worms.
That's what list brokers do ...
German address lookups:
I used to live at Straße - Stadtbergerstrasse
Nummer - 80
Ort - Augsburg
If you run that through, you'll get a completed German address.
If you want to buy data - including geocoding down to the street address LAT/Long you can go here: http://www.deutschepost.de/dpag?lang=de_DE&xmlFile=1589
At the US Postal service:
They provide an address quality API ... XML based, you need to register yourself as a developer to get the specs, and then register a site like you would with a payment processor if you want to use it in production.
www.royalmail.com has the same for UK .. etc, etc
I'd say the countries that have 'been ahead' in other technology are still the early adopters for this type of data/service - but its where things are headed.
But Jon, as a repository of data, postal-address should not rely on persons - it should rely on itself. I really don't like the constraint on persons and in fact it would make it impossible for me to use it with my contacts app without creating bogus "persons" in all i'm suggesting is changing this part of the postal-adderss table creation script
party_id integer constraint postal_owner_id_fk references parties (party_id),I would like it to become
object_id integer constraint postal_owner_id_fk references acs_objects (object_id) constraint postal_owner_id_nn not null,since this field can be null in its current state - which i hadn't realize it was - i can use it. in fact i am going to use it and implement it today so i don't start doing things that aren't standardize. However, if that field is not entered then there could be no way of "tracking down" addresses "belong" i.e. you could track the address object 1234 by the use of a map table with contact object 4321. But if lots of apps end up depending on postal-address (which they hopefully will), then i think it would be good to be able to search the postal-address table for all addresses belonging to the objects that particular application owns...
Alfred - john's point that ref-addresses isn't a good idea is a very valid one - that's not the way it would work with openacs standards. we could have a ref-addresses package that stores all this "tiger" data you are talking about but it wouldn't make sense to put unique addresses in there. Do you know of any gnu or gpled reference data that would let us calculate longitude and latitude? As Alfred suggested, I am planning on tcl procs to do address verification. This is not trivial though and the system now works the way it is. Also, none of this has any effect on the data model or HR-XML. Everything you guys are talking about should be built on top of the telecom-number and postal-address services. They are only repositories of data, what you do with them is another package. Marketing data has to be stored seperatly, that is another package. Longitude/Latitude is already incorporated into the ref-zipcodes service. Please note that getting this data is country specific with many if not most countries not having it. It really has no place embedded in the core data model. There will be some changes to persons: alter table persons add column formatted_name varchar(200); alter table persons add column preferred_given_name varchar(100); alter table persons add column middle_name varchar(100); -- need to drop the not null constraint, I believe PG 7.3 allows this. Also, person__new et al need to be replaced. I already did this for PG and Oracle will be just as simple.
And parties is going to change in 5.0 anyway to normalize the tables correctly. (i.e. a party doesn't need an email). Part of this is to work with other methods of registration etc. See other threads for this discussion.
By entering them as parties, you're using the standard OpenACS way of entering organizations like this, and other packages can take advantage of the organizations you've entered.
I haven't looked thoroughly into the parties data model, but I believe this is the best way to do things.
If you enter an address, you will get the following - the ones with my * in front - I think are useful in the users screen to profile your user base:
Carrier Route - DPC
ZIP Code Type
*County (FIPS Code)
*Time Zone (Local time)
*Consolidated Metro Area (CMSA)
*Primary Metro Area (PMSA)
Representative, Party & District
# (useful for a political site maybe)
Delivery Post Office
CMSA is VERY broad - I live half way up Connecticut and am in the Bridgeport PMSA and the NEW YORK CMSA.
Address and carrier route are for snail mail support - and there is more stuff you can buy.
My point being - I have this table of data in OACS - stuff users type in and provide. At some point the guy in marketing says - how many of our users are on the east coast? I say - well if you let me send our data out for geocoding I can let you know. From that point on you send incremental files once a month or whatever, but you always can determine how many users are in what regions. Time Zone as I said, is useful if you decide to call the person.
Latitude and Longitude are good if you are going to use any mapping applications.
The point being - I am hoping to have the "address" information *as it is currently viewed by data companies* stored in an isolated area - that way its a lot easier to do what companies actually do - send it out to get checked and coded, without worrying about the relation integrity issues of all the other tables it touches.
I think that what Jon and Matthew are proposing will support that - I just want to make you aware that this additional information should NOT just be discounted as "marketing" stuff - it is part of the physical reality of the address in question, and it makes it easier for marketing to accurately identify the user population of their site - where do they live? What city? What time zone? How many are within 50/500/5000 miles of me or any of my locations?
Nobody that I know of provides this info en masse for free, GPL or whatever, but even the non-profits I work with have enough budget to save themselves money on mailings to pay for the geocoding and address verification.
It's current best practice, it pays for itself, and I think its a mistake to leave it out of the primary location we intend to store addresses :)
It really isn't a part of the addresses in the world.
See if you can verify this legitimate address:
Calle 48 % 29b y 29f Num. 29b03 Buena Vista Playa La Habana CubaThis is a legitimate address and will probably never have data quality needs.
There are many ways of guaranteeing the data purity, being in the core address data model is not really a good solution (in fact I would argue it is wrong). So why don't you just make a package that depends on postal-address and use the data from there. That is easy and what a service is meant to do. You can even make tcl procs to verify whatever you need.
Challenging me to find the information about Cuba is meaningless. As far as I know, it's a federal crime to conduct business there - and Amnesty International isn't permitted to visit the country - so let's just put that red herring aside, shall we? (http://web.amnesty.org/report2003/cub-summary-eng)
I have a feeling that my clients would be satisfied if I could tell them how many subscribers were from Cuba. Does finding one example like that actually discredit my point about all the other locations? Are you really worried about the HR-XML implications for Cuba?
I live in Connecticut - most of my users and clients are in the United States.
You live in Las Vegas - most of your users and clients are probably in the United States.
Matthew's address is at Berkeley - most of his users and clients are probably in the United States.
Those of my clients who conduct business on the Internet internationally are primarily concerned with Europe and Japan, Mexico and Brazil ..
As far as I have been able to tell, all of the mentioned countries are following this trend toward address quality - I already provided examples in US/UK/Germany - there is a page on the japanpost website but I'm not THAT interested .. If I run the address I showed above in Germany - that puts the site in Bayern as the Ort - same as the PMSA. The same kind of data is being provided all over the place.
My only point here has been and continues to be - an address is at a physical point in space. Those physical points in space have specific characteristics that rarely change - what major city are they near? What is their exact latitude and longitude? What time zone are they in?
I see them as all being intrinsically tied together - the only thing that DOES change is who is at the address. The melissadata example I provided doesn't say anything about who is there - it just gives you the 'fact table' about that location. The one thing that changes in that example is the political data which I agree is not relevant.
Alfred, your points are well taken and you want good things. BUT, Jon is right - this should be a different package - one that does verification and spacial constructs, etc. I agree that the vast majority of openacs users are going to be operating from the US, Western Europe (or as Donald Rumsfeld beligerantly calls it "old europe"), and the probably other wealthy countries where internet use is widespread. I myself will need to store addresses primarily from the US and Canada, but we have a number of international students from all over - many of which are from these "random anomolies" like Cuba that you talk about (i.e. we have students from Etheopia, Uganda, Laos, etc.). I don't think my scenerio is going to be incredibly uncommon. So we need a place to store all addresses. Postal-address does this.
Alfred the things you want are GREAT, i want many of them myself, but this should be a different pacakge - one that relies on the storage postal-address provides but does its own thing. It would be great to have somebody write address validation procs for a number of countries - this would be fantastic, i hope you are able to do this, and if not i'll need to do some basic stuff (like making sure americans spell their state correctly and use valid zip codes - that are in fact located in the state they say it is). I encourage you to build a package called postal-address-validator or whatever you want to call it.
You are free to do as you wish, but the point I was making with Cuba was the fact that this is a community system. Check out salsablanca.com and you will see that I have a Cuban record company and a Cuban group on tour right now. So, no my point isn't useless and as far as I am concerned, I really don't give the slightest bit of a damn about Greenpeace. They have zero relevance in my life and yes it is legal to do business with Cuba. In fact the only 2 countries that have an embargo against Cuba are the US and Israel. But I am not in for a political debate.
I have no problems with your address verification stuff, in fact I welcomed it. It just needs to be in another package.
My two primary issues were
#1 - decouple addresses from people.
#2 - store appropriate information with the addresses - lat/lon, DMA, timezone, etc.
I appreciate your adventures in Cuba - I am working with a school in Ecuador to put up a dotLrn site. I think all the stuff we're talking about here is largely irrelevant to either of those cases. The corporate guys are the ones who like the marketroid stuff - this is a great sales tool in that market and it provides an additional revenue stream for the developer/hoster if they periodically ship the addresses out to be cleaned up.
I clearly got lost in the distinction between services and packages (is a service something that uses a service contract? or just something with no ADP pages?) Parties - the only references I found were from AD people in the party::permission_p -
"Wrapper for ad_permission to allow us to bypass having to specify the read privilege " sounds like a way to skirt the permission system?
and you said they are changing in 5.0? I also am a bit vague on the _rels mentioned in the thread here ..
I guess there are enough internals here that I'm not familiar with and can't find documentation on that I'll just shut up and see what comes out. It's likely that I'll just add the fields I want on your table and keep a private version.
Thanks for the lively dialog :)
A service is meant to be used by other packages and has nothing to do with service contracts. They in general don't have adp pages, but that is not the real distinction. The adp pages would be for admin purposes. Also, services should only be mounted once (usually)
I also think you are confusing my proposal for a validation package. Your package can use a tcl proc to consolidate/validate do whatever you want to the data. If you want to send it out to xxx-data service, no problem, ecommerce does a similar thing with payment gateways. There used to be one payment method. It has since been rewritten to use service contract to extend the ecommerce package without touching the core data model.
A similar approach should be used in the address-validation package. I am sure that every list cleaning/validation service has different requirements so you write a gateway and then add specifics for each company. By trying to put the US centric address information in the core of postal-address, you are locking everyone into your preferred company.If there was a standard for data cleaning/validation that would be a different matter, but as far as I know that doesn't exist.
The changes in 5.0 (which certainly are only being talked about) are changes to parties to eliminate persons information at the party level.
A party is an entity that relates to an object. That can be a person, group, organization or even another object. A person is a specific type of party and once a person becomes registered they become a user. But, all of the types inherit from parties.
Addresses don't have to be attached to a person, they are only attached to a party. You don't even need that information to send an address for validation. Also, you can create a history of that address by simply creating a package that tracks address_id and party_id with dates or whatever you want, but this isn't part of postal-addresses.
Thanks for the great pointers and explanations everyone!
in the bugtracker - requesting incorporating a GPL-ed copy of the TIGER street address to lat/lon data :)
But, I believe most of it is already in ref-zipcodes.