To: "C. Len Bullard"
Subject: Re: HTML is declarative on purpose [was: Web neurons ]
Cc: connolly@w3.org, hyper-theory@math.byu.edu,
preece@predator.urbana.mcd.mot.com, www-html@w3.org
Scott writes:-
> The problem is that in some cases (a growing number) the "content" of
> the page is generated by the code. If you don't execute the code, you
> don't have any words to index. This is a significant obstacle to
> providing effective indexing and access to the information on the Web.
Yes, I suppose I should have realised this.
So to maintain good indexing you shouldn't generate "content" from code.
One has to ask why this is done anyway. Perhaps the reason is that
HTML itself does not have the proper tools in place.
> The problem has existed from the beginning (pages that are actually
> front ends to database search engines typically offer no clue to
> indexers of what the content of their database may cover), but the
> growth in scripting languages and applets is sure to exacerbate the
> problem.
Yes, there is no way the indexer could easily sort out the vast number
of different database formats, applets etc. So what is the answer here?
The whole system is a bloody mess, in need of a severe amount of pruning
and a major rethink.
To recap, then: HTML is declarative on purpose, i.e. to avoid indexing
problems. But, as Scott points out, there are still indexing problems
even so! So there is no reason HTML should not progress to
Turing completeness.
In my opinion the answer to it all is to bring everything under one
hat, so you don't have the problem of dissimilar database formats,
applets etc. Now the Web IS a database already - a database of pages.
If we make a page small enough that it holds, say, only one item
(e.g. a Name or an Address), then we have the basic element on which
to build a database of the conventional type. This may sound radical
but I think the possibility should be explored.
A simple database of records with several fields could be modelled
by linking the small page elements correctly, and HTML is good at
links. Access to and editing of the database could also be handled by
HTML if it were Turing complete.
Also, it would be fully indexable!
I honestly think it may be just sitting there in front of our noses.
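The small-page database idea above could be sketched roughly as
follows. This is only an illustration in Python, not HTML: the Page
class, the URLs, the link labels and the sample data are all made up
for the example, and a real system would fetch pages over the network
rather than hold them in a dictionary.

```python
# Hypothetical sketch: every "page" holds exactly one item, and a record
# is just a page whose labelled links point at its field pages.
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str                                   # made-up address of the page
    content: str                               # the single item the page holds
    links: dict = field(default_factory=dict)  # link label -> target URL


def read_record(pages, record_url):
    """Follow a record page's links and collect the fields they point to."""
    record = pages[record_url]
    return {label: pages[target].content
            for label, target in record.links.items()}


def build_index(pages):
    """Map each lowercased word to the set of page URLs that contain it."""
    index = {}
    for page in pages.values():
        for word in page.content.lower().split():
            index.setdefault(word, set()).add(page.url)
    return index


# Illustrative data only.
pages = {
    "/rec/1": Page("/rec/1", "record 1",
                   {"name": "/rec/1/name", "address": "/rec/1/address"}),
    "/rec/1/name": Page("/rec/1/name", "Ada Lovelace"),
    "/rec/1/address": Page("/rec/1/address", "12 Example Street"),
}

record = read_record(pages, "/rec/1")
index = build_index(pages)
print(record)
print(sorted(index["example"]))
```

Because every field is itself a plain page, the indexer never has to
run anything: it just reads each page's single item, which is the
point about the database being fully indexable.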
Scott writes:-
> The problem is that in some cases (a growing number) the "content" of
> the page is generated by the code. If you don't execute the code, you
> don't have any words to index. This is a significant obstacle to
> providing effective indexing and access to the information on the Web.
Len replies:-
>>This is true, of course. Any information that is bound
>>at run time requires an application be "run". But
>>this is also an issue close to the heart of what I
>>suggest is going on: people need to define
>>executable applications, and some want to do it
>>in the context of a document-centric system.
Surely HTML is being "executed" (or interpreted) anyway during normal
browser rendering, just in a simple way, straight down the page. Having
some extra bit of embedded code also get executed is not particularly
different. The difference surely comes down to the availability of
variables and logic flow commands.
>>That this violates or strains the HTML application
>>language is inevitable, but the answer is not
>>to relentlessly extend HTML, but to define
>>another Web language for that purpose.
HTML is already relentlessly extended - a few more extensions shouldn't
make much difference. I don't think you need many extensions to get
Turing completeness. As I understand Turing machines (by analogy with
HTML), you would mainly need the ability to automatically move data
between pages, given that the HTML page is the building block of the
system (analogous to a cell on the Turing tape). Also, you would need a
simple bit of data processing in each page (analogous to the Turing
program).
You could define a new Web language with logic flow etc., but why
complicate things by splitting the logic flow commands from the
declarative ones? For example, even BASIC includes GOSUBs and PRINT
statements (but has no hyperlinking, therefore is rubbish).
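To make the Turing machine analogy above a little more concrete, here
is a rough sketch in Python. Each slot in the list stands in for one
page holding one symbol, moving left or right stands in for following
a link to the neighbouring page, and the rules table is the "simple
bit of data processing". The function name, the rule format and the
example machine are all assumptions made for illustration; the example
only flips bits, so it shows the mechanics rather than proving
anything about HTML.

```python
# Hypothetical sketch: a row of one-symbol "pages" used as a Turing tape.
BLANK = "_"


def run(tape, rules, state="start", max_steps=1000):
    """Run a one-tape machine; rules maps (state, symbol) to
    (symbol_to_write, move, next_state), where move is "L" or "R"."""
    cells = list(tape)          # each slot plays the role of one page
    pos = 0                     # the page currently being "viewed"
    for _ in range(max_steps):
        if state == "halt":
            break
        if pos == len(cells):   # ran off the right end: create a new page
            cells.append(BLANK)
        write, move, state = rules[(state, cells[pos])]
        cells[pos] = write      # the per-page bit of data processing
        pos += 1 if move == "R" else -1
        if pos < 0:             # ran off the left end: create a new page
            cells.insert(0, BLANK)
            pos = 0
    return "".join(cells).strip(BLANK)


# Example machine (illustrative only): invert every bit, halt at the blank.
invert_rules = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", BLANK): (BLANK, "R", "halt"),
}

result = run("1011", invert_rules)
print(result)
```

The only things the machine needs beyond static pages are a way to
rewrite a page's item and a rule for which link to follow next, which
is roughly the "few extensions" argued for above.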
>>Applets are a different issue from JavaScript.
>>An applet is a parameterized call to an
>>external handler. The indexing engine
>>shouldn't look in that box except to note
>>that a call exists and perhaps what it
>>calls if that is useful indexable information.
You wouldn't need applets if HTML were improved.
>>Separating the control classes, the content
>>classes, and the script classes into
>>separate DTDs is one approach. Consider
>>that identifying variants by extending
>>the formal public identifier equates to
>>this.
I really think you are overcomplicating things.