Scott writes:-
> The problem is that in some cases (a growing number) the "content" of
> the page is generated by the code. If you don't execute the code, you
> don't have any words to index. This is a significant obstacle to
> providing effective indexing and access to the information on the Web.
Yes, I suppose I should have realised this.
So to maintain good indexing you shouldn't generate "content" from code.
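To see Scott's point for myself, here is a minimal Python sketch of an indexer that reads a page but never runs its scripts. The class name, function name and the two sample pages are all made up for illustration; a real indexer would be far more involved.

```python
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collects visible words the way a non-executing indexer might."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script source is never run, so it is not treated as page content.
        if not self._in_script:
            self.words.extend(data.split())


def index_words(html):
    """Return the words an indexer would see without executing any code."""
    collector = WordCollector()
    collector.feed(html)
    collector.close()
    return collector.words


# Hypothetical pages: the same sentence, once as markup, once written by code.
STATIC_PAGE = "<html><body><p>Opening hours: nine to five</p></body></html>"
SCRIPTED_PAGE = (
    "<html><body><script>"
    "document.write('Opening hours: nine to five');"
    "</script></body></html>"
)

print(index_words(STATIC_PAGE))
print(index_words(SCRIPTED_PAGE))
```

The static page yields its five words, while the scripted page yields nothing at all, which is exactly the obstacle Scott describes.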
One has to ask why this is done anyway. Perhaps the reason is that HTML
itself does not yet have the proper tools in place.
> The problem has existed from the beginning (pages that are actually
> front ends to database search engines typically offer no clue to
> indexers of what the content of their database may cover), but the
> growth in scripting languages and applets is sure to exacerbate the
> problem.
Yes, there is no way the indexer could easily sort out the vast number of
different database formats, applets etc. So what is the answer here?
The whole system is a bloody mess, in need of a severe amount of pruning
and a major rethink.
To recap then - HTML is declarative on purpose, i.e. to avoid indexing problems.
But, as Scott points out, there are still indexing problems even so! So indexing
alone is no reason why HTML should not progress to Turing completeness.
In my opinion the answer to it all is to bring everything under one hat so you
don't have the problem of dissimilar database formats, applets etc. Now the Web
IS a database already - a database of pages. If we make a page small enough
that it holds, say, only one item, e.g. a Name or an Address, then we have the
basic element on which to build a database of the conventional type. This may
sound radical but I think the possibility should be explored.
A simple database of records with several fields could be modelled if you link
the small page elements correctly, and HTML is good at links. Access and editing
of the database could also be handled by HTML if it was Turing complete.
Also, it would be fully indexable!
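A rough sketch of what I mean, written in Python rather than HTML just to show the shape of the idea. Every name, URL and piece of data here is invented for illustration, and a dict stands in for the Web.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """A tiny page holding one item plus named links to other pages."""
    url: str
    content: str
    links: dict = field(default_factory=dict)  # link label -> target URL


# Hypothetical web: one record page linking to one small page per field.
WEB = {
    "/people/1": Page(
        "/people/1",
        "",
        {"name": "/people/1/name", "address": "/people/1/address"},
    ),
    "/people/1/name": Page("/people/1/name", "J. Smith"),
    "/people/1/address": Page("/people/1/address", "1 High Street"),
}


def read_record(web, record_url):
    """Follow a record page's links to assemble its fields."""
    record = web[record_url]
    return {label: web[target].content for label, target in record.links.items()}


def build_index(web):
    """Map each word to the URLs of the pages that contain it."""
    index = {}
    for page in web.values():
        for word in page.content.lower().split():
            index.setdefault(word, set()).add(page.url)
    return index


print(read_record(WEB, "/people/1"))
print(build_index(WEB)["smith"])
```

The record is nothing more than a page of links, and because every field is plain page content the indexer can see all of it without running anything.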
I honestly think the answer may be sitting right there in front of our noses.
Scott writes:-
> The problem is that in some cases (a growing number) the "content" of
> the page is generated by the code. If you don't execute the code, you
> don't have any words to index. This is a significant obstacle to
> providing effective indexing and access to the information on the Web.
Len replies:-
>>This is true, of course. Any information that is bound
>>at run time requires an application be "run". But
>>this is also an issue close to the heart of what I
>>suggest is going on: people need to define
>>executable applications, and some want to do it
>>in the context of a document-centric system.
Surely HTML is being "executed" (or interpreted) anyway during normal
browser rendering, just in a simple way, straight down the page. Having some
extra bit of embedded code also get executed is not particularly different.
The only real difference, surely, is the availability of variables and
logic flow commands.
>>That this violates or strains the HTML application
>>language is inevitable, but the answer is not
>>to relentlessly extend HTML, but to define
>>another Web language for that purpose.
HTML is already relentlessly extended - a few more extensions shouldn't make
much difference. I don't think you need many to get Turing completeness.
As I understand Turing machines (HTML analogy), you would mainly need the
ability to automatically move data between pages, given that the HTML page
is the building block of the system (analogous to a cell on the Turing
machine's tape). Also, you would need a simple bit of data processing in each
page (analogous to the Turing machine's program).
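To make the analogy concrete, here is a toy Turing machine in Python where each tape cell is treated as a numbered "page" holding one symbol. It is only a sketch under my own assumptions: the program table, the state names and the page numbering are all invented, and it says nothing about how this would actually be expressed in HTML.

```python
BLANK = "_"

# (state, symbol read) -> (symbol to write, move, next state)
# A toy program that inverts every bit and halts at the first blank page.
PROGRAM = {
    ("scan", "0"): ("1", +1, "scan"),
    ("scan", "1"): ("0", +1, "scan"),
    ("scan", BLANK): (BLANK, 0, "halt"),
}


def run(pages, program, state="scan", position=0, max_steps=1000):
    """Run the program over a dict of page number -> symbol."""
    pages = dict(pages)  # work on a copy so the caller's tape is untouched
    for _ in range(max_steps):
        if state == "halt":
            return pages
        symbol = pages.get(position, BLANK)  # unvisited pages read as blank
        write, move, state = program[(state, symbol)]
        pages[position] = write
        position += move  # "follow the link" to the neighbouring page
    raise RuntimeError("step limit reached without halting")


tape = {0: "1", 1: "0", 2: "1"}
result = run(tape, PROGRAM)
print("".join(result[i] for i in sorted(result)))
```

The per-page processing is just the lookup in the program table, and the only other thing needed is the ability to step from one page to the next and write to it.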
You could define a new Web language with logic flow etc., but why complicate
things by splitting the logic flow commands from the declarative ones? For
example, even BASIC includes both GOSUBs and PRINT statements (but it has no
hyperlinking, and is therefore rubbish).
>>Applets are a different issue from JavaScript.
>>An applet is a parameterized call to an
>>external handler. The indexing engine
>>shouldn't look in that box except to note
>>that a call exists and perhaps what it
>>calls if that is useful indexable information.
You wouldn't need applets if HTML was improved.
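For what it is worth, Len's suggestion as I read it would amount to something like the Python sketch below: the indexer notes that an applet call exists and what it calls, and looks no further. The class name, function name and the sample page (including Ticker.class) are hypothetical.

```python
from html.parser import HTMLParser


class AppletNoter(HTMLParser):
    """Records that an applet call exists without looking inside it."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag == "applet":
            # Note only what is called; the <param> children are ignored.
            self.calls.append(dict(attrs).get("code"))


def note_applets(html):
    """Return the value of the code attribute for each applet on the page."""
    noter = AppletNoter()
    noter.feed(html)
    noter.close()
    return noter.calls


# Hypothetical page with one parameterized applet call.
PAGE = (
    '<p>Share prices</p>'
    '<applet code="Ticker.class" width="200" height="50">'
    '<param name="symbol" value="XYZ">'
    '</applet>'
)

print(note_applets(PAGE))
```

All the indexer learns is the name of the handler, which tells a searcher very little about what the applet will actually display.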
>>Separating the control classes, the content
>>classes, and the script classes into
>>separate DTDs is one approach. Consider
>>that identifying variants by extending
>>the formal public identifier equates to
>>this.
I really think you are overcomplicating things.