netrik hacker's manual >========================<
[This file contains a description of the page handling system. See hacking.txt
or hacking.html for an overview of the manual.]
The page handling system is responsible for the central browser functionality:
Loading pages to be displayed, loading new pages when links are followed etc.
This also includes all page history handling, as all page loads either affect
the page list (history), or depend on it, or both; and all history commands
involve a page load.
The page handling is also a central component because it invokes all of the
other modules: the file loader (hacking-load.*) is used to fetch a new document
(file) if necessary, and the layout engine (hacking-layout.*) to prepare it for
rendering; the pager (hacking-pager.*) is (indirectly) invoked to display the
new page; and the link handling mechanism tells when and what page to load.
Moreover, most of the modules are also coupled to the page handling, because
they use the page structure handled by the page loading mechanism.
So the page handling is really the central component, controlling the program
flow. Thus it can only partially be located in its own file (page.c); part of
the page handling has to be done in main() directly.
load_page()
load_page() is the main function of page.c, and does most of the work. It is
responsible for loading a page so that it will be shown in the pager the next
time display() (see hacking-pager.*) is called.
The exact actions necessary to achieve that vary with the nature of the load
operation. In the most basic operation mode it requires adding a new entry to
the page list and loading a new document to use. Sometimes, however, there are
only changes to the page list while the document is reused, or the page
history stays unchanged while some existing entry is reactivated and the
settings reloaded. Reloading some document while the page list isn't altered
is possible as well.
load_page() takes almost the same arguments as layout(): a base URL, which is
usually the URL of the current page and is used as the base when following a
relative link; a main URL (as a string), which can be absolute or relative,
in the latter case to be combined with the base URL to form an absolute target
URL; an optional form item, which tells where to find the form data of a form
to submit while loading a new document; the page width, telling how to lay out
loaded pages; and an error handle informing whether some problem occurred while
loading the document.
In case a new document needs to be loaded, all of these parameters are simply
passed on to layout(), which is responsible for actually loading a document
from a file or an HTTP server, as well as invoking all necessary layouting
passes so that the document can actually be rendered and displayed in the
pager. This process is described in hacking-layout.*.
load_page() takes an additional "reference" parameter. This one may refer to
some entry in the page list (normally the current entry), which contains an
already loaded document to reuse. In this case no document load needs to be
performed (i.e. layout() isn't called), as the layouting data of the reference
page is reused. (Most of the other parameters aren't used in this case.) This
feature is used when jumping to an anchor inside a document -- it doesn't
require loading a new document.
The first action (even before loading the document) is creating a page
descriptor -- regardless of whether a new document is loaded or existing
layouting data is reused.
This, however, is left out if the page is reloaded from history (indicated by
"url" being NULL), and thus already has a descriptor somewhere in the page
list.
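The two decisions described so far (whether to create a descriptor, and whether
to call layout()) can be summarized in a small sketch. This is only an
illustration, not the actual netrik code: "struct Load_plan" and plan_load()
are hypothetical names, and the presence of a reference page is reduced to a
plain flag here.

```c
#include <stddef.h>

/* Hypothetical summary of what a load operation has to do. */
struct Load_plan {
    int create_descriptor;  /* nonzero: a new page list entry is needed */
    int call_layout;        /* nonzero: a document has to be loaded via layout() */
};

/* url == NULL means "reload from history" (the descriptor already exists);
 * have_reference != 0 means layouting data of an existing entry is reused. */
static struct Load_plan plan_load(const char *url, int have_reference)
{
    struct Load_plan plan;

    plan.create_descriptor = (url != NULL);
    plan.call_layout = !have_reference;
    return plan;
}
```

The four combinations of the two flags correspond to the four kinds of load
operation listed at the beginning of this section.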
Page List Handling
The page list (history) is a global variable (of type "struct Page_list"),
which basically consists of an array of pointers to individual page
descriptors. It also stores the current number of entries in the list, as well
as the currently active entry (the one that describes the page visible in the
pager).
The list has one entry for each page in the page history. Normally the last
entry is the visible page, but other pages are also possible after going back
in history.
Each page descriptor is of type "struct Page". This struct contains:
- The "layout" pointer, which points to a descriptor with the document's
  layouting data created by layout()
- The page URL in the split URL structure "url"
- The current position of the pager, stored as the index of the first line
  visible on the screen ("pager_pos")
- The link number of the link currently active in the pager ("active_link")
- The optional anchor number of an anchor to jump to when starting the pager
  ("active_anchor")
- The "mark" flag indicating that a return mark has been set on this page
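For orientation, the two structures described above could be declared roughly
as follows. This is a minimal sketch and not the actual netrik declaration:
only the field names are taken from the text; the field types, the stand-in
type names "struct Layout_data" and "struct Url", and the helper
current_page() are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct Layout_data;  /* hypothetical name for the layouting data created by layout() */
struct Url;          /* hypothetical name for the split URL structure */

struct Page {
    struct Layout_data *layout;  /* layouting data of the document */
    struct Url *url;             /* split URL of the page */
    int pager_pos;               /* index of the first line visible on the screen */
    int active_link;             /* link currently active in the pager */
    int active_anchor;           /* anchor to jump to when starting the pager */
    int mark;                    /* nonzero if a return mark is set on this page */
};

struct Page_list {
    int num;             /* number of entries in the list */
    int pos;             /* index of the entry visible in the pager */
    struct Page **page;  /* array of pointers to the page descriptors */
};

/* Hypothetical helper: descriptor of the visible page, or NULL for an empty list. */
static struct Page *current_page(const struct Page_list *list)
{
    if (list->num == 0)
        return NULL;
    assert(list->pos >= 0 && list->pos < list->num);
    return list->page[list->pos];
}
```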
The page list is manipulated exclusively by load_page() and its helper
functions. load_page() is also responsible for keeping the list up to date, so
it always contains exactly those entries that shall be used by the history
commands.
Thus, before a new page descriptor is created, the page list needs to be
adjusted.
Normally a new entry is simply appended at the end of the list; however, there
are several other cases.
When loading some new page while not being at the last entry in
the page list (after going back to some older page from the page history), all
entries after the current one have to be discarded.
Moreover, if the current page is an internal page (either a page loaded from
stdin, or an error page), it isn't to be kept in history; in that case, we go
further back in history until we find the last normal page entry.
All entries after the current or the last normal one are then cleared by
calling free_page() in a loop.
Afterwards, the new page descriptor is created.
When reloading a page from history, it may also be necessary to delete internal
pages, if such a page is being left. The last non-internal page is determined
(searching backwards from the end of the list), and all entries following it
are deleted.
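The clean-up performed before a new entry is added can be sketched as follows.
This is an illustration under several assumptions, not the actual code: the
"internal" flag in the descriptor, the name discard_tail(), and the signature
of free_page() (here a trivial stand-in) are hypothetical, and the sketch simply
leaves "pos" pointing at the last entry that was kept.

```c
#include <stdlib.h>

struct Page { int internal; };  /* hypothetical flag: stdin or error page */
struct Page_list { int num; int pos; struct Page **page; };

/* Stand-in for free_page(); the real signature is an assumption. */
static void free_page(struct Page *page)
{
    free(page);
}

/* Discard all entries after the current one, going further back past
 * internal pages. Returns the new number of entries. */
static int discard_tail(struct Page_list *list)
{
    int keep = list->pos;  /* index of the last entry to keep, -1 for none */

    while (keep >= 0 && list->page[keep]->internal)
        --keep;

    for (int i = keep + 1; i < list->num; ++i) {
        free_page(list->page[i]);
        list->page[i] = NULL;
    }

    list->num = keep + 1;
    list->pos = keep;
    return list->num;
}
```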
add_page()
The add_page() function (in url-history.c) is responsible for actually creating
the new page list entry. The list is a "struct Page_list", and contains the
following information:
- "num" is the number of entries currently stored in the list
- "pos" is the entry number of the entry corresponding to the currently visible
  page
- The history entries themselves are stored in "page", which is an array of
  pointers to the page descriptors of all pages.
The new entry is added at the position indicated by "page_list.pos".
First the array is resized to the new history size -- the history now will end
with the new entry generated. Then a new page descriptor is created and the
pointer to it is stored at the proper position (the last list entry).
After creating the descriptor, some default values are set. (Pager position etc.)
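The steps above (resize the array, create the descriptor, store the pointer,
set defaults) might look roughly like this. It is a sketch only: the name
add_page_sketch(), the reduced "struct Page", and the concrete default values
(0 for the pager position, -1 for "no active link/anchor") are assumptions, and
the sketch assumes that any entries after the current one have already been
discarded, so that the new entry is the last one.

```c
#include <stdlib.h>

struct Page {
    void *layout;
    int pager_pos;
    int active_link;
    int active_anchor;
    int mark;
};
struct Page_list { int num; int pos; struct Page **page; };

/* Append a fresh descriptor to the list and make it the current entry.
 * Returns the new descriptor, or NULL if memory allocation fails. */
static struct Page *add_page_sketch(struct Page_list *list)
{
    /* Resize the pointer array so that the history ends with the new entry. */
    struct Page **grown = realloc(list->page,
                                  (size_t)(list->num + 1) * sizeof *grown);
    if (grown == NULL)
        return NULL;
    list->page = grown;

    struct Page *page = malloc(sizeof *page);
    if (page == NULL)
        return NULL;

    /* Default values (assumed for illustration). */
    page->layout = NULL;
    page->pager_pos = 0;
    page->active_link = -1;
    page->active_anchor = -1;
    page->mark = 0;

    list->page[list->num] = page;
    list->pos = list->num;
    list->num++;
    return page;
}
```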
Loading
Now having the page descriptor, we need to get the layouting data some way so
the page can be displayed in the pager.
Again, the standard case is loading a new document using layout(). The pointer
to the layouting data returned by that function is simply stored in the page
descriptor. The URL is extracted from the layouting data and stored in the
explicit "url" pointer of the page descriptor, which is necessary in case the
layout data is discarded (when loading another document) but the page is kept
in history.
Local Links
As mentioned before, instead of loading a new document it's also possible to
reuse existing layouting data of another one by passing a "reference" page. The
primary application of that is when following a link that points to some anchor
in the same document -- that doesn't require reloading the whole document, but
just jumping to the anchor. Of course, it is also used when returning to the
previous page after following such a local link, or going forward again.
The pointer to the layout data descriptor is simply copied from the page list
entry indicated by the given "reference" parameter.
As layout() and thus also init_load() (see hacking-load.*) aren't used in this
case, merge_urls() has to be called directly. If URL merging fails here,
load_page() returns immediately; no page descriptor is created and nothing else
is changed.
Also, if some anchor is active in the reference page, highlight_link() needs to
be used to remove the highlighting, to get a "clean" item tree.
Anchors
If the URL contains a fragment identifier, the corresponding anchor is
retrieved from the anchor list, and stored in "page->active_anchor"; this is
described under Anchors in hacking-links.*. The pager then jumps to the anchor
position and highlights it upon startup.
Handling in main()
Although load_page() does a great part of the work, some things have to be
taken care of in main(); particularly determining in which manner the new page
is to be loaded (reuse current document or load new one), and clearing the
layout data of an old document before loading a new one. Maybe that could be
done in load_page() too; however, it's not worth considering that now, as it
will probably need to be handled completely differently with the planned new
basic program structure. (Using an event queue and a main dispatcher.)
Another thing that needs to be handled in main() is initiating a page load in
reaction to commands given by the user inside the pager (or on the command
prompt), which often can't be clearly separated from the load operation itself.
If the user activates the command prompt (by typing ':' inside the pager) and
issues the ":e" or ":E" command, the URL is extracted from the command, and
load_page() is called to load the desired new page. The URL of the current page
is used as base for a relative URL with ":e". With ":E" (and also for ":e", if
the current page is internal), no base is used; the URL is always interpreted
absolutely.
These commands never pass a reference page, i.e. always involve loading a new
document. (Even though it is possible to jump to a local anchor using ":e
#anchor".)
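The choice of base URL for the two commands can be written down as a tiny
helper. This is a hedged sketch: pick_base() is a hypothetical name, the base
is simplified to a plain string, and representing "no base" as NULL is an
assumption.

```c
#include <stddef.h>

/* Return the base URL to hand to load_page(), or NULL for "no base".
 * command is 'e' or 'E'; current_is_internal is nonzero if the current page
 * is an internal one (loaded from stdin, or an error page). */
static const char *pick_base(char command, const char *current_url,
                             int current_is_internal)
{
    if (command == 'E' || current_is_internal)
        return NULL;
    return current_url;
}
```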
If a link/form control was activated by pressing <return> on a selected link
inside the pager, the action depends on the link or form element type. For
normal links load_page() is used with the current URL as base, just as with
":e". The link URL is extracted from the text item containing the link with
get_link(), with the help of the "link_list" structure. This process is
described under Following Links in hacking-links.*.
The only difference to the ":e" command (except for the way of getting the
target URL) is that the current page is passed as "reference" to load_page() if
the URL starts with '#' (i.e. points to an anchor in the same document), so
that the document isn't reloaded in that case, but only the anchor is
activated.
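The decision whether to pass a reference page when following a link comes down
to a single test on the link URL. A minimal sketch, assuming (hypothetically)
that the reference is passed as a descriptor pointer and that NULL stands for
"no reference"; pick_reference() is not a netrik function:

```c
#include <assert.h>
#include <stddef.h>

struct Page { int unused; };  /* contents don't matter for this sketch */

/* Return the page to pass as "reference" when following link_url from the
 * current page, or NULL if a new document has to be loaded. */
static const struct Page *pick_reference(const char *link_url,
                                         const struct Page *current)
{
    assert(link_url != NULL);
    return link_url[0] == '#' ? current : NULL;
}
```

Note that a URL such as "other.html#top" does not start with '#', so it still
causes a document load, even though it names an anchor.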
Form submit buttons are quite similar to normal links. First, get_form_item()
(also in hacking-links.*) is used to retrieve the item (from the structure
tree) that describes the form in which the button resides. This is used to get
the form's submit address ("action") first. Having this, load_page() is used to
do the submit; the form item is passed as the "form" argument. (And passed on
to init_load() there; see hacking-load.*.) This is both to tell init_load()
that a form is to be submitted, and where to find the form data. init_load()
(and its sub-functions) then take care of extracting the form data (using
url_encode() or mime_encode() from forms.c, also described in hacking-links.*)
and submitting it to the server. The resulting response page is loaded just
like any other document.
Other form controls do not issue a load operation, but only adjust the form
value appropriately. (This is described under Manipulating in hacking-links.*.)
The 'u', 'U' and 'c' pager commands also aren't really load operations, but
they are described here because they involve similar actions as the
preparations for a page load.
If 'u' was typed, the link URL is retrieved the same way as when following a
link, and then just printed to the screen.
'U' is similar. Instead of printing the (relative) link URL directly, it merges
it with the current page URL, thus getting the same absolute target URL that
would be used if the link was actually followed.
'c' simply prints the "full_url" component of the current page URL.
If a history command was given, load_page() is called with the (split) URL
taken from the requested "page_list" entry. We know which entry to take by
"page_list.pos", which is set to the desired new value before returning from
the pager.
If the history entry refers to the same HTML document as the one displayed up
to now, the current page descriptor is passed as "reference". To determine
whether it is the same document, we need to check if all entries between the
old and the new one (regardless of whether the new one is before or after the
old one in history) have "local" URLs, i.e. if the newer of the two entries was
created only by following links to local anchors from the older one.
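The same-document check can be sketched as a loop over the entries that lie
after the older and up to (and including) the newer of the two positions. The
"local" flag inside the URL structure and the function name same_document()
are assumptions made for this illustration; the structures are reduced to what
the check needs.

```c
#include <assert.h>

struct Url { int local; };        /* hypothetical flag: URL was a local ("#...") link */
struct Page { struct Url *url; };
struct Page_list { int num; int pos; struct Page **page; };

/* Return nonzero if the entries at old_pos and new_pos show the same
 * document, i.e. every entry after the older one, up to and including the
 * newer one, was created by following a local link. */
static int same_document(const struct Page_list *list, int old_pos, int new_pos)
{
    int lo = old_pos < new_pos ? old_pos : new_pos;  /* older entry */
    int hi = old_pos < new_pos ? new_pos : old_pos;  /* newer entry */

    assert(lo >= 0 && hi < list->num);

    for (int i = lo + 1; i <= hi; ++i)
        if (!list->page[i]->url->local)
            return 0;
    return 1;
}
```

Because only the ordering of the two positions matters, the same loop serves
both going back and going forward in history.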