Include HTML file and alter its links???

Status
Not open for further replies.

theniteowl

Programmer
May 24, 2005
1,975
US
Hi All,
I have a web page template I created and I pass a value to the page to tell it what HTML file to load into the main content section of the page.

I need to be able to analyze and alter the links in the page that I include.

The reason for this is that the templated system will always remain the same and the HTML pages the client creates will load inside that page structure. Links inside that page need to be formatted so that they can pass values to the next page telling it what state the navigation menu is in and the relative folder path for the file to load.

Rather than requiring the clients to specially formulate all of their links so that the menu system is not affected, I want to dynamically alter the links when the page loads.
I would analyze each link to see if it is a relative link or a fully qualified one (presumably one that links to an external web site). Only links to local pages would be altered for the menu, and links to external pages I could make certain open in a new browser window.

Help?

Currently the system works like this.
<A HREF="<?= $linkurl ?>mylink">My Link</a>
The $linkurl contains the path to the template file, values to set the navigation menu to the correct settings for the page and on the end is the name of the file the client wants to load without an extension. It is passed as a URL value and my script determines the path based on the menu selection then appends the .htm on the end.

This means the client has to add the <?= $linkurl ?> into every HREF and they have to leave off the .htm. My goal is to make this a simpler system for them to maintain, and it would be better not to require unusual mods to every page they make.
If I can alter the links dynamically then they can have a normal link in their page and I can handle the rest in script.

All pages are on the same server unless linking to an outside URL.

Thanks.

Stamp out, eliminate and abolish redundancy!
 
without knowing what is in the resultant html files (dynamic/static content) it's difficult to advise which approach to take.

one approach might be
1. read the content of the html file into a string,
2. parse the string for instances of anchor tags (using preg perhaps),
3. change the tags as necessary,
4. print the string to the browser.
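
The four steps above can be sketched as follows. This is only an illustration under stated assumptions: `rewrite_links` and `$prefix` are hypothetical names, the prefix is assumed to look like the existing `$linkurl` value, local page names are passed through unchanged apart from dropping the extension, and the regular expression only handles quoted href values.

```php
<?php
// Hypothetical helper: rewrite local links so they route through the template.
// $prefix is assumed to look like "index.php?T=1&M=2&C=".
function rewrite_links(string $html, string $prefix): string
{
    // Groups: 1 = tag start up to href=, 2 = quote, 3 = URL, 4 = rest of tag.
    // Unquoted href values are not handled by this sketch.
    $pattern = '/(<a\b[^>]*?\bhref\s*=\s*)(["\'])(.*?)\2([^>]*>)/is';

    return preg_replace_callback($pattern, function (array $m) use ($prefix): string {
        [$whole, $before, $quote, $url, $after] = $m;

        // Leave empty links, in-page anchors and mailto: links untouched.
        if ($url === '' || $url[0] === '#' || stripos($url, 'mailto:') === 0) {
            return $whole;
        }
        // A scheme or a protocol-relative prefix is treated as an external link.
        if (preg_match('#^(?:[a-z][a-z0-9+.-]*:)?//#i', $url)) {
            return $before . $quote . $url . $quote . ' target="_blank"' . $after;
        }
        // Local link: drop the .htm/.html extension and prepend the template URL.
        $page = preg_replace('/\.html?$/i', '', $url);
        return $before . $quote . htmlspecialchars($prefix) . $page . $quote . $after;
    }, $html);
}
```

It would be called on the string read with `file_get_contents()` before echoing it, in place of a plain `include` of the static page.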

 
The HTML files are static.

I have used javascript to parse out HTML and modify form elements so I expect it will not be a great deal different.
The trick is trying to anticipate all the different kinds of links the clients might write and dealing with subfolders in the path.

Would I be better off using some type of session variable to persist the menu selection information and then only have to deal with redirecting the link to work through my template file rather than loading as a new page?
I generally work with ASP and am learning bits of PHP as I determine what I need to do server-side. So I am not always aware of available methods or what is comparable to what I am used to doing in ASP.

Thanks.

Stamp out, eliminate and abolish redundancy!
 
sorry - i didn't fully read your first post to understand what you wanted. i concentrated on the url rewriting too much.

navigation state is, as you allude, best done through session variables. these work fine for working around (a bit) the problem with lack of state in web apps.

just remember to start the session at the beginning of each page
Code:
session_start();

write variables to the session variable by
Code:
$_SESSION['var_name'] = $var_value;

 
Yes, the two issues are kind of intertwined but it does primarily seem to be a URL rewrite type of issue.

I have not had time to think this through thoroughly but it seems using methods of capturing and rewriting the URL might cause conflict with maintaining the menu state.
If for instance I used session values to maintain the navigation menu state rather than as values on the URL and someone bookmarks a page then if they use that bookmark it would not know the settings for the navigation menu to start with.
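
One way to reconcile the two is to let values on the URL win, fall back to the session, and only then use defaults, so a bookmarked URL that still carries its values restores the menu. A minimal sketch, assuming the `T` and `M` parameter names; `resolve_nav_state` is a hypothetical helper, and plain arrays stand in for `$_GET` and `$_SESSION`:

```php
<?php
// Hypothetical helper: resolve the menu state for one request.
// $query stands in for $_GET and $session for $_SESSION.
function resolve_nav_state(array $query, array &$session, array $defaults): array
{
    $state = [];
    foreach ($defaults as $key => $default) {
        if (isset($query[$key]) && is_string($query[$key]) && $query[$key] !== '') {
            $state[$key] = $query[$key];      // value on the URL wins
        } elseif (isset($session[$key])) {
            $state[$key] = $session[$key];    // otherwise reuse the last state
        } else {
            $state[$key] = $default;          // otherwise fall back to the default
        }
        $session[$key] = $state[$key];        // remember it for the next request
    }
    return $state;
}
```

In a real page this would be called as `resolve_nav_state($_GET, $_SESSION, [...])` after `session_start();`.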

I could attempt to make the navigation menu more intelligent and have it recognize the link being requested and set state based upon that but then I run into issues if the same page can be linked to more than one location in the site.

I have to think about it a while and see if I come up with other potential issues and how I might approach them. Perhaps it would be more logical to always affix a specific page with a specific menu setting so if that page is linked from another section of the site it will still change the menu settings to the default for the linked page.

Using an .htaccess file and doing URL rewriting seems like the best solution for dealing with the links in the included static pages. It leaves me with one other issue though.
In the content page I have very small graphical edges to create a rounded border. The outside of the content area is a fixed color, then it has the rounded border surrounding the inner content area where the file is included. If the client's page uses a background color other than white or uses an image then the page does not blend. The included page would just be squared off inside a white box with a colored rounded border. It looks a bit blocky. I was thinking I could parse the HTML page looking for a background color or image and alter the background color/image of the content page to use the same so they blend together. If I were parsing for the URL I could just continue on with the background mod also, but now I will probably have to do both. :)
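
Looking for the background could be sketched like this. It assumes the color or image is set with the old-style `bgcolor` / `background` attributes on the page's body tag; backgrounds set through CSS or on a table are not detected. `find_body_background` is a hypothetical name.

```php
<?php
// Hypothetical helper: read bgcolor/background attributes from the <body> tag.
// Returns null for an attribute that is not present.
function find_body_background(string $html): array
{
    $found = ['bgcolor' => null, 'background' => null];
    if (!preg_match('/<body\b([^>]*)>/i', $html, $body)) {
        return $found; // no body tag, nothing to detect
    }
    foreach (array_keys($found) as $attr) {
        // Accept double-quoted, single-quoted or unquoted attribute values.
        $pattern = '/\b' . $attr . '\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s>]+))/i';
        if (preg_match($pattern, $body[1], $m)) {
            $found[$attr] = end($m); // the last captured group holds the value
        }
    }
    return $found;
}
```

The returned values could then be written into the style of the content area by the template.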

Ultimately a page not using the rounded edge graphic would work the smoothest but the design looks so much better and helps transition between the fixed template and static HTML pages the clients provide.



Stamp out, eliminate and abolish redundancy!
 
i still haven't got a feel for how your template engine works. do you have a sample site?

on the bookmark issue you may be able to avoid much of the problem by saving the session state in a cookie on the client machine. that way the next time the user accesses your site you can retrieve the cookie value and thus "remember" the navigation state.

on the template side - a few thoughts;

1. could you control the html at the point that the client uploads it to your templating program?
2. could you use javascript to set the src of the borders depending on the colours inside it? or even change the colour of your page?
 
jpadie, sorry I guess I had looked at this thread once quickly without answering and then forgotten that someone had responded and thought that since it did not show new activity I had no responses. Browsing too many forums I guess. :)

Currently it's like this:
The main index.php page loads, tests the URL for passed values for the Top nav menu, the Secondary nav menu and the Content page name. The values are stored in variables on the page or set to defaults if none found.
It includes a file that sets up the array for the navigation menu. It also acts as the HTML framework for the page; the include pages below just insert inside the body tag.

The page then does three includes:
Page header include
Secondary navigation menu include
Content page include

The page header include sets up a table structure for the top of the HTML page and uses the array to generate the top level navigation menu.

The Secondary navigation include sets up a table to hold the secondary navigation menu and generates the menu based on current selections.

The content page include takes the passed name of the page to load, sets up the path relative to index.php and sets the sub folder based on the value of the selected nav menu.
It then adds the passed name and appends .htm to the end of it, tests that the file exists and then includes that page.
It finishes by setting up the bottom of the page.
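
The path handling in the content include might look roughly like the sketch below. The function name, the folder layout and the rule for acceptable names are assumptions, not the actual code; the name check is there because the page name arrives on the URL.

```php
<?php
// Hypothetical helper: map a menu folder and page name to a file under $baseDir.
// Returns the file path, or null if a name is unsafe or the file is missing.
function resolve_content_file(string $baseDir, string $folder, string $page): ?string
{
    // Accept only simple names so a crafted value cannot leave the content folder.
    $safe = '/^[A-Za-z0-9_-]+$/';
    if (!preg_match($safe, $folder) || !preg_match($safe, $page)) {
        return null;
    }
    // Build the path relative to the base directory and append the extension.
    $path = rtrim($baseDir, '/') . '/' . $folder . '/' . $page . '.htm';
    return is_file($path) ? $path : null;
}
```

The caller would `include` the returned path, or fall back to a default page when it gets null.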

The final piece is the client's HTML page. I created a template for them to build their pages with using an outer table to set the maximum width of the page so it would not break the template. They just build their HTML within the table of the template.

The URL for any given page is not a direct link to a specific file but a link to index.php with parameters that tell it how to set up the nav menu and which page to load. So when clients set up links on their own pages to point to pages within the site they have to insert the variable I built that contains the index.php?T=topnavselection&M=middlenavselection&C=
Then they add in the name of the page they want to load without the .htm extension as it is appending the value to the end of the URL I created.
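
For illustration, a URL of that shape could be assembled as below. `template_url` is a hypothetical helper; the `T`, `M` and `C` parameter names are the ones described above.

```php
<?php
// Hypothetical helper: build a link that goes through index.php.
function template_url(string $top, string $middle, string $page): string
{
    // The template appends .htm itself, so drop any extension on the page name.
    $page = preg_replace('/\.html?$/i', '', $page);
    // http_build_query() URL-encodes each value; "&" is passed explicitly as separator.
    return 'index.php?' . http_build_query(['T' => $top, 'M' => $middle, 'C' => $page], '', '&');
}
```

When the result is echoed into an HREF attribute it should go through `htmlspecialchars()` first.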

The system is far from perfect and I will probably have to substantially revamp it as I go but I had NO experience with PHP when I started or with creating a system that works within a template rather than allowing free association with any HTML file directly.

You can view the pages at:
It is my first attempt at a templated system and my first use of PHP, and also the first time I have had to come up with the whole design from scratch with nothing already in place to work from, so it all still needs a bit of evolving. I have also not worked on it in over 8 months and am trying to remember what my logic flow was when I wrote it to figure out how it is working. :)


Stamp out, eliminate and abolish redundancy!
 
so what you are basically trying to achieve is to allow users to maintain their own pages? pretty standard templating stuff and in the way of things you need to set rules.

in a typical user-contributed content scheme the designer enforces a separation between content and design (at least to the level mandated by the overall site/section/page design).

thus i would suggest either enforcing content upload through a textarea or richtext control or, if that can't be done, stripping out the offending tag attributes before rendering the page.
 
At the time of design there was no database available from the provider so I planned on using flat files to control things.
The teachers also already have access to the folders the HTML goes to and I do not think I will be able to get that restricted, so I am looking to make it as easy as possible for them to submit in the new design. I have been busy building the template system though and have not yet gotten to any CMS portion other than trying to lay out a template it would work with readily.

I may be able to do some URL redirection to ensure all links get filtered by my script and thereby relieve the clients from having to use specially formatted links in their documents. I think the biggest thing though would be creating a CM screen to let them build and view their pages and then submit the results giving them the ability to select under which navigation options it belongs visually and the ability to modify the nav options. When I think about it though I see in how many different directions something like that has an impact and it may take me a while to be able to do it right.

In the meantime do you have any sample code for parsing an HTML page?

Rather than forcing them to use a template file for the content page it might be possible to include a complete HTML file within that content page. What issues might I run into if the included page has its own HTML, Head, Body tags when being inserted into the body of the current page?

There are many ways to approach it I know. I have not found any good examples of this type of templated approach at least not in PHP so I had to figure it out as I went.

And yes, I am looking to allow them to maintain their own pages. I will eventually write CM pages that would allow non-technical people to update content on frequently changing pages like an events calendar, that will not be a big problem. But I need a good framework for teachers who create whole new pages rather than just altering the content of an existing page to work within to keep the site template from breaking. If they put in a direct link from one of their content pages to another page on the site it will not go through the template and will lose the site navigation controls as well as formatting and that is my biggest concern right now.



Stamp out, eliminate and abolish redundancy!
 
i appreciate that you are constrained by history but have you taken a look at smarty as a templating engine? or for a GNU content management system?

some other thoughts:

1. although i've still not got a handle on the kind of links your users might upload i'm assuming that they pretty much just use page names (ie. href=page.html) which, of course, won't exist in your root directory where index is being served from.
2. if i'm right, then why don't you use session variables to hold the state and to prevent a page not found error, use a custom htaccess file in the root directory to point all errors back to index.php. neat and simple with no need for url_rewrites at the web server level.
3. this also means that you don't need to do much heavy parsing of the incoming html.
4. if you still want to protect against some rogue html getting into the template then identify those tags which you don't like (e.g. <html><head><script><body><style>) perhaps and remove their contents or the tags at serve time. you could use regex for this or i posted some code in a thread today that could be adapted fairly easily.
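
A regex version of point 4 might look like the sketch below. Which elements lose their contents and which only lose their tags is an assumption made for illustration, and `strip_template_breakers` is a hypothetical name.

```php
<?php
// Hypothetical helper: remove markup that would clash with the surrounding template.
function strip_template_breakers(string $html): string
{
    // Remove these elements together with everything inside them.
    $html = preg_replace('#<(head|script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);
    // Remove only the tags of these elements and keep their contents.
    $html = preg_replace('#</?(?:html|body)\b[^>]*>#i', '', $html);
    // Drop a doctype declaration if one is present.
    $html = preg_replace('#<!DOCTYPE[^>]*>#i', '', $html);
    return trim($html);
}
```

Regular expressions are not a full HTML parser, so unusual or malformed markup may slip through.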
 
Session variables might be problematic if the client saves a link to a sub page, because the URL would carry no data telling the code which page to load or how to set the navigation menu when it is created. For those values I may still be better off using the URL to hold the state, so bookmarked URLs will still load the correct page without my needing to delve into cookies.
Using the htaccess file may work, I am completely unfamiliar with it but can research it some. The one issue I can envision causing trouble is if they create a link with a subfolder name /subfolder/file.htm and that folder name happening to exist in the root so that it does not cause an error and tries to load the named file from a subfolder of the root instead of a subfolder of the current pages location. If a file of that name happens to exist in the subfolder off the root then it will load that page and break the template, otherwise it will just give an error and the htaccess file can take care of it.

I will have to give it some thought and read up on the htaccess file.
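
As a starting point for that reading, a rough sketch of the error-page idea. It assumes a .htaccess file in the site root containing the single Apache directive `ErrorDocument 404 /index.php`, and it assumes the originally requested URI is still available to the script (commonly via `$_SERVER['REQUEST_URI']`, which should be checked on the actual host). `page_from_request` is a hypothetical helper.

```php
<?php
// Assumed .htaccess content in the site root (one Apache directive):
//   ErrorDocument 404 /index.php
//
// Hypothetical helper for index.php: turn the originally requested URI
// into a page name the template can load, or null if none can be derived.
function page_from_request(string $requestUri): ?string
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    // Keep only the file name and drop the .htm/.html extension.
    $name = preg_replace('/\.html?$/i', '', basename($path));
    // Accept only simple names so the value is safe to use in a file path.
    return preg_match('/^[A-Za-z0-9_-]+$/', $name) ? $name : null;
}
```

This sketch discards the subfolder part of the link, so the subfolder case described above would still need its own handling. The response status may also need resetting with `http_response_code(200)` once the page has been found.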

The reason for parsing out the html, head, body tags would be to allow the teachers to create their HTML pages in whatever program they like and upload them instead of having to use a custom template to insert their code into.
It just removes steps they have to remember and take and keeps us one step further away from causing template problems. It is a nice-to-have feature that I may add later just to make lives easier though.

I have never heard of smarty and have not seen any freely available CMS systems. The school does not have the budget to purchase any software (thanks Mitt Romney) so it would have to be freely available. I also figured the learning curve on a full featured system would be great enough to cause the teachers to put up resistance. To date they have had unlimited freedom to do things the way they want. I will be curtailing that somewhat with my setup but it will still be liberal enough so hopefully they will not rebel.
I am not working for the school system, I am just a parent donating time to help out so I have no real control over what ultimately happens so I build it to provide the best control with the least disruption.
So I try to write apps that do the greatest amount of hand-holding and error correcting possible. :)


Stamp out, eliminate and abolish redundancy!
 
i would look seriously at smarty. you'll get to grips with it within an hour or so and i suspect it will do 90-95% of what you want out of the box.

it's completely free to use under GNU:
 
I will take a look at it thanks.
Hopefully it is not a server-side installation setup because I do not think the provider would be accommodating. That is one of the other reasons for my home-grown solution, total lack of control over the environment. *Sigh*

Thanks again.


Stamp out, eliminate and abolish redundancy!
 