eyesrglazed
Programmer
This program is supposed to remove the HTML tags from a downloaded HTML file. Here is the code:
In the "no tags" section, it reads a character from the file and checks whether it is a '<'. If so, it keeps reading until it finds a '>'. Then it reads the next character and writes that one. I know this approach to tag removal won't work in general, but I was wondering what is wrong with it.
Help would be appreciated.
Code:
#include <iostream>
#include <fstream>
#include <string>
#include <windows.h>
#include <wininet.h>
#pragma comment(lib, "wininet.lib")
using namespace std;

int main()
{
    ifstream inFile;
    ofstream outFile;
    ifstream inFiletemp;
    ofstream outFiletemp;
    char fileName[100];
    char fileName2[100];
    char temp;

    inFile.open("sites.txt");
    inFile >> fileName >> fileName2;

    //***Write HTML File***
    outFile.open("temphtml.txt");
    HINTERNET Initialize, Connection, File;
    DWORD dwBytes;
    char ch;

    Initialize = InternetOpen("HTTPGET", INTERNET_OPEN_TYPE_DIRECT, NULL, NULL, 0);
    Connection = InternetConnect(Initialize, fileName, INTERNET_DEFAULT_HTTP_PORT,
                                 NULL, NULL, INTERNET_SERVICE_HTTP, 0, 0);
    File = HttpOpenRequest(Connection, NULL, fileName2, NULL, NULL, NULL, 0, 0);

    if (HttpSendRequest(File, NULL, 0, NULL, 0))
    {
        while (InternetReadFile(File, &ch, 1, &dwBytes))
        {
            if (dwBytes != 1)
                break;
            outFile << ch;
        }
        cout << "Connected successfully." << endl;
    }

    InternetCloseHandle(File);
    InternetCloseHandle(Connection);
    InternetCloseHandle(Initialize);
    outFile.close();

    //***Write "No tags" File***
    inFiletemp.open("temphtml.txt");
    outFiletemp.open("temp.txt");

    inFiletemp >> temp;
    while (! inFiletemp.eof())
    {
        if (temp == '<')
        {
            while (temp != '>')
                inFiletemp >> temp;
            inFiletemp >> temp;
        }
        outFiletemp << temp;
        inFiletemp >> temp;
    }
    return 0;
}
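Three things in the "no tags" loop look likely to cause trouble. First, `inFiletemp >> temp` on a `char` skips whitespace, so every space and newline is lost from the output. Second, after the closing '>' the next character is read and written without being checked, so when two tags are adjacent (for example `<b><i>`) the '<' of the second tag is written and the rest of that tag leaks through. Third, the inner `while (temp != '>')` never checks the stream state, so an unclosed '<' near the end of the file makes it loop forever. Below is a minimal sketch of one way to avoid all three, assuming the goal is only to drop everything between '<' and '>'. The names `stripTags` and `stripTagsFromString` are made up for this illustration, and the sketch does not try to handle comments, scripts, entities, or a '>' inside an attribute value.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Copies `in` to `out`, dropping everything from a '<' up to the next '>'.
// get() is used instead of operator>> so that whitespace is preserved.
void stripTags(std::istream& in, std::ostream& out)
{
    bool insideTag = false;
    char c;
    while (in.get(c))                  // stops cleanly at EOF, even inside a tag
    {
        if (c == '<')
            insideTag = true;          // start of a tag: stop copying
        else if (c == '>' && insideTag)
            insideTag = false;         // end of a tag: resume copying
        else if (!insideTag)
            out.put(c);                // ordinary text, including whitespace
    }
}

// Hypothetical convenience wrapper for trying the function on a string.
std::string stripTagsFromString(const std::string& html)
{
    std::istringstream in(html);
    std::ostringstream out;
    stripTags(in, out);
    return out.str();
}
```

In the original program this could replace the whole "no tags" loop with a single call such as `stripTags(inFiletemp, outFiletemp);`. Keeping one flag for "inside a tag" means every character goes through the same check, which is what makes adjacent tags work.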