HTML::Entities - hash copy

piken · Jan 3, 2012

Does anyone see a problem with doing something as simple as this with form input?

The object is to create two hashes from one: one for printing back to the client safely (encoded) and the other for storing the data in its original, non-encoded format.
=================================
my %in_c = %in; # copy my hash

foreach $key (keys %in) {
$in{$key}=encode_entities($value=$in{$key});
}
=================================
So now I have two hashes:
%in (encoded)
%in_c (decoded/raw)

Since I have many print statements throughout the program, the idea is to use %in for printing and to keep %in_c for our internal use.

It seems to work, and it seems better to do this once at the beginning of the program and be done with it than to encode/decode multiple times through the program.

Bad/Good/OK?

prex1 · Jan 4, 2012

This is the Perl forum. You should ask in the HTML forum, as your question (which is quite unclear) does not seem to be related to Perl issues.

Franco
http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

piken · Jan 4, 2012

Err.. this is a perl question.

HTML::Entities is a perl module.

If you have ever done any perl cgi programming
this should be very obvious and would also
be very clear as to the question.

So if you don't know what you are
responding about... don't respond.

Annihilannic · Jan 4, 2012

Your proposal sounds fine to me. I don't see the point in copying the contents of the hash twice. Why not encode as you copy it?

Perl:

# assign values to %in_c initially

# skip this
# my %in_c = %in;  # copy my hash

foreach $key (keys %in_c) {
    $in{$key}=encode_entities($value=$in_c{$key});
}

The only disadvantage I can think of is, should you decide in future to manipulate some of the data in this hash, you may forget to update both versions with confusing results.

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]

piken · Jan 4, 2012

"you may forget to update both versions"
That was a good point I hadn't thought about, but values are never changed in place; they would be assigned to a variable and modified in the variable.

This script generates several dynamic web pages throughout almost 6000 lines of code, and about 80% of the way through it also saves data out to a file. So I was thinking it would be easiest to have an encoded copy for printing back to the client when needed and a raw/decoded copy to save out; otherwise I would be encoding/decoding things many times.

It was really this line that worried me....

$in{$key}=encode_entities($value=$in{$key});

But seems to work OK.

Annihilannic · Jan 4, 2012

What is the $value= assignment for? Does it work without it? (I haven't used encode_entities()).

Annihilannic

piken · Jan 4, 2012

HTML::Entities is used as part of a safety measure to help fight "cross-site scripting".

It's possible for someone to enter malicious HTML code and have it written back to the client browser, such as:

<IMG SRC="javascript:alert('XSS');">

HTML::Entities converts the special characters in the input to their HTML entities, which browsers still display properly.

Example:

decoded/raw input = <IMG SRC="javascript:alert('XSS');">

input encoded to HTML entities and written back to the client =

&lt;IMG SRC=&quot;javascript:alert(&#39;XSS&#39;);&quot;&gt;

So basically I'm taking the raw form input for each key, encoding it, and assigning the encoded value back to the key, which makes it safer to display back to the client browser.

$in{$key}=encode_entities($value=$in{$key});

Hope that made sense.
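To see the encoding step on its own, here is a minimal sketch using the example string above. It assumes HTML::Entities is installed and relies on its default set of unsafe characters; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# The raw string from the example above (variable names are illustrative).
my $raw = q{<IMG SRC="javascript:alert('XSS');">};

# With no second argument, encode_entities() escapes <, >, &, " and '
# (plus control and high-bit characters) and returns the encoded copy.
my $encoded = encode_entities($raw);

print "$encoded\n";
# prints: &lt;IMG SRC=&quot;javascript:alert(&#39;XSS&#39;);&quot;&gt;
```

The browser shows the encoded text as the literal characters the user typed instead of interpreting it as a tag.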

piken · Jan 4, 2012

I guess you made me think about it a little more...

>What is the $value= assignment for? Does it work without it?

$in{$key}=encode_entities($value=$in{$key});

$in{$key} is a reference, to get the value of the
reference you need to assign it, thus $value=$in{$key}

once you have the value, you can encode the value
encode_entities($value, once encoded you can store the
value back to the reference, $in{$key}=

Errr...something like that.

prex1 · Jan 6, 2012

[tt]$in{$key}[/tt] is a string, not a reference. If it were a reference, then [tt]$value[/tt] would be one too. So the assignment in the function call is useless, unless [tt]$value[/tt] is used elsewhere.
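A small sketch of why the extra assignment is not needed, assuming the documented behaviour of encode_entities(): when its return value is used it works on a copy, and only when the result is ignored (void context) does it change its argument in place. The hash contents here are hypothetical.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my %in = (name => 'Tom & "Jerry"');   # hypothetical form data

# Called for its return value, encode_entities() works on a copy,
# so the hash element passed in is left untouched.
my $encoded = encode_entities($in{name});
print "$in{name}\n";   # still the raw string
print "$encoded\n";    # Tom &amp; &quot;Jerry&quot;

# Called in void context (result ignored), it encodes its argument in place.
my $scratch = $in{name};
encode_entities($scratch);
print "$scratch\n";    # Tom &amp; &quot;Jerry&quot;
```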

Franco


rharsh · Jan 6, 2012

So I have a, perhaps silly, question - if the data needs to be encoded such that it won't be executable, why store it in a non-encoded form? Doesn't that create the possibility that someone could innocently use the data later but forget to encode it again? Why not convert it to the 'safe' version straight away and store the data that way for future use?

As far as your code goes, a few comments:[ul]
[li]Variable Naming[/li]

[ul][li]It probably doesn't seem like it now, but it would be pretty easy to use the 'wrong' hash when printing your information[/li]
[li]I might suggest something more like %in_dirty, %in_clean[/li][/ul]

[li]Duplicating your hash[/li]
[ul][li]You don't need two steps to duplicate the hash (one to make a copy, then a second to process each element). See the code below for an example.[/li][/ul]
[/ul]

Code:

foreach my $key (keys %in_dirty) {
  $in_clean{$key} = encode_entities($in_dirty{$key});
}

And, if you decide to store the 'encoded' data, you don't need the second hash at all. You could encode and store the data in a single hash while you're reading the data in.
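For a version of the loop above that can be run as it stands, here is a sketch that builds the encoded hash in one pass with map. The sample keys and values are invented; only the %in_dirty and %in_clean names come from the suggestion above.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical form data standing in for the parsed input.
my %in_dirty = (
    name    => 'A & B',
    comment => '<b>hi</b>',
);

# Build the encoded hash in one pass; %in_dirty is not modified.
my %in_clean = map { ($_ => encode_entities($in_dirty{$_})) } keys %in_dirty;

print "$_: $in_clean{$_}\n" for sort keys %in_clean;
# prints:
# comment: &lt;b&gt;hi&lt;/b&gt;
# name: A &amp; B
```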

piken · Jan 6, 2012

Well...... Thanks for all the great input
and the heads up.

$in{$key}=encode_entities($value=$in{$key});

This was the line that was troubling me.

Annihilannic;
It does work without the "$value"

rharsh;

This was a real good way to do it.

foreach my $key (keys %in_dirty) {
$in_clean{$key} = encode_entities($in_dirty{$key});
}

As everyone has been pointing out having duplicate
hashes can cause problems, err... well it has caused
some issues and I now believe it's not worth the trouble.

I'm going to keep one hash and only encode
the HTML entities where user input is being printed back
to the client.

print encode_entities($in{xx});

Makes it simpler, one hash and encoded where the
actual problem may exist.

I can see where people might opt to use "Apache::TaintRequest"
since it overrides the print request and handles it there.
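With almost 6000 lines of print statements, a short wrapper can make encode-at-output less tedious. This is only a sketch: h() is a made-up helper name, and the sample input is hypothetical.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

my %in = (xx => '<script>alert(1)</script>');   # hypothetical raw input

# h() is a made-up shorthand: encode a value for HTML output,
# treating a missing field as an empty string.
sub h {
    my ($text) = @_;
    return encode_entities(defined $text ? $text : '');
}

print '<p>You entered: ', h($in{xx}), "</p>\n";
# prints: <p>You entered: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The single hash stays raw for saving to file, and the encoding happens only where a value reaches the page.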

prex1 · Jan 7, 2012

If you are still worried about the number of times you'll have to encode repeatedly (presumably many thousands, otherwise don't worry), there is an intermediate way: write your own encode function that checks whether a value has already been encoded before calling [tt]encode_entities()[/tt], something along the lines of

Code:

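sub my_encode_entities {
  my ($key) = @_;   # name of the form field to encode
  # Encode each value at most once and cache the result in %in_clean.
  $in_clean{$key} = encode_entities($in_dirty{$key})
    unless exists $in_clean{$key};
  return $in_clean{$key};
}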

This requires [tt]%in_clean[/tt] and [tt]%in_dirty[/tt] to be global, but that is easily changed by passing references if you mind.
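One way the reference-based variant could look, as a sketch only: make_encoder() is a hypothetical helper that takes a reference to the raw hash and keeps the cache private inside a closure, so nothing has to be global.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# make_encoder() is a hypothetical helper: it takes a reference to the raw
# hash and returns a sub that encodes each key's value at most once.
sub make_encoder {
    my ($dirty) = @_;
    my %clean;                      # private cache, one entry per key
    return sub {
        my ($key) = @_;
        $clean{$key} = encode_entities($dirty->{$key})
            unless exists $clean{$key};
        return $clean{$key};
    };
}

my %in  = (AD1 => '5 > 3 & "quoted"');   # hypothetical form data
my $enc = make_encoder(\%in);

print $enc->('AD1'), "\n";   # 5 &gt; 3 &amp; &quot;quoted&quot;
print $enc->('AD1'), "\n";   # same string, served from the cache
```

Note that the cache goes stale if a raw value is changed after its first use, which is the same out-of-sync risk raised earlier in the thread.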

Franco


piken · Jan 7, 2012

prex1, thanks for the input.

I decided to go with encoding HTML entities at the source of concern.

Example:

print '<INPUT TYPE="hidden" NAME="AD1" VALUE="'.encode_entities($in{AD1}).'">';

When the form is submitted, CGI's param() then gives back the decoded value: the browser turns the entities back into the original characters before sending them.
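A rough sketch of that round trip, under stated assumptions: decode_entities() stands in here for the browser reading the attribute value back, and the field value is hypothetical. It illustrates that nothing is lost by encoding; it is not a description of how the browser or CGI.pm work internally.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

my $raw = q{O'Reilly & "Sons" <ltd>};   # hypothetical field value

# What gets printed into the hidden field.
my $field = '<INPUT TYPE="hidden" NAME="AD1" VALUE="'
          . encode_entities($raw) . '">';
print "$field\n";

# decode_entities() stands in for the browser reading the attribute back.
my ($attr) = $field =~ /VALUE="([^"]*)"/;
my $back   = decode_entities($attr);
print $back eq $raw ? "round trip ok\n" : "round trip failed\n";
```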
