Encoding problem (newbie question)

Status
Not open for further replies.

Arrowx7

Programmer
Feb 16, 2005
17
CA
Hello,
I am using a Python script that fetches raw data from the web. However, while processing the data I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)

It's actually the SQL escape method that fails. As far as I understand, the ASCII charset doesn't support any character above code point 127. If that's the case, is there any way to extend it? Or does anyone know another solution?
I use the SQL escape method that is part of the MySQLdb object.
I'm quite a newbie at this thing :)

Thanks in advance!
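
The error in the question can be reproduced without a database, which may make it easier to see what is going on. The following is only a minimal sketch: the sample string is made up (the original fetched data is not shown), and it assumes UTF-8 is an acceptable target encoding. Whether the table will store those bytes correctly depends on the column character set, which is not stated in the post.

```python
# U+2013 is an en dash; ASCII only covers code points 0-127.
text = u'pages 100\u2013200'  # hypothetical sample, not the poster's data

# Encoding with the ASCII codec fails, which is the error from the question.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly to UTF-8 works, because UTF-8 covers all of Unicode.
utf8_bytes = text.encode('utf-8')

# A lossy alternative: keep ASCII but substitute '?' for anything else.
ascii_lossy = text.encode('ascii', 'replace')
```

Passing the explicitly encoded bytes to the escape method, instead of the unicode object, avoids the implicit ASCII conversion. The lossy variant is only worth considering if the non-ASCII characters really can be thrown away.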
 
Code:
import MySQLdb

SQL = "INSERT INTO table1 (field1, field2) VALUES (%s, %s)"
CONNCRED = {
    'host': 'localhost',
    'user': 'justin',
    'passwd': 'secret',
    'db': 'mydb',
    }
ENCODINGS = [
    'ascii', 'base64_codec', 'charmap', 'cp037', 'cp1006', 'cp1026',
    'cp1140', 'cp1250', 'cp1251', 'cp1252', 'cp1253', 'cp1254', 'cp1255',
    'cp1256', 'cp1257', 'cp1258', 'cp424', 'cp437', 'cp500', 'cp737',
    'cp775', 'cp850', 'cp852', 'cp855', 'cp856', 'cp857', 'cp860', 'cp861',
    'cp862', 'cp863', 'cp864', 'cp865', 'cp866', 'cp869', 'cp874', 'cp875',
    'hex_codec', 'idna', 'iso8859_1', 'iso8859_10', 'iso8859_13',
    'iso8859_14', 'iso8859_15', 'iso8859_2', 'iso8859_3', 'iso8859_4',
    'iso8859_5', 'iso8859_6', 'iso8859_7', 'iso8859_8', 'iso8859_9',
    'koi8_r', 'koi8_u', 'latin_1', 'mac_cyrillic', 'mac_greek',
    'mac_iceland', 'mac_latin2', 'mac_roman', 'mac_turkish', 'mbcs',
    'palmos', 'punycode', 'quopri_codec', 'raw_unicode_escape', 'rot_13',
    'string_escape', 'undefined', 'unicode_escape', 'unicode_internal',
    'utf_16', 'utf_16_be', 'utf_16_le', 'utf_7', 'utf_8', 'uu_codec',
    'zlib_codec',]

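if __name__ == '__main__':
    ustr = u'pages 100\u2013200'

    # Encode the sample string with every codec; keep only those that succeed.
    vals = []
    for enc in ENCODINGS:
        try:
            bs = ustr.encode(enc)
        except (UnicodeError, LookupError, TypeError):
            # UnicodeEncodeError is a subclass of UnicodeError.
            # LookupError covers codecs missing on this platform
            # (for example mbcs outside Windows).
            pass
        else:
            vals.append((bs, enc))

    # Store each encoded byte string next to the name of its encoding.
    c = MySQLdb.connect(**CONNCRED)
    try:
        cur = c.cursor()
        cur.executemany(SQL, vals)
        c.commit()
        cur.execute('SELECT field1, field2 FROM table1')
        rows = cur.fetchall()
    finally:
        c.close()

    # Decode what came back and report whether it matches the original.
    for (bs, enc) in rows:
        print enc, bs.decode(enc) == ustr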
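
The same round-trip idea can be tried without MySQL at all, which separates "can this codec represent the string?" from "does the database preserve the bytes?". This is a rough sketch, not the script above: the helper name `round_trip_encodings` and the short candidate list are made up for illustration, and only the encode/decode step is checked.

```python
def round_trip_encodings(text, encodings):
    """Return the encodings that encode `text` and decode it back unchanged."""
    good = []
    for enc in encodings:
        try:
            data = text.encode(enc)
            restored = data.decode(enc)
        except (UnicodeError, LookupError, TypeError):
            # The codec cannot represent the text, or is not usable here.
            continue
        if restored == text:
            good.append(enc)
    return good


sample = u'pages 100\u2013200'
candidates = ['ascii', 'latin_1', 'cp1252', 'utf_8', 'utf_16']
survivors = round_trip_encodings(sample, candidates)
print(survivors)
```

Here `ascii` and `latin_1` drop out because neither has a byte for the en dash, while `cp1252` and the UTF encodings keep it. If an encoding survives this check but fails in the database version, the table or connection character set is the more likely culprit.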
 