'str' object is not callable

DSaba10 · Jun 23, 2020

Please pardon my newbie-ness to this thread. I've had to figure out on my own how to create a test environment for a Spark server my company is using to handle data. I've jumped through various forums and have had to install Python itself using scoop via PowerShell, set up PyCharm and Spark itself, install and configure Hadoop, and probably a few other things. It's been a whirlwind of a few days.

So I've successfully done all the things listed above and am down to just testing out some scripting work. Here's the code:

Python:

import pyodbc
import pandas as pd
import numpy as np
import pyspark

from pyspark import SparkContext, SparkConf, SQLContext
from pyspark.sql.functions import *

appName = "PySpark SQL Server Example - via ODBC"
master = "local"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
spark = sqlContext.sparkSession


###########################################################################################################
# File Location and pulls
#  Update the file location where noted below
#  Update the query to connect to the proper sheet in excel and to select any necessary fields needed
#     to properly create the rule. Should help with runtime.
###########################################################################################################
conn_str = (
    r'DRIVER={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};'
    r'DBQ=C:\Users\DASaba\OneDrive - Ruffalo Noel Levitz\Desktop\DSC Load Files\filedata_model.xlsx;'  #Update File Name and Location Here
    r'ReadOnly=0'
    )
cnxn = pyodbc.connect(conn_str, autocommit=True)

sql = pd.read_sql_query('''select [_ID], [S_ID], [H_ID] from [EMSearchVendor$]''', cnxn)

for col in sql.columns:
    if ((sql[col].dtypes != np.int64) &
            (sql[col].dtypes != np.float64)):
        sql[col] = sql[col].fillna('')

inboundDF = spark.createDataFrame(sql)

#inboundDF.printSchema()


newDF = inboundDF.withColumn("H_ID", col("H_ID"))

newDF.show()

Please pardon any of the initial stupidity of the first part of the code. I'm sure there's a better way to handle the read-in of the data set and putting it into a spark dataframe, but this works(?) well enough for now.

The problem is with this line of code:

Python:

newDF = inboundDF.withColumn("H_ID", col("H_ID"))

I'm getting "TypeError: 'str' object is not callable" when I run this.

Right now all this should do is... basically nothing. It's shoving the H_ID column back into itself. I had something mildly more complex in there, but it was giving the same error. I figured something I did was causing this to bomb, but now that it's stripped down to nothing and I'm still getting the error, I'm at a loss.

I'm sure it's something dumb. I've checked various other forums to figure out if I'm doing something syntactically incorrect, but have found nothing definitive. Appreciate your patience and assistance!

DSaba10 · Jun 23, 2020

BAH!

It was my code to fix the column typings!

Python:

for col1 in sql.columns:
    if ((sql[col1].dtypes != np.int64) &
            (sql[col1].dtypes != np.float64)):
        sql[col1] = sql[col1].fillna('')

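I pulled this code from elsewhere and didn't rename the "col" loop variable. The loop rebinds the name col to each column name in turn, so once it finishes, col is a string (the last column name) rather than the col function that "from pyspark.sql.functions import *" brought in. Calling col("H_ID") then tries to call a string, hence the TypeError. Changed it to col1 and now everything is hunky dory.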
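
For anyone who lands here with the same error, here is a minimal sketch of what was going on. It does not need Spark at all: the col function below is a hypothetical stand-in for the one in pyspark.sql.functions (the real one returns a Column object, not a string), and the column list is just the three names from my query.

```python
# Hypothetical stand-in for pyspark.sql.functions.col, used only so this
# sketch runs without Spark installed. The real function returns a Column.
def col(name):
    return f"Column<{name}>"

# Column names taken from the query in the original post.
columns = ["_ID", "S_ID", "H_ID"]

# A for loop does not create its own scope. Each iteration rebinds the
# module-level name `col`, so after the loop it refers to the last string.
for col in columns:
    pass

# Calling `col` now means calling a str, which raises the TypeError.
try:
    col("H_ID")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(col).__name__)  # str
print(error_message)       # 'str' object is not callable
```

Renaming the loop variable is what fixed it for me. Another common way to avoid this kind of clash is to skip the star import and use "import pyspark.sql.functions as F", then call F.col("H_ID"), so that a local variable named col can't shadow the function.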
