Sample Python script using AWS Glue to load a CSV file from Amazon S3 to Amazon Redshift
By Guest on 5th April 2024 10:41:38 PM | Syntax: PYTHON | Views: 87



import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Initialize Spark and Glue contexts
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

# Job parameters (passed to the Glue job as --s3_input_path, --redshift_jdbc_url, ...)
args = getResolvedOptions(sys.argv, ['JOB_NAME', 's3_input_path', 'redshift_jdbc_url', 'redshift_table'])

# Read the CSV from S3; the first row is the header and column types are inferred
data_frame = spark.read.csv(args['s3_input_path'], header=True, inferSchema=True)

# Perform data cleaning
# Example: remove duplicate rows and fill missing values with 0.
# Note: fillna(0) only fills numeric columns; nulls in string columns are left unchanged.
cleaned_data_frame = data_frame.dropDuplicates().fillna(0)

# Write cleaned data to Redshift over JDBC, appending to the target table.
# The user and password below are placeholders; replace them before running.
cleaned_data_frame.write \
    .format("jdbc") \
    .option("url", args['redshift_jdbc_url']) \
    .option("dbtable", args['redshift_table']) \
    .option("user", "your_redshift_user") \
    .option("password", "your_redshift_password") \
    .option("driver", "com.amazon.redshift.jdbc42.Driver") \
    .mode("append") \
    .save()

# Stop Spark context
sc.stop()
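
The write step passes the Redshift user and password as literal placeholders. One possible way to keep them out of the script is to pass them as job parameters and collect the JDBC options in one place. The sketch below is only an illustration: `build_jdbc_options`, `redshift_user`, and `redshift_password` are hypothetical names that are not part of the original script, and the two extra parameters would also need to be added to the `getResolvedOptions` list.

```python
def build_jdbc_options(args):
    """Collect Spark JDBC write options from a dict of resolved job arguments.

    'redshift_user' and 'redshift_password' are hypothetical job parameters
    introduced for this sketch; the original script hard-codes placeholders.
    """
    required = ["redshift_jdbc_url", "redshift_table",
                "redshift_user", "redshift_password"]
    # Fail early with a clear message if any parameter is absent or empty
    missing = [name for name in required if not args.get(name)]
    if missing:
        raise ValueError("missing job parameters: " + ", ".join(missing))

    url = args["redshift_jdbc_url"]
    # Redshift JDBC URLs begin with the 'jdbc:redshift:' scheme
    if not url.startswith("jdbc:redshift:"):
        raise ValueError("expected a jdbc:redshift: URL, got: " + url)

    return {
        "url": url,
        "dbtable": args["redshift_table"],
        "user": args["redshift_user"],
        "password": args["redshift_password"],
        "driver": "com.amazon.redshift.jdbc42.Driver",
    }


# Possible use inside the job (requires a Spark session, so shown as a comment):
# cleaned_data_frame.write.format("jdbc") \
#     .options(**build_jdbc_options(args)) \
#     .mode("append") \
#     .save()
```

Job parameters can still be visible in the job configuration, so for anything beyond a sample it is probably better to read the credentials from a secrets store or a Glue connection.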











