Pipedream Workflow in Python - CSV File Operations

Streamline CSV data handling with no-code automation using Pipedream’s intuitive workflow builder.

Ceyhun Enki Aksan
Ceyhun Enki Aksan Entrepreneur, Maker

We’ll continue from where we left off with Tally and Pipedream, exploring example workflows for solutions that require minimal or no coding, and can be used quickly with minimal maintenance requirements.

Previously, I published an article titled What is Pipedream? How to Use It? detailing a serverless runtime1 and workflow service called Pipedream. In the article, I also discussed a practical example using XML-JSON Operations on Pipedream. The service recently launched a new version via [Product Hunt]2, introducing new integrations alongside the existing ones, and now enabling the use of not only Node.js but also Python3, Bash, and Go within workflow processes. However, I haven’t had the opportunity to review it recently4.

In this article, I’ll also discuss another service, Tally,—which I detailed in depth in the article titled What is Tally.so?, an alternative form builder that functions like a documentation tool.

CSV File Operations with Python via Pipedream

In the example I’ll cover, we’ll perform basic checks and logical operations on two different CSV files, then merge them together and generate a final result sentence, which will be sent via email. Additionally, we’ll export the merged DataFrame-formatted tables into a Google Sheets document.

Pipedream Workflow: Tally, Python, Email, Google Sheets
Pipedream Workflow: Tally, Python, Email, Google Sheets

The relevant files will be uploaded via the Tally form application based on visitor preferences and will be summarized into a concise output. I will not delve into excessive detail in the code and will continue to improve and refine the relevant code over time.

note

You can view the code by using the GitHub link provided in the final paragraph of the article.

Tally Form Operations

Tally can be easily integrated with Pipedream. The data received via form can also be processed using Node.js and Python5. In this example context, API integration will be performed.

Pipedream Workflow: Tally Form
Pipedream Workflow: Tally Form

Depending on the user’s response to the question What action would you like to take? when viewing the form, the file sections will be displayed and the relevant files will be uploaded.

note

You can view the sample form at nrjgE5 using its ID.

The fields present in the form and the values they hold can be accessed via steps.trigger.event.fields. The output of the form shown in the above example visual (exports) will look as follows.

[
    {
        "key": "question_aBCd1E",
        "label": null,
        "type": "INPUT_EMAIL",
        "value": "user@domain.com"
    },
    {
        "key": "question_mJJkrR",
        "label": null,
        "type": "CHECKBOXES",
        "value": [
            "1fc00d88"
        ],
        "options": [
            {
                "id": "ee551db8",
                "text": "Source control"
            },
            {
                "id": "1fc00d88",
                "text": "Order ID comparison"
            }
        ]
    },
    {
        "key": "question_mJJkrR_ee551db8",
        "label": "Checkbox (Source control)",
        "type": "CHECKBOXES",
        "value": false
    },
    {
        "key": "question_mJJkrR_1fc00d88",
        "label": "Checkbox (Order ID comparison)",
        "type": "CHECKBOXES",
        "value": true
    },
    {
        "key": "question_wMYkzg",
        "label": null,
        "type": "FILE_UPLOAD",
        "value": [
            {
                "id": "n0PKAZ",
                "name": "file-1.csv",
                "url": "https://storage.googleapis.com/tally-response-assets/.../.../file-1.csv",
                "mimeType": "text/csv",
                "size": 467
            }
        ]
    },
    {
        "key": "question_mJDaAY",
        "label": null,
        "type": "FILE_UPLOAD",
        "value": [
            {
                "id": "w52JOM",
                "name": "file-2.csv",
                "url": "https://storage.googleapis.com/tally-response-assets/.../.../file-2.csv",
                "mimeType": "text/csv",
                "size": 42174
            }
        ]
    }
]

In the next step, we can access the file URLs via steps.trigger.event.fields[4].value[0].url and steps.trigger.event.fields[5].value[0].url, the file names via steps.trigger.event.fields[4].value[0].name and steps.trigger.event.fields[5].value[0].name, and the file types (mimeType) via steps.trigger.event.fields[4].value[0].mimeType and steps.trigger.event.fields[5].value[0].mimeType, and then use these values within Python.

CSV File Operations with Python

Excel files can also be used instead of CSV files. However, given the context of my example, the files in question are Shopify Orders and GA4 exploration report outputs, so I chose CSV format as the common format. In future, I will attempt to handle different file types in various example scenarios.

When performing operations on CSV files using Python, the read_csv() function from the Pandas library can be used to process the file based on its URL. However, since Google Analytics reports include summary data in addition to the header row, such data may appear within the CSV file, leading to potential errors during file reading and column misalignment. For this reason, in this example context, I will perform operations such as uploading the relevant files to the /tmp directory, verifying their contents, and removing irrelevant records if necessary.

note

Of course, using Pipedream’s online Python code is not the only option. Alternative solutions such as PythonAnywhere could also be considered for other API integrations and operations6.

I should also mention that in the Python operation step added to a Pipedream workflow, the predefined handler function is used by default, and the pd variable is passed as a parameter for data flow between steps.

def handler(pd: "pipedream"):

If you’re using the Pandas library, you can modify this variable, or you can specify a module alias (alias) for Pandas, or alternatively, you can define a different one.

def handler(pd: "pipedream"):

Assume the following metrics and dimensions are selected in the Google Analytics 4 Discovery Report.

# --------------------------------------- #
# GA4
# Free form 1
# 20220815-20220818
# --------------------------------------- #

Date,Transaction ID,Transactions,Ecommerce revenue,Event count
,,7,33975.92,7,Grand total
20220816,#51885,1,13690,1

Shopify orders are being passed through the Orders endpoint in a single format.

Name,Email,Financial Status,Paid at,Fulfillment Status,Fulfilled at,Accepts Marketing,Currency,Subtotal,Shipping,Taxes,Total,Discount Code,Discount Amount,Shipping Method,Created at,Lineitem quantity,Lineitem name,Lineitem price,Lineitem compare at price,Lineitem sku,Lineitem requires shipping,Lineitem taxable,Lineitem fulfillment status,Billing Name,Billing Street,Billing Address1,Billing Address2,Billing Company,Billing City,Billing Zip,Billing Province,Billing Country,Billing Phone,Shipping Name,Shipping Street,Shipping Address1,Shipping Address2,Shipping Company,Shipping City,Shipping Zip,Shipping Province,Shipping Country,Shipping Phone,Notes,Note Attributes,Cancelled at,Payment Method,Payment Reference,Refunded Amount,Vendor,Id,Tags,Risk Level,Source,Lineitem discount,Tax 1 Name,Tax 1 Value,Tax 2 Name,Tax 2 Value,Tax 3 Name,Tax 3 Value,Tax 4 Name,Tax 4 Value,Tax 5 Name,Tax 5 Value,Phone,Receipt Number,Duties,Billing Province Name,Shipping Province Name,Payment ID,Payment Terms Name,Next Payment Due At,Payment References
#51885,,paid,2022-08-01 09:53:46 -0400,unfulfilled,,no,USD,467.99,0,0,467.99,,0,Standard,2022-07-27 15:02:48 -0400,1,,467.99,,,TRUE,TRUE,pending,,,,,,,,,,,,,,,,,,,,,,,,PayPal Express Checkout,,0,,,,Low,web,0,,,,,,,,,,,,,,,,,,,

The 0th, 6th, and 7th rows from the GA4 report need to be cleaned. For the Shopify table, the columns Name, Currency, Total, Payment Method, and Source are sufficient for processing operations.

We can process the relevant columns to be appended to the specified file using the Sheets API in the next step.

Python - Sheets API
Python - Sheets API

Data can be appended starting from the second row, continuously updating the cells (cells), and preserving the formatting and properties of other table fields (filters, formulas, etc.).

note

ceaksan/ShopifyAccuracyRate for access to the relevant code.

You can access the Jupyter notebook and alternative Sheets API examples via the GitHub link above. In addition, if you’d like to further enhance this example, you may refer to the Google Sheets API documentation7, watch videos shared by Pipedream8, and review other developer blogs9.

*[CSV]: Comma-separated values

Footnotes

  1. The duration of time between when a computer program starts executing and when it terminates. See Runtime (program lifecycle phase). Wikipedia
  2. Pipedream v2 (Product Hunt)
  3. Python. Pipedream Quickstart, Python API Integrations. Pipedream Apps
  4. We announced @pipedream 2.0 on #ProductHunt today! Pravin Savkar (Twitter)
  5. Tally API Integrations. Pipedream App
  6. PythonAnywhere. Host, run, and code Python in the cloud!
  7. Method: spreadsheets.values.batchUpdate. Sheets API, Google Sheets API Overview. Sheets API, Requests. Sheets API
  8. Pipedream. Sentiment Analysis on /r/nfl Reddit posts (Youtube)
  9. Raymond Camden. 2022. Testing out the new Pipedream to Get Trance Releases