By Aaron LeBlanc, Founder & CEO, Hypelocal
➡️ https://zapier.com/experts/hypelocal
[https://docs.google.com/presentation/d/1Je5Wg75GvSvFc22X51YhPhK33_3jOFQRphwTZsDCi4Q/edit?usp=sharing](https://docs.google.com/presentation/d/1Je5Wg75GvSvFc22X51YhPhK33_3jOFQRphwTZsDCi4Q/preview?usp=sharing)
Process:

Code Step to Scrape Raw Website Content from URL
import requests
import re
import logging
# Configure logging
logging.basicConfig(level=logging.INFO)
def extract_text_from_html(html_content):
    """
    Extracts and cleans text from HTML content.
    """
    try:
        # Remove script and style elements
        html_content = re.sub(r'<(script|style).*?>.*?</\\1>', '', html_content, flags=re.DOTALL)
        # Remove HTML tags
        text = re.sub(r'<.*?>', ' ', html_content)
        # Collapse whitespace
        text = re.sub(r'\\s+', ' ', text)
        return text.strip()
    except Exception as e:
        logging.error(f"Failed to extract text from HTML: {e}")
        return ""
def scrape_website_content(website_url):
    """
    Scrapes raw content from the given website URL.
    """
    try:
        # Ensure the URL is prefixed with http:// or https://
        if not website_url.startswith(('http://', 'https://')):
            website_url = 'http://' + website_url
        response = requests.get(website_url)
        response.raise_for_status()
        raw_text = extract_text_from_html(response.content.decode('utf-8'))
        return raw_text
    except requests.RequestException as e:
        logging.error(f"Failed to fetch URL {website_url}: {e}")
        return ""
# Get the website URL from input data
website_url = input_data.get('websiteUrl')
if website_url:
    raw_content = scrape_website_content(website_url)
    if raw_content:
        output = {'raw_content': raw_content}
    else:
        output = {'error': 'Failed to extract content from the website.'}
else:
    output = {'error': 'No website URL provided.'}
return output
ChatGPT Prompt Template for Structuring Raw Unstructured Website Content
Review the raw content here:
----
Website url: {{249095056__fields__websiteUrl}}. # From Zapier Chrome Push Trigger Step
Content: {{249095057__raw_content}}. # From Zapier Chrome Push Trigger Step
----
Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the Company website URL.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': The business industry (Name only, no code) using NAICS standards.
---
- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will be used to go into a CRM via automation. The data needs to needs to be accurate, clean, and contextual.
++++
Output response in JSON code format with no leading characters.  Your reply will be used as a JSON payload. Don't include (```json```).
++++
Model = gpt-4o
Memory Key = blank
Image = blank
User Name = Expert Sales Person
Assistant Name = Expert Sales Person
Assistant Instructions =
You are a helpful Sales assistant that specializes in parsing unstructured content into a structured format.
Max Tokens = 1024
Temperature = 0.5
Top P = 1
Parse JSON Payload Code Step
payLoad = ChatGPT Responsevar obj = JSON.parse(inputData.payLoad);
return obj;
Setup Action steps as needed from there.
Process:

ChatGPT Prompt Template for Structuring Unstructured Email Content
Review the raw content here:
----
Subject: {{249105134__raw__Subject}}. # From Zapier Email Trigger Step
Body plain: {{249105134__body_plain}}. # From Zapier Email Trigger Step
----
Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the URL in the prospects email.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': Infer the business industry using NAICS standards.
- 'contactFirstName': Extracted from the contact's name.
- 'contactLastName': Extracted from the contact's name.
- 'contactCellPhone': Extracted from the contact's cell phone number.
- 'contactEmail': Extracted from the contact's email address.
- 'contactTitle': Extracted from the contact's title or position.
- 'budget': Extracted budget information related to the business or project.
- 'prospectInterest': Extracted interest level or area of interest of the prospect.
- 'leadDescription': A brief description of the lead, summarizing the potential opportunity or interest.
- 'whenToContact': Extracted preferred time or date to contact the lead.
---
- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will be used to go into a CRM via automation. The data needs to needs to be accurate, clean, and contextual.
++++
Output response in JSON code format with no leading characters.  Your reply will be used as a JSON payload. Don't include (```json```).
++++
Model = gpt-4
Memory Key = blank
Image = blank
User Name = Company Admin
Assistant Name = Company Admin Assistant and Email Parser
Assistant Instructions =
You are a helpful Company assistant that specializes in parsing unstructured content into a structured format.
Max Tokens = 1024
Temperature = 0.5
Top P = 1
Parse JSON Payload Code Step
payLoad = ChatGPT Responsevar obj = JSON.parse(inputData.payLoad);
return obj;
Setup Action steps as needed from there.