By Aaron LeBlanc, Founder & CEO, Hypelocal
➡️ https://zapier.com/experts/hypelocal
[https://docs.google.com/presentation/d/1Je5Wg75GvSvFc22X51YhPhK33_3jOFQRphwTZsDCi4Q/edit?usp=sharing](https://docs.google.com/presentation/d/1Je5Wg75GvSvFc22X51YhPhK33_3jOFQRphwTZsDCi4Q/preview?usp=sharing)
Process:
Code Step to Scrape Raw Website Content from URL
import requests
import re
import logging

# Configure logging
logging.basicConfig(level=logging.INFO)


def extract_text_from_html(html_content):
    """
    Extracts and cleans text from HTML content.
    """
    try:
        # Remove script and style elements, including their contents
        html_content = re.sub(r'<(script|style).*?>.*?</\1>', '', html_content, flags=re.DOTALL)
        # Replace remaining HTML tags with a space
        text = re.sub(r'<.*?>', ' ', html_content)
        # Collapse runs of whitespace into a single space
        text = re.sub(r'\s+', ' ', text)
        return text.strip()
    except Exception as e:
        logging.error(f"Failed to extract text from HTML: {e}")
        return ""


def scrape_website_content(website_url):
    """
    Scrapes raw content from the given website URL.
    """
    try:
        # Ensure the URL is prefixed with http:// or https://
        if not website_url.startswith(('http://', 'https://')):
            website_url = 'http://' + website_url
        # A timeout keeps the step from hanging on an unresponsive site
        response = requests.get(website_url, timeout=10)
        response.raise_for_status()
        # Replace undecodable bytes instead of raising on non-UTF-8 pages
        raw_text = extract_text_from_html(response.content.decode('utf-8', errors='replace'))
        return raw_text
    except requests.RequestException as e:
        logging.error(f"Failed to fetch URL {website_url}: {e}")
        return ""


# Get the website URL from input data
website_url = input_data.get('websiteUrl')
if website_url:
    raw_content = scrape_website_content(website_url)
    if raw_content:
        output = {'raw_content': raw_content}
    else:
        output = {'error': 'Failed to extract content from the website.'}
else:
    output = {'error': 'No website URL provided.'}
# The Zapier Code step allows returning the result at the top level
return output
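To see what the scraping step hands to ChatGPT, the regex-based cleaning can be tried locally before it goes into a Zap. The following is a minimal sketch, assuming a made-up HTML snippet (the business name and phone number are hypothetical); it repeats only the three substitutions and makes no network request.

```python
import re


def extract_text_from_html(html_content):
    """Strip script/style blocks and tags, then collapse whitespace."""
    # Remove <script>...</script> and <style>...</style>, contents included
    html_content = re.sub(r'<(script|style).*?>.*?</\1>', '', html_content, flags=re.DOTALL)
    # Replace every remaining tag with a space so adjacent words stay separated
    text = re.sub(r'<.*?>', ' ', html_content)
    # Collapse runs of whitespace into single spaces
    text = re.sub(r'\s+', ' ', text)
    return text.strip()


# Hypothetical sample page, used only for illustration
sample_html = """<html><head><style>p { color: red; }</style>
<script>console.log('hi');</script></head>
<body><h1>Acme Plumbing</h1><p>Call 555-0100</p></body></html>"""

cleaned = extract_text_from_html(sample_html)
print(cleaned)  # Acme Plumbing Call 555-0100
```

Regex stripping is a rough approach: it keeps navigation and footer text along with the main content, so the prompt that follows has to do the real work of picking out the business details.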
ChatGPT Prompt Template for Structuring Raw Unstructured Website Content
Review the raw content here:
----
Website url: {{249095056__fields__websiteUrl}}. # From Zapier Chrome Push Trigger Step
Content: {{249095057__raw_content}}. # From Zapier Code Step
----
Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the Company website URL.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': The business industry (Name only, no code) using NAICS standards.
---
- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will go into a CRM via automation. The data needs to be accurate, clean, and contextual.
++++
Output response in JSON code format with no leading characters. Your reply will be used as a JSON payload. Don't include (```json```).
++++
Model = gpt-4o
Memory Key = blank
Image = blank
User Name = Expert Sales Person
Assistant Name = Expert Sales Person
Assistant Instructions =
You are a helpful Sales assistant that specializes in parsing unstructured content into a structured format.
Max Tokens = 1024
Temperature = 0.5
Top P = 1
Parse JSON Payload Code Step
payLoad = ChatGPT Response
// Parse the ChatGPT reply (mapped to the payLoad input field) into an object
var obj = JSON.parse(inputData.payLoad);
return obj;
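Even when the prompt asks for bare JSON, a model may occasionally wrap its reply in a Markdown code fence or return something that is not JSON at all, and a strict parse then fails the Zap. The following is a minimal sketch of a more tolerant parse, written in Python to match the scraping step; the function name `parse_model_reply` and the sample reply are hypothetical and not part of the original workflow.

```python
import json
import re


def parse_model_reply(reply_text):
    """Parse a model reply as a JSON object, tolerating an optional code fence."""
    cleaned = reply_text.strip()
    # Drop a leading fence (three backticks, optionally followed by "json")
    cleaned = re.sub(r'^`{3}(?:json)?\s*', '', cleaned)
    # Drop a trailing fence of three backticks
    cleaned = re.sub(r'\s*`{3}$', '', cleaned)
    try:
        parsed = json.loads(cleaned)
    except json.JSONDecodeError as exc:
        return {'error': f'Reply was not valid JSON: {exc}'}
    if not isinstance(parsed, dict):
        return {'error': 'Reply was valid JSON but not an object.'}
    return parsed


# Hypothetical reply that ignores the "no code fence" instruction
FENCE = '`' * 3
fenced_reply = FENCE + 'json\n{"businessName": "Acme Plumbing", "businessCity": "Austin"}\n' + FENCE

result = parse_model_reply(fenced_reply)
print(result)
```

Returning an `error` key instead of raising mirrors how the scraping step reports failures, so a later Filter or Paths step could branch on it.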
Set up action steps as needed from there.
Process:
ChatGPT Prompt Template for Structuring Unstructured Email Content
Review the raw content here:
----
Subject: {{249105134__raw__Subject}}. # From Zapier Email Trigger Step
Body plain: {{249105134__body_plain}}. # From Zapier Email Trigger Step
----
Extract and structure the data based on the following schema:
---
- 'businessName': Extracted from the business name.
- 'businessStreet': Extracted from business address.
- 'businessCity': Extracted from business address.
- 'businessState': Extracted from business address (state or province).
- 'businessCountry': Extracted from business address.
- 'businessPostalCode': Extracted from business address.
- 'businessPhone': Extracted from the business phone number.
- 'businessEmail': Extracted from the business email address.
- 'websiteUrl': Extracted from the URL in the prospect's email.
- 'businessDescription': A detailed and specific description of the business and what they do. Three sentences or less.
- 'businessIndustryName': Infer the business industry using NAICS standards.
- 'contactFirstName': Extracted from the contact's name.
- 'contactLastName': Extracted from the contact's name.
- 'contactCellPhone': Extracted from the contact's cell phone number.
- 'contactEmail': Extracted from the contact's email address.
- 'contactTitle': Extracted from the contact's title or position.
- 'budget': Extracted budget information related to the business or project.
- 'prospectInterest': Extracted interest level or area of interest of the prospect.
- 'leadDescription': A brief description of the lead, summarizing the potential opportunity or interest.
- 'whenToContact': Extracted preferred time or date to contact the lead.
---
- Once the data is gathered, check your response for accuracy against the provided instructions.
- Unless otherwise told in the schema, all fields are string fields.
- This data will go into a CRM via automation. The data needs to be accurate, clean, and contextual.
++++
Output response in JSON code format with no leading characters. Your reply will be used as a JSON payload. Don't include (```json```).
++++
Model = gpt-4
Memory Key = blank
Image = blank
User Name = Company Admin
Assistant Name = Company Admin Assistant and Email Parser
Assistant Instructions =
You are a helpful Company assistant that specializes in parsing unstructured content into a structured format.
Max Tokens = 1024
Temperature = 0.5
Top P = 1
Parse JSON Payload Code Step
payLoad = ChatGPT Response
// Parse the ChatGPT reply (mapped to the payLoad input field) into an object
var obj = JSON.parse(inputData.payLoad);
return obj;
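Because the parsed object feeds CRM fields directly, it can help to check it against the email schema before mapping it into action steps. The following is a minimal sketch, assuming the reply has already been parsed into a dictionary; the helper name `normalize_lead` and the sample values are hypothetical. It fills missing fields with empty strings, coerces every value to a string as the prompt requires, and reports any keys the schema does not define.

```python
# Field names copied from the email-parsing schema
EXPECTED_FIELDS = [
    'businessName', 'businessStreet', 'businessCity', 'businessState',
    'businessCountry', 'businessPostalCode', 'businessPhone', 'businessEmail',
    'websiteUrl', 'businessDescription', 'businessIndustryName',
    'contactFirstName', 'contactLastName', 'contactCellPhone', 'contactEmail',
    'contactTitle', 'budget', 'prospectInterest', 'leadDescription',
    'whenToContact',
]


def normalize_lead(parsed):
    """Return (record, unexpected_keys) for a parsed model reply."""
    record = {}
    for field in EXPECTED_FIELDS:
        value = parsed.get(field)
        # Missing or null values become empty strings; everything else is a trimmed string
        record[field] = '' if value is None else str(value).strip()
    # Keys the model added that the schema does not define
    unexpected = sorted(set(parsed) - set(EXPECTED_FIELDS))
    return record, unexpected


# Hypothetical parsed reply with a numeric budget, a null field, and an extra key
sample = {
    'businessName': ' Acme Plumbing ',
    'budget': 5000,
    'contactTitle': None,
    'notes': 'not in schema',
}

record, unexpected = normalize_lead(sample)
print(record['businessName'], record['budget'], unexpected)
```

With every schema key always present, downstream action steps can map fields without failing on a key the model chose to omit.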
Set up action steps as needed from there.