Getting Started with PyAirbyte (Beta)
PyAirbyte is a library that provides a set of utilities to use Airbyte connectors in Python. It is meant to be used in situations where setting up an Airbyte server or cloud account is not possible or desirable, for example in a Jupyter notebook or when iterating on early prototypes on a developer's workstation.
You can also check out this YouTube video on how to get started with PyAirbyte!
Installation
pip install airbyte
Or, during the beta, you may want to install the latest version from source:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git'
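To confirm that the installation worked, a check along the following lines may help. This is a minimal sketch using only the standard library. The distribution name `airbyte` is taken from the pip command above, the Python >=3.9 floor is the one stated in the Architecture section, and the helper names `installed_version` and `python_is_supported` are hypothetical, not part of PyAirbyte.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check the interpreter against the documented Python >=3.9 requirement."""
    return tuple(version_info[:2]) >= (3, 9)


# Report the state of the current environment.
print("Python supported:", python_is_supported())
print("airbyte version:", installed_version("airbyte") or "not installed")
```

If the second line reports "not installed", the pip command most likely ran against a different interpreter or virtual environment than the one executing the script.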
Usage
Data can be extracted from sources and loaded into caches:
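import airbyte as ab

# Create the source; the connector is installed on first use if it is missing.
source = ab.get_source(
    "source-faker",
    config={"count": 5_000},
    install_if_missing=True,
)
# Verify that the configuration is valid and the connector can run.
source.check()
# Select every stream, then read into the default cache.
source.select_all_streams()
result = source.read()

# Print the number of records in each stream.
for name, records in result.streams.items():
    print(f"Stream {name}: {len(list(records))} records")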
Quickstarts
API Reference
For details on specific classes and methods, please refer to our PyAirbyte API Reference.
Architecture
PyAirbyte is a Python library that runs in any environment supporting Python >=3.9. It contains the following main components:
- Source: A source object wraps a Python connector together with a configuration object. The configuration object is a dictionary holding the connector's settings, such as authentication or connection details. The source object is used to read data from the connector.
- Cache: Data can be read directly from the source object, but using a cache object to store the data is recommended. The cache object temporarily stores records from the source in a SQL database, such as a local DuckDB file or a Postgres or Snowflake instance.
- Result: An object holding the records from a read operation on a source. It gives quick access to the records of each synced stream through the cache that was used. Data can be accessed as a list of records, a Pandas DataFrame, or via SQLAlchemy queries.
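The way the three components fit together can be sketched as follows. This is an illustrative sketch, not a definitive implementation: `read_faker` and `preview` are hypothetical helper names, and the calls `ab.get_default_cache()` and `source.read(cache=...)` reflect the PyAirbyte API as I understand it during the beta, so check them against the API Reference. The `airbyte` import is deferred so that `preview` works on any iterable of records even without PyAirbyte installed.

```python
from itertools import islice
from typing import Any, Iterable, Mapping


def read_faker(count: int = 100):
    """Run the Source -> Cache -> Result flow (requires `pip install airbyte`)."""
    import airbyte as ab  # deferred import: only needed when this function runs

    # Source: a connector plus its configuration dictionary.
    source = ab.get_source(
        "source-faker",
        config={"count": count},
        install_if_missing=True,
    )
    source.check()
    source.select_all_streams()

    # Cache: assumed here to be the default local DuckDB-backed cache.
    cache = ab.get_default_cache()

    # Result: holds the records of every synced stream.
    return source.read(cache=cache)


def preview(records: Iterable[Mapping[str, Any]], limit: int = 3) -> list:
    """Return the first `limit` records as plain dictionaries."""
    return [dict(record) for record in islice(records, limit)]
```

With PyAirbyte installed, `result = read_faker()` followed by `preview(result["users"])` should list the first few records of the `users` stream, and `result["users"].to_pandas()` is expected to return the same stream as a Pandas DataFrame.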
Available connectors
The following connectors are available:
- PGVector
- Adjust
- Airtable
- Alpha Vantage
- Amazon Ads
- Amazon Seller Partner
- Amazon SQS
- Amplitude
- Apify Dataset
- AppsFlyer
- Asana
- Avni
- Aws Cloudtrail
- Azure Blob Storage
- Azure Table Storage
- BambooHR
- Bing Ads
- Braintree
- Braze
- Cart.com
- Chargebee
- Close.com
- Commcare
- Commercetools
- Convex
- Facebook Marketing
- Facebook Pages
- Sample Data (Faker)
- Fastbill
- Fauna
- File (CSV, JSON, Excel, Feather, Parquet)
- Firebase Realtime Database
- Firebolt
- Freshcaller
- Freshdesk
- Google Cloud Storage (GCS)
- Genesys
- GitHub
- Gitlab
- GNews
- Google Ads
- Google Analytics 4 (GA4)
- Google Directory
- Google Drive
- Google Search Console
- Google Sheets
- Greenhouse
- Gridly
- Hardcoded Records
- Harvest
- HubSpot
- Instatus
- Intercom
- Iterable
- Jina AI Reader
- Jira
- Klaviyo
- Kyriba
- KYVE
- LinkedIn Ads
- Linnworks
- Looker
- Mailchimp
- Marketo
- Microsoft Dataverse
- Microsoft OneDrive
- Microsoft SharePoint
- Mixpanel
- Monday
- My Hours
- Netsuite
- Notion
- Okta
- Orb
- Orbit
- Outbrain Amplify
- Outreach
- Pardot
- PartnerStack
- Paypal Transaction
- Pipedrive
- PostHog
- PrestaShop
- Public Apis
- Qualaroo
- QuickBooks
- Railz
- Recharge
- Recurly
- Retently
- RKI Covid
- Rss
- S3
- Salesforce
- Salesloft
- Sendgrid
- Sentry
- SFTP Bulk
- Shopify
- Slack
- Smartsheets
- Snapchat Marketing
- Stripe
- SurveyCTO
- SurveyMonkey
- TikTok Marketing
- TPLcentral
- Trello
- Twilio
- Typeform
- US Census
- Webflow
- WooCommerce
- Xero
- Xkcd
- Yahoo Finance Price
- Yandex Metrica
- Younium
- YouTube Analytics
- Zendesk Chat
- Zendesk Sunshine
- Zendesk Support
- Zendesk Talk
- Zenloop
- ZohoCRM
- Zoom