1 change: 1 addition & 0 deletions README.md
@@ -88,6 +88,7 @@ intuned dev run api <api-name> .parameters/api/<api-name>/default.json
|---------|-------------|
| [starter](./python-examples/starter/) | Starter template for new projects |
| [starter-auth](./python-examples/starter-auth/) | Starter template with Auth Sessions enabled |
| [starter-crawl4ai](./python-examples/starter-crawl4ai/) | Minimal Crawl4AI single-URL crawling starter |
| [starter-network-interception](./python-examples/starter-network-interception/) | Minimal network interception starter |
| [starter-rpa](./python-examples/starter-rpa/) | Minimal RPA starter for browser automation |
| [starter-scrapy](./python-examples/starter-scrapy/) | Minimal Scrapy starter |
1 change: 1 addition & 0 deletions python-examples/README.md
@@ -26,6 +26,7 @@ intuned dev run api <api-name> .parameters/api/<api-name>/default.json
| -------- | ------------- |
| [starter](./starter/) | Starter template for new projects |
| [starter-auth](./starter-auth/) | Starter template with Auth Sessions enabled |
| [starter-crawl4ai](./starter-crawl4ai/) | Minimal Crawl4AI single-URL crawling starter |
| [starter-network-interception](./starter-network-interception/) | Minimal network interception starter |
| [starter-rpa](./starter-rpa/) | Minimal RPA starter for browser automation |
| [starter-scrapy](./starter-scrapy/) | Minimal Scrapy starter |
1 change: 1 addition & 0 deletions python-examples/starter-crawl4ai/.env.example
@@ -0,0 +1 @@
INTUNED_API_KEY=your_api_key_here
53 changes: 53 additions & 0 deletions python-examples/starter-crawl4ai/.gitignore
@@ -0,0 +1,53 @@
# Dependencies
node_modules/
.pnp
.pnp.js

# Production builds
/build
/dist
/.next/
/out/

# Environment variables
.env
.env.local
.env.development.local
.env.test.local
.env.production.local

# Logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*

# Runtime/temporary files
.cache
.parcel-cache
*.tsbuildinfo

# Coverage
/coverage
.nyc_output

# Python
__pycache__/
*.py[cod]
*.pyc
.Python
build/
*.egg-info/
.venv/
venv/
.env

# OS files
.DS_Store
Thumbs.db

# IDE
.vscode/
.idea/
.intuned
.intuned-agent
@@ -0,0 +1,3 @@
{
  "url": "https://playwright.dev/docs/intro"
}
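This file supplies the `params` dict that `simple-crawl` receives. The automation only checks that `url` is non-empty; a stricter pre-flight check could look like the minimal sketch below. `validate_params` is a hypothetical helper (not part of the template, Intuned, or Crawl4AI), and rejecting anything other than absolute `http`/`https` URLs is an assumption about what this API should accept.

```python
from __future__ import annotations

import json
from typing import TypedDict
from urllib.parse import urlparse


class Params(TypedDict):
    url: str


def validate_params(raw: str) -> tuple[Params | None, str | None]:
    """Parse a parameters JSON string and check `url` (hypothetical helper).

    Returns (params, None) on success or (None, error_message) on failure.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "Parameters must be a JSON object"

    url = data.get("url")
    if not isinstance(url, str) or not url:
        # Same message the simple-crawl automation returns for a missing URL.
        return None, "URL parameter is required"

    # Assumption: only absolute http(s) URLs are worth handing to the crawler.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return None, f"Unsupported URL: {url!r}"

    return Params(url=url), None
```

For example, passing this file's contents returns the parsed dict with no error, while `validate_params("{}")` returns the same "URL parameter is required" message the automation uses.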
31 changes: 31 additions & 0 deletions python-examples/starter-crawl4ai/Intuned.jsonc
@@ -0,0 +1,31 @@
// For more information, see our Intuned settings reference
// https://intunedhq.com/docs/main/05-references/intuned-json
{
  "apiAccess": {
    "enabled": true
  },
  "authSessions": {
    "enabled": false
  },
  "replication": {
    "maxConcurrentRequests": 1,
    "size": "large"
  },
  "metadata": {
    "template": {
      "name": "starter-crawl4ai",
      "description": "Minimal Crawl4AI starter that crawls a single URL to clean markdown",
      "tags": [
        "starter",
        "crawling",
        "crawl4ai"
      ]
    },
    "defaultRunPlaygroundInput": {
      "apiName": "simple-crawl",
      "parameters": {
        "url": "https://playwright.dev/docs/intro"
      }
    }
  }
}
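`Intuned.jsonc` is JSON with comments, and `metadata.defaultRunPlaygroundInput` pairs an API name with its parameters. A minimal sketch for reading that pair from a script follows. It assumes the file uses only whole-line `//` comments (true of the header above), and `load_jsonc` / `default_run_input` are hypothetical helper names, not Intuned APIs.

```python
from __future__ import annotations

import json
from typing import Any


def load_jsonc(text: str) -> dict[str, Any]:
    """Parse JSONC whose only comments are whole-line `//` comments.

    Trailing `//` comments, `/* */` block comments, and trailing commas are
    NOT handled; use a real JSONC parser for anything beyond this file's style.
    """
    kept = [line for line in text.splitlines() if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def default_run_input(settings: dict[str, Any]) -> tuple[str, dict[str, Any]]:
    """Return (apiName, parameters) from metadata.defaultRunPlaygroundInput."""
    default_input = settings["metadata"]["defaultRunPlaygroundInput"]
    return default_input["apiName"], default_input.get("parameters", {})
```

Run against this file's text, the pair should be `("simple-crawl", {"url": "https://playwright.dev/docs/intro"})`, which matches the default parameters file.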
75 changes: 75 additions & 0 deletions python-examples/starter-crawl4ai/README.md
@@ -0,0 +1,75 @@
# starter-crawl4ai (Python)

Minimal [Crawl4AI](https://crawl4ai.com) starter — crawls a single URL and returns the page content as clean markdown.

For deep crawling, multi-URL crawling, content selection, and adaptive crawling, see the [Crawl4AI documentation](https://docs.crawl4ai.com/).

<!-- IDE-IGNORE-START -->
<a href="https://app.intuned.io?repo=https://github.com/Intuned/cookbook/tree/main/python-examples/starter-crawl4ai" target="_blank" rel="noreferrer"><img src="https://cdn1.intuned.io/button.svg" alt="Run on Intuned"></a>
<!-- IDE-IGNORE-END -->

## APIs

| API | Description |
| --- | ----------- |
| `simple-crawl` | Crawls a single URL and returns the page content as clean markdown |

<!-- IDE-IGNORE-START -->
## Getting Started

### Install dependencies

```bash
uv sync
```

If the `intuned` CLI is not installed, install it globally:

```bash
npm install -g @intuned/cli
```

After installing dependencies, the `intuned` command should be available in your environment.

### Run an API

```bash
intuned dev run api simple-crawl .parameters/api/simple-crawl/default.json
```

### Save project

```bash
intuned dev provision
```

### Deploy

```bash
intuned dev deploy
```
<!-- IDE-IGNORE-END -->

## Project Structure

```
starter-crawl4ai/
├── api/
│   └── simple-crawl.py              # Crawl a single URL to markdown
├── intuned-resources/
│   └── jobs/
│       └── simple-crawl.job.jsonc   # Job definition for simple-crawl API
├── .parameters/
│   └── api/
│       └── simple-crawl/
├── Intuned.jsonc
├── pyproject.toml
└── README.md
```

## Related

- [Crawl4AI Documentation](https://docs.crawl4ai.com/)
- [Intuned CLI](https://intunedhq.com/docs/main/05-references/cli/overview)
- [Intuned Browser SDK](https://intunedhq.com/docs/automation-sdks/overview)
- [Intuned llm.txt](https://intunedhq.com/docs/llms.txt)
62 changes: 62 additions & 0 deletions python-examples/starter-crawl4ai/api/simple-crawl.py
@@ -0,0 +1,62 @@
"""
Crawls a single URL and returns the page content as clean markdown.

Based on: https://docs.crawl4ai.com/core/simple-crawling/
"""

from typing import TypedDict

from playwright.async_api import BrowserContext, Page

from crawl4ai import AsyncWebCrawler
from crawl4ai.async_configs import BrowserConfig, CacheMode, CrawlerRunConfig


class Params(TypedDict):
    url: str


async def automation(
    page: Page,
    params: Params,
    context: BrowserContext | None = None,
    **_kwargs,
):
    url = params.get("url")
    if not url:
        return {
            "success": False,
            "error": "URL parameter is required",
        }

    # Crawl4AI launches and manages its own browser through AsyncWebCrawler,
    # so the Intuned-provided `page` and `context` are not used here.
    browser_config = BrowserConfig(verbose=True)
    run_config = CrawlerRunConfig(
        # Content filtering
        word_count_threshold=10,
        excluded_tags=["form", "header"],
        exclude_external_links=True,
        # Content processing
        process_iframes=True,
        remove_overlay_elements=True,
        # Cache control
        cache_mode=CacheMode.ENABLED,  # Use cache if available
    )

    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url=url,
            config=run_config,
        )

    if result.success:
        return {
            "success": True,
            "markdown": result.markdown,
            "images": result.media["images"],
            "links": result.links["internal"],
        }
    else:
        return {
            "success": False,
            "error": result.error_message,
        }
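The API returns one of two shapes: a success object with `markdown`, `images`, and internal `links`, or an error object. The sketch below isolates that mapping so it can be exercised without launching a browser. `FakeCrawlResult` and `to_response` are hypothetical names; the stand-in mimics only the `CrawlResult` attributes this file reads. Unlike the template, it reads `images` and `internal` with `.get()` so a missing key yields an empty list instead of a `KeyError`, and it assumes `result.markdown` can be turned into plain text with `str()`.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeCrawlResult:
    """Hypothetical stand-in exposing only the fields simple-crawl reads."""

    success: bool
    markdown: str = ""
    media: dict[str, list[dict[str, Any]]] = field(default_factory=dict)
    links: dict[str, list[dict[str, Any]]] = field(default_factory=dict)
    error_message: str | None = None


def to_response(result: Any) -> dict[str, Any]:
    """Map a crawl result onto the API's success/error payload."""
    if not result.success:
        return {"success": False, "error": result.error_message}
    return {
        "success": True,
        # Assumption: the markdown object converts cleanly to plain text.
        "markdown": str(result.markdown),
        # Defensive lookups: missing keys become empty lists.
        "images": (result.media or {}).get("images", []),
        "links": (result.links or {}).get("internal", []),
    }
```

Keeping this mapping in a plain function means the response contract can be unit-tested with stand-in objects, while the real `automation` keeps doing the crawling.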
@@ -0,0 +1,16 @@
{
  "configuration": {
    "maxConcurrentRequests": 2,
    "retry": {
      "maximumAttempts": 3
    }
  },
  "payload": [
    {
      "apiName": "simple-crawl",
      "parameters": {
        "url": "https://playwright.dev/docs/intro"
      }
    }
  ]
}
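This job runs a single payload item. To crawl several pages with the same settings, the job body could be generated with one item per URL. `build_job` and `dump_job` are hypothetical helpers that reuse only the keys shown in this file; consult the Intuned jobs documentation before relying on any other fields.

```python
from __future__ import annotations

import json
from typing import Any


def build_job(urls: list[str], max_concurrent: int = 2, max_attempts: int = 3) -> dict[str, Any]:
    """Build a job body shaped like simple-crawl.job.jsonc, one payload item per URL."""
    return {
        "configuration": {
            "maxConcurrentRequests": max_concurrent,
            "retry": {"maximumAttempts": max_attempts},
        },
        "payload": [
            {"apiName": "simple-crawl", "parameters": {"url": url}}
            # dict.fromkeys drops duplicate URLs while keeping their order.
            for url in dict.fromkeys(urls)
        ],
    }


def dump_job(job: dict[str, Any]) -> str:
    """Serialize a job as indented JSON (plain JSON is also valid JSONC)."""
    return json.dumps(job, indent=2) + "\n"
```

For instance, `print(dump_job(build_job([...])))` prints a body that can be saved as a `.job.jsonc` file; the defaults mirror the `maxConcurrentRequests` and `maximumAttempts` values above.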
24 changes: 24 additions & 0 deletions python-examples/starter-crawl4ai/pyproject.toml
@@ -0,0 +1,24 @@
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "default"
version = "0.0.1"
description = "Minimal Crawl4AI starter that crawls a single URL to clean markdown"
authors = [{ name = "Intuned", email = "service@intunedhq.com" }]
requires-python = ">=3.12,<3.13"
readme = "README.md"
keywords = [
    "Python",
    "intuned-browser-sdk",
]
dependencies = [
    "playwright==1.56",
    "intuned-runtime==1.3.33",
    "intuned-browser==0.1.17",
    "crawl4ai==0.8.6",
]

[tool.uv]
package = false