Telescope - A tool to observe distant (or local!) Airflow installations, and gather usage metadata
Usage: telescope [OPTIONS]
Options:
--version Show the version and exit.
--local Airflow Reporting for local Airflow
--docker Autodiscovery and Airflow reporting for local
Docker
--kubernetes Autodiscovery and Airflow reporting for
Kubernetes
-l, --label-selector TEXT Label selector for Kubernetes Autodiscovery
[default: component=scheduler]
--dag-obfuscation Obfuscate DAG IDs and filenames, keeping first
and last 3 chars; my-dag-name => my-*****ame
--dag-obfuscation-fn TEXT Obfuscate DAG IDs, defining a custom function
that takes a string and returns a string;
'lambda x: x[-5:]' would return only the last
five letters of the DAG ID and fileloc
-f, --hosts-file PATH Hosts file to pass in various types of hosts
(ssh, kubernetes, docker) - See README.md for
sample
-p, --parallelism INTEGER How many cores to use for multiprocessing
[default: (Number CPU)]
-n, --organization-name TEXT Denote who this report belongs to, e.g. a
company name
-o, --data-file PATH Data file to write intermediate gathered data,
can be '-' for stdout
-u, --presigned-url TEXT URL to write data directly to - given by an
Astronomer Representative
--help Show this message and exit.
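For instance, a minimal quick start against a local Airflow, using the options above (the organization name and output path here are placeholders):

telescope --local --organization-name "My Organization" --data-file report.json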
Presigned URL Upload
You have the option to upload the data payload via a presigned upload url. Please contact an Astronomer Representative to acquire a presigned url.
You can use this in the Telescope CLI as follows:
telescope --kubernetes --organization-name <My Organization> --presigned-url https://storage.googleapis.com/astronomer-telescope............c32f043eae2974d847541bbaa1618825a80ed80f58f0ba3
Change --kubernetes to the correct method of operation for accessing your Airflow, and the contents of --presigned-url to the actual URL supplied to you.
Note: Presigned URLs generally last for up to a day. Make sure to use yours soon after receiving it, or request another when you are able.
Configuration
Local autodiscovery
Either use --local or have an empty local key in your hosts file to enable autodiscovery. Autodiscovery simply runs the Airflow Report as a process, assuming that an Airflow Scheduler is running on the current node.
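For example, either invocation below enables local autodiscovery (a minimal sketch; the second form uses a hosts file containing only an empty local key):

telescope --local

# equivalent, via a hosts file
echo "local:" > hosts.yaml
telescope -f hosts.yaml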
Docker autodiscovery
Either use --docker or have an empty docker key in your hosts file to enable autodiscovery. Autodiscovery searches for containers running locally that contain "scheduler" in the name and returns the container_id to use in hosts.yaml.
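In a hosts file, an empty docker key enables the same autodiscovery (a minimal hosts.yaml sketch); to pin a specific container instead, list its container_id as in the full example below:

docker: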
Kubernetes autodiscovery
Either use --kubernetes or have an empty kubernetes key in your hosts file to enable autodiscovery. Autodiscovery searches for pods running in the Kubernetes cluster defined by KUBEPROFILE, in any namespace, that contain the label component=scheduler (or another label defined by --label-selector), and returns the namespace, name, and container (scheduler) to use in hosts.yaml.
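Likewise, an empty kubernetes key in the hosts file enables autodiscovery (a minimal hosts.yaml sketch); to target a specific pod instead, give namespace, name, and container explicitly, as in the full example below:

kubernetes: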
Example hosts.yaml input (use with -f hosts.yaml):
local:

docker:
  - container_id: demo9b25c0_scheduler_1

kubernetes:
  - namespace: astronomer-amateur-cosmos-2865
    name: amateur-cosmos-2865-scheduler-bfcfbd7b5-dvqqr
    container: scheduler

ssh:
  - host: airflow.foo1.bar.com
  - host: root@airflow.foo2.bar.com
  - host: airflow.foo3.bar.com
    user: root
    connect_kwargs: {"key_filename": "/full/path/to/id_rsa"}
Label Selection
--label-selector allows Kubernetes autodiscovery to locate Airflow deployments with alternate key/values. The default is component=scheduler; however, if your Airflows contain role=scheduler instead, you would use --label-selector "role=scheduler".
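For example (assuming your scheduler pods carry the role=scheduler label):

telescope --kubernetes --label-selector "role=scheduler"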
Airflow Report Command
TELESCOPE_AIRFLOW_REPORT_CMD can be set to customize the command that runs the Airflow Report; normally the default is:
python -W ignore -c "import runpy,os;from urllib.request import urlretrieve as u;a='airflow_report.pyz';u('https://github.com/astronomer/telescope/releases/latest/download/'+a,a);runpy.run_path(a);os.remove(a)"
This can be used, for instance, if there is no access to GitHub from the remote box, if a custom directory is needed for the run, or if environment activation is required ahead of time.
If your python is called something other than python (e.g. python3):
TELESCOPE_AIRFLOW_REPORT_CMD=$(cat <<'EOF'
python3 -W ignore -c "import runpy,os;from urllib.request import urlretrieve as u;a='airflow_report.pyz';u('https://github.com/astronomer/telescope/releases/latest/download/airflow_report.pyz',a);runpy.run_path(a);os.remove(a)"
EOF
) telescope -f hosts.yaml
Or, if you need to activate a python (such as with RedHat Linux) prior to running, and want to copy the Telescope manifest up to the host independently:
scp airflow_report.pyz remote_user@remote_host:airflow_report.pyz
TELESCOPE_AIRFLOW_REPORT_CMD="scl enable rh-python36 python -W ignore -c 'import runpy;a=\'airflow_report.pyz\';runpy.run_path(a);os.remove(a)'" telescope -f hosts.yaml
DAG Obfuscation
DAG IDs and fileloc values can be obfuscated with the --dag-obfuscation flag. The default obfuscation keeps the first 3 and last 3 characters and inserts a fixed-width mask of ******, e.g. my-dag-name => my-******ame.
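For example, to gather a local report with the default obfuscation applied:

telescope --local --dag-obfuscation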
Custom Obfuscation Function
If a different obfuscation function is desired, --dag-obfuscation-fn can be passed a Python function that evaluates to (str) -> str. E.g. 'lambda x: x[-5:]' would return only the last five characters of the dag_id and fileloc.
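For example, passing that lambda on the command line:

telescope --local --dag-obfuscation-fn 'lambda x: x[-5:]'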
Optional Environmental Variables
- TELESCOPE_KUBERNETES_METHOD=kubectl - run with kubectl instead of the Python SDK (often for compatibility reasons)
- TELESCOPE_REPORT_RELEASE_VERSION=x.y.z - can be a separate Telescope semver release number, to control which report gets run
- TELESCOPE_KUBERNETES_AIRGAPPED=true - executes the Airflow report in airgapped mode (i.e. copies the report binary from local to the pod)
- LOG_LEVEL=DEBUG - can be any supported Python logging level [CRITICAL, FATAL, ERROR, WARN, WARNING, INFO, DEBUG, NOTSET]
- TELESCOPE_SHOULD_VERIFY=false - turn off helm chart collection (required to gather some data about Airflow in Kubernetes)
- TELESCOPE_REPORT_PACKAGE_URL - sets the URL that both the local CLI and TELESCOPE_AIRFLOW_REMOTE_CMD will use (unless TELESCOPE_AIRFLOW_REMOTE_CMD is set directly)
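For example, combining two of these variables in one invocation (values here are illustrative):

TELESCOPE_KUBERNETES_METHOD=kubectl LOG_LEVEL=DEBUG telescope --kubernetes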
Compatibility Matrix
Telescope is tested with:

Airflow versions
- apache/airflow:slim-2.8.1
- apache/airflow:slim-2.7.3
- apache/airflow:slim-2.6.0
- apache/airflow:slim-2.5.3
- apache/airflow:slim-2.4.0
- apache/airflow:2.3.4
- apache/airflow:2.2.4
- apache/airflow:2.1.3
- apache/airflow:2.0.0
- apache/airflow:1.10.15
- apache/airflow:1.10.10
- bitnami/airflow:1.10.2
Metadata Database Backends
- PostgreSQL
- SQLite
- MySQL (manual, infrequent testing)
- SQLServer (manual, infrequent testing)
Python Versions
Operating Systems