Advanced Features

This guide covers the advanced features of BoCoFlow that allow you to create more powerful and flexible workflows.

Conda Environment Integration

BoCoFlow can integrate with Conda environments to manage Python dependencies and ensure reproducible workflows.

Setting Up Conda Integration

  1. In the Setup dialog, specify:

    • Conda Path: The path to your Conda installation (e.g., /home/user/miniconda3 or C:\Users\user\Anaconda3)
    • Default Conda Environment: The name of your default environment
  2. BoCoFlow will use this environment for executing nodes that require Python.
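
To confirm which environment a node's Python code actually runs in, a quick standard-library check is useful when debugging dependency issues:

import os
import sys

# The interpreter path reveals which Conda installation is in use
print("Python executable:", sys.executable)
print("Python version:", sys.version.split()[0])

# Conda sets this variable when an environment is activated
print("Active Conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<none>"))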

Benefits of Conda Integration

  • Dependency Management: Isolate dependencies for different projects
  • Reproducibility: Ensure workflows run consistently across environments
  • Compatibility: Avoid conflicts between packages needed by different nodes
  • Python Version Control: Use specific Python versions for compatibility

Custom Nodes Directory

The custom nodes directory allows you to extend BoCoFlow with your own node implementations.

Setting Up Custom Nodes

  1. Create a directory structure for your custom nodes:

    my_custom_nodes/
    ├── io/
    │   ├── my_reader.py
    │   └── my_writer.py
    ├── manipulation/
    │   └── my_processor.py
    └── visualization/
        └── my_plot.py
  2. In the Setup dialog, set the Custom Nodes Directory to the path of your my_custom_nodes directory.

  3. Restart BoCoFlow or click "Reload Nodes" to make your custom nodes available.

Writing Custom Nodes

Custom nodes are Python classes that inherit from BoCoFlow's node base classes. A simple example:

from bocoflow_core.node import ManipulationNode, NodeResult
from bocoflow_core.parameters import IntegerParameter, StringParameter


class MyCustomProcessor(ManipulationNode):
    name = "My Custom Processor"
    node_type = "manipulation"  # Category in the node menu
    num_in = 1                  # Number of input ports
    num_out = 1                 # Number of output ports

    OPTIONS = {
        "multiplier": IntegerParameter(label="Multiplier", default=2),
        "column": StringParameter(label="Column to Process", default="value")
    }

    def execute(self, predecessor_data, flow_vars):
        # Get inputs
        input_data = predecessor_data[0]
        multiplier = flow_vars["multiplier"].get_value()
        column = flow_vars["column"].get_value()

        # Process data
        result = NodeResult()

        try:
            # Process input data
            if column in input_data:
                input_data[column] = [x * multiplier for x in input_data[column]]
                result.success = True
                result.message = "Processing complete"
            else:
                result.success = False
                result.message = f"Column '{column}' not found in input data"

            # Set output data
            result.data = input_data

            # Return serialized result
            return result.to_json()

        except Exception as e:
            result.success = False
            result.message = f"Error: {str(e)}"
            return result.to_json()
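
You can usually exercise a custom node outside the application before registering it. The sketch below is hypothetical: it assumes parameter objects return their configured defaults from get_value(), which may differ in your BoCoFlow version.

# Hypothetical standalone test (not part of BoCoFlow's API).
# Assumes OPTIONS parameters return their defaults from get_value().
node = MyCustomProcessor()
flow_vars = dict(MyCustomProcessor.OPTIONS)

# Minimal input resembling one upstream node's output
predecessor_data = [{"value": [1, 2, 3]}]

print(node.execute(predecessor_data, flow_vars))
# Expected: a JSON result with "value" doubled to [2, 4, 6]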

Remote Execution

Some node types in BoCoFlow support remote execution on computing clusters or servers.

Setting Up Remote Execution

  1. Add a RemoteConfigNode to your workflow

  2. Configure connection details:

    • Hostname
    • Username
    • Authentication method (password or key file)
    • Remote working directory
  3. Connect the RemoteConfigNode to nodes that support remote execution
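
Under the hood, remote execution of this kind typically rides on SSH. The snippet below is an illustrative sketch using the third-party paramiko library, not BoCoFlow's internal implementation; host, user, and paths are placeholders:

import paramiko

# Connection details mirroring a RemoteConfigNode (placeholder values)
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(
    hostname="cluster.example.org",
    username="user",
    key_filename="/home/user/.ssh/id_rsa",  # or password="..."
)

# Run a command in the remote working directory and collect its output
stdin, stdout, stderr = client.exec_command("cd /scratch/user/run && python job.py")
print(stdout.read().decode())
client.close()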

Monitoring Remote Jobs

  • The node status will show "Running on Cluster" when a job is executing remotely
  • The log panel will display job IDs and status updates
  • Node visualization will show results once the remote job completes

Path Management

BoCoFlow offers advanced path management features to ensure workflow portability.

Path Prefixes

  • abs: denotes an absolute path (e.g., abs:/home/user/data.csv)
  • rel: denotes a path relative to the working directory (e.g., rel:data/results.csv)
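
BoCoFlow resolves these prefixes internally; the sketch below only illustrates the convention:

from pathlib import Path

def resolve_path(prefixed, working_dir):
    """Illustrative resolver for the abs:/rel: convention (not BoCoFlow's code)."""
    if prefixed.startswith("abs:"):
        return Path(prefixed[4:])
    if prefixed.startswith("rel:"):
        return Path(working_dir) / prefixed[4:]
    raise ValueError(f"Unknown path prefix: {prefixed}")

print(resolve_path("rel:data/results.csv", "/home/user/project"))
# /home/user/project/data/results.csv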

Path Variables

For maximum flexibility, use flow variables to define paths:

  1. Create a StringNode with the path value
  2. Use this flow variable in other nodes' file path settings
  3. Update the path in one place to change it throughout the workflow

Workflow Optimization

For complex workflows, BoCoFlow provides several optimization features:

Force Run Control

Each node has a "Force to Run" option that controls execution behavior:

  • When disabled (the default), nodes execute only if at least one of the following holds:

    • They haven't been executed before
    • Their configuration has changed
    • Their inputs have changed
  • When enabled, nodes always execute, ignoring cached results
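
The caching behind this option can be pictured as fingerprinting each node's configuration and inputs. A minimal sketch of the idea, not BoCoFlow's actual implementation:

import hashlib
import json

def fingerprint(config, input_fps):
    """Hash a node's configuration together with its inputs' fingerprints."""
    payload = json.dumps([config, input_fps], sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def should_execute(node_id, config, input_fps, cache, force=False):
    """Re-run only on a cache miss or when 'Force to Run' is enabled."""
    current = fingerprint(config, input_fps)
    if force or cache.get(node_id) != current:
        cache[node_id] = current
        return True
    return False

cache = {}
print(should_execute("n1", {"multiplier": 2}, [], cache))  # True: never run
print(should_execute("n1", {"multiplier": 2}, [], cache))  # False: cached
print(should_execute("n1", {"multiplier": 3}, [], cache))  # True: config changed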

Execution Profiling

The log panel shows execution times for each node, helping identify bottlenecks in your workflow.

Advanced Graph Features

For large workflows, use the search function in the toolbar:

  1. Enter a node ID in the search box
  2. Click "Find" to locate and highlight the node

Canvas Controls

  • Zoom in/out: Use the zoom buttons or mouse wheel
  • Reset view: Click the "Reset" button to recenter the canvas
  • Arrange nodes: Drag and drop nodes to organize the layout

Working with Large Data

When dealing with large datasets:

Chunking Data

Some nodes support data chunking to process large files efficiently:

  • Configure chunk size in supported nodes
  • Enable streaming processing when available
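
The same idea applies inside a custom node. A short sketch using pandas' built-in chunked CSV reader (pandas assumed available in your Conda environment; file and column names are placeholders):

import pandas as pd

total = 0.0
# Read the CSV in 100,000-row chunks instead of loading it all at once
for chunk in pd.read_csv("large_file.csv", chunksize=100_000):
    # Aggregate per chunk so only one chunk is in memory at a time
    total += chunk["value"].sum()

print("Sum of 'value':", total)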

Using Database Connectors

For very large datasets, use database connector nodes:

  • DatabaseReadNode for reading from databases
  • DatabaseWriteNode for writing to databases
  • SQLQueryNode for executing custom queries
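
These nodes let the database do the heavy lifting before data reaches Python. The underlying pattern, sketched with the standard-library sqlite3 module and pandas (database, table, and column names are placeholders):

import sqlite3
import pandas as pd

conn = sqlite3.connect("experiments.db")

# Filter and aggregate in SQL so only the result set is loaded into memory
query = """
    SELECT sample_id, AVG(value) AS mean_value
    FROM measurements
    WHERE value > 0
    GROUP BY sample_id
"""
df = pd.read_sql(query, conn)
conn.close()
print(df.head())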

Memory Optimization

Tips for handling memory-intensive workflows:

  • Execute nodes individually rather than the entire workflow
  • Use filtering and aggregation early in the workflow
  • Save intermediate results to disk using WriteCsvNode or similar
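
Loading only the columns you need, with explicit dtypes, often cuts memory use substantially. A small pandas sketch (file and column names are placeholders):

import pandas as pd

# Read just two columns and downcast the numeric one to save memory
df = pd.read_csv(
    "large_file.csv",
    usecols=["sample_id", "value"],
    dtype={"value": "float32"},
)
print(df.memory_usage(deep=True))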

Next Steps

With these advanced features, you can create powerful, efficient, and flexible workflows in BoCoFlow.