# Advanced Features
This guide covers the advanced features of BoCoFlow that allow you to create more powerful and flexible workflows.
## Conda Environment Integration
BoCoFlow can integrate with Conda environments to manage Python dependencies and ensure reproducible workflows.
### Setting Up Conda Integration
1. In the Setup dialog, specify:
   - Conda Path: The path to your Conda installation (e.g., `/home/user/miniconda3` or `C:\Users\user\Anaconda3`)
   - Default Conda Environment: The name of your default environment
2. BoCoFlow will use this environment for executing nodes that require Python.
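If you want to sanity-check these values before entering them, a short script can do it. The `verify_conda_env` helper below is a hypothetical sketch for pre-checking Setup values, not part of BoCoFlow's API:

```python
import os
import subprocess

def verify_conda_env(conda_path, env_name):
    """Return True if conda_path contains a conda executable and
    env_name appears in its environment list.
    Hypothetical helper -- not part of BoCoFlow."""
    candidates = [
        os.path.join(conda_path, "bin", "conda"),         # Linux/macOS layout
        os.path.join(conda_path, "Scripts", "conda.exe"), # Windows layout
    ]
    exe = next((c for c in candidates if os.path.exists(c)), None)
    if exe is None:
        return False  # no conda executable at this path
    listing = subprocess.run([exe, "env", "list"],
                             capture_output=True, text=True)
    return env_name in listing.stdout
```

A bad path simply returns `False` rather than raising, so the check is safe to run against unverified user input.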
### Benefits of Conda Integration
- Dependency Management: Isolate dependencies for different projects
- Reproducibility: Ensure workflows run consistently across environments
- Compatibility: Avoid conflicts between packages needed by different nodes
- Python Version Control: Use specific Python versions for compatibility
## Custom Nodes Directory
The custom nodes directory allows you to extend BoCoFlow with your own node implementations.
### Setting Up Custom Nodes
1. Create a directory structure for your custom nodes:

   ```
   my_custom_nodes/
   ├── io/
   │   ├── my_reader.py
   │   └── my_writer.py
   ├── manipulation/
   │   └── my_processor.py
   └── visualization/
       └── my_plot.py
   ```

2. In the Setup dialog, set the Custom Nodes Directory to the path of your `my_custom_nodes` directory.

3. Restart BoCoFlow or click "Reload Nodes" to make your custom nodes available.
### Writing Custom Nodes
Custom nodes are Python classes that inherit from BoCoFlow's node base classes. A simple example:

```python
from bocoflow_core.node import ManipulationNode, NodeResult
from bocoflow_core.parameters import IntegerParameter, StringParameter


class MyCustomProcessor(ManipulationNode):
    name = "My Custom Processor"
    node_type = "manipulation"  # Category in the node menu
    num_in = 1   # Number of input ports
    num_out = 1  # Number of output ports

    OPTIONS = {
        "multiplier": IntegerParameter(label="Multiplier", default=2),
        "column": StringParameter(label="Column to Process", default="value"),
    }

    def execute(self, predecessor_data, flow_vars):
        # Get inputs
        input_data = predecessor_data[0]
        multiplier = flow_vars["multiplier"].get_value()
        column = flow_vars["column"].get_value()

        result = NodeResult()
        try:
            # Process input data
            if column in input_data:
                input_data[column] = [x * multiplier for x in input_data[column]]
                result.success = True
                result.message = "Processing complete"
            else:
                result.success = False
                result.message = f"Column '{column}' not found in input data"

            # Set output data
            result.data = input_data

            # Return serialized result
            return result.to_json()
        except Exception as e:
            result.success = False
            result.message = f"Error: {e}"
            return result.to_json()
```
## Remote Execution
Some node types in BoCoFlow support remote execution on computing clusters or servers.
### Setting Up Remote Execution
1. Add a `RemoteConfigNode` to your workflow
2. Configure connection details:
   - Hostname
   - Username
   - Authentication method (password or key file)
   - Remote working directory
3. Connect the `RemoteConfigNode` to nodes that support remote execution
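The connection details above can be pictured as a simple record. The sketch below is for illustration only; the field names are assumptions, not BoCoFlow's actual schema:

```python
from dataclasses import dataclass

@dataclass
class RemoteConfig:
    """Illustrative container for RemoteConfigNode connection details.
    Field names are assumptions, not BoCoFlow's actual schema."""
    hostname: str
    username: str
    auth_method: str        # "password" or "key_file"
    credential: str         # the password itself, or a path to the key file
    remote_working_dir: str

config = RemoteConfig(
    hostname="cluster.example.org",
    username="alice",
    auth_method="key_file",
    credential="~/.ssh/id_ed25519",
    remote_working_dir="/scratch/alice/jobs",
)
```

Key-file authentication is generally preferable on shared clusters, since it avoids storing a plaintext password in the workflow file.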
### Monitoring Remote Jobs
- The node status will show "Running on Cluster" when a job is executing remotely
- The log panel will display job IDs and status updates
- Node visualization will show results once the remote job completes
## Path Management
BoCoFlow offers advanced path management features to ensure workflow portability.
### Path Prefixes
- `abs:` denotes an absolute path (e.g., `abs:/home/user/data.csv`)
- `rel:` denotes a path relative to the working directory (e.g., `rel:data/results.csv`)
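Prefix resolution can be sketched in a few lines. This is illustrative only; `resolve_path` is not BoCoFlow's actual function:

```python
import os

def resolve_path(prefixed, working_dir):
    """Resolve an abs:/rel:-prefixed path string.
    Illustrative sketch of the prefix rules; not BoCoFlow's implementation."""
    if prefixed.startswith("abs:"):
        return prefixed[len("abs:"):]
    if prefixed.startswith("rel:"):
        return os.path.join(working_dir, prefixed[len("rel:"):])
    return prefixed  # unprefixed paths pass through unchanged
```

With a working directory of `/project`, `rel:data/results.csv` resolves to `/project/data/results.csv`, which is why `rel:` paths keep a workflow portable across machines.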
### Path Variables
For maximum flexibility, use flow variables to define paths:
- Create a StringNode with the path value
- Use this flow variable in other nodes' file path settings
- Update the path in one place to change it throughout the workflow
## Workflow Optimization
For complex workflows, BoCoFlow provides several optimization features:
### Force Run Control
Each node has a "Force to Run" option that controls execution behavior:
- When disabled (the default), nodes execute only if:
  - They haven't been executed before
  - Their configuration changed
  - Their inputs changed
- When enabled, nodes always execute, ignoring cached results
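The caching rule above amounts to fingerprinting each node's configuration and inputs and skipping execution when the fingerprint is unchanged. A sketch of that idea (not BoCoFlow's actual cache mechanism):

```python
import hashlib
import json

def needs_execution(node_id, config, input_hashes, cache, force_run=False):
    """Decide whether a node must re-execute.
    Illustrative sketch of the caching rule; not BoCoFlow's implementation."""
    if force_run:
        return True  # "Force to Run" bypasses the cache entirely
    signature = hashlib.sha256(
        json.dumps({"config": config, "inputs": input_hashes},
                   sort_keys=True).encode()
    ).hexdigest()
    if cache.get(node_id) == signature:
        return False  # same config and inputs as the last run
    cache[node_id] = signature
    return True
```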
### Execution Profiling
The log panel shows execution times for each node, helping identify bottlenecks in your workflow.
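If you want comparable timings for your own scripts outside BoCoFlow, a context manager is a common pattern. This is generic Python, not part of BoCoFlow's API:

```python
import time
from contextlib import contextmanager

@contextmanager
def profile_node(name, timings):
    """Record wall-clock execution time under `name`.
    Generic pattern mirroring the per-node timings in the log panel."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start

timings = {}
with profile_node("demo_step", timings):
    total = sum(range(100_000))  # stand-in for a node's work
```

The `finally` clause ensures a timing is recorded even if the step raises, which is how you spot a bottleneck that also happens to fail.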
## Advanced Graph Features
### Node Search
For large workflows, use the search function in the toolbar:
- Enter a node ID in the search box
- Click "Find" to locate and highlight the node
### Canvas Controls
- Zoom in/out: Use the zoom buttons or mouse wheel
- Reset view: Click the "Reset" button to recenter the canvas
- Layout: Drag and drop nodes to organize the canvas
## Working with Large Data
When dealing with large datasets:
### Chunking Data
Some nodes support data chunking to process large files efficiently:
- Configure chunk size in supported nodes
- Enable streaming processing when available
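The idea behind chunking can be sketched with the standard library; BoCoFlow's chunk-aware nodes handle this internally, so the function below is purely illustrative:

```python
import csv
from itertools import islice

def chunked_sum(path, column, chunksize=1000):
    """Sum one numeric column of a CSV file chunk by chunk,
    so only `chunksize` rows are in memory at once.
    Illustrative sketch; not a BoCoFlow node."""
    total = 0.0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunksize))
            if not chunk:
                break
            total += sum(float(row[column]) for row in chunk)
    return total
```

Because each chunk is discarded before the next is read, peak memory stays proportional to the chunk size rather than the file size.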
### Using Database Connectors
For very large datasets, use database connector nodes:
- `DatabaseReadNode` for reading from databases
- `DatabaseWriteNode` for writing to databases
- `SQLQueryNode` for executing custom queries
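As a stand-in for what a `SQLQueryNode` does, here is a minimal query helper using the standard library's sqlite3 module. This is illustrative only; BoCoFlow's connector nodes are configured in the GUI and may target other database systems:

```python
import sqlite3
from contextlib import closing

def run_query(db_path, sql):
    """Run a query and return rows as dictionaries.
    Sketch of SQLQueryNode-style behavior, using sqlite3 as a stand-in."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row  # rows become name-addressable
        return [dict(row) for row in conn.execute(sql)]
```

Pushing filtering and aggregation into the SQL itself means only the result set, not the full table, crosses into the workflow.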
### Memory Optimization
Tips for handling memory-intensive workflows:
- Execute nodes individually rather than the entire workflow
- Use filtering and aggregation early in the workflow
- Save intermediate results to disk using `WriteCsvNode` or similar
## Next Steps
With these advanced features, you can create powerful, efficient, and flexible workflows in BoCoFlow. For more details:
- Custom Node Development for creating specialized nodes
- API Reference for programmatic access
- Troubleshooting for solving common issues