Iterators and Generators
Code Properties
- Language: Python
- Concept: Memory Optimization
Overview
Generators produce values on demand using yield, keeping memory usage low even with massive datasets. This lazy evaluation is a core technique behind large-scale data processing, where a dataset may be far larger than available memory.
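A quick way to see the difference is to compare a list comprehension, which materializes every element, with the equivalent generator expression, which only stores iteration state. A minimal illustration:

```python
import sys

# the list holds all one million results in memory at once
squares_list = [n * n for n in range(1_000_000)]

# the generator holds only its current position; values are computed on demand
squares_gen = (n * n for n in range(1_000_000))

print(sys.getsizeof(squares_list))  # several megabytes
print(sys.getsizeof(squares_gen))   # a few hundred bytes at most
```

Note that sys.getsizeof reports only the container's own footprint, but the gap is still stark: the generator's size stays constant no matter how many values it will eventually yield.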
Code
Memory-Intensive Approach (Avoid)
```python
def process_large_file(filename):
    all_lines = open(filename).readlines()  # loads everything at once
    results = []
    for line in all_lines:
        results.append(process_line(line))
    return results
```

Generator Approach (Preferred)
```python
def process_huge_file_generator(filename):
    with open(filename) as f:
        for line in f:
            yield process_line(line)  # process on demand

# using the generator
for result in process_huge_file_generator("massive_log.txt"):
    # do something with each result
    pass
```

Usage
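Because generators are lazy, consuming only part of a stream is cheap, even when the stream is enormous or infinite. A sketch using itertools.islice with a hypothetical infinite counter:

```python
import itertools

def number_stream():
    # infinite generator: yields 0, 1, 2, ... without ever building a list
    n = 0
    while True:
        yield n
        n += 1

# islice pulls only the requested items; the rest are never computed
first_five = list(itertools.islice(number_stream(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```

The same pattern works on file-backed generators: taking the first N results reads only as much of the file as those N results require.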
```python
# chain generators for data pipelines
def read_lines(filename):
    with open(filename) as f:
        for line in f:
            yield line.strip()

def filter_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def parse_error(lines):
    for line in lines:
        yield {"message": line, "level": "error"}

# compose the pipeline
pipeline = parse_error(filter_errors(read_lines("app.log")))
for error in pipeline:
    print(error)
```

Appendix
Note created on 2024-04-15 and last modified on 2024-12-31.
See Also
Backlinks
(c) No Clocks, LLC | 2024