Python List Comprehensions: A Complete Practical Guide

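Once list comprehensions click, you will stop writing for loops that append to a list. They are not just syntactic sugar: they communicate intent clearly and, thanks to bytecode optimizations, run faster than the equivalent loop in CPython. According to the official Python tutorial, comprehensions provide a concise way to create lists based on existing sequences or other iterables. This article builds the mental model first, then covers every practical pattern: filtering, nesting, dict and set comprehensions, and the cases where a plain for loop is the better choice.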

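The basic pattern

A list comprehension has the form [expression for item in iterable]. It has three parts: the output expression (what each element becomes), the loop variable, and the iterable to draw from. Start with a familiar loop and compress it.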

python

# Plain for loop — building a list of file sizes in KB
file_sizes_bytes = [1024, 204800, 51200, 3145728, 8192]

sizes_kb = []
for size in file_sizes_bytes:
    sizes_kb.append(size / 1024)

# print(sizes_kb)  →  [1.0, 200.0, 50.0, 3072.0, 8.0]

# Same result as a list comprehension — one line, same meaning
sizes_kb = [size / 1024 for size in file_sizes_bytes]

The comprehension reads almost like English: "give me size / 1024 for every size in file_sizes_bytes." That clarity is the real benefit: the reader does not have to trace append calls to work out what is being built.

python

# Another common pattern: deriving one list from another
usernames = ["alice", "bob", "carol"]

# Build a list of display names
display_names = [name.capitalize() for name in usernames]
# ['Alice', 'Bob', 'Carol']

# Or extract a single field from a list of dicts
users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob",   "role": "viewer"},
    {"id": 3, "name": "Carol", "role": "editor"},
]

names = [user["name"] for user in users]
# ['Alice', 'Bob', 'Carol']

Filtering with if

Add a condition at the end and only the elements that pass it end up in the output list: [expression for item in iterable if condition]. This pattern replaces the for + if + append combination.

python

# Sample data: each user record carries an "active" flag
users = [
    {"name": "Alice", "active": True},
    {"name": "Bob",   "active": False},
    {"name": "Carol", "active": True},
    {"name": "Dave",  "active": False},
]

# Loop version: extract only active users
active_users = []
for user in users:
    if user["active"]:
        active_users.append(user["name"])

# Comprehension version: identical result
active_users = [user["name"] for user in users if user["active"]]
# ['Alice', 'Carol']

python

# Filtering a raw CSV row — drop blanks and whitespace-only values
raw_row = ["[email protected]", "", "  ", "editor", " "]

clean_row = [field.strip() for field in raw_row if field.strip()]
# ['[email protected]', 'editor']

# Filtering a list of log levels
log_lines = [
    "INFO  server started",
    "DEBUG loading config",
    "ERROR database timeout",
    "DEBUG query took 450ms",
    "ERROR disk space low",
]

errors = [line for line in log_lines if line.startswith("ERROR")]
# ['ERROR database timeout', 'ERROR disk space low']

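Working with strings

String processing is where comprehensions shine. Python's rich string methods combine with comprehension syntax to keep a transformation pipeline readable without intermediate variables.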

python

# Strip whitespace from tags coming out of a form field
raw_tags = ["  python ", "data science", " machine-learning ", "API"]

tags = [tag.strip().lower() for tag in raw_tags]
# ['python', 'data science', 'machine-learning', 'api']

# Normalise email addresses from a signup CSV
raw_emails = ["[email protected]", "  [email protected]  ", "[email protected]"]

emails = [e.strip().lower() for e in raw_emails]
# ['[email protected]', '[email protected]', '[email protected]']

# Extract file extensions from a list of uploaded filenames
filenames = ["report.pdf", "avatar.PNG", "data.CSV", "archive.tar.gz", "notes.txt"]

extensions = [name.rsplit(".", 1)[-1].lower() for name in filenames if "." in name]
# ['pdf', 'png', 'csv', 'gz', 'txt']

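Tip: when normalising user input, strip whitespace as well as lowercasing. Calling .lower() alone leaves a value like " [email protected] " with its surrounding spaces, and those spaces break database lookups and JSON key matches. The .strip() and .lower() calls can be chained in either order.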
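
A minimal sketch of why the stripping step matters, assuming a hypothetical lookup table keyed by normalised address. The address and the account_ids dict are made up for illustration and do not come from the examples above.

```python
# Hypothetical lookup table keyed by normalised email address
account_ids = {"alice@example.com": 101}

raw_value = "  Alice@Example.COM "  # made-up form input with stray spaces

lowered_only = raw_value.lower()        # case folded, still padded with spaces
normalised = raw_value.strip().lower()  # spaces removed and case folded

print(lowered_only in account_ids)  # False: the padded value does not match
print(normalised in account_ids)    # True

# The two calls give the same result in either order
assert raw_value.lower().strip() == normalised
```

The failure mode is a silent miss rather than an exception, which is why it is worth normalising at the point where the data enters the program.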

Dict and set comprehensions

The same idea extends to dictionaries and sets. A dict comprehension uses curly braces with key: value pairs: {key: value for item in iterable}. A set comprehension drops the colon and produces a deduplicated collection: {expression for item in iterable}.

python

# Invert a dict — swap keys and values
permissions = {"alice": "admin", "bob": "viewer", "carol": "editor"}

by_role = {role: name for name, role in permissions.items()}
# {'admin': 'alice', 'viewer': 'bob', 'editor': 'carol'}

# Build a fast lookup dict from a list of user records
users = [
    {"id": 101, "name": "Alice", "active": True},
    {"id": 102, "name": "Bob",   "active": False},
    {"id": 103, "name": "Carol", "active": True},
]

# O(1) lookups by ID — much faster than scanning the list every time
user_by_id = {user["id"]: user for user in users}
# {101: {...}, 102: {...}, 103: {...}}

# Access a user directly
user_by_id[102]["name"]  # 'Bob'

python

# Set comprehension — deduplicate a list of file extensions
uploads = ["report.pdf", "data.csv", "summary.pdf", "export.CSV", "notes.txt"]

unique_extensions = {name.rsplit(".", 1)[-1].lower() for name in uploads if "." in name}
# {'pdf', 'csv', 'txt'}  — order not guaranteed

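One caveat with sets: order is not guaranteed. If you need to deduplicate a list while keeping its original order, a set comprehension is the wrong tool. Use list(dict.fromkeys(items)) instead, which relies on the insertion-order guarantee that dicts have had since Python 3.7.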
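
The order-preserving alternative can be sketched as follows. The extensions list is a hypothetical input chosen to contain repeats.

```python
# Hypothetical list of extensions in upload order, with repeats
extensions = ["pdf", "csv", "pdf", "txt", "csv"]

# dict.fromkeys keeps the first occurrence of each key, in insertion order
unique_in_order = list(dict.fromkeys(extensions))
print(unique_in_order)  # ['pdf', 'csv', 'txt']

# A set comprehension holds the same members but promises nothing about order
assert set(unique_in_order) == {ext for ext in extensions}
```

This only works when the items are hashable, which is the same requirement a set imposes.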

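Nested comprehensions

You can nest comprehensions to iterate over nested structures. The most common use case is flattening a list of lists: matrices from CSV parsing, pages of a chunked API response, or grouped query results.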

python

# Flatten a 2D list (e.g. paginated API results)
pages = [
    [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}],
    [{"id": 3, "name": "Carol"}],
    [{"id": 4, "name": "Dave"}, {"id": 5, "name": "Eve"}],
]

all_users = [user for page in pages for user in page]
# [{'id': 1, ...}, {'id': 2, ...}, {'id': 3, ...}, {'id': 4, ...}, {'id': 5, ...}]

# The order mirrors what nested for loops would produce:
# for page in pages:
#     for user in page:
#         all_users.append(user)

Read a nested comprehension left to right: the outer loop comes first and the inner loop second. This matches the order of the equivalent nested for loops.
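
One way to convince yourself of the clause order is a small sketch in which the inner loop depends on the outer loop variable. The ranges here are arbitrary illustrative values.

```python
# The inner "for" can use x because the outer "for" clause runs first
pairs = [(x, y) for x in range(3) for y in range(x)]
print(pairs)  # [(1, 0), (2, 0), (2, 1)]

# Equivalent nested loops, written in the same clause order
expected = []
for x in range(3):
    for y in range(x):
        expected.append((x, y))

assert pairs == expected
```

Swapping the two for clauses would raise a NameError at module level, since y's range would refer to x before x exists.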

python

# Two levels — fine and readable
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [cell for row in matrix for cell in row]
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Three levels — stop here. Use a plain loop or a helper function.
# This is the point where readability dies:
cube = [[[1,2],[3,4]],[[5,6],[7,8]]]

# Don't do this:
flat3 = [val for layer in cube for row in layer for val in row]

# Do this instead:
flat3 = []
for layer in cube:
    for row in layer:
        flat3.extend(row)

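A generator expression is the same syntax with parentheses instead: (user["id"] for user in users). It is evaluated lazily, yielding one element at a time rather than building the whole list in memory. Use one when you iterate only once or pass the result straight to sum(), max(), any(), or a similar function. See the generator expressions reference for full details.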
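
A short sketch of both uses, assuming sample records shaped like the ones used elsewhere in this article. The variable names are illustrative.

```python
# Hypothetical sample records, mirroring the shape used in this article
users = [
    {"id": 101, "name": "Alice", "active": True},
    {"id": 102, "name": "Bob", "active": False},
    {"id": 103, "name": "Carol", "active": True},
]

# Passed straight to a consuming function: no intermediate list is built
highest_id = max(user["id"] for user in users)             # 103
any_inactive = any(not user["active"] for user in users)   # True
active_count = sum(1 for user in users if user["active"])  # 2

# A generator can be consumed only once
ids = (user["id"] for user in users)
first_pass = list(ids)   # [101, 102, 103]
second_pass = list(ids)  # [] because the generator is already exhausted
```

The empty second pass is the main trap: if the values are needed more than once, a list comprehension is the safer choice.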

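The if/else ternary inside a comprehension

When you need to transform each element into one of two values rather than drop it, use a ternary expression in the output position. Placement matters: the ternary goes at the start, not after the iterable.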

python

# Correct: ternary is the output expression
users = [
    {"name": "Alice", "active": True},
    {"name": "Bob",   "active": False},
    {"name": "Carol", "active": True},
]

status = ["active" if user["active"] else "inactive" for user in users]
# ['active', 'inactive', 'active']

# Normalise a config value — replace None with a default
raw_config = ["/var/log", None, "/tmp", None, "/etc/app"]

paths = [path if path is not None else "/var/log/default" for path in raw_config]
# ['/var/log', '/var/log/default', '/tmp', '/var/log/default', '/etc/app']

python

# Common mistake — putting the ternary after the iterable (SyntaxError)
# status = ["active" for user in users if user["active"] else "inactive"]  # WRONG

# The trailing "if" is a filter — it drops non-matching items entirely.
# The ternary ("if ... else ...") in the output expression transforms all items.
# They serve different purposes. You can combine them:

# Keep everyone except Bob, and show each remaining user's status
status = ["active" if user["active"] else "inactive"
          for user in users
          if user["name"] != "Bob"]
# ['active', 'active']  — Bob was filtered out entirely

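When to use a for loop instead

List comprehensions are for building lists. If what you are really doing is running side effects, use a for loop, not a comprehension. This matters beyond style: a comprehension whose result is discarded builds a list nobody uses, which wastes memory and hides the intent from the reader.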

python

import requests  # third-party HTTP client used by the examples below

# Bad: comprehension for side effects (writing to a file, printing, calling an API)
[print(f"Processing user {user['name']}") for user in users]   # don't do this
[requests.post("/api/notify", json=user) for user in users]    # definitely don't do this

# Good — plain for loop makes the intent obvious
for user in users:
    print(f"Processing user {user['name']}")

for user in users:
    requests.post("/api/notify", json=user)

python

# Bad — comprehension with logic complex enough to need comments or multiple steps
result = [
    user["name"].strip().lower()
    if user.get("active") and user.get("email_verified")
    else user["name"].strip().lower() + " (unverified)"
    for user in users
    if user.get("role") in ("admin", "editor") and user.get("last_login") is not None
]

# Good — break it out when the logic is this involved
result = []
for user in users:
    if user.get("role") not in ("admin", "editor"):
        continue
    if user.get("last_login") is None:
        continue
    name = user["name"].strip().lower()
    if not (user.get("active") and user.get("email_verified")):
        name += " (unverified)"
    result.append(name)

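Use a comprehension when you are building a new list from an existing iterable with a clear expression and an optional filter.
Use a for loop when you are running side effects: I/O, network calls, printing, or mutating external state.
Use a for loop when the transformation logic needs multiple lines, intermediate variables, or comments to be understood.
Use a generator expression when you iterate only once or pass the result directly to sum(), any(), or max(), so no list is needed in memory.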

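Performance notes

List comprehensions are meaningfully faster than the equivalent for + append loop in CPython. The reason is in the bytecode: a comprehension compiles to a dedicated LIST_APPEND opcode, which avoids looking up the list.append attribute on every iteration. The Python performance tips wiki covers this, and for pure-Python workloads the difference is typically 10 to 40%, depending on list size.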

python

import timeit

data = list(range(100_000))

# for + append
def with_loop():
    result = []
    for x in data:
        result.append(x * 2)
    return result

# list comprehension
def with_comprehension():
    return [x * 2 for x in data]

# generator expression — no list built at all
def with_generator():
    return sum(x * 2 for x in data)

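# Time each function: average milliseconds per call over 100 runs
for fn in (with_loop, with_comprehension, with_generator):
    per_call_ms = timeit.timeit(fn, number=100) / 100 * 1000
    print(f"{fn.__name__}(): {per_call_ms:.1f} ms")

# Typical results on CPython 3.12: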
# with_loop():          ~7.2 ms
# with_comprehension(): ~4.8 ms  (~33% faster)
# with_generator():     ~4.1 ms  (and uses O(1) memory vs O(n))
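
The bytecode explanation in this section can be checked with the standard dis module. The sketch below assumes CPython; opcode names are an implementation detail and may differ between versions or on other interpreters. The opnames helper is hypothetical and written only for this illustration. It also searches nested code objects, because before Python 3.12 a comprehension was compiled into its own code object.

```python
import dis
import types


def with_loop(data):
    result = []
    for x in data:
        result.append(x * 2)
    return result


def with_comprehension(data):
    return [x * 2 for x in data]


def opnames(func):
    """Collect opcode names from func and from any nested code objects."""
    names = set()
    pending = [func.__code__]
    while pending:
        code = pending.pop()
        names.update(instr.opname for instr in dis.get_instructions(code))
        # Comprehensions may live in a nested code object stored in co_consts
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names


print("LIST_APPEND" in opnames(with_comprehension))  # True
print("LIST_APPEND" in opnames(with_loop))           # False
```

The loop version instead loads the append method and calls it on every iteration, which is the overhead the paragraph above describes.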

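If you do not need a materialized list, because you pass the result to sum(), any(), or max() or iterate over it once, use a generator expression instead. It uses constant memory regardless of input size, which matters when processing large CSV exports or JSON payloads in a tight loop.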
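
A rough way to see the memory difference is sys.getsizeof, sketched below under the assumption that container size alone is a fair proxy. It measures only the list or generator object itself, not the integers it refers to, and the exact byte counts vary by Python version and platform.

```python
import sys

data = range(100_000)

as_list = [x * 2 for x in data]
as_generator = (x * 2 for x in data)

# getsizeof reports the container only, not the objects it refers to
print(sys.getsizeof(as_list))       # grows with the number of elements
print(sys.getsizeof(as_generator))  # small and independent of len(data)

total = sum(as_generator)  # consumes the generator one item at a time
```

For per-allocation detail, the standard tracemalloc module gives a fuller picture than getsizeof.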

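Wrapping up

List comprehensions are one of those Python features that feel awkward for about a week and then become something you cannot live without. The mental model is simple: output expression, loop variable, iterable, optional filter. Stick to it and you will write readable, idiomatic Python. When the logic gets complicated, that is your signal to go back to a plain loop: not a failure, just the right tool for the job.

If you work with JSON data in Python (turning API responses into lists of values, extracting fields from records, building lookup dicts), the tools on this site pair well with what you just learned. Use the JSON formatter to inspect and pretty-print a JSON payload before processing it with comprehensions, or the CSV formatter to validate CSV data before parsing it into a list of rows. For the full language reference, the Python documentation on list comprehensions and PEP 202 (the original proposal) are worth reading.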
