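Guide to Tracking AppsFlyer ID Changes and Analyzing Re-attribution #

In mobile app marketing, accurately tracking changes in user attribution is essential for evaluating campaign performance and optimizing user acquisition. This guide explains how to process AppsFlyer raw data, identify AF ID changes, and analyze re-attribution in depth.

Overview #

The AppsFlyer ID (AF ID) is a unique identifier that AppsFlyer generates for each app install. A new AF ID may be generated when a user uninstalls and reinstalls the app, or when a re-attribution occurs. Understanding and tracking these changes supports:

- Marketing performance evaluation: accurately measure user quality across channels
- User lifecycle analysis: understand the user's complete behavior path
- Attribution model optimization: choose the attribution strategy that best fits the business
- Fraud detection: identify abnormal install and attribution patterns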

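Data Acquisition #

How to Obtain Raw Data #

Before starting the analysis, obtain complete raw data from AppsFlyer. AppsFlyer offers several ways to get it:

1. Data Export #

- In the AppsFlyer dashboard, select "Export Data"
- Choose the report types you need:
  - Install Report
  - In-App Events Report
  - Reinstall Report
- Set the date range and filters
- Export the data as a CSV file

2. Pull API #

- Use the AppsFlyer Pull API to retrieve raw data
- Supports programmatic data retrieval and automated processing
- Example API endpoint: https://hq.appsflyer.com/export/{app_id}/{report_type}/v5
- Requests must include an API token and the relevant parameters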
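
As a rough illustration of the Pull API request described above, the sketch below only assembles the URL and headers and makes no network call. The helper names are hypothetical, and the `from`/`to` query parameters, the Bearer-token header, and the `installs_report` report type are assumptions to verify against the Pull API documentation for your account.

```python
from urllib.parse import urlencode

# Endpoint template as given in this guide; confirm it for your account.
PULL_API_TEMPLATE = "https://hq.appsflyer.com/export/{app_id}/{report_type}/v5"


def build_pull_api_url(app_id, report_type, date_from, date_to):
    """Build a request URL for one report and date range.

    Assumption: the date range is passed as `from` / `to` query
    parameters in YYYY-MM-DD form.
    """
    base = PULL_API_TEMPLATE.format(app_id=app_id, report_type=report_type)
    query = urlencode({"from": date_from, "to": date_to})
    return f"{base}?{query}"


def build_auth_headers(api_token):
    """Assumption: the API token is sent as a Bearer token."""
    return {"Authorization": f"Bearer {api_token}"}


url = build_pull_api_url(
    "com.example.app", "installs_report", "2024-01-01", "2024-01-07"
)
headers = build_auth_headers("YOUR_API_TOKEN")
print(url)
```

The resulting URL and headers can then be passed to any HTTP client, and the CSV response saved for the loading step later in this guide.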

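3. Data Locker #

- AppsFlyer's Data Locker service stores data in an AWS S3 bucket
- Supports automatic data transfer to your own storage system
- Provides larger-scale data access and greater flexibility
- Suited to large enterprises and high-frequency data processing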

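Identifying Key Data Fields #

When processing raw data, pay particular attention to the following fields to track AF ID changes and re-attribution:

Basic Identifier Fields #

Field	Description	Importance
`appsflyer_id`	Unique identifier generated by AppsFlyer	⭐⭐⭐
`device_id`	Device identifier (IDFA, GAID, IDFV, etc.)	⭐⭐⭐
`customer_user_id`	User ID set by the app	⭐⭐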

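Attribution Fields #

Field	Description	Importance
`media_source`	Media source	⭐⭐⭐
`campaign`	Campaign name	⭐⭐
`conversion_type`	Conversion type (install, re-engagement, re-attribution)	⭐⭐⭐
`install_time`	Install time (the reinstall time for reinstalls)	⭐⭐⭐
`event_time`	Time the event occurred	⭐⭐⭐
`event_name`	Event name (install, reinstall, etc.)	⭐⭐
`is_retargeting`	Whether the conversion came from a retargeting campaign	⭐⭐
`re_attribution_counter`	Re-attribution counter	⭐⭐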

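Advanced Attribution Fields #

Field	Description	Importance
`primary_source`	Primary attribution source (the retargeting channel)	⭐⭐
`secondary_source`	Secondary attribution source (the first-install channel)	⭐⭐
`attributed_touch_time`	Time of the attributed touch	⭐⭐
`attributed_touch_type`	Type of the attributed touch (click/impression)	⭐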
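
Column labels can differ between export methods, so it may help to check a file's header before running the analysis. The sketch below normalizes header labels to snake_case and reports which of the fields listed above are absent. The helper names `normalize_column` and `find_missing_columns` are hypothetical, and the title-case labels in the example are an assumption about how an exported CSV header might look.

```python
import re


def normalize_column(name):
    """Lowercase a header label and collapse non-alphanumerics to '_'."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")


def find_missing_columns(header, required):
    """Return the required field names that the header does not contain."""
    present = {normalize_column(label) for label in header}
    return [col for col in required if col not in present]


# Hypothetical header row mixing title-case and snake_case labels.
header = ["AppsFlyer ID", "Install Time", "Media Source", "device_id"]
required = ["appsflyer_id", "device_id", "media_source", "conversion_type"]

missing = find_missing_columns(header, required)
print(missing)  # ['conversion_type']
```

Failing fast on a missing field is usually cheaper than discovering a `KeyError` halfway through the analysis.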

Data Processing and Analysis #

Data Preparation and Cleaning #

Start by loading the data and applying basic cleaning:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

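# Load the data
installs_df = pd.read_csv('appsflyer_installs.csv')
reinstalls_df = pd.read_csv('appsflyer_reinstalls.csv')
events_df = pd.read_csv('appsflyer_in_app_events.csv')

def clean_data(df):
    """
    Parse timestamp columns and fill missing media sources.
    """
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert timestamp columns to datetime
    time_columns = ['install_time', 'event_time', 'attributed_touch_time']
    for col in time_columns:
        if col in df.columns:
            df[col] = pd.to_datetime(df[col])

    # Treat a missing media source as organic
    # (assign the result; calling fillna(inplace=True) on a column selection is unreliable)
    if 'media_source' in df.columns:
        df['media_source'] = df['media_source'].fillna('organic')

    return df

# Apply the cleaning step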
installs_df = clean_data(installs_df)
reinstalls_df = clean_data(reinstalls_df)
events_df = clean_data(events_df)

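Identifying AF ID Changes #

Identifying multiple AF IDs on the same device is the key step in tracking changes in user behavior:

# Combine install and reinstall data to track AF ID changes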
all_installs = pd.concat([
    installs_df[['appsflyer_id', 'device_id', 'media_source', 'install_time', 'conversion_type']],
    reinstalls_df[['appsflyer_id', 'device_id', 'media_source', 'install_time', 'conversion_type']]
])

# Sort by device ID and install time
all_installs = all_installs.sort_values(['device_id', 'install_time'])

# Keep devices that have more than one AF ID
device_multiple_afids = all_installs.groupby('device_id').filter(
    lambda x: len(x['appsflyer_id'].unique()) > 1
)

# Build one record per pair of consecutive installs on the same device
afid_changes = []
for device_id, group in device_multiple_afids.groupby('device_id'):
    records = group.sort_values('install_time').to_dict('records')
    for i in range(1, len(records)):
        time_diff = (records[i]['install_time'] - records[i-1]['install_time']).total_seconds() / 86400

        afid_changes.append({
            'device_id': device_id,
            'old_afid': records[i-1]['appsflyer_id'],
            'new_afid': records[i]['appsflyer_id'],
            'old_install_time': records[i-1]['install_time'],
            'new_install_time': records[i]['install_time'],
            'time_difference': time_diff,  # in days
            'old_source': records[i-1]['media_source'],
            'new_source': records[i]['media_source'],
            'is_reattribution': records[i]['conversion_type'] == 're-attribution'
        })

afid_changes_df = pd.DataFrame(afid_changes)
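
For larger datasets, the per-device loop can be slow. The sketch below shows one vectorized alternative on a toy dataset, using `groupby(...).shift(1)` to place each install next to the previous install on the same device. It is not identical to the loop above: it keeps only rows where the AF ID actually differs from the previous one. The toy values and media source names are made up for illustration.

```python
import pandas as pd

# Toy install log: device d1 has three AF IDs, device d2 has one.
toy = pd.DataFrame({
    "device_id": ["d1", "d1", "d2", "d1"],
    "appsflyer_id": ["a1", "a2", "b1", "a3"],
    "media_source": ["organic", "network_x", "network_y", "network_x"],
    "install_time": pd.to_datetime(
        ["2024-01-01", "2024-01-11", "2024-01-05", "2024-02-10"]
    ),
})

toy = toy.sort_values(["device_id", "install_time"]).reset_index(drop=True)

# Previous install on the same device (NaN/NaT for a device's first install).
toy["old_afid"] = toy.groupby("device_id")["appsflyer_id"].shift(1)
toy["old_install_time"] = toy.groupby("device_id")["install_time"].shift(1)

# Keep rows where a previous install exists and the AF ID changed.
changed = toy["old_afid"].notna() & (toy["old_afid"] != toy["appsflyer_id"])
changes = toy[changed].copy()
changes["time_difference"] = (
    changes["install_time"] - changes["old_install_time"]
).dt.total_seconds() / 86400  # in days

print(changes[["device_id", "old_afid", "appsflyer_id", "time_difference"]])
```

Here device d1 yields two changes (a1 to a2 after 10 days, a2 to a3 after 30 days) and device d2 yields none.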

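Re-attribution Pattern Analysis #

Analyze the timing patterns and channel distribution of re-attributions in depth:

# Select re-attribution records (copy to avoid modifying a slice of afid_changes_df)
reattribution_analysis = afid_changes_df[afid_changes_df['is_reattribution']].copy()

# Bucket the time between installs into re-attribution windows
reattribution_analysis['time_window_bucket'] = pd.cut(
    reattribution_analysis['time_difference'],
    bins=[0, 7, 30, 60, 90, float('inf')],
    labels=['0-7 days', '8-30 days', '31-60 days', '61-90 days', '90+ days'],
    include_lowest=True  # keep gaps of exactly 0 days in the first bucket
)

# Count re-attributions by old and new source
reattribution_by_source = reattribution_analysis.groupby(
    ['old_source', 'new_source']
).size().reset_index(name='count')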
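
To make the bucket boundaries concrete, here is a small sketch on made-up data. It shows how `pd.cut` assigns edge values (with `include_lowest=True`, a gap of exactly 0 days falls in the first bucket, and a gap of exactly 7 days also belongs to it), and how `pd.crosstab` can present the same source transitions as a matrix. The source names are placeholders.

```python
import pandas as pd

reattributions = pd.DataFrame({
    "old_source": ["organic", "organic", "network_x", "network_y"],
    "new_source": ["network_x", "network_x", "network_y", "network_x"],
    "time_difference": [0.0, 7.0, 7.5, 120.0],  # days between installs
})

# Intervals are right-closed: (0, 7], (7, 30], ... with 0 included in the first.
reattributions["time_window_bucket"] = pd.cut(
    reattributions["time_difference"],
    bins=[0, 7, 30, 60, 90, float("inf")],
    labels=["0-7 days", "8-30 days", "31-60 days", "61-90 days", "90+ days"],
    include_lowest=True,
)

# Rows are the old source, columns the new source, cells the number of moves.
transitions = pd.crosstab(
    reattributions["old_source"], reattributions["new_source"]
)

print(reattributions[["time_difference", "time_window_bucket"]])
print(transitions)
```

A transition matrix like this makes it easier to spot which first-install channels most often lose users to a given remarketing channel.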

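User Behavior Path Tracking #

Build a complete timeline of user behavior to understand activity patterns under different AF IDs:

# Pick a sample device for the path analysis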
sample_device = afid_changes_df['device_id'].iloc[0]

# Get all installs and events for that device
device_installs = all_installs[all_installs['device_id'] == sample_device]
device_events = events_df[events_df['device_id'] == sample_device]

# Combine install and event data, then sort by time
device_timeline = pd.concat([
    device_installs[['appsflyer_id', 'install_time', 'media_source', 'conversion_type']].rename(
        columns={'install_time': 'time', 'conversion_type': 'type'}
    ),
    device_events[['appsflyer_id', 'event_time', 'event_name', 'media_source']].rename(
        columns={'event_time': 'time', 'event_name': 'type'}
    )
])

device_timeline = device_timeline.sort_values('time')

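Tip: device_timeline now contains the device's complete behavior path, including installs, reinstalls, and in-app events, and can be used to analyze how behavior patterns change over time.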

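Associating Events with Attribution #

Link each in-app event to the correct install source, which is essential for accurate revenue attribution:

def associate_events_with_installs(events_df, all_installs):
    """
    Attach the install source and install time to each in-app event.
    """
    # Build a lookup from AF ID to its earliest install record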
    afid_install_map = all_installs.sort_values('install_time').drop_duplicates(
        'appsflyer_id', keep='first'
    )
    afid_install_map = afid_install_map.set_index('appsflyer_id')

    # Add install information to each event
    events_with_install = events_df.copy()
    events_with_install['install_source'] = events_with_install['appsflyer_id'].map(
        afid_install_map['media_source']
    )
    events_with_install['install_time'] = events_with_install['appsflyer_id'].map(
        afid_install_map['install_time']
    )

    return events_with_install

events_with_source = associate_events_with_installs(events_df, all_installs)
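
The mapping above joins on `appsflyer_id` alone. When events need to be matched by time instead, for example to find the install that was in effect on a device when an event occurred, one option is `pandas.merge_asof`. The sketch below uses made-up data and assumes both tables share a `device_id` column; it is an alternative technique, not the method used elsewhere in this guide.

```python
import pandas as pd

installs = pd.DataFrame({
    "device_id": ["d1", "d1"],
    "appsflyer_id": ["a1", "a2"],
    "media_source": ["organic", "network_x"],
    "install_time": pd.to_datetime(["2024-01-01", "2024-02-01"]),
})
events = pd.DataFrame({
    "device_id": ["d1", "d1", "d1"],
    "event_name": ["af_login", "af_purchase", "af_purchase"],
    "event_time": pd.to_datetime(["2024-01-05", "2024-02-03", "2023-12-31"]),
})

# merge_asof requires both frames to be sorted by their time keys.
attributed = pd.merge_asof(
    events.sort_values("event_time"),
    installs.sort_values("install_time"),
    left_on="event_time",
    right_on="install_time",
    by="device_id",          # match only within the same device
    direction="backward",    # latest install at or before the event
)

print(attributed[["event_time", "event_name", "appsflyer_id", "media_source"]])
```

The event dated before any install has no match and keeps missing values, which is a useful signal of incomplete install data.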

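Post-Reinstall Event Analysis #

Identify and analyze events that occur after a reinstall, which helps in understanding the effect of remarketing campaigns:

def identify_post_reinstall_events(events_df, reinstalls_df):
    """
    Flag events that occurred after a reinstall.
    """
    # Keep the earliest reinstall per AF ID so the merge does not duplicate events
    reinstalls = (
        reinstalls_df[['appsflyer_id', 'install_time']]
        .rename(columns={'install_time': 'reinstall_time'})
        .sort_values('reinstall_time')
        .drop_duplicates('appsflyer_id', keep='first')
    )

    # Join events to reinstalls
    events_with_reinstall = pd.merge(
        events_df,
        reinstalls,
        on='appsflyer_id',
        how='left'
    )

    # Flag events that occurred after the reinstall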
    events_with_reinstall['is_post_reinstall'] = (
        events_with_reinstall['reinstall_time'].notna() &
        (events_with_reinstall['event_time'] > events_with_reinstall['reinstall_time'])
    )

    return events_with_reinstall

post_reinstall_events = identify_post_reinstall_events(events_with_source, reinstalls_df)

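Advanced Analysis Scenarios #

Re-attribution Effectiveness Analysis #

Analyze the activity and retention of re-attributed users in depth:

def calculate_retention_for_reattributed_users(events_df, reattribution_df):
    """
    Compute activity metrics (first event, last event, event count,
    activity span) for re-attributed users.
    """
    # Get the AF IDs of re-attributed users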
    reattributed_afids = reattribution_df['new_afid'].unique()

    # Keep only events from those users
    reattributed_events = events_df[events_df['appsflyer_id'].isin(reattributed_afids)]

    # Compute activity metrics per user
    user_activity = reattributed_events.groupby('appsflyer_id').agg({
        'event_time': ['min', 'max', 'count']
    })

    user_activity.columns = ['first_event', 'last_event', 'event_count']

    # Span in days from first to last event (inclusive), not a count of distinct active days
    user_activity['active_days'] = (
        user_activity['last_event'] - user_activity['first_event']
    ).dt.days + 1

    return user_activity

reattributed_user_activity = calculate_retention_for_reattributed_users(
    events_df,
    afid_changes_df[afid_changes_df['is_reattribution']]
)
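
The function above reports activity metrics rather than a retention rate. As a complement, the following sketch computes a classic Day-N retention on toy data: the share of installs with at least one event on the N-th calendar day after install. The function name `day_n_retention` is hypothetical, and the calendar-day definition is one of several in common use.

```python
import pandas as pd


def day_n_retention(events, installs, day):
    """Share of installs with at least one event on calendar day `day`."""
    merged = events[["appsflyer_id", "event_time"]].merge(
        installs[["appsflyer_id", "install_time"]], on="appsflyer_id"
    )
    # Compare calendar dates, ignoring the time of day.
    days_since = (
        merged["event_time"].dt.normalize()
        - merged["install_time"].dt.normalize()
    ).dt.days
    retained = merged.loc[days_since == day, "appsflyer_id"].nunique()
    total = installs["appsflyer_id"].nunique()
    return retained / total if total else 0.0


installs = pd.DataFrame({
    "appsflyer_id": ["a1", "a2", "a3", "a4"],
    "install_time": pd.to_datetime(["2024-01-01 10:00"] * 4),
})
events = pd.DataFrame({
    "appsflyer_id": ["a1", "a1", "a2", "a3"],
    "event_time": pd.to_datetime([
        "2024-01-02 09:00", "2024-01-08 12:00",
        "2024-01-02 23:00", "2024-01-01 11:00",
    ]),
})

print(day_n_retention(events, installs, 1))  # 0.5
print(day_n_retention(events, installs, 7))  # 0.25
```

For re-attributed users, `installs` would hold the new AF IDs together with their re-attribution install times.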

Attribution Model Comparison #

Compare how different attribution models affect revenue allocation:

def compare_attribution_models(events_df, afid_changes_df, all_installs):
    """
    Compare the "current attribution" and "first-install attribution" models.
    """
    # Keep only events from devices whose AF ID changed at least once
    changed_devices = afid_changes_df['device_id'].unique()
    device_events = events_df[events_df['device_id'].isin(changed_devices)]

    # Model 1: current attribution (the AF ID in effect when the event occurred)
    model1_attribution = device_events.copy()

    def find_active_afid(row, afid_changes):
        """Return the AF ID that was in effect at the time of the event."""
        device = row['device_id']
        event_time = row['event_time']

        device_changes = afid_changes[afid_changes['device_id'] == device]
        valid_changes = device_changes[device_changes['new_install_time'] <= event_time]

        if valid_changes.empty:
            return row['appsflyer_id']  # No change yet: keep the event's own AF ID
        else:
            latest_change = valid_changes.sort_values('new_install_time').iloc[-1]
            return latest_change['new_afid']

    model1_attribution['attributed_afid'] = model1_attribution.apply(
        lambda row: find_active_afid(row, afid_changes_df), axis=1
    )

    # Model 2: first-install attribution
    model2_attribution = device_events.copy()

    # Map each device to the AF ID of its first install
    # (all_installs is passed in explicitly rather than read from a global)
    first_installs = all_installs.sort_values('install_time').drop_duplicates(
        'device_id', keep='first'
    )
    first_afid_map = dict(zip(first_installs['device_id'], first_installs['appsflyer_id']))

    model2_attribution['attributed_afid'] = model2_attribution['device_id'].map(first_afid_map)

    return model1_attribution, model2_attribution

model1, model2 = compare_attribution_models(events_df, afid_changes_df, all_installs)

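User Lifetime Value Analysis #

Analyze the total value a user contributes across multiple installs:

def analyze_user_lifetime_value(events_df, afid_changes_df):
    """
    Analyze user lifetime value across multiple AF IDs.
    """
    changed_devices = afid_changes_df['device_id'].unique()

    # Keep only purchase events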
    purchase_events = events_df[
        (events_df['device_id'].isin(changed_devices)) &
        (events_df['event_name'] == 'af_purchase')
    ]

    # Aggregate revenue by device and AF ID
    revenue_by_afid = purchase_events.groupby(['device_id', 'appsflyer_id']).agg({
        'event_revenue': 'sum',
        'event_time': ['min', 'max', 'count']
    })

    revenue_by_afid.columns = ['total_revenue', 'first_purchase', 'last_purchase', 'purchase_count']

    # Compute total revenue per device (across all of its AF IDs)
    total_device_revenue = purchase_events.groupby('device_id').agg({
        'event_revenue': 'sum',
        'event_time': ['min', 'max', 'count']
    })

    total_device_revenue.columns = ['total_revenue', 'first_purchase', 'last_purchase', 'purchase_count']

    return revenue_by_afid, total_device_revenue

afid_revenue, device_revenue = analyze_user_lifetime_value(events_df, afid_changes_df)

Data Visualization and Reporting #

Trend Analysis #

Visualize how AF ID changes trend over time:

import matplotlib.pyplot as plt
import seaborn as sns

# Count AF ID changes per month
afid_changes_df['month'] = afid_changes_df['new_install_time'].dt.to_period('M')
monthly_changes = afid_changes_df.groupby('month').size()

plt.figure(figsize=(12, 6))
monthly_changes.plot(kind='bar')
plt.title('Monthly Trend of AF ID Changes')
plt.xlabel('Month')
plt.ylabel('Number of AF ID changes')
plt.tight_layout()
plt.savefig('afid_changes_trend.png')

Funnel Analysis #

Build a behavior funnel for re-attributed users:

def create_reattribution_funnel(events_df, reattribution_df):
    """
    Build and plot a behavior funnel for re-attributed users.
    """
    reattributed_users = reattribution_df['device_id'].unique()
    user_events = events_df[events_df['device_id'].isin(reattributed_users)]

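    # Key funnel events; adjust to the event names present in your data
    key_events = ['reinstall', 'af_login', 'af_purchase']

    # Count the users who triggered each event
    total_users = len(reattributed_users)
    funnel_data = []
    for event in key_events:
        users_with_event = user_events[user_events['event_name'] == event]['device_id'].nunique()
        funnel_data.append({
            'event': event,
            'users': users_with_event,
            # Guard against division by zero when there are no re-attributed users
            'percentage': users_with_event / total_users * 100 if total_users else 0.0
        })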

    funnel_df = pd.DataFrame(funnel_data)

    # Plot the funnel as a bar chart
    plt.figure(figsize=(10, 6))
    sns.barplot(x='event', y='users', data=funnel_df)
    plt.title('Re-attributed User Behavior Funnel')
    plt.xlabel('Event type')
    plt.ylabel('Number of users')

    # Add percentage labels
    for i, row in enumerate(funnel_df.itertuples()):
        plt.text(i, row.users + 5, f'{row.percentage:.1f}%', ha='center')

    plt.tight_layout()
    plt.savefig('reattribution_funnel.png')

    return funnel_df

reattribution_funnel = create_reattribution_funnel(
    events_df,
    afid_changes_df[afid_changes_df['is_reattribution']]
)

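Comprehensive Report Generation #

Generate a comprehensive analysis report containing the key metrics:

def generate_comprehensive_report(afid_changes_df, events_df):
    """
    Build a summary report of AF ID changes and re-attribution.
    """
    # Bucket the gap between installs here: afid_changes_df itself
    # has no precomputed time_window_bucket column
    time_window_bucket = pd.cut(
        afid_changes_df['time_difference'],
        bins=[0, 7, 30, 60, 90, float('inf')],
        labels=['0-7 days', '8-30 days', '31-60 days', '61-90 days', '90+ days'],
        include_lowest=True
    )

    report = {
        'total_afid_changes': len(afid_changes_df),
        'unique_devices_with_changes': afid_changes_df['device_id'].nunique(),
        'reattribution_count': afid_changes_df['is_reattribution'].sum(),
        'reattribution_percentage': afid_changes_df['is_reattribution'].mean() * 100,
        'avg_time_between_changes': afid_changes_df['time_difference'].mean(),
        'median_time_between_changes': afid_changes_df['time_difference'].median(),
        'top_reattribution_sources': afid_changes_df[
            afid_changes_df['is_reattribution']
        ]['new_source'].value_counts().head(5).to_dict(),
        'time_window_distribution': time_window_bucket.value_counts().to_dict()
    }

    # Add revenue analysis when revenue data is available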
    if 'event_revenue' in events_df.columns:
        reattributed_afids = afid_changes_df[
            afid_changes_df['is_reattribution']
        ]['new_afid'].unique()

        reattributed_revenue = events_df[
            (events_df['appsflyer_id'].isin(reattributed_afids)) &
            (events_df['event_name'] == 'af_purchase')
        ]['event_revenue'].sum()

        report['reattributed_users_revenue'] = reattributed_revenue
        report['avg_revenue_per_reattributed_user'] = (
            reattributed_revenue / len(reattributed_afids) if len(reattributed_afids) > 0 else 0
        )

    return report

comprehensive_report = generate_comprehensive_report(afid_changes_df, events_df)

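Practical Recommendations #

Data Integration and Automation #

An efficient data processing pipeline is the foundation of a successful analysis:

Automated Data Acquisition #

- Data Locker integration: use AppsFlyer's Data Locker service to set up automatic data transfer to your data warehouse
- API scheduling: create scheduled jobs that fetch the latest data through the Pull API
- Incremental updates: implement incremental updates to avoid reprocessing historical data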
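
One simple way to implement incremental updates is a watermark: store the latest timestamp already processed and, on each run, keep only rows newer than it. The sketch below illustrates the idea with plain dictionaries. The function name `select_new_rows` and the choice of `event_time` as the watermark field are assumptions for illustration.

```python
from datetime import datetime


def select_new_rows(rows, watermark):
    """Return rows with event_time after the watermark, plus the new watermark."""
    fresh = [row for row in rows if row["event_time"] > watermark]
    # If nothing is new, the watermark stays where it was.
    new_watermark = max((row["event_time"] for row in fresh), default=watermark)
    return fresh, new_watermark


rows = [
    {"appsflyer_id": "a1", "event_time": datetime(2024, 1, 1, 9, 0)},
    {"appsflyer_id": "a1", "event_time": datetime(2024, 1, 2, 9, 0)},
    {"appsflyer_id": "a2", "event_time": datetime(2024, 1, 3, 9, 0)},
]
last_processed = datetime(2024, 1, 1, 12, 0)

fresh, last_processed = select_new_rows(rows, last_processed)
print(len(fresh), last_processed)
```

A strict watermark can miss rows that arrive late with older timestamps, so a real pipeline may want to re-read a short overlap window and deduplicate.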

Monitoring and Alerts #

def setup_monitoring_alerts(afid_changes_df):
    """
    Derive alert thresholds for monitoring AF ID changes.
    """
    # Compute baseline metrics
    baseline_daily_changes = afid_changes_df.groupby(
        afid_changes_df['new_install_time'].dt.date
    ).size().mean()

    baseline_reattribution_rate = afid_changes_df['is_reattribution'].mean()

    # Set thresholds
    change_threshold = baseline_daily_changes * 2  # alert when daily changes exceed 2x the baseline
    reattribution_threshold = baseline_reattribution_rate * 1.5  # alert when the re-attribution rate exceeds 1.5x the baseline

    return {
        'daily_change_threshold': change_threshold,
        'reattribution_rate_threshold': reattribution_threshold
    }
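
The function above only derives thresholds. A minimal sketch of the comparison step might look like the following, where `check_alerts` is a hypothetical helper that takes the current day's metrics and the threshold dictionary returned by `setup_monitoring_alerts`. The numbers in the example are made up.

```python
def check_alerts(daily_changes, reattribution_rate, thresholds):
    """Return a list of human-readable alerts for metrics above threshold."""
    alerts = []
    if daily_changes > thresholds["daily_change_threshold"]:
        alerts.append(
            f"AF ID changes today ({daily_changes}) exceed the threshold "
            f"({thresholds['daily_change_threshold']:.1f})"
        )
    if reattribution_rate > thresholds["reattribution_rate_threshold"]:
        alerts.append(
            f"Re-attribution rate ({reattribution_rate:.1%}) exceeds the threshold "
            f"({thresholds['reattribution_rate_threshold']:.1%})"
        )
    return alerts


# Example thresholds in the shape returned by setup_monitoring_alerts.
thresholds = {
    "daily_change_threshold": 40.0,
    "reattribution_rate_threshold": 0.30,
}

alerts = check_alerts(daily_changes=55, reattribution_rate=0.20, thresholds=thresholds)
for message in alerts:
    print(message)
```

The returned messages could be forwarded to whatever notification channel the team already uses.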

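Business Use Cases #

Marketing Optimization #

Use case	Analysis method	Business value
Channel quality evaluation	Compare retention of re-attributed users across channels	Identify high-quality remarketing channels
Budget allocation	Analyze the LTV of re-attributed users	Optimize remarketing budget allocation
Creative optimization	Track the relationship between creatives and re-attribution results	Improve ad creative performance

User Lifecycle Management #

- Churn early warning: identify behavior patterns of users who are about to uninstall
- Win-back strategy: tailor win-back strategies to how long users have been gone
- Personalized recommendations: optimize the in-app experience based on users' historical behavior

Choosing an Attribution Model #

Important: choosing the right attribution model is critical for revenue allocation and channel evaluation.

Model comparison framework: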

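def evaluate_attribution_models(model1_data, model2_data, actual_revenue):
    """
    Evaluate the accuracy of different attribution models.

    Note: calculate_accuracy is a user-supplied helper, and both inputs
    must carry an 'attributed_revenue' column.
    """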
    models_comparison = {
        'current_attribution': {
            'total_attributed_revenue': model1_data['attributed_revenue'].sum(),
            'accuracy_score': calculate_accuracy(model1_data, actual_revenue)
        },
        'first_install_attribution': {
            'total_attributed_revenue': model2_data['attributed_revenue'].sum(),
            'accuracy_score': calculate_accuracy(model2_data, actual_revenue)
        }
    }

    return models_comparison
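
The framework calls `calculate_accuracy`, which this guide leaves to the reader. One possible definition, offered purely as a hypothetical placeholder, scores a model by how closely its total attributed revenue matches the actual total: 1.0 for a perfect match, falling toward 0.0 as the relative error grows. Other definitions, such as per-channel error, may suit a given business better.

```python
def calculate_accuracy(model_data, actual_revenue):
    """Hypothetical accuracy score: 1 minus the relative error of total revenue.

    `model_data` is anything indexable by 'attributed_revenue' that yields
    numbers, such as a dict of lists or a pandas DataFrame.
    """
    attributed_total = float(sum(model_data["attributed_revenue"]))
    if actual_revenue == 0:
        # With no actual revenue, only a zero attribution counts as accurate.
        return 1.0 if attributed_total == 0 else 0.0
    relative_error = abs(attributed_total - actual_revenue) / abs(actual_revenue)
    # Clamp at zero so large overshoots do not produce negative scores.
    return max(0.0, 1.0 - relative_error)


model_data = {"attributed_revenue": [40.0, 50.0]}
score = calculate_accuracy(model_data, actual_revenue=100.0)
print(round(score, 2))  # 0.9
```

Because this score looks only at totals, two models that distribute the same revenue differently across channels would receive the same value.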

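Privacy and Compliance Considerations #

Adapting to iOS 14.5+ #

Since iOS 14.5 introduced the ATT (App Tracking Transparency) framework, the following need special handling:

- Reduced IDFA availability: rely more on IDFV and other identifiers
- Attribution timing: account for SKAdNetwork postbacks, which typically arrive with a delay of about 24-48 hours rather than in real time
- Probabilistic matching: combine with AppsFlyer's probabilistic matching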
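
When the IDFA is unavailable it is commonly reported as an all-zero UUID, so grouping by IDFA alone would merge unrelated devices into one. The sketch below picks the first usable identifier for each record. The function name `pick_device_key` and the field names `idfa`, `idfv`, and `customer_user_id` are assumptions about how the raw data is labeled.

```python
ZEROED_IDFA = "00000000-0000-0000-0000-000000000000"


def pick_device_key(record):
    """Return (identifier_type, value) for the first usable identifier."""
    idfa = (record.get("idfa") or "").strip()
    if idfa and idfa != ZEROED_IDFA:
        return ("idfa", idfa)
    idfv = (record.get("idfv") or "").strip()
    if idfv:
        return ("idfv", idfv)
    user_id = (record.get("customer_user_id") or "").strip()
    if user_id:
        return ("customer_user_id", user_id)
    return (None, None)  # nothing usable: exclude from device-level grouping


records = [
    {"idfa": "6D92078A-8246-4BA4-AE5B-76104861E7DC", "idfv": "V-1"},
    {"idfa": ZEROED_IDFA, "idfv": "V-2"},
    {"idfa": None, "idfv": "", "customer_user_id": "user-42"},
]
for record in records:
    print(pick_device_key(record))
```

Keeping the identifier type alongside the value avoids accidentally comparing an IDFA with an IDFV. Note also that an IDFV can change once a user removes all of a vendor's apps, so it is a weaker key for linking installs across reinstalls.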

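Data Compliance Framework #

Regulation	Key requirements	Implementation advice
GDPR	User consent, data minimization	Apply a data retention policy and regularly purge expired data
CCPA	Right to know, right to deletion	Build a mechanism for deleting user data
LGPD	Transparency of data processing	Keep logs of data processing activities

def implement_data_retention_policy(events_df, retention_days=730):
    """
    Apply a data retention policy.
    """
    cutoff_date = datetime.now() - timedelta(days=retention_days)

    # Drop data older than the retention period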
    filtered_events = events_df[events_df['event_time'] >= cutoff_date]

    return filtered_events
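
The table above also recommends a user-data deletion mechanism. A minimal sketch of that idea, assuming deletion requests arrive as a list of `customer_user_id` values, could look like this. The function name `delete_user_records` is hypothetical, and a real implementation would also need to cover backups and downstream copies.

```python
def delete_user_records(rows, customer_user_ids):
    """Drop every row belonging to a user who requested deletion.

    Returns the remaining rows and the number of rows removed,
    so the removal can be recorded in a processing log.
    """
    blocked = set(customer_user_ids)
    kept = [row for row in rows if row.get("customer_user_id") not in blocked]
    return kept, len(rows) - len(kept)


rows = [
    {"customer_user_id": "user-1", "event_name": "af_login"},
    {"customer_user_id": "user-2", "event_name": "af_purchase"},
    {"customer_user_id": "user-1", "event_name": "af_purchase"},
]

remaining, removed = delete_user_records(rows, ["user-1"])
print(len(remaining), removed)  # 1 2
```

Returning the removed count makes it straightforward to log each deletion request as part of the processing records the table mentions.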

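Summary and Best Practices #

Key Takeaways #

By processing AppsFlyer raw data systematically, we can:

- Track the full user journey: the complete behavior path from first install through multiple reinstalls
- Evaluate marketing performance precisely: separate the real contributions of organic growth and paid acquisition
- Optimize attribution strategy: choose the attribution model that best fits the business
- Increase user value: improve retention and LTV through in-depth analysis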

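Implementation Roadmap #

Phase 1: Foundations (1-2 weeks) #

- Set up the data acquisition pipeline
- Implement basic AF ID change detection
- Create a basic monitoring dashboard

Phase 2: In-Depth Analysis (2-3 weeks) #

- Implement the advanced analysis scenarios
- Build the attribution model comparison framework
- Develop an automated reporting system

Phase 3: Business Application (ongoing) #

- Integrate the analysis into marketing decision-making
- Optimize the user acquisition strategy
- Monitor and improve continuously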

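Key Success Factors #

Data quality is the foundation of a successful analysis:

- Completeness: make sure all required data fields are collected
- Accuracy: put data validation and cleaning in place
- Timeliness: update data in near real time
- Consistency: standardize data formats and definitions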

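Future Directions #

As the mobile app ecosystem continues to evolve, AF ID change analysis also needs ongoing refinement:

- Machine learning integration: use ML models to predict user behavior and churn risk
- Real-time analysis: build real-time data processing and analysis capabilities
- Cross-platform integration: combine data from multiple attribution platforms for a unified analysis
- Privacy-enhancing technologies: adopt techniques such as differential privacy to protect user privacy

In-depth analysis of AF ID changes and re-attribution not only improves our understanding of user behavior but also provides data-driven insight for optimizing marketing strategy, supporting both business growth and higher user value.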