Python Statistical Analysis

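Python statistical analysis is the use of the Python programming language to perform descriptive statistics, inferential statistics, and visual analysis on data. It is a core part of data processing in Python and is widely used in scientific research, business analytics, and financial modeling. Libraries such as `pandas`, `numpy`, `scipy`, and `statsmodels` provide efficient statistical tools.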

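Core Concepts

Descriptive Statistics

Descriptive statistics summarizes the basic characteristics of a dataset with measures such as the mean, median, and standard deviation. The following example uses `pandas`: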

  
import pandas as pd  

data = {'Age': [25, 30, 35, 40, 45], 'Income': [50000, 60000, 70000, 80000, 90000]}  
df = pd.DataFrame(data)  

# Compute descriptive statistics
print(df.describe())

Output:

  
             Age        Income  
count   5.000000      5.000000  
mean   35.000000  70000.000000  
std     7.905694  15811.388301  
min    25.000000  50000.000000  
25%    30.000000  60000.000000  
50%    35.000000  70000.000000  
75%    40.000000  80000.000000  
max    45.000000  90000.000000
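
The individual figures that `describe()` reports can also be computed one at a time. The following is a minimal sketch using the same data; the variable names are chosen for illustration only. It also shows a detail worth knowing: `pandas` computes the sample standard deviation (`ddof=1`) by default, while `numpy.std` defaults to the population form (`ddof=0`).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35, 40, 45],
    "Income": [50000, 60000, 70000, 80000, 90000],
})

# Individual statistics that describe() aggregates into one table.
age_mean = df["Age"].mean()
age_median = df["Age"].median()

# pandas uses the sample standard deviation (ddof=1) by default,
# whereas numpy.std defaults to the population form (ddof=0).
age_std_sample = df["Age"].std()
age_std_population = np.std(df["Age"].to_numpy())

print(age_mean, age_median)
print(round(age_std_sample, 6), round(age_std_population, 6))
```

The sample value is the one that appears in the `std` row of the `describe()` table; passing `ddof=1` to `np.std` reproduces it.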

Inferential Statistics

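Inferential statistics uses hypothesis tests (such as the t-test and the chi-square test) to draw conclusions about a population from a sample. The following example runs an independent two-sample t-test:

  
from scipy import stats

group1 = [20, 22, 19, 18, 21]
group2 = [25, 27, 24, 23, 26]

# Independent two-sample t-test (equal variances are assumed by default)
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"t statistic: {t_stat:.3f}, p-value: {p_value:.5f}")

Output:

  
t statistic: -5.000, p-value: 0.00105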
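
To make the test less of a black box, the pooled-variance t statistic can be worked out by hand and compared with `scipy`. This is a sketch under the equal-variance assumption that `ttest_ind` uses by default; the names `pooled_var`, `std_err`, `t_manual`, and `p_manual` are introduced here for illustration.

```python
import math
from statistics import mean, variance

from scipy import stats

group1 = [20, 22, 19, 18, 21]
group2 = [25, 27, 24, 23, 26]
n1, n2 = len(group1), len(group2)

# Pooled sample variance (statistics.variance uses the n - 1 denominator).
pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)

# Standard error of the difference in means, then the t statistic.
std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t_manual = (mean(group1) - mean(group2)) / std_err

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom.
p_manual = 2 * stats.t.sf(abs(t_manual), df=n1 + n2 - 2)

t_scipy, p_scipy = stats.ttest_ind(group1, group2)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.5f}")
print(f"scipy:  t = {t_scipy:.3f}, p = {p_scipy:.5f}")
```

Both printed lines should agree. If the two groups cannot be assumed to have equal variances, `stats.ttest_ind(group1, group2, equal_var=False)` runs Welch's t-test instead.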

Practical Examples

Sales Data Analysis

Suppose an e-commerce platform needs to analyze the distribution and trend of its monthly sales:

  
import matplotlib.pyplot as plt  

sales = [120, 150, 90, 200, 180, 210]  
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']  

plt.bar(months, sales)  
plt.title('Monthly Sales Analysis')  
plt.xlabel('Month')  
plt.ylabel('Sales (in thousands)')  
plt.show()
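
A chart shows the shape of the data, but distribution and trend can also be quantified. The sketch below reuses the same six monthly figures; the series name `sales_thousands` and the variable `mom_change` are hypothetical names chosen for this example.

```python
import pandas as pd

sales = pd.Series(
    [120, 150, 90, 200, 180, 210],
    index=["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    name="sales_thousands",  # hypothetical name chosen for this sketch
)

# Distribution: central tendency and spread of the six monthly figures.
print(sales.describe())

# Trend: month-over-month change in percent (the first month has no predecessor).
mom_change = sales.pct_change() * 100
print(mom_change.round(1))
```

The first entry of `mom_change` is `NaN` because January has no earlier month to compare against.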

Advanced Topics

Linear Regression

Linear regression analysis with `statsmodels`:

  
import statsmodels.api as sm  

X = [1, 2, 3, 4, 5]  
y = [2, 4, 5, 4, 6]  
X = sm.add_constant(X)  # add the intercept term

model = sm.OLS(y, X).fit()  
print(model.summary())

Output summary:

  
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.727
Model:                            OLS   Adj. R-squared:                  0.636
Method:                 Least Squares   F-statistic:                     8.000
Date:                ... (remaining output omitted)
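
For a single predictor, the coefficients and R-squared that an OLS fit reports can be cross-checked with the closed-form formulas. The following sketch uses only `numpy` on the same data; the variable names are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 6], dtype=float)

# Closed-form simple linear regression: slope = Sxy / Sxx.
x_centered = x - x.mean()
y_centered = y - y.mean()
slope = (x_centered * y_centered).sum() / (x_centered ** 2).sum()
intercept = y.mean() - slope * x.mean()

# R-squared = 1 - (residual sum of squares / total sum of squares).
residuals = y - (intercept + slope * x)
r_squared = 1 - (residuals ** 2).sum() / (y_centered ** 2).sum()

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.3f}")
```

On a fitted `statsmodels` result, the same quantities are available as `model.params` and `model.rsquared`.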

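Probability Distributions

The probability density function of a common distribution, the normal distribution with mean $\mu$ and standard deviation $\sigma$: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$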
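
The density formula translates directly into code. The function below, `normal_pdf`, is a hypothetical helper written for this illustration; it is compared against `scipy.stats.norm.pdf`, where `loc` is the mean and `scale` is the standard deviation.

```python
import math

from scipy import stats


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Evaluate the normal density f(x) for mean mu and standard deviation sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)


# Compare with scipy, where loc is the mean and scale is the standard deviation.
for x in (-1.0, 0.0, 2.5):
    print(x, normal_pdf(x, mu=1.0, sigma=2.0), stats.norm.pdf(x, loc=1.0, scale=2.0))
```

In practice `scipy.stats.norm` is usually the better choice, since it also provides the cumulative distribution, quantiles, and random sampling.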

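Summary

Python statistical analysis combines data processing with mathematical modeling and supports the full workflow from basic description to complex inference. Working through the examples and code gives readers a practical introduction to the core toolchain.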