R Course · Chapter 13 · Graphical Distribution Analysis in R

Graphical Distribution Analysis in R

Density histogram

To compare data with a theoretical density, draw the histogram normalized to unit area (freq = FALSE):

freq = FALSE: the y-axis shows density, and the total area of the bars is 1.

set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)
hist(x, freq = FALSE, main = "Density histogram", xlab = "x")
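
The unit-area claim can be checked numerically. The following is a minimal sketch that reuses the sample from above; the names h, bar_area and total_area are chosen here for illustration. It relies on hist(..., plot = FALSE), which returns the class breaks and density heights without drawing anything.

```r
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)

# Compute the histogram object without drawing it
h <- hist(x, plot = FALSE)

# Area of each bar = density height * class width
bar_area <- h$density * diff(h$breaks)

# Sum of all bar areas; equals 1 up to floating-point rounding
total_area <- sum(bar_area)
print(total_area)
```

Because each density height is the class count divided by n times the class width, the areas are the relative frequencies, which always sum to 1.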

Estimating the model parameters

The parameters are estimated from the sample (method of moments): normal $\hat\mu=\bar x,\ \hat\sigma=s$; exponential $\hat\lambda=1/\bar x$.

mean(x); sd(x)         # normal: estimate mu and sigma
1 / mean(x)            # exponential: estimate the rate

Output

# mean(x) ~ 10, sd(x) ~ 2 (estimates close to the true values)
# 1/mean(x) ~ 0.1
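
The rate estimate is only meaningful for data that could plausibly be exponential, that is, non-negative and right-skewed. As a hedged sketch, the following simulates such a sample and fits the rate with the same moment estimator. The sample y, the value true_rate = 0.5 and the seed are assumptions made for illustration and do not come from the course data.

```r
set.seed(2)
true_rate <- 0.5                    # assumed value, chosen for illustration
y <- rexp(200, rate = true_rate)    # hypothetical exponential sample

# Moment estimate of the rate: reciprocal of the sample mean
rate_hat <- 1 / mean(y)
print(rate_hat)

# Density histogram with the fitted exponential density on top
hist(y, freq = FALSE, main = "Exponential sample", xlab = "y")
curve(dexp(x, rate = rate_hat), add = TRUE, col = "red", lwd = 2)
```

Storing the estimate in rate_hat before calling curve() keeps the expression inside curve() free of anything except the plotting variable x.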

Comparing data and model

curve(..., add = TRUE) overlays the fitted density on the histogram. If the curve follows the bars closely, the model is plausible:

If the red curve matches the bars, the normal distribution fits.

m <- mean(x); s <- sd(x)   # estimate first: inside curve(), x is the plotting grid, not the data
hist(x, freq = FALSE, main = "Data vs. model")
curve(dnorm(x, m, s), add = TRUE, col = "red", lwd = 2)
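
An equivalent overlay can be drawn with lines() on an explicit grid. This sketch keeps the data vector and the evaluation points under separate names, which matters because inside curve() the symbol x stands for the evaluation grid rather than the data. The names m, s, grid and dens are chosen here for illustration.

```r
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)

# Estimate the parameters from the data
m <- mean(x)
s <- sd(x)

# Evaluate the fitted normal density on an explicit grid over the data range
grid <- seq(min(x), max(x), length.out = 200)
dens <- dnorm(grid, mean = m, sd = s)

# Density histogram plus the fitted curve
hist(x, freq = FALSE, main = "Data vs. model")
lines(grid, dens, col = "red", lwd = 2)
```

The explicit grid costs two extra lines but makes it obvious which values the parameters are estimated from.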

Mini exercise

Draw a density histogram for werte and overlay the fitted normal density (in blue).

💡 Tip

First call hist(..., freq = FALSE), then curve(dnorm(x, mean, sd), add = TRUE), where mean and sd stand for the estimates computed from werte.

Solution

werte <- c(2.1, 2.4, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3, 3.5, 3.9)
hist(werte, freq = FALSE)
curve(dnorm(x, mean(werte), sd(werte)),
    add = TRUE, col = "blue", lwd = 2)

Note: Histogram and density share the same scale only with freq = FALSE. Otherwise the curve "vanishes" along the bottom edge.
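
The size of the mismatch can be seen by comparing the two scales directly. This is a small sketch using the simulated sample from the first section; the names h, n, width and rescaled are illustrative.

```r
set.seed(1)
x <- rnorm(200, mean = 10, sd = 2)
h <- hist(x, plot = FALSE)

n <- length(x)
width <- diff(h$breaks)

# Tallest bar on the frequency scale versus the density scale
print(max(h$counts))
print(max(h$density))

# Peak height of the fitted normal density, for comparison
print(dnorm(mean(x), mean(x), sd(x)))

# Dividing the counts by n * class width reproduces the density heights
rescaled <- h$counts / (n * width)
print(all.equal(rescaled, h$density))
```

On the frequency scale the bars are counts, while the density curve stays well below 1 for this sample, so it is squeezed against the x-axis.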

Recall quiz

Question 1 / 2

Which argument turns hist() into a density histogram (area 1)?

x <- c(3, 1, 4, 1, 5)                  # create a vector
y <- rep(c(0, 1), times = c(26, 9))    # 26 zeros, 9 ones
s <- seq(0, 2*pi, length.out = 100)    # evenly spaced sequence
1:10                                   # integer sequence
length(x); sort(x); rev(x)             # length, sort, reverse
x[x > 2]                               # logical indexing

barplot(table(x), main = "Title", ylab = "h(a)")       # bar chart
pie(table(x), labels = c("no", "yes"))                 # pie chart
hist(x)                                                # histogram (automatic classes)
hist(x, breaks = seq(50, 110, by = 5),                 # custom classes
     col = heat.colors(12), xlab = "kg")
plot(s, sin(2*s), type = "l", col = "blue", lwd = 2)   # line plot
lines(s, cos(s))                                       # add another line
boxplot(x)                                             # box plot

mean(x)                        # arithmetic mean
median(x)                      # median
quantile(x, c(.25, .5, .75))   # quartiles
quantile(x, 0.9)               # 90% quantile
# Mode: the value with the highest frequency
names(which.max(table(x)))

var(x)                     # sample variance (divides by n - 1)
sd(x)                      # standard deviation
IQR(x)                     # interquartile range
range(x); diff(range(x))   # min/max and the range width
sd(x) / mean(x)            # coefficient of variation

dbinom(k, size = n, prob = p)            # binomial P(X = k)
pbinom(k, n, p)                          # P(X <= k)
dpois(k, lambda); ppois(k, lambda)
dnorm(x, mean, sd); pnorm(x, mean, sd)   # normal
qnorm(0.975)                             # quantile (z-value)
rnorm(100, mean = 0, sd = 1)             # random numbers

t.test(x, mu = 0)                       # t-test / CI for mu
t.test(x, conf.level = 0.95)$conf.int   # confidence interval
prop.test(k, n)                         # test for a proportion
chisq.test(table(a, b))                 # test of independence