How to Scrape Fashionphile for Second Hand Fashion Data

Scrapfly - Apr 10 '23 - Dev Community


Fashionphile is a popular online platform for second-hand luxury fashion items. It is known for careful product curation, which keeps data quality high and makes it a good target for scraping second-hand luxury fashion data.

In this tutorial, we'll take a quick look at how to scrape Fashionphile.com using nothing but Python and the hidden web data scraping technique. This is a super easy scrape, so let's dive in!

Why scrape Fashionphile?

The luxury fashion market is growing rapidly, and so is the related second-hand trade. Fashionphile is one of the biggest storefronts in this area (along with Vestiaire Collective, StockX, etc.), making it a great source of data for fashion brands, retailers, and market researchers. Scraping and tracking product performance can be a major competitive advantage and a useful business analytics tool.

For more on web scraping uses see our web scraping use case hub.

Scrape Preview

In this tutorial, we'll be focusing on scraping product data and we'll be grabbing the entire available dataset using the hidden web data scraping technique. Here's a JSON format example of the final dataset we'll be able to scrape at the end of this guide:

Example Full Fashionphile Product Dataset

{
  "id": 1048096,
  "sku": "BW",
  "title": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
  "slug": "/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
  "price": 950,
  "renewDays": 0,
  "salePrice": null,
  "retailPrice": 1650,
  "discountedPrice": 900,
  "discountEnabled": 1,
  "discountedTier": 1,
  "isSuperSale": 0,
  "madeAvailableAt": "2023-03-10 23:59:22",
  "madeAvailableAtUTC": "2023-03-11 07:59:22",
  "soldAt": null,
  "viewCount": 0,
  "length": 0,
  "width": 0,
  "height": 0,
  "drop": 0,
  "weight": 1,
  "season": null,
  "year": null,
  "location": "New York, New York",
  "condition": "Excellent",
  "conditions": [
    "scuffs",
    "imprints",
    "marks on sole(s)"
  ],
  "productColors": null,
  "productColorsAndQuantitiesMap": null,
  "isFashionphileMerchandise": false,
  "isSwagItem": false,
  "isGiftCard": false,
  "isQualifiedForLayaway": true,
  "isTooNewForLayaway": false,
  "isEligibleForBuyBack": false,
  "isJewelry": false,
  "description": "This is an authentic pair of BOTTEGA VENETA Nappa Intrecciato Padded BV Curve Sandals size 36 in Black. These stylish strappy sandals are crafted of padded and twisted Intrecciato leather in black. These heels feature interwoven strap detailing and a 4-inch heel.",
  "exteriorDescription": null,
  "handleDescription": null,
  "interiorDescription": null,
  "hardwareDescription": null,
  "conditionDescription": null,
  "titleWithoutBrand": "Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
  "saleDurationInDays": null,
  "brand": [
    {
      "id": 89,
      "name": "Bottega Veneta",
      "slug": "bottega-veneta",
      "type": "brand",
      "description": "Shop authentic used Bottega Veneta shoes & handbags at a discounted price. FASHIONPHILE has the largest selection of used Bottega Veneta on sale online.",
      "title": "Shop Bottega Veneta | Cassette, Jodie, & Pouch Handbags | FASHIONPHILE",
      "parent_id": null,
      "classification": "0",
      "bio": "Bottega Veneta (translated as “Venetian shop”) is a luxury fashion house that was established in 1966 by Michele Taddei and Renzo Zengiaro. Best known for its leather goods, Bottega Veneta developed their own weaving method, called “intrecciato,” that crosses the leather in a braid-like pattern. This interwoven design would become the brand’s trademark. It was the beautifully handcrafted designs and the quality of their materials, which were further accentuated by an unassuming and logo-less design, that gained Bottega Veneta notoriety in those early years. \r\n\r\nCo-founder Renzo Zengiaro left Bottega Veneta in the late 1970s, with Michele Taddei following suit a few years later. Taddei’s ex-wife, Laura Moltedo, and her husband Vittorio moved from the States to Italy to take ownership of the company. \r\n\r\nThe decade of the 1980s saw the rise of Bottega Veneta’s popularity among celebrities around the world. Andy Warhol was one of Bottega Veneta’s most fervent fans, and the famous artist even made a short film to advertise the brand. But despite these efforts, the company took a financial downturn. In response, Bottega Veneta changed its design in the 1990s to one that more directly reflected the trends of the time.\r\n\r\nThe Gucci Group bought Bottega Veneta in 2001, with German fashion designer Tomas Maier as the company’s new Creative Director. He presented his first collection that year as the brand’s 2002 Spring/Summer Collection. Formerly affiliated with prestigious fashion houses Sonia Rykiel and Hermès, Maier brought his vast experience to Bottega Veneta and worked to restore the brand’s original and distinctive identity. To bring this about, Maier made the decision to strip any visible logos from products and include more of the brand’s original handcrafted work, including the intrecciato weave that formerly characterized the brand. 
These changes worked and the Bottega Veneta company, and image, was revived.\r\n\r\nBottega Veneta began introducing new additions to its existing lines, including fine jewelry and fragrance as well as handbags, small leather goods, shoes, gifts, and even home furniture. In 2005, the company released a women’s ready-to-wear line — the brand’s first — and followed it up with a men’s line in 2006. That same year, the company opened the Scuola della Pelletteria, a training school with the purpose of supporting the dwindling number of leatherworkers dedicated to the art of handcrafted design. It is from this school that the brand will select future leather artisans for Bottega Veneta. \r\n\r\nThough Bottega Veneta offers an assortment of clothing, fragrances, and home furnishings, their leather goods remain the company’s specialty. Bottega Veneta handbags, with their quintessential interwoven straps of leather, are considered by many as the height of sophistication.\r\n",
      "is_feature": 0,
      "is_enabled_for_quotes": 1,
      "quote_image_angles": "",
      "is_outlet_brand": 0,
      "is_eligible_for_buyback": 1,
      "created_at": "2016-03-29 11:25:43",
      "updated_at": "2022-05-20 16:03:24",
      "deleted_at": null,
      "is_enabled_for_authentication_prediction": 0,
      "pivot": {
        "product_id": 1048096,
        "category_id": 89
      }
    }
  ],
  "measurements": [
    {
      "id": 5135876,
      "product_id": 1048096,
      "type": "size",
      "unit": "EU",
      "value": 36,
      "adjustment_value": null
    },
    {
      "id": 5135877,
      "product_id": 1048096,
      "type": "heel",
      "unit": "in",
      "value": 4,
      "adjustment_value": null
    }
  ],
  "shipsWith": "2 dust bags, box",
  "designerId": null,
  "color": "Black",
  "brandName": "Bottega Veneta",
  "categories": [
    {
      "id": 168,
      "name": "Shoes"
    },
    {
      "id": 706,
      "name": "Alfresco Accents"
    },
    {
      "id": 677,
      "name": "Our Gift to You"
    },
    {
      "id": 419,
      "name": "RSVP-Worthy"
    },
    {
      "id": 680,
      "name": "Spring Refresh Offer"
    },
    {
      "id": 679,
      "name": "Vacation Mode"
    },
    {
      "id": 624,
      "name": "Woven Wants"
    },
    {
      "id": 724,
      "name": "Year-End Event"
    },
    {
      "id": 192,
      "name": "Black"
    },
    {
      "id": 205,
      "name": "Leather"
    },
    {
      "id": 323,
      "name": "Solid Color"
    },
    {
      "id": 350,
      "name": "36"
    },
    {
      "id": 380,
      "name": "Pumps"
    },
    {
      "id": 381,
      "name": "Sandals"
    },
    {
      "id": 164,
      "name": "Accessories"
    },
    {
      "id": 451,
      "name": "Spring Style"
    }
  ],
  "isExcludedFromPromo": false,
  "subCategories": [
    "Shoes",
    "Alfresco Accents",
    "Our Gift to You",
    "RSVP-Worthy",
    "Spring Refresh Offer",
    "Vacation Mode",
    "Woven Wants",
    "Year-End Event",
    "Black",
    "Leather",
    "Solid Color",
    "36",
    "Pumps",
    "Sandals",
    "Spring Style"
  ],
  "giftable": false,
  "lastCall": false,
  "featuredImage": {
    "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg",
    "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg",
    "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg"
  },
  "images": [
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/eaa3a63349a686dadb8198c8cdabc386.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 1 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/609f080b0b90e1d9a8e6d2b4b164ac91.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/609f080b0b90e1d9a8e6d2b4b164ac91.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/609f080b0b90e1d9a8e6d2b4b164ac91.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 2 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/7babd761c2efc32c7949579820f7e732.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/7babd761c2efc32c7949579820f7e732.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/7babd761c2efc32c7949579820f7e732.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 3 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/8e3bf43e3fcc1202db72c3693eace5d0.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/8e3bf43e3fcc1202db72c3693eace5d0.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/8e3bf43e3fcc1202db72c3693eace5d0.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 4 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/e144283f721ab625d5d10980d2782f8d.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/e144283f721ab625d5d10980d2782f8d.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/e144283f721ab625d5d10980d2782f8d.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 5 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/902794b1806144a205924db1f4f74bd3.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/902794b1806144a205924db1f4f74bd3.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/902794b1806144a205924db1f4f74bd3.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 6 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/768cda285b970f0f1e1e997698bb8bfa.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/768cda285b970f0f1e1e997698bb8bfa.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/768cda285b970f0f1e1e997698bb8bfa.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 7 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/dd1dca41b0823810c484c91535b7ca4c.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/dd1dca41b0823810c484c91535b7ca4c.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/dd1dca41b0823810c484c91535b7ca4c.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 8 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/b9d9625bfdde85cdd0f679a62d507971.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/b9d9625bfdde85cdd0f679a62d507971.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/b9d9625bfdde85cdd0f679a62d507971.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 9 of 10"
    },
    {
      "thumb": "https://prod-images.fashionphile.com/thumb/06c36eb9816bf3e6be63834eb7d33200/718d97bb4e4f6c3d68b74856430378de.jpg",
      "main": "https://prod-images.fashionphile.com/main/06c36eb9816bf3e6be63834eb7d33200/718d97bb4e4f6c3d68b74856430378de.jpg",
      "large": "https://prod-images.fashionphile.com/large/06c36eb9816bf3e6be63834eb7d33200/718d97bb4e4f6c3d68b74856430378de.jpg",
      "altText": "Bottega Veneta Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black image 10 of 10"
    }
  ],
  "followingCount": "40",
  "breadcrumbs": [
    {
      "label": "Bottega Veneta: All",
      "href": "/shop/brands/bottega-veneta"
    },
    {
      "label": "accessories",
      "href": "/shop/categories/accessories?brands=bottega-veneta"
    },
    {
      "label": "Shoes",
      "href": "/shop/accessories/shoes?brands=bottega-veneta"
    },
    {
      "label": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black"
    }
  ],
  "primaryCategory": "Shoes",
  "conditionsMap": {
    "interior": [
      "scuffs",
      "imprints"
    ],
    "other": [
      "marks on sole(s)"
    ]
  },
  "pullRequestedAt": null,
  "isWatch": false,
  "isPurchasable": true,
  "parentCategory": "Accessories",
  "daysOnSale": 31,
  "recommendedProducts": [],
  "brandUrl": "/shop/brands/bottega-veneta",
  "isSizeRef": false,
  "conditionsText": "scuffs, imprints, marks on sole(s)",
  "discount": "5% off",
  "dos": 5,
  "shipsWithList": [
    "2 dust bags",
    " box"
  ],
  "oldSlug": "bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
  "layawayDownpaymentAmount": "$225",
  "url": "https://apigateway.fashionphile.com/product/1048096",
  "productType": "BRAND_PRODUCT",
  "authenticCta": "We guarantee this is an authentic Bottega Veneta item or 100% of your money back. ",
  "disclaimer": "Bottega Veneta\n is a registered trademark of\n Bottega Veneta. FASHIONPHILE is not affiliated with\n Bottega Veneta."
}
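
A full record carries far more fields than most analyses need. As a minimal sketch, the snippet below trims a scraped product down to a few pricing fields and works out the saving against the retail price. The function name `summarize_product` and the output field names are illustrative assumptions rather than part of the Fashionphile dataset; only keys that appear in the example record above are read.

```python
from typing import Optional


def summarize_product(product: dict) -> dict:
    """Reduce a full product record to a small, flat pricing summary."""
    price = product.get("price")
    retail = product.get("retailPrice")
    # percentage saved versus retail price; None when either price is missing
    saving: Optional[float] = None
    if price is not None and retail:
        saving = round((1 - price / retail) * 100, 1)
    return {
        "id": product.get("id"),
        "title": product.get("title"),
        "brand": product.get("brandName"),
        "condition": product.get("condition"),
        "price": price,
        "retail_price": retail,
        "saving_vs_retail_pct": saving,
    }


# values copied from the example record above
sample = {
    "id": 1048096,
    "title": "BOTTEGA VENETA Nappa Twisted Padded Intrecciato Curve Slide Sandals 36 Black",
    "brandName": "Bottega Veneta",
    "condition": "Excellent",
    "price": 950,
    "retailPrice": 1650,
}
summary = summarize_product(sample)
print(summary)
```

Using `.get()` throughout means a record with missing fields yields `None` values instead of raising, which is usually more convenient when processing thousands of listings.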


Setup

To scrape Fashionphile we'll only require a few Python packages commonly used in web scraping. Since we'll be using the hidden web data scraping approach all we need is an HTTP client and CSS selector engine:

  • httpx - powerful HTTP client which we'll be using to retrieve the HTML pages.
  • parsel - HTML parser which we'll be using to extract hidden JSON datasets using CSS selectors

These packages can be installed using Python's pip console command:

$ pip install "httpx[http2,brotli]" parsel


For Scrapfly users there's also a Scrapfly SDK version of each code example. The SDK can be installed using pip as well:

$ pip install "scrapfly-sdk[all]"


Scrape Product Data

To start, let's take a look at how we can scrape a single product page. For example, let's take a product from the discount section of the website:

fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096


We could use traditional HTML parsing tools like XPath and parse the product details from the HTML page but modern web scraping techniques can make this a much easier task!

Instead, if we take a look at the page source and search (ctrl+f) for unique product identifiers (like description, title or code) we can see that the whole product dataset is available in JSON:


This indicates that the website uses a modern JavaScript framework like React or Next.js, which stores the page's dataset in the HTML body. In the example above we can see it's under the <script id="__NEXT_DATA__"> HTML element.

This is called hidden web data scraping and it's a really simple and effective way to scrape data from websites that use JavaScript frameworks like Next.js. To scrape it, all we have to do is:

  1. Retrieve the HTML page of the product.
  2. Find the hidden JSON dataset using CSS selectors (using parsel).
  3. Load JSON as Python dictionary using json.loads.
  4. Select the product fields.

When scraping with Python this is quite simple. The httpx and parsel version comes first, followed by the equivalent Scrapfly SDK version:

import asyncio
import json

import httpx
from parsel import Selector

# create HTTP client with web-browser like headers and http2 support
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
)

def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use CSS selectors to find script tag with data
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)

async def scrape_product(url: str):
    # retrieve page HTML
    response = await client.get(url)
    # find hidden web data
    data = find_hidden_data(response.text)
    # extract only product data from the page dataset
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product

# example scrape run:
print(asyncio.run(scrape_product("https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096")))


import asyncio
import json
from urllib.parse import parse_qs, urlencode, urlparse
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")

def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden NEXT_DATA from page html"""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    data = json.loads(data)
    return data

async def scrape_product(url: str) -> dict:
    """scrape a single Fashionphile product page for product data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            cache=True,
            asp=True,
        )
    )
    data = find_hidden_data(result)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product

def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

# example scrape
example = scrape_product(
    "https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096"
)
print(asyncio.run(example))
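
The long chain of dictionary keys used to reach `productData` will raise a bare `KeyError` if the site ever reorganizes its page data. As a hedged sketch, a small helper can walk the same path and report where it broke. The helper name `dig` and the constant `PRODUCT_PATH` are hypothetical; the key names themselves are the ones used in the scraper above.

```python
from typing import Any, Sequence

# the same key path the product scraper follows
PRODUCT_PATH = ("props", "pageProps", "initialState", "productPageReducer", "productData")


def dig(data: Any, path: Sequence[str]) -> Any:
    """Walk nested dicts key by key, reporting exactly where the path breaks."""
    current = data
    for depth, key in enumerate(path):
        if not isinstance(current, dict) or key not in current:
            walked = " -> ".join(path[:depth]) or "<root>"
            raise LookupError(f"key {key!r} not found under {walked}")
        current = current[key]
    return current


# made-up miniature page dataset for illustration
page = {"props": {"pageProps": {"initialState": {"productPageReducer": {"productData": {"id": 1048096}}}}}}
print(dig(page, PRODUCT_PATH))

try:
    dig({"props": {}}, PRODUCT_PATH)
except LookupError as error:
    print(error)
```

An error message that names the missing key and its parent is much quicker to act on than a traceback pointing at a five-level subscript.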


Scrape Search and Categories

Now that we know how to scrape the data of a single product, let's take a look at how we can scale up our scraper. To find more products we can use the search page or explore each individual category. Each directory (search or category page) is paginated, which means we need to scrape multiple pages to collect all of the product data.

For example, let's take a look at the "sale" category pages:

fashionphile.com/shop/discounted/all

We can see that it's made up of dozens of pages and, just like product pages, each one contains hidden web data in the same location. This time, though, the hidden web data holds not a single product but the whole page of product listings.

So, to scrape the paginated sections of Fashionphile we'll be using a very simple pagination scraping technique:

  1. Scrape the 1st page of the directory/search.
  2. Find hidden web data (using parsel and CSS selectors).
  3. Extract product data from the hidden web data.
  4. Extract the total page count from hidden web data.
  5. Repeat the same for other pages concurrently.
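
Steps 4 and 5 come down to turning a page count into a list of page URLs. The sketch below shows that part in isolation, using only the standard library. It assumes pagination is controlled by a `page` query parameter, as in this guide's scraper; `remaining_page_urls` is a hypothetical helper name and the total of 40 pages is a made-up figure.

```python
from typing import List, Optional
from urllib.parse import parse_qs, urlencode, urlparse


def update_url_parameter(url: str, **params) -> str:
    """Return url with the given query parameters added or replaced."""
    current_params = parse_qs(urlparse(url).query)
    updated_query = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query}"


def remaining_page_urls(url: str, total_pages: int, max_pages: Optional[int] = None) -> List[str]:
    """Build URLs for pages 2..N, where N is capped by max_pages when given."""
    if max_pages and max_pages < total_pages:
        total_pages = max_pages
    return [update_url_parameter(url, page=page) for page in range(2, total_pages + 1)]


# the total page count would normally come from the first page's hidden data
urls = remaining_page_urls("https://www.fashionphile.com/shop/discounted/all", total_pages=40, max_pages=3)
for page_url in urls:
    print(page_url)
```

Because `parse_qs` returns each existing parameter as a list, `urlencode` needs `doseq=True` to write them back as plain values; existing filters such as `brands=...` are preserved while `page` is replaced.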

In practical Python this would look something like the following. The httpx and parsel version comes first, followed by the equivalent Scrapfly SDK version:

import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse

import httpx
from parsel import Selector

# create HTTP client with web-browser like headers and http2 support
client = httpx.AsyncClient(
    follow_redirects=True,
    http2=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
    },
    limits=httpx.Limits(max_connections=3), # we can limit concurrency to prevent blocking
)

def find_hidden_data(html) -> dict:
    """extract hidden web cache from page html"""
    # use CSS selectors to find script tag with data
    data = Selector(html).css("script#__NEXT_DATA__::text").get()
    return json.loads(data)

def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

async def scrape_paging(url: str, max_pages: int = 10) -> List[Dict]:
    print(f"scraping product discovery paging {url}")
    # scrape first page
    response_first_page = await client.get(url)
    data_first_page = find_hidden_data(response_first_page.text)
    data_first_page = data_first_page["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
    results = data_first_page["results"]

    # find total page count
    total_pages = data_first_page["pages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # scrape remaining pages
    print(f"scraping remaining total pages: {total_pages-1} concurrently")
    to_scrape = [
        asyncio.create_task(client.get(update_url_parameter(url, page=page))) 
        for page in range(2, total_pages+1)
    ]
    for response in await asyncio.gather(*to_scrape):
        data = find_hidden_data(response.text)
        data = data["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
        results.extend(data["results"])

    return results

# example scrape run - scrape first 3 pages of discounted products:
print(asyncio.run(scrape_paging("https://www.fashionphile.com/shop/discounted/all", max_pages=3)))


import asyncio
import json
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")

def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden NEXT_DATA from page html"""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    data = json.loads(data)
    return data

def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

async def scrape_paging(url: str, max_pages: int = 10) -> List[Dict]:
    print(f"scraping product discovery paging {url}")
    # scrape first page
    result_first_page = await scrapfly.async_scrape(ScrapeConfig(url=url, asp=True))
    data_first_page = find_hidden_data(result_first_page)
    data_first_page = data_first_page["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
    results = data_first_page["results"]

    # find total page count
    total_pages = data_first_page["pages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # scrape remaining pages
    print(f"scraping remaining total pages: {total_pages-1} concurrently")
    to_scrape = [ScrapeConfig(update_url_parameter(url, page=page), asp=True) for page in range(2, total_pages + 1)]
    async for result in scrapfly.concurrent_scrape(to_scrape):
        data = find_hidden_data(result)
        data = data["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
        results.extend(data["results"])

    return results

example = scrape_paging("https://www.fashionphile.com/shop/discounted/all", max_pages=3)
print(asyncio.run(example))
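
Because listings can change while pages are being fetched concurrently (items sell or get added), the same product may occasionally appear on two pages. A minimal sketch for dropping such repeats is shown below. It assumes each listing result carries a unique `id` key like the product records do, which should be checked against the actual listing data; the function name `dedupe_by_id` is hypothetical.

```python
from typing import Dict, Iterable, List


def dedupe_by_id(results: Iterable[Dict], key: str = "id") -> List[Dict]:
    """Keep the first occurrence of every product, preserving order."""
    seen = set()
    unique = []
    for item in results:
        identifier = item.get(key)
        if identifier is None:
            # nothing to compare on, so keep the item as-is
            unique.append(item)
            continue
        if identifier in seen:
            continue
        seen.add(identifier)
        unique.append(item)
    return unique


# made-up listing results for illustration
scraped = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}, {"id": 1, "title": "A"}]
print(dedupe_by_id(scraped))
```

This can be applied to the list returned by `scrape_paging` before saving it.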


Avoiding Blocking with ScrapFly

Finally, to scale up our scraper and scrape all of the results, we'll need a way around the techniques Fashionphile uses to identify and block scrapers. For this, we can use the ScrapFly web scraping API, which can retrieve page contents for us.

Scrapfly service does the heavy lifting for you!

Scrapfly can power up web scraping with features like scraper blocking bypass (asp) and JavaScript rendering through headless browsers (render_js). These tools can be accessed through the Python SDK:

from scrapfly import ScrapeConfig, ScrapflyClient

client = ScrapflyClient(key="YOUR SCRAPFLY KEY")
result = client.scrape(ScrapeConfig(
    url="https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096",
    # enable scraper blocking service bypass
    asp=True,
    # optional - render javascript using headless browsers:
    render_js=True,
))
print(result.content)


FAQ

To wrap up this guide on web scraping Fashionphile, let's take a look at some frequently asked questions.

Is it legal to scrape Fashionphile?

Yes. All of the data we scraped is publicly available, and scraping public data is legal as long as we don't damage the website, for example by overloading it with requests.

Can Fashionphile be crawled?

Yes. Crawling is a form of web scraping where the scraper discovers product listings on its own. Fashionphile provides many opportunities for web crawling like using sitemaps to discover product pages or following related product sections.
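
As a rough sketch of the sitemap approach, the snippet below pulls product page URLs out of a sitemap document using only the standard library. The sitemap fragment is made up for illustration (the real sitemap location and contents are not covered in this guide), and filtering on the `/p/` path segment is an assumption based on the product URLs used earlier. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# namespace defined by the sitemaps.org protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def product_urls_from_sitemap(xml_text: str, marker: str = "/p/") -> List[str]:
    """Return every <loc> URL in a sitemap whose address contains marker."""
    root = ET.fromstring(xml_text)
    locations = [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]
    return [url for url in locations if marker in url]


# made-up sitemap fragment for illustration only
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.fashionphile.com/shop/discounted/all</loc></url>
  <url><loc>https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096</loc></url>
</urlset>
"""

print(product_urls_from_sitemap(SAMPLE_SITEMAP))
```

Each discovered URL could then be passed to `scrape_product` from the product scraping section.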

Summary

In this quick guide we've used Python and hidden web data scraping to scrape Fashionphile product data. To retrieve product data we used parsel with CSS selectors to extract hidden web data from <script id="__NEXT_DATA__"> elements. Then all we had to do was select the product data from the page dataset.

To find more products, we explored search and category page scraping. We followed a simple pagination scraping technique to scrape all of the pages using the same hidden web data scraping approach.

Finally, we used ScrapFly web scraping API to scale up our scraper and scrape all of the results. Try it out for free!

Full Scraper Code

Here's the full Fashionphile product scraper using Python and Scrapfly Python SDK:

💙 This code should only be used as a reference. To scrape data from Fashionphile at scale you'll need to adjust it to your preferences and environment.

import asyncio
import json
from pathlib import Path
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")

def find_hidden_data(result: ScrapeApiResponse) -> dict:
    """extract hidden NEXT_DATA from page html"""
    data = result.selector.css("script#__NEXT_DATA__::text").get()
    data = json.loads(data)
    return data

async def scrape_product(url: str) -> dict:
    """scrape a single Fashionphile product page for product data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            cache=True,
            asp=True,
        )
    )
    data = find_hidden_data(result)
    product = data["props"]["pageProps"]["initialState"]["productPageReducer"]["productData"]
    return product

def update_url_parameter(url, **params):
    """update url query parameter of an url with new values"""
    current_params = parse_qs(urlparse(url).query)
    updated_query_params = urlencode({**current_params, **params}, doseq=True)
    return f"{url.split('?')[0]}?{updated_query_params}"

async def scrape_paging(url: str, max_pages: int = 10) -> List[Dict]:
    print(f"scraping product discovery paging {url}")
    # scrape first page
    result_first_page = await scrapfly.async_scrape(ScrapeConfig(url=url, asp=True))
    data_first_page = find_hidden_data(result_first_page)
    data_first_page = data_first_page["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
    results = data_first_page["results"]

    # find total page count
    total_pages = data_first_page["pages"]
    if max_pages and max_pages < total_pages:
        total_pages = max_pages

    # scrape remaining pages
    print(f"scraping remaining total pages: {total_pages-1} concurrently")
    to_scrape = [ScrapeConfig(update_url_parameter(url, page=page), asp=True) for page in range(2, total_pages + 1)]
    async for result in scrapfly.concurrent_scrape(to_scrape):
        data = find_hidden_data(result)
        data = data["props"]["pageProps"]["initialState"]["listingPageReducer"]["listingData"]
        results.extend(data["results"])

    return results

async def example_run():
    """
    this example run will scrape an example product and the first 3 pages of discounted listings
    and save them to ./results/product.json and ./results/categories.json respectively
    """
    out_dir = Path(__file__).parent / "results"
    out_dir.mkdir(exist_ok=True)

    product = await scrape_product("https://www.fashionphile.com/p/bottega-veneta-nappa-twisted-padded-intrecciato-curve-slide-sandals-36-black-1048096")
    out_dir.joinpath("product.json").write_text(json.dumps(product, indent=2, ensure_ascii=False))

    search = await scrape_paging("https://www.fashionphile.com/shop/discounted/all", max_pages=3)
    out_dir.joinpath("categories.json").write_text(json.dumps(search, indent=2, ensure_ascii=False))

if __name__ == "__main__":
    asyncio.run(example_run())
